Samuel Rönnqvist, PhD
Language technology & deep learning researcher



I am heading AI research & development as Machine Learning Lead at Zefort (formerly Aivan AI), where we focus on document and natural language understanding for zero-effort contract management.

- Postdoctoral researcher @ TurkuNLP, University of Turku, Finland
- AI Scientist @ Silo AI, Finland
- Visiting researcher @ Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany
- PhD candidate @ Turku Centre for Computer Science / Data Mining Lab, Åbo Akademi Univ., Finland
- Associated researcher @ RiskLab, Arcada UAS, Finland
- CTO & Co-founder @ infolytika, Finland

Contact:  at


Find my research code on GitHub.


In my recent research, I have been working on multilingual web genre classification, natural language generation/controllable text generation, explainable AI/interpretable NLP, machine learning visualization, and the application of NLP and text analytics in new domains.


The Rise of Local LLMs – what to look out for in 2024
Samuel Rönnqvist. Zefort Blog.


AI in Contract Management: How to evaluate and trust answers provided by ChatGPT
Samuel Rönnqvist. Zefort Blog.

ChatGPT and privacy – what you need to know
Samuel Rönnqvist. Zefort Blog.


Explaining Classes through Stable Word Attributions
Samuel Rönnqvist, Aki-Juhani Kyröläinen, Amanda Myntti, Filip Ginter and Veronika Laippala. In Findings of the Association for Computational Linguistics: ACL 2022.
* Based on the extended abstract Explaining Classes through Word Attribution presented at the BlackboxNLP 2021 workshop at EMNLP.

Register identification from the unrestricted open Web using the Corpus of Online Registers of English
Veronika Laippala, Samuel Rönnqvist, Miika Oinonen, Aki-Juhani Kyröläinen, Anna Salmela, Douglas Biber, Jesse Egbert and Sampo Pyysalo. Language Resources and Evaluation (LREV) (2022).

Towards better structured and less noisy Web data: Oscar with Register annotations
Veronika Laippala, Anna Salmela, Samuel Rönnqvist, Alham Fikri Aji, Li-Hsin Chang, Asma Dhifallah, Larissa Goulart, Henna Kortelainen, Marc Pàmies, Deise Prina Dutra, Valtteri Skantsi, Lintang Sutawika and Sampo Pyysalo. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022).

Assessing Banks' Distress Using News and Regular Financial Data
Paola Cerchiello, Giancarlo Nicola, Peter Sarlin and Samuel Rönnqvist. Frontiers in Artificial Intelligence, Volume 5.


Multilingual and Zero-Shot is Closing in on Monolingual Web Register Classification
Samuel Rönnqvist, Valtteri Skantsi, Miika Oinonen and Veronika Laippala. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa’21).

Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers
Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo and Veronika Laippala. In Proceedings of the EACL 2021 Student Research Workshop.

Predicting Stock Price and Spread Movements from News
Pontus Wistbacka, Samuel Rönnqvist, Katia Vozian and Satchit Sagade. In Proceedings of the 54th Hawaii International Conference on System Sciences (HICSS). Track: Machine Learning and Predictive Analytics in Accounting, Finance, and Management.


From Web Crawl to Clean Register-Annotated Corpora
Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi and Sampo Pyysalo. In Proceedings of the 12th Web as Corpus Workshop: WAC-XII (LREC co-located).


Is Multilingual BERT Fluent in Language Generation?
Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski and Filip Ginter. In Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing.

Template-free Data-to-Text Generation of Finnish Sports News
Jenna Kanerva, Samuel Rönnqvist, Riina Kekki, Tapio Salakoski and Filip Ginter. In Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19).

Lithium-Ion Batteries: A Machine-Generated Summary of Current Research
Generated by software developed at the Applied Computational Linguistics Lab, Goethe University Frankfurt (Niko Schenk, Samuel Rönnqvist, Christian Chiarcos) in collaboration with Springer Nature.
* More info: Springer press release

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models
Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter. arXiv pre-print.


Sentiment in Citizen Feedback: Exploration by Supervised Learning
Robin Lybeck, Samuel Rönnqvist and Sampo Ruoppila. In EGOV-CeDEM-ePart 2018 Proceedings (Electronic Government, E-Democracy/Open Government, and Electronic Participation Conference), ongoing research.

Deep Learning for Assessing Banks’ Distress from News and Numerical Financial Data
Paola Cerchiello, Giancarlo Nicola, Samuel Rönnqvist and Peter Sarlin. In Michael J. Brennan Irish Finance Working Paper Series, Research Paper No. 18-15.


Knowledge-Lean Text Mining
Samuel Rönnqvist. Doctoral thesis in Computer Science, Åbo Akademi University (TUCS Dissertations No 227).
  [Defense slides]

A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations
Samuel Rönnqvist, Niko Schenk and Christian Chiarcos. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL).  
  [Source code] [Poster]

Bank distress in the news: Describing events through deep learning
Samuel Rönnqvist and Peter Sarlin. In Neurocomputing, Volume 264 (SI on Machine learning in finance).  
* Cited in Bloomberg View, "The Financial Threats That Machines Can See"
  [Neucomp version]


Do We Really Need All Those Rich Linguistic Features? A Neural Network-Based Approach to Implicit Sense Labeling
Niko Schenk, Christian Chiarcos, Kathrin Donandt, Samuel Rönnqvist, Evgeny A. Stepanov and Giuseppe Riccardi. In Proceedings of the Twentieth Conference on Computational Natural Language Learning - Shared Task, CoNLL 2016. Association for Computational Linguistics.  
  [Source code] [ACL Anthology]


Detect & Describe: Deep learning of bank stress in the news
Samuel Rönnqvist and Peter Sarlin. In Proceedings of the 2015 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr).  
* Cited in The Riksbank Economic Commentary on Big Data

Bank networks from text: interrelations, centrality and determinants
Samuel Rönnqvist and Peter Sarlin. In Quantitative Finance, 15(10) (SI on Financial Data Analytics).
  [Live demo]   [QF version]   [ECB WP]   [Arcada WP]  

Exploratory topic modeling with distributional semantics
Samuel Rönnqvist. In Advances in Intelligent Data Analysis XIV, 241-252, Lecture Notes in Computer Science.    
  [Live demo]   [Source code]

Identifying bank stress by deep learning of news
Samuel Rönnqvist and Peter Sarlin. In Machine Learning Reports: Workshop New Challenges in Neural Computation 2015, 03/2015.  

Identifying financial risk from news: A semantic deep learning approach
Samuel Rönnqvist and Peter Sarlin. Presentation at Finnish Economic Association XXXVII Annual Meeting (KT-päivät), Helsinki, Finland.


Combining human and computational intelligence through interactive visualization
Samuel Rönnqvist. Essay.  

Interactive Visual Exploration of Topic Models using Graphs
Samuel Rönnqvist, Xiaolu Wang and Peter Sarlin. In Proceedings of the Eurographics Conference on Visualization (EuroVis).    
  [Live demo]   [Source code]

Alluvial SOTM: Visualizing transitions and changes in cluster structure of the Self-Organizing Time Map
Samuel Rönnqvist and Peter Sarlin. In Proceedings of the Eurographics Conference on Visualization (EuroVis).    
  [Live demo]

From Text to Bank Interrelation Maps*
Samuel Rönnqvist and Peter Sarlin. In Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr).
  * Received IEEE CIFEr 1st Best Student Paper Award
  * Cited in Bank of England CCBS Handbook "Text mining for central banks"  


Green vs. non-green customer behavior: A Self-Organizing Time Map over greenness
Annika H. Holmbom, Samuel Rönnqvist, Peter Sarlin, Tomas Eklund and Barbro Back. In Proceedings of the 13th IEEE International Conference on Data Mining Workshops (ICDMW).  

Syntax-based modeling of topic relations
Samuel Rönnqvist. Presentation at Machine Learning Summer School, Tübingen, Germany.  

Cluster coloring of the Self-Organizing Map: An information visualization perspective
Peter Sarlin and Samuel Rönnqvist. In Proceedings of the 17th International Conference on Information Visualisation.    


Mapping Bank Interrelations in Financial Discussion
Samuel Rönnqvist et al. Presentation at The Eleventh International Symposium on Intelligent Data Analysis (IDA 2012), Helsinki, Finland.    

Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations
Sofie Van Landeghem, Kai Hakala, Samuel Rönnqvist, Tapio Salakoski, Yves Van de Peer, and Filip Ginter. In Advances in Bioinformatics, Volume 2012.  
  [Online portal]


Introduction to NLP (spring 2020-2021, UTU and Arcada UAS): unsupervised and self-supervised NLP methods.

Visuality and Visualization of Information (spring 2015-, ÅAU): visual analytics (machine learning+interactive visualization), practical data visualization, information visualization theory.

Visual Analytics (fall 2016-2017, Arcada UAS): visualization for big data analytics, vis theory and practice, interactive visualization with d3.js.

Colloquium Applied Computational Linguistics (fall 2015, Uni. Frankfurt): tutorial on deep learning for NLP (word vectors and neural networks in Python).

Knowledge Management (fall 2015, Technische Hochschule Mittelhessen): tutorial on distributional semantics.

Data Mining and Text Mining (fall 2012–2015, ÅAU): text mining and practicals on statistical programming, analytics, text mining and machine learning.

Self-Organizing Maps, Visualization, Text Mining