Alena Vasilevich holds an international MSc degree in Language Science and Technology from Saarland University. At Coreon, she focuses on pragmatic data conversion, hands-on natural language processing, and data analytics. Having dived into trees and graphs, she concentrates on leveraging structured data in various NLP scenarios. Alena’s interests revolve around multilingual NLP and NLU, sentiment analysis at various levels of granularity, and all things Python.
Talks and Events
2022 Talk: Building Multilingual Chatbots With Language-Agnostic KGs
The CEFAT4Cities project targets the development of multilingual cross-border eGovernment services via a software layer that facilitates the conversion of natural-language administrative procedures into machine-readable data. We showcase the effectiveness of this solution embedded in SmartBot, a prototype multilingual chatbot developed for the Vienna Business Agency (VBA) within the scope of the CEFAT4Cities project. SmartBot makes VBA’s services discoverable in a user-friendly way, targeting topics such as starting a new business and finding relevant grants among hundreds of funding opportunities. It is driven by multilingual AI that integrates the results of CEFAT4Cities workflows into its domain knowledge, along with multilingual domain-specific vocabularies represented in a language-agnostic knowledge graph in Coreon. Thanks to the integrated MKS, SmartBot is able to infer connections between language-agnostic concepts and can deal with terms previously unseen by the bot’s language model.
2021 Talk: Benefits of Collaborative AI vs. Manual Creation of the Graph: Taxonomization of IATE, the EU Terminology
For data-driven businesses, structured data is a valuable resource because it is highly organized and easily processed by machines. IATE, with almost one million concepts storing multilingual terms and metadata, holds a large part of the textual knowledge of the EU. However, it can only be accessed lexically, and its concepts stand alone, without links to one another. If IATE were taxonomized, i.e. if related concepts were linked into knowledge graphs yielding a full-fledged ontology, its data could not only be consumed by linguists but would also become accessible to machines through a SPARQL endpoint. This would make it a powerful resource for AI projects, particularly within SMEs, which rarely have the means to create multilingual formalized knowledge.
The Coreon team elevated a sub-domain of IATE terminology into a multilingual knowledge graph. We taxonomized a flat list of 425 concepts within the COVID sub-domain, benchmarking two approaches to this task: automatic creation with a custom-enhanced off-the-shelf language model, and manual creation of the knowledge graph by an expert linguist. The automatically created knowledge graph was later revised by a human; the corrections and time effort were measured and compared with the performance metrics of the manual approach. In this talk, we will discuss the performance and resource-saving advantages of our custom method and show how the achieved productivity rate can make the taxonomization of even large terminology databases economically viable.
We demonstrate empirically the effectiveness of our collaborative-robot approach in a typical industry use case: using the resulting IATE/COVID graph to initialize a Convolutional Neural Network (CNN) in a multilingual document classification task, we achieve a classification granularity that state-of-the-art models, such as non-initialized CNNs and zero-shot classifiers, do not reach.
The Bloomberg Knowledge Graph is a graph-centric representation of entities and relationships in the financial world which connects cross-domain data from various sources within Bloomberg. Recent developments in machine learning, knowledge graphs, and language technology have enabled intelligent ways to uncover interesting patterns amongst data that reveal previously hidden insights. By leveraging the entity and relationship information in the knowledge graph, interesting potential applications emerge, especially when combined with other information such as market data and news stories. This talk details how Bloomberg uses the knowledge graph and semantic technologies to enable various use cases, e.g., to link data across different domains, enrich news stories, and support financial analytics centered around entities. In addition, we will discuss the challenges we face to support these use cases, including representing and storing historical, point-in-time relationships between entities.