Subhabrata Mukherjee

Subhabrata Mukherjee is a Machine Learning Scientist at Amazon, where he leads the information extraction efforts to build the Amazon Product Knowledge Graph, an authoritative knowledge graph for all products in the world. His work focuses on representation learning and neural networks for natural language understanding, and on web-scale knowledge extraction and integration.


He received his PhD, summa cum laude, from the Max Planck Institute for Informatics, Germany, in 2017. He was awarded the 2018 SIGKDD Doctoral Dissertation Runner-up Award for his thesis on credibility analysis and misinformation. He previously worked at IBM Research on domain adaptation of question-answering systems, sentiment analysis, and opinion mining. He has published over 40 research papers in top conferences on various topics in natural language processing and data mining.

Deep Learning for Knowledge Extraction and Integration to build the Amazon Product Graph

Knowledge graphs (KGs) support a wide range of applications and enhance search results for major search engines such as Google and Bing. At Amazon, we are building a Product Graph, an authoritative knowledge graph for all products in the world. To do this, we extract structured product information from unstructured natural language text as well as from semi-structured web data.


In this talk, we present two neural network-based systems for information extraction, cleaning, and integration to build our graph. The first, OpenTag, discovers product knowledge from unstructured product profile text, leveraging active learning and distant supervision to minimize the cost of human annotation. The second advances web-scale knowledge extraction and alignment by integrating OpenIE extractions, in the form of (subject, predicate, object) triples, with the KG. Both systems enrich the graph with new facts and entities not seen before. The talk covers our progress, challenges, and future research opportunities.