Healthier eating for diabetics with the RPI + IBM Food KG
Tell us about your project.
The food knowledge graph (or foodkg) is a joint project between Rensselaer Polytechnic Institute and IBM (part of the IBM AI Horizons Network). The project’s official name is “Health Empowerment through Analytics Learning and Semantics”, or HEALS.
We researched effective computational and cognitive mechanisms for enabling individuals with chronic diseases, such as diabetes, to personalize self-management strategies. Since unhealthy eating habits are one of the main contributors to the progression of diabetes, we wanted to build a knowledge graph of food items and their nutrients to connect diabetic and pre-diabetic individuals to healthier food suggestions. We thought semantic technologies would provide a good foundation for delivering explainable insights.
For this task, we leveraged the Recipe1M dataset released by MIT and augmented it with other data, such as USDA nutrient values and FoodOn concepts. These additions allowed further exploration of the data, including determining hierarchical concepts for the food items based on their categories and tags, and obtaining the nutrient values for each ingredient in a recipe. In constructing the foodkg, we used several natural language processing techniques, such as fuzzy matching and word embeddings, and semantic web techniques, such as semantic data dictionaries.
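As a rough illustration of the fuzzy matching step mentioned above, the sketch below links a free-text ingredient name to the closest entry in a nutrient table. It is not the foodkg's actual pipeline: the entry names, the `normalize` and `best_match` helpers, and the 0.8 threshold are all assumptions made for this example, and it relies only on Python's standard `difflib` module.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nutrient-table entries; a real table would come from USDA data.
NUTRIENT_ENTRIES = ["unsalted butter", "granulated sugar", "all-purpose flour"]


def normalize(name):
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def best_match(ingredient, candidates, threshold=0.8):
    """Return the candidate most similar to the ingredient, or None if none is close enough."""
    query = normalize(ingredient)
    best_score, best_candidate = 0.0, None
    for candidate in candidates:
        score = SequenceMatcher(None, query, normalize(candidate)).ratio()
        if score > best_score:
            best_score, best_candidate = score, candidate
    # The threshold is an illustrative choice, not a tuned value.
    return best_candidate if best_score >= threshold else None


print(best_match("Butter, unsalted", NUTRIENT_ENTRIES))  # reordered words
print(best_match("granulated suger", NUTRIENT_ENTRIES))  # misspelling
print(best_match("dragon fruit", NUTRIENT_ENTRIES))      # no close entry
```

Sorting the tokens before comparing makes the match insensitive to word order, which matters because recipe text and nutrient tables often name the same item differently. A similarity threshold keeps unrelated ingredients from being linked to an arbitrary entry.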
Our work on the development and applications of the foodkg was published at the International Semantic Web Conference in 2019, both as a resource track paper and as a demo paper. More information about the foodkg can be found at http://foodkg.github.io.
How has the project gone? What have you discovered – challenges, new ideas, insights?
One of the biggest challenges we face when sourcing data for the foodkg is the restrictive licensing of recipe websites. Nevertheless, we are incorporating new data sources with permissive licenses, as well as data for which we have obtained explicit permission.
We are expanding the work on personalized food suggestions by incorporating users’ food logs that map to the food identifiers in the foodkg, as well as guidelines for diabetes prevention and control. We have also used the foodkg to develop an effective pre-trained model for recipe representation learning. These efforts have reinforced our belief in the need for personal knowledge graphs that can be used to tailor and explain general health knowledge based on an individual’s life and health circumstances.
We are continuously improving the foodkg with new recipes, nutritional scores, and other data. We are also expanding the usage of the foodkg. Currently, we have a ‘knowledge-base-question-answering’ service that uses the foodkg in a chatbot application. This application can answer factual questions about food, compare different food items to suggest which option might be better, and apply constraints such as food preferences and adherence to guidelines.
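The comparison and constraint handling described above can be pictured with a small sketch. This is not the chatbot's implementation: the `Recipe` type, the `suggest` function, the nutrient fields, and every value below are hypothetical, chosen only to show how preferences and a guideline limit might narrow and rank candidate foods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    name: str
    ingredients: frozenset
    sugar_g: float  # grams of sugar per serving (illustrative values)
    carbs_g: float  # grams of carbohydrate per serving (illustrative values)


def suggest(recipes, excluded=frozenset(), max_sugar_g=None):
    """Drop recipes that violate the constraints, then rank the rest by sugar and carbs."""
    allowed = [
        r for r in recipes
        if not (r.ingredients & excluded)                     # food preferences or allergies
        and (max_sugar_g is None or r.sugar_g <= max_sugar_g)  # guideline limit
    ]
    return sorted(allowed, key=lambda r: (r.sugar_g, r.carbs_g))


RECIPES = [
    Recipe("peanut noodles", frozenset({"noodles", "peanut"}), 6.0, 48.0),
    Recipe("banana bread", frozenset({"banana", "flour", "sugar"}), 22.0, 40.0),
    Recipe("lentil soup", frozenset({"lentils", "carrot"}), 4.0, 30.0),
    Recipe("greek salad", frozenset({"cucumber", "feta"}), 4.0, 9.0),
]

for recipe in suggest(RECIPES, excluded={"peanut"}, max_sugar_g=10):
    print(recipe.name)
```

Filtering first and ranking second keeps the two concerns separate: hard constraints remove options outright, while the sort order expresses which of the remaining options might be the better choice.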
We are actively looking at other avenues for using the foodkg. For example, we have recently focused on developing better models for recommending useful food substitutions. We welcome the community to explore the foodkg and propose suggestions and improvements!
How did you become involved in KGs?
I got involved with KGs through my graduate-level work at MIT. During my early days at MIT, I contributed to the Tabulator, a linked data browser, where I developed several ‘panes’ for knowledge graph-based applications. In my doctoral thesis research, I explored data reuse issues on the Web, which was at the intersection of the Semantic Web and Web Science research. As part of my thesis, I developed a Web protocol to enable accountable usage of data called HTTPA (Hyper Text Transfer Protocol with Accountability).
The core of the protocol assumed that the data on web pages would be linked to other data items on other web pages, resulting in an implicit knowledge graph. The protocol would leverage such relationships and usage restrictions on linked data to evaluate if secondary reuse was compliant with the original intent of the data creator. Since completing my doctoral work, I have engaged in knowledge graph research both in industrial and academic capacities. I have worked on knowledge graph provenance systems catering to enterprises, and in the past few years, I have been working on health-based knowledge graphs as part of the HEALS project.
What makes KGs interesting?
KGs are very useful for knowledge exploration and exploitation. Compared to data silos, KGs enable us to navigate related information serendipitously by following links that specify the relationships between entities in the knowledge graph. KGs, and more specifically ontologies, allow domain experts to capture knowledge in a structured format that can be used for reasoning and direct querying to derive an answer to a question. The reasoning process, especially when the original knowledge graph is combined with other knowledge graphs, leads to interesting insights that would not have been available in just one source alone.
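A toy example may make the point about combining knowledge graphs concrete. The sketch below stores each graph as a set of subject-predicate-object triples in plain Python. The entity and predicate names are made up for illustration and are not actual foodkg or FoodOn identifiers, and a real system would use an RDF store and a query language rather than these hypothetical helpers.

```python
def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to) for (ts, tp, to) in graph
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    ]


# Graph 1: recipes and their ingredients (hypothetical data).
recipe_graph = {
    ("banana_bread", "hasIngredient", "butter"),
    ("banana_bread", "hasIngredient", "banana"),
    ("fruit_salad", "hasIngredient", "banana"),
}

# Graph 2: a tiny food hierarchy (hypothetical data).
ontology_graph = {
    ("butter", "subClassOf", "dairy_product"),
    ("banana", "subClassOf", "fruit"),
}


def recipes_with_category(graph, category):
    """Find recipes that use any ingredient belonging to the given category."""
    members = {s for s, _, _ in match(graph, p="subClassOf", o=category)}
    return sorted({s for s, _, o in match(graph, p="hasIngredient") if o in members})


combined = recipe_graph | ontology_graph  # merging graphs is a set union of triples
print(recipes_with_category(recipe_graph, "dairy_product"))  # recipe graph alone
print(recipes_with_category(combined, "dairy_product"))      # after combining
```

Neither graph can answer "which recipes contain dairy?" on its own: the recipe graph knows nothing about categories, and the hierarchy knows nothing about recipes. The answer only appears once the links from both sources can be followed together.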
There is a lot of potential in KGs, both in their applications and in original research on KG representation and development. I’m very excited to be in this space.
What’s something about you that others may not know?
I was fortunate to be the first PhD student to graduate under Tim Berners-Lee, the inventor of the World Wide Web.