
KGC 2024: Entity-Resolved Knowledge Graphs

Please enjoy this talk, “Entity-Resolved Knowledge Graphs,” from the Knowledge Graph Conference 2024, given by Paco Nathan, Principal DevRel Engineer at Senzing (also known as the “Gandalf of Graph Technology”).

Talk description:

Use of knowledge graphs has spiked recently, for example in retrieval-augmented generation (RAG) methods used to mitigate hallucination in LLMs. Graphs emphasize relationships in data and add semantics, more so than SQL or vector databases do.

However, data quality issues can degrade linking during knowledge graph construction and updating, which makes downstream use cases inaccurate and defeats the point of using a graph. When JOIN keys (unique identifiers) are available, building relationships in a graph may be straightforward, although linking errors can still arise from typos or minor differences in attributes such as name, address, or phone number; family members sharing an email address; duplicate customer entries; and so on. These issues can produce duplicate nodes for the same entity or wrongly link distinct entities.
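To make the failure mode concrete, here is a minimal Python sketch, which is not Senzing's method: an exact JOIN on raw name and address misses two records that differ only by a typo and formatting, while a crude normalized similarity check (using the standard library's `difflib.SequenceMatcher`) flags them as likely duplicates. The records, the `normalize` helper, the abbreviation table, and the 0.85 threshold are all hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher

# Two hypothetical records that most likely describe the same business
records = [
    {"id": "A1", "name": "Desert Rose Cafe", "address": "123 N. Main St."},
    {"id": "B7", "name": "Desert Rose Caffe", "address": "123 North Main Street"},
]

# A tiny, illustrative abbreviation table (real address standardization is far richer)
ABBREVIATIONS = {"n": "north", "s": "south", "st": "street", "ave": "avenue"}

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and expand a few common address abbreviations."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())

def exact_join_match(a: dict, b: dict) -> bool:
    """What a JOIN on raw name and address does: require identical strings."""
    return (a["name"], a["address"]) == (b["name"], b["address"])

def fuzzy_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Require both the normalized name and the normalized address to be similar."""
    name_score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    addr_score = SequenceMatcher(None, normalize(a["address"]), normalize(b["address"])).ratio()
    return min(name_score, addr_score) >= threshold

first, second = records
print(exact_join_match(first, second))  # False: a JOIN would yield two separate nodes
print(fuzzy_match(first, second))       # True: flagged as likely the same entity
```

A single similarity threshold like this is brittle in both directions: it can still miss matches, and it can wrongly link distinct entities, such as family members who share an email address. ER systems typically weigh many features together and keep evidence for each decision.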

This masterclass provides a hands-on introduction to what an entity-resolved knowledge graph is and why it is important, plus patterns for deploying entity resolution (ER) that are proven to work. We’ll cover how to make graphs more meaningful in data-centric architectures by repairing connected data: unifying complex and noisy data from multiple data sources, consolidating duplicate nodes, and revealing hidden connections to create more accurate, intuitive graphs with greater utility.

Course materials use open data from U.S. federal agencies (compliance audits, PPP loans, corporate ownership, etc.) layered atop a SafeGraph dataset about businesses in the Las Vegas metro area. Using open source tools, we show a Python API for ER in Senzing, construct a knowledge graph in Neo4j from the results, then run graph analytics and graph visualization to show the before/after effects.
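As a rough sketch of the Neo4j step, the Python below turns resolved entities into a single parameterized Cypher load that links each source record node to the entity it resolves to. The input structure, the `Entity` and `Record` labels, the `RESOLVES_TO` relationship type, and the data source names are hypothetical choices, not the talk's actual schema or Senzing's output format.

```python
# Hypothetical ER output: each resolved entity lists the (data source, record ID)
# pairs it merged. This shape is assumed for illustration, not Senzing's format.
resolved_entities = [
    {"entity_id": 1, "name": "Desert Rose Cafe",
     "records": [("SAFEGRAPH", "A1"), ("PPP_LOANS", "B7")]},
    {"entity_id": 2, "name": "Example Holdings LLC",
     "records": [("CORP_OWNERSHIP", "C3")]},
]

# Parameterized Cypher. MERGE matches existing nodes and relationships instead of
# creating duplicates, so loading the same batch twice does not double the graph.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (e:Entity {entity_id: row.entity_id})
SET e.name = row.name
WITH e, row
UNWIND row.records AS rec
MERGE (r:Record {data_source: rec.data_source, record_id: rec.record_id})
MERGE (r)-[:RESOLVES_TO]->(e)
"""

def to_rows(entities: list[dict]) -> list[dict]:
    """Flatten resolved entities into plain maps that Cypher can UNWIND."""
    return [
        {
            "entity_id": ent["entity_id"],
            "name": ent["name"],
            "records": [{"data_source": ds, "record_id": rid} for ds, rid in ent["records"]],
        }
        for ent in entities
    ]

params = {"rows": to_rows(resolved_entities)}
print(len(params["rows"]))              # 2
print(params["rows"][0]["records"][1])  # {'data_source': 'PPP_LOANS', 'record_id': 'B7'}
```

With the official `neo4j` Python driver, the query and parameters could be passed to `session.run(LOAD_QUERY, params)`; that call is left out so the sketch runs without a database.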

A general architectural pattern for ER is to use multiple levels of detail for graph data: a data graph tier in a high-resolution lower layer tracks provenance, while a knowledge graph tier in a higher layer adds structure and semantics. Senzing has a 200 ms SLA for API calls, so updates arising from audits, end-user feedback about data privacy, and so on propagate rapidly into the graphs used in production, while provenance and evidence for the merge decisions are maintained.
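The two-tier idea can be sketched in a few lines of Python. This is a hypothetical toy model, not Senzing's internals: the lower tier keeps every record and every match decision along with its evidence, and the upper tier of entities is derived (here with union-find) from whichever decisions are currently active, so an audit can retract a merge without losing the history of why it was made. `MatchDecision`, `DataGraph`, and `resolve` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class MatchDecision:
    """Lower-tier edge: records that two source records were linked, and why."""
    left: str
    right: str
    evidence: str
    active: bool = True  # an audit or privacy request can retract the decision

@dataclass
class DataGraph:
    """Lower tier: every source record plus the full history of match decisions."""
    records: dict[str, dict] = field(default_factory=dict)
    decisions: list[MatchDecision] = field(default_factory=list)

def resolve(graph: DataGraph) -> dict[str, str]:
    """Derive the upper tier: map each record ID to an entity ID by running
    union-find over the match decisions that are currently active."""
    parent = {rid: rid for rid in graph.records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for d in graph.decisions:
        if d.active:
            parent[find(d.left)] = find(d.right)

    # Label each entity by the smallest record ID among its members
    clusters: dict[str, list[str]] = {}
    for rid in graph.records:
        clusters.setdefault(find(rid), []).append(rid)
    return {rid: min(members) for members in clusters.values() for rid in members}

g = DataGraph(records={"A1": {"name": "Desert Rose Cafe"},
                       "B7": {"name": "Desert Rose Caffe"},
                       "C3": {"name": "Desert Rose Catering"}})
g.decisions.append(MatchDecision("A1", "B7", "similar name and same address"))
g.decisions.append(MatchDecision("B7", "C3", "shared phone number"))
print(resolve(g))  # {'A1': 'A1', 'B7': 'A1', 'C3': 'A1'}

# An audit finds the phone-based link was wrong: retract it and re-derive entities
g.decisions[1].active = False
print(resolve(g))  # {'A1': 'A1', 'B7': 'A1', 'C3': 'C3'}
```

Because retracting a decision only flips its `active` flag, the lower tier still records that the link was once made and on what evidence. Recomputing every entity from scratch is for clarity only; a system meeting a tight latency budget would need to update entities incrementally.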

People with natural language processing experience may ask how ER differs from named entity recognition (NER). NER labels spans of parsed text in unstructured data, while ER resolves unique identifiers based on free-text name and address fields in structured or semi-structured data. ER provides a means of linking data records, supporting evidence (provenance and data lineage) for the decisions made, and efficient ways to audit and update entities upstream from the knowledge graph.

Typical use cases for Senzing involve data about people and companies, resolving free-text fields related to names and addresses. For example, most voter registration records in the U.S. are managed using Senzing. Using plug-ins for custom feature comparators, other use cases have resolved maritime vessel IDs, vehicle tracking data, and more.
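Senzing's actual plug-in interface is not described here, so the following is only a hypothetical Python sketch of the general idea: a registry that maps a feature type to a comparator function, plus an example comparator for maritime vessel IDs that validates IMO ship numbers (whose seventh digit is a check digit over the first six) before comparing them. `COMPARATORS`, `register`, and the `VESSEL_IMO` feature type are illustrative names.

```python
from typing import Callable

Comparator = Callable[[str, str], float]

# Hypothetical plug-in registry mapping a feature type to a comparator that
# returns a score in [0, 1]. This illustrates the idea only; it is not Senzing's API.
COMPARATORS: dict[str, Comparator] = {}

def register(feature_type: str) -> Callable[[Comparator], Comparator]:
    """Decorator that registers a comparator for one feature type."""
    def wrap(fn: Comparator) -> Comparator:
        COMPARATORS[feature_type] = fn
        return fn
    return wrap

def normalize_imo(value: str) -> str:
    """Strip an optional 'IMO' prefix and surrounding whitespace."""
    return value.strip().upper().removeprefix("IMO").strip()

def valid_imo(value: str) -> bool:
    """IMO ship numbers have 7 digits; the last digit must equal the last digit of
    the sum of the first six digits weighted by 7, 6, 5, 4, 3, 2."""
    digits = normalize_imo(value)
    if len(digits) != 7 or not digits.isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(digits[:6], range(7, 1, -1)))
    return total % 10 == int(digits[6])

@register("VESSEL_IMO")
def compare_imo(a: str, b: str) -> float:
    """Score 1.0 only when both values are valid IMO numbers and identical."""
    if not (valid_imo(a) and valid_imo(b)):
        return 0.0
    return 1.0 if normalize_imo(a) == normalize_imo(b) else 0.0

print(COMPARATORS["VESSEL_IMO"]("IMO 9074729", "9074729"))  # 1.0
print(COMPARATORS["VESSEL_IMO"]("IMO 9074728", "9074728"))  # 0.0: bad check digit
```

A real comparator would likely return graded scores and handle partial or historical identifiers; the all-or-nothing scoring here just keeps the sketch short.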


