Mike Welch is a Director of Software Engineering at Verizon Media, leading the Yahoo Knowledge Graph engineering team. He has designed and built text understanding, classification, and knowledge graph platforms for over 10 years. Mike is a co-architect and top code contributor across the core of Yahoo’s Knowledge Graph infrastructure, building scalable systems ranging from high-throughput backend graph processing to low-latency online serving for user queries in web search. He is the primary inventor on numerous patents and has published his work at multiple conferences, including WWW, WSDM, SIGIR, VLDB, and CIKM. Prior to joining Yahoo / Verizon Media, Mike received his Ph.D. in Computer Science from UCLA in 2010.
2021 Talk: Serving a web scale Knowledge Graph
The Yahoo Knowledge Graph powers entity data for user experiences across multiple products at Verizon Media, from search to media to ads. Nodes in the knowledge graph correspond to real-world entities: people, places, movies, sports teams, and so on. The edges represent semantic relationships between these entities. A comprehensive entity experience requires collecting a use-case-specific subgraph around an entity node in the graph and combining it with dynamic and multimedia content.
A “knowledge panel” for a public company on a web search results page might include basic data like the company’s founders and number of employees, combined with a realtime stock quote and relevant images. Other use cases, like a finance-focused website, may prefer to include additional detailed information like the board members, other companies those board members represent, revenue numbers, top competitors, and more. Legacy serving systems typically pre-collected and exposed a static view of an entity, which limited clients’ ability to traverse the graph or adapt the user experience for different contexts without offline preprocessing. They also required clients to implement a more complex series of requests to fetch partial data, extract dependencies, and query additional services to stitch together their full experience.
In this talk, we will take you through our experience moving away from serving these static subgraph views with independent services and client-side processing. We will describe how we addressed the shortcomings of such a system and built a federated, realtime, web-scale graph querying framework with GraphQL on top of Amazon Neptune, Vespa, and third-party APIs. Our unified graph approach has enabled customer teams to experiment, rapidly iterate over their designs, and easily power rich experiences by bringing together all the relevant data under a single request.
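As a rough illustration of the idea of collecting a use-case-specific subgraph in a single request, the sketch below walks a toy in-memory graph according to a nested selection, loosely in the spirit of a GraphQL selection set. It is not the Yahoo Knowledge Graph implementation and does not use GraphQL, Amazon Neptune, or Vespa; the graph contents, the `collect` function, and the `SEARCH_PANEL` and `FINANCE_PAGE` selections are all hypothetical names invented for this example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical toy graph: node id -> {edge label -> list of literal values or node ids}.
Graph = Dict[str, Dict[str, List[Any]]]
# A selection maps an edge label to None (return the raw values)
# or to a nested selection (follow the edge and recurse).
Selection = Dict[str, Optional["Selection"]]

GRAPH: Graph = {
    "acme": {
        "name": ["Acme Corp"],
        "founder": ["alice"],
        "employees": [1200],
        "board_member": ["bob"],
    },
    "alice": {"name": ["Alice"]},
    "bob": {"name": ["Bob"], "board_member_of": ["acme", "globex"]},
    "globex": {"name": ["Globex"]},
}


def collect(graph: Graph, node_id: str, selection: Selection) -> Dict[str, Any]:
    """Collect the subgraph around node_id that the selection asks for."""
    node = graph[node_id]
    result: Dict[str, Any] = {}
    for label, nested in selection.items():
        values = node.get(label, [])
        if nested is None:
            # Leaf field: copy the stored values as-is.
            result[label] = list(values)
        else:
            # Edge to other entities: traverse and apply the nested selection.
            result[label] = [collect(graph, target, nested) for target in values]
    return result


# Two clients ask for different views of the same entity (assumed shapes).
SEARCH_PANEL: Selection = {
    "name": None,
    "founder": {"name": None},
    "employees": None,
}
FINANCE_PAGE: Selection = {
    "name": None,
    "board_member": {"name": None, "board_member_of": {"name": None}},
}

if __name__ == "__main__":
    print(collect(GRAPH, "acme", SEARCH_PANEL))
    print(collect(GRAPH, "acme", FINANCE_PAGE))
```

Here the shape of each response is decided by the caller's selection at request time rather than by a precomputed static view, which is the contrast the abstract draws with the legacy serving systems.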