Joshua is a researcher and software developer who specializes in graphs and data integration. He holds a PhD in Web science from RPI’s Tetherless World Constellation, and co-founded what is now Apache TinkerPop, contributing to the first common APIs for graph databases, the RDF-based query language that preceded Gremlin, and the first tools that aligned the property graph and RDF data models, starting with neo4j-rdf-sail in 2008. As part of the Data organization at Uber, he leads a company-wide effort to unify data models and schemas across RPC, streaming, and storage. He feels, now as ever, that the research, business, and open-source communities have a lot to learn from each other with respect to graphs and knowledge representation.
2021 Talk: Anything-to-Graph
Show me your schemas, and I will show you a graph! Although graph databases have become very popular in the enterprise, deep expertise in graphs is still in short supply (see “Building an Enterprise Knowledge Graph @Uber: Lessons from Reality” from KGC 2019). Developers often think of graphs as a completely different kind of thing from the rest of their company’s data, and will go to great lengths to force their data into a “graph” shape. The manual effort involved in building and maintaining ETL pipelines can become both a bottleneck and a maintenance burden. In fact, a rich domain data model of entities, relationships, and properties is usually already implicit in the company’s existing schemas, be they interface descriptions for microservices, relational schemas, or various other kinds of storage schemas. Taking advantage of these schemas, and mapping conforming data into the graph, ought to require relatively little extra work, but developers need appropriate tools. In this presentation, we will illustrate such mappings with real-world examples from Uber and introduce formal techniques for schema and data migration. We will also look ahead to the emerging GQL standard as the foundation for a new generation of highly interoperable graph database tools.