An Interview with Paco Nathan on AI Realism & the Language of Knowledge Graphs [Part 1]
The man, the myth, the legend: Paco Nathan needs no introduction.
Everyone who works with graph technology – whether in academia, startups, government, or commercial enterprises – has come across something Paco has designed, invented, popularized, patented, engineered, or written. That, or you’ve bumped into him at a conference, and probably more than once.
Educated at West Point, Stanford, Cal Poly, and in Provence, Paco has logged time working for the likes of NASA, Bell Labs, Motorola, IMVU, Databricks, O’Reilly Media, and many, many more. For seven years, he was the President & Co-Founder of FringeWare. At the turn of the millennium, he even served as an Evil Mad Scientist for SPECTRE (ask him about it sometime). He’s also an artisan cheesemaker.
Today he’s the Principal Dev Rel Engineer at Senzing, maker of an entity-resolution SDK. He hosts the Graph Power Hour podcast and makes frequent guest appearances in the GraphGeeks community.
Plenty of folks would be satisfied to sign off on a four-decade career as full of accomplishments as Paco’s, but talking with him, you never get the sense that he’s interested in slowing down or wrapping up any time soon. Rather, he brings an infectious energy to whatever topic or project is at hand, whether that’s linguistics, architecture, AI, or knowledge graphs.
In this interview with KGC (part 1 of 2), Paco Nathan sits down to talk about AI buzzwords we need to ditch, what’s special about the knowledge graph community, the relationship between AI and library science, the inevitable confusion over the term “graph” in “GraphRAG,” thinking of AI as a practice, and what he’s most looking forward to at this year’s Knowledge Graph Conference.
What’s your favorite part of attending KGC?
Paco Nathan: I didn’t attend the first Knowledge Graph Conference, but I heard about it from a friend and really wished I had been there. The second year, I think I gave a keynote, and I’ve been around every year since then.
I’m really grateful to get to know the folks who organize and who attend KGC. I’ve really enjoyed seeing this conference grow year over year.
This year, I’m glad to see more business verticals step forward and talk about what they’re doing with knowledge graphs. Some of these industries were more sensitive in the past around confidentiality, but now that graph technology is becoming a much more mainstream practice, they can finally share. That’s a great trend.
As part of the program committee, I’ve been interested to see how many different proposals we had from different groups about using knowledge graphs for controlling nuclear power plants. I’ve never seen that before!
From a social perspective, the thing that’s always stood out about KGC is the in-person factor, or the “hallway track” if you will. Getting to meet people and talk with them one-on-one about what’s going on, share ideas, and experience cross-pollination across disciplines – I really love it, and KGC is such a wonderful forum for that.
What are your thoughts on the value – past, present, future – of knowledge graphs?
Paco Nathan: If I were trying to explain knowledge graphs to a business executive six years ago, they wouldn’t understand because they would be thinking about a bar chart. To another group of people, they’d think you were talking about a graphical user interface (GUI). But today, I think the KG community has finally achieved a broader dissemination of what knowledge graphs are and how powerful they are across a lot of different business verticals.
If you work in insurance, there are three or four top-of-mind use cases that leverage graphs. Or if you work in FinCrime (financial crime detection and prevention), there’s another handful of knowledge graph use cases, and so on with other industries.
The more that we as a community can enumerate these really valuable use cases across different industries, the more traction knowledge graphs will gain. Unfortunately, I think a lot of the confusion around knowledge graphs is tied up in the linguistics.
A lot of people think that there is a Knowledge Graph (in capital letters), and there’s only one of them. They might ask, “Should we use Google’s Knowledge Graph or should we use Microsoft’s Knowledge Graph?” But that’s all wrong.
My friends and colleagues at AstraZeneca have at least five different large-scale knowledge graphs tied to revenue generation (I think they might have even more still in the works). A particular application might use two or more of these knowledge graphs, but each KG has a specialized purpose – they don’t all interface with each other by default. There’s no one enterprise knowledge graph to rule them all.
You don’t want one big knowledge graph over everything because then all of your nuances and definitions would be squashed. Instead, it’s a plurality of different solutions for domain-specific contexts. Unfortunately, that plurality has not been communicated effectively enough to top decision makers outside the graph tech space.
Let’s consider an example: In my own graph practice, I might have a lexical graph at a low level. It’s really noisy. I would never use it in production. Rather, I use it as a place to organize things. That lexical graph is like my draft notebook: I’ll go up a couple of layers of abstraction before I use any of that information in my production graph.
So for just one application, I might have five different knowledge graphs, but if I say that in front of an executive they’ll scream at how expensive that sounds, because they don’t understand. These are the thorny details that we really have to overcome, and they’re much more cultural and linguistic.
One of my favorite experts on this issue is Jessica Talisman at Adobe. She comes from more of a library science background. If you want to learn more about how to handle AI, you should really talk to librarians. They’ve been working on these knowledge-organization topics for much longer.
In your professional opinion, what are some of the biggest challenges facing the world of knowledge graph technology in the years ahead?
Paco Nathan: This is a great question. I wrote an article for O’Reilly about how the word “graph” in GraphRAG can mean about six (or more) different things depending on who’s talking. When you use the term “graph technology,” there are a lot of different camps, and they don’t really talk with each other. Academically, they publish at different conferences, and they probably don’t even know each other. (Of course, one aim of KGC is to bring several of these camps together in one place!)
First, you’ve got the group of people who are working on more axiomatic approaches and really sophisticated symbolic inference. This group uses a lot of strongly defined semantics and uses tools like RDF (Resource Description Framework), OWL (Web Ontology Language), and all of those related tools.
Second is the labeled property graphs camp. This group has a diverse set of standards and practices, but they all lead toward graph databases. Whenever people think of the landscape of graph technology, this is often the group they think of. In reality, they’re just a part of the whole.
Third, you have the graph visualization camp. Some of these graph data visualization tools are really elaborate and beautiful (like Graphistry for example), but not everyone needs their data visualized in such an elegant way. Conversely, you have firms like Tom Sawyer Software that do really complex network visualizations for niche use cases. For example, they’re often used for building management compliance. If you’re managing a skyscraper in Manhattan, you need a network visualization of your water supply, fire suppression, power routing, air conditioning, and all of the other systems that are a part of your building. Of course, these systems are all graphs. Integrating LLMs with graph visualization would be really interesting, but that doesn’t technically require a graph database, so that’s why I think the data viz people are in a different camp.
Fourth, you have the camp of people doing graph neural networks (GNNs), which is a very interesting field. Back in 2021, GNNs were all the rage. You probably couldn’t get a paper accepted unless you mentioned GNNs at least once. Then a couple of years later, nobody was doing GNNs anymore and everyone was doing LLMs. The irony is that big players like Spotify, LinkedIn, and some big pharmaceutical companies have recently come out and published about large-scale commercial use of graph neural networks. They had been using GNNs all along but couldn’t yet share their work.
This is a perfect example of the latency effect that Lukas Biewald, the CEO of Weights & Biases, spoke about. He gave the example that some new machine learning algorithm might come out, but it might take five, six, or even 10 years before we have a dataset that can actually demonstrate the benefits of that particular algorithm. Then it’s going to take some more time before people understand the main use cases of the algorithm, and then it takes another couple of years before enterprise companies can hire the PhDs who actually understand how to use the algorithm. After that, it’ll be another two or three years before they’re even allowed to talk about the fact that they’re using it. So the gap between a breakthrough and open discussion of its commercial use can easily stretch across many years.
Of course, because these different graph technology camps don’t often talk to each other, there’s a lot of disconnect on the important uses of GNNs, such as drug repurposing for rare diseases, for example.
Fifth, you’ve got the camp of statistical relational learning (SRL). This group has had some amazing breakthroughs but also has a history of being computationally expensive. Take causality as an example. Judea Pearl put forth a notion of causal analysis using artificial intelligence where you have graphs that map to the entire world and then you boil the ocean and calculate everything all at once. It’s a glorious promise, but it’ll never happen. On the other hand, you have folks like Urbashi Mitra using reinforcement learning to optimize causality in subgraphs. It turns out, this approach is highly effective in use cases ranging from bank fraud analysis to mergers and acquisitions. There’s a lot of potential for using causal graphs with AI applications, but often these use cases are completely different from what people in the graph database camp are doing.
Sixth is the graph algorithms camp. There are a lot of reasons to use graph algorithms, especially if you work in criminal justice or FinCrime. Graph databases don’t necessarily support all graph algorithms. Some do with extra libraries, but generally speaking, you have to extract your data, put it into a different library, run a graph algorithm, and then bring the results back into your database. As a result, graph algorithm users discuss and publish in separate spaces from graph database users.
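The extract-compute-write-back round trip Paco describes can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: an in-memory SQLite table plays the role of a production graph store, and a small hand-rolled power-iteration PageRank plays the role of an external algorithm library.

```python
# Sketch of the round trip: extract edges from a database, run a graph
# algorithm outside it, then write the scores back. Table and column
# names are invented for illustration.
import sqlite3

def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            targets = out[s] or nodes  # dangling nodes spread rank evenly
            share = damping * rank[s] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# A toy in-memory "database" standing in for a real graph store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

# 1. Extract the graph data from the database.
edges = conn.execute("SELECT src, dst FROM edges").fetchall()

# 2. Run the algorithm outside the database.
scores = pagerank(edges)

# 3. Bring the results back in for downstream queries.
conn.execute("CREATE TABLE node_scores (node TEXT, pagerank REAL)")
conn.executemany("INSERT INTO node_scores VALUES (?, ?)", list(scores.items()))
```

In practice the extraction step is the expensive part at scale, which is exactly why this workflow tends to live in a separate toolchain (and a separate community) from the graph database itself.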
Taking a step back to look at the big picture, there are all these different camps. They’re all using graph technologies, but they don’t talk with each other. Some use cases from one camp would be really valuable in another. For example, using probabilistic soft logic from the statistical relational learning camp and using that for quality control on large graphs would be really useful, but many people in the graph databases camp have never heard of it or don’t want to use that approach.
So all in all, I think breaking down the barriers between these different camps in the graph technology space is probably one of the most difficult projects that we have ahead of us.
How do you think knowledge graphs and AI will change our day-to-day lives – and how we conduct business – in the future?
Paco Nathan: This is a poignant question, because I think the world is on the cusp of some important changes, but there’s also a lot of risk and potential danger. Back in 2020, I wrote some notes to myself about the future, which included a lot of reflection on linguistics and AI. A few definitions that came out of those notes are particularly salient.
The first definition is automation. A lot of product managers talk about how AI is going to automate this or that in the future, but the reality is that automation is extremely rare. It almost never happens. I can think of a handful of examples.
The automatic transmission in a Mercedes E300 can outperform virtually any driver except for an expert race driver. It’s a top-of-the-line automatic transmission. It’s not fully automatic, because there are overrides, but it’s the closest thing I can see to actual automation.
Another example would be automatic brake systems, which I worked on during my time at Motorola. Again, they’re not fully automatic, because you can still pump the brakes or use them in tactical driving. Still, that’s pretty close to fully automated.
The last example would be autopilot for airplanes. It’s an incredible, phenomenal, life-changing technology. But it’s not fully automatic either: you don’t use autopilot for takeoff and landing.
So there are really very few instances of anything that’s fully automated, and yet so many press releases from tech companies claim that their solution offers complete automation.
Most of the time when people talk about automation, they’re really talking about augmenting some business process or workflow. They aren’t doing automation; they’re doing augmentation. There are financial reasons why folks will use the term “automation,” and to me that’s a red flag. True automation is really rare.
Another salient definition that’s often thrown about is reasoning.
I think reasoning is a bit oversold. The decision making that most groups of people do is not generally based on Sherlock Holmes-style deductive reasoning. That kind of reasoning accounts for only a single-digit percentage of human cognition. Most of the time when people arrive at a decision – whether personally or in a group – they’re using all sorts of other cognitive methods. They’re not typically using reasoning.
I think it’s dangerous when people propose that we can use LLMs to do lots of reasoning. I mean, yes, you can do some reasoning, but it comes at a great computational cost and a very large energy and carbon expense. It’s taxing the power grids, and ultimately it’s haphazard reasoning that doesn’t work very well.
We have other ways of doing symbolic reasoning via a lot of different decision-support systems. This is an area that has been worked out in great detail and is very useful. For instance, a number of systems use neuro-symbolic AI. I don’t think we should do reasoning with deep learning systems because that is absolutely not what they’re built for, they don’t do it very well, and they do it very inefficiently.
Throughout the history of human civilization, as people have worked together on difficult, existential problems, reasoning has been a very small part of what we do. To overindex on this one skill is a waste of time, and in my opinion, it’s an absurd fantasy.
So if it’s not automation, and it’s not reasoning, what will AI and knowledge graphs help us with in the future? My framework is that AI is a practice.
These ideas have their origins in the Macy conferences of the mid-twentieth century, which brought together different disciplines to grapple with a world whose increasing complexity, even back then, was beyond the scale of what people could actually work with and understand. These discussions were where a lot of notions of artificial intelligence first emerged.
Out of those conferences and building on the work of Margaret Mead, this group of scientists came up with a working definition of AI as teams of people and machines collaborating to produce results that are beyond the capabilities of a single individual. This is where I think you can start to see something that could legitimately be called artificial intelligence. It’s a collaborative practice.
AI is not something that takes an article (a, an, or the) in front of it. When you see people talking about the AI, an AI, or all of these AIs, that is complete hubris. That does not exist. That rhetoric is probably serving a marketing, financial, or political aim. It has nothing to do with reality.
We have to scope out a future where we’re working collaboratively with teams of people and machines to realize AI as a practice. I would argue that this has been going on for a long while. We’re seeing AI as a long-scale arc. We don’t always have the words to describe it, but culturally it’s a thing.
Anything else you want to add or share?
Paco Nathan: There’s a lot that we still need to accomplish in computing, and we have to work collaboratively to move the ball forward on projects that matter. I hope these are things that we can discuss at this year’s Knowledge Graph Conference. I hope to see you there.
The Knowledge Graph Conference 2025 is just around the corner: Get your ticket to KGC 2025 today and meet experts like Paco Nathan and others from across the KG community.
***
(Catch Part 2 of our interview with Paco Nathan after the conference!)