Christos Boutsidis

VP of Relationship Analytics & Engineering, Goldman Sachs

Christos Boutsidis is the Vice President of the Relationship Analytics and Engineering Department at Goldman Sachs. His organization focuses on building the Goldman Sachs Knowledge Graph and making it available firm-wide. To that end, the team combines multiple sources of structured (e.g., trades, transactions) and unstructured (e.g., text, voice) data into a single, highly heterogeneous knowledge graph with hundreds of millions of nodes and billions of edges. Its mission is to capture all of the firm’s relationships related to communication, trading, and money transfers. From an algorithms perspective, the team develops scalable solutions (Hadoop, MapReduce, HDFS, HBase) for several graph mining problems, such as vertex centrality (e.g., PageRank), vertex similarity, (shortest) paths, vertex deduplication, community detection, and graph embeddings. From an applications point of view, the team’s work is used (1) within the Compliance Division, to enable the development of regulatory surveillances for insider trading detection and anti-money laundering, and (2) outside of the Compliance Division, in particular in the “One Goldman Sachs” initiative, a cross-divisional client coverage initiative that will develop and implement a more integrated approach to serving the firm’s clients who interact with multiple divisions, including Securities, Investment Banking, Investment Management, and Merchant Banking.

Before that, Christos was a Research Scientist with the Scalable Machine Learning Group of Yahoo Research in New York and a Research Staff Member with the Mathematical Sciences Department of the IBM T. J. Watson Research Center in Yorktown Heights, NY. Dr. Boutsidis earned a Ph.D. in Computer Science from Rensselaer Polytechnic Institute in May 2011 and a BS in Computer Engineering from the University of Patras, Greece, in July 2006. Dr. Boutsidis has published over 30 articles in conferences and journals in algorithms, machine learning, and statistical data analysis.

Talks and Events

A Distributed Vertex-Deduplication Framework for Large Graphs

Our framework is built on top of a distributed Hadoop/MapReduce/HBase infrastructure and captures both “low-level” graph database operations and “higher-level” algorithmic aspects such as vertex-vertex similarity, graph clustering, and “robust” vertex ID stamping. We have been using the framework to de-duplicate the Goldman Sachs Knowledge Graph. In this talk, we will report a few experimental results from applying the framework to public datasets.

Track: Systems and Scale