X-Pack Graph compared to other GraphDB

I'm trying to understand whether X-Pack Graph can be compared to a graph database like Neo4j, OrientDB, ArangoDB etc. How would X-Pack Graph perform, and what would be the pros and cons compared to a graph database?


cc @Mark_Harwood

It is different. Graph databases force you to take your original business documents (e.g. a tweet or an email) and store them as multiple "edge" records, each linking only 2 nodes. This means you have to think about the model carefully:

  • Is the tweet a node?
  • Presumably author, @mentions and hashtags are nodes?
  • Do you create edge records linking nodes directly or do you always link them to the tweet?
  • Should you index text, hashtags, accounts etc. for search purposes?

Often, creating edge records in a graph database is an act of censorship: not every item in the original business doc is deemed worthy of being a node in the graph, given all the edge records it would then require.
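To make the modelling questions above concrete, here is a minimal sketch of the two choices for a single tweet: always linking extracted items to the tweet node, or linking them directly to each other. The document shape, field names and edge labels are all made up for illustration and are not taken from any particular graph database.

```python
from itertools import combinations

# Hypothetical tweet document; the field names are illustrative only.
tweet = {
    "id": "t1",
    "author": "@alice",
    "mentions": ["@bob", "@carol"],
    "hashtags": ["#graphs"],
}


def tweet_centred_edges(doc):
    """Option 1: the tweet is a node and every extracted item links to it."""
    edges = [(doc["id"], "AUTHORED_BY", doc["author"])]
    edges += [(doc["id"], "MENTIONS", m) for m in doc["mentions"]]
    edges += [(doc["id"], "TAGGED", h) for h in doc["hashtags"]]
    return edges


def direct_edges(doc):
    """Option 2: no tweet node, link every pair of extracted items directly."""
    items = [doc["author"], *doc["mentions"], *doc["hashtags"]]
    return list(combinations(sorted(items), 2))


via_tweet = tweet_centred_edges(tweet)
direct = direct_edges(tweet)
```

Either way, one source document turns into several 2-node records, and anything you chose not to extract (the tweet text, for example) is simply absent from the graph.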

In Elasticsearch we just index the whole business doc. It acts as a super-edge that links all of the indexed terms it contains as associated. This is both a strength and a weakness.

"True" graph databases offer index-free adjacency at search time, which essentially means everything is linked using pointers rather than terms that need looking up in an index. However, this query-time benefit is bought and paid for with a tax on every write. Real-world identifiers (account handles, email addresses, hashtags, IP addresses etc.) must be converted to internal pointers on insert using (guess what?) an index, so in a graph database the cost of index look-ups is shifted from query time to index time.

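A toy sketch of that trade-off, assuming a hypothetical in-memory `PointerGraph` class (not how any real graph database is implemented): every write resolves real-world keys through a dictionary index, while traversal only follows object references.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.neighbours = []  # direct object references, i.e. "pointers"


class PointerGraph:
    """Hypothetical toy graph used only to illustrate the cost shift."""

    def __init__(self):
        self._by_key = {}  # index from real-world identifier to Node
        self.lookups = 0   # counts how often the index is consulted

    def find(self, key):
        # Resolve an identifier to a node, creating the node if needed.
        self.lookups += 1
        node = self._by_key.get(key)
        if node is None:
            node = self._by_key[key] = Node(key)
        return node

    def add_edge(self, a, b):
        # Write path: both endpoints go through the index.
        node_a, node_b = self.find(a), self.find(b)
        node_a.neighbours.append(node_b)
        node_b.neighbours.append(node_a)


g = PointerGraph()
g.add_edge("@alice", "#graphs")
g.add_edge("@bob", "#graphs")
lookups_after_writes = g.lookups

# Query path: one lookup to enter the graph, then pointers only.
start = g.find("@alice")
two_hops = {n2.key for n1 in start.neighbours for n2 in n1.neighbours}
lookups_after_query = g.lookups
```

Two inserted edges cost four index lookups, whereas the two-hop traversal costs a single lookup for the starting node no matter how many neighbours it visits.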
The other thing to note about Elasticsearch edges is that they are derived on the fly from data in the index. Rather than returning a million transaction records saying that IP address A transacted with IP address B, we can use the aggregations framework to efficiently summarise all of these documents into a single edge. I'm not sure how graph databases fare at that. Coming from a search heritage, we also use our knowledge of relevance-ranking documents to relevance-rank the graph connections returned, which helps avoid the typical graph exploration problem of thundering off into "super nodes" that link all data together.
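As a rough illustration of summarising many documents into one edge, the sketch below counts transactions per (source, destination) pair in plain Python, and then shows the shape of a nested `terms` aggregation request body that asks Elasticsearch for the same kind of summary. The field names `src_ip` and `dest_ip` and the sample data are assumptions for the example, and the request body is only built here, not sent anywhere.

```python
from collections import Counter

# Hypothetical transaction documents; field names are illustrative only.
transactions = [
    {"src_ip": "10.0.0.1", "dest_ip": "10.0.0.2"},
    {"src_ip": "10.0.0.1", "dest_ip": "10.0.0.2"},
    {"src_ip": "10.0.0.1", "dest_ip": "10.0.0.3"},
]


def summarise_edges(docs):
    """Collapse many documents into one weighted edge per IP pair."""
    return Counter((d["src_ip"], d["dest_ip"]) for d in docs)


edges = summarise_edges(transactions)

# Search request body asking for the same summary via nested terms
# aggregations. "size": 0 means no individual documents are returned.
agg_request = {
    "size": 0,
    "aggs": {
        "sources": {
            "terms": {"field": "src_ip"},
            "aggs": {
                "destinations": {"terms": {"field": "dest_ip"}},
            },
        }
    },
}
```

Each bucket's `doc_count` in the response plays the role of the edge weight, so the client sees one summarised connection per pair rather than every underlying record.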

Having said all this, dedicated graph databases have pros that Elasticsearch will likely never offer, e.g. pathfinding algorithms. By optimising for this use case (using pointer-based traversal and assuming a single-machine, scale-up architecture) they can provide these functions with the related costs minimised.
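For a sense of what "pathfinding" means here, this is a minimal breadth-first shortest-path sketch over an in-memory adjacency map. It is a generic textbook technique on made-up data, not the algorithm of any specific graph database, but it shows why cheap neighbour-to-neighbour hops matter for this kind of query.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt in previous:
                continue
            previous[nxt] = node
            if nxt == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical directed graph for illustration.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
path = shortest_path(adjacency, "a", "e")
no_path = shortest_path(adjacency, "e", "a")
```

Every step of the loop is a neighbour expansion, which is exactly the operation pointer-based traversal makes cheap.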

