Knowledge Graphs, ELK and Neo4j

Hello,

I would like to use ‘knowledge graph’ capabilities to detect and identify potential relationships and patterns in data (1). That data is currently stored in Elasticsearch indices. I would like to detect relations, etc in the same index and between indices.
I’ve had a look at Neo4j and how data and relations are defined and stored in that context (and played with the Cypher language). But I want to keep my data in one place.
I see that the ELK stack also has “Graph” capabilities. When looking deeper into the available information on that topic, I noticed that most of it is not that recent. I’ve found that Kibana has a ‘Graph’ capability (2) via an ‘xpack’ extension (not free) and Elasticsearch has a “Graph Explore API” (3), ‘enabled by default’ apparently, but when I try to use it, it says I need a license. At this point I only want to do some quick research.

I do want to use “graph” technology, but at this moment it’s not yet clear whether (a) I should give the ELK Graph stuff a go, (b) go directly for Neo4j, or (c) combine both technologies, since I honestly would like to keep my data in my Elastic indices.

Maybe you guys can give me some advice on this matter and/or point me to some existing projects, (show)cases, etc.? Deeper and more practical info on Elastic “Graph” is also welcome.

Thanks in advance!

(1) We are talking about millions of documents here.
(2) Graph | Kibana Guide [8.12] | Elastic
(3) Graph explore API | Elasticsearch Guide [8.12] | Elastic

Hmm ... anyone :slight_smile: ?

Maybe @Mark_Harwood1 would like to share some ideas with you? :grinning:

If you want to try it, just activate the trial license from Kibana.

I like it for fraud analysis or music recommendation, for example. But it's not a graph of connected documents as you would have with Neo4j.
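
For anyone who prefers to try this from a script rather than from Kibana, here is a minimal sketch. It only builds the requests and sends nothing. The two endpoints (`POST /_license/start_trial?acknowledge=true` and `POST <index>/_graph/explore`) are the documented Elasticsearch ones; the host, the index name `my-index` and the field names `description`, `team_id` and `skill` are assumptions made up for illustration.

```python
ES_URL = "http://localhost:9200"  # assumption: a local development cluster


def start_trial_request(base_url=ES_URL):
    """Return (method, url) for activating the trial license via the API."""
    return "POST", f"{base_url}/_license/start_trial?acknowledge=true"


def graph_explore_request(index, query_text, vertex_field, connection_field,
                          text_field="description", base_url=ES_URL):
    """Return (method, url, body) for a one-hop Graph explore request.

    The seed query selects documents, `vertices` picks the terms to start
    from, and `connections` asks for terms in another field that are
    connected to those starting terms.
    """
    body = {
        "query": {"match": {text_field: query_text}},
        "controls": {"use_significance": True, "sample_size": 2000},
        "vertices": [{"field": vertex_field}],
        "connections": {"vertices": [{"field": connection_field}]},
    }
    return "POST", f"{base_url}/{index}/_graph/explore", body


if __name__ == "__main__":
    import json

    print(*start_trial_request())
    method, url, body = graph_explore_request("my-index", "migration", "team_id", "skill")
    print(method, url)
    print(json.dumps(body, indent=2))
```

The printed method, URL and JSON body can be pasted into Kibana Dev Tools or sent with any HTTP client once the trial is active.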

Hi Wim,
It would help to know a little more about what you want to achieve and what sort of data you are working with.
Elasticsearch is not optimised for graph work due to its distributed nature, document-centric design, randomised shard allocation of those documents and use of inverted indexes. What graph-like functionality it offers it does within these constraints. However, with Neo4j you have to break up your business documents (like this web page you’re reading) and decide how to represent the information inside as pairs of nodes (is Mark connected to David? Is this message of mine connected to you? Is it connected to a “thread” object?)

With Elasticsearch we just index documents and all the terms they use. In graph terms an Elasticsearch document acts like a super-edge connecting all the terms in it. No need to model. We then use things like significance algorithms to tease out just the statistically significant pairings, e.g. Mark is significantly connected to the term “graph”.
Because the data isn’t carefully curated in the way a Neo4j graph requires, Elasticsearch has an over-abundance of connections between all terms in a document, and significance filtering can be required on fields.
The graph API is essentially syntactic sugar for existing aggs and has a pre-built UI that calls it. The significant_terms or significant_text aggregations can be used to discover significant connections, and the adjacency_matrix aggregation can then be used to show how these things are connected. This is how I summarise the key news topics each day on my news monitoring site.
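
A minimal sketch of the two-step pattern described above, assuming hypothetical field names (`body` for the text, `tags` for the terms). It only builds the request bodies for `_search` and turns adjacency_matrix buckets into edges; it is an illustration, not the code behind the news site.

```python
def significant_terms_body(query_text, text_field="body", terms_field="tags", size=10):
    """Step 1: find terms that are unusually common in the matching documents."""
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "query": {"match": {text_field: query_text}},
        "aggs": {
            "significant": {
                "significant_terms": {"field": terms_field, "size": size}
            }
        },
    }


def adjacency_matrix_body(terms, terms_field="tags"):
    """Step 2: one named filter per term; Elasticsearch counts the overlaps."""
    filters = {term: {"term": {terms_field: term}} for term in terms}
    return {
        "size": 0,
        "aggs": {"connections": {"adjacency_matrix": {"filters": filters}}},
    }


def edges_from_buckets(buckets, separator="&"):
    """Keep only the pairwise buckets, whose keys look like 'termA&termB'.

    Assumes the filter names themselves do not contain the separator.
    """
    edges = []
    for bucket in buckets:
        parts = bucket["key"].split(separator)
        if len(parts) == 2:
            edges.append((parts[0], parts[1], bucket["doc_count"]))
    return edges
```

The terms returned by step 1 are fed into step 2, and the resulting edges (term, term, shared document count) are what a UI would draw as lines between nodes.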

Hope this helps.

Hi David and Mark,

First of all, a big sorry for this very late reply. Meanwhile, I was doing other stuff, not related to this (interesting) topic, and that kept me from answering (because I also wanted to provide you a decent reply :wink: )

As a quick reply: yes, this certainly helps. Thanks already for that.
But, Mark (and David), as you asked, I'll supply a bit more context so that you guys have a better idea of the question or challenge here.

First a bit of state and/or context: we have a specific product (used by a few thousand dedicated users), and its functionality is growing as we speak. It is a classic web front end combined with several REST backend services. Content is persisted in relational databases. So, classic relations, and we are talking about some 20 million records here, and growing.

Besides that content basis (relational DBs), we also want to make this content (or a significant subset of it) highly searchable. Enter Elastic (over a year ago), so we're already acquainted with indices, mappings, indexing, querying, shards, scripting, etc. And we've already indexed these 20 million records as JSON docs in Elastic.
Besides that, we have a Search component that fires API requests (the usual suspects: _search, aggregations, full-text search, etc.). Works fine.

BUT, and this is the challenge here, of course: those indexed docs (let's call this index 'job') contain values/terms that are like 'foreign keys' to docs in the same or in other indices in the cluster. And, besides that, we would also like to index the organisational structure of the company; think of individuals in their teams and teams-in-teams in an organisational structure (let's call this index 'organisation'). Mostly arranged as a tree structure, but not necessarily: we can also have cross-team compositions, like 'workgroups' and such. Moreover, certain terms in the 'job' index refer, foreign-key like, to unique entries in the 'organisation' index.
A 'job' can have many properties of its own, as can the 'organisational entry'. So, this content also needs to be highly searchable and eventually displayed.

So, to make it more tangible: a job can have a (sub-)job relation, and each job is executed by one or more individuals and/or teams. It's interesting to know which jobs are assigned to whom and/or to find jobs that are related to a certain team (anywhere in the 'organisation').
Besides that, and this is a valuable use case imo, it would also be nice to show the organisation graphically and be able to traverse it, eventually ending up in more detail on the person or team and on their related jobs.
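
To make the "jobs related to a team anywhere in the organisation" case concrete, here is a rough sketch of one way it could be done without a graph store: walk the organisation structure client-side, then filter jobs with a `terms` query. All field names (`id`, `parent_ids`, `assigned_to`) are hypothetical and not our real mapping. Allowing several parents per entry is an assumption meant to cover the cross-team workgroups.

```python
from collections import defaultdict, deque


def descendants(org_entries, root_id):
    """Return root_id plus every entry reachable below it (breadth-first)."""
    children = defaultdict(list)
    for entry in org_entries:
        for parent in entry.get("parent_ids", []):
            children[parent].append(entry["id"])

    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:  # guards against cycles and shared members
                seen.add(child)
                queue.append(child)
    return seen


def jobs_for_team_query(org_entries, team_id, field="assigned_to"):
    """Build a search body matching jobs assigned to the team or anyone in it."""
    ids = sorted(descendants(org_entries, team_id))
    return {"query": {"terms": {field: ids}}}
```

This keeps everything in Elastic, but every traversal step happens in application code, and the number of ids in a single `terms` query is capped by an index setting, so it would likely stop scaling for very large subtrees.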

And that's where, imho, a graph approach and its typical traversal possibilities come in. In other words: being able to meander and drill down or up during that journey.
This feels like a job for a knowledge graph product, but we also need the capabilities to do full and fast searching. That is why Elastic (search, and potentially Graph with its explore API and UI) in combination with Neo4j caught my interest :slight_smile:
It could be a combination: store the detailed data in Elastic indices, store the relations (triples) derived from that data in Neo4j, and use this approach to handle the use cases described above.
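
A minimal sketch of what that combination could look like on the ingest side, assuming hypothetical job fields (`id`, `parent_job_id`, `assigned_to`) and hypothetical Neo4j labels and relationship types (`Job`, `OrgEntry`, `SUB_JOB_OF`, `ASSIGNED_TO`). It derives the triples from a job document and produces parameterised Cypher `MERGE` statements; actually running them through a Neo4j driver is left out.

```python
# Relationship type -> (source label, target label). Labels and relationship
# types are taken from this fixed mapping, never from the data itself.
LABELS = {
    "SUB_JOB_OF": ("Job", "Job"),
    "ASSIGNED_TO": ("Job", "OrgEntry"),
}


def job_triples(job):
    """Extract (source id, relationship, target id) triples from one job doc."""
    triples = []
    if job.get("parent_job_id"):
        triples.append((job["id"], "SUB_JOB_OF", job["parent_job_id"]))
    for org_id in job.get("assigned_to", []):
        triples.append((job["id"], "ASSIGNED_TO", org_id))
    return triples


def cypher_for(triple):
    """Return (statement, parameters) that idempotently stores one triple."""
    src, rel, dst = triple
    src_label, dst_label = LABELS[rel]
    statement = (
        f"MERGE (a:{src_label} {{id: $src}}) "
        f"MERGE (b:{dst_label} {{id: $dst}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return statement, {"src": src, "dst": dst}
```

Only ids end up in Neo4j, so a traversal would return ids that are then looked up in the Elastic indices for the searchable details.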

There might be other interesting roads or combinations to discover here (maybe also using different products or integrations). But these in particular came to mind.

Voila, the full story. Hope this sheds some light on the challenge (and nice opportunity here :wink: ).

So, any advice or thoughts from you guys are very welcome !!

Cheers,

Wim.

PS: we've already experimented with 'joins' in Elastic (and that will not be the solution), and I already have a fair knowledge of Neo4j (and Cypher).