Building Graph Relationships Between Documents

Andrew_Stroz · September 13, 2018, 12:54pm

I have an index that contains a field with the text contents of .docx, .pptx, and .pdf documents.

I have another index that holds documents that have a field with a string of characters representing a piece of equipment.

I would like to run a graph query that connects all of the related documents retrieved by full text search to all of the pieces of equipment.

Is this possible?

Thanks.

Mark_Harwood · September 13, 2018, 1:04pm

Is there a field the two indices share?

For the sake of argument, let's call that field "equipment_id" and assume it is of type keyword.

I'm guessing the field in your document index is of type text and that references to items of equipment are hidden in the text. If the pattern of an equipment_id is sufficiently unique (e.g. always an 11-digit number), then it might be possible to use a regex to extract these values from the text and place them into a keyword field called equipment_id, which holds an array of values. Let's also assume each document has a keyword field called doc_id.
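As a rough sketch of that ingest-time extraction, assuming the 11-digit pattern used as an example above really applies: the `content` field name and the helper names below are made up for illustration, and the pattern would need to be replaced with whatever actually identifies your equipment IDs.

```python
import re

# Assumption: an equipment ID is exactly 11 digits and is not part of a
# longer run of digits. Swap in the real pattern for your data.
EQUIPMENT_ID_PATTERN = re.compile(r"(?<!\d)\d{11}(?!\d)")


def extract_equipment_ids(text):
    """Return the distinct equipment IDs in text, in order of first appearance."""
    seen = {}
    for match in EQUIPMENT_ID_PATTERN.findall(text):
        seen.setdefault(match, None)
    return list(seen)


def build_document(doc_id, text):
    """Build the document to index; 'content' is a hypothetical field name."""
    return {
        "doc_id": doc_id,
        "content": text,
        # Array of keyword values that graph can use as vertices.
        "equipment_id": extract_equipment_ids(text),
    }
```

The resulting dictionary would be indexed into the document index as usual, with equipment_id and doc_id mapped as keyword.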

Given this setup it would be possible to create a graph of doc_id and equipment_id values and how they are connected purely using the document index (ignoring the equipment index).
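To make that concrete, here is a minimal sketch of a request body for the Graph explore API (`POST <index>/_graph/explore`). It only builds and prints the JSON. The index name `documents` and the `content` text field are assumptions, as is the example query string.

```python
import json


def build_graph_explore_request(query_text, text_field="content"):
    """Build a Graph explore body linking equipment_id terms to doc_id terms.

    text_field is a hypothetical name for the full text field.
    """
    return {
        # Seed query: the full text search that selects the documents.
        "query": {"match": {text_field: query_text}},
        # First hop: equipment IDs found in the matching documents.
        "vertices": [{"field": "equipment_id"}],
        # Second hop: the documents connected to those equipment IDs.
        "connections": {"vertices": [{"field": "doc_id"}]},
    }


body = build_graph_explore_request("pump failure")
# This would be sent as: POST documents/_graph/explore  (index name assumed)
print(json.dumps(body, indent=2))
```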

This is mostly speculation about your data so I think you may need to fill in some more details about the problem here.

Andrew_Stroz · September 13, 2018, 1:44pm

Yes, this is true: the two indices share a field.

The assumption about the ID pattern does not hold, however. The equipment_id values are not sufficiently unique for a regex to extract them. That is why I was hoping I could relate the equipment_id from one set of documents to the full text search results for that equipment_id in the indexed Word/PowerPoint/PDF documents.

Thanks.

Mark_Harwood · September 13, 2018, 2:01pm

If your app can't isolate the numbers from the text at ingest time, then Elasticsearch will equally have a hard time doing any analysis on this data at query time.

Andrew_Stroz · September 13, 2018, 2:06pm

Is there not some way to visualize, with graph, the equipment_id as a central node with all of its edges connected to full text search query results? Preferably with the strength of the node being related to the score returned from the full text search.

Mark_Harwood · September 13, 2018, 2:18pm

Not a clean way, no. We rely on nodes being identified by a combination of fieldname and term, which will make life complex if you can't extract equipment IDs out of the text into a field called equipment_id.
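A simplified illustration of why that matters (this is not how graph is implemented internally, and the `Node` and `collect_nodes` names are hypothetical): if a node is just a (fieldname, term) pair, an equipment ID that only appears inside free text never becomes a node of its own.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """A graph node identified purely by field name and term."""
    field: str
    term: str


def collect_nodes(docs, fields):
    """Collect the distinct (field, term) nodes present in the given fields."""
    nodes = set()
    for doc in docs:
        for field in fields:
            values = doc.get(field, [])
            if isinstance(values, str):
                values = [values]  # treat a single value like a one-item array
            for value in values:
                nodes.add(Node(field, value))
    return nodes


docs = [
    {"doc_id": "d1", "equipment_id": ["E1", "E2"], "content": "E1 and E2"},
    # E1 is mentioned only in free text, so it yields no equipment_id node here.
    {"doc_id": "d2", "content": "mentions E1 only in free text"},
]
nodes = collect_nodes(docs, ["doc_id", "equipment_id"])
```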

Andrew_Stroz · September 13, 2018, 3:00pm

Is there any way for a node to be identified by a document? Or for a node to be identified by a combination of fieldname and term, but as the result of a search query?

I would really like to have the ability to relate a 'node being identified by a combination of fieldname and term' to documents that match a query for that term.

This would allow me to harness the power of Elastic as a full text search service and the visualization that graph offers.

Mark_Harwood · September 13, 2018, 3:25pm

Is this a useful graph visualization? If I understand your requirement it is a star-shaped graph with a single central "query" node and lines connecting out to matching satellite "doc" nodes.

That sounds more usefully drawn as a horizontal bar chart with a bar per doc and bar lengths being doc score?
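A quick text-only sketch of that bar chart idea, assuming the hits have already been reduced to (doc_id, score) pairs; the `render_score_bars` helper is made up for illustration and stands in for whatever charting tool you would actually use.

```python
def render_score_bars(hits, width=40):
    """Render one text bar per document, scaled to the highest score.

    hits is a list of (doc_id, score) pairs taken from a search response.
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    lines = []
    # Highest-scoring documents first, as in a ranked result list.
    for doc_id, score in sorted(hits, key=lambda h: h[1], reverse=True):
        bar = "#" * max(1, round(width * score / top)) if top > 0 else ""
        lines.append(f"{doc_id:<10} {bar} {score:.2f}")
    return lines


for line in render_score_bars([("doc-a", 2.0), ("doc-b", 4.0)], width=8):
    print(line)
```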

Andrew_Stroz · September 13, 2018, 4:09pm

The documents that are full text searched based on equipment_id also have other metadata that relate them to each other, i.e. the same document type, document status, etc.

The visualization I want to see is star-like at the center, but with the documents found by the full text search for equipment_id also related to each other through the other metadata in the documents.

I will try and play around with graph in my Kibana instance to gain a better understanding of the relationships I can build.

Thanks for your help.

Mark_Harwood · September 13, 2018, 4:33pm

Useful graphs are those that use fields with high cardinality (many unique terms). Examples include bank accounts, email addresses or hashtags [1]. These create sparser, interesting shapes. Meaningful relationships exist between rarer terms.

If you choose fields with a small number of values (e.g. "gender" or your doc type/status fields), then you end up with "hairball" graphs with too many lines connecting all the nodes. These tend to be much less interesting connections.
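A toy illustration of the difference, using made-up documents with a low-cardinality `status` field and a higher-cardinality `hashtag` field. It measures what fraction of document pairs end up linked through a shared value, which is a rough proxy for how much of a hairball you would get.

```python
from itertools import combinations


def linked_pair_fraction(docs, field):
    """Fraction of document pairs that share at least one value of field."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if set(a[field]) & set(b[field]))
    return linked / len(pairs)


# Invented sample data: 2 distinct status values vs 6 distinct hashtags.
docs = [
    {"status": ["draft"], "hashtag": ["#pump-a", "#valve-1"]},
    {"status": ["draft"], "hashtag": ["#pump-a"]},
    {"status": ["final"], "hashtag": ["#valve-2"]},
    {"status": ["draft"], "hashtag": ["#motor-7"]},
    {"status": ["final"], "hashtag": ["#valve-2", "#motor-9"]},
    {"status": ["draft"], "hashtag": ["#pump-b"]},
]

status_density = linked_pair_fraction(docs, "status")    # 7 of 15 pairs linked
hashtag_density = linked_pair_fraction(docs, "hashtag")  # 2 of 15 pairs linked
print(f"status: {status_density:.2f}, hashtag: {hashtag_density:.2f}")
```

With only two status values, nearly half of all document pairs are connected; the rarer hashtags connect only the pairs that share something specific.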

[1] http://hivemindmap.com/

