I just started using X-Pack for Elasticsearch and want to connect vertices from a nested document type. However, looking for documentation on this hasn't got me anywhere.
What I have is an index of documents which have person names/ids as nested documents (one document can have many persons, one person can be related to many documents). The desired result is graph data showing the connections between persons.
Does anyone have a clue or can tell me if this is even possible?
Unfortunately, Graph does not support nested documents, but you can use copy_to in your mappings to put the person data in an indexed field in the containing root document.
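As a rough sketch of what such a mapping could look like, here it is built as a Python dict. The field names (persons, token, person_tokens) are made up for illustration, and the typeless mapping layout assumes Elasticsearch 7 or later; older versions need an extra type-name level.

```python
import json

# Hypothetical field names; adjust to your own index.
mapping = {
    "mappings": {
        "properties": {
            # Root-level field that Graph can use as a vertex field.
            "person_tokens": {"type": "keyword"},
            "persons": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "name": {"type": "text"},
                    # One combined value per person, copied up to the root doc.
                    "token": {"type": "keyword", "copy_to": "person_tokens"},
                },
            },
        }
    }
}

# This JSON would be the request body when creating the index.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

Graph would then be pointed at person_tokens rather than at anything under persons.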
I can see that you have the classic problem of "computers-want-IDs-but-people-want-labels" and have both these values. In Graph (and arguably the rest of Kibana too) I suggest you use tokens that combine IDs for uniqueness' sake and names for readability by humans.
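One possible way to build such tokens at index time is sketched below. The "name [id]" layout and the helper names are just an assumption for illustration; any format works as long as the ID part keeps the token unique.

```python
def person_token(person_id, name):
    """Combine a readable name with a unique ID into one token."""
    return f"{name.strip()} [{person_id}]"


def token_label(token):
    """Recover the human-readable part of a token built by person_token."""
    return token.rsplit(" [", 1)[0]


# Two different people with the same name still get distinct tokens.
a = person_token("p-17", "John Smith")
b = person_token("p-42", "John Smith")
```

Here a and b are different vertices in the graph, but both still read as "John Smith" to a human.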
I see nested docs as potentially useful in entity resolution scenarios, which is a problem for just about every business that happens to deal with people. The approach is for each entity in a root document ("victim" vs "witness" vs "3rd party" vs "applicant" etc) to be given a nested doc with an array of all of the keys that were used to identify this particular individual. Strong keys can be formed by combining weak components, e.g. surname+postcode+date_of_birth. Multiple key combinations should be used to allow a data fusion process. By separating out each entity's keys into a discrete Lucene document it is then possible to walk the graph of keys found in other nested docs to chain together all of the aliases used by a single person across many docs. A person's identity is itself a graph of keys and their uses.
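To make the key-chaining idea concrete, here is a small offline sketch in plain Python. It is not how Graph does the walk; it uses a union-find over in-memory dicts as a stand-in, and the field names and key combinations are hypothetical.

```python
from collections import defaultdict

# Hypothetical key combinations built from weak components.
KEY_COMBOS = [
    ("surname", "postcode", "date_of_birth"),
    ("surname", "phone"),
    ("email",),
]


def strong_keys(entity):
    """Build composite keys, skipping any combination with a missing part."""
    keys = []
    for combo in KEY_COMBOS:
        parts = [entity.get(field) for field in combo]
        if all(parts):
            value = "|".join(str(p).lower() for p in parts)
            keys.append("+".join(combo) + "=" + value)
    return keys


def resolve(entities):
    """Group entity indices that are linked, directly or not, by shared keys."""
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}
    for idx, entity in enumerate(entities):
        for key in strong_keys(entity):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(entities)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Each dict stands for one nested entity doc taken from some root document.
entities = [
    {"surname": "Smith", "postcode": "AB1 2CD", "date_of_birth": "1980-01-01"},
    {"surname": "Smith", "postcode": "AB1 2CD", "date_of_birth": "1980-01-01",
     "phone": "555-0100"},
    {"surname": "Smith", "phone": "555-0100", "email": "js@example.com"},
    {"surname": "Smythe", "email": "js@example.com"},
    {"surname": "Jones", "phone": "555-0199"},
]
groups = resolve(entities)
```

The first four entities never all share a single key, but they end up in one group because each is linked to the next by at least one key, which is the chaining of aliases described above.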
The nice thing about using nested docs for this model is that the assertions they contain live and die with the business document that stated them.
In some respects that was what nested docs were invented for - to avoid muddling one entity's properties with those of another.