Elasticsearch & X-Pack: how to get vertices/connections from nested documents

I just started using X-Pack for Elasticsearch and want to connect vertices from a nested document type. However, searching the documentation hasn't got me anywhere.

What I have is an index of documents that store person names/IDs as nested documents (one document can have many persons, and one person can be related to many documents). The desired result is graph data showing the connections between persons.
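To make that desired result concrete, here is a small client-side sketch of the kind of person-to-person edges being asked for. The sample documents and IDs are hypothetical, and the weights are raw counts of shared documents, which is simpler than the significance-based scoring Graph uses when use_significance is on.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: root "legend" documents, each with its nested persons.
docs = [
    {"persons": [{"id": "p1", "name": "Anna Berg"}, {"id": "p2", "name": "Olof Lund"}]},
    {"persons": [{"id": "p1", "name": "Anna Berg"}, {"id": "p2", "name": "Olof Lund"},
                 {"id": "p3", "name": "Karin Ek"}]},
]

def person_edges(docs):
    """Count how many documents each unordered pair of person IDs shares."""
    edges = Counter()
    for doc in docs:
        # Dedupe and sort so each pair is counted once per document, as (a, b) with a < b.
        ids = sorted({p["id"] for p in doc.get("persons", [])})
        for a, b in combinations(ids, 2):
            edges[(a, b)] += 1
    return edges

print(person_edges(docs))
```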

Does anyone have a clue, or can anyone tell me whether this is even possible?

Part of my mappings:

mappings: {
    legend: {
        properties: {
            persons: {
                type: 'nested',
                properties: {
                    id: {
                        type: 'string',
                        index: 'not_analyzed'
                    },
                    name: {
                        type: 'string',
                        index: 'not_analyzed'
                    }
                }
            }
        }
    }
}

And here is my Graph API query, which of course doesn't work because I don't know how to reference the "name" field of the nested "persons" field.

POST sagenkarta_v3/_xpack/_graph/_explore
{
  "controls": {
    "use_significance": true,
    "sample_size": 20000,
    "timeout": 2000
  },
  "vertices": [
    {
      "field": "persons.name"
    }
  ],
  "connections": {
    "vertices": [
      {
        "field": "persons.name"
      }
    ]
  }
}

Thanks in advance!

Unfortunately, Graph does not support nested documents, but you can use copy_to in your mappings to put the person data into an indexed field of the containing root document.
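A minimal sketch of that suggestion, written as Python dicts that serialize to the mapping and Graph request bodies. It assumes the string/not_analyzed syntax from the question's mapping (on Elasticsearch 5.x and later the equivalent type is keyword) and a hypothetical root-level field named person_names. The explore request is the question's own query with the vertex field pointed at the copied field instead of persons.name.

```python
import json

# Hypothetical root-level field that receives a copy of every nested person name.
ROOT_FIELD = "person_names"

mapping = {
    "mappings": {
        "legend": {
            "properties": {
                "persons": {
                    "type": "nested",
                    "properties": {
                        "id": {"type": "string", "index": "not_analyzed"},
                        "name": {
                            "type": "string",
                            "index": "not_analyzed",
                            # Also index each name in the root-level field.
                            "copy_to": ROOT_FIELD,
                        },
                    },
                },
                # Declared explicitly so names are kept as whole tokens, not analyzed.
                ROOT_FIELD: {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}

# Same Graph request as in the question, but using the root-level copy.
explore = {
    "controls": {"use_significance": True, "sample_size": 20000, "timeout": 2000},
    "vertices": [{"field": ROOT_FIELD}],
    "connections": {"vertices": [{"field": ROOT_FIELD}]},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(explore, indent=2))
```

Existing documents would need to be reindexed after the mapping change before the copied field contains any values.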

I can see that you have the classic problem of "computers-want-IDs-but-people-want-labels" and have both these values. In Graph (and arguably the rest of Kibana too), I suggest you use tokens that combine IDs for uniqueness' sake and names for readability by humans.
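A small sketch of such combined tokens. The "name [id]" layout and the helper names are hypothetical conventions, not anything Graph requires. Because copy_to copies each field's value as-is, a combined token like this would have to be built before indexing (for example as an extra value stored in a root-level not_analyzed field).

```python
from typing import Dict, Iterable, List

def person_token(person_id: str, name: str) -> str:
    # Hypothetical "name [id]" layout: the ID keeps the token unique,
    # the name keeps it readable in Graph/Kibana.
    return f"{name} [{person_id}]"

def person_tokens(persons: Iterable[Dict[str, str]]) -> List[str]:
    # One token per distinct person, in first-seen order.
    tokens: Dict[str, None] = {}
    for p in persons:
        tokens.setdefault(person_token(p["id"], p["name"]), None)
    return list(tokens)

doc = {"persons": [
    {"id": "p1", "name": "Anna Berg"},
    {"id": "p7", "name": "Anna Berg"},   # same name, different person
    {"id": "p1", "name": "Anna Berg"},   # repeated mention of p1
]}
print(person_tokens(doc["persons"]))
```

Two different people who share a name stay separate vertices, while repeated mentions of the same person collapse into one.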

The copy_to and IDs-and-labels tips are part of the modelling suggestions in my Elastic{ON} talk this year: https://www.elastic.co/elasticon/conf/2017/sf/getting-your-data-graph-ready


Thank you so much, Mark! This definitely gets me started on this problem, and thanks also for the ID tip.

By the way, do you know whether Graph will support nested documents at some point in the future?

I see them as potentially useful in entity-resolution scenarios, which are a problem for just about every business that deals with people. The approach is for each entity in a root document ("victim" vs "witness" vs "3rd party" vs "applicant", etc.) to be given a nested doc with an array of all of the keys that were used to identify this particular individual. Strong keys can be formed by combining weak components, e.g. surname+postcode+date_of_birth. Multiple key combinations should be used to allow a data fusion process. By separating out each entity's keys into a discrete Lucene document, it is then possible to walk the graph of keys found in other nested docs to chain together all of the aliases used by a single person across many docs. A person's identity is itself a graph of keys and their uses.
The nice thing about using nested docs for this model is the assertions they contain live and die with the business document that stated them.

In some respects, that is what nested docs were invented for: to avoid muddling one entity's properties with those of another.
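The key-chaining idea described above can be sketched offline in plain Python. This is an illustration, not something Graph or Elasticsearch does out of the box: each list of keys stands in for one entity's nested doc, the key recipes (surname+postcode+dob, email, phone) and record fields are hypothetical, and "walking the graph of keys" is implemented here as a union-find over keys asserted by the same entity, whose connected components are the resolved identities.

```python
from collections import defaultdict

def strong_keys(rec):
    # Hypothetical key recipes; each key combines or normalizes weak components.
    keys = []
    if rec.get("surname") and rec.get("postcode") and rec.get("dob"):
        keys.append(f"{rec['surname'].lower()}|{rec['postcode']}|{rec['dob']}")
    if rec.get("email"):
        keys.append("email|" + rec["email"].lower())
    if rec.get("phone"):
        keys.append("phone|" + rec["phone"])
    return keys

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def resolve(entity_docs):
    """Group keys that chain together across entity docs into identities."""
    uf = UnionFind()
    for keys in entity_docs:
        for k in keys:
            uf.find(k)            # register every key, even singletons
        for a, b in zip(keys, keys[1:]):
            uf.union(a, b)        # keys asserted by one entity belong together
    groups = defaultdict(set)
    for k in list(uf.parent):
        groups[uf.find(k)].add(k)
    return list(groups.values())

records = [
    {"surname": "Smith", "postcode": "AB1 2CD", "dob": "1970-01-01", "email": "js@example.org"},
    {"email": "JS@example.org", "phone": "555-0100"},   # alias: shares the email key
    {"surname": "Jones", "postcode": "ZZ9 9ZZ", "dob": "1980-02-02"},
]
identities = resolve([strong_keys(r) for r in records])
print(identities)
```

The second record never mentions a surname, yet it joins the first identity through the shared email key, which is the alias-chaining effect described in the post.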
