Aggregation on multiple documents in same index, but relation defined in another index


(Rohit Raj) #1

I have a requirement to aggregate fields from 2 or more document in a single index(Pages index), and the relation between these documents is defined in another index(Relations index).

localhost:9200/pages/page/0277e82d-6c48-4557-bfa5-aadf9bcd52f6 contains

{
    _index: "pages",
    _type: "page",
    _id: "0277e82d-6c48-4557-bfa5-aadf9bcd52f6",  // this is institute id
    _score: 1,
    _source: {
        academics: [],
        display_picture: {},
        established_date: "1869-01-01 00:00:00",
        featured: "yes",
        follower_count: 82,
        id: "0277e82d-6c48-4557-bfa5-aadf9bcd52f6",
        institute_type: "public (autonomous)",
        location: {},
        profiles_added_to_count: 74,
        status: "1",
        subtype: "page_institution",
        title: "St. Xaviers College | Mumbai",
        type: "page"
    }
}

and localhost:9200/pages/page/5007e337-7a1c-4668-9204-5cf77b3aed97 contains

{
    _index: "pages",
    _type: "page",
    _id: "5007e337-7a1c-4668-9204-5cf77b3aed97",  // this is recruiter id
    _score: 1,
    _source: {
        featured: "no",
        follower_count: 0,
        id: "5007e337-7a1c-4668-9204-5cf77b3aed97",
        profiles_added_to_count: 0,
        status: "0",
        subtype: "page_organisation",
        title: "Oracle Financial Corporation",
        type: "page"
    }
}

now these 2 are related, and the relation is defined in second index (Relations).

localhost:9200/relations/recruiter/0277e82d-6c48-4557-bfa5-aadf9bcd52f6 contains

{
    _index: "relations",
    _type: "recruiter",
    _id: "0277e82d-6c48-4557-bfa5-aadf9bcd52f6",
    _version: 1,
    found: true,
    _source: {
        e6231b19-1489-42a1-951e-90ccd574fcd3: {
            id: "e6231b19-1489-42a1-951e-90ccd574fcd3",  // this is recruiter id
            type: "page"
        },
        5007e337-7a1c-4668-9204-5cf77b3aed97: {
            id: "5007e337-7a1c-4668-9204-5cf77b3aed97",  // this is recruiter id
            type: "page"
        }
    }
}

an institute can have multiple recruiters, so the relation here contains 2 different recruiter ids.

So my requirement is to join the above two documents(index as Pages) by their relations(index as Relations) for querying and getting aggregation based on the overall results.

I want to query over the Pages index, with name Xaviers, and want to get the recruiters details too in the result with aggregation of recruiters too.

Can someone please let me know how to achieve this.


(Mark Harwood) #2

This looks like a worrying bit of JSON - long IDs as field names doesn't scale well. Every new field name typically revises the mapping definition for an index which must be updated centrally and communicated to all other nodes in the cluster as part of the "cluster state". There are better ways of structuring your data which you need to consider first. You need to shift these IDs from the left hand side of key:value representations to the right hand side of a colon.


(Rohit Raj) #3

@Mark_Harwood

Thanks for pointing this out, I will figure out the alternative for this.
If in case I change the representation to as the nested objects in elasticsearch:

{
    _index: "relations",
    _type: "recruiter",
    _id: "0277e82d-6c48-4557-bfa5-aadf9bcd52f6",
    _version: 1,
    _score: 1,
    _source: {
        companies: [
           {
               id: "e6231b19-1489-42a1-951e-90ccd574fcd3",  // this is recruiter id
               type: "page"
           },
           {
               id: "5007e337-7a1c-4668-9204-5cf77b3aed97",  // this is recruiter id
               type: "page"
           }
        ]
    }
}

will this help you, in figuring out the necessary solution that I requested. Please let me know.


(Mark Harwood) #4

To do this sort of relationship walking you might want to take a look at the x-pack Graph feature. Here's an example of a data setup and the results of querying for Xavier :


(Rohit Raj) #5

@Mark_Harwood

So the solutions suggested by you, is basically to work with the mappings.

How do you want me to query ? Since my query will contain the aggregation query too, will this help in getting the aggregation ?

Can you please let me know about the /_alias that you defined.

This is the query that I want to make.

I am getting a bit of confused, whether it will work on my architecture. Since I have all the informative documents in the Pages index, and the relation between these documents are there in Relations index.

Please let me know if this will lead in the desired direction.


(Mark Harwood) #6

In your example you want to query an attribute of one index (institute name), take the results from that and query another index (relationships) and finally hit a third index (recruiter).

There is no core elasticsearch query that will do this for you in one request. However the Graph API in the x-pack extension has the syntax for walking these relationships if the underlying indices share common field names in the way I outlined in my gist.

The alias is required to work with Kibana so that the Graph UI can see the 3 indices as one.

The example I gave has the same core concepts (institutes, recruiters, relationships) but with the necessary core attributes required to answer the question you asked. You can download the 5.0 release candidate and trial x-pack to work with my example.


(system) #7