Elasticsearch 6.2.4: how to get distinct values from multiple fields from ordered documents?

The documents look like this:

{
    "name": "Micheal",
    "title": "appliaction engeneer",
    "age": 30,
    "interest": "java",
    "manager": "Amily"
},
{
    "name": "Georgiana",
    "title": "appliaction engeneer",
    "age": 34,
    "interest": "java",
    "manager": "Amily"
},
{
    "name": "Benjamin",
    "title": "Product Manager",
    "age": 36,
    "interest": "management",
    "manager": "robinson "
},
{
    "name": "Selina",
    "title": "appliaction engeneer",
    "age": 30,
    "interest": "java",
    "manager": "Grey "
},
{
    "name": "Edison",
    "title": "appliaction engeneer",
    "age": 26,
    "interest": ".net",
    "manager": "Amily "
}

Now I need to extract three fields (title, age, interest) from the documents.

The results should look like this:

{
    "title": "appliaction engeneer",
    "age": 30,
    "interest": "java"
},
{
    "title": "appliaction engeneer",
    "age": 34,
    "interest": "java"
},
{
    "title": "Product Manager",
    "age": 36,
    "interest": "management"
},
{
    "title": "appliaction engeneer",
    "age": 26,
    "interest": ".net"
}

Repeated records should appear only once, so the second occurrence of the record below should be removed:

{
    "title": "appliaction engeneer",
    "age": 30,
    "interest": "java"
}

And most importantly, the order of the obtained records should be consistent with the documents.

How can we achieve this?

You should be able to get this via _source. Try this ...

GET someindex/sometype/_search
{
    "_source":["title","age","interest"]
}

_source cannot give the distinct values; it just returns all values. That's not what I want.

Maybe use a terms aggregation with a script that combines all the values (it would be better to compute the combined value at index time)?
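A minimal sketch of the index-time variant of this suggestion, in Python. The field name title_age_interest, the "|" separator, and the aggregation name distinct_combos are hypothetical, and the sketch assumes the combined field is mapped as a keyword. It only builds the enriched document and the request body; sending them to the cluster is left out.

```python
SEPARATOR = "|"  # assumption: this character never occurs in title or interest
FIELDS = ("title", "age", "interest")
COMBINED_FIELD = "title_age_interest"  # hypothetical field name


def combined_key(doc):
    """Join the three values into one string so it can act as a single term."""
    return SEPARATOR.join(str(doc[field]) for field in FIELDS)


def with_combined_key(doc):
    """Return a copy of the document with the combined field added before indexing."""
    enriched = dict(doc)
    enriched[COMBINED_FIELD] = combined_key(doc)
    return enriched


# Request body for a terms aggregation on the precomputed field.
# "size": 0 suppresses the hits; the inner "size" caps the number of buckets.
terms_agg_body = {
    "size": 0,
    "aggs": {
        "distinct_combos": {
            "terms": {"field": COMBINED_FIELD, "size": 100}
        }
    },
}

example = with_combined_key(
    {"name": "Micheal", "title": "appliaction engeneer", "age": 30,
     "interest": "java", "manager": "Amily"}
)
```

Terms buckets are ordered by document count by default, not by the order of the query hits, so this yields the distinct combinations but not their ranking.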

Yes, I've tried this method, but the order of the results is not consistent with the original documents. So it doesn't meet my requirements either (as I emphasized in the bold part of my question).

Well...

the order of the obtained records should be consistent with the documents.

I don't exactly understand what it means.

As I described at the start of the question, the order of the documents is:
{"name": "Micheal",...},
{"name": "Georgiana",...},
{"name": "Benjamin",...},
{"name": "Selina",...},
{"name": "Edison",...},

If I use the terms aggregation with a script that combines the three values of interest ("title", "age", "interest" in this case), the obtained results may come from the documents in the following order:
{"name": "Benjamin",...},
{"name": "Edison",...},
{"name": "Micheal",...},
{"name": "Selina",...},
{"name": "Georgiana",...},

The order is different from the original one.

How is this sorted in the first place?

It's sorted by the query part of the DSL (I didn't list the query here), which has multiple rules that define the order. That's why we have to follow this order and can't change it. Otherwise the most relevant records would no longer come first.

In other words, we already have the results from:
GET index_name/_search
{"query": {...}}

Now we need to extract the three fields from these documents for the user, but the order of the final results should be consistent with the order of the query results.

In this case:
The first record containing the three fields should come from the first document: {"name": "Micheal",...}
The second record containing the three fields should come from the second document: {"name": "Georgiana",...}
...

At the same time, we should remove the repeated values, i.e. return only the distinct combinations of the three fields the user is interested in.
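The requirement above can be sketched on the client side. This is only an illustration in Python, not a server-side solution: it assumes the hits have already been fetched in query order and each carries a _source dict, as in a normal search response. The function name distinct_in_hit_order is made up for the example.

```python
FIELDS = ("title", "age", "interest")


def distinct_in_hit_order(hits, fields=FIELDS):
    """Keep the first occurrence of each field combination, in hit order."""
    seen = set()
    records = []
    for hit in hits:
        source = hit.get("_source", {})
        key = tuple(source.get(field) for field in fields)
        if key in seen:
            continue  # a better-ranked hit already produced this combination
        seen.add(key)
        records.append({field: source.get(field) for field in fields})
    return records


# The five sample documents, in the order returned by the query.
hits = [
    {"_source": {"name": "Micheal", "title": "appliaction engeneer", "age": 30, "interest": "java"}},
    {"_source": {"name": "Georgiana", "title": "appliaction engeneer", "age": 34, "interest": "java"}},
    {"_source": {"name": "Benjamin", "title": "Product Manager", "age": 36, "interest": "management"}},
    {"_source": {"name": "Selina", "title": "appliaction engeneer", "age": 30, "interest": "java"}},
    {"_source": {"name": "Edison", "title": "appliaction engeneer", "age": 26, "interest": ".net"}},
]

records = distinct_in_hit_order(hits)
```

This only removes duplicates within the hits that were actually fetched, so with paging the seen set would have to be carried across pages. A server-side option that may be worth testing is field collapsing (the collapse parameter of the search request) on a single precomputed keyword field that combines the three values, since collapsed hits keep the sort order of the query.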
