How to extract the content of elasticsearch indexes?

fmind · March 9, 2018, 4:08pm

Hello,

I want to perform an analysis on elasticsearch indexes.

For instance, I have an index that stores two fields: 'name' and 'age'.

The result I want is the list of documents associated with each value of each field:

name => ['bob' => [Doc#1, Doc#2, Doc#3],
'alice' => [Doc#4, Doc#5]]

age => ['20' => [Doc#2, Doc#4],
'30' => [Doc#1, Doc#5],
'40' => [Doc#3]]

Is there a way to perform this kind of query in Elasticsearch?

Thank you!

fmind · March 13, 2018, 10:52am

Bump

abdon · March 13, 2018, 2:41pm

You would use an aggregation instead of a query for this. You could, for example, use a terms aggregation.

Given these docs:

PUT my_index/doc/_bulk
{ "index" : { "_id" : "1" } }
{"name": "bob", "age": 30}
{ "index" : { "_id" : "2" } }
{"name": "bob", "age": 20}
{ "index" : { "_id" : "3" } }
{"name": "bob", "age": 30}
{ "index" : { "_id" : "4" } }
{"name": "alice", "age": 20}
{ "index" : { "_id" : "5" } }
{"name": "alice", "age": 30}

You can get the IDs by name using this aggregation:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "top_names": {
      "terms": {
        "field": "name.keyword",
        "size": 100
      },
      "aggs": {
        "top_ids": {
          "terms": {
            "field": "_id",
            "size": 100
          }
        }
      }
    }
  }
}

This returns:

"buckets": [
        {
          "key": "bob",
          "doc_count": 3,
          "top_ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "1",
                "doc_count": 1
              },
              {
                "key": "2",
                "doc_count": 1
              },
              {
                "key": "3",
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "alice",
          "doc_count": 2,
          "top_ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "4",
                "doc_count": 1
              },
              {
                "key": "5",
                "doc_count": 1
              }
            ]
          }
        }
      ]
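
The "buckets" array above can be flattened into the name => [IDs] shape from the original question. Here is a minimal Python sketch, assuming the response has already been parsed from JSON. The helper name ids_by_term is hypothetical and not part of any Elasticsearch client:

```python
def ids_by_term(buckets):
    """Map each outer bucket key to the document IDs in its `top_ids` sub-aggregation."""
    return {
        bucket["key"]: [inner["key"] for inner in bucket["top_ids"]["buckets"]]
        for bucket in buckets
    }


# The "buckets" array from the response above, trimmed to the keys this sketch reads.
buckets = [
    {
        "key": "bob",
        "doc_count": 3,
        "top_ids": {
            "buckets": [
                {"key": "1", "doc_count": 1},
                {"key": "2", "doc_count": 1},
                {"key": "3", "doc_count": 1},
            ]
        },
    },
    {
        "key": "alice",
        "doc_count": 2,
        "top_ids": {
            "buckets": [
                {"key": "4", "doc_count": 1},
                {"key": "5", "doc_count": 1},
            ]
        },
    },
]

result = ids_by_term(buckets)
print(result)
```

The same helper should work for any outer terms aggregation, as long as the sub-aggregation keeps the name top_ids.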

Aggregating on _id is not possible on older versions of Elasticsearch. You may need to replace _id with _uid (which is a concatenation of the _type and _id) if you're using an older version.

To aggregate on age, replace name.keyword with age in the request above.
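
Since only the field name changes between the two requests, the request body can be generated per field. This is a small Python sketch, not an official client API: the function name terms_with_ids_request and the aggregation name top_values are made up for illustration (the request above calls the outer aggregation top_names).

```python
import json


def terms_with_ids_request(field, size=100):
    """Build a search body that groups document IDs under each value of `field`."""
    return {
        "size": 0,  # no hits, only aggregation results
        "aggs": {
            "top_values": {  # hypothetical name for the outer aggregation
                "terms": {"field": field, "size": size},
                "aggs": {
                    "top_ids": {"terms": {"field": "_id", "size": size}},
                },
            }
        },
    }


# Same request as above, but bucketing on the numeric `age` field.
age_request = terms_with_ids_request("age")
print(json.dumps(age_request, indent=2))
```

The printed JSON can be pasted as the body of GET my_index/_search.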

Note the "size": 100 in the request above. It limits the response to the 100 most common names and to at most 100 IDs per name. You can increase that number if you need to retrieve more values (but you may run into memory limitations). Alternatively, if you're on version 6.1 or later, you could take a look at the composite aggregation to retrieve all values and IDs.
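
To give an idea of how paging with the composite aggregation could look, here is a rough Python sketch. It rests on several assumptions: search is a hypothetical callable that sends a request body to GET my_index/_search and returns the parsed JSON response, the names pairs, value and id are made up, and _id is assumed to be usable as a composite source the same way it is used in the terms aggregation. Each page is resumed by passing the key of the last bucket as "after".

```python
def collect_ids_by_value(search, field, page_size=100):
    """Page through a composite aggregation and group document IDs by field value.

    `search` is a hypothetical callable: request body (dict) -> response (dict).
    """
    composite = {
        "size": page_size,
        "sources": [
            {"value": {"terms": {"field": field}}},
            {"id": {"terms": {"field": "_id"}}},
        ],
    }
    grouped = {}
    while True:
        body = {"size": 0, "aggs": {"pairs": {"composite": composite}}}
        response = search(body)
        buckets = response["aggregations"]["pairs"]["buckets"]
        if not buckets:
            break  # an empty page means every (value, id) pair has been seen
        for bucket in buckets:
            key = bucket["key"]  # e.g. {"value": "bob", "id": "1"}
            grouped.setdefault(key["value"], []).append(key["id"])
        # Resume the next page after the last bucket of this one.
        composite["after"] = buckets[-1]["key"]
    return grouped
```

Calling collect_ids_by_value(search, "name.keyword") would then return a dictionary from each name to its list of IDs, without the 100-bucket cap of the terms request.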

fmind · March 14, 2018, 9:33am

Perfect, this is exactly what I need!

Thank you

