How to extract the content of elasticsearch indexes?

Hello,

I want to perform an analysis on elasticsearch indexes.

For instance, I have an index that stores two fields: 'name' and 'age'.

The result I want is the list of documents associated with each value of each field:

name => ['bob' => [Doc#1, Doc#2, Doc#3],
'alice' => [Doc#4, Doc#5]]

age => ['20' => [Doc#2, Doc#4],
'30' => [Doc#1, Doc#5],
'40' => [Doc#3]]

Is there a way to perform this kind of query in Elasticsearch?

Thank you !

Bump

You would use an aggregation instead of a query for this. You could, for example, use a Terms aggregation.

Given these docs:

PUT my_index/doc/_bulk
{ "index" : { "_id" : "1" } }
{"name": "bob", "age": 30}
{ "index" : { "_id" : "2" } }
{"name": "bob", "age": 20}
{ "index" : { "_id" : "3" } }
{"name": "bob", "age": 30}
{ "index" : { "_id" : "4" } }
{"name": "alice", "age": 20}
{ "index" : { "_id" : "5" } }
{"name": "alice", "age": 30}

You can get the IDs by name using this aggregation:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "top_names": {
      "terms": {
        "field": "name.keyword",
        "size": 100
      },
      "aggs": {
        "top_ids": {
          "terms": {
            "field": "_id",
            "size": 100
          }
        }
      }
    }
  }
}

Which will return:

"buckets": [
        {
          "key": "bob",
          "doc_count": 3,
          "top_ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "1",
                "doc_count": 1
              },
              {
                "key": "2",
                "doc_count": 1
              },
              {
                "key": "3",
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "alice",
          "doc_count": 2,
          "top_ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "4",
                "doc_count": 1
              },
              {
                "key": "5",
                "doc_count": 1
              }
            ]
          }
        }
      ]
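If you then want the response in the shape from the original question (value => list of document IDs), a small client-side step is enough. This is only a sketch in Python: the function name buckets_to_mapping is made up for illustration, and it assumes you have already parsed the JSON response and pulled out the outer "buckets" list shown above.

```python
def buckets_to_mapping(outer_buckets, sub_agg="top_ids"):
    """Turn nested terms buckets into {field_value: [doc_id, ...]}.

    `sub_agg` must match the name given to the inner aggregation
    in the request ("top_ids" in the example above).
    """
    return {
        bucket["key"]: [inner["key"] for inner in bucket[sub_agg]["buckets"]]
        for bucket in outer_buckets
    }


# Trimmed copy of the buckets from the response above.
buckets = [
    {"key": "bob", "doc_count": 3, "top_ids": {"buckets": [
        {"key": "1", "doc_count": 1},
        {"key": "2", "doc_count": 1},
        {"key": "3", "doc_count": 1},
    ]}},
    {"key": "alice", "doc_count": 2, "top_ids": {"buckets": [
        {"key": "4", "doc_count": 1},
        {"key": "5", "doc_count": 1},
    ]}},
]

result = buckets_to_mapping(buckets)
print(result)
```

For the sample data this gives bob => 1, 2, 3 and alice => 4, 5, which matches the name part of the structure asked for.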

Aggregating on _id is not possible on older versions of Elasticsearch. You may need to replace _id with _uid (which is a concatenation of the _type and _id) if you're using an older version.

To aggregate on age, you would replace name.keyword with age in the request above.
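Since the two requests differ only in the field name, you could generate the body instead of copying it. A minimal Python sketch, assuming you send the resulting dict with whatever HTTP or Elasticsearch client you already use; the helper name ids_by_field_request and the aggregation name top_values are hypothetical, not part of any Elasticsearch API.

```python
import json


def ids_by_field_request(field, size=100):
    """Build the search body from the example above for an arbitrary field.

    The outer terms aggregation groups by `field`; the inner one lists
    the document IDs in each group (assumes _id can be aggregated on).
    """
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {
            "top_values": {
                "terms": {"field": field, "size": size},
                "aggs": {
                    "top_ids": {"terms": {"field": "_id", "size": size}}
                },
            }
        },
    }


# "age" is numeric, so it is used directly; the text field needs ".keyword".
age_request = ids_by_field_request("age")
name_request = ids_by_field_request("name.keyword")
print(json.dumps(age_request, indent=2))
```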

Note the "size": 100 in the request above. It limits the response to the 100 most common names and to at most 100 IDs per name. You can increase that number if you need to retrieve more values, but you may run into memory limitations. Alternatively, if you're on version 6.1 or later, you could take a look at the composite aggregation to retrieve all values and IDs.
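To give an idea of the composite approach, here is a rough Python sketch that pages through every (value, _id) pair. Several things here are assumptions: search is a hypothetical callable that takes a request body and returns the parsed JSON response (for example a thin wrapper around your client), the names pairs, value and id are arbitrary labels, and your version has to allow _id as a terms source. It resumes each page from the key of the last bucket returned, which is how the "after" parameter is meant to be used.

```python
from collections import defaultdict


def composite_request(field, page_size=1000, after=None):
    """Build one page of a composite aggregation over (field value, _id)."""
    composite = {
        "size": page_size,
        "sources": [
            {"value": {"terms": {"field": field}}},
            {"id": {"terms": {"field": "_id"}}},
        ],
    }
    if after is not None:
        # Continue after the last composite key of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"pairs": {"composite": composite}}}


def collect_ids(search, field, page_size=1000):
    """Page through all buckets and return {field_value: [doc_id, ...]}."""
    mapping = defaultdict(list)
    after = None
    while True:
        response = search(composite_request(field, page_size, after))
        buckets = response["aggregations"]["pairs"]["buckets"]
        if not buckets:
            break  # an empty page means everything has been read
        for bucket in buckets:
            mapping[bucket["key"]["value"]].append(bucket["key"]["id"])
        after = buckets[-1]["key"]
    return dict(mapping)
```

Calling collect_ids(search, "name.keyword") and collect_ids(search, "age") would then give the two mappings from the question, without the 100-bucket cap of the terms aggregation.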

Perfect, this is exactly what I need!

Thank you
