Terms agg not giving all terms for huge text fields

I have a large text field (~3000 characters) in which I am storing the flow/path of the user inside my application.

flow_path: "ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309,ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309,ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309,ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309,ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309"

flow_path: "ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309"

So, user steps are recorded inside this comma-separated string.

I want to be able to both search and aggregate on this field. By default, Elasticsearch created one analyzed text field and one non-analyzed keyword multifield, and I expected that to work for my use case.

I am able to search using the analyzed flow_path field as expected, but when I aggregate on the flow_path.keyword field, it does not produce all the expected terms in Kibana.

Ex: (based on above example)
Term : Count
ABCD12345678,ABCD098765432,PQRS56789043,EFG321987309 : 1

The term from Doc1 is completely ignored. I increased the size parameter to a huge value (20000), but the issue persists.

Version: ES, Kibana 5.6.3

Please help me figure out how I can aggregate on such a big text field and get all the terms.

If you do not provide an explicit mapping for your index, string fields will be mapped like this:

          "my_field": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
If you go with this default mapping, the .keyword multifield (that you are aggregating on) ignores any value longer than 256 characters, because of the "ignore_above": 256 parameter. The value in your Doc1 is longer than that, and that's why it is missing from the terms aggregation.

To fix this, you could change your mapping to allow longer values, for example setting ignore_above to something larger than your longest expected flow_path (e.g. 4000, since you mentioned values of ~3000 characters).
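A minimal sketch of such a mapping for ES 5.x, using a hypothetical index name my-new-index and type my_type (the flow_path field name comes from the question above):

```json
PUT my-new-index
{
  "mappings": {
    "my_type": {
      "properties": {
        "flow_path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 4000
            }
          }
        }
      }
    }
  }
}
```

With this mapping, flow_path stays searchable via the analyzed field, while flow_path.keyword keeps the full comma-separated string for aggregations.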

Note that you can not update the mappings for existing indexes. You will have to reindex your data to a new index that has the updated mappings applied.
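The reindex step can be done with the _reindex API; a sketch, assuming hypothetical index names my-old-index and my-new-index (the new index created with the updated mapping):

```json
POST _reindex
{
  "source": { "index": "my-old-index" },
  "dest":   { "index": "my-new-index" }
}
```

After reindexing completes, you can point your searches (or an alias) at the new index and drop the old one.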

Thanks for the reply. Looks like that's the issue: I checked my mapping, and it's 256 chars.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.