Is there any length limit on the keys in a terms aggregation? (I mean in the displayed results)

pill663 · August 15, 2016, 11:55am

I have some documents with long string values stored in Elasticsearch (version 2.3).
I want to run a terms aggregation over these documents. The field I chose for the terms contains long strings.
The total count of docs is 718, but the aggregation results account for far fewer documents than that.
I think some documents are being filtered out or dropped because the string is too long; some values are longer than 256 characters.
So I want to ask: is there a length limit on the key in a terms aggregation?
Is there any way to get around this limit?
These are the aggregation results:
{
  "took": 100,
  "timed_out": false,
  "_shards": {
    "total": 36,
    "successful": 36,
    "failed": 0
  },
  "hits": {
    "total": 718,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "select () from t_intg_dm_00v3 where XXX = XXX and ((((((fm_office in (XXX) and task_type in (XXX)) and task_status != XXX) and task_status != XXX) and task_status != XXX) and auto_schedulable != XXX) and change_time > XXX) and active = XXX limit XXX, XXX;",
          "doc_count": 12
        },
        {
          "key": "select () from t_intg_dm_00v3 where XXX = XXX and task_id = 'XX-XXXXXXX-XXXXX' and active = XXX order by operate_time asc limit XXX, XXX;",
          "doc_count": 2
        },
        {
          "key": "select () from t_intg_dm_00v3 where XXX = XXX and (task_id = 'XX-XXXXXXX-XXXXX' and (operate_type = XXX or operate_type = XXX)) and active = XXX order by task_log_id asc limit XXX, XXX;",
          "doc_count": 1
        },
        {
          "key": "select () from t_intg_dm_00v3 where XXX = XXX and (task_id = 'XX-XXXXXXX-XXXXX' and (operate_type = XXX or operate_type = XXX)) and active = XXX order by task_log_id asc limit XXX, XXX;",
          "doc_count": 1
        },
        {
          "key": "select count(1) count from t_intg_dm_00ui where XXX = XXX and create_time <= '2016-08-15 XXX:59:59' and active = XXX;",
          "doc_count": 1
        }
      ]
    }
  }
}
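To put a number on "much less than the total count", the bucket counts in the response above can be added up and compared with `hits.total`. This is a minimal sketch that copies only the counts from the posted response (keys omitted) and assumes `Pattern` holds a single value per document, so each document falls into at most one bucket; `uncovered_docs` is a hypothetical helper name.

```python
# Counts copied from the response above; bucket keys omitted for brevity.
response = {
    "hits": {"total": 718},
    "aggregations": {
        "2": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"doc_count": 12},
                {"doc_count": 2},
                {"doc_count": 1},
                {"doc_count": 1},
                {"doc_count": 1},
            ],
        }
    },
}


def uncovered_docs(resp, agg_name):
    """Return (docs counted in some bucket, matching docs in no bucket at all).

    Assumes a single-valued field, so bucket doc counts do not overlap.
    """
    agg = resp["aggregations"][agg_name]
    # Docs in the returned buckets plus docs in buckets cut off by "size".
    covered = sum(b["doc_count"] for b in agg["buckets"]) + agg["sum_other_doc_count"]
    return covered, resp["hits"]["total"] - covered


covered, missing = uncovered_docs(response, "2")
print(covered, missing)  # 17 701
```

Since `sum_other_doc_count` is 0, the gap is not caused by the `size` of the aggregation: roughly 701 of the 718 matching documents have no term at all in the aggregated field.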

cbuescher · August 15, 2016, 1:12pm

What's the mapping for the field you are aggregating on?

pill663 · August 16, 2016, 2:43am

The mapping type is 'slowquery'.
It has many fields, such as 'Pattern', 'Database' and 'Schema'.
I'm doing the terms aggregation on the field 'Pattern'. It's a string.

cbuescher · August 16, 2016, 8:49am

From your example it's hard to tell. Can you add how you are doing the aggregation?

pill663 · August 16, 2016, 10:58am

My query body looks like this; the field is 'Pattern' and the mapping type is 'slowquery':
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "type:slowquery",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1471299986398,
                  "lte": 1471314386399,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "Pattern.raw",
        "size": 500,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
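One way to see how many documents the terms aggregation skips is to add a sibling `missing` aggregation on the same field. The sketch below only builds the request body as a Python dict and prints it as JSON; the `"query"` section from the post above can be added unchanged, and the aggregation name `no_raw_value` is hypothetical. Its count should include documents whose `Pattern` value was not indexed into `Pattern.raw` as well as documents that have no `Pattern` at all.

```python
import json

# Terms aggregation from the post above, plus a sibling "missing" aggregation.
# "no_raw_value" is a hypothetical name; any aggregation name works.
body = {
    "size": 0,
    "aggs": {
        "2": {
            "terms": {
                "field": "Pattern.raw",
                "size": 500,
                "order": {"_count": "desc"},
            }
        },
        # Counts matching documents that have no value in Pattern.raw.
        "no_raw_value": {"missing": {"field": "Pattern.raw"}},
    },
}

print(json.dumps(body, indent=2))
```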

cbuescher · August 16, 2016, 12:40pm

If you run the query without the aggregation (and size set to some higher value), do all the results have the field set that you are aggregating on?

pill663 · August 17, 2016, 1:46am

If I query without the aggregation, the number of results is correct.
For example, with curl -XGET 'http://esurl/_all/_search?size=200' the number of returned hits is correct.
When I add the aggregation to the query body, the result is much smaller.

Christian_Dahlqvist · August 17, 2016, 5:38am

Please provide the mapping for the field. You can retrieve this through the get mapping API.

pill663 · August 17, 2016, 7:04am

pill663 · August 17, 2016, 7:05am

It's a bit long, so I didn't convert it to JSON.

pill663 · August 17, 2016, 7:08am

It seems the field has the attribute "ignore_above": 256.
Could that be causing the problem? If a string is longer than 256, is it ignored when doing the aggregation?
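The effect of `ignore_above` on a terms aggregation can be modelled in a few lines. This is a plain-Python simulation, not Elasticsearch code, assuming the documented 2.x behaviour for `not_analyzed` strings: a value longer than `ignore_above` characters is not indexed at all (it is skipped, not truncated), so its document never lands in a bucket. `terms_agg` is a hypothetical helper.

```python
from collections import Counter


def terms_agg(values, ignore_above):
    """Simulate a terms aggregation over a not_analyzed field with ignore_above.

    Values longer than ignore_above are skipped entirely rather than truncated,
    so they never produce a bucket.
    """
    indexed = [v for v in values if v is not None and len(v) <= ignore_above]
    return [{"key": k, "doc_count": n} for k, n in Counter(indexed).most_common()]


short = "select count(1) from t;"
long_query = "select () from t where " + "a = XXX and " * 25 + "active = XXX;"  # 336 chars
docs = [short, short, long_query, long_query, long_query]

print([b["doc_count"] for b in terms_agg(docs, ignore_above=256)])   # [2]
print([b["doc_count"] for b in terms_agg(docs, ignore_above=1024)])  # [3, 2]
```

With the limit at 256, the three documents holding the long query vanish from the results, which matches the symptom of bucket counts adding up to far less than the hit count.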

Christian_Dahlqvist · August 17, 2016, 7:13am

I believe strings longer than 256 characters are not indexed in the not_analyzed field at all (they are skipped, not truncated), which would explain why those documents are missing from the aggregation results.

pill663 · August 17, 2016, 7:15am

How can I change it, for example to 1024?

Christian_Dahlqvist · August 17, 2016, 7:25am

You will need to change this in your index template.
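A sketch of what such a template body could look like for Elasticsearch 2.x, built as a Python dict and printed as JSON. Assumptions: the template name `slowquery_raw_1024` is hypothetical; in 2.x the index-name pattern goes in the `"template"` key (later versions use `"index_patterns"`); `order: 1` is meant to take precedence over the default Logstash template, whose order is assumed to be 0; and only the `raw` sub-field settings from the user's attempted mapping are repeated (the `norms`/`fielddata` settings are left out).

```python
import json

# Hypothetical template name.
TEMPLATE_NAME = "slowquery_raw_1024"

template = {
    "template": "logstash-*",  # ES 2.x key for the index-name pattern
    "order": 1,  # assumed to override the default Logstash template (order 0)
    "mappings": {
        "slowquery": {
            "properties": {
                field: {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed",
                            "ignore_above": 1024,
                        }
                    },
                }
                for field in ("Pattern", "SQL")
            }
        }
    },
}

# Send this body with: PUT /_template/<TEMPLATE_NAME>
print("PUT /_template/" + TEMPLATE_NAME)
print(json.dumps(template, indent=2))
```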

pill663 · August 17, 2016, 10:08am

I'm using Logstash to send data to ES.
It seems that an index template has no effect on existing indices.
If I want to change the existing indices, what can I do?

pill663 · August 17, 2016, 10:10am

And when I tried to update the mapping:
curl -XPUT 'http://myesurl/logstash-2016.08.16/_mapping/slowquery' -d '
"properties": {
  "Pattern": {
    "type": "string",
    "norms": {
      "enabled": false
    },
    "fielddata": {
      "format": "disabled"
    },
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed",
        "ignore_above": 2560
      }
    }
  },
  "SQL": {
    "type": "string",
    "norms": {
      "enabled": false
    },
    "fielddata": {
      "format": "disabled"
    },
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed",
        "ignore_above": 2560
      }
    }
  }
}
'

It returned this error:
{"error":{"root_cause":[{"type":"not_x_content_exception","reason":"not_x_content_exception: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}],"type":"not_x_content_exception","reason":"not_x_content_exception: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"},"status":500}
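The `not_x_content_exception` most likely comes from the request body itself: it begins with `"properties":` instead of `{`, so it is not a JSON object and Elasticsearch cannot detect its content type. The sketch below uses a shortened copy of that body to show the difference with Python's `json` module; `is_json_object` is a hypothetical helper. A well-formed body still would not re-index documents that are already stored.

```python
import json

# Shortened form of the body sent above: a bare "properties" member, no braces.
sent = '"properties": {"Pattern": {"type": "string"}}'


def is_json_object(text):
    """Return True if text parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_json_object(sent))              # False
print(is_json_object("{" + sent + "}"))  # True
```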

Christian_Dahlqvist · August 17, 2016, 10:18am

You cannot change existing mappings, so you will need to reindex that data.
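Elasticsearch 2.3 added a Reindex API (marked experimental in that release) that copies documents from one index into another. A minimal sketch of the request body, built in Python; the destination name `logstash-2016.08.16-v2` is hypothetical and chosen so that it matches `logstash-*`, on the assumption that an updated template with the larger `ignore_above` is already in place when the destination index is created.

```python
import json


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex (experimental in Elasticsearch 2.3)."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Hypothetical destination name; it should pick up the new mapping on creation.
body = reindex_body("logstash-2016.08.16", "logstash-2016.08.16-v2")
print("POST /_reindex")
print(json.dumps(body))
```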

pill663 · August 17, 2016, 11:18am

The data is sent to ES in JSON format by Logstash.
If the mapping does not exist, ES creates it automatically from the structure of the JSON documents.
So is it possible to configure the value of ignore_above in the Logstash data, so that the mapping is created correctly from the very start?

pill663 · August 17, 2016, 11:20am

And a new index is created every day. Does the index template affect the slowquery mapping of all indices?

Christian_Dahlqvist · August 17, 2016, 11:42am

Any changes to the index template will only apply for new indices that are created. It will not affect existing ones, so the data in those may need to be reindexed.
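A rough model of that rule for daily Logstash indices: a template is consulted only when an index is created, so it affects an index only if the name matches the pattern and the index is created after the template was changed. Assumptions: `fnmatch` stands in for the `*` wildcard of the 2.x `"template"` pattern, each daily index is created at midnight of its day, and the template change time is hypothetical. Note that the index for the current day, already created that morning, is not affected either.

```python
from datetime import datetime
from fnmatch import fnmatch


def template_applies(index_name, created_at, pattern, template_changed_at):
    """Rough model: a template affects an index only if the name matches and
    the index is created after the template was added or changed."""
    return fnmatch(index_name, pattern) and created_at > template_changed_at


# Hypothetical time the template is changed.
template_changed_at = datetime(2016, 8, 17, 11, 42)

for day in (16, 17, 18):
    name = "logstash-2016.08.%02d" % day
    # Assumption: each daily index is created at midnight of its day.
    created_at = datetime(2016, 8, day)
    print(name, template_applies(name, created_at, "logstash-*", template_changed_at))
```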

Related topics

Topic	Category	Replies	Views	Activity
Too long values	Elasticsearch	5	3101	July 5, 2017
Term filter failed on very long fields	Elasticsearch	4	489	July 6, 2017
Cannot aggregate long string	Elasticsearch	2	1325	October 21, 2017
Terms Aggregation of long value	Elasticsearch	8	1720	October 12, 2017
Terms agg not giving all terms for huge text fields	Elasticsearch	3	706	April 27, 2018