Too-long values dropped from terms aggregation

Hi,
I'm using Elasticsearch 1.6.
While running an aggregation query I noticed a pretty large glitch in my results. After some investigation, I found that when the value being aggregated is too long (I'm not sure exactly how long; somewhere around 200-400 characters), the query just ignores it in the aggregation buckets but still counts it in the doc count.

Is there any fix or workaround for this issue?

Hi,
Can you post an example of your query along with the response you get and the response you would expect instead?

Hi,
I've recreated something similar on my local machine.
I started a new index and posted the following data to http://localhost:9200/test/testtype/2:

{"gtype":"test1", "content":"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttest" }

Then I tried this query:
GET /test/_search
{
  "size": 0,
  "aggs": {
    "agg1": {
      "terms": {
        "field": "content",
        "size": 3000,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

and got this result:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "agg1": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "sttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttest",
          "doc_count": 1
        },
        {
          "key": "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttes",
          "doc_count": 1
        },
        {
          "key": "ttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttestte",
          "doc_count": 1
        }
      ]
    }
  }
}

which doesn't make sense to me.

Hi Boris,

That's a funky bug! I can reproduce this with ES 1.6, 1.7 and also 2.3.1. I might be missing something obvious, but this looks so strange that I would ask you to open a GitHub issue for it with the above reproduction.

Ah well, no bug after all; I found the problem. The standard analyzer, which is used when you don't specify an explicit string mapping, is the culprit: if you look at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html you can see that its max_token_length parameter splits terms at 255 characters by default. So your long string is indexed as several tokens that each end up in their own bucket; the bucket keys in your response are just consecutive slices of the original string.
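If you want the whole value to land in a single bucket, here is a minimal sketch of the usual workaround on 1.x/2.x, assuming you can recreate the index (the "raw" sub-field name is just my own choice, not anything from your setup): add a not_analyzed multi-field and aggregate on that, so the value is stored as one untokenized term.

DELETE /test

PUT /test
{
  "mappings": {
    "testtype": {
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

After re-indexing the document, pointing the aggregation at the sub-field should return the full string as a single bucket:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "agg1": {
      "terms": {
        "field": "content.raw",
        "size": 3000
      }
    }
  }
}

This keeps content itself analyzed for full-text search while content.raw holds the exact value for aggregations.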