Strange terms aggregation result: a single document is placed in several buckets if the field contains special symbols


(Алексей Кононыхин) #1

Hello all

I'm trying to run a terms aggregation on a simple string field. In most cases everything works, but when the string contains symbols such as / or @, the document is placed into several buckets, each keyed by only part of the original string.

Here is an example.
Index structure:
curl -XPUT "http://localhost:9200/test/" -d'
{
  "mappings": {
    "url": {
      "properties": {
        "date": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "address": {
          "type": "string"
        }
      }
    }
  }
}'

Bulk insert:
curl -XPOST "http://localhost:9200/test/_bulk" -d'
{"index":{"_index":"test","_type":"url", "_id": "1"}}
{"date":"2016-10-01", "address":"79031112233"}
{"index":{"_index":"test","_type":"url", "_id": "2"}}
{"date":"2016-10-02", "address":"part1/part2@part3"}
'

Aggregation request:
curl -XPOST "http://localhost:9200/test/url/_search?pretty" -d'
{
  "size": 0,
  "aggregations": {
    "the_name": {
      "terms": {
        "field": "address"
      }
    }
  }
}'

I expected to receive two buckets in the response, with keys 79031112233 and part1/part2@part3, but received:
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "the_name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "79031112233",
        "doc_count" : 1
      }, {
        "key" : "part1",
        "doc_count" : 1
      }, {
        "key" : "part2",
        "doc_count" : 1
      }, {
        "key" : "part3",
        "doc_count" : 1
      } ]
    }
  }
}

Could anyone explain this result and show me how to get 'the correct' one with two buckets? I'm a newbie to Elasticsearch, so I have probably missed something obvious in the docs.

Thanks in advance
Alexey


(Jun Ohtani) #2

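Your address field is an "analyzed" field: its value is split into separate terms (part1, part2, part3) at index time, and the terms aggregation creates one bucket per term.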
See https://www.elastic.co/guide/en/elasticsearch/reference/2.4/string.html#string
and https://www.elastic.co/guide/en/elasticsearch/guide/2.x/aggregations-and-analysis.html
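
As a minimal sketch of what those pages describe, assuming Elasticsearch 2.x on localhost:9200 as in the question: the commands below show how the default analyzer splits the value, and recreate the index with the address field mapped as not_analyzed. The variable names (ES, MAPPING) and the helper functions (show_tokens, recreate_index) are made up for illustration and are not part of Elasticsearch. Note that recreate_index deletes the test index, because the mapping of an existing field cannot be changed in place.

```shell
# Assumption: Elasticsearch 2.x is reachable at this address, as in the question.
ES="http://localhost:9200"

# Same mapping as in the question, except that "address" is not analyzed,
# so the whole string is indexed as one term.
MAPPING='{
  "mappings": {
    "url": {
      "properties": {
        "date":    { "type": "date", "format": "dateOptionalTime" },
        "address": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Hypothetical helper: asks the _analyze API how the standard analyzer
# (the default for string fields) tokenizes the problematic value.
show_tokens() {
  curl -XGET "$ES/_analyze?pretty" -d '{"analyzer": "standard", "text": "part1/part2@part3"}'
}

# Hypothetical helper: drops the test index and recreates it with the new mapping.
# All documents in the index are lost and have to be indexed again.
recreate_index() {
  curl -XDELETE "$ES/test/"
  curl -XPUT "$ES/test/" -d "$MAPPING"
}
```

After running recreate_index, repeat the bulk insert and the aggregation request from the question; the response should then contain two buckets, keyed 79031112233 and part1/part2@part3. On Elasticsearch 5.x and later, the equivalent of a not_analyzed string is the keyword field type.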


(Алексей Кононыхин) #3

Thanks!

