Incorrect value_count on a string value

Hi,

I have 6 simple documents:

{"id":1,"sid":"adf6eb4f-35a0-4099-95d4-00ce3d984cf2","asid":"577ce6b0-b8b7-49af-8528-4e4797027a12","_tid":"21"}
{"id":1,"sid":"adf6eb4f-35a0-4099-95d4-00ce3d984cf2","asid":"577ce6b0-b8b7-49af-8528-4e4797027a12","_tid":"21"}
{"id":2,"sid":"abcdef","asid":"fedcba","_tid":"21"}
{"id":3,"sid":"ghijk","asid":"kjihg","_tid":"21"}
{"id":4,"sid":"lmnop","asid":"ponml","_tid":"21"}
{"id":5,"sid":"prstuv","asid":"vutsrp","_tid":"21"}

I am running a value_count aggregation on this dataset using:

curl -XGET 'http://poc02.transerainc.com:9200/test/csrs/_search' -d '{"size":0,"aggregations":{"SUMMARY_0_sid":{"value_count":{"field":"sid"}}}}'

I am expecting the result to be 6, but I am getting 14:
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":6,"max_score":0.0,"hits":[]},"aggregations":{"SUMMARY_0_sid":{"value":14}}}

This seems like an obvious error, but what am I missing here?

Thanks,
Ramesh.

Hi,
If you didn't define any mapping, the "sid" field will be mapped to an analyzed string field by default (see dynamic mapping for details). The default analyzer is the Standard Analyzer, which splits words on the - character. The aggregation then counts the indexed tokens rather than the original strings. Each of your first two documents has its "sid" indexed as five tokens, and each of the other four as one token, so 2 × 5 + 4 × 1 = 14 in total.
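To make the arithmetic concrete, here is a rough sketch in Python. The `approx_tokens` helper is hypothetical and only approximates the Standard Analyzer by lowercasing and splitting on anything that is not a letter or digit; it is not Elasticsearch's actual tokenizer, but it should behave the same way for these particular values.

```python
import re

# The six "sid" values from the documents in the question.
sids = [
    "adf6eb4f-35a0-4099-95d4-00ce3d984cf2",
    "adf6eb4f-35a0-4099-95d4-00ce3d984cf2",
    "abcdef",
    "ghijk",
    "lmnop",
    "prstuv",
]


def approx_tokens(value):
    """Hypothetical stand-in for the Standard Analyzer (an assumption for
    illustration): lowercase, then split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", value.lower())


# Number of tokens each document contributes for the "sid" field.
tokens_per_doc = [len(approx_tokens(s)) for s in sids]
total = sum(tokens_per_doc)

print(tokens_per_doc)  # [5, 5, 1, 1, 1, 1]
print(total)           # 14
```

To see the real tokens rather than this approximation, you can run one of the values through the _analyze API on your cluster. If you want one value per document, the usual approach in Elasticsearch versions of this era is to map "sid" as a string with "index": "not_analyzed" and reindex; check the mapping documentation for your version.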
