Elasticsearch aggregation script not maintaining document ID integrity


(Benjamin Smith) #1

I have an aggregation script that concatenates two field values. The field is
a tag, and I am returning tag.id and tag.name as a single "id|name" key.

Almost everything works as expected, and the output is in the correct format.
The issue is that the ID does not always match the name, and some IDs are
duplicated within the result set.

Here is a truncated version of the result aggregations:

{
  "buckets": [
    {
      "key": "type_a",
      "doc_count": 445,
      "tag": {
        "buckets": [
          {
            "key": "352|Tag A",
            "doc_count": 3
          },
          {
            "key": "352|Tag B",
            "doc_count": 2
          },
          {
            "key": "223|Tag C",
            "doc_count": 3
          },
          ...
        ]
      }
    }
  ]
}

Issues:

  • tag id 352 is duplicated
  • id for Tag B is not 352
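Both issues can be spotted mechanically by splitting each bucket key on its first "|" and grouping names under the id prefix. A minimal sketch in plain Python, assuming the bucket list has already been pulled out of the response (the sample buckets are the three shown above, with the "..." entries left out; the helper names are hypothetical):

```python
from collections import defaultdict

# The three buckets shown in the truncated response above.
buckets = [
    {"key": "352|Tag A", "doc_count": 3},
    {"key": "352|Tag B", "doc_count": 2},
    {"key": "223|Tag C", "doc_count": 3},
]

def names_by_id(buckets):
    """Group tag names under the id prefix of each "id|name" bucket key."""
    grouped = defaultdict(set)
    for bucket in buckets:
        tag_id, _, name = bucket["key"].partition("|")
        grouped[tag_id].add(name)
    return grouped

def conflicting_ids(buckets):
    """Return the ids that appear with more than one name, names sorted."""
    return {
        tag_id: sorted(names)
        for tag_id, names in names_by_id(buckets).items()
        if len(names) > 1
    }

print(conflicting_ids(buckets))  # {'352': ['Tag A', 'Tag B']}
```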

Any idea what is going wrong here? Here are the mapping and the search request:

Mapping:

"tag" : {
  "properties" : {
    "id" : {
      "type" : "string"
    },
    "name" : {
      "type" : "string",
      "fields" : {
        "raw" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}

The _search request (this is a simplified version):

{
  "index": "index_name",
  "type": [
    "article"
  ],
  "body": {
    "query": {
      "filtered": {
        "filter": [],
        "query": {
          "bool": {
            "must": [
              {
                "query_string": {
                  "fields": [
                    "title"
                  ],
                  "query": "test"
                }
              }
            ]
          }
        }
      }
    },
    "aggs": {
      "tag": {
        "terms": {
          "field": "type"
        },
        "aggs": {
          "tag": {
            "terms": {
              "size": 0,
              "order": {
                "_term": "asc"
              },
              "script": "doc['tag.id'].value+'|'+doc['tag.name.raw'].value"
            }
          }
        }
      }
    }
  }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f45146c-1def-4666-95d4-a34545b2d834%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(ElasticSearch Users mailing list) #2

Is it possible that you have a single document with tag.id = 352 and
tag.name = Tag B? And at the same time another document with tag.id = 352
and tag.name = Tag A? I'd query the data just to be sure.
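One way to run that check outside Elasticsearch is to scan the stored sources and collect, for every tag id, the set of names it is stored with across all documents. A minimal sketch, assuming the documents have already been fetched as plain dicts shaped like the one posted later in the thread ({"id": ..., "tags": [{"id": ..., "name": ...}]}); the `documents` list and helper names are hypothetical:

```python
from collections import defaultdict

def id_name_pairs(documents):
    """Map each tag id to the set of names it appears with, across all documents."""
    pairs = defaultdict(set)
    for doc in documents:
        for tag in doc.get("tags", []):
            pairs[tag["id"]].add(tag["name"])
    return pairs

def ids_with_several_names(documents):
    """Return only the ids stored with more than one name (empty if the data is consistent)."""
    return {
        tag_id: sorted(names)
        for tag_id, names in id_name_pairs(documents).items()
        if len(names) > 1
    }

# Hypothetical sample data: consistent, so no id maps to two names.
documents = [
    {"id": 45, "tags": [{"id": "352", "name": "Tag A"},
                        {"id": "355", "name": "Tag B"},
                        {"id": "458", "name": "Tag C"}]},
    {"id": 46, "tags": [{"id": "352", "name": "Tag A"}]},
]

print(ids_with_several_names(documents))  # {}
```

An empty result rules out the "same id, different names in different documents" explanation.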



(Benjamin Smith) #3

I've looked into this and that is not the case.

For example, using the following script, I can identify one of the
documents that produces the offending tag id/name combination:

doc['id'].value+'|'+doc['tag.id'].value+'|'+doc['tag.name.raw'].value

Which returns:

45|352|Tag B

When I look at that document, it has the following tag array:

{
  "id": 45,
  "tags": [
    {
      "id": "352",
      "name": "Tag A"
    },
    {
      "id": "355",
      "name": "Tag B"
    },
    {
      "id": "458",
      "name": "Tag C"
    }
  ]
}

The document is indexed correctly: Tag A has id 352. However, my aggregation
script returns "Tag B" with id 352.

Any other ideas about what could cause this? Is there an issue with my mapping?
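A minimal sketch of one way an id/name pairing can be lost, offered as an assumption rather than a confirmed diagnosis: Elasticsearch indexes an array of inner objects as flattened, independent multi-valued fields (one for the ids, one for the names), and `doc['field'].value` in a script reads the first of that field's values, which are kept in sorted order. The plain-Python model below does not call Elasticsearch; the helper names `doc_values` and `first_value` and the sample document are hypothetical, and the model ignores text analysis and the `.raw` sub-field.

```python
def doc_values(source, object_field, subfield):
    """Assumed model: each subfield of an object array becomes its own
    multi-valued field, and its values are stored sorted, independently
    of the other subfields."""
    return sorted(obj[subfield] for obj in source[object_field])

def first_value(source, object_field, subfield):
    """Assumed model of doc['field'].value: the first of the sorted values."""
    return doc_values(source, object_field, subfield)[0]

# Hypothetical document in which id 352 belongs to "Tag B", not "Tag A".
source = {
    "id": 99,
    "tags": [
        {"id": "355", "name": "Tag A"},
        {"id": "352", "name": "Tag B"},
    ],
}

# Same shape as the aggregation script: id + '|' + name, each read separately.
key = first_value(source, "tags", "id") + "|" + first_value(source, "tags", "name")

print(doc_values(source, "tags", "id"))    # ['352', '355']
print(doc_values(source, "tags", "name"))  # ['Tag A', 'Tag B']
print(key)                                 # 352|Tag A
```

Under this simplified model, document 45 above would yield 352|Tag A rather than the reported 352|Tag B, so the sketch illustrates how pairs can come apart when each field is read on its own, not the exact key seen in the thread.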


