Elasticsearch aggregation sort issue


(Akuzhan) #1

Hi there,

Under certain circumstances, Elasticsearch returns doc counts that make no sense when we sort a terms aggregation by a sub-aggregation. The issue does not occur on Elasticsearch 1.x and 2.x.

Basically, when I run the query below, it returns a doc count of 6 for the specific link I am looking for.

{
  "aggs": {
    "test": {
      "nested": {
        "path": "sharedLinks"
      },
      "aggs": {
        "aggssharedlinks": {
          "terms": {
            "field": "sharedLinks.sharedLink",
            "size": 1,
            "order": {
              "aggsfollowers>x": "desc"
            }
          },
          "aggs": {
            "aggsfollowers": {
              "reverse_nested": {},
              "aggs": {
                "x": {
                  "sum": {
                    "field": "authors.followers"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

But when I add a terms query for that specific URL, the same aggregation returns a doc count of 33.

{
  "aggs": {
    "test": {
      "nested": {
        "path": "sharedLinks"
      },
      "aggs": {
        "aggssharedlinks": {
          "terms": {
            "field": "sharedLinks.sharedLink",
            "size": 1,
            "order": {
              "aggsfollowers>x": "desc"
            }
          },
          "aggs": {
            "aggsfollowers": {
              "reverse_nested": {},
              "aggs": {
                "x": {
                  "sum": {
                    "field": "authors.followers"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "sharedLinks.sharedLink": [
                  "http://www.abc.co.uk/1234542151551"
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Here is my index mapping:

{
  "IndexName": {
    "mappings": {
      "article": {
        "properties": {
          "articleId": {
            "type": "long",
            "doc_values": true
          },
          "articleSource": {
            "properties": {
              "aggregationSourceName": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": true
              }
            }
          },
          "authors": {
            "properties": {
              "followers": {
                "type": "integer",
                "doc_values": true
              }
            }
          },
          "sharedLinks": {
            "type": "nested",
            "include_in_parent": true,
            "properties": {
              "sharedLink": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": true
              }
            }
          }
        }
      }
    }
  }
}

Do you have any idea what could be the issue?

I can share data and queries via email if it helps.

Cheers!


(Akuzhan) #2

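It looks like this is the expected result. My index had 10 shards, and when I asked for the top 1 term, the shards were not returning enough buckets. Each shard reports only its own top terms, so the merged doc count for a link includes only the shards where that link made it into the local top list.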
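To make the effect concrete, here is a minimal Python sketch of the per-shard top-N merge. It is a simplified model, not Elasticsearch's actual code: each tuple stands for one document with a single link and a followers value, and the shard contents, link names, and function names are all made up for illustration.

```python
from collections import defaultdict

def shard_top_terms(docs, shard_size):
    """Per-shard step: bucket by link, sum followers, keep the top `shard_size` buckets."""
    buckets = defaultdict(lambda: {"doc_count": 0, "followers": 0})
    for link, followers in docs:
        buckets[link]["doc_count"] += 1
        buckets[link]["followers"] += followers
    ranked = sorted(buckets.items(), key=lambda kv: kv[1]["followers"], reverse=True)
    return ranked[:shard_size]

def merge(shard_results, size):
    """Coordinating step: add up only the buckets the shards sent back, keep the top `size`."""
    merged = defaultdict(lambda: {"doc_count": 0, "followers": 0})
    for result in shard_results:
        for link, bucket in result:
            merged[link]["doc_count"] += bucket["doc_count"]
            merged[link]["followers"] += bucket["followers"]
    ranked = sorted(merged.items(), key=lambda kv: kv[1]["followers"], reverse=True)
    return ranked[:size]

# Hypothetical data: (link, followers) per document, split across three shards.
shards = [
    [("abc", 300), ("abc", 300), ("other", 50)],  # "abc" is the local winner
    [("abc", 10), ("other", 500)],                # "other" is the local winner
    [("abc", 20), ("abc", 30), ("misc", 400)],    # "misc" is the local winner
]

# Each shard returns only its top 1 bucket, so "abc" is counted on one shard only.
approx = merge([shard_top_terms(s, 1) for s in shards], 1)
print(approx[0])  # ('abc', {'doc_count': 2, 'followers': 600})

# Each shard returns all of its buckets, so the merged count is complete.
exact = merge([shard_top_terms(s, 3) for s in shards], 1)
print(exact[0])   # ('abc', {'doc_count': 5, 'followers': 660})
```

In this toy data "abc" appears in 5 documents overall, but with one bucket per shard the merged result reports only 2, because two of the three shards returned a different link as their local winner.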

I guess I am going to use custom routing and assign documents to shards based on client ID in my case.
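As a rough illustration of the routing idea, the sketch below builds request paths that carry the `routing` query parameter, which Elasticsearch accepts on index and search requests. The index name `articles`, the client ID `client-42`, and the helper `routed_path` are hypothetical. This only helps if every document that feeds a given aggregation shares the same routing value and the search is routed with that value too.

```python
from urllib.parse import quote, urlencode

def routed_path(index, endpoint, client_id):
    """Build a request path whose `routing` value is the client ID."""
    return f"/{quote(index)}/{endpoint}?{urlencode({'routing': client_id})}"

# Index one document of type "article" with the client's routing value (hypothetical names).
index_path = routed_path("articles", "article/1", "client-42")

# Search with the same routing value so only that client's shard is queried.
search_path = routed_path("articles", "_search", "client-42")

print(index_path)   # /articles/article/1?routing=client-42
print(search_path)  # /articles/_search?routing=client-42
```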

I just created an index with a single shard and ran the same query again. All documents are there and the aggregation doc count is correct, so it is all related to how Elasticsearch merges terms aggregation results across shards.
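Another option that may be worth trying before moving to a single shard is the `shard_size` parameter of the terms aggregation, which asks each shard for more buckets than the final `size`. The sketch below builds the same aggregation as a Python dict with `shard_size` added. The value 1000 and the function name are arbitrary choices for illustration, and a larger `shard_size` reduces the error at some extra cost without guaranteeing exact counts.

```python
import json

def top_links_query(size=1, shard_size=1000):
    """Same aggregation as in the first post, with `shard_size` added (value is a guess)."""
    return {
        "aggs": {
            "test": {
                "nested": {"path": "sharedLinks"},
                "aggs": {
                    "aggssharedlinks": {
                        "terms": {
                            "field": "sharedLinks.sharedLink",
                            "size": size,
                            # Each shard returns up to this many buckets before the merge.
                            "shard_size": shard_size,
                            "order": {"aggsfollowers>x": "desc"},
                        },
                        "aggs": {
                            "aggsfollowers": {
                                "reverse_nested": {},
                                "aggs": {
                                    "x": {"sum": {"field": "authors.followers"}}
                                },
                            }
                        },
                    }
                },
            }
        }
    }

# Serialize to the JSON body that would be sent to the _search endpoint.
body = json.dumps(top_links_query(), indent=2)
print(body)
```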

