Query top 5 from multiple indices


#1

The title says it all, really.

I need to write a query that searches multiple indices and returns the top 5 results for each index.
What can I use to do this?

Thanks


(Abdon Pijpelink) #2

There are two options: field collapsing and the top hits aggregation.

The downside of field collapsing is that it won't work on the _index metadata field. So you will need to add a field to your documents that indicates which index each document is in, and then collapse on that field.

For example, given these docs in three indexes a, b and c:

    POST _bulk
    { "index" : { "_index": "a", "_type": "doc", "_id" : "1"}}
    { "foo" : "bar a", "my_index": "a"}
    { "index" : { "_index": "a", "_type": "doc", "_id" : "2"}}
    { "foo" : "bar a", "my_index": "a"}
    { "index" : { "_index": "b", "_type": "doc", "_id" : "3"}}
    { "foo" : "bar b", "my_index": "b"}
    { "index" : { "_index": "b", "_type": "doc", "_id" : "4"}}
    { "foo" : "bar b", "my_index": "b"}
    { "index" : { "_index": "c", "_type": "doc", "_id" : "5"}}
    { "foo" : "bar c", "my_index": "c"}
    { "index" : { "_index": "c", "_type": "doc", "_id" : "6"}}
    { "foo" : "bar c", "my_index": "c"}

You could run a collapse on the my_index.keyword field:

    GET a,b,c/_search
    {
      "query": {
        "match": {
          "foo": "bar"
        }
      },
      "collapse": {
        "field": "my_index.keyword",
        "inner_hits": {
          "name": "my_top_5",
          "size": 5
        }
      }
    }
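
To pull the per-index results out of the collapse response, something like the following Python sketch might help. It assumes the search response has already been parsed into a dict (for example by whatever HTTP client you use), that each collapsed hit carries its collapse key under `fields`, and that the inner hits sit under the `my_top_5` name used above. The helper name `ids_per_group` and the trimmed `sample_response` are made up for illustration and are not real cluster output.

```python
def ids_per_group(response, collapse_field="my_index.keyword", inner_hits_name="my_top_5"):
    """Map each collapse key to the _id values of its inner hits."""
    grouped = {}
    for hit in response["hits"]["hits"]:
        # Each top-level hit represents one collapsed group.
        key = hit["fields"][collapse_field][0]
        inner = hit["inner_hits"][inner_hits_name]["hits"]["hits"]
        grouped[key] = [doc["_id"] for doc in inner]
    return grouped


# Hand-written, heavily trimmed response shape (an assumption, for illustration only).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "a",
                "_id": "1",
                "fields": {"my_index.keyword": ["a"]},
                "inner_hits": {
                    "my_top_5": {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}}
                },
            },
            {
                "_index": "b",
                "_id": "3",
                "fields": {"my_index.keyword": ["b"]},
                "inner_hits": {
                    "my_top_5": {"hits": {"hits": [{"_id": "3"}, {"_id": "4"}]}}
                },
            },
        ]
    }
}

print(ids_per_group(sample_response))  # {'a': ['1', '2'], 'b': ['3', '4']}
```

Note that the outer search `size` limits how many groups come back, while the `size` inside `inner_hits` limits the hits per group.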

Probably easier (because it doesn't require that additional my_index field) is the top hits aggregation. Given the docs above, the following aggregation request gives you what you're looking for:

    GET a,b,c/_search
    {
      "query": {
        "match": {
          "foo": "bar"
        }
      },
      "size": 0,
      "aggs": {
        "indices": {
          "terms": {
            "field": "_index"
          },
          "aggs": {
            "my_top_hits": {
              "top_hits": {
                "size": 5
              }
            }
          }
        }
      }
    }
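
The aggregation response nests the hits under each terms bucket. A minimal Python sketch for flattening it into one list per index could look like this. It assumes the response is already parsed into a dict and uses the aggregation names `indices` and `my_top_hits` from the request above. The helper name `ids_per_index` and the trimmed `sample_agg_response` are hypothetical and only illustrate the shape.

```python
def ids_per_index(response, terms_agg="indices", top_hits_agg="my_top_hits"):
    """Map each index name (terms bucket key) to the _id values of its top hits."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: [hit["_id"] for hit in bucket[top_hits_agg]["hits"]["hits"]]
        for bucket in buckets
    }


# Hand-written, heavily trimmed response shape (an assumption, for illustration only).
sample_agg_response = {
    "aggregations": {
        "indices": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 2,
                    "my_top_hits": {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
                },
                {
                    "key": "c",
                    "doc_count": 2,
                    "my_top_hits": {"hits": {"hits": [{"_id": "5"}, {"_id": "6"}]}},
                },
            ]
        }
    }
}

for index_name, ids in ids_per_index(sample_agg_response).items():
    print(index_name, ids)
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so if you search more than 10 indices you would need to raise its `size`.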

#3

I figured I would need to use a top hits aggregation, and I found something similar to your suggestion.
This approach seems to be working just fine!

Field collapsing seems interesting too, but as you said, it's probably better to use the top hits aggregation because of the _index field.

Thanks for the extensive reply.

