How to get only n documents for each value

For example, we have some documents:

{ "id": 1, "name": "name1", "state": "NY" }
{ "id": 2, "name": "name2", "state": "NY" }
{ "id": 3, "name": "name3", "state": "NY" }
{ "id": 4, "name": "name4", "state": "NY" }
{ "id": 5, "name": "name5", "state": "NY" }
....
n-records
{ "id": n + 1, "name": "namen+1", "state": "NJ" }
{ "id": n + 2, "name": "namen+2", "state": "NJ" }
{ "id": n + 3, "name": "namen+3", "state": "NJ" }
{ "id": n + 4, "name": "namen+4", "state": "NJ" }
{ "id": n + 5, "name": "namen+5", "state": "NJ" }
....
m-records

and so on.

I want to get only 5 documents for each state. (Bonus: pick the 5 documents at random from each state's records.)

How can I query for this?

cheers!

Check out the top hits aggregation.

PUT top_hits_example
POST top_hits_example/doc/1
{
  "name": "name1",
  "state": "NY"
}
POST top_hits_example/doc/2
{
  "name": "name2",
  "state": "NY"
}
POST top_hits_example/doc/3
{
  "name": "name3",
  "state": "NY"
}
POST top_hits_example/doc/4
{
  "name": "name4",
  "state": "NJ"
}
POST top_hits_example/doc/5
{
  "name": "name5",
  "state": "NJ"
}
POST top_hits_example/doc/6
{
  "name": "name6",
  "state": "NJ"
}
GET top_hits_example/_search?size=0
{
  "query": {
    "function_score": {
      "random_score": {}
    }
  },
  "aggs": {
    "names": {
      "terms": {
        "field": "state"
      },
      "aggs": {
        "top_tag_hits": {
          "top_hits": {
            "size": 2
          }
        }
      }
    }
  }
}
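In case it helps, here is a rough Python sketch of the same request, parameterised for 5 documents per state, plus a helper that reads the per-state documents back out of the response. The function names (`build_top_hits_query`, `docs_per_bucket`) and the aggregation names (`by_value`, `sample`) are made up for illustration; sending the body to the cluster is left to whatever client you use. It also assumes the `state` field can be aggregated on directly; if it is mapped as analyzed text you may need a keyword sub-field instead.

```python
def build_top_hits_query(field, per_bucket, max_buckets=10):
    """Build a search body: up to `per_bucket` randomly ordered docs per value of `field`."""
    return {
        "size": 0,  # only the aggregation results are needed, not top-level hits
        "query": {
            # random_score gives every document a random score, so the
            # highest-scoring hits in each bucket are effectively a random pick
            "function_score": {"random_score": {}}
        },
        "aggs": {
            "by_value": {
                # terms returns only 10 buckets by default; raise `size` for more states
                "terms": {"field": field, "size": max_buckets},
                "aggs": {"sample": {"top_hits": {"size": per_bucket}}},
            }
        },
    }


def docs_per_bucket(response):
    """Map each bucket key (e.g. a state) to the list of _source documents returned for it."""
    buckets = response["aggregations"]["by_value"]["buckets"]
    return {
        bucket["key"]: [hit["_source"] for hit in bucket["sample"]["hits"]["hits"]]
        for bucket in buckets
    }


# Example: body for 5 documents per state, allowing up to 50 distinct states
body = build_top_hits_query("state", per_bucket=5, max_buckets=50)
```

Because the ordering comes from `random_score`, running the search again will generally return a different set of documents per state.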

thanks a lot

I have another question.

I need to get only documents whose locations are more than X km apart from one another.

E.g. we take the first document (id#1); if the distance between the second document (id#2) and the first document (id#1) is greater than X km, we include it; if not, we check the third document (id#3).

{ "id": 1, "name": "name1", "state": "NY", "location": "lat, lon" }
{ "id": 2, "name": "name2", "state": "NY", "location": "lat, lon" }
{ "id": 3, "name": "name3", "state": "NY", "location": "lat, lon" }
{ "id": 4, "name": "name4", "state": "NY", "location": "lat, lon" }
{ "id": 5, "name": "name5", "state": "NY", "location": "lat, lon" }

How can I achieve this?

cheers!

Once you have your lat-lon from document #1, you can write a Geo Distance Range Query to find all documents at least Xkm away from document #1's location.
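As a rough illustration, the Python sketch below builds one possible request body for "everything farther than X km from this point". Note that it is not the Geo Distance Range Query itself: it swaps in a `bool` query that excludes a plain `geo_distance` match, which expresses the same idea. The helper name `farther_than_km` is hypothetical, and the field name `location` is assumed to be mapped as a `geo_point`.

```python
def farther_than_km(lat, lon, min_km, field="location"):
    """Build a search body matching documents farther than `min_km` from (lat, lon)."""
    return {
        "query": {
            "bool": {
                # keep only documents that actually have a location;
                # must_not on its own would also match documents without one
                "filter": {"exists": {"field": field}},
                # geo_distance matches documents within the radius,
                # so negating it leaves the documents outside the radius
                "must_not": {
                    "geo_distance": {
                        "distance": f"{min_km}km",
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


# Example: documents farther than 10 km from document #1's (assumed) coordinates
body = farther_than_km(40.7128, -74.0060, 10)
```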

Thanks for your reply, but that query is not suitable; you may have misunderstood.

I need to select the documents whose distance from each other is greater than some number, not just from the first document or a specific geo coordinate. The pairs could be #3 <=> #2, #4 <=> #5, #3 <=> #5, ....

cheers!

Sorry, I'm not familiar with any single ES query that would solve a proximity problem like that. The brute force approach of iterating through your set of points and running that Geo Distance Range Query is probably the most straightforward, but you may be able to implement something faster using computational geometry, like in the closest pair of points problem.
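To make the brute-force idea a bit more concrete, here is a minimal Python sketch. It is a variation on what is described above: instead of running one query per point, it assumes the documents have already been fetched and computes the distances on the client with the haversine formula. The function names and the `(lat, lon)` tuple format for `location` are assumptions for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a rough filter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def select_spread_out(docs, min_km):
    """Greedily keep each document that is more than `min_km` from every document kept so far."""
    kept = []
    for doc in docs:
        if all(haversine_km(doc["location"], k["location"]) > min_km for k in kept):
            kept.append(doc)
    return kept


docs = [
    {"id": 1, "location": (40.7128, -74.0060)},  # New York
    {"id": 2, "location": (40.7357, -74.1724)},  # Newark, roughly 14 km from #1
    {"id": 3, "location": (39.9526, -75.1652)},  # Philadelphia, roughly 130 km from #1
]
print([d["id"] for d in select_spread_out(docs, 50)])  # prints [1, 3]
```

This compares each candidate against every kept document, so it is quadratic in the worst case, and the result depends on the order of the input; it is not guaranteed to be the largest possible set of mutually distant documents.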

thanks a lot