Hi all,
I am a total newbie with ES, but I have a lot of experience with Solr and Lucene. I am currently playing with ES to see whether it has the same issues/limitations as Solr, mainly regarding field collapsing and sharding.
In Solr, if I want to group on a certain field (and get accurate counts), I need to route documents with the same value for that field to the same shard (i.e., all documents with value X for field F need to be on the same shard).
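To illustrate what I mean by routing, here is a minimal Python sketch. It assumes the shard is picked from a stable hash of the routing value modulo the number of shards. The function name shard_for, the CRC32 hash, and the sample documents are made up for illustration; this is not the actual hash function of either Solr or ES.

```python
import zlib


def shard_for(routing_value: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the routing value (illustrative only)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical documents: two share value "X" for field F.
docs = [
    {"id": 1, "F": "X"},
    {"id": 2, "F": "Y"},
    {"id": 3, "F": "X"},
]

NUM_SHARDS = 4  # assumed shard count

# Route on the value of F, so equal values always map to the same shard.
placement = {doc["id"]: shard_for(doc["F"], NUM_SHARDS) for doc in docs}
```

With this kind of routing, documents 1 and 3 end up on the same shard because they share the value X, so a per-shard group on F sees all of them.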
Is that true for ES as well? I tried following the example from the docs that combines the top_hits aggregator with a sub-aggregator. So far my request looks like this (basically, get a list of employers sorted by the number of places/cities):
GET _search
{
  "size": 0,
  "query": {
    "match": {
      "description": "java"
    }
  },
  "aggs": {
    "top_employers": {
      "terms": {
        "field": "employer_id",
        "order": {
          "empl-place": "desc"
        },
        "size": 1
      },
      "aggs": {
        "top_employer_hits": {
          "top_hits": {
            "_source": [
              "place_id",
              "content_id"
            ],
            "size": 10
          }
        },
        "empl-place": {
          "value_count": {
            "field": "place_id"
          }
        }
      }
    }
  }
}
The result I am getting is the following (I removed some stuff for clarity):
{
  ...
  "aggregations": {
    "top_employers": {
      ...
      "buckets": [
        {
          ...
          "top_employer_hits": {
            "hits": {
              "total": 4,
              "max_score": 0.9461352,
              "hits": [
                {
                  ...
                  "_source": {
                    "content_id": "768474767",
                    "place_id": 485285
                  }
                },
                {
                  ...
                  "_source": {
                    "content_id": "768474767",
                    "place_id": 485285
                  }
                },
                {
                  ...
                  "_source": {
                    "content_id": "763490271",
                    "place_id": 0
                  }
                },
                {
                  ...
                  "_source": {
                    "content_id": "768473591",
                    "place_id": 485285
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
As you can see, I am receiving multiple hits with the same value for the place_id field. Could that be caused by the data not being distributed correctly?
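To make the duplication concrete, here is a small Python tally of the place_id values from the four hits above. It is only client-side counting over the _source values shown, not an ES call, and the variable names are made up for illustration.

```python
from collections import Counter

# The _source values of the four hits in the response above.
hits = [
    {"content_id": "768474767", "place_id": 485285},
    {"content_id": "768474767", "place_id": 485285},
    {"content_id": "763490271", "place_id": 0},
    {"content_id": "768473591", "place_id": 485285},
]

# How often each place_id occurs among the hits.
place_counts = Counter(hit["place_id"] for hit in hits)

total_values = sum(place_counts.values())  # every place_id value, duplicates included
distinct_places = len(place_counts)        # unique place_id values only
```

For this bucket that gives 4 values in total but only 2 distinct places (485285 three times, 0 once), which is the gap I am asking about.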
Thanks