Really bad post_filter performance


(daveey) #1

I just upgraded to ES 1.0.1 from ES 0.9.2 and am seeing huge performance
problems.

I traced them to what I think is the post_filter.

Here is the query that we used to run against ES 0.9.2

{
  "filter": {
    "and": [
      {
        "terms": {
          "index_ids": [
            2134616789944
          ]
        }
      },
      {
        "or": [
          {
            "term": {
              "trashed_at": 0
            }
          },
          {
            "not": {
              "exists": {
                "field": "trashed_at"
              }
            }
          }
        ]
      }
    ]
  }
}

This used to take the 0.9 cluster about 150 ms to execute.

The same query takes about 2.5 s on the 1.0 cluster.

I rewrote it to match my understanding of the changes in 1.0, using a
filtered query, but that didn't help.
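
For illustration, a rewrite of this kind might look like the sketch below. It
assumes the original "and" filter is moved unchanged under a "filtered" query
with "match_all"; it is a reconstruction for reference, not necessarily the
exact query that was run.

```json
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          { "terms": { "index_ids": [2134616789944] } },
          {
            "or": [
              { "term": { "trashed_at": 0 } },
              { "not": { "exists": { "field": "trashed_at" } } }
            ]
          }
        ]
      }
    }
  }
}
```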

I then tried to figure out which part was slow. I now have the following
query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "index_ids": [
            2134616789944
          ]
        }
      }
    }
  },
  "post_filter": {
    "or": [
      {"term": {"trashed_at": 0}},
      {"not": {"exists": {"field": "trashed_at"}}}
    ]
  }
}

It takes 2.5 s and returns 34 hits. However, with the "post_filter"
clause removed:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "index_ids": [
            2134616789944
          ]
        }
      }
    }
  }
}

it takes 50 ms and returns 34 results.

My conclusion is that it's taking 2.5 seconds to filter 34 results, and
that's confusing.

The cluster uses 3 machines, 50 shards, and 2 replicas per shard. This means
that each machine has a full copy of the index. We use the ?routing=
parameter, so each query always hits a single shard.

Help?



(system) #2