Re: really bad post_filter performance


(spancer ray) #1

Having too many shards may hurt query performance.

daveey daveey@gmail.com wrote:

I just upgraded to ES 1.0.1 from ES 0.9.2 and am seeing huge performance problems.

I traced them to what I think is the post_filter.

Here is the query that we used to run against ES 0.9.2:

{
  "filter": {
    "and": [
      {
        "terms": {
          "index_ids": [2134616789944]
        }
      },
      {
        "or": [
          {
            "term": {
              "trashed_at": 0
            }
          },
          {
            "not": {
              "exists": {
                "field": "trashed_at"
              }
            }
          }
        ]
      }
    ]
  }
}

This used to take the 0.9 cluster about 150ms to execute.

The same query takes about 2.5s for the 1.0 cluster.

I rewrote it to conform to my understanding of the changes in 1.0, using a filtered query, but that didn't help.

I then tried to figure out which parts were slow. I now have the following query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "index_ids": [2134616789944]
        }
      }
    }
  },
  "post_filter": {
    "or": [
      {"term": {"trashed_at": 0}},
      {"not": {"exists": {"field": "trashed_at"}}}
    ]
  }
}

It takes 2.5 s and returns 34 hits. However, removing the "post_filter" clause:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "index_ids": [2134616789944]
        }
      }
    }
  }
}

Makes it take 50ms and return 34 results.

My conclusion is that it's taking 2.5 seconds to filter 34 results, and that's confusing.
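For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch that runs each query variant a few times and compares the "took" value (milliseconds) from the search response. The `search` callable and the `fake_search` stand-in are hypothetical. In practice `search` would wrap whatever client is used to send the body to the cluster. The numbers returned by `fake_search` are placeholders chosen to echo the figures above, not measurements.

```python
from statistics import median
from typing import Any, Callable, Dict

Body = Dict[str, Any]


def compare_took(search: Callable[[Body], Dict[str, Any]],
                 bodies: Dict[str, Body],
                 runs: int = 5) -> Dict[str, float]:
    """Return the median "took" (ms) reported for each named query body."""
    results = {}
    for name, body in bodies.items():
        # Repeat each query so a single cold run does not dominate.
        results[name] = median(search(body)["took"] for _ in range(runs))
    return results


def fake_search(body: Body) -> Dict[str, Any]:
    # Hypothetical stand-in for a real search call; placeholder timings only.
    return {"took": 2500 if "post_filter" in body else 50}


terms_filter = {"terms": {"index_ids": [2134616789944]}}
base = {
    "query": {
        "filtered": {"query": {"match_all": {}}, "filter": terms_filter}
    }
}
with_post_filter = dict(base, post_filter={
    "or": [
        {"term": {"trashed_at": 0}},
        {"not": {"exists": {"field": "trashed_at"}}},
    ]
})

timings = compare_took(
    fake_search,
    {"terms_only": base, "with_post_filter": with_post_filter},
)
for name, took in timings.items():
    print(name, took, "ms")
```

Using the median of several runs is only one reasonable choice. The point is to change one clause at a time and compare like with like.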

The cluster uses 3 machines, 50 shards, 2 replicas per shard. This means that each machine has the entire copy of the index. We use the ?routing= parameter, and are always hitting a single shard for the query.
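The shard arithmetic behind that statement can be written out as a small Python sketch. It assumes all shard copies are allocated and spread evenly. The variable names are only illustrative.

```python
nodes = 3
primary_shards = 50
replicas_per_shard = 2

# Each shard exists as one primary plus its replicas.
copies_per_shard = 1 + replicas_per_shard
total_shard_copies = primary_shards * copies_per_shard
shard_copies_per_node = total_shard_copies // nodes

# Elasticsearch does not put two copies of the same shard on one node,
# so with as many copies as nodes, every node holds one copy of each shard.
every_node_has_full_index = copies_per_shard == nodes

print(total_shard_copies, shard_copies_per_node, every_node_has_full_index)
```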

Help?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cd5b6bb1-7fce-4688-84cb-4ec6d0db8f93%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

I'd probably just collapse everything into a filtered query. Something like this:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "index_ids": ["2134616789944"]
              }
            }
          ],
          "should": [
            {
              "term": {
                "trashed_at": 0
              }
            },
            {
              "not": {
                "exists": {
                  "field": "trashed_at"
                }
              }
            }
          ]
        }
      }
    }
  }
}
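If the query is built in application code, the same shape can be generated rather than hand-written. Below is a minimal Python sketch. The function name `build_filtered_query` and its parameters are hypothetical, and the `trashed_at` clause is written as a `term` filter with the single value 0, as in the original question. It only builds the request body and makes no claim about how fast the result runs.

```python
import json
from typing import Any, Dict, Iterable


def build_filtered_query(index_ids: Iterable[int],
                         trashed_field: str = "trashed_at") -> Dict[str, Any]:
    """Build one filtered query with a bool filter and no post_filter."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        # Restrict to the requested index_ids.
                        "must": [
                            {"terms": {"index_ids": list(index_ids)}}
                        ],
                        # Not trashed: the field is 0 or missing.
                        "should": [
                            {"term": {trashed_field: 0}},
                            {"not": {"exists": {"field": trashed_field}}},
                        ],
                    }
                }
            }
        }
    }


body = build_filtered_query([2134616789944])
print(json.dumps(body, indent=2))
```

One thing to keep in mind: a post_filter is applied after facets or aggregations are computed, while a filter inside the query also narrows what they see. For a request without them, as here, that difference should not matter.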

