Poor performance of scripted bitwise operations in facet filters


(Shane Witbeck) #1

I have a requirement where several facets need to do bitwise operations
similar to the following:

"opened_today_dev": {
    "query": {
        "query_string": {
            "query": "categoryID:1 AND dateCreated:[2012-04-12 TO *]"
        }
    },
    "facet_filter": {
        "script": {
            "script": "(doc['owner.userAccessTypeBit'].intValue & 8) > 0"
        }
    }
}

I've found that the facet_filter adds a huge performance hit to the overall
query, which has several other facets. This is unacceptable since I'm
running the query every five seconds or so.

  1. Is there a more efficient way of doing bitwise operations?
  2. Would splitting the bitmask above (owner.userAccessTypeBit) into
    individual bits and using a terms filter be better?

Thanks,
Shane


(Dan Everton) #2

Script facets are going to be slow no matter what you do. You could make it
more efficient, but it's still going to execute that script for each
matching document.

Splitting the bitmask and using a term filter will be much, much faster, as
you're using what ES is good at (search) rather than script execution.
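A minimal sketch of the split Dan describes, done on the client before indexing. It assumes a hypothetical new multi-valued field, `owner.userAccessBits`, that stores one value per set bit alongside the original `owner.userAccessTypeBit` mask; the field name and document shape are illustrative, not taken from the thread.

```python
import json
from typing import List


def split_bits(mask: int) -> List[int]:
    """Return the power-of-two values set in mask, e.g. 13 -> [1, 4, 8]."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    bits = []
    value = 1
    while value <= mask:
        if mask & value:
            bits.append(value)
        value <<= 1  # move on to the next power of two
    return bits


# Hypothetical source document: keep the original mask and add a
# multi-valued field (assumed name) holding one entry per set bit.
doc = {"owner": {"userAccessTypeBit": 13}}
doc["owner"]["userAccessBits"] = split_bits(doc["owner"]["userAccessTypeBit"])

print(json.dumps(doc))
```

With the bits stored as ordinary terms, "is bit 8 set?" becomes an exact-match lookup on the value 8, which the index can answer without running a script for every matching document.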

Cheers,
Dan


(Shay Banon) #3

Dan's point is good. I would add that you need to make sure you are not
counting the cost of loading the field into memory when you do the perf
tests. Also, it would be interesting to know how many hits (total_hits)
that filter executed on. Effectively, this is a CPU-intensive execution, and
if it's slow, one can always write a native Java implementation of that
script, though a terms/term filter will probably be better since it can be
effectively cached.

(Replying to Dan Everton, Fri, Apr 13, 2012 at 2:27 AM)


(Shane Witbeck) #4

Thanks again. I decided to go with splitting the bitmask and using a terms
filter. Performance is much better, although doing a bitwise operation in a
query_string query would have been ideal. I believe this is more of a
Lucene limitation, though.
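With the bits split into individual terms, the script filter from the first post can be replaced by an ordinary `terms` filter, which matches documents that have any of the listed values. The sketch below builds that facet as a Python dict, plus a variant that puts the bit test straight into the `query_string`, which gets close to the query-string form Shane wanted. It assumes the hypothetical `owner.userAccessBits` field from the split, mapped so that `owner.userAccessBits:8` matches the exact value 8.

```python
import json
from typing import List

BASE_QUERY = "categoryID:1 AND dateCreated:[2012-04-12 TO *]"


def opened_today_facet(bits: List[int]) -> dict:
    """Facet from the first post with the script filter swapped for a terms filter."""
    return {
        "opened_today_dev": {
            "query": {"query_string": {"query": BASE_QUERY}},
            # Matches documents whose split field contains ANY of the given bits.
            "facet_filter": {"terms": {"owner.userAccessBits": bits}},
        }
    }


def opened_today_query_string(bit: int) -> dict:
    """Variant that expresses a single-bit check inside the query string itself."""
    query = f"{BASE_QUERY} AND owner.userAccessBits:{bit}"
    return {"opened_today_dev": {"query": {"query_string": {"query": query}}}}


print(json.dumps(opened_today_facet([8]), indent=2))
print(json.dumps(opened_today_query_string(8), indent=2))
```

The first form keeps the bit test in a filter, which is the part Shay notes can be cached; the second folds it into the facet's own query, which should count the same documents under the mapping assumption above.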

(Replying to Shay Banon (kimchy), Friday, April 13, 2012 8:41:29 AM UTC-4)


(Shay Banon) #5

Great. Just a note: it's not really a Lucene limitation. Scripting is
implemented at the elasticsearch level, and it will be CPU bound as it
needs to compute the script for each matching document.

(Replying to Shane Witbeck, Fri, Apr 13, 2012 at 4:02 PM)


(Shane Witbeck) #6

I wasn't clear about what I meant by the Lucene limitation. What I meant
was support for bitwise operations.

Is this something I should add as a feature request for ES, or would it be
better suited as a Lucene feature request?

(Replying to Shay Banon (kimchy), Saturday, April 14, 2012 1:27:01 PM UTC-4)


(Shay Banon) #7

Bitwise support could be implemented as a built-in filter, but effectively it
would do something similar to what you did in the script, just in Java (which
you can also do yourself by writing a native Java script module).
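Conceptually, whether it runs as a script, a native Java script, or a built-in filter, a bitwise check is one test per document examined. The Python sketch below is an illustration, not Elasticsearch's native script API: it mirrors the original `(doc['owner.userAccessTypeBit'].intValue & 8) > 0` over a hypothetical list of loaded field values, with list positions standing in for document ids. The work grows linearly with the number of documents checked, which is why it stays CPU bound however it is implemented.

```python
from typing import List


def bitwise_filter(field_values: List[int], bit_value: int) -> List[int]:
    """Return the ids (list positions) of documents whose mask has bit_value set.

    Mirrors the original script, (mask & bit_value) > 0, evaluated once per document.
    """
    return [doc_id for doc_id, mask in enumerate(field_values) if (mask & bit_value) > 0]


# Hypothetical in-memory values of owner.userAccessTypeBit for documents 0..4.
values = [0, 8, 9, 12, 7]
print(bitwise_filter(values, 8))
```

A native implementation mainly removes interpreter overhead per document; unlike a term filter on split bits, it still has to visit every candidate document on each request.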

(Replying to Shane Witbeck, Mon, Apr 16, 2012 at 6:09 PM)

