Problems with Post filter in 0.90.2; script filter and related question


(Pramod N) #1

Hello,

I've been working on a problem where modelling the search filters as ES
filters has become almost impossible.
For example, the incoming filter is generated by an application that enforces
certain constraints on search. The filter itself is hierarchical and dynamic,
depending on certain parameters. Think of it as a directory structure (the
real case is completely different and this is an oversimplification, but it
comes very close in terms of representation):

aFilter = {
    "root": {
        "usr": {
            "elasticsearch": "present",
            "es-index-notification": "absent"
        },
        "system": {
            "elasticsearch": "present",
            "es-index-notification": "present",
            "attachment": "present"
        }
    }
}

The filter predicate essentially involves traversing this structure breadth
first (the structure is neither big nor growing, so space and time complexity
would be reasonably constant) and matching it against information in the
indexed document (one field in particular).
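
To make the traversal concrete, here is a minimal Python sketch of the predicate. It is only an illustration under assumptions: the helper names (flatten_breadth_first, matches) are made up, and it assumes the document field holds a set of slash-separated paths, which may not be how the real field looks.

```python
from collections import deque


def flatten_breadth_first(tree):
    """Walk a nested dict breadth first and return (path, value) pairs for its leaves."""
    leaves = []
    queue = deque([((), tree)])
    while queue:
        path, node = queue.popleft()
        for key, child in node.items():
            child_path = path + (key,)
            if isinstance(child, dict):
                # Inner node: visit it after everything already queued.
                queue.append((child_path, child))
            else:
                leaves.append(("/".join(child_path), child))
    return leaves


def matches(tree, doc_paths):
    """Hypothetical predicate: 'present' paths must be in doc_paths, 'absent' paths must not."""
    for path, expected in flatten_breadth_first(tree):
        if (path in doc_paths) != (expected == "present"):
            return False
    return True


a_filter = {
    "root": {
        "usr": {
            "elasticsearch": "present",
            "es-index-notification": "absent",
        },
        "system": {
            "elasticsearch": "present",
            "es-index-notification": "present",
            "attachment": "present",
        },
    }
}

# Assumed shape of the single indexed field: the set of paths the document has.
doc_paths = {
    "root/usr/elasticsearch",
    "root/system/elasticsearch",
    "root/system/es-index-notification",
    "root/system/attachment",
}

leaves = flatten_breadth_first(a_filter)
print(leaves)
print(matches(a_filter, doc_paths))
```

Flattening the structure into path strings this way is also what a script would have to do for every candidate document, which is where the performance concern comes from.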

I was thinking of using a script filter for the traversal. The other concern
is performance if this filter is applied along with the search (large document
set).

This is where the post_filter made sense, as a way to reduce the
performance impact.

My question is twofold:

  1. The documentation claims post_filter is available as part of 0.90.x, but
    it is only available in 0.90.8 and later (I was able to verify it with that
    version); am I missing something?
  2. Is there any place where I can find info on the execution order of
    pre_filter, query, post_filter, pagination, sort, etc.?

Finally, the problem statement in general: is there a non-script way of
achieving this in Elasticsearch?

Apologies for the long email; any insight on this is appreciated.

Thank you,

Pramod N
@machinelearner https://twitter.com/machinelearner

--

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGEoYnu7SiMPJUKLC0zABy10kpuUmEZUCZV3SCgfAwUUkUOZvA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

The post filter is not an addition in 0.90.8, just a renaming of a field
that was ambiguous.

Pre filters are simply filtered queries. In most cases, you want to use
pre filters. Queries are expensive in Lucene since you have to score each
document. Filters, on the other hand, are cheap since they are bitsets. If
you filter out documents beforehand, there is less to score. You only
want to use post filters for expensive filters (geo, or filters that
cannot/shouldn't be cached) or when you want to calculate
facets/aggregations on the unfiltered documents. The post filter page has some
more details:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-post-filter.html
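
To make the distinction concrete, here is a rough sketch of the two request bodies, built as Python dicts. The field names (title, tags, depth), the script and the helper functions are made up for illustration. It also assumes, as far as I recall, that before 0.90.8 the same top-level element was simply called filter.

```python
import json


def filtered_query(query, pre_filter):
    """Pre filter: wrap the query in a filtered query so only filtered documents are scored."""
    return {"query": {"filtered": {"query": query, "filter": pre_filter}}}


def post_filtered_search(query, post_filter, legacy_name=False):
    """Post filter: a top-level filter applied to the hits after the query has run.

    legacy_name=True uses the top-level key "filter", assumed here to be the
    pre-0.90.8 name of the same element.
    """
    key = "filter" if legacy_name else "post_filter"
    return {"query": query, key: post_filter}


# Hypothetical query and filters, not taken from the original question.
query = {"match": {"title": "elasticsearch"}}
cheap_filter = {"term": {"tags": "search"}}
expensive_filter = {
    "script": {
        "script": "doc['depth'].value > min_depth",
        "params": {"min_depth": 2},
    }
}

pre_body = filtered_query(query, cheap_filter)
post_body = post_filtered_search(query, expensive_filter)
legacy_body = post_filtered_search(query, expensive_filter, legacy_name=True)

print(json.dumps(pre_body, indent=2))
print(json.dumps(post_body, indent=2))
```

Either body would be sent to the _search endpoint of an index. The cheap term filter goes inside the filtered query, and only the expensive script filter is left for the post filter.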

As far as your use case goes, some more details would help: not just a
sample document, but also an example query and a result. I have never used
script filters, but in general, hierarchical data is difficult to manipulate.

Cheers,

Ivan

On Tue, Jul 22, 2014 at 2:35 AM, Pramod N npramod05@gmail.com wrote:



