Question on query combination


(Scott Decker) #1

Hey all, here's a fun query combination for you.

Here is what we are trying to do:

Pseudo-query:

(
content:"<some string that the query parser on the ES side will parse>" -
must match at least 2 times
size:[* TO 300]
)
OR
(
content:"<some string that the query parser on the ES side will parse>" -
must match at least 3 times
size:[* TO 600]
)

Something like:
(
+content:"johnny depp" <must be in the content at least 2 times>
+size:[* TO 300]
)
OR
(
+content:"johnny depp" <must be in the content at least 3 times>
+size:[* TO 600]
)
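The OR in the pseudo-query boils down to a simple rule over two numbers: how often the phrase occurs and how large the content is. A minimal sketch in Python, assuming the occurrence count has already been computed somewhere (for example on the client) and using the 300/600 sizes and 2/3 counts from the pseudo-query; `THRESHOLDS` and `is_match` are illustrative names, not anything from Elasticsearch:

```python
# Illustrative thresholds taken from the pseudo-query:
# (maximum size, minimum number of occurrences).
THRESHOLDS = [(300, 2), (600, 3)]


def is_match(occurrences: int, size: int) -> bool:
    """Return True if any (size limit, occurrence minimum) pair is satisfied.

    Mirrors the OR of the two bracketed groups: small content needs fewer
    mentions, larger content (up to 600) needs more. Ranges are inclusive,
    like [* TO 300].
    """
    return any(size <= max_size and occurrences >= min_occ
               for max_size, min_occ in THRESHOLDS)


print(is_match(occurrences=2, size=250))  # True: first group matches
print(is_match(occurrences=2, size=450))  # False: too large for 2 mentions
print(is_match(occurrences=3, size=450))  # True: second group matches
```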

How could we express the above query? Is there a way to do it with
dismax? Is there some way to do it with terms, and if so, how do we make
sure the analyzer we run on the text in our client matches the one
Elasticsearch uses?
We are basically trying to say that we consider it a match for Johnny Depp
if he appears in the text at least twice and the content size is small.
The problem with just using scoring is that we also match on some other
fields in a constant way: if it is present in one of those fields, then the
document is also a match.
So, how do we combine that constant query with something that is more of
a scored query?
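On the analyzer question: whatever counts occurrences on the client has to tokenize the text the same way Elasticsearch does, or the counts will disagree. The sketch below uses a hypothetical `simple_analyze` (lowercase, split on anything that is not a letter or digit) as a rough stand-in for a standard-style analyzer, and counts a phrase as a run of consecutive tokens. It is an approximation under that assumption, not the analyzer actually configured on the index.

```python
import re


def simple_analyze(text: str) -> list[str]:
    """Rough stand-in for a standard-style analyzer (an assumption):
    lowercase the text and split on anything that is not a-z or 0-9."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def count_phrase(tokens: list[str], phrase: list[str]) -> int:
    """Count occurrences of `phrase` as consecutive tokens in `tokens`."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == phrase)


text = "Johnny Depp returns. Depp, as always, plays... johnny depp!"
tokens = simple_analyze(text)
print(count_phrase(tokens, simple_analyze("johnny depp")))  # 2
```

To get the exact tokens Elasticsearch produces for the field, the client could send the text to the `_analyze` API instead of relying on a local approximation like this one.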


(Shay Banon) #2

You mean how to combine the content query with the size one? That can be a
filtered query, with the query on the content and a range filter on the
size.
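Expressed as a request body, the suggestion might look like the following minimal sketch in Python. It builds a `filtered` query with a `query_string` query on `content` (so the string goes through the Elasticsearch-side query parser, as in the original question) and a `range` filter on `size`. The helper name is hypothetical, and the exact DSL should be checked against the Elasticsearch version in use.

```python
import json


def filtered_content_query(query_text: str, max_size: int) -> dict:
    """Hypothetical helper: query_string on `content`, range filter on `size`."""
    return {
        "query": {
            "filtered": {
                # Scored part: parsed by Elasticsearch's query parser.
                "query": {
                    "query_string": {
                        "default_field": "content",
                        "query": query_text,
                    }
                },
                # Unscored part: only documents with size <= max_size pass.
                "filter": {"range": {"size": {"lte": max_size}}},
            }
        }
    }


body = filtered_content_query('"johnny depp"', 300)
print(json.dumps(body, indent=2))
```

This covers the size condition only; nothing in this structure expresses "at least N times". The `filtered` query was later deprecated and removed in Elasticsearch 5.0, where a `bool` query with a `filter` clause plays the same role.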

On Wed, May 16, 2012 at 10:03 PM, Scott Decker scott@publishthis.com wrote:



(Scott Decker) #3

Well, yes, sort of.

Basically, the problem is the "must match at least N times" part.

The full query is something like:
+(
tag:topic_id.89998.0
)
+(
content:"johnny depp" <this must be in the content 3 times or more if the
content is greater than say 500 characters>
)
+(
content:"pirates" <this must be in the content 2 times or more if the
content is greater than say 300 characters>
)

I'm not really sure how to do this type of query. It might be span queries,
or maybe a filter. Could a dismax query do this?
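The thread does not settle how to enforce the per-phrase counts. One workaround, not proposed in the thread, is to let Elasticsearch find candidates with every clause required and then apply the count rules on the client. A rough sketch in Python under those assumptions: `RULES`, `candidate_query` and `passes_counts` are hypothetical names, `tag` is assumed to be a not-analyzed field (so a `term` query matches the exact value), and counting is a plain case-insensitive substring count rather than real analysis.

```python
# Hypothetical per-phrase rules from the post: (phrase, length threshold in
# characters, occurrences required when the content is longer than that).
# Assumption: at or below the threshold one occurrence suffices, since each
# content clause is still a required (+) clause.
RULES = [("johnny depp", 500, 3), ("pirates", 300, 2)]


def candidate_query(tag: str) -> dict:
    """All clauses required; `tag` is assumed to be a not-analyzed field."""
    must = [{"term": {"tag": tag}}]
    for phrase, _, _ in RULES:
        must.append({"query_string": {"default_field": "content",
                                      "query": f'"{phrase}"'}})
    return {"query": {"bool": {"must": must}}}


def passes_counts(content: str) -> bool:
    """Client-side post-filter for the minimum-occurrence rules.

    Uses a case-insensitive substring count, which ignores analysis
    (stemming, token boundaries) and is only a rough approximation.
    """
    lowered = content.lower()
    for phrase, threshold, required in RULES:
        needed = required if len(content) > threshold else 1
        if lowered.count(phrase) < needed:
            return False
    return True


query = candidate_query("topic_id.89998.0")
print(passes_counts("Johnny Depp is back with the pirates."))  # True: short text
```

Because the filtering happens after the search, hits dropped by `passes_counts` still count toward the page size, so a client would need to fetch more results than it finally shows.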

On Thursday, May 17, 2012 3:21:43 PM UTC-7, kimchy wrote:


