Match on arrays

Hello, I have a multi-select field that is stored as an array. Say I
have a field
cities = ["NY","SF","CH","AL","LA"]

I want to find entries which contain both "NY" and "AL", so I perform a
textQuery on the field cities with the query "NY AND AL". However, this
returns any entry which contains NY or AL; e.g. ["NY","SF"] and
["CH","AL"] would both be returned.

I would like to return only those entries which contain both NY and
AL. Any entry which contains only one of these values should not be
returned. I solved this by constructing a query string which contains
["NY","AL"] and doing a phrase_prefix query. This returns only entries
which contain e.g. ["NY","AL"], or even an entry which contains
["NY","AL","LA"] in cities.

However, it cannot return the entry ["NY","SF","AL"] or
["NY","SF","CH","AL","LA"], because SF and CH sit between NY and AL.
Do I then have to construct a permutation of all possible entries
between NY and AL, e.g. a phrase_prefix of ["NY","SF","AL"] and
["NY","SF","CH","AL"], to return what I need? Is what I'm trying to do
correct, or is there an easier and more correct way of doing this?

Also, at a later point I would like to support matching any 2 out of 3
cities, though that is something I would only add later. How could I
achieve this as well? Thanks!

The text query doesn't parse the text as a boolean expression. It simply
splits the provided text into tokens and combines them into a boolean
expression using a single boolean operator. By default it uses the OR
operator, so your query is combined into something like this:

cities:NY OR cities:AND OR cities:AL

or, if the field cities is analyzed using the standard analyzer (which is
the default), AND is treated as a stop word and your terms are converted to
lower case, so your query becomes this:

cities:ny OR cities:al
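
As a rough illustration of the two expansions above, here is a toy Python
sketch. It only imitates the behaviour described; it is not Elasticsearch
code. The function name expand_text_query and the small STOP_WORDS set are
made up for this example.

```python
# Toy imitation of how a text query is expanded; not Elasticsearch code.
# STOP_WORDS is a small illustrative subset, assumed for this example.
STOP_WORDS = {"and", "or", "in", "the", "a"}


def expand_text_query(field, text, analyzed=True):
    """Split text into tokens and join them with OR, as described above."""
    tokens = text.split()
    if analyzed:
        # Imitate the standard analyzer: lowercase, then drop stop words.
        tokens = [t.lower() for t in tokens]
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " OR ".join(f"{field}:{t}" for t in tokens)


# Field not analyzed: AND is kept as just another token.
print(expand_text_query("cities", "NY AND AL", analyzed=False))
# Field analyzed: terms are lowercased and the stop word is dropped.
print(expand_text_query("cities", "NY AND AL"))
```

In both cases the AND never acts as an operator, which is why entries
containing only one of the two cities are still returned.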

While it's possible to switch the text query from the OR operator to the
AND operator, that provides only a temporary solution: the moment you
add Indianapolis to your index, some of your queries will stop working,
since "in" is a stop word and will be ignored by the standard analyzer. So
I would recommend making the cities field not analyzed
(http://www.elasticsearch.org/guide/reference/mapping/core-types.html) and
either using one term query
(http://www.elasticsearch.org/guide/reference/query-dsl/term-query.html)
per city, combined into a boolean query
(http://www.elasticsearch.org/guide/reference/query-dsl/bool-query.html)
using the "must" clause, or using the terms query
(http://www.elasticsearch.org/guide/reference/query-dsl/terms-query.html)
with minimum_match:2.
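
A minimal sketch of what those request bodies might look like, built as
plain Python dicts so they can be dumped to JSON. The helper names
(all_cities_query, min_cities_query, matches) are hypothetical, the mapping
fragment omits the type name wrapper, and the syntax follows the linked
reference pages as they stood at the time, so check it against the version
you are running.

```python
import json

# Mapping fragment that stores each city as one exact, unanalyzed term.
# Assumption: string / not_analyzed syntax from the linked core-types page.
mapping = {"properties": {"cities": {"type": "string", "index": "not_analyzed"}}}


def all_cities_query(cities):
    """Bool query with one term clause per city under "must" (all required)."""
    return {"bool": {"must": [{"term": {"cities": c}} for c in cities]}}


def min_cities_query(cities, minimum):
    """Terms query that matches when at least `minimum` cities are present."""
    return {"terms": {"cities": list(cities), "minimum_match": minimum}}


def matches(doc_cities, cities, minimum):
    """Plain-Python check of the intended semantics, for illustration only."""
    return len(set(doc_cities) & set(cities)) >= minimum


# Entries must contain both NY and AL, in any position.
print(json.dumps({"query": all_cities_query(["NY", "AL"])}))
# Entries must contain any 2 of the 3 listed cities.
print(json.dumps({"query": min_cities_query(["NY", "AL", "LA"], 2)}))
```

Because the field is not analyzed, the terms have to be given exactly as
they were indexed ("NY", not "ny"), and the order of the values in the
array no longer matters.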

Igor

On Friday, July 6, 2012 1:36:08 PM UTC-4, coys wrote:

[quoted question, identical to the original post above]

Thanks a lot Igor, your solution was perfect.