ElasticSearch array query (multiselect)


(coys) #1

Hello, I have a multi-select listbox, e.g. Region - ["A","B","D"] (3
selections out of 6), which is populated from an array entry in
Elasticsearch: ["A","B","C","D","E","F"].
If I search on a single value, e.g. only on A, Elasticsearch returns the
correct results.
If I want to search for all entries which contain Region "A" and "B", I
can combine them into a text query with the query string "A,B", and it
will give me all such entries, e.g. it even returns a field which
contains ["A","B","D","F"].

Now I want to return, e.g., all entries which contain "A", "B" and
"D". Here my text query falls short: it can return an entry which
contains ["A","B","D","F"], which is nice, but it cannot return entries
whose region is e.g. ["A","B","C","D","F"] (C sits between "A","B" and
"D" in the string). My approach looks terribly wrong.

Can you please tell me how I should formulate this query? I have no
idea how to proceed and I'm stuck. Thanks!


(Clinton Gormley) #2

On Mon, 2012-06-25 at 09:56 -0700, coys wrote:

Bunging these all into a text query is probably not the best way to do
this.

I'm assuming that your A..F values are 'enums', i.e. you want to match
exactly what is in that field. So 'Foo' matches 'Foo' but not 'foo' or
'FOO', etc.

In other words, you don't want your values to be analyzed - they're not
full text, they are just values.

So create your index like this (note the 'tags' field is not_analyzed):

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "text" : {
               "type" : "string"
            },
            "tags" : {
               "index" : "not_analyzed",
               "type" : "string"
            }
         }
      }
   }
}
'
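To illustrate why not_analyzed matters for exact matching, here is a rough local model in Python. It is only a sketch: `toy_analyze` is a hypothetical, simplified stand-in for a full-text analyzer (split and lowercase), not Elasticsearch's real standard analyzer, and the helper names are made up for illustration. The point it shows is that a term filter looks for the exact term, so it only finds 'Foo' when the value was indexed unchanged.

```python
import re

def toy_analyze(value):
    # Hypothetical, simplified stand-in for a full-text analyzer:
    # split on non-alphanumeric characters and lowercase each token.
    # (The real standard analyzer also does things like stop-word removal.)
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", value)]

def keep_as_is(value):
    # Models "index": "not_analyzed": the whole value becomes one unchanged term.
    return [value]

def build_index(values, analyzer):
    # The set of terms that ends up in the index for these field values.
    return {term for value in values for term in analyzer(value)}

def term_matches(index_terms, term):
    # A term filter does not analyze its input; it looks for that exact term.
    return term in index_terms

values = ["Foo", "New York"]
analyzed = build_index(values, toy_analyze)   # {"foo", "new", "york"}
exact = build_index(values, keep_as_is)       # {"Foo", "New York"}

print(term_matches(exact, "Foo"))          # True
print(term_matches(exact, "foo"))          # False
print(term_matches(analyzed, "Foo"))       # False: only "foo" was indexed
print(term_matches(exact, "New York"))     # True
print(term_matches(analyzed, "New York"))  # False: indexed as "new" and "york"
```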

Then we can add some data (I've added docs with multiple tags, but they
could equally have a single tag each):

curl -XPUT 'http://127.0.0.1:9200/test/test/1?pretty=1' -d '
{
   "text" : "the quick brown fox",
   "tags" : [
      "A",
      "C"
   ]
}
'

curl -XPUT 'http://127.0.0.1:9200/test/test/2?pretty=1' -d '
{
   "text" : "the quick brown fox",
   "tags" : [
      "A",
      "C",
      "D"
   ]
}
'

curl -XPUT 'http://127.0.0.1:9200/test/test/3?pretty=1' -d '
{
   "text" : "the quick brown fox",
   "tags" : [
      "D",
      "E"
   ]
}
'

Then we can search for docs, using the A..F values as a filter. We use
the 'terms' filter because we want an exact match (a doc matches if its
tags contain any of the listed values):

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
   "query" : {
      "filtered" : {
         "filter" : {
            "terms" : {
               "tags" : [
                  "A",
                  "D"
               ]
            }
         },
         "query" : {
            "text" : {
               "text" : "quick"
            }
         }
      }
   }
}
'

[Mon Jun 25 19:23:19 2012] Response:

{
   "hits" : {
      "hits" : [
         {
            "_source" : {
               "text" : "the quick brown fox",
               "tags" : [
                  "A",
                  "C"
               ]
            },
            "_score" : 0.15342641,
            "_index" : "test",
            "_id" : "1",
            "_type" : "test"
         },
         {
            "_source" : {
               "text" : "the quick brown fox",
               "tags" : [
                  "A",
                  "C",
                  "D"
               ]
            },
            "_score" : 0.15342641,
            "_index" : "test",
            "_id" : "2",
            "_type" : "test"
         },
         {
            "_source" : {
               "text" : "the quick brown fox",
               "tags" : [
                  "D",
                  "E"
               ]
            },
            "_score" : 0.15342641,
            "_index" : "test",
            "_id" : "3",
            "_type" : "test"
         }
      ],
      "max_score" : 0.15342641,
      "total" : 3
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 5
}
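All three docs come back because the terms filter keeps a doc when any of its tags equals any listed value: doc 1 only has A, and doc 3 only has D. A rough local model of that logic, using the three docs indexed above, is sketched below. It is not how Elasticsearch evaluates filters internally, and `terms_filter` and `text_query` are hypothetical helpers; the text-query check in particular is a crude single-word stand-in.

```python
docs = {
    "1": {"text": "the quick brown fox", "tags": ["A", "C"]},
    "2": {"text": "the quick brown fox", "tags": ["A", "C", "D"]},
    "3": {"text": "the quick brown fox", "tags": ["D", "E"]},
}

def terms_filter(doc_tags, wanted):
    # Models the terms filter as used above: keep the doc if ANY of its
    # tags is exactly equal to one of the wanted values.
    return any(tag in wanted for tag in doc_tags)

def text_query(doc_text, word):
    # Very rough stand-in for the "text" query on a single word.
    return word in doc_text.split()

wanted = ["A", "D"]
hits = sorted(
    doc_id
    for doc_id, doc in docs.items()
    if text_query(doc["text"], "quick") and terms_filter(doc["tags"], wanted)
)
print(hits)  # ['1', '2', '3']
```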

clint


(coys) #3

Hi Clint, thanks for the quick reply, that's very close to what I'd
like to return. Ideally, on a search for ["A","D"] I'd only like to
return entries which contain both A and D. For example, let's say I have
4 entries A, B, C and D containing a field city populated from the enum
["NY","LN","ZU","FF"],
where A = ["NY","LN","ZU"]
B = ["NY","ZU","FF"]
C = ["NY","LN"]
D = ["LN","ZU"]
On a search for ["NY","ZU"] I'd only like to return A and B. Thanks!

Also, sorry to diverge from the topic, but I noticed that if I have a
query like this (I'm using a C# helper library which wraps the
Elasticsearch API):

    mustQuery.Text(text => text.Field("city")
                               .Query("A AND D"));

it also returns any entry which has either A or D, or both A and D,
present. Is this a correct approach, or would the term filter be the
right way to solve this? Thanks!


(Clinton Gormley) #4

On Tue, 2012-06-26 at 08:06 -0700, coys wrote:

Have a look at the docs for the terms filter:
http://www.elasticsearch.org/guide/reference/query-dsl/terms-filter.html

or alternatively, you could use the and filter:
http://www.elasticsearch.org/guide/reference/query-dsl/and-filter.html

The query DSL is your friend :)

http://www.elasticsearch.org/guide/reference/query-dsl
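As a sketch of the "and" filter route, assuming `city` is mapped as not_analyzed like `tags` above and that the filter goes in the `filter` slot of a `filtered` query as in the earlier example, the Python below builds one term filter per selected value and checks, against a local copy of the example entries from the question, which entries would pass. The helper names are hypothetical; only the JSON shape is meant to mirror the Query DSL's and/term filters, and the matching is a local model rather than a real search.

```python
import json

# The example data from the question: entry name -> values of its "city" field.
entries = {
    "A": ["NY", "LN", "ZU"],
    "B": ["NY", "ZU", "FF"],
    "C": ["NY", "LN"],
    "D": ["LN", "ZU"],
}

def and_of_term_filters(field, values):
    # One exact-match term filter per required value, combined with an "and" filter.
    return {"and": [{"term": {field: value}} for value in values]}

def matches_all(field_values, required):
    # Local model of that "and" filter: every term filter must match.
    return all(value in field_values for value in required)

required = ["NY", "ZU"]
print(json.dumps(and_of_term_filters("city", required)))
# {"and": [{"term": {"city": "NY"}}, {"term": {"city": "ZU"}}]}

matching = sorted(name for name, cities in entries.items() if matches_all(cities, required))
print(matching)  # ['A', 'B']
```

Unlike the default terms filter behaviour shown earlier (any value matches), every selected value has to be present here, which is the "contains both" semantics asked for.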

clint


(coys) #5

Thanks! Will go through the DSL :)

On Jun 26, 9:18 am, Clinton Gormley cl...@traveljury.com wrote:

