Best way to search scoped within a single array element?


(Andrew Cholakian) #1

I've got a bunch of data like this:

curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "A test", "tags" : ["brand_gucci,cat_purse,color_red", "brand_forever 21,color_orange,cat_socks"]}'

I'd like to find only articles that have Gucci-brand purses. So I want a
query that matches only if all the terms appear in a single array element.
If the terms span two elements of the 'tags' array, the document should not
be considered a match.

I've tried a query like:
{
  "terms" : {
    "tags" : [ "brand_gucci", "cat_purse" ],
    "minimum_match" : 2
  }
}

but that will match a document like:

curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "A test", "tags" : ["brand_gucci,cat_hat,color_red", "brand_forever 21,color_orange,cat_purse"]}'

which I don't want. Does anyone have any idea what the best way to do this
would be? Help would be much appreciated.

BTW, I should mention that I'm using a custom analyzer that splits on ','
so that the tags are tokenized correctly.
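
To make the problem concrete, here is a rough plain-Python simulation of the behavior described above. It is not Elasticsearch code: the function names and the two sample tag lists are hypothetical, and the comma split only approximates the custom analyzer. It shows that once the tokens of every array element land in one field, a two-term match cannot tell whether both terms came from the same element.

```python
def analyze(value):
    """Split one tag string on ',' (stand-in for the custom analyzer)."""
    return [token.strip() for token in value.split(",") if token.strip()]


def flat_terms_match(tags, wanted, minimum_match):
    """Approximate a terms query: all elements share one bag of tokens."""
    field_tokens = set()
    for element in tags:
        field_tokens.update(analyze(element))
    return sum(term in field_tokens for term in wanted) >= minimum_match


def single_element_match(tags, wanted):
    """The desired behavior: every term must sit in the same element."""
    return any(set(wanted) <= set(analyze(element)) for element in tags)


# Hypothetical sample data: both terms in one element vs. split across two.
same_element = ["brand_gucci,cat_purse,color_red",
                "brand_forever 21,color_orange,cat_socks"]
split_elements = ["brand_gucci,cat_hat,color_red",
                  "brand_forever 21,color_orange,cat_purse"]
wanted = ["brand_gucci", "cat_purse"]

print(flat_terms_match(same_element, wanted, 2))      # True
print(flat_terms_match(split_elements, wanted, 2))    # True (the unwanted hit)
print(single_element_match(same_element, wanted))     # True
print(single_element_match(split_elements, wanted))   # False
```

The last two lines are the behavior I'm after; the question is how to get the search engine to apply it.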


(Clinton Gormley) #2

Hiya

curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "A test", "tags" : ["brand_gucci,cat_purse,color_red", "brand_forever 21,color_orange,cat_socks"]}'

I'd like to find articles that only have gucci brand purses. So, I
want a query that will match only if the terms appear all in one array
element. If the terms span two elements in the 'tags' array they
cannot be considered a match.

Then you need to use 'nested' fields, which are indexed internally as
separate documents. Otherwise the tokens from all array elements are
flattened into a single field.

clint
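
As a minimal sketch of what the nested approach could look like, the snippet below builds the three JSON bodies involved (mapping, document, query) as Python dicts and prints them, so they can be sent with curl or any client. Several things here are assumptions rather than anything stated in the thread: the "value" sub-field, the analyzer name "comma_analyzer", and the field type "string" (the name used by 2012-era releases; later versions call it "text"). Nested fields hold objects, so each tag string is wrapped in an object.

```python
import json

# Mapping sketch: "tags" is declared as nested, so each array element is
# indexed as its own hidden document.
# Assumptions: "value" and "comma_analyzer" are hypothetical names, and
# "string" is the 2012-era field type ("text" in later releases).
mapping = {
    "article": {
        "properties": {
            "tags": {
                "type": "nested",
                "properties": {
                    "value": {"type": "string", "analyzer": "comma_analyzer"}
                },
            }
        }
    }
}

# Each tag string becomes an object, because nested fields contain objects.
document = {
    "title": "A test",
    "tags": [
        {"value": "brand_gucci,cat_purse,color_red"},
        {"value": "brand_forever 21,color_orange,cat_socks"},
    ],
}

# Both term clauses must be satisfied inside the same nested element.
query = {
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"tags.value": "brand_gucci"}},
                        {"term": {"tags.value": "cat_purse"}},
                    ]
                }
            },
        }
    }
}

for name, body in (("mapping", mapping), ("document", document), ("query", query)):
    print(name, json.dumps(body))
```

Because the bool query runs against one nested element at a time, a document that has brand_gucci in one element and cat_purse in another should no longer match.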



(Andrew Cholakian) #3

Ah, thanks! Works like a charm!

On Friday, June 15, 2012 3:42:57 AM UTC-7, Clinton Gormley wrote:

Then you need to use 'nested' fields (which internally are indexed as
separate documents), otherwise the tokens are flattened into a single
field.
