Terms query


(Alex Chen) #1

Hi,

I need a query that is very similar to the terms query, but more restrictive.
I would like to get the documents whose tags all appear in the query.

for example:
doc1 has "tags": [1, 2]
doc2 has "tags": [1, 2, 3]
doc3 has "tags": [1, 2, 4]

A query for "tags": [1, 2, 3] should return doc1 and doc2, but not doc3,
because tag 4 is not in the query.
How can I write a query that does this?
Thanks.

-alex


(Marcin Dojwa) #2

Hi,

You can use:
{
  "and": [
    { "term": { "tags": 1 } },
    { "term": { "tags": 2 } },
    { "term": { "tags": 3 } }
  ]
}

You may want to check this too:
http://www.elasticsearch.org/guide/reference/query-dsl/terms-query.html
I am not sure what minimum_match is for, but my guess is that the query
below could return what you want too:
{
  "terms": {
    "tags": [1, 2, 3],
    "minimum_match": 3
  }
}


(Alex Chen) #3

Thanks, Marcin, for your help. Both queries you suggested would match doc2
(tags: [1,2,3]), but would miss doc1 (tags: [1,2]).
If a doc's tags are a subset of the query tags, it should be matched. In effect,
minimum_match should be the total number of tags in each doc.
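
To pin down the rule I am after, here is a minimal Python sketch. It is not Elasticsearch code, and the function name matches is made up purely for illustration: a doc matches when it has at least one tag and every one of its tags is in the query.

```python
# Hypothetical helper, only to illustrate the desired matching rule.
def matches(doc_tags, query_tags):
    """True if the doc has at least one tag and every doc tag is in the query."""
    doc_set = set(doc_tags)
    return bool(doc_set) and doc_set <= set(query_tags)


docs = {
    "doc1": [1, 2],
    "doc2": [1, 2, 3],
    "doc3": [1, 2, 4],
}
query = [1, 2, 3]

matched = sorted(name for name, tags in docs.items() if matches(tags, query))
print(matched)  # ['doc1', 'doc2']
```
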
Is it possible for ES to support something like this in the terms query?
"minimum_match" : "ALL"

Thanks.

-alex


(Igor Motov) #4

This is a tough one. I think there are two possible strategies here: you
can either store all tags in a single field and include all permutations in
your query, or you can find the complementary set of tags and use it to
exclude all undesired records from your results.

In the first case, you will need to create an additional field (taglist,
for example) that would contain a sorted merged list of all tags:

doc1 will have taglist: 1-2
doc2 will have taglist: 1-2-3
doc3 will have taglist: 1-2-4

And then you will be able to use a terms query for terms 1, 2, 3, 1-2, 1-3,
2-3, and 1-2-3. Needless to say, this solution is not going to scale very
well if your query contains more than a few tags: a query with n tags needs
2^n - 1 terms.
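
To make the growth concrete, here is a minimal Python sketch that enumerates the terms such a query would need. The helper name taglist_terms is my own illustration, and it assumes the taglist field joins numerically sorted tags with "-" as in the example above.

```python
from itertools import combinations


# Hypothetical helper: lists every taglist value the terms query must include.
def taglist_terms(query_tags, sep="-"):
    """All non-empty sorted combinations of the query tags, joined like taglist."""
    tags = sorted(set(query_tags))
    terms = []
    for size in range(1, len(tags) + 1):
        for combo in combinations(tags, size):
            terms.append(sep.join(str(t) for t in combo))
    return terms


terms = taglist_terms([1, 2, 3])
print(terms)  # ['1', '2', '3', '1-2', '1-3', '2-3', '1-2-3']

# Ten query tags already need 2**10 - 1 terms.
print(len(taglist_terms(range(1, 11))))  # 1023
```
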

In the second case, assuming that you have 10 different tags, the
complementary tags will be 4, 5, 6, 7, 8, 9, and 10, and your query will look
like this:

-tags:4 -tags:5 -tags:6 -tags:7 -tags:8 -tags:9 -tags:10

You can retrieve the list of all available tags using faceted search
and implement the query as a terms filter wrapped in a not filter.
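
As a rough sketch of that second strategy in Python: the helper name complement_filter is hypothetical, and the dict only mirrors the not and terms filters mentioned above, so the exact request shape may differ between ES versions. The list of all tags is assumed to have been collected already, for example from a facet.

```python
# Hypothetical helper: builds a "not" filter around a "terms" filter that
# lists every known tag missing from the query.
def complement_filter(all_tags, query_tags, field="tags"):
    excluded = sorted(set(all_tags) - set(query_tags))
    return {"not": {"terms": {field: excluded}}}


all_tags = range(1, 11)  # assumed: the 10 distinct tags in the index
body = complement_filter(all_tags, [1, 2, 3])
print(body)  # {'not': {'terms': {'tags': [4, 5, 6, 7, 8, 9, 10]}}}
```

The excluded list grows with the number of distinct tags in the index, not with the size of the query.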


(Alex Chen) #5

Thanks, Igor. The average number of tags in the query is above 200, so I
cannot use the permutation method. The total number of unique tags is on
the order of 100K, so I cannot use the complementary set.

It seems to me the server code needs to be modified to return results only
if all tags in the document are in the query set. It would be great if ES
could add a new feature to the terms query
that supports "minimum_match: ALL".

thanks.

-alex


(Igor Motov) #6

How would you implement it efficiently on the server side? How can you find
that doc3 shouldn't be included because it has tag 4? Unless you know that
tag 4 exists, you would have to retrieve all documents that contain tags 1,
2, or 3 and verify that they don't have any other tags. Such a solution might
result in retrieving and checking a lot of records, potentially the entire
index. If this works for you, you can actually implement it now using a
script filter:

{
  "query" : {
    "filtered" : {
      "query" : {
        "terms" : {
          "tags" : ["1", "2", "3"],
          "minimum_match" : 1
        }
      },
      "filter" : {
        "script" : {
          "script" : "foreach(tag : doc.tags.values) { if(!filter_tags.containsKey(tag)) return false }; return true",
          "params" : {
            "filter_tags" : { "1" : {}, "2" : {}, "3" : {} }
          }
        }
      }
    }
  },
  "fields" : ["tags"]
}
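
With a couple of hundred tags you would not want to write that body by hand. Here is a minimal Python sketch that generates it. The helper names build_request and script_accepts are hypothetical; script_accepts is only a plain-Python stand-in for the script so the logic can be checked without a cluster, and it assumes tags are compared as strings.

```python
import json


# Hypothetical helper: builds the filtered query + script filter shown above.
def build_request(query_tags, field="tags"):
    keys = [str(t) for t in query_tags]
    script = (
        "foreach(tag : doc.%s.values) { "
        "if(!filter_tags.containsKey(tag)) return false }; return true" % field
    )
    return {
        "query": {
            "filtered": {
                # Cheap pre-selection: docs with at least one of the tags.
                "query": {"terms": {field: keys, "minimum_match": 1}},
                # Per-doc check: reject docs carrying any other tag.
                "filter": {
                    "script": {
                        "script": script,
                        "params": {"filter_tags": {k: {} for k in keys}},
                    }
                },
            }
        },
        "fields": [field],
    }


# Stand-in for the script: reject on the first tag missing from filter_tags.
def script_accepts(doc_tags, filter_tags):
    for tag in doc_tags:
        if str(tag) not in filter_tags:
            return False
    return True


body = build_request([1, 2, 3])
params = body["query"]["filtered"]["filter"]["script"]["params"]["filter_tags"]
print(json.dumps(body, indent=2))
print(script_accepts([1, 2], params), script_accepts([1, 2, 4], params))
```
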


(Marcin Dojwa) #7

Sorry Alex, I misunderstood you; now I get what you wanted :)

Judging by the documentation, maybe a bool query with must_not is what you want?

Something like this:
{
  "bool": {
    "must": {
      "terms": {
        "tags": [1, 2, 3],
        "minimum_match": 1
      }
    },
    "must_not": {
      "and": [
        { "not": { "term": { "tags": 1 } } },
        { "not": { "term": { "tags": 2 } } },
        { "not": { "term": { "tags": 3 } } }
      ]
    }
  }
}
I have no idea if it works, but maybe it will lead you to the solution
somehow :)
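
For what it is worth, the boolean logic can be checked offline with a small Python sketch. The function name bool_query_matches is made up, and the sketch only models the logic, not how ES would actually parse the query. By De Morgan's law, negating an "and" of "not" clauses reduces to "has at least one of the tags", so doc3 would still come back.

```python
# Hypothetical model of the bool query above, evaluated on plain Python sets.
def bool_query_matches(doc_tags, query_tags):
    tags = set(doc_tags)
    # must: terms with minimum_match 1, i.e. at least one query tag present.
    must = any(t in tags for t in query_tags)
    # the "and" of "not term" clauses: none of the query tags present.
    and_of_nots = all(t not in tags for t in query_tags)
    # must_not negates the "and" clause.
    return must and not and_of_nots


docs = {"doc1": [1, 2], "doc2": [1, 2, 3], "doc3": [1, 2, 4]}
result = {name: bool_query_matches(tags, [1, 2, 3]) for name, tags in docs.items()}
print(result)  # {'doc1': True, 'doc2': True, 'doc3': True}
```
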

Best regards.


(Alex Chen) #8

Thanks, Igor, for suggesting this script. It works for me. One minor change
I had to make is filter_tags.containsKey(tag.toString()).
If the script could access the number of matched terms in the query, it could
be further simplified to something like:
if (doc.num_matched == length(doc.tags))
This is probably not available in the current version of ES. It would be nice
if it could be supported.
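
To show what I mean by the count-based check, here is a minimal Python sketch. The name num_matched is hypothetical and is not something ES exposes to scripts as far as I know; the sketch just computes it by hand.

```python
# Hypothetical count-based rule: a doc matches when the number of its tags
# that appear in the query equals its total number of tags.
def count_based_match(doc_tags, query_tags):
    doc_set = set(doc_tags)
    num_matched = len(doc_set & set(query_tags))
    return num_matched > 0 and num_matched == len(doc_set)


docs = {"doc1": [1, 2], "doc2": [1, 2, 3], "doc3": [1, 2, 4]}
hits = sorted(name for name, tags in docs.items() if count_based_match(tags, [1, 2, 3]))
print(hits)  # ['doc1', 'doc2']
```
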

Thanks again for all your help. It is greatly appreciated.

-alex

