ES query help


(thale jacobs) #1

Hello - I am fairly new to ES and need some help with an ES query.

This is how I built my index:

curl -XPOST 'http://127.0.0.1:9200/test/country?pretty=1' -d '{"country": ["United States"]}'
curl -XPOST 'http://127.0.0.1:9200/test/province?pretty=1' -d '{"country": ["United States"], "province": "NY"}'
curl -XPOST 'http://127.0.0.1:9200/test/city?pretty=1' -d '{"country": ["United States"], "province": "NY", "city": "Albany"}'
curl -XPOST 'http://127.0.0.1:9200/test/district?pretty=1' -d '{"country": ["United States"], "province": "NY", "city": "Albany", "district": "West Albany"}'
curl -XPOST 'http://127.0.0.1:9200/test/district?pretty=1' -d '{"country": ["United States"], "province": "NY", "city": "Albany", "district": "Albany Center"}'
curl -XPOST 'http://127.0.0.1:9200/test/district?pretty=1' -d '{"country": ["United States"], "province": "NY", "city": "Albany", "district": "Shopping District"}'
curl -XPOST 'http://127.0.0.1:9200/test/district?pretty=1' -d '{"country": ["United States"], "province": "NY", "city": "Albany", "district": "Empirical Mile"}'

(It is a simple index with 4 document types: country, province, city, and
district.)

From a user, I get a string of search terms that looks like the
following: "Albany NY".

No query I have written returns my desired top result, which would be the
"city" document (the user never entered "West Albany"). Instead, the
"West Albany" district document is the top match returned by the query
below:

curl -XGET 'http://127.0.0.1:9200/test/_search?pretty=1' -d '
{
  "query" : {
    "bool" : {
      "should" : {
        "query_string" : {
          "query" : "Albany NY"
        }
      }
    }
  }
}'

Here are the results of the query:
1) {"country": ["United States"], "province": "NY", "city": "Albany", "district": "West Albany"}
2) {"country": ["United States"], "province": "NY", "city": "Albany", "district": "Albany Center"}
3) {"country": ["United States"], "province": "NY", "city": "Albany"}

So is there a way to filter out documents that contain terms (e.g. "West" or "Center") that were not passed in the query string,
and return {"country": ["United States"], "province": "NY", "city": "Albany"} as the first result?

(I hope this question is clear...I posted it on another board, but I was not clear enough :-(, so I am trying again)

Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Brian Yoder) #2

That's a Lucene query string. So a phrase should probably be surrounded by
double quotes (escaped inside the JSON string):

"query" : "city:\"Albany NY\""

Or you might just enter the following, using + to denote MUST (aka AND)
to require both words in any order:

"query" : "+Albany +NY"

Brian


(Christian Th.) #3

Have you already tried it with the search type dfs_query_then_fetch?

It makes an additional roundtrip, but it is more accurate when you are
using multiple shards for an index. I assume that you are using the standard
config with 5 shards.
http://www.elasticsearch.org/guide/reference/api/search/search-type/

curl -XGET 'http://127.0.0.1:9200/test/_search?search_type=dfs_query_then_fetch&pretty=1' -d '
{
  "query" : {
    "bool" : {
      "should" : {
        "query_string" : {
          "query" : "Albany NY"
        }
      }
    }
  }
}'


(thale jacobs) #4

Christian Th - Thanks for your suggestions. I tried both ideas:

1) I set the number of shards to 1. This does lead to consistent results,
   but I am still not able to get the results I desire.
2) I also tried your suggestion of dfs_query_then_fetch, but again, same
   results.

I understand why ES is producing the results it does (the term "Albany" is
more frequent in the top results), but I still do not know how to write a
query that produces results that do not contain terms for which I did not
search.

On Tuesday, October 1, 2013 6:45:02 PM UTC-4, Christian Th. wrote:

Have you already tried it with the search type dfs_query_then_fetch?


(thale jacobs) #5

Hello Brian - Thanks for your reply. I did try your suggestion, but it
produced the same result order:

Here are the results of the query:
1){"country": ["United States"],"province" :"NY","city" : "Albany","district" : "West Albany"}
2){"country": ["United States"],"province" :"NY","city" : "Albany","district" :"Albany Center"}
3){"country": ["United States"],"province" :"NY", "city" : "Albany"}

Because "West" and "Center" are not part of my search string, I do not want
the query to return them as the top result. Document 3) has no "unmatched"
terms from my query string, so I would like to be able to write a query to
produce this as my top result.

Thanks again for your suggestion.

On Tuesday, October 1, 2013 4:52:50 PM UTC-4, InquiringMind wrote:

That's a Lucene query string.


(Brian Yoder) #6

Thale,

One suggestion I've read before was to create a numeric field and index it
with the number of words in a field or in the document. Then filter (or
reverse sort) on that field, such that the documents with more words than
your query either don't show up (or show up last).
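
A minimal sketch of the word-count idea, written in Python for brevity. It assumes a hypothetical numeric field named `word_count` added to each document before indexing, approximates tokenization by splitting on whitespace (the real analyzer may split differently), and sorts ascending on that field so that documents with fewer extra words come first. The helper names are illustrative and not part of any Elasticsearch client; the code only builds request bodies and does not contact a server.

```python
import json


def word_count(doc):
    """Count whitespace-separated words across all string values of a document."""
    total = 0
    for value in doc.values():
        # A field may hold a single string or a list of strings.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, str):
                total += len(item.split())
    return total


def with_word_count(doc):
    """Return a copy of doc with the hypothetical `word_count` field added."""
    enriched = dict(doc)
    enriched["word_count"] = word_count(doc)
    return enriched


def build_search(user_text):
    """Build a search body that requires every term and sorts shorter documents first."""
    return {
        "query": {
            "query_string": {"query": user_text, "default_operator": "AND"}
        },
        # Fewest words first; ties are broken by relevance score.
        "sort": [{"word_count": {"order": "asc"}}, "_score"],
    }


city = {"country": ["United States"], "province": "NY", "city": "Albany"}
district = dict(city, district="West Albany")

print(json.dumps(with_word_count(city)))       # body to index in place of the original document
print(word_count(city), word_count(district))  # 4 6
print(json.dumps(build_search("Albany NY")))   # body to send to /test/_search
```

Assuming the query runs against the default `_all` field, requiring both terms should exclude the province-only document, so the city document (4 words) would sort ahead of the district documents (6 words each). This is a ranking heuristic, not a guarantee that no unmatched terms appear.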

Brian

On Wednesday, October 2, 2013 7:59:30 AM UTC-4, thale jacobs wrote:

Christian Th - Thanks for your suggestions.


(Jörg Prante) #7

The trick is anti-phrasing.

If a user enters a phrase, and you know that it should match values in the
"city" field only, but not in the other fields, then you have to filter
this phrase against the other fields explicitly.

Example:

curl -XGET 'http://127.0.0.1:9200/test/_search?pretty' -d '
{
  "query" : {
    "query_string" : { "query" : "Albany NY" }
  },
  "filter" : {
    "query" : {
      "query_string" : { "query" : "!district:Albany !country:Albany !province:Albany" }
    }
  }
}
'
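
When the target field is known, the negated filter string above can be generated rather than typed by hand. The following Python sketch assumes the four field names from this thread and a hypothetical helper `anti_phrase_filter`; it keeps the top-level `filter` element used in the example above, only builds the request body, and does not contact Elasticsearch.

```python
import json

# Field names taken from the documents indexed in this thread.
FIELDS = ["country", "province", "city", "district"]


def anti_phrase_filter(term, target_field, fields=FIELDS):
    """Negate `term` on every field except the one it is expected to match.

    Assumes `term` is a single word without Lucene special characters.
    """
    if target_field not in fields:
        raise ValueError("unknown field: %s" % target_field)
    return " ".join("!%s:%s" % (f, term) for f in fields if f != target_field)


def build_search(user_text, term, target_field):
    """Combine the free-text query with the anti-phrase filter."""
    return {
        "query": {"query_string": {"query": user_text}},
        "filter": {
            "query": {
                "query_string": {"query": anti_phrase_filter(term, target_field)}
            }
        },
    }


print(anti_phrase_filter("Albany", "city"))
# !country:Albany !province:Albany !district:Albany
print(json.dumps(build_search("Albany NY", "Albany", "city"), indent=2))
```

This still presupposes that the caller knows which field a term belongs to; it only removes the manual step of listing the other fields.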

Jörg


(thale jacobs) #8

Hello Jörg - Thanks for the reply. I tried your suggestion and it did
produce the correct result. However, the query you constructed assumes
that some information is known about the components of the search string.
When the search string comes in to me, I do not know that Albany is the
city and NY is the state, so I cannot know in advance that "!city:Albany"
must be left out of the "anti-phrasing". I will read up more on
"anti-phrasing" as it may have some potential to help here. Thanks again.

On Thursday, October 3, 2013 3:53:41 PM UTC-4, Jörg Prante wrote:

The trick is anti-phrasing.


(thale jacobs) #9

On Monday, October 7, 2013 8:18:34 AM UTC-4, thale jacobs wrote:

Hello Jörg - Thanks for the reply.


(thale jacobs) #10

Here is another link to a similar problem:

http://elasticsearch-users.115913.n3.nabble.com/Exact-phrase-match-city-names-example-td4019310.html#a4042574

On Tuesday, October 1, 2013 1:20:39 PM UTC-4, thale jacobs wrote:

Hello - I am fairly new to ES and need some help with an ES query.

