Text query


(Owen Coutts) #1

Hi,

I'm running the following query:

{
    "filter": {
        "range": {
            "time_upload": {
                "to": "2011-01-19",
                "from": "2011-01-16"
            }
        }
    },
    "query": {
        "bool": {
            "must": [
                {
                    "text": {
                        "report_type": "application-functional"
                    }
                },
                {
                    "text": {
                        "application_locale": "en-US"
                    }
                }
            ]
        }
    }
}

I would expect this to return no documents if one of the must queries
matches nothing, yet that is not the behaviour I see. Any advice?

Thanks,
Owen

--
Owen Coutts
University of Waterloo


(David Pilato) #2

I suggest trying two separate "must" clauses rather than one "must" clause containing two queries.

Something like :

"query": {
    "bool": {
        "must": { },
        "must": { }}}

hope this helps
David ;-)

On 7 Jul 2011 at 03:57, Owen Coutts owen@owencoutts.com wrote:



(Clinton Gormley) #3

On Thu, 2011-07-07 at 05:50 +0200, David Pilato wrote:

I suggest trying two separate "must" clauses rather than one "must" clause containing two queries.

Something like :

"query": {
    "bool": {
        "must": { },
        "must": { }}}

This syntax is incorrect. In JSON, the second "must" key would override
the first (although this form may actually work in ES because it reads
JSON as a stream).
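To see the duplicate-key problem concretely, here is a small sketch using Python's standard `json` module. Python is only a stand-in here and says nothing about how ES itself parses the body; the clause contents are copied from the original query, and `DuplicateKeyError`, `reject_duplicates` and `has_duplicate_keys` are names made up for this illustration.

```python
import json

# A body with two "must" keys in the same object, as in the suggestion above.
body = '''
{"bool": {
    "must": {"text": {"report_type": "application-functional"}},
    "must": {"text": {"application_locale": "en-US"}}
}}
'''

# Python's json module silently keeps only the LAST value for a repeated key,
# so the report_type clause is lost.
parsed = json.loads(body)


class DuplicateKeyError(ValueError):
    """Raised by reject_duplicates when an object repeats a key."""


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects containing a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError("duplicate key: %r" % key)
        seen[key] = value
    return seen


def has_duplicate_keys(text):
    """Return True if any object in the JSON text repeats a key."""
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
    except DuplicateKeyError:
        return True
    return False
```

With this parser only the `application_locale` clause survives, which is why the array form `"must": [ {...}, {...} ]` is the safe way to pass several clauses.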

Owen, a couple of comments about your query:

{
    "filter": {
        "range": {
            "time_upload": {
                "to": "2011-01-19",
                "from": "2011-01-16"
            }
        }
    },

First, unless you specifically want to produce facets based on the
unfiltered results of your query, and then filter the results with the
above filter, you should be using a filtered query instead, ie:

{ query: {
    filtered: {
        query:  { bool: {.....} },
        filter: { range: {.....} }
    }
}}
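As a rough sketch of that restructuring, the snippet below wraps the original bool query and range filter in a "filtered" query. It only builds the request body as a Python dict and prints it; `filtered_query` is a helper name invented for this example, and nothing here talks to a cluster.

```python
import json


def filtered_query(query, filter_):
    """Wrap a query and a filter in a "filtered" query body."""
    return {"query": {"filtered": {"query": query, "filter": filter_}}}


# The two pieces of the original request, unchanged.
bool_query = {
    "bool": {
        "must": [
            {"text": {"report_type": "application-functional"}},
            {"text": {"application_locale": "en-US"}},
        ]
    }
}
range_filter = {
    "range": {"time_upload": {"from": "2011-01-16", "to": "2011-01-19"}}
}

# The filter now lives inside the query instead of at the top level.
body = filtered_query(bool_query, range_filter)
print(json.dumps(body, indent=2))
```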

Second, what mapping do 'report_type' and 'application_locale' have?
Did you set them to be {index: not_analyzed} ?

If not, they will automatically be considered to be full text, and thus
analyzed. Then your "text" query will do a full text search on them, ie
look for ("application" or "functional") and ("en" or "us").

If you set them to be not_analyzed, then the "text" query will look for
EXACTLY the term "en-US", which seems to be what you want.
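A toy model may make the difference clearer. The sketch below is NOT Elasticsearch's analyzer: `toy_analyze` just lowercases and splits on non-alphanumeric characters as a rough approximation, and the "en-GB" value is a hypothetical document value chosen for illustration.

```python
import re


def toy_analyze(value):
    """Rough stand-in for an analyzed field: lowercase, then split on
    anything that is not a letter or digit. An approximation only."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def toy_not_analyzed(value):
    """A not_analyzed field is indexed as one unmodified term."""
    return [value]


def text_query_matches(query_value, field_value, analyze):
    """OR semantics: match if any query term equals any indexed term."""
    indexed = set(analyze(field_value))
    return any(term in indexed for term in analyze(query_value))


print(toy_analyze("en-US"))                                    # ['en', 'us']
print(text_query_matches("en-US", "en-GB", toy_analyze))       # True
print(text_query_matches("en-US", "en-GB", toy_not_analyzed))  # False
```

Under this model an analyzed field lets a search for "en-US" match an "en-GB" document because both share the term "en", while the not_analyzed version only matches the exact value.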

Third, relevance plays no part in this query, i.e. you're not looking for
the document that best matches "interesting treatise on the query dsl".

Your documents either match, or they don't. It's boolean: true or false.

In this case, it would be better to use filters rather than a bool
query, so your query would be better rewritten as:

{
    "query" : {
        "constant_score" : {
            "filter" : {
                "and" : [
                    {
                        "term" : {
                            "application_locale" : "en-US"
                        }
                    },
                    {
                        "term" : {
                            "report_type" : "application-functional"
                        }
                    },
                    {
                        "numeric_range" : {
                            "time_upload" : {
                                "gte" : "2011-01-16",
                                "lte" : "2011-01-19"
                            }
                        }
                    }
                ]
            }
        }
    }
}

Notes:

  1. The above query won't work until you change the mapping of the
    application_locale and report_type fields to not_analyzed, and reindex
    your data.

  2. I'm guessing that your 'time_upload' is a date-time field, with
    potentially very many values. In this case, it is better to use a
    numeric_range filter instead of a range filter.
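For note 1, a mapping along these lines is roughly what I have in mind. This is a sketch only: the type name "report" is hypothetical (use whatever type your documents are indexed under), and it assumes both fields are plain string fields. `not_analyzed_mapping` is a helper name made up for this example; it just builds the mapping body and prints it.

```python
import json


def not_analyzed_mapping(type_name, fields):
    """Build a mapping body marking each named string field as not_analyzed."""
    properties = {
        name: {"type": "string", "index": "not_analyzed"} for name in fields
    }
    return {type_name: {"properties": properties}}


# "report" is a placeholder type name, not taken from the original post.
mapping = not_analyzed_mapping("report", ["application_locale", "report_type"])
print(json.dumps(mapping, indent=2))
```

Documents indexed before the mapping change keep their analyzed terms, which is why the reindex in note 1 is needed.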

clint

