Query string not working with keyword tokenizer


(Andrei) #1

I have a custom analyzer that looks like this:

eulang_tags:
  type: custom
  tokenizer: keyword
  filter: [lowercase, asciifolding]

Then in my mapping, I set the "tags" field to use it like this:

"tags" : {"type" : "string", "index_name" : "tag", "boost" : 1.5,
"analyzer" : "eulang_tags"}
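
To make the effect of this analyzer concrete, here is a rough Python simulation of it. This is only a sketch: the function names are made up for illustration, and the accent stripping is an approximation of what the asciifolding filter does, not its actual implementation.

```python
import unicodedata


def fold_ascii(text):
    # Approximation of asciifolding (assumption): decompose accented
    # characters and drop the combining marks that result.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def eulang_tags_analyze(text):
    # keyword tokenizer: the whole input becomes a single token,
    # which is then lowercased and folded.
    return [fold_ascii(text.lower())]


def word_by_word_analyze(text):
    # For contrast: splitting on whitespace first yields one token per word.
    return [fold_ascii(word.lower()) for word in text.split()]


print(eulang_tags_analyze("Coney Island"))   # one token for the whole tag
print(word_by_word_analyze("Coney Island"))  # two separate tokens
```

The point is that the index holds the single term "coney island" for this field, so anything that produces the separate terms "coney" and "island" at search time cannot match it.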

When I try to do a query_string type query against the tags field, it
doesn't seem to work. For example, I have a document that contains a
tag "coney island". If I issue this query:

{
  "query_string" : {
    "fields" : ["tags"],
    "query" : "coney island",
    "use_dis_max" : true
  }
}

The document is not returned. However, if I switch to "term" query:

{
  "term" : {
    "tags": "coney island"
  }
}

The document is found successfully. Why is this happening?


(Adriano Ferreira) #2

The "query_string" query is using the "standard" analyzer, which breaks
"coney island" into the terms "coney" and "island". Your document was
indexed with the "keyword" tokenizer, so the term in the index is
"coney island" as a whole and not the individual words. So use the "term"
query as you did, or explicitly tell "query_string" which analyzer you
want to use (though I am not certain the "keyword" tokenizer and the
"query_string" parser will work well together).



(Andrei) #3

I am not sure where you see the "standard" analyzer. I created the
"eulang_tags" custom analyzer as shown above and set the "tags" field
to use it.

-Andrei



(Adriano Ferreira) #4


The "standard" analyzer should be the default analyzer for "query_string"
unless you change the defaults or explicitly request another analyzer. This
is independent of the fact that you have assigned a custom analyzer to some
of your fields in a particular type and index.



(Clinton Gormley) #5

Actually, the ES docs for mapping mention:

  • index_analyzer: used when indexing a field
  • search_analyzer: used when analyzing a field that is part of
    a query string
  • analyzer: sets both index_analyzer and search_analyzer

See http://www.elasticsearch.com/docs/elasticsearch/mapping/core_types/

So, my reading of this is that it should work - Andrei is, after all,
searching on "fields": ["tags"]

This may be a bug.

Andrei, what happens if you search for:

  • "query_string": "tags:coney\ island"
  • "query_string": "tags:coney\ island"
  • "query_string": "tags:(coney island)"

I'm suggesting a few possibilities, because I'm not sure how to
represent an embedded space in the query string.

clint


(Andrei) #6

Clint,

"query_string": "tags:coney\ island" seemed to work. Though now I'm
not sure whether I should just escape the space this way or other non-
alphanumeric characters as well. Maybe Shay can shed some light on
this.

-Andrei
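
On the question of which characters to escape: a small helper along these lines might be one way to do it on the client side. It is a sketch, not something confirmed in this thread. The function name is hypothetical, and the character set is taken from the special characters of the Lucene query parser syntax, with whitespace added; check it against the parser version you actually run.

```python
import re

# Assumed set of characters to escape: the Lucene query syntax specials
# (+ - ! ( ) { } [ ] ^ " ~ * ? : \ / & |) plus any whitespace character.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def escape_query_term(term):
    # Put a backslash in front of every special character so the query
    # parser treats the term as one literal unit.
    return _SPECIAL.sub(r"\\\1", term)


query = "tags:" + escape_query_term("coney island")
print(query)  # tags:coney\ island
```

The resulting string would go into the "query" value of the query_string query; remember that the backslash itself has to be escaped again when the query is serialized as JSON.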



(Shay Banon) #7

The query parser splits on whitespace as well. So, when the space is not escaped, what gets passed to the analyzer and then used to construct the query is "coney" and then "island". Not ideal, but that's how it works...
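
The behavior described above can be sketched in Python as follows. This is a deliberately simplified model, not the real Lucene query parser: the function names are invented for illustration, operators and field prefixes are ignored, and the field analyzer is reduced to a lowercase step.

```python
import re


def split_unescaped_whitespace(query):
    # The parser splits the query text on whitespace first, unless the
    # whitespace is escaped with a backslash.
    chunks = re.split(r"(?<!\\)\s+", query.strip())
    # Drop the escaping backslashes from each chunk.
    return [re.sub(r"\\(.)", r"\1", chunk) for chunk in chunks if chunk]


def terms_for_keyword_field(query):
    # Each chunk is analyzed on its own. With a keyword tokenizer the
    # analyzer keeps a chunk whole, so only lowercasing is modeled here.
    return [chunk.lower() for chunk in split_unescaped_whitespace(query)]


print(terms_for_keyword_field("coney island"))    # two terms
print(terms_for_keyword_field("coney\\ island"))  # one term
```

In this model the keyword tokenizer never sees the full text "coney island" unless the space is escaped, which matches what Andrei observed.
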


(Andrei) #8

My current query (before this change), actually does this:

{"query_string": {
  "fields": ["title", "notes", "tags"],
  "query": "coney island",
  "default_operator": "AND",
  "use_dis_max": true
}}

All the fields had the "standard" analyzer. I wanted to change it so
that the tags are matched completely, without breaking them up into
words, while maintaining the current behavior with regard to title and
notes. What is the best way of achieving this?

-Andrei



(Andrei) #9

So, do I need to convert this into a dis_max query? Simply escaping the
whitespace in the query string will not work for title or notes, because
it basically forces the words to be treated as a phrase.
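
For reference, one possible shape for such a dis_max query, built as a Python dict and printed as JSON. This is only a sketch and not an answer given in the thread: the helper name is hypothetical, and it assumes the tag should match as a whole via a "term" query while title and notes keep the existing query_string behavior. Since a "term" query is not analyzed, the sketch lowercases the text by hand and does not reproduce asciifolding.

```python
import json


def build_query(user_text):
    # Hypothetical helper: combine the analyzed search on title/notes with
    # an exact whole-tag match, keeping the best-scoring clause (dis_max).
    return {
        "dis_max": {
            "queries": [
                {
                    "query_string": {
                        "fields": ["title", "notes"],
                        "query": user_text,
                        "default_operator": "AND",
                        "use_dis_max": True,
                    }
                },
                # term queries skip analysis, so lowercase manually here.
                {"term": {"tags": user_text.lower()}},
            ]
        }
    }


print(json.dumps(build_query("Coney Island"), indent=2))
```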


