Query_string queries and special characters

Curt_Kohler · August 27, 2013, 12:13pm

We've been using query_string queries with Elasticsearch as part of a quick
proof of concept in the company. While we have had pretty good success, we
have seen some things we don't understand as well. Hoping someone can
shed some light/confirm some suspicions.

We have a field (call it field1) in our content for which we defined a
mapping with a custom analyzer (keyword tokenizer, lowercase filter). The
field has textual data (including some non-alphanumeric characters such as '-'
and '/'). An example might be: Fksdj-hfge/76543-89-0. Running the
_analyze endpoint shows that it is being treated as one token and has been
lowercased as expected. In the index, every document has a unique value in
this field. We also have a default _all field set on the index.
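
For readers following along, an analyzer of this kind (keyword tokenizer plus lowercase filter) could be declared in the index settings roughly as below. This is only a sketch: the analyzer name lowercase_keyword is made up here, and the field mapping that points field1 at it is left out because its syntax depends on the Elasticsearch version.

```python
import json

# Index settings declaring a custom analyzer built from the keyword
# tokenizer and the lowercase token filter. The name "lowercase_keyword"
# is a placeholder, not the name used in the original post.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

print(json.dumps(index_settings, indent=2))
```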

When we submit a query like this: field1:Fksdj-hfge/76543-89-0 we get no
results.
When I escape the '/' like this: field1:Fksdj-hfge\/76543-89-0 it finds
the document.
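
As a sketch of how the escaped query might be sent, the snippet below builds a query_string request body in Python. One thing to watch, assuming the body travels as JSON: the backslash itself has to be escaped in JSON, so the serialized body carries two backslashes before the slash. The variable names are illustrative only.

```python
import json

# The Lucene-syntax query with the '/' escaped by a single backslash.
lucene_query = r"field1:Fksdj-hfge\/76543-89-0"

# Request body for a query_string query.
body = {"query": {"query_string": {"query": lucene_query}}}

# json.dumps doubles the backslash, because '\' must be escaped in JSON.
serialized = json.dumps(body)
print(serialized)
```

Decoding the JSON on the receiving side yields the single-backslash form again, which is what the query parser sees.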

Based on the results, I assume that unlike match queries, query_string
doesn't apply the analyzer from the field being searched to the query.
Assuming that is true, some questions:

1. What analyzer is used by query_string queries to process the search string
   by default?
2. Do you need to escape any special/non-alphanumeric character to get it to pass
   through the query parser (assuming we let it use its default analyzer)?
3. I assume the analyzer parameter on the query_string query refers to the
   query parser's analyzer. Will the query_string query select the correct
   analyzer for the specified field once it gets past parsing the query?

Thanks,
Curt

PS: I know I can use term queries; however, we are trying to hook into an
existing system that provides Lucene syntax queries and were trying to
avoid the extra development for the proof of concept.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · August 27, 2013, 4:45pm

Query_string passes the query straight through to the Lucene parser, so it
acts just like the Lucene QueryParser. Specifically, the parser will:

1. Tokenize your query with its own tokenizer (not the one in your
   analyzer) so as to find tokens/phrases and special characters
2. Rewrite your query to use any special operations (fuzzy, wildcard,
   etc.)
3. Pass the tokens/phrases to the field's analyzer for analysis.

So to answer your question, you'll have to escape special characters (a
full list can be found in the "Escaping Special Characters" section of the
Apache Lucene Query Parser Syntax documentation) so that the QueryParser
keeps them as part of the token. Once the query has made it through the
"query parsing" phase, the leftover tokens will be passed to the field's
analyzer (or whatever analyzer you specify in your query).
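
If the values come from another system, the escaping could be done with a small helper along these lines. This is a sketch, not an official API: escape_lucene is a hypothetical name, and the character set below follows the special characters listed in the Lucene query parser documentation, which can differ between Lucene versions, so check it against the version you run.

```python
# Characters the Lucene query parser treats as syntax. '&' and '|' are
# escaped individually, which also covers the '&&' and '||' operators.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(value: str) -> str:
    """Backslash-escape Lucene query syntax characters in a term value."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in value
    )


# Escape only the value, not the "field:" prefix, so the colon that
# separates field name and value keeps its meaning.
query = "field1:" + escape_lucene("Fksdj-hfge/76543-89-0")
print(query)
```

Escaping only the value is a deliberate choice here: running the helper over a whole Lucene expression would also neutralize operators the upstream system intended, such as wildcards or the field separator.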

-Zach

