Query_string can't find token that _analyze shows is generated, but term query can


(ben) #1

I have attached a short bash script to recreate the situation. I have a
fairly simple custom analyzer that I want to break on camel case, so
lowercase is last. Using the _analyze endpoint I can see that the token I am
searching for is generated by the analyzer. However, searching for it with
query_string yields a different result than a term query. I put comments in
the script to explain in more detail.

Thanks for any help!

#!/bin/sh

url="http://localhost:9200"
defaultIndex="example"

echo "Start over...this will fail the first time the script is run since the index will not exist"
curl -XDELETE "$url/$defaultIndex?refresh=true"

echo "Create index with custom analyzer"
curl -XPUT "$url/$defaultIndex" -d '{
  "index": {
    "analysis": {
      "filter": {
        "my_worddelim": {
          "type": "word_delimiter",
          "split_on_case_change": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "keyword",
          "filter": [ "stop", "my_worddelim", "lowercase" ]
        }
      }
    }
  }
}'

echo

curl -XPUT "$url/$defaultIndex/example/_mapping" -d '{
  "example": {
    "properties": {
      "name": {
        "type": "multi_field",
        "path": "just_name",
        "fields": {
          "name": { "type": "string", "analyzer": "my_analyzer" },
          "sample": { "type": "string", "index": "not_analyzed" },
          "sample_name": { "type": "string", "analyzer": "my_analyzer" }
        }
      }
    }
  }
}'

echo "Shows the lowercase token exampleofbug is generated"
curl -XGET "$url/$defaultIndex/_analyze?analyzer=my_analyzer&pretty=true" -d 'ExampleOf Bug'

echo "Post the document (haven't tried with non-bulk request)"
curl -XPOST "$url/$defaultIndex/example/_bulk?refresh=true" -d '
{ "index" : {"_index":"example","_type":"example","_id":"2169167","_version_type":"internal","_timestamp":0} }
{"name":"ExampleOf Bug"}
'

echo

echo "query_string query is unable to find token in the name field even though the path is just_name. I also tried escaping space per documentation and it fails to parse"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
  "query": {
    "query_string": {
      "query": "name:\"exampleof bug\""
    }
  }
}
'

echo

echo "Can successfully find token in name field that I was unable to find with query_string"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
  "query": {
    "term": {
      "name": "exampleof bug"
    }
  }
}
'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3c98000a-bfaf-4eae-b908-bec71ed18643%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(ben) #2

Also meant to include this in the script.

echo "query_string query using single quote which does not match Lucene query documentation"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
  "query": {
    "query_string": {
      "query": "name:''exampleof bug''"
    }
  }
}
'



(Ivan Brusic) #3

I suspect the issue is the way the query parser works. The query phrase
"exampleof bug" will be parsed into a query for the tokens "exampleof" and
"bug" that are adjacent to each other. The issue is that you do not have
two such tokens; instead, you have a token with the value "exampleof bug",
which is a single token with a space in it. According to Lucene, they are
not the same thing. You would need to create an analyzer that would create
the tokens "exampleof" and "bug".

Cheers,

Ivan
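
As a rough sketch of the suggestion above, the index settings from the original script could be varied so that only the tokenizer changes. Using "whitespace" in place of "keyword" is an assumption made here for illustration, not something proposed in the thread, and the helper name index_settings is hypothetical. With it, the analyzer should emit "exampleof" and "bug" among its tokens, which is worth confirming with the _analyze call shown in the comments.

```shell
#!/bin/sh
# Hypothetical variant of the thread's index settings. Only the tokenizer
# differs ("whitespace" instead of "keyword"); the rest is copied as posted.
index_settings() {
cat <<'EOF'
{
  "index": {
    "analysis": {
      "filter": {
        "my_worddelim": {
          "type": "word_delimiter",
          "split_on_case_change": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "whitespace",
          "filter": [ "stop", "my_worddelim", "lowercase" ]
        }
      }
    }
  }
}
EOF
}

# Usage, assuming a local node as in the thread and that the index does not
# exist yet:
#   index_settings | curl -XPUT "http://localhost:9200/example" -d @-
#   curl -XGET "http://localhost:9200/example/_analyze?analyzer=my_analyzer&pretty=true" -d 'ExampleOf Bug'
```

Note that this trades away the single whole-value token, so a term query for "exampleof bug" would no longer be expected to match on this field.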



(ben) #4

But the query is this...

name:"exampleof bug"

This should find an exact match in the field name. That exact match token
exists.

The Lucene query syntax documentation, under the "Fields" section, shows
that a double quote is the correct character for this:
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
The term is found by query_string when using single quotes, but that doesn't
match the Lucene query documentation.

Thanks!



(Ivan Brusic) #5

The query string query is a phrase query ""exampleof bug""
The term query is looking for a single token "exampleof bug"

The query parser will not use your tokenizer to parse the phrase. It will
tokenize based on whitespace and then apply the filters to each term. Your
index does not contain the token "exampleof", and your _analyze example
confirms it. This query parser behavior is a long-standing issue in Lucene.

--
Ivan
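
One way to inspect how a query string is actually parsed, rather than inferring it, is the validate API with explain enabled. The sketch below assumes the local node and "example" index from the original script, and the request shape of the 1.x-era API in use at the time, which may differ in other versions. The helper names phrase_query_body and explain_parse are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes a node at localhost:9200 and the "example" index
# created by the script earlier in the thread.
url="http://localhost:9200"
index="example"

# Request body holding the phrase query, with the inner double quotes
# escaped so the JSON stays valid.
phrase_query_body() {
cat <<'EOF'
{
  "query": {
    "query_string": {
      "query": "name:\"exampleof bug\""
    }
  }
}
EOF
}

# Ask Elasticsearch to describe the Lucene query it built from the body.
explain_parse() {
  phrase_query_body |
    curl -s -XGET "$url/$index/_validate/query?explain=true&pretty=true" -d @-
}

# Usage: explain_parse
```

The explanation in the response shows the terms the parser produced, which can be compared against the tokens reported by _analyze.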



(Ivan Brusic) #6

Here is the Lucene issue: https://issues.apache.org/jira/browse/LUCENE-2605

--
Ivan



(ben) #7

Well crap. Creating tokens that match eliminates the exact match I'm
trying for, correct?

If I indexed two documents with each of the strings below... (assuming the
tokens are generated as you stated above)

exampleof bug
exampleof sample bug

Then ran a query:

name:"exampleof bug"

It would return both documents, which isn't an exact match but is equivalent
to these queries:

name:exampleof bug
name:exampleof OR bug

Thanks!
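
The question above can be checked empirically with a sketch like the following. It assumes a local node and an "example" index whose name field uses an analyzer that emits "exampleof" and "bug" as separate tokens, which the analyzer posted in this thread does not. The helper names bulk_body, query_body and run_comparison are hypothetical. Going by Lucene's documented syntax, a quoted phrase requires its terms at adjacent positions by default, and in name:exampleof bug the name: prefix applies only to the first term, so the three queries would not be expected to behave identically.

```shell
#!/bin/sh
# Sketch only: assumes a node at localhost:9200 and an index "example" whose
# "name" field is analyzed into the separate tokens "exampleof" and "bug".
url="http://localhost:9200"
index="example"

# Bulk body: every action and every document sits on its own line.
bulk_body() {
  printf '%s\n' \
    '{ "index": { "_id": "1" } }' \
    '{ "name": "exampleof bug" }' \
    '{ "index": { "_id": "2" } }' \
    '{ "name": "exampleof sample bug" }'
}

# Wrap a Lucene query string in a query_string request body, escaping
# backslashes and double quotes so the JSON stays valid.
query_body() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{ "query": { "query_string": { "query": "%s" } } }\n' "$escaped"
}

# Index both documents, then run the three queries from the post.
run_comparison() {
  # --data-binary keeps the newlines that the bulk format depends on.
  bulk_body | curl -s -XPOST "$url/$index/example/_bulk?refresh=true" --data-binary @-
  for q in 'name:"exampleof bug"' 'name:exampleof bug' 'name:exampleof OR bug'; do
    echo "== $q"
    query_body "$q" | curl -s -XPOST "$url/$index/example/_search?pretty=true" -d @-
  done
}

# Usage: run_comparison
```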



(ben) #8

Any idea why single quotes work?

This works but doesn't match the lucene query syntax.

curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
"query": {
"query_string": {
"query": "name:''exampleof bug''"
}
}
}
'
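
A likely explanation for why the single-quoted version "works" lies in the shell rather than in Elasticsearch. Inside a single-quoted shell string, `''` closes the quote and immediately reopens it, so it adds no characters to the argument. The sketch below needs no Elasticsearch; it only prints the request body that curl would receive (the variable name `body` is illustrative).

```shell
#!/bin/sh
# Inside a single-quoted shell string, '' ends the quoted section and
# starts a new one, so it contributes nothing to the resulting argument.
body='{"query":{"query_string":{"query":"name:''exampleof bug''"}}}'

# Print exactly what curl would send with -d.
printf '%s\n' "$body"
# prints: {"query":{"query_string":{"query":"name:exampleof bug"}}}
```

The query that reaches Elasticsearch is therefore `name:exampleof bug` with no quotes at all. As far as I can tell, query_string then treats it as two separate clauses rather than a phrase, so the request is not equivalent to the double-quoted form.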

On Thursday, August 21, 2014 10:09:29 AM UTC-7, Ivan Brusic wrote:

The query_string query is a phrase query for "\"exampleof bug\"".
The term query is looking for the single token "exampleof bug".

The query parser will not use your tokenizer to parse the phrase. It will
tokenize based on whitespace and then apply the filters to each term. Your
index does not contain the token "exampleof", and your analyze API example
confirms it. This query parser issue is a long-standing one in Lucene.

--
Ivan

On Thu, Aug 21, 2014 at 9:56 AM, ben billu...@gmail.com wrote:

But the query is this...

name:"exampleof bug"

This should find an exact match in the field name. That exact match token
exists.

The "Fields" section of the Lucene query syntax documentation shows that a
double quote is the correct character for this:
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
query_string finds the document when I use single quotes, but that doesn't
match the Lucene query documentation.

Thanks!

On Thursday, August 21, 2014 9:52:16 AM UTC-7, Ivan Brusic wrote:

I suspect the issue is the way the query parser works. The query phrase
"exampleof bug" will be parsed into a query for the tokens "exampleof" and
"bug" adjacent to each other. You do not have two such tokens; instead you
have one token with the value "exampleof bug", a single token with a space
in it. According to Lucene, they are not the same thing. You would need an
analyzer that creates the tokens "exampleof" and "bug".

Cheers,

Ivan

On Thu, Aug 21, 2014 at 8:47 AM, ben billu...@gmail.com wrote:

Also meant to include this in the script.

echo "query_string query using single quote which does not match lucene
query documentation"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
"query": {
"query_string": {
"query": "name:''exampleof bug''"
}
}
}
'



(Ivan Brusic) #9

In general, if you are using the keyword tokenizer or non-analyzed fields,
then query string queries should probably not be used. Phrase queries and
the keyword tokenizer also do not mix well.

Your OR queries succeed because "bug" is a token in your index.

--
Ivan
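
The explanation given in this thread can be illustrated with a toy lookup in plain shell. This is only a sketch of the reasoning, not of how Lucene checks positions or scores. The token list is an assumption based on what the thread reports for "ExampleOf Bug" and should be confirmed with the _analyze call; `indexed_tokens` and `has_token` are hypothetical names.

```shell
#!/bin/sh
# Tokens assumed to be indexed for "ExampleOf Bug" (confirm with _analyze).
indexed_tokens='exampleof bug
example
of
bug'

# has_token TERM: succeed if TERM is exactly one of the indexed tokens.
has_token() {
    printf '%s\n' "$indexed_tokens" | grep -Fxq -- "$1"
}

# A term query looks up one exact token, space included.
has_token 'exampleof bug' && echo 'term "exampleof bug": found'

# Per the thread, the phrase is split on whitespace before lookup.
for piece in exampleof bug; do
    if has_token "$piece"; then
        echo "phrase piece \"$piece\": found"
    else
        echo "phrase piece \"$piece\": missing"
    fi
done
```

Under these assumptions the phrase cannot match because "exampleof" is missing, while an OR of the same pieces still succeeds through "bug".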



(ben) #10

The ES documentation talks about escape characters, and space is one of
them. It seems that if you escaped the space in the query with "\ ", the
parser would ignore it during parsing.

Thanks for your help.
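
One possible reason an escaped space "fails to parse" is the JSON layer: `\ ` is not a valid JSON escape, so the backslash itself has to be doubled in the request body. The sketch below builds such a body with a quoted here-document so the shell leaves the backslashes alone. Whether the resulting query actually matches the document has not been verified here, and `body` is an illustrative name.

```shell
#!/bin/sh
# A quoted here-document delimiter ('EOF') stops the shell from
# interpreting backslashes, so the body is passed through literally.
body=$(cat <<'EOF'
{
  "query": {
    "query_string": {
      "query": "name:exampleof\\ bug"
    }
  }
}
EOF
)

# Show the exact JSON that would be sent.
printf '%s\n' "$body"

# Hypothetical request, reusing $url and $defaultIndex from the script:
# curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d "$body"
```

After JSON decoding, the query parser would receive `name:exampleof\ bug`, which is the escaped form the documentation describes.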



(Ivan Brusic) #11

One more thing! The match query does not go through the query parser phase.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_comparison_to_query_string_field

curl -XPOST "http://localhost:9200/example/example/_search?pretty=true" -d '
{
"query": {
"match": {
"name": "\"exampleof bug\""
}
}
}
'



(ben) #12

Interesting! If that supports straight lucene syntax then this is golden.
Our system must support full lucene syntax along with "fuzzy" searches
which is why I've been using query_string.

Thanks!

On Thursday, August 21, 2014 10:55:36 AM UTC-7, Ivan Brusic wrote:

One more thing! The match query does not go through the query parser phase.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_comparison_to_query_string_field

curl -XPOST "http://localhost:9200/example/example/_search?pretty=true"
-d '
{
"query": {
"match": {
"name": ""exampleof bug""
}
}
}
'

On Thu, Aug 21, 2014 at 10:49 AM, ben <billu...@gmail.com <javascript:>>
wrote:

In the ES documentation is talks about escape characters and space is one
of them. Seems like if you escaped the query with a "\ " it would ignore
that during the parsing.

Thanks for your help.

On Thursday, August 21, 2014 10:42:32 AM UTC-7, Ivan Brusic wrote:

In general, if you are using the keyword tokenizer or non analyzed
fields, then query string queries should probably not be used. Phrase
queries and the keyword tokenizer also do not mix well.

Your OR queries succeed because "bug" is a token in your index.

--
Ivan

On Thu, Aug 21, 2014 at 10:26 AM, ben billu...@gmail.com wrote:

Any idea why single quotes work?

This works but doesn't match the lucene query syntax.

curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
"query": {
"query_string": {
"query": "name:''exampleof bug''"
}
}
}
'

On Thursday, August 21, 2014 10:09:29 AM UTC-7, Ivan Brusic wrote:

The query string query is a phrase query ""exampleof bug""
The term query is looking for a single token "exampleof bug"

The query parser will not use your tokenizer to parse the phrase. It
will tokenize based on whitespace and then apply the filters to each term.
Your index does not contain the token "exampleof" and your analyze API
example confirms it. The issue of the query parser is a long standing one
in Lucene.

--
Ivan

On Thu, Aug 21, 2014 at 9:56 AM, ben billu...@gmail.com wrote:

But the query is this...

name:"exampleof bug"

This should find an exact match in the field name. That exact match
token exists.

The Lucene syntax documentation, under the "Fields" section, shows that a
double quote is the correct character for this:
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
The document found by the term query is also found by query_string when
using single quotes, but that doesn't match the Lucene query documentation.

Thanks!

On Thursday, August 21, 2014 9:52:16 AM UTC-7, Ivan Brusic wrote:

I suspect the issue is the way the query parser works. The query
phrase "exampleof bug" will be parsed into a query for the tokens
"exampleof" and "bug" that are adjacent to each other. The issue is that
you do not have two such tokens, instead you have a token with the
value "exampleof bug", which is a single token with a space in it.
According to Lucene, they are not the same thing. You would need to create
an analyzer that would create the tokens "exampleof" and "bug".
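
The difference between one token containing a space and two adjacent tokens can be illustrated with a rough simulation. This sketch uses standard shell tools, not Lucene, and it ignores the extra sub-tokens that word_delimiter would add; the variable names are illustrative only.

```shell
#!/bin/sh
text='ExampleOf Bug'

# keyword-style: the whole input stays one token, then it is lowercased
keyword_tokens=$(printf '%s\n' "$text" | tr '[:upper:]' '[:lower:]')

# whitespace-style: split on spaces (one token per line), then lowercase
whitespace_tokens=$(printf '%s\n' "$text" | tr ' ' '\n' | tr '[:upper:]' '[:lower:]')

printf 'keyword token: [%s]\n' "$keyword_tokens"
printf 'whitespace tokens:\n%s\n' "$whitespace_tokens"
```

The first form yields the single value "exampleof bug", which is what a term query looks up. The second yields "exampleof" and "bug" as separate tokens, which is what a phrase of two adjacent terms needs.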

Cheers,

Ivan

On Thu, Aug 21, 2014 at 8:47 AM, ben billu...@gmail.com wrote:

Also meant to include this in the script.

echo "query_string query using single quotes, which does not match
the Lucene query documentation"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
"query": {
"query_string": {
"query": "name:''exampleof bug''"
}
}
}
'

On Thursday, August 21, 2014 8:39:14 AM UTC-7, ben wrote:

I have attached a short bash script to recreate the situation. I
have a fairly simple custom analyzer that I want to break on camel case, so
lowercase is last. Using the _analyze endpoint I can see the token I am
searching for is generated by the analyzer; however, searching for it with
query_string yields a different result than a term query. I put comments in
the script to explain in more detail.

Thanks for any help!

#!/bin/sh

url="http://localhost:9200"
defaultIndex="example"

echo "Start over...this will fail the first time the script is run
since the index will not exist"
curl -XDELETE "$url/$defaultIndex?refresh=true"

echo "Create index with custom analyzer"
curl -XPUT "$url/$defaultIndex" -d '{
"index": {
"analysis": {
"filter": {
"my_worddelim": {
"type": "word_delimiter",
"split_on_case_change": true,
"preserve_original": true
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": [ "html_strip" ],
"tokenizer": "keyword",
"filter": [ "stop", "my_worddelim", "lowercase" ]
}
}
}
}
}'

echo

curl -XPUT "$url/$defaultIndex/example/_mapping" -d '{
"example" : {
"properties" : {
"name": {
"type" : "multi_field",
"path": "just_name",
"fields" : {
"name": { "type": "string", "analyzer":
"my_analyzer" },
"sample" : {"type" : "string", "index" : "not_analyzed" },
"sample_name" : {"type" : "string",
"analyzer": "my_analyzer" }
}
}
}
}
}'

echo "Shows the lowercase token exampleofbug is generated"
curl -XGET "$url/$defaultIndex/_analyze?analyzer=my_analyzer&pretty=true" -d 'ExampleOf Bug'

echo "Post the document (haven't tried with non-bulk request)"
curl -XPOST "$url/$defaultIndex/example/_bulk?refresh=true" -d '
{ "index" : {"_index":"example","_type":"example","_id":"2169167","_version_type":"internal","_timestamp":0} }
{"name":"ExampleOf Bug"}
'

echo

echo "query_string query is unable to find token in the name field
even though the path is just_name. i also tried escaping space per
documentation and it fails to parse"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
"query": {
"query_string": {
"query": "name:\"exampleof bug\""
}
}
}
'

echo

echo "Can successfully find token in name field that I was unable
to find with query_string"
curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
{
"query": {
"term": {
"name": "exampleof bug"
}
}
}
'

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/fb920c7a-dce5-4272-8b80-1f148e96f8ae%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


(system) #13