Spaces in terms in request body make the query return no results


(Alexey Kotlyarov) #1

Given a simple index:

curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{"message": "test message"}'

A query for "test" returns the tweet:

curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty' -d '{"query": {"prefix": {"message": "test"}}}'
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty' -d '{"query": {"prefix": {"message": "test"}}}'

However, if I search for "test ", there are no results:

curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty' -d '{"query": {"prefix": {"message": "test "}}}'
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty' -d '{"query": {"prefix": {"message": "test "}}}'

Yet the same query appears to work fine if put into the URL (using wget
rather than curl, because curl tries to expand the braces):

wget -O - 'http://localhost:9200/twitter/tweet/_search?{"query":{"prefix":{"message":"test%20"}}}'

How do I make the queries with "test " work when they are supplied in the
request body?

My Elasticsearch version is 1.1.1.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e17173c2-7e9d-429f-b36e-e897a695b56e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Radu Gheorghe) #2

Hi Alexey,

Your message field is analyzed by default using the Standard Analyzer:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html

This means your "test message" will become ["test", "message"].

On the other hand, the prefix query isn't analyzed, which means "test" will
match but "test " won't, because you have no term that begins with that
string.
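
To make the mismatch concrete, here is a rough Python sketch. The helpers standard_analyze and prefix_matches are made up for illustration; they only approximate what the Standard Analyzer and a prefix query do and are not Elasticsearch code.

```python
import re


def standard_analyze(text):
    """Rough stand-in for the Standard Analyzer: lowercase the text and
    split it on runs of non-word characters (an assumption, not the real
    tokenizer)."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def prefix_matches(terms, prefix):
    """A prefix query is not analyzed, so the raw prefix (including any
    trailing space) is compared against each indexed term."""
    return [term for term in terms if term.startswith(prefix)]


terms = standard_analyze("test message")
print(terms)                           # ['test', 'message']
print(prefix_matches(terms, "test"))   # ['test']
print(prefix_matches(terms, "test "))  # []
```

No indexed term contains a space, so a prefix that ends in one has nothing to match.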

One solution for this is to index your message field as not_analyzed. This
will only generate the term "test message" which will match both "test" and
"test " prefixes. However, if you search for the "test" term, it won't
match because you have no such term.

You can have the best of both worlds by indexing the same text multiple
times with multiple settings:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html
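
A sketch of what such a mapping and query might look like in the 1.x mapping syntax, built as Python dictionaries so the JSON can be printed and passed to curl -d. The sub-field name "raw" is an assumption chosen for this example, and the index would need to be created with this mapping before the document is indexed.

```python
import json

# Index-creation body: "message" stays analyzed for normal searches, and a
# hypothetical "raw" sub-field keeps the whole string as one not_analyzed term.
mapping = {
    "mappings": {
        "tweet": {
            "properties": {
                "message": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }
}

# Prefix query against the sub-field; the trailing space is kept as-is
# because the prefix query does not analyze its input.
prefix_query = {"query": {"prefix": {"message.raw": "test "}}}

print(json.dumps(mapping))
print(json.dumps(prefix_query))
```

With this layout, queries on "message" would still see the terms "test" and "message", while prefix queries on "message.raw" would see the single term "test message".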

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Alexey Kotlyarov) #3

Why does the query work if given in the request query string then?



(Radu Gheorghe) #4

I think that isn't a valid query. I've just tried it and if I put "bla" in
there I still get the result back. Basically, it will run a match_all
query. It's like doing this:

curl 'http://localhost:9200/twitter/tweet/_search?bla'

If you want to do a URI search, you need to put things in the "q"
parameter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-uri-request.html

But note that a URI search will run a query_string query, which is analyzed:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
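
As a rough illustration of the "q" form, the sketch below builds such a URL in Python. The wildcard query message:test* is only an example I picked, not something from the original request, and how it matches depends on how query_string handles it.

```python
from urllib.parse import urlencode

base = "http://localhost:9200/twitter/tweet/_search"

# The query_string syntax goes into the "q" parameter. urlencode()
# percent-escapes characters such as ":" and "*" so the URL stays valid.
params = {"q": "message:test*", "pretty": "true"}
url = base + "?" + urlencode(params)

print(url)
```

The printed URL can then be fetched with curl, as long as it is wrapped in single quotes on the command line.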



(Alexey Kotlyarov) #5

I have tested this; you're right, the query submitted in this way is
equivalent to a "match_all". Thank you for the explanations!



(Radu Gheorghe) #6

Great! You're welcome

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


