Lucene-based query string, analyzers and wildcard usage


(pulkitsinghal) #1

Hello,

My question might end up being a flaw in my understanding of Lucene
but please humor me and let me know what is going on here.

I have the following data indexed:
{"author":"PULKIT SINGHAL"}

But when I try either of the following searches, I get no results:
curl -XGET 'http://localhost:9200/bbyopen/_search?q=author:PULKIT'
curl -XGET 'http://localhost:9200/bbyopen/_search?q=author:PULKIT*'

The only time I get a result is with:
curl -XGET 'http://localhost:9200/bbyopen/_search?q=author:PULKIT%20SINGHAL'

Can anyone help me understand what is going on here?

I checked how the data is being analyzed by running:
tiklup-mac:~ pulkitsinghal$ curl -XGET 'http://localhost:9200/bbyopen/_analyze?pretty=true' -d 'PULKIT SINGHAL'

And it is understandably being divided into two terms - "pulkit" and
"singhal":
{
  "tokens" : [ {
    "token" : "pulkit",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "",
    "position" : 1
  }, {
    "token" : "singhal",
    "start_offset" : 7,
    "end_offset" : 14,
    "type" : "",
    "position" : 2
  } ]
}

Then why wouldn't this query match:
curl -XGET 'http://localhost:9200/bbyopen/_search?q=author:PULKIT'
It has one of the terms from the analyzed results in it.

Also, what about the query with the wildcard:
curl -XGET 'http://localhost:9200/bbyopen/_search?q=author:PULKIT*'
Why is it failing?


(Karussell) #2

Did you try it with the query_string query too?

http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html
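
For reference, a minimal sketch of what that could look like, assuming the same localhost:9200 node and "bbyopen" index as in the original post. It sends the wildcard query as a query_string query in the request body rather than through the ?q= URL parameter. The ES_URL, INDEX and QUERY variable names are only illustrative, and the Content-Type header is included because newer Elasticsearch releases require it (older ones ignore it).

```shell
# Assumptions: a node on localhost:9200 and the "bbyopen" index from the post.
ES_URL="http://localhost:9200"
INDEX="bbyopen"

# query_string query restricted to the "author" field.
QUERY='{"query":{"query_string":{"default_field":"author","query":"PULKIT*"}}}'

# --max-time stops curl from hanging if nothing is listening on the port.
curl -s --max-time 5 -XGET "$ES_URL/$INDEX/_search?pretty=true" \
  -H 'Content-Type: application/json' \
  -d "$QUERY" || echo "no response from $ES_URL"
```

If this returns hits where the ?q= form does not, the difference is likely in how the query text is parsed rather than in the indexed data.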

On 18 Oct., 17:32, pulkitsinghal pulkitsing...@gmail.com wrote:


(Shay Banon) #3

Is the field analyzed or not? Can you gist a full recreation as explained
here: http://www.elasticsearch.org/help.
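
One way to check this, sketched under the assumption that the node and index are the ones from the original post: fetch the index mapping and look at the entry for "author". In mappings of this era, a string field with "index" : "not_analyzed" is stored as one unmodified term, so neither a lowercased "pulkit" nor a prefix of it would match "PULKIT SINGHAL". Note also that calling _analyze without naming an analyzer or field uses the index's default analyzer, which is not necessarily the one applied to "author". The variable names below are only illustrative.

```shell
# Assumptions: a node on localhost:9200 and the "bbyopen" index from the post.
ES_URL="http://localhost:9200"
INDEX="bbyopen"
MAPPING_URL="$ES_URL/$INDEX/_mapping?pretty=true"

# Print the mapping; inspect the "author" field for "index" and "analyzer" settings.
# --max-time stops curl from hanging if nothing is listening on the port.
curl -s --max-time 5 -XGET "$MAPPING_URL" || echo "no response from $ES_URL"
```

If "author" shows no explicit "index" or "analyzer" setting, the field should be using the default analysis, and the cause is probably elsewhere.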

On Tue, Oct 18, 2011 at 5:32 PM, pulkitsinghal pulkitsinghal@gmail.com wrote:


(system) #4