How to make a specific edgeNGram query


(Neil) #1

I have the following index

curl -XPUT 'http://localhost:9200/test/' -d '{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "analysis": {
        "analyzer": {
            "containsText": {
                "tokenizer": "whitespace",
                "filter": ["asciifolding", "lowercase", "autocomplete"]
            }
        },
        "filter": {
            "autocomplete": {"type": "edgeNGram", "min_gram": "1", "max_gram": "100", "side": "front"}
        }
    },
    "mappings": {
        "program": {
            "properties": {
                "title": {"type": "string", "store": "yes", "index": "analyzed",
                          "term_vector": "with_positions_offsets", "analyzer": "containsText"}
            }
        }
    }
}'

curl -XPUT http://localhost:9200/test/program/1652094 -d '{
"title": "James Franco"
}'

So my question is why does this return a result?

curl -XGET http://localhost:9200/test/program/_search -d '{
"query" : {
"queryString" : {
"default_field" : "title",
"query" : "james i",
"analyzer" : "containsText"
}
}
}'

I don't want it to, since the query doesn't match on the last name. What do I
need to change to get my expected result?
I would like to have my query string analyzed the same way as my
indexed string.

If it helps, I was able to do this in Solr as such:

  <fieldType name="containsText" class="solr.TextField" positionIncrementGap="100">

Thanks,

Neil


(Shay Banon) #2

Your index creation is wrong: the analysis part needs to come under the
"settings" element. I created a gist that shows it:
https://gist.github.com/1268256. You might not have seen a proper error
response from the create index request because of this bug
(https://github.com/elasticsearch/elasticsearch/issues/1359).
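
As a rough illustration of the nesting described above, the request body can be built as a Python dictionary so that the placement of "analysis" is explicit. This is only a sketch: the helper names build_index_body and analysis_is_misplaced are hypothetical and are not part of Elasticsearch or of the gist; the values are copied from the original post.

```python
import json


def build_index_body():
    """Build the create-index body with "analysis" nested under "settings"."""
    analysis = {
        "analyzer": {
            "containsText": {
                "tokenizer": "whitespace",
                "filter": ["asciifolding", "lowercase", "autocomplete"],
            }
        },
        "filter": {
            "autocomplete": {
                "type": "edgeNGram",
                "min_gram": "1",
                "max_gram": "100",
                "side": "front",
            }
        },
    }
    title = {
        "type": "string",
        "store": "yes",
        "index": "analyzed",
        "term_vector": "with_positions_offsets",
        "analyzer": "containsText",
    }
    return {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "analysis": analysis,  # nested here, not at the top level
        },
        "mappings": {"program": {"properties": {"title": title}}},
    }


def analysis_is_misplaced(body):
    """Return True when "analysis" is a top-level key rather than under "settings"."""
    return "analysis" in body and "analysis" not in body.get("settings", {})


body = build_index_body()
print(analysis_is_misplaced(body))
print(json.dumps(body, indent=2))
```

The printed JSON could then be passed as the -d payload of the create-index request.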

Also, you ask why you get a result for the query you execute. You will get a
result for it with both the standard analyzer and the ngram-based filter
one.

On Thu, Oct 6, 2011 at 6:59 AM, Neil neilmatthewlott@gmail.com wrote:



(Neil) #3

I'm sorry about goofing up the analysis. However, even when the
analyzer is defined correctly (verified using the mapping), the query
still returns a result:

curl -XPUT 'http://localhost:9200/test/' -d '{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "analysis": {
            "analyzer": {
                "containsText": {
                    "tokenizer": "whitespace",
                    "filter": ["asciifolding", "lowercase", "autocomplete"]
                }
            },
            "filter": {
                "autocomplete": {"type": "edgeNGram", "min_gram": "1", "max_gram": "100", "side": "front"}
            }
        }
    },
    "mappings": {
        "program": {
            "properties": {
                "title": {"type": "string", "store": "yes", "index": "analyzed",
                          "term_vector": "with_positions_offsets", "analyzer": "containsText"}
            }
        }
    }
}'

curl -XGET 'http://localhost:9200/test/program/_mapping'
{"program":{"properties":{"title":
{"store":"yes","analyzer":"containsText","term_vector":"with_positions_offsets","type":"string"}}}}

curl -XPUT http://localhost:9200/test/program/1652094 -d '{
"title": "James Franco"
}'

When I run the following query, it returns a result.

curl -XGET http://localhost:9200/test/program/_search -d '{
"query" : {
"queryString" : {
"default_field" : "title",
"query" : "james i",
"analyzer" : "containsText"
}
}
}'


(Shay Banon) #4

Why shouldn't it return a result? Based on your analyzer definition, it will
return a result. Use the analyze API to see what "James Franco" and "james
i" are broken into.
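
To see roughly what that analysis produces without a running cluster, the containsText chain can be approximated in plain Python. This is a simulation under stated assumptions, not Elasticsearch code: ascii_fold is a crude stand-in for the asciifolding filter, edge_ngrams mimics a front-side edgeNGram filter with min_gram 1 and max_gram 100, and all function names are made up for this sketch.

```python
import unicodedata


def ascii_fold(text):
    """Crude stand-in for asciifolding: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=1, max_gram=100):
    """Return the front-anchored prefixes of token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def contains_text_analyze(text):
    """Approximate the chain: whitespace tokenizer, asciifolding, lowercase, edge n-grams."""
    terms = []
    for token in text.split():
        terms.extend(edge_ngrams(ascii_fold(token).lower()))
    return terms


indexed = contains_text_analyze("James Franco")
query = contains_text_analyze("james i")
shared = sorted(set(indexed) & set(query))

print(indexed)  # terms stored for the document
print(query)    # terms produced from the query string
print(shared)   # terms the two have in common
```

In this approximation the query shares every prefix of "james" with the indexed title, while "i" matches nothing, which is consistent with the document being returned whenever a match on any single term is enough.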

On Thu, Oct 6, 2011 at 9:04 PM, Neil neilmatthewlott@gmail.com wrote:



(Neil) #5

Thanks, I did.

What is happening is this:

"james i" was being broken up into a Boolean query with two clauses,
each with its occur set to SHOULD:

QueryStringQueryParser:184 value of Query object
(title:j title:ja title:jam title:jame title:james) title:c

This means the document is returned as soon as either clause matches.
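
The effect of those optional clauses can be sketched in a few lines of Python. This is a toy model, not Lucene: each query word becomes one clause made of its edge n-grams, and a clause counts as matched if any of its terms was indexed. The require_all flag is a hypothetical contrast added for illustration only; it is not something proposed in this thread.

```python
def edge_ngrams(token):
    """Front-anchored prefixes of a lowercased token (min_gram 1, no upper cap)."""
    token = token.lower()
    return [token[:n] for n in range(1, len(token) + 1)]


def clauses(text):
    """One clause (a list of terms) per whitespace-separated word."""
    return [edge_ngrams(word) for word in text.split()]


def matches(doc_text, query_text, require_all=False):
    """Toy Boolean match: optional clauses by default, every clause if require_all."""
    indexed_terms = {term for clause in clauses(doc_text) for term in clause}
    hits = [any(term in indexed_terms for term in clause)
            for clause in clauses(query_text)]
    return all(hits) if require_all else any(hits)


# Optional (SHOULD-style) clauses: the "james" clause alone is enough.
print(matches("James Franco", "james i"))

# Requiring every clause: the "i" clause finds nothing in "James Franco".
print(matches("James Franco", "james i", require_all=True))
```

With optional clauses the first call reports a match, because the clause built from "james" succeeds on its own even though nothing in the title begins with "i".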

What I really want is a PhraseQuery, which is only produced when the
query string is surrounded by quotes:

curl -XGET http://localhost:9200/test/program/_search -d '{
"query" : {
"queryString" : {
"default_field" : "title",
"query" : "\"james i\"",
"analyzer" : "containsText"
}
}
}'
