How to use Elasticsearch to implement an autocompleter?


(dark_shadow) #1

Hi,

I'm trying to use Elasticsearch to implement an autocompleter for my
college project, the way some travel websites do, but I'm facing some
issues with the implementation.

I'm using the following settings and mapping:

curl -XPUT 'http://localhost:9200/auto_index/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "analysis" : {
        "analyzer" : {
          "str_search_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
          },
          "str_index_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
          }
        },
        "filter" : {
          "suggestions_shingle" : {
            "type" : "shingle",
            "min_shingle_size" : 2,
            "max_shingle_size" : 5
          },
          "edgengram" : {
            "type" : "edgeNGram",
            "min_gram" : 2,
            "max_gram" : 30,
            "side" : "front"
          },
          "mynGram" : {
            "type" : "nGram",
            "min_gram" : 2,
            "max_gram" : 30
          }
        }
      },
      "similarity" : {
        "index" : {
          "type" : "org.elasticsearch.index.similarity.CustomSimilarityProvider"
        },
        "search" : {
          "type" : "org.elasticsearch.index.similarity.CustomSimilarityProvider"
        }
      }
    }
  }
}'
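
To get a feel for what the index analyzer above emits, here is a rough plain-Java simulation of the lowercase, shingle (2 to 5 words) and front edge n-gram (2 to 30 characters) chain. It is only a sketch, not Lucene code: the class and method names are made up, asciifolding is skipped, whitespace splitting stands in for the standard tokenizer, and it assumes the shingle filter also keeps single words (which I believe is its default).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that approximates the analyzer chain in the settings above.
final class AnalyzerSketch {

    // Single words plus word shingles of minSize..maxSize words.
    static List<String> shingles(String[] words, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            out.add(words[start]); // unigram (assumed to be kept)
            for (int size = minSize; size <= maxSize && start + size <= words.length; size++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return out;
    }

    // Front edge n-grams: prefixes of the token from minGram to maxGram characters.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            out.add(token.substring(0, len));
        }
        return out;
    }

    // lowercase -> shingle(2..5) -> edge n-gram(2..30), de-duplicated.
    static Set<String> analyze(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        Set<String> terms = new LinkedHashSet<>();
        for (String shingle : shingles(words, 2, 5)) {
            terms.addAll(edgeNGrams(shingle, 2, 30));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = analyze("things to do in goa");
        System.out.println(terms.size() + " distinct terms: " + terms);
    }
}
```

Running the same chain over the query "things to do in" yields terms that every "things to do in X" document also contains, so the text match alone probably does little to separate those documents from one another.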

curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
  "autocomplete" : {
    "_boost" : {
      "name" : "po",
      "null_value" : 4.0
    },
    "properties" : {
      "ad" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "category" : {
        "type" : "string",
        "include_in_all" : false
      },
      "cn" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "ctype" : {
        "type" : "string",
        "search_analyzer" : "keyword",
        "index_analyzer" : "keyword",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "eid" : {
        "type" : "string",
        "include_in_all" : false
      },
      "st" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "co" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "po" : {
        "type" : "double",
        "boost" : 4.0
      },
      "en" : {
        "type" : "boolean"
      },
      "_oid" : {
        "type" : "long"
      },
      "text" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "url" : {
        "type" : "string"
      }
    }
  }
}'

Then, in my Java code, I build the query like this:

    // Multiply the text score by the popularity field "po" (1 if missing or zero).
    String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
    QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
            QueryBuilders.queryString(query)
                .field("text", 30)   // boost the text field by 30
                .field("ad")
                .field("st")
                .field("cn")
                .field("co")
                .defaultOperator(Operator.AND))
        .script(script);
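
The arithmetic of the script above can be checked outside Elasticsearch. The following is a minimal plain-Java sketch, with a made-up class name and made-up scores, that mirrors the expression `_score * (po missing or 0 ? 1 : po)`; it does not call any Elasticsearch API.

```java
// Hypothetical stand-in for the custom score script; not Elasticsearch code.
final class PopularityScoreSketch {

    // textScore plays the role of _score; po is null when the field is missing.
    static double finalScore(double textScore, Double po) {
        double factor = (po == null || po == 0.0) ? 1.0 : po;
        return textScore * factor;
    }

    public static void main(String[] args) {
        // Made-up numbers: a weaker text match with high popularity
        // versus a stronger text match with low popularity.
        System.out.println(finalScore(1.0, 50.0));
        System.out.println(finalScore(3.0, 10.0));
        System.out.println(finalScore(3.0, null));
    }
}
```

Because popularity multiplies the text score directly, the final order depends as much on the stored `po` values as on the text match, so it may be worth checking what `po` each of the returned documents actually holds.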

Some explanation of fields:
text: contains statements like "things to do in goa"
ad: address
st: state
cn: city name
co: country

Now, if I type "things to do in" in my autocompleter box, I get these
results:

things to do in rann
things to do in bulandshahr
things to do in gondai
things to do in rewa
things to do in goa

But I want "things to do in goa" on top.

Earlier, I thought IDF in Elasticsearch was causing the problem, so I
overrode the default similarity and created a CustomSimilarity that sets
IDF to 1. But that still doesn't solve my problem. Instead, it started
putting results like this one on top:

things to do in toronto
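
For context on the IDF suspicion, here is a small plain-Java sketch of the IDF formula that Lucene's default TF-IDF similarity used at the time, as far as I recall it: `1 + ln(numDocs / (docFreq + 1))`. The class name and the document counts are made up for illustration.

```java
// Sketch of the classic Lucene IDF formula; the numbers below are hypothetical.
final class IdfSketch {

    // Rarer terms (small docFreq) get a larger IDF.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 10_000;
        // Assume a rarely mentioned place appears in 2 documents
        // and a popular one appears in 500.
        System.out.println("rare term idf:   " + idf(2, numDocs));
        System.out.println("common term idf: " + idf(500, numDocs));
    }
}
```

Under this formula a term that occurs in few documents scores higher than one that occurs in many, which would be consistent with less common place names outranking "goa" before the similarity was overridden.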

I think I may be doing something wrong in my index_analyzer and
search_analyzer. I tried other tokenizers and token filters in different
orders but couldn't find a solution.

I could have implemented a simple prefix autocompleter, but then there
would be little point in using Elasticsearch, since matching terms in the
middle of a sentence gives the user more flexibility. Also, in the travel
industry a person can search for the same thing in different ways. Instead
of typing exactly "things to do in", he or she might write "what are
the best things to do in" or "what are things to do", among many other
possibilities. A prefix autocompleter won't work effectively for that.
That's why I tried implementing the autocompleter with Elasticsearch, but
I'm not doing it the right way.
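
The difference between the two approaches can be sketched in plain Java. This is only an illustration with made-up names: `prefixMatch` stands for a simple prefix autocompleter, and `tokenMatch` stands for an AND query in which every typed word must be a prefix of some word in the suggestion.

```java
// Hypothetical comparison of prefix matching and per-word matching.
final class MatchSketch {

    // True only if the suggestion starts with exactly what was typed.
    static boolean prefixMatch(String suggestion, String typed) {
        return suggestion.toLowerCase().startsWith(typed.toLowerCase().trim());
    }

    // True if every typed word is a prefix of at least one word in the suggestion.
    static boolean tokenMatch(String suggestion, String typed) {
        String[] words = suggestion.toLowerCase().trim().split("\\s+");
        for (String q : typed.toLowerCase().trim().split("\\s+")) {
            boolean found = false;
            for (String w : words) {
                if (w.startsWith(q)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String suggestion = "what are the best things to do in goa";
        System.out.println(prefixMatch(suggestion, "things to do in"));
        System.out.println(tokenMatch(suggestion, "things to do in"));
    }
}
```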

For better results, I also introduced a popularity factor (po) that is
updated on every user click, so that a suggestion's score keeps increasing
in later searches through the custom score query. I also give the text
field a boost of 30 and lower weight to the other fields. But something is
still not right.

I guess I'm not using Elasticsearch's capabilities properly for my use
case. Can you please help me with this?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/39ce69bc-e2b8-4c27-9240-d6dbcc5a0656%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(joa) #2

You should look at the completion suggester added in 0.90.3 instead of
using edgengrams.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html



(dark_shadow) #3

But the problem still remains. The completion suggester only returns
results when there is an exact match, and as I mentioned earlier, a user
of a travel website can phrase a query in many different ways.

Thanks



(joa) #4

You can index a term in multiple ways with the completion suggester. See
http://www.elasticsearch.org/blog/you-complete-me/ (they show hotel
bookings as the use case!):

curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [
      "Mercure Hotel Munich",
      "Mercure Munich",
      "ADD OTHER WORD COMBINATIONS HERE..."
    ]
  }
}'
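
The "other word combinations" do not have to be written by hand. As a minimal sketch, assuming it is enough to let a suggestion be found from any word it contains, the inputs could be generated as every word-suffix of the phrase. The class and method names here are hypothetical and nothing in this snippet is part of the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds "input" values for a completion field.
final class SuggestInputSketch {

    // "a b c" -> ["a b c", "b c", "c"], so a prefix lookup can start at any word.
    static List<String> wordSuffixes(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            inputs.add(String.join(" ", Arrays.copyOfRange(words, i, words.length)));
        }
        return inputs;
    }

    public static void main(String[] args) {
        System.out.println(wordSuffixes("Mercure Hotel Munich"));
        System.out.println(wordSuffixes("things to do in goa"));
    }
}
```

This grows linearly with the number of words rather than covering every permutation, so it does not handle reordered or reworded queries.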

If by "exact matches" you mean that you also want fuzzy suggestions (e.g.
suggestions even with misspellings), you can set the fuzzy parameter:

curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
  "song-suggest" : {
    "text" : "n",
    "completion" : {
      "field" : "suggest",
      "fuzzy" : {
        "edit_distance" : 2
      }
    }
  }
}'
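
To make "edit_distance" concrete, here is a plain Levenshtein distance in Java (insertions, deletions and substitutions each cost 1). It is only an illustration with a made-up class name; the suggester's own fuzzy matching may count edits differently, for example by treating a swap of two adjacent characters as a single edit.

```java
// Plain Levenshtein distance; not the suggester's internal implementation.
final class EditDistanceSketch {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("munich", "munihc"));
        System.out.println(distance("goa", "gao"));
    }
}
```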



(dark_shadow) #5

Yeah, that's what I'm saying: I'd have to supply every possible
combination in the input field. Fuzzy matching is fine, but listing all
possible combinations isn't practical.



(system) #6