Hi,
I'm trying to use Elasticsearch to implement an autocompleter for my
college project, similar to the ones some travel websites use, but I'm
running into issues with the implementation.
I'm using the following index settings and mapping:
curl -XPUT 'http://localhost:9200/auto_index/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "analysis" : {
        "analyzer" : {
          "str_search_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
          },
          "str_index_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
          }
        },
        "filter" : {
          "suggestions_shingle" : {
            "type" : "shingle",
            "min_shingle_size" : 2,
            "max_shingle_size" : 5
          },
          "edgengram" : {
            "type" : "edgeNGram",
            "min_gram" : 2,
            "max_gram" : 30,
            "side" : "front"
          },
          "mynGram" : {
            "type" : "nGram",
            "min_gram" : 2,
            "max_gram" : 30
          }
        }
      },
      "similarity" : {
        "index" : {
          "type" : "org.elasticsearch.index.similarity.CustomSimilarityProvider"
        },
        "search" : {
          "type" : "org.elasticsearch.index.similarity.CustomSimilarityProvider"
        }
      }
    }
  }
}'
curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
  "autocomplete" : {
    "_boost" : {
      "name" : "po",
      "null_value" : 4.0
    },
    "properties" : {
      "ad" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "category" : {
        "type" : "string",
        "include_in_all" : false
      },
      "cn" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "ctype" : {
        "type" : "string",
        "search_analyzer" : "keyword",
        "index_analyzer" : "keyword",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "eid" : {
        "type" : "string",
        "include_in_all" : false
      },
      "st" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "co" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "po" : {
        "type" : "double",
        "boost" : 4.0
      },
      "en" : {
        "type" : "boolean"
      },
      "_oid" : {
        "type" : "long"
      },
      "text" : {
        "type" : "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms" : "true",
        "similarity" : "index"
      },
      "url" : {
        "type" : "string"
      }
    }
  }
}'
Then, in my Java code, I build the query like this:
// Multiply the relevance score by the popularity field "po";
// a missing or zero popularity is treated as 1.
String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
        QueryBuilders.queryString(query)
            .field("text", 30)
            .field("ad")
            .field("st")
            .field("cn")
            .field("co")
            .defaultOperator(Operator.AND))
    .script(script);
Some explanation of fields:
text: contains statements like "things to do in goa"
ad: address
st: state
cn: city name
co: country
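To make the scoring script above easier to reason about, its arithmetic can be written out in plain Java. This is only a sketch of what the script expression computes, not Elasticsearch code: the class name PopularityScoreSketch and the method boostedScore are hypothetical, and a null popularity stands in for doc['po'].empty.

```java
public class PopularityScoreSketch {

    /**
     * Mirrors the script expression:
     *   _score * (po missing ? 1 : po == 0.0 ? 1 : po)
     * A null popularity stands in for doc['po'].empty (an assumption of this sketch).
     */
    static double boostedScore(double score, Double popularity) {
        double factor = (popularity == null || popularity == 0.0) ? 1.0 : popularity;
        return score * factor;
    }

    public static void main(String[] args) {
        // Missing and zero popularity leave the score unchanged.
        System.out.println(boostedScore(1.5, null));
        System.out.println(boostedScore(1.5, 0.0));
        // Any other popularity multiplies the score directly.
        System.out.println(boostedScore(1.5, 4.0));
    }
}
```

Because the popularity is used as a plain multiplier, a value between 0 and 1 would lower the score, and a large value can outweigh the text relevance entirely.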
Now, if I type "things to do in" in my autocompleter box, I get
these results:
things to do in rann
things to do in bulandshahr
things to do in gondai
things to do in rewa
things to do in goa
But I want "things to do in goa" on top.
At first I thought idf in Elasticsearch was causing the problem, so I
overrode the default similarity with a CustomSimilarity that sets idf to 1.
But that still doesn't solve my problem. Instead, it started returning
"things to do in toronto" on top.
I think maybe I'm doing something wrong in my index_analyzer and
search_analyzer. I tried other tokenizers and token filters in different
orders but could not find a solution.
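To make the analyzer question concrete, here is a rough sketch in plain Java of the kind of terms a lowercase + shingle (2 to 5 words) + front edge n-gram (2 to 30 characters) chain produces for one phrase. It is an approximation under stated assumptions, not Lucene's implementation: the class AnalyzerChainSketch and its methods are hypothetical, whitespace splitting stands in for the standard tokenizer, asciifolding is skipped, unigrams are assumed to be kept alongside the shingles, and token order and positions are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnalyzerChainSketch {

    // Unigrams plus word shingles of minSize..maxSize words, joined by a space.
    static List<String> shingles(List<String> words, int minSize, int maxSize) {
        List<String> out = new ArrayList<>(words);
        for (int start = 0; start < words.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= words.size(); size++) {
                out.add(String.join(" ", words.subList(start, start + size)));
            }
        }
        return out;
    }

    // Front edge n-grams of one token, minGram..maxGram characters long.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            out.add(token.substring(0, len));
        }
        return out;
    }

    // Lowercase, split on whitespace, shingle, then edge n-gram every shingle.
    static List<String> analyze(String text) {
        List<String> words = Arrays.asList(text.toLowerCase().split("\\s+"));
        List<String> grams = new ArrayList<>();
        for (String shingle : shingles(words, 2, 5)) {
            grams.addAll(edgeNGrams(shingle, 2, 30));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Print the approximate terms stored for one suggestion.
        System.out.println(analyze("things to do in goa"));
    }
}
```

Under these assumptions every "things to do in ..." suggestion yields the same short grams ("th", "thi", "to", "do", "in", "things to", and so on), so they all share most of their terms and differ only in the grams that reach the place name.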
I could have implemented a simple prefix autocompleter, but then there
would be little point in using Elasticsearch, since matching terms in the
middle of a sentence gives the user more flexibility. Also, in the travel
industry a person can search for the same thing in different ways. For
example, instead of typing exactly "things to do in", he/she might write
"what are the best things to do in" or "what are things to do", among many
other possibilities. A prefix autocompleter won't handle that effectively.
That's why I tried implementing the autocompleter with Elasticsearch, but
I'm not doing it the right way.
For better results, I also introduced a popularity factor that is updated
on every user click, so that a suggestion's score keeps increasing in later
searches through the custom score query. I also give the text field a boost
of 30 and leave the other fields at the default weight. But something is
still not right.
I guess I'm not using Elasticsearch's capabilities properly for my use
case. Can you please help me with this?
Thanks