Autocomplete implementation


(Abdelhamid Cherif) #1

Hello,

I want to implement autocomplete with Elasticsearch and I'm unable to get it working.
I'd like something like the following.
My indexed strings are, e.g.:

"Developpeur Java"
"Developpeur C#"
"Je suis Developpeur"
"Je suis écrivan"
"Il est developpeur C++"

For input "developpeur", I want as output:
"developpeur Java"
"developpeur C#"
"developpeur C++"

For input "suis", I want as output:
"suis developpeur"
"suis écrivan"

I tried to achieve this using the completion suggester.

Here's the Elasticsearch version I'm using:
"number": "6.2.2",
"build_hash": "10b1edd",
"build_date": "2018-02-16T19:01:30.685723Z",
"build_snapshot": false,
"lucene_version": "7.2.1",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"

The mapping:
{
  "settings": {
    "number_of_shards": "1",
    "analysis": {
      "filter": {
        "prefix_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        },
        "ngram_filter": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "hackwillbereplacedatindexcreation,hackwillbereplacedatindexcreation"
          ]
        },
        "french_stop": {
          "type": "stop",
          "stopwords": "_french_"
        }
      },
      "analyzer": {
        "word": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "french_stop"
          ],
          "char_filter": []
        },
        "prefix": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "synonym_filter",
            "prefix_filter"
          ],
          "char_filter": []
        },
        "ngram_with_synonyms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "synonym_filter",
            "ngram_filter"
          ],
          "char_filter": []
        },
        "ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "ngram_filter"
          ],
          "char_filter": []
        }
      }
    }
  },
  "mappings": {
    "training": {
      "properties": {
        "id": {
          "type": "text",
          "index": false
        },
        "label": {
          "type": "text",
          "index_options": "docs",
          "copy_to": "full_label",
          "analyzer": "word",
          "fields": {
            "prefix": {
              "type": "text",
              "index_options": "docs",
              "analyzer": "prefix",
              "search_analyzer": "word"
            },
            "ngram": {
              "type": "text",
              "index_options": "docs",
              "analyzer": "ngram_with_synonyms",
              "search_analyzer": "ngram"
            }
          }
        },
        "labelSuggest": {
          "type": "completion",
          "analyzer": "word"
        }
      }
    }
  }
}

Then, when I create the index with my data, I do this (this is the body of the PUT call made to the ES API; I'm using Python for this):

body = {
    "label": r["title"],
    "labelSuggest": {
        "input": r["title"].ngrams()
    },
    "weight": 1.0
}

r["title"].ngrams() returns all the word n-grams of the title, e.g.
"Development research biotech" would give: "Development", "research", "biotech", "Development research", "research biotech" and "Development research biotech".
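To make that concrete, my ngrams() helper behaves roughly like this plain-Python sketch (the standalone function name word_ngrams is mine, not part of my actual code):

```python
def word_ngrams(title):
    """Return every contiguous word n-gram of the title, shortest first."""
    words = title.split()
    return [
        " ".join(words[start:start + size])
        for size in range(1, len(words) + 1)
        for start in range(len(words) - size + 1)
    ]
```

"Development research biotech" yields the six shingles listed above, in that order.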

Then, to call the suggester, I do:

POST http://localhost:9200/training/_search?pretty
{
  "suggest": {
    "labelSuggest": {
      "text": "developpeur",
      "completion": {
        "field": "labelSuggest",
        "skip_duplicates": true
      }
    }
  }
}
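From Python I build that request body like this (a sketch; the helper name build_suggest_body is mine, and the commented-out client call assumes the official elasticsearch-py package):

```python
def build_suggest_body(text, field="labelSuggest", skip_duplicates=True):
    """Mirror the suggest request body of the POST call above."""
    return {
        "suggest": {
            "labelSuggest": {
                "text": text,
                "completion": {
                    "field": field,
                    "skip_duplicates": skip_duplicates,
                },
            }
        }
    }

# With the official client this would be sent roughly as:
# from elasticsearch import Elasticsearch
# es = Elasticsearch(["http://localhost:9200"])
# response = es.search(index="training", body=build_suggest_body("developpeur"))
```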

The result is:
{
  "text": "développement",
  "_index": "activity_20180518092449",
  "_type": "activity",
  "_id": "2031ce8b-6589-3270-afdf-7901aa21efa1",
  "_score": 1,
  "_source": {
    "id": "2031ce8b-6589-3270-afdf-7901aa21efa1",
    "name": "development research biotech",
    "labelSuggest": [
      "development",
      "research",
      "biotech",
      "development research",
      "research biotech",
      "development research biotech"
    ]
  }
}
But I want something that gives me: "development", "development research" and "development research biotech" (supposing we only have that one document indexed).
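To be precise, the strings I'm after are the matched suggestion texts, which the suggest endpoint returns under each entry's options (not the _source shown above). A small helper to collect them (the name extract_suggestions is mine), shown against a sample dict in the shape the suggest response uses:

```python
def extract_suggestions(response, name="labelSuggest"):
    """Collect every option text from a completion-suggester response."""
    return [
        option["text"]
        for entry in response.get("suggest", {}).get(name, [])
        for option in entry.get("options", [])
    ]

# Sample response fragment in the suggester's shape, holding the
# exact output I would like to get for this one document:
sample = {
    "suggest": {
        "labelSuggest": [
            {
                "text": "development",
                "offset": 0,
                "length": 11,
                "options": [
                    {"text": "development"},
                    {"text": "development research"},
                    {"text": "development research biotech"},
                ],
            }
        ]
    }
}
```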

I hope my question is clear. I've searched a lot about this, in vain.

Thanks in advance


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.