ElasticSearch Auto Completion and Term/Phrase Suggesters


(Ramdev Wudali) #1

Hi All:
I am looking into providing a phrase suggester / inline completion
feature for search, and have been going through the suggester features
that Elasticsearch provides. However, I have not been able to work out
the following:

  1. Term/Phrase suggesters - Are these search-time features, and
    how does the mapping of content play into them?
  2. Auto Completion - I think I have a working example:

(sample mapping provided:

"Titles": {
    "properties": {
        "DOC_ID": {
            "type":"string"
        },
        "TITLE": {
            "type": "string"
        },
        "TITLE_suggest" : {
            "type" : "completion",
            "index_analyzer" : "simple",
            "search_analyzer" : "simple",
            "payloads" : true
        }
    }
}

)

For indexed content, I am able to execute requests like the following:

curl -XPOST http://host:9200/index/_suggest -d \
'{"Titles":{"text":"on","completion":{"field":"TITLE_suggest"}}}'

and I get back suggestions, but only those that start with the phrase "on".

{"_shards":{"total":5,"successful":5,"failed":0},"Titles":[{"text":"on","offset":0,"length":2,"options":[{"text":"'Once
Upon A Time...' tells a torrid tale of forbidden love and eventual
betrayal","score":1.0},{"text":"On Drill Program
Progress<DRK.V>","score":1.0},{"text":"On Foreign
Shores","score":1.0},{"text":"On golden pond: Back to back Olympic
champs","score":1.0},{"text":"Once Upon A Time In Mumbaai Dobaara fails to
impress critics","score":1.0}]}]}

However, what I want is to get titles where "on" appears anywhere in the
title, as part of a term or a phrase. How would I tweak my request or
mapping to get that?
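
For illustration, the gap between the two behaviours can be sketched in plain Python, with no Elasticsearch involved. The titles come from the response above plus one made-up title ("Back On Track"), and both helper functions are hypothetical names. The first one only approximates how the completion suggester matches from the start of the input.

```python
# Titles from the response above, plus one hypothetical title that
# contains "on" only in the middle.
TITLES = [
    "On Foreign Shores",
    "On golden pond: Back to back Olympic champs",
    "Once Upon A Time In Mumbaai Dobaara fails to impress critics",
    "Back On Track",  # made up for this example
]


def starts_with(titles, text):
    """Approximates the completion suggester: match from the start of the title."""
    text = text.lower()
    return [t for t in titles if t.lower().startswith(text)]


def any_word_starts_with(titles, text):
    """The desired behaviour: match if any word in the title starts with the text."""
    text = text.lower()
    return [t for t in titles if any(w.startswith(text) for w in t.lower().split())]


print(starts_with(TITLES, "on"))           # omits "Back On Track"
print(any_word_starts_with(TITLES, "on"))  # includes "Back On Track"
```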

Thanks much

Ramdev



(Sloan Ahrens) #2

It sounds like the completion suggester may not be exactly what you want, since it only matches from the beginning of the indexed input. One way to solve your problem is to use a shingle (word n-gram) analyzer in your mapping, combined with a prefix query. A description of the basic technique can be found here: http://developer.rackspace.com/blog/qbox.html#.UZ0yEWRATQ4

Qbox has a demo (here: http://be6c2e3260c3e2af000.qbox.io/_plugin/demo/tablemap/index.html) that uses this technique for auto-complete. Queries are matched using a prefix query, against a field that has been analyzed using a shingle filter.

So, adapted to the information you've given, a solution might look like this:

Create the index with the following analyzer:

curl -XPUT 'http://[endpoint]/[index_name]' -d '
{
    "settings": {
        "analysis": {
            "filter": {
                "shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 5
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "shingle_filter"
                    ]
                }
            }
        }
    }
}'
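
To get a feel for what this analyzer stores, here is a rough emulation in plain Python. It is only a sketch: a simple regex stands in for the standard tokenizer, the function name shingle_terms is made up, and it assumes single words are kept alongside the shingles (the shingle filter outputs unigrams by default).

```python
import re


def shingle_terms(text, min_size=2, max_size=5):
    """Roughly emulate lowercase + shingle filtering on word tokens.

    Assumption: a regex approximates the standard tokenizer, and the
    single words (unigrams) are emitted together with the shingles.
    """
    words = re.findall(r"\w+", text.lower())
    terms = list(words)  # unigrams
    for size in range(min_size, max_size + 1):
        # Slide a window of `size` words across the title.
        for start in range(len(words) - size + 1):
            terms.append(" ".join(words[start:start + size]))
    return terms


print(shingle_terms("On Golden Pond"))
# ['on', 'golden', 'pond', 'on golden', 'golden pond', 'on golden pond']
```

Every word and every run of 2 to 5 consecutive words becomes its own term, which is what lets a prefix query start matching at any word boundary.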

and then apply this mapping:

curl -XPUT 'http://[endpoint]/[index_name]/titles/_mapping' -d '
{
    "titles": {
        "properties": {
            "DOC_ID": {
                "type": "string"
            },
            "TITLE": {
                "type": "string",
                "index_analyzer": "shingle_analyzer"
            }
        }
    }
}'

Once you have some documents indexed, you can query with "on" using the following query structure:

curl -XPOST 'http://[endpoint]/_search' -d '
{
    "query": {
        "prefix": {
            "TITLE": "on"
        }
    }
}'

This should give you the behavior you are looking for.
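
As a sanity check on what to expect, the matching can be approximated in plain Python. This is a sketch under assumptions: the titles and their term lists are hand-written to resemble what the shingle analyzer would store, and prefix_query is a hypothetical stand-in for the real query. The prefix is compared as typed, because prefix queries are not analyzed.

```python
# Hypothetical indexed terms per title, roughly what the shingle
# analyzer would produce (hand-written here for brevity).
INDEX = {
    "On Foreign Shores": [
        "on", "foreign", "shores",
        "on foreign", "foreign shores", "on foreign shores",
    ],
    "Back On Track": [
        "back", "on", "track",
        "back on", "on track", "back on track",
    ],
    "Golden Pond": ["golden", "pond", "golden pond"],
}


def prefix_query(index, prefix):
    """Return titles that have at least one indexed term starting with prefix."""
    return [
        title
        for title, terms in index.items()
        if any(term.startswith(prefix) for term in terms)
    ]


print(prefix_query(INDEX, "on"))     # ['On Foreign Shores', 'Back On Track']
print(prefix_query(INDEX, "On"))     # []
print(prefix_query(INDEX, "on tr"))  # ['Back On Track']
```

Two things to keep in mind: the user input should be lowercased before it is sent, since the indexed terms are lowercase, and "on" inside a word (as in "Pond") is not matched. If you need matches inside words, that would call for character n-grams rather than shingles.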

