Understanding substring matching


(JRoesner) #1

Hello Community,

We are pretty new to Lucene-based search and chose ES over Solr, but
now we are confused about our simple use case. Hopefully someone
here can enlighten us.

What we'd like to achieve is a location search based on geo-coordinates
combined with a full-text pattern search at the same time. Here's what
we did so far:

curl -XPOST ftl:9200/ft3
curl -XPUT ftl:9200/ft3/_mapping -d '{"pin":{"properties":{"location":
{"type":"geo_point"}},"index_analyzer":"whitespace"}}'

We'd like to analyze every attribute of our objects. The whitespace
analyzer seems to be the right one for that, and it is hopefully
configured correctly this way.
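
For reference, this is roughly what I understand the whitespace analyzer to do with our field values. It is a plain Python stand-in, not Elasticsearch code, and the helper name `whitespace_tokens` is made up for illustration:

```python
# Rough stand-in for the whitespace analyzer: split on whitespace only.
# No lowercasing and no punctuation handling (assumption of this sketch).
def whitespace_tokens(text):
    return text.split()

# Address fields of the two example documents.
print(whitespace_tokens("Am Zwirngraben 1"))
print(whitespace_tokens("Gontardstr. 7"))
```

If that picture is right, each address is indexed as whole words only ("Zwirngraben", "Gontardstr."), with case and punctuation kept as typed.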

curl -XPUT ftl:9200/ft3/pin/1 -d '{"pin":{"location":{"lat":52.520791,"lon":13.4095},"attributes":{"name":"Restaurant Aussichtsplattform Berliner Fernsehturm","address":"Gontardstr. 7","zipcode":"10178","city":"Berlin","country":"Germany"}}}'
curl -XPUT ftl:9200/ft3/pin/2 -d '{"pin":{"location":{"lat":52.522671,"lon":13.40199},"attributes":{"name":"Hackescher Markt","address":"Am Zwirngraben 1","zipcode":"10178","city":"Berlin","country":"Germany"}}}'

Two locations ... for search.

curl -XPOST ftl:9200/ft3/pin/_search -d '{"query":{"match_all":{}}}'

And yes, they come back. Even distance search works fine. Now the next
step.

curl -XPOST ftl:9200/ft3/pin/_search -d '{"query":{"filtered":{"query":
{"flt":{"like_text":"Zwirngraben","min_similarity":0.2}},"filter":
{"geo_distance":{"distance":"1km","pin.location":{"lat":
52.517474,"lon":13.407612}}}}}}'
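
As a sanity check on the 1km filter, the distances can be estimated by hand. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. Both are assumptions on my side, and Elasticsearch may compute geo_distance differently:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

center = (52.517474, 13.407612)  # the point used in the geo_distance filter
pins = {
    "Fernsehturm": (52.520791, 13.4095),
    "Hackescher Markt": (52.522671, 13.40199),
}
for name, (lat, lon) in pins.items():
    print(name, round(haversine_km(*center, lat, lon), 2), "km")
```

Both pins come out at roughly 0.4 km and 0.7 km from the search point, so the geo filter alone lets both documents through and only the text part decides what matches.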

Still works. But if we give a substring only ...

'{"query":{"filtered":{"query":{"flt":
{"like_text":"Zwirn","min_similarity":0.2}},"filter":{"geo_distance":
{"distance":"1km","pin.location":{"lat":52.517474,"lon":
13.407612}}}}}}'

... no results are coming back.
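
My current guess at why: the fuzzy query compares whole terms, and "Zwirn" is simply too far away from the indexed term "Zwirngraben". The sketch below computes the plain Levenshtein edit distance. How exactly min_similarity maps to edit distance inside Lucene is something I have not verified, so treat this only as an illustration:

```python
def levenshtein(a, b):
    """Classic edit distance: inserts, deletes and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(levenshtein("Zwirn", "Zwirngraben"))        # 6
print(levenshtein("Zwirngrabne", "Zwirngraben"))  # 2
```

A typo is a couple of edits away from the indexed term, but the substring needs six insertions, more than the query term is long. Fuzzy matching is built for the former, not the latter.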

AFAIK it's a problem of understanding how analyzers and tokenizers work.
The question is: is it possible to easily achieve the goal of full-text
pattern matching on all attributes, combined with distance search?

Thx in advance.
Jan


(JRoesner) #2

A first-iteration solution seems to be something like this:

curl -XPOST ftl:9200/ft3 -d '{"index":{"number_of_shards":5,"analysis":
{"filter":{"mynGram":{"type":"nGram","min_gram":2,"max_gram":
10}},"analyzer":{"a1":{"type":"custom","tokenizer":"standard","filter":
["lowercase","mynGram"]}}}}}'
curl -XPUT ftl:9200/ft3/_mapping -d '{"pin":
{"index_analyzer":"a1","search_analyzer":"standard","properties":
{"location":{"type":"geo_point"}}}}'

But I am quite unsure if that is the correct solution.
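
To check my own understanding, here is a rough local simulation of that setup in plain Python. The regex tokenizer is only a crude stand-in for the standard tokenizer, all function names are made up, and real Elasticsearch behaviour may differ in details:

```python
import re

def standard_tokens(text):
    # Crude stand-in for the standard tokenizer: runs of letters/digits.
    return re.findall(r"\w+", text)

def ngrams(token, min_gram=2, max_gram=10):
    # All substrings of length min_gram..max_gram, like the mynGram filter.
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }

def index_terms(text):
    # Index-time chain of analyzer "a1": tokenize, lowercase, n-gram.
    terms = set()
    for token in standard_tokens(text):
        terms |= ngrams(token.lower())
    return terms

def search_terms(text):
    # Search side: tokenize and lowercase only, no n-grams.
    return {t.lower() for t in standard_tokens(text)}

indexed = index_terms("Am Zwirngraben 1")
print(search_terms("Zwirn") <= indexed)        # substring query
print(search_terms("Zwirngraben") <= indexed)  # full 11-letter word
```

In this simulation "Zwirn" is found, because "zwirn" is one of the indexed grams. The full word "Zwirngraben" is not found as an exact term, because it has 11 characters and max_gram is 10, so max_gram probably needs to be at least as long as the longest word we want to match exactly.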

Any suggestions?

