I recently searched for "toronto dental" on Google and noticed that it even matches keywords inside domain names:
The domain is a single word, yet Google is able to find the two words (in any order) inside the 2 domain results above.
I was just wondering if there is a similar mapping or querying approach that can accomplish this in Elasticsearch? That is, find keywords inside a string with no spaces, and find the words in any order as long as they are in the string.
Thanks and hope you guys are having a good week,
to clarify, if the search query is
is there a mapping which can find these 2 keywords inside a "domain" field with values like:
One approach is to use n-grams: you index substrings of the string (of a configurable minimum and maximum length) rather than just the whole thing.
It doesn't identify particular words, but that can be ambiguous anyway, as the registered domain holders of the pen store PenIsland.com and the actor-agent search service WhoRepresents.com found out.
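To make the idea concrete, here is a minimal pure-Python sketch of what n-gram indexing produces. It only imitates the behaviour of an n-gram tokenizer and is not Elasticsearch code; the domain value `torontodentalcare` and the helper names `ngrams` and `shared` are made up for illustration.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text with a length from min_gram to max_gram."""
    text = text.lower()
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


# Hypothetical domain value, made up for illustration.
domain = "torontodentalcare"

# What would be stored for the field if it were indexed as trigrams.
indexed = set(ngrams(domain, 3, 3))


def shared(word):
    """Trigrams of a search word that also occur in the indexed domain."""
    return sorted(set(ngrams(word, 3, 3)) & indexed)


print(shared("dental"))
print(shared("toronto"))
```

Every trigram of both search words is found inside the single-word domain, regardless of the order in which the words are typed.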
haha thanks for sharing Mark.
I now have min_gram set to 1 and max_gram set to 1 as well on my
However, when I go into Kibana, I noticed the following:
this matches any doc where the new_domain field has the letters D, E, N, T, A, or L ... which is almost my entire dataset
this matches the "expected" way of looking across the new_domain field values for docs where there is a sequence of D-E-N-T-A-L producing a correct amount.
The problem is that when I use Elasticsearch DSL (Python) to make a search, looking up "dental" or even "'dental'" as the query produces results like the first query in Kibana. Instead of looking for the sequence of letters, it matches any document containing those letters.
Is there anything I can do? How can I query elasticsearch to look for the sequence in the new_domain field? Thank you!
Normally you’d use ngrams of 3 or more
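A rough sketch of why: a match query analyzes the query text with the field's analyzer and, by default, accepts a document that contains any of the resulting tokens. With 1-grams those tokens are single letters. The snippet below only simulates that behaviour in plain Python; the domain values and the helper `matches_any` are made up for illustration.

```python
def ngrams(text, n):
    """All substrings of length n, lowercased."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def matches_any(query, value, n):
    """True if query and value share at least one n-gram (OR semantics)."""
    return bool(set(ngrams(query, n)) & set(ngrams(value, n)))


# Hypothetical field values, made up for illustration.
domains = ["torontodental", "landscaping", "budgethotels"]

# With 1-grams, "dental" becomes d, e, n, t, a, l and nearly everything matches.
one_gram_hits = [d for d in domains if matches_any("dental", d, 1)]

# With 3-grams, "dental" becomes den, ent, nta, tal, which is far more selective.
three_gram_hits = [d for d in domains if matches_any("dental", d, 3)]

print(one_gram_hits)
print(three_gram_hits)
```

Here the 1-gram version returns all three domains, while the 3-gram version returns only the one that really contains "dental".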
But even in the above example, say with an n-gram size of 4, wouldn't it match "dent" with:
Or "tist" with:
Right, but if the search is “Toronto dental” then those urls that contain more of the ngrams than others will rank highest in results.
The search engine rewards matches that contain more of the search terms than others and also ranks rarer terms highly (e.g. the ‘oro’ and ‘nto’ from Toronto).
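As a toy illustration of that ranking behaviour, the sketch below scores each domain by summing a rarity weight for every trigram it shares with the query. This is a deliberately simplified stand-in for the real relevance scoring, not the formula Elasticsearch uses, and the domain values and function names are made up for illustration.

```python
import math


def trigrams(text):
    """Set of all 3-character substrings, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


# Hypothetical domain values, made up for illustration.
domains = ["torontodental", "dentistdirectory", "torontopizza", "dentalsupplies"]
index = {d: trigrams(d) for d in domains}


def rarity(gram):
    """Simplified inverse document frequency: rarer grams weigh more."""
    doc_count = sum(1 for grams in index.values() if gram in grams)
    return math.log(1 + len(index) / doc_count) if doc_count else 0.0


def score(query, domain):
    """Sum the rarity weights of the query trigrams found in the domain."""
    query_grams = set()
    for word in query.split():
        query_grams |= trigrams(word)
    return sum(rarity(g) for g in query_grams & index[domain])


ranked = sorted(domains, key=lambda d: score("toronto dental", d), reverse=True)
for d in ranked:
    print(d, round(score("toronto dental", d), 2))
```

The domain containing the trigrams of both words ends up first, and a domain that only shares a few common trigrams such as "den" and "ent" ends up last, even though every domain in the list "matches".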
Thank you for clarifying Mark, I see what you're saying now. It has a "bigger picture" and multiple factors which end up creating the score, not just the domain.
What method of querying do you recommend? I have set minimum n-gram size to 3 and max to 7. Unfortunately, match_phrase for Toronto Dentist did not contribute to results (same overall score, same # of results) ... and a simple match is behaving oddly as well for the same query.
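For reference, a setup along those lines might look roughly like the sketch below, written as plain Python dictionaries that could be sent as request bodies. It assumes Elasticsearch 7 or later, where index.max_ngram_diff has to be raised when max_gram exceeds min_gram by more than 1. The names domain_ngram and domain_ngram_tokenizer, and the 75% threshold, are placeholders chosen for this sketch rather than settings taken from the actual index.

```python
# Index settings and mapping: 3- to 7-character n-grams on the new_domain field.
# "domain_ngram" and "domain_ngram_tokenizer" are placeholder names.
index_body = {
    "settings": {
        "index": {"max_ngram_diff": 4},  # must cover max_gram - min_gram
        "analysis": {
            "tokenizer": {
                "domain_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 7,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "domain_ngram": {
                    "type": "custom",
                    "tokenizer": "domain_ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        },
    },
    "mappings": {
        "properties": {
            "new_domain": {"type": "text", "analyzer": "domain_ngram"}
        }
    },
}

# A match query that requires most of the query n-grams to be present,
# instead of accepting a document that contains any single one of them.
# The 75% value is an arbitrary starting point, not a recommendation.
query_body = {
    "query": {
        "match": {
            "new_domain": {
                "query": "toronto dental",
                "minimum_should_match": "75%",
            }
        }
    }
}
```

Tightening minimum_should_match is one way to cut down on documents that match only a stray n-gram or two; the right value depends on the data.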
Just re-visited the various options available in the ES docs ...
Instead of a match_phrase, do you think a term query would be more suitable in this scenario?