Problem:
Recently I wanted to run a proximity search on an Elasticsearch index. I
wanted to find all docs where ‘measles’ and ‘vaccin*’ occur within 25
words of each other (slop is measured in term positions, not characters),
and I wanted them to appear in that order.
The built-in Elasticsearch proximity search wasn’t an option, for two
reasons:
- Proximity search doesn’t support wildcards, e.g. (“measles vaccine”)~25
  is supported but (“measles vacci*”)~25 or (“measle* vacci*”) is not.
- Proximity search doesn’t respect the order of the words in the phrase,
  e.g. (“measles vaccine”)~25 and (“vaccine measles”)~25 give the same
  results.
Solution:
A few examples that work around this using span_near:
- (“measles vacci*”)~25
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": "measles"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "vacci"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 25,
      "in_order": "true",
      "collect_payloads": "true"
    }
  }
}
// in_order can be used to toggle between ordered or unordered.
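Writing these bodies by hand gets tedious, so here is a minimal Python sketch that builds the same kind of query from a list of tokens. The helper names (span_clause, span_near) and the default field name "text" are my own assumptions for illustration, not part of any Elasticsearch client. It leaves out the single-clause span_or wrapper (which should not change what matches) and collect_payloads, and it emits in_order as a JSON boolean.

```python
import json


def span_clause(token, field="text"):
    """Build one span clause: a prefix span_multi if the token ends in '*',
    otherwise a plain span_term."""
    if token.endswith("*"):
        return {"span_multi": {"match": {"prefix": {field: {"value": token[:-1]}}}}}
    return {"span_term": {field: token}}


def span_near(tokens, slop, in_order=True, field="text"):
    """Build a span_near request body from a list of tokens."""
    return {
        "query": {
            "span_near": {
                "clauses": [span_clause(t, field) for t in tokens],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


# Equivalent of ("measles vacci*")~25 with the word order enforced.
query = span_near(["measles", "vacci*"], slop=25)
print(json.dumps(query, indent=2))
```

Passing in_order=False gives the unordered variant.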
- (“measle* vacci*”)
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "measle"
                      }
                    }
                  }
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "vacci"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 0,
      "in_order": "true",
      "collect_payloads": "true"
    }
  }
}
- Grouping. Now let's assume you want to find all docs where (canada OR
toronto OR “North york”) NEAR (measles OR vaccin*), with the two groups
within 30 words (term positions) of each other, in either order.
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_near": {
                  "clauses": [
                    {
                      "span_term": {
                        "text": "north"
                      }
                    },
                    {
                      "span_term": {
                        "text": "york"
                      }
                    }
                  ],
                  "slop": 0,
                  "in_order": "true",
                  "collect_payloads": "true"
                }
              },
              {
                "span_term": {
                  "text": "toronto"
                }
              },
              {
                "span_term": {
                  "text": "canada"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": "measles"
                }
              },
              {
                "span_multi": {
                  "match": {
                    "prefix": {
                      "text": {
                        "value": "vaccin"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 30,
      "in_order": "false",
      "collect_payloads": "true"
    }
  }
}
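The grouped form follows a regular pattern: each group becomes a span_or, and a multi-word alternative becomes an inner span_near with slop 0. A rough Python sketch of that pattern follows. The function names and the "text" field are hypothetical, and the sketch lowercases every word on the assumption that the field is indexed with a lowercasing analyzer (span_term is not analyzed, so it has to match the indexed token exactly).

```python
import json


def span_clause(word, field):
    """span_term for a plain word, prefix span_multi for a trailing '*'."""
    word = word.lower()  # assumption: tokens are indexed lowercased
    if word.endswith("*"):
        return {"span_multi": {"match": {"prefix": {field: {"value": word[:-1]}}}}}
    return {"span_term": {field: word}}


def span_alternative(alternative, field):
    """A single word becomes one clause; a phrase becomes an ordered
    span_near with slop 0 so its words must be adjacent."""
    words = alternative.split()
    if len(words) == 1:
        return span_clause(words[0], field)
    return {
        "span_near": {
            "clauses": [span_clause(w, field) for w in words],
            "slop": 0,
            "in_order": True,
        }
    }


def near_groups(groups, slop, in_order=False, field="text"):
    """Each group is a list of alternatives, OR-ed together with span_or;
    the groups themselves are combined with an outer span_near."""
    clauses = [
        {"span_or": {"clauses": [span_alternative(a, field) for a in group]}}
        for group in groups
    ]
    return {"query": {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}}


# (canada OR toronto OR "North york") NEAR (measles OR vaccin*), slop 30
query = near_groups(
    [["canada", "toronto", "North york"], ["measles", "vaccin*"]],
    slop=30,
)
print(json.dumps(query, indent=2))
```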
If anyone knows a better solution than this one, please comment. I would
also welcome suggestions on how to build a parser that takes a query from
the user, e.g. (quick AND near(foxes OR rats, toronto OR ontario, 30)), and
converts it to an Elasticsearch span_near query using the above workaround.
Boolean operators and parenthesis precedence are what I am finding hard to
handle. Is there an open source PHP library that can help me convert
user-written queries with parentheses and boolean operators to ES filters?
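One common way to handle precedence and parentheses is a small recursive-descent parser: one function per precedence level (OR lowest, then AND, then atoms such as words, parenthesised groups and near(...)). Below is an illustrative Python sketch of that idea, not a tested production parser. The grammar, the near(a, b, slop) syntax, the class and function names and the "text" field are all assumptions of mine. It does not handle quoted phrases, it lowercases terms, and it rejects AND inside near(...) since span queries have no direct equivalent. The same structure should port to PHP fairly directly.

```python
import re

TOKEN_RE = re.compile(r"[(),]|[^\s(),]+")


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_or(self):  # lowest precedence
        nodes = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            nodes.append(self.parse_and())
        return nodes[0] if len(nodes) == 1 else ("or", nodes)

    def parse_and(self):  # binds tighter than OR
        nodes = [self.parse_atom()]
        while self.peek() == "AND":
            self.take()
            nodes.append(self.parse_atom())
        return nodes[0] if len(nodes) == 1 else ("and", nodes)

    def parse_atom(self):
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            self.take(")")
            return node
        if tok.lower() == "near" and self.peek() == "(":
            self.take("(")
            left = self.parse_or()
            self.take(",")
            right = self.parse_or()
            self.take(",")
            slop = int(self.take())
            self.take(")")
            return ("near", [left, right], slop)
        if tok in (")", ",", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("term", tok.lower())


def to_span(node, field):
    """Translate a term / OR / near node into span queries."""
    kind = node[0]
    if kind == "term":
        word = node[1]
        if word.endswith("*"):
            return {"span_multi": {"match": {"prefix": {field: {"value": word[:-1]}}}}}
        return {"span_term": {field: word}}
    if kind == "or":
        return {"span_or": {"clauses": [to_span(n, field) for n in node[1]]}}
    if kind == "near":
        return {"span_near": {"clauses": [to_span(n, field) for n in node[1]],
                              "slop": node[2], "in_order": False}}
    raise ValueError("AND is not supported inside near(...)")


def to_query(node, field):
    """Top-level AND/OR become bool must/should; everything else is a span query."""
    if node[0] == "and":
        return {"bool": {"must": [to_query(n, field) for n in node[1]]}}
    if node[0] == "or":
        return {"bool": {"should": [to_query(n, field) for n in node[1]]}}
    return to_span(node, field)


def parse(text, field="text"):
    parser = Parser(text)
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return {"query": to_query(tree, field)}


result = parse("(quick AND near(foxes OR rats, toronto OR ontario, 30))")
```

Here result is a bool query whose must list holds a span_term for quick and a span_near (slop 30, unordered) over two span_or groups.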
@shay banon, @steven, @uri: are there any plans to support such operators
in query_string?
References:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-near-query.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-or-query.html