Workaround for using wildcards in phrase and proximity searches (Elasticsearch)


(saud ur rehman) #1

Problem:

Recently I wanted to run a proximity search on an Elasticsearch index. I
wanted to find all docs where ‘measles’ and ‘vaccin*’ occur within 25
words of each other (slop is counted in token positions, not characters).
I also wanted the two terms to appear in that order.

The built-in Elasticsearch proximity search wasn’t an option, for two
reasons.

  1. Proximity search doesn’t support wildcards, e.g. (“measles vaccine”)~25
    is supported, but (“measles vacci*”)~25 and (“measle* vacci*”) are not
    supported.

  2. Proximity search doesn’t respect the order of the words in the phrase,
    e.g. (“measles vaccine”)~25 and (“vaccine measles”)~25 give the same
    results.
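
To make the requirement concrete, here is a minimal Python sketch of the matching rule I am after: the first term must come before the second, with at most `slop` tokens in between, and a trailing `*` means prefix match. The function names and the sample sentence are made up for illustration, and the distance is counted in whitespace-separated tokens, which is only an approximation of how an analyzer would tokenize the text.

```python
from typing import List


def term_matches(token: str, pattern: str) -> bool:
    """True if token equals pattern, or starts with it when pattern ends in '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def ordered_near(tokens: List[str], first: str, second: str, slop: int) -> bool:
    """True if `first` occurs before `second` with at most `slop` tokens between them."""
    for i, tok in enumerate(tokens):
        if not term_matches(tok, first):
            continue
        # Candidates at i+1 .. i+slop+1 leave at most `slop` tokens in between.
        for j in range(i + 1, min(len(tokens), i + slop + 2)):
            if term_matches(tokens[j], second):
                return True
    return False


# Hypothetical sample document, already lowercased and split on whitespace.
doc = "the measles outbreak renewed calls for vaccination".split()
print(ordered_near(doc, "measles", "vacci*", 25))  # True
print(ordered_near(doc, "vacci*", "measles", 25))  # False: wrong order
```

The second call returns False because the order is reversed, which is exactly the distinction the ~25 syntax does not make.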

Solution:
Here are a few examples that work around these limitations using span_near.

  1. (“measles vacci*”)~25
    {
      "query": {
        "span_near": {
          "clauses": [
            {
              "span_or": {
                "clauses": [
                  {
                    "span_term": {
                      "text": "measles"
                    }
                  }
                ]
              }
            },
            {
              "span_or": {
                "clauses": [
                  {
                    "span_multi": {
                      "match": {
                        "prefix": {
                          "text": {
                            "value": "vacci"
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          ],
          "slop": 25,
          "in_order": true,
          "collect_payloads": true
        }
      }
    }

// in_order toggles between ordered and unordered matching.
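
Writing these queries by hand gets tedious, so the pattern can be generated. The following is a rough Python sketch, not an official client API: `span_clause` and `phrase_to_span_near` are names I made up, and the field name `text` is taken from the example above. It leaves out the single-clause span_or wrappers and collect_payloads, which I assume are optional for this simple case.

```python
import json
from typing import Any, Dict


def span_clause(word: str, field: str = "text") -> Dict[str, Any]:
    """span_term for a plain word, span_multi with a prefix query for 'word*'."""
    if word.endswith("*"):
        return {"span_multi": {"match": {"prefix": {field: {"value": word[:-1]}}}}}
    return {"span_term": {field: word}}


def phrase_to_span_near(phrase: str, slop: int, in_order: bool = True,
                        field: str = "text") -> Dict[str, Any]:
    """Turn a whitespace-separated phrase into a span_near query body."""
    clauses = [span_clause(word, field) for word in phrase.split()]
    return {
        "query": {
            "span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}
        }
    }


# Equivalent of ("measles vacci*")~25 with the word order enforced.
print(json.dumps(phrase_to_span_near("measles vacci*", 25), indent=2))
```

Calling `phrase_to_span_near("measle* vacci*", 0)` gives the wildcard phrase variant with no gap allowed.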

  2. “measle* vacci*”
    {
      "query": {
        "span_near": {
          "clauses": [
            {
              "span_or": {
                "clauses": [
                  {
                    "span_multi": {
                      "match": {
                        "prefix": {
                          "text": {
                            "value": "measle"
                          }
                        }
                      }
                    }
                  }
                ]
              }
            },
            {
              "span_or": {
                "clauses": [
                  {
                    "span_multi": {
                      "match": {
                        "prefix": {
                          "text": {
                            "value": "vacci"
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          ],
          "slop": 0,
          "in_order": true,
          "collect_payloads": true
        }
      }
    }

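  3. Grouping. Now let’s assume you want to find all docs where (canada OR
    toronto OR “North york”) NEAR (measles OR vaccin*), with the two groups
    within 30 words of each other, in either order. Note that span_term
    values are not analyzed, so they must match the indexed tokens exactly
    (lowercase, if the field uses the standard analyzer).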
    {
      "query": {
        "span_near": {
          "clauses": [
            {
              "span_or": {
                "clauses": [
                  {
                    "span_near": {
                      "clauses": [
                        {
                          "span_term": {
                            "text": "north"
                          }
                        },
                        {
                          "span_term": {
                            "text": "york"
                          }
                        }
                      ],
                      "slop": 0,
                      "in_order": true,
                      "collect_payloads": true
                    }
                  },
                  {
                    "span_term": {
                      "text": "toronto"
                    }
                  },
                  {
                    "span_term": {
                      "text": "canada"
                    }
                  }
                ]
              }
            },
            {
              "span_or": {
                "clauses": [
                  {
                    "span_term": {
                      "text": "measles"
                    }
                  },
                  {
                    "span_multi": {
                      "match": {
                        "prefix": {
                          "text": {
                            "value": "vaccin"
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          ],
          "slop": 30,
          "in_order": false,
          "collect_payloads": true
        }
      }
    }
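
The grouping case follows the same pattern and can be generated too. Below is a rough Python sketch under a few assumptions of mine: the helper names (`span_clause`, `alternative`, `near_groups`) are hypothetical, the field is called `text` as in the examples, multi-word alternatives are treated as exact phrases (slop 0, in order), and terms are lowercased on the assumption that the field is indexed with a lowercasing analyzer.

```python
import json
from typing import Any, Dict, List


def span_clause(word: str, field: str) -> Dict[str, Any]:
    """span_term for a plain word, span_multi with a prefix query for 'word*'."""
    if word.endswith("*"):
        return {"span_multi": {"match": {"prefix": {field: {"value": word[:-1]}}}}}
    return {"span_term": {field: word}}


def alternative(text: str, field: str) -> Dict[str, Any]:
    """One alternative of a group: a single word, or an exact multi-word phrase."""
    # Assumption: the analyzer lowercases tokens, so lowercase the terms as well.
    words = text.lower().split()
    if len(words) == 1:
        return span_clause(words[0], field)
    clauses = [span_clause(w, field) for w in words]
    return {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}


def near_groups(groups: List[List[str]], slop: int, in_order: bool = False,
                field: str = "text") -> Dict[str, Any]:
    """Each group is a list of OR-ed alternatives; the groups must be near each other."""
    clauses = [
        {"span_or": {"clauses": [alternative(alt, field) for alt in group]}}
        for group in groups
    ]
    return {
        "query": {
            "span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}
        }
    }


query = near_groups(
    [["canada", "toronto", "North york"], ["measles", "vaccin*"]], slop=30
)
print(json.dumps(query, indent=2))
```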

If anyone knows a better solution than this one, please comment. I would
also welcome suggestions on how to build a parser that takes a query from
the user, e.g. (quick AND near(foxes OR rats, toronto OR ontario, 30)), and
converts it to an Elasticsearch span_near query using the workaround above.
Handling boolean operators and parenthesis precedence is the part I am
finding hard. Is there an open source PHP library that can help convert
user-written queries with parentheses and boolean operators into ES filters?
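
As a possible starting point, here is a small recursive-descent parser sketched in Python (not PHP, and not an existing library). Everything in it is an assumption for illustration: the grammar (AND binds tighter than OR, parentheses group, near(a OR b, c OR d, N) takes exactly two groups), the field name `text`, and the choice to treat near as unordered.

```python
import re
from typing import Any, Dict, List, Optional

FIELD = "text"  # assumed field name, as in the examples above
RESERVED = {"(", ")", ",", "AND", "OR"}


def span_clause(word: str) -> Dict[str, Any]:
    if word.endswith("*"):
        return {"span_multi": {"match": {"prefix": {FIELD: {"value": word[:-1]}}}}}
    return {"span_term": {FIELD: word}}


class Parser:
    def __init__(self, query: str) -> None:
        # Parentheses and commas are their own tokens; everything else is a word.
        self.tokens: List[str] = re.findall(r"[(),]|[^\s(),]+", query)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'!r}, got {tok!r}")
        self.pos += 1
        return tok

    def word(self) -> Dict[str, Any]:
        tok = self.take()
        if tok in RESERVED:
            raise ValueError(f"expected a word, got {tok!r}")
        return span_clause(tok)

    def parse(self) -> Dict[str, Any]:
        node = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return {"query": node}

    def or_expr(self) -> Dict[str, Any]:  # lowest precedence
        parts = [self.and_expr()]
        while self.peek() == "OR":
            self.take()
            parts.append(self.and_expr())
        return parts[0] if len(parts) == 1 else {"bool": {"should": parts}}

    def and_expr(self) -> Dict[str, Any]:  # binds tighter than OR
        parts = [self.atom()]
        while self.peek() == "AND":
            self.take()
            parts.append(self.atom())
        return parts[0] if len(parts) == 1 else {"bool": {"must": parts}}

    def atom(self) -> Dict[str, Any]:
        if self.peek() == "(":
            self.take("(")
            node = self.or_expr()
            self.take(")")
            return node
        is_near = (self.peek() or "").lower() == "near"
        if is_near and self.pos + 1 < len(self.tokens) and self.tokens[self.pos + 1] == "(":
            self.take()
            self.take("(")
            left = self.span_group()
            self.take(",")
            right = self.span_group()
            self.take(",")
            slop = int(self.take())
            self.take(")")
            return {"span_near": {"clauses": [left, right], "slop": slop,
                                  "in_order": False}}
        return self.word()

    def span_group(self) -> Dict[str, Any]:
        clauses = [self.word()]
        while self.peek() == "OR":
            self.take()
            clauses.append(self.word())
        return {"span_or": {"clauses": clauses}}


result = Parser("quick AND near(foxes OR rats, toronto OR ontario, 30)").parse()
```

Splitting the grammar into one method per precedence level is what takes care of the parenthesis and operator precedence problem: `or_expr` calls `and_expr`, which calls `atom`, and `atom` recurses back into `or_expr` for a parenthesized sub-query.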

@shay banon, @steven, @uri: are there any plans to support such operators in
query_string?

References:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-near-query.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-multi-term-query.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-span-or-query.html



(Kevin Brown) #2

Saud, thanks for your post. Did you by chance figure out a good way to convert user-written queries into Elasticsearch queries? I'm facing the same issues!

