Elasticsearch plugin - Atleast functionality - Get Analyser for field

Hi all!

I have built a ScriptPlugin (for ES 6.1.3, soon to be upgraded to ES 7) that lets users filter for documents in which a given term or phrase occurs at least a minimum number of times within one or more fields. I created this plugin for a client in the patent domain, where this kind of query is more common. I couldn't find any existing queries (other than scripts) that deliver this functionality.

How it works:

DELETE test
PUT test
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "abstract": {
          "type": "text",
          "analyzer": "my_analyzer", 
          "index_options": "offsets"
        },
        "title": {
          "type": "text",
          "analyzer": "my_analyzer", 
          "index_options": "offsets"
        }
      }
    }
  }
}

POST test/test/1
{
  "title": "Activity of a cell signaling pathway TGF-b in a subject ...",
  "abstract": "The present invention relates to a computer-implemented method for inferring activity of a TGF-β cellular signaling pathway in a subject ..."
}

POST test/test/2
{
  "title": "Activity of a cell signaling pathway TGF-b in a subject ...",
  "abstract": "This doesn't have the search text referred to later"
}

GET test/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "atleast",
            "lang": "byron_scripts",
            "params": {
              "fields": [
                "abstract",
                "title"
                ],
              "term": "signaling pathway",
              "occurrences": 2
            }
          }
        }
      }
    }
  }
}

This returns only the first document, since it's the only one with 2 occurrences of the phrase "signaling pathway" (one in the title and one in the abstract).

I use postings to find the occurrences of each term in the specified fields, and use their start and end offsets to keep only the occurrences where the terms appear together as a phrase.
One downside of using a script for this is that the user input is not analysed.
E.g. if the user types "SIGNALING PATHWAY", this will not match any terms in the index, since the content gets lowercased by the analyser specified for that field.
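
To make the problem concrete, here is a minimal, self-contained sketch in plain Java. It is not the plugin code and uses no Elasticsearch or Lucene classes: the class AtLeastSketch, the analyze() stand-in (split on non-alphanumerics, then lowercase) and the toy in-memory postings map are all hypothetical, and it matches phrases on token positions rather than on start/end offsets. The document text is abbreviated from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AtLeastSketch {

    // Hypothetical stand-in for the field's analyser:
    // split on anything that is not a letter or digit, then lowercase.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Toy postings for one field of one document: term -> token positions.
    public static Map<String, List<Integer>> index(String fieldValue) {
        Map<String, List<Integer>> postings = new HashMap<>();
        List<String> tokens = analyze(fieldValue);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Count the places where the phrase terms occur at consecutive positions.
    public static int countPhrase(Map<String, List<Integer>> postings, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return 0;
        }
        int count = 0;
        List<Integer> starts = postings.getOrDefault(phraseTerms.get(0), Collections.emptyList());
        for (int start : starts) {
            boolean match = true;
            for (int i = 1; i < phraseTerms.size() && match; i++) {
                List<Integer> next = postings.getOrDefault(phraseTerms.get(i), Collections.emptyList());
                match = next.contains(start + i);
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    // Sum the phrase occurrences over all requested fields and compare to the threshold.
    public static boolean atLeast(Map<String, String> doc, List<String> fields,
                                  List<String> phraseTerms, int occurrences) {
        int total = 0;
        for (String field : fields) {
            String value = doc.get(field);
            if (value != null) {
                total += countPhrase(index(value), phraseTerms);
            }
        }
        return total >= occurrences;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Activity of a cell signaling pathway TGF-b in a subject");
        doc.put("abstract", "A method for inferring activity of a TGF-b cellular signaling pathway in a subject");
        List<String> fields = Arrays.asList("abstract", "title");
        String userInput = "SIGNALING PATHWAY";

        // Raw input, only split on whitespace: the uppercase terms are not in the postings.
        List<String> rawTerms = Arrays.asList(userInput.split("\\s+"));
        System.out.println("raw input matches: " + atLeast(doc, fields, rawTerms, 2));               // false

        // Input run through the same analysis as the indexed text.
        System.out.println("analysed input matches: " + atLeast(doc, fields, analyze(userInput), 2)); // true
    }
}
```

In the real plugin the analyze() step would have to be the analyser configured for each field, which is what the question below is about.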

My question: is it possible to retrieve the analyser for a specific field in a Plugin/ScriptPlugin, so that I can analyse the user input before getting the postings for each term?

Based on the search request you are making, it looks like you are implementing FilterScript. The constructor takes SearchLookup. This indirectly holds a reference to the MapperService, in lookup.doc().mapperService(), which has all the mappings for the index.

Awesome :smiley:

Not sure why I didn't notice that one before.
Thanks!
