Querying shingles

I have used the following settings to index content and generate bigrams.


es.indices.create(
  index="shingles",
  body={
    "settings": {
      "analysis": {
        "analyzer": {
          "shingle_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "stop",
              "shingle_filter",
              "trim",
              "kill_filler"
            ]
          }
        },
        "filter": {
          # Emit bigrams only; fall back to unigrams when no shingle can be formed
          "shingle_filter": {
            "type": "shingle",
            "max_shingle_size": 2,
            "min_shingle_size": 2,
            "output_unigrams": "false",
            "output_unigrams_if_no_shingles": "true",
            "enable_position_increments": "false"
          },
          # Blank out any shingle containing the "_" filler left by a removed stop word
          "kill_filler": {
            "type": "pattern_replace",
            "pattern": ".*_.*",
            "replacement": ""
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "document": "page"
          }
        }
      }
    }
  }
)

When I run the analyze API on the string "Word1 Word2 StopWord1 StopWord2 Word3 Word4", I get four shingles, as expected:
i) Word1 Word2
ii) ""
iii) ""
iv) Word3 Word4
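
For reference, here is a minimal sketch of how that analyze request might be built with the Python client. It assumes the "shingles" index and "shingle_analyzer" from the settings above; the helper name build_analyze_body is hypothetical, and the actual call is left as a comment because it needs a live cluster.

```python
# Sketch only: builds the request body for the analyze API.
# Assumes the "shingles" index and "shingle_analyzer" defined in the settings above.

def build_analyze_body(text, analyzer="shingle_analyzer"):
    """Return the body for an analyze request that uses a named analyzer."""
    return {"analyzer": analyzer, "text": text}


analyze_body = build_analyze_body("Word1 Word2 StopWord1 StopWord2 Word3 Word4")

# With a running cluster and the same `es` client used to create the index,
# the request would look roughly like this (not executed here):
#   response = es.indices.analyze(index="shingles", body=analyze_body)
#   tokens = [t["token"] for t in response["tokens"]]
```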

What query can I use to retrieve content matching i) and/or iv)?

I have not been able to get match_phrase, or a bool query with must and match clauses, to work.
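
To make the "and/or" requirement concrete, the sketch below shows one shape such a query might take: a bool query with one match clause per shingle under should. It is untested against this index and rests on assumptions. The field name "content" is hypothetical, since the mapping above defines only the join field and does not assign shingle_analyzer to any text field, and the helper build_shingle_query is illustrative.

```python
# Sketch only: builds a bool/should query with one match clause per phrase.
# ASSUMPTION: a text field named "content" exists and is mapped with
# "analyzer": "shingle_analyzer"; the mapping as posted does not define it.

def build_shingle_query(phrases, field="content"):
    """Return a search body that matches documents containing any of the phrases."""
    should = [{"match": {field: phrase}} for phrase in phrases]
    return {
        "query": {
            "bool": {
                "should": should,
                # At least one clause must match, giving "and/or" semantics
                "minimum_should_match": 1,
            }
        }
    }


query_body = build_shingle_query(["Word1 Word2", "Word3 Word4"])

# With a running cluster, the search would look roughly like this (not executed here):
#   results = es.search(index="shingles", body=query_body)
```

Moving the clauses from should to must would require both shingles instead of either one.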
