Completion suggest migrate from 2.x to newer Versions of ES | Aggregations on non matching hits in string type with array content

A very similar thread is "Aggregation on suggestions results", but it doesn't really arrive at a good solution.

Let's go with the first approach, which gets us almost the previous behavior.

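First, the mapping and sample docs for easy reproduction. I used 7.0 here, but this should work the same way on 6.x. Note that only the suggest field in the mapping is relevant; everything else could be skipped. One caveat: on a released 7.x version the _doc level in the mapping is only accepted if you add ?include_type_name=true to the index creation request; alternatively, remove that level.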

PUT test
{
  "settings": {
    "number_of_shards": 1, 
    "analysis": {
      "filter": {
        "edgeNGram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25,
          "side": "front"
        },
        "custom_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "edge_nGram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edgeNGram_filter",
            "custom_ascii_folding"
          ]
        },
        "custom_suggest": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "custom_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "terms": {
          "type": "keyword"
        },
        "payload": {
          "type": "text",
          "fields": {
            "autocomplete": {
              "type": "text",
              "analyzer": "edge_nGram_analyzer",
              "search_analyzer": "standard"
            },
            "raw": {
              "type": "keyword"
            }
          }
        },
        "suggest": {
          "type": "completion"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "id": "1",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete"
    ]
  }
}

PUT test/_doc/2
{
  "id": "2",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete",
    "thirkell",
    "angela"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete",
    "Thirkell, Angela"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete",
      "thirkell",
      "angela"
    ]
  }
}

PUT test/_doc/3
{
  "id": "3",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete",
    "bowen",
    "elizabeth"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete",
    "Bowen, Elizabeth"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete",
      "bowen",
      "elizabeth"
    ]
  }
}

PUT test/_doc/4
{
  "id": "4",
  "terms": [
    "austen",
    "jane",
    "krämer",
    "ilse"
  ],
  "payload": [
    "Austen, Jane",
    "Krämer, Ilse"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "krämer",
      "ilse"
    ]
  }
}

PUT test/_doc/5
{
  "id": "5",
  "terms": [
    "jane",
    "austen",
    "mozart"
  ],
  "payload": "Jane Austen and Mozart",
  "suggest": {
    "input": [
      "jane",
      "austen",
      "mozart"
    ]
  }
}

And then the query is:

GET test/_search
{
  "_source": false,
  "suggest": {
    "term-suggest": {
      "prefix": "a",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

This returns the following result (only the suggest part is shown):

"term-suggest" : [
  {
    "text" : "a",
    "offset" : 0,
    "length" : 1,
    "options" : [
      {
        "text" : "angela",
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0
      },
      {
        "text" : "austen",
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0
      }
    ]
  }
]
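
If you want to consume this on the client side, a minimal Python sketch could look like the following. It assumes the response has already been parsed into a dict and that the part shown above sits under the top-level "suggest" key of the search response; the helper name suggestion_texts is made up for illustration and not part of any client library.

```python
def suggestion_texts(response, suggester="term-suggest"):
    """Collect the suggestion texts of one named suggester, in response order."""
    texts = []
    # Each entry corresponds to one input text; its options are the completions.
    for entry in response.get("suggest", {}).get(suggester, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


# Trimmed copy of the result above, wrapped in the top-level "suggest" key.
response = {
    "suggest": {
        "term-suggest": [
            {
                "text": "a",
                "offset": 0,
                "length": 1,
                "options": [
                    {"text": "angela", "_index": "test", "_id": "2", "_score": 1.0},
                    {"text": "austen", "_index": "test", "_id": "1", "_score": 1.0},
                ],
            }
        ]
    }
}

print(suggestion_texts(response))  # ['angela', 'austen']
```

The helper deliberately ignores _id, since with skip_duplicates only one of possibly several matching documents is reported per term.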

The important parts of the query are:

  • prefix query, since I assume a suggestion has to start with the letter(s) typed so far to count as a result.
  • skip_duplicates, so that every completion term is returned only once. This makes the _id field pretty useless: several documents can contain the same term, but only one ID is returned.
  • "_source": false, to avoid getting the actual documents back.
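
To reuse the query for arbitrary prefixes, the three points above could be wrapped in a small helper that builds the request body. This is only a sketch in Python: build_suggest_body is a hypothetical name, and the optional size parameter is not part of the query above (it is the completion suggester's option for the number of suggestions to return, which defaults to 5 when omitted).

```python
import json


def build_suggest_body(prefix, field="suggest", name="term-suggest", size=None):
    """Build a search body for a de-duplicated completion suggestion."""
    completion = {
        "field": field,
        "skip_duplicates": True,  # return each completion text only once
    }
    if size is not None:
        # Assumption: the caller wants something other than the default of 5.
        completion["size"] = size
    return {
        "_source": False,  # do not return the matching documents themselves
        "suggest": {
            name: {
                "prefix": prefix,  # suggestions must start with these letters
                "completion": completion,
            }
        },
    }


# With the defaults this reproduces the body of the GET test/_search request.
body = build_suggest_body("a")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of the search request with whatever HTTP or Elasticsearch client you already use.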