Migrating completion suggest from 2.x to newer versions of ES | Aggregations return non-matching values for string fields with array content

Hello! I built a search application for bibliographic data on ES 2.4.4.
I tested two approaches to autocomplete:

  1. completion suggest
  2. aggregation

My data (examples):
{"id":"1","terms":["austen","jane","rauchenberger","margarete"],"payload":["Austen, Jane","Rauchenberger, Margarete"],"suggest":{"input":["austen","jane","rauchenberger","margarete"]}}

{"id":"2","terms":["austen","jane","rauchenberger","margarete","thirkell","angela"],"payload":["Austen, Jane","Rauchenberger, Margarete","Thirkell, Angela"],"suggest":{"input":["austen","jane","rauchenberger","margarete","thirkell","angela"]}}

{"id":"3","terms":["austen","jane","rauchenberger","margarete","bowen","elizabeth"],"payload":["Austen, Jane","Rauchenberger, Margarete","Bowen, Elizabeth"],"suggest":{"input":["austen","jane","rauchenberger","margarete","bowen","elizabeth"]}}

{"id":"4","terms":["austen","jane","kr\u00e4mer","ilse"],"payload":["Austen, Jane","Kr\u00e4mer, Ilse"],"suggest":{"input":["austen","jane","kr\u00e4mer","ilse"]}}

{"id":"5","terms":["jane","austen","mozart"],"payload":"Jane Austen and Mozart","suggest":{"input":["jane","austen","mozart"]}}


My index: /mysuggest/title/
{
"settings": {
"analysis": {
"filter": {
"edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 25,
"side": "front"
},
"custom_ascii_folding": {
"type": "asciifolding",
"preserve_original": true
}
},
"analyzer": {
"edge_nGram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"edgeNGram_filter",
"custom_ascii_folding"
]
},
"custom_suggest": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"custom_ascii_folding"
]
}
}
}
},
"mappings": {
"title": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"terms": {
"type": "string",
"index": "not_analyzed"
},
"payload": {
"type": "string",
"fields": {
"autocomplete": {
"type": "string",
"analyzer": "edge_nGram_analyzer",
"search_analyzer": "standard"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"suggest": {
"type": "completion",
"analyzer": "custom_suggest",
"search_analyzer": "standard",
"payloads": false
}
}
}
}
}


  1. First approach: completion suggest
    The desired output is an array of option objects, as returned by ES versions before 5.x:
    {text: "xxx", score: 1}
    This changed in version 5.x and later.

Query:
http://localhost:9200/mysuggest/title/_search
{
"size": 0,
"suggest": {
"term-suggest": {
"text": "a",
"completion": {
"field": "suggest",
"size": "20"
}
}
}
}

Result:
{

"took": 1,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 5,
    "max_score": 0,
    "hits": [ ]
},
"suggest": {
    "term-suggest": [
        {
            "text": "a",
            "offset": 0,
            "length": 1,
            "options": [
                {
                    "text": "angela",
                    "score": 1
                }
                ,
                {
                    "text": "austen",
                    "score": 1
                }
            ]
        }
    ]
}

}

Question: How can I achieve this with ES 6?


  2. Second approach: aggregations
    The problem with aggregations is that the buckets include values that do not match the search term:

Query:
http://localhost:9200/mysuggest/title/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"terms": "austen"
}
}
]
}
},
"aggregations": {
"top_terms": {
"terms": {
"field": "payload.raw",
"size": 10,
"order": {
"_count": "desc"
}
}
}
}
}

Result:
{

"took": 1,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 5,
    "max_score": 0,
    "hits": [ ]
},
"aggregations": {
    "top_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "Austen, Jane",
                "doc_count": 4
            }
            ,
            {
                "key": "Rauchenberger, Margarete",
                "doc_count": 3
            }
            ,
            {
                "key": "Bowen, Elizabeth",
                "doc_count": 1
            }
            ,
            {
                "key": "Jane Austen and Mozart",
                "doc_count": 1
            }
            ,
            {
                "key": "Krämer, Ilse",
                "doc_count": 1
            }
            ,
            {
                "key": "Thirkell, Angela",
                "doc_count": 1
            }
        ]
    }
}

}
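The bucket counts above follow from how a terms aggregation works: it counts every value of the aggregated field in each document matched by the query, not only the values related to the search term. The following is a minimal Python simulation of that counting over the five sample documents. It does not talk to Elasticsearch, and the name simulate_terms_agg is made up for this illustration.

```python
from collections import Counter

# The five sample documents, reduced to the two fields involved.
DOCS = [
    {"terms": ["austen", "jane", "rauchenberger", "margarete"],
     "payload": ["Austen, Jane", "Rauchenberger, Margarete"]},
    {"terms": ["austen", "jane", "rauchenberger", "margarete", "thirkell", "angela"],
     "payload": ["Austen, Jane", "Rauchenberger, Margarete", "Thirkell, Angela"]},
    {"terms": ["austen", "jane", "rauchenberger", "margarete", "bowen", "elizabeth"],
     "payload": ["Austen, Jane", "Rauchenberger, Margarete", "Bowen, Elizabeth"]},
    {"terms": ["austen", "jane", "krämer", "ilse"],
     "payload": ["Austen, Jane", "Krämer, Ilse"]},
    {"terms": ["jane", "austen", "mozart"],
     "payload": "Jane Austen and Mozart"},
]


def simulate_terms_agg(docs, query_term):
    """Count payload values over all docs whose 'terms' contain query_term."""
    counts = Counter()
    for doc in docs:
        if query_term not in doc["terms"]:
            continue  # the term query excludes this document
        values = doc["payload"]
        if isinstance(values, str):
            values = [values]  # a single value behaves like a one-element array
        # Each distinct value counts once per document (doc_count semantics).
        counts.update(set(values))
    return counts


if __name__ == "__main__":
    for key, doc_count in simulate_terms_agg(DOCS, "austen").most_common():
        print(key, doc_count)
```

All five documents contain "austen" in terms, so every payload value of every document ends up in a bucket, including the co-authors.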

I only want the following buckets, because only these match "austen":
{"key": "Austen, Jane", "doc_count": 4}
{ "key": "Jane Austen and Mozart","doc_count": 1}

I've tried filtering the query, with the same result.
The only way I have found to get the desired result set is to filter out the unwanted buckets programmatically.
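For illustration, that programmatic filtering could look roughly like the Python sketch below. It assumes a case-insensitive substring match on the bucket key is what "matching" means here; the function name filter_buckets is hypothetical, and the bucket list is copied from the aggregation result in this post.

```python
# Buckets as returned in the "top_terms" aggregation of this post.
BUCKETS = [
    {"key": "Austen, Jane", "doc_count": 4},
    {"key": "Rauchenberger, Margarete", "doc_count": 3},
    {"key": "Bowen, Elizabeth", "doc_count": 1},
    {"key": "Jane Austen and Mozart", "doc_count": 1},
    {"key": "Krämer, Ilse", "doc_count": 1},
    {"key": "Thirkell, Angela", "doc_count": 1},
]


def filter_buckets(buckets, needle):
    """Keep only buckets whose key contains needle, ignoring case.

    Assumption: a plain substring test is good enough; a real
    autocomplete might want to match on word prefixes instead.
    """
    needle = needle.casefold()
    return [b for b in buckets if needle in b["key"].casefold()]


if __name__ == "__main__":
    for bucket in filter_buckets(BUCKETS, "austen"):
        print(bucket["key"], bucket["doc_count"])
```

The downside of this approach is that the aggregation size is applied before the filter, so fewer than the requested number of buckets may remain.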

Is there any solution?
See this online: http://rxs.bibliothecauniversalis.net/ with 4 optional settings.
Thanks a lot for your help.

Kind regards
JP Weiner

A very similar thread is "Aggregation on suggestions results", but it doesn't really arrive at a good solution.

Let's go with the first approach, which gets us almost the previous behavior.

First, the mapping and sample docs for easy reproduction. I used 7.0 here, but this should work the same way on 6.x. Note that only the suggest field in the mapping is relevant; everything else could be skipped:

PUT test
{
  "settings": {
    "number_of_shards": 1, 
    "analysis": {
      "filter": {
        "edgeNGram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25,
          "side": "front"
        },
        "custom_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "edge_nGram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edgeNGram_filter",
            "custom_ascii_folding"
          ]
        },
        "custom_suggest": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "custom_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "terms": {
          "type": "keyword"
        },
        "payload": {
          "type": "text",
          "fields": {
            "autocomplete": {
              "type": "text",
              "analyzer": "edge_nGram_analyzer",
              "search_analyzer": "standard"
            },
            "raw": {
              "type": "keyword"
            }
          }
        },
        "suggest": {
          "type": "completion"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "id": "1",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete"
    ]
  }
}

PUT test/_doc/2
{
  "id": "2",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete",
    "thirkell",
    "angela"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete",
    "Thirkell, Angela"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete",
      "thirkell",
      "angela"
    ]
  }
}

PUT test/_doc/3
{
  "id": "3",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete",
    "bowen",
    "elizabeth"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete",
    "Bowen, Elizabeth"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete",
      "bowen",
      "elizabeth"
    ]
  }
}

PUT test/_doc/4
{
  "id": "4",
  "terms": [
    "austen",
    "jane",
    "krämer",
    "ilse"
  ],
  "payload": [
    "Austen, Jane",
    "Krämer, Ilse"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "krämer",
      "ilse"
    ]
  }
}

PUT test/_doc/5
{
  "id": "5",
  "terms": [
    "jane",
    "austen",
    "mozart"
  ],
  "payload": "Jane Austen and Mozart",
  "suggest": {
    "input": [
      "jane",
      "austen",
      "mozart"
    ]
  }
}

And then the query is:

GET test/_search
{
  "_source": false,
  "suggest": {
    "term-suggest": {
      "prefix": "a",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

This returns the following result (only the suggest part is shown):

"term-suggest" : [
  {
    "text" : "a",
    "offset" : 0,
    "length" : 1,
    "options" : [
      {
        "text" : "angela",
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0
      },
      {
        "text" : "austen",
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0
      }
    ]
  }
]

The important parts are:

  • the prefix query, since I assume a suggestion has to start with the typed letter(s) to be returned at all.
  • skip_duplicates, so that every completion term is returned only once. This makes the _id field pretty useless, since a term can belong to multiple documents but only one ID is returned.
  • "_source": false, to avoid getting the actual documents back.
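If the application still expects the old {text, score} shape, the newer response can be mapped back on the client. Below is a minimal Python sketch of that mapping, assuming the response has already been parsed into a dict and that the suggester is named "term-suggest" as in the query above. The function name to_legacy_options is made up for this example.

```python
# Suggest part of the response shown above, as a parsed dict.
RESPONSE = {
    "suggest": {
        "term-suggest": [
            {
                "text": "a",
                "offset": 0,
                "length": 1,
                "options": [
                    {"text": "angela", "_index": "test", "_type": "_doc",
                     "_id": "2", "_score": 1.0},
                    {"text": "austen", "_index": "test", "_type": "_doc",
                     "_id": "1", "_score": 1.0},
                ],
            }
        ]
    }
}


def to_legacy_options(response, suggester="term-suggest"):
    """Reduce each completion option to the pre-5.x {text, score} shape."""
    legacy = []
    for entry in response.get("suggest", {}).get(suggester, []):
        for option in entry.get("options", []):
            # Drop _index/_type/_id and rename _score to score.
            legacy.append({"text": option["text"], "score": option["_score"]})
    return legacy


if __name__ == "__main__":
    print(to_legacy_options(RESPONSE))
```

Dropping _id here loses nothing, because with skip_duplicates it only points to one of possibly several documents anyway.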