Completion suggest migrate from 2.x to newer Versions of ES | Aggregations on non matching hits in string type with array content

jpweiner · March 28, 2019, 11:56am

Hello! I built a search application for bibliographic data on ES 2.4.4.
I tested two approaches to autocomplete:

completion suggest
aggregation

My data (examples):
{"id":"1","terms":["austen","jane","rauchenberger","margarete"],"payload":["Austen, Jane","Rauchenberger, Margarete"],"suggest":{"input":["austen","jane","rauchenberger","margarete"]}}

{"id":"2","terms":["austen","jane","rauchenberger","margarete","thirkell","angela"],"payload":["Austen, Jane","Rauchenberger, Margarete","Thirkell, Angela"],"suggest":{"input":["austen","jane","rauchenberger","margarete","thirkell","angela"]}}

{"id":"3","terms":["austen","jane","rauchenberger","margarete","bowen","elizabeth"],"payload":["Austen, Jane","Rauchenberger, Margarete","Bowen, Elizabeth"],"suggest":{"input":["austen","jane","rauchenberger","margarete","bowen","elizabeth"]}}

{"id":"4","terms":["austen","jane","kr\u00e4mer","ilse"],"payload":["Austen, Jane","Kr\u00e4mer, Ilse"],"suggest":{"input":["austen","jane","kr\u00e4mer","ilse"]}}

{"id":"5","terms":["jane","austen","mozart"],"payload":"Jane Austen and Mozart","suggest":{"input":["jane","austen","mozart"]}}

My index: /mysuggest/title/
{
  "settings": {
    "analysis": {
      "filter": {
        "edgeNGram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25,
          "side": "front"
        },
        "custom_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "edge_nGram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edgeNGram_filter",
            "custom_ascii_folding"
          ]
        },
        "custom_suggest": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "custom_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "title": {
      "properties": {
        "id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "terms": {
          "type": "string",
          "index": "not_analyzed"
        },
        "payload": {
          "type": "string",
          "fields": {
            "autocomplete": {
              "type": "string",
              "analyzer": "edge_nGram_analyzer",
              "search_analyzer": "standard"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "suggest": {
          "type": "completion",
          "analyzer": "custom_suggest",
          "search_analyzer": "standard",
          "payloads": false
        }
      }
    }
  }
}

First approach: completion suggest
The desired output is an array of option objects, as returned by ES versions before 5.x:
{text: "xxx", score: 1}
The response format changed in version 5.x and later.

Query:
http://localhost:9200/mysuggest/title/_search
{
  "size": 0,
  "suggest": {
    "term-suggest": {
      "text": "a",
      "completion": {
        "field": "suggest",
        "size": "20"
      }
    }
  }
}

Result:
{

"took": 1,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 5,
    "max_score": 0,
    "hits": [ ]
},
"suggest": {
    "term-suggest": [
        {
            "text": "a",
            "offset": 0,
            "length": 1,
            "options": [
                {
                    "text": "angela",
                    "score": 1
                }
                ,
                {
                    "text": "austen",
                    "score": 1
                }
            ]
        }
    ]
}

}

Question: How can I achieve this with ES 6?

Second approach: aggregations
The problem with aggregations is that I also get buckets for terms that do not match the query:

Query:
http://localhost:9200/mysuggest/title/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "terms": "austen"
          }
        }
      ]
    }
  },
  "aggregations": {
    "top_terms": {
      "terms": {
        "field": "payload.raw",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

Result:
{

"took": 1,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 5,
    "max_score": 0,
    "hits": [ ]
},
"aggregations": {
    "top_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "Austen, Jane",
                "doc_count": 4
            }
            ,
            {
                "key": "Rauchenberger, Margarete",
                "doc_count": 3
            }
            ,
            {
                "key": "Bowen, Elizabeth",
                "doc_count": 1
            }
            ,
            {
                "key": "Jane Austen and Mozart",
                "doc_count": 1
            }
            ,
            {
                "key": "Krämer, Ilse",
                "doc_count": 1
            }
            ,
            {
                "key": "Thirkell, Angela",
                "doc_count": 1
            }
        ]
    }
}

}

I only want these two buckets, because only their keys match "austen":
{"key": "Austen, Jane", "doc_count": 4}
{ "key": "Jane Austen and Mozart","doc_count": 1}

I've tried adding a filter to the query, with the same result.
The only way I have found to get the desired result set is to filter out the unwanted buckets programmatically.
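
For reference, the client-side filtering mentioned above could look roughly like the Python sketch below. The function name matching_buckets and the case-insensitive substring test are assumptions made for illustration, and the sample response is a trimmed copy of the aggregation result shown earlier.

```python
def matching_buckets(response, term, agg_name="top_terms"):
    """Keep only the buckets whose key contains `term`, ignoring case."""
    buckets = response["aggregations"][agg_name]["buckets"]
    needle = term.casefold()
    return [b for b in buckets if needle in b["key"].casefold()]


# Trimmed copy of the aggregation result from the query above.
response = {"aggregations": {"top_terms": {"buckets": [
    {"key": "Austen, Jane", "doc_count": 4},
    {"key": "Rauchenberger, Margarete", "doc_count": 3},
    {"key": "Bowen, Elizabeth", "doc_count": 1},
    {"key": "Jane Austen and Mozart", "doc_count": 1},
    {"key": "Krämer, Ilse", "doc_count": 1},
    {"key": "Thirkell, Angela", "doc_count": 1},
]}}}

for bucket in matching_buckets(response, "austen"):
    print(bucket)
```

With the sample data this keeps only the "Austen, Jane" and "Jane Austen and Mozart" buckets. The doc_count values are left untouched, so they still count documents that matched the query, not occurrences of the typed term.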

Is there any solution?
See this online: http://rxs.bibliothecauniversalis.net/ with 4 optional settings.
Thanks a lot for your help.

Kind regards
JP Weiner

xeraa · April 20, 2019, 9:32pm

There is a very similar thread, "Aggregation on suggestions results", but it doesn't really arrive at a good solution.
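
For the aggregation approach, one option that might be worth trying is the include parameter of the terms aggregation, which restricts the buckets to keys matching a regular expression. As far as I know, the pattern uses Lucene regexp syntax, has to match the whole key, and is case-sensitive, hence the [aA]-style character classes below. The Python sketch only builds the request body; the helper names contains_pattern and build_request are made up for illustration, and the body has not been run against the index from this thread.

```python
import json

# Characters with a special meaning in Lucene regular expressions.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def contains_pattern(term):
    """Build a regexp matching any key that contains `term` in either case."""
    parts = []
    for ch in term:
        lo, up = ch.lower(), ch.upper()
        if lo != up and len(lo) == 1 and len(up) == 1:
            parts.append("[" + lo + up + "]")  # e.g. "a" -> "[aA]"
        elif ch in LUCENE_RESERVED:
            parts.append("\\" + ch)            # escape regexp operators
        else:
            parts.append(ch)
    return ".*" + "".join(parts) + ".*"


def build_request(term, size=10):
    """Search body: match docs on `terms`, keep only buckets containing `term`."""
    return {
        "size": 0,
        "query": {"term": {"terms": term}},
        "aggregations": {"top_terms": {"terms": {
            "field": "payload.raw",
            "size": size,
            "include": contains_pattern(term),
        }}},
    }


print(json.dumps(build_request("austen"), indent=2))
```

The printed JSON can be sent as the body of the _search request from the question. Filtering on the bucket key happens inside the aggregation, so the term query still decides which documents are counted.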

Let's go with the first approach, where we can get almost the previous behavior.

First, the mapping and sample docs for easy reproduction; I used 7.0 here, but this should work the same way on 6.x. Note that only the suggest field in the mapping is relevant; everything else could be skipped:

PUT test
{
  "settings": {
    "number_of_shards": 1, 
    "analysis": {
      "filter": {
        "edgeNGram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25,
          "side": "front"
        },
        "custom_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "edge_nGram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edgeNGram_filter",
            "custom_ascii_folding"
          ]
        },
        "custom_suggest": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "custom_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "terms": {
          "type": "keyword"
        },
        "payload": {
          "type": "text",
          "fields": {
            "autocomplete": {
              "type": "text",
              "analyzer": "edge_nGram_analyzer",
              "search_analyzer": "standard"
            },
            "raw": {
              "type": "keyword"
            }
          }
        },
        "suggest": {
          "type": "completion"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "id": "1",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete"
    ]
  }
}

PUT test/_doc/2
{
  "id": "2",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete",
    "thirkell",
    "angela"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete",
    "Thirkell, Angela"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete",
      "thirkell",
      "angela"
    ]
  }
}

PUT test/_doc/3
{
  "id": "3",
  "terms": [
    "austen",
    "jane",
    "rauchenberger",
    "margarete",
    "bowen",
    "elizabeth"
  ],
  "payload": [
    "Austen, Jane",
    "Rauchenberger, Margarete",
    "Bowen, Elizabeth"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "rauchenberger",
      "margarete",
      "bowen",
      "elizabeth"
    ]
  }
}

PUT test/_doc/4
{
  "id": "4",
  "terms": [
    "austen",
    "jane",
    "krämer",
    "ilse"
  ],
  "payload": [
    "Austen, Jane",
    "Krämer, Ilse"
  ],
  "suggest": {
    "input": [
      "austen",
      "jane",
      "krämer",
      "ilse"
    ]
  }
}

PUT test/_doc/5
{
  "id": "5",
  "terms": [
    "jane",
    "austen",
    "mozart"
  ],
  "payload": "Jane Austen and Mozart",
  "suggest": {
    "input": [
      "jane",
      "austen",
      "mozart"
    ]
  }
}

And then the query is:

GET test/_search
{
  "_source": false,
  "suggest": {
    "term-suggest": {
      "prefix": "a",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

Which gets you the result (only the suggest part):

"term-suggest" : [
  {
    "text" : "a",
    "offset" : 0,
    "length" : 1,
    "options" : [
      {
        "text" : "angela",
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0
      },
      {
        "text" : "austen",
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0
      }
    ]
  }
]
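
If the client code still expects the pre-5.x {text, score} objects, the options from a response like the one above can be mapped back on the client side. This is only a sketch: the function name legacy_options is hypothetical, and the sample response is reduced to the suggest part.

```python
def legacy_options(response, suggester="term-suggest"):
    """Return suggestions as [{"text": ..., "score": ...}], the pre-5.x shape."""
    options = []
    for entry in response["suggest"][suggester]:
        for option in entry["options"]:
            # Newer versions report the score as "_score" and add document metadata.
            options.append({"text": option["text"], "score": option["_score"]})
    return options


# Suggest part of the response shown above.
response = {"suggest": {"term-suggest": [{
    "text": "a", "offset": 0, "length": 1,
    "options": [
        {"text": "angela", "_index": "test", "_type": "_doc", "_id": "2", "_score": 1.0},
        {"text": "austen", "_index": "test", "_type": "_doc", "_id": "1", "_score": 1.0},
    ],
}]}}

print(legacy_options(response))
```

Dropping _index, _type and _id loses nothing here, since only one ID per term is returned anyway.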

The important parts are:

prefix query, since I assume the suggestions need to start with the typed letter(s) to be useful.
skip_duplicates to return every completion term only once. This makes the _id field pretty useless, since several documents can share a term but only one ID is returned.
"_source": false to avoid getting the actual documents back.

system · May 18, 2019, 9:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
