Aggregation on suggestion results

Hi,

How do I aggregate over a list of suggestions?

I want to 'group by' the text field returned by the suggester 🙂

I use the _search endpoint with the following request body:

{
  "_source": "suggest",
  "suggest": {
    "suggest": {
      "prefix": "bm",
      "completion": {
        "field": "suggest",
        "size": 10
      }
    }
  },
  "aggs": {
    "group": {
      "terms": {
        "field": "text"
      }
    }
  }
}

I have tried all kinds of field paths for the terms aggregation, but without any luck...

Since this worked with the old-style suggester (1.7), I hope the feature has not been removed 😊

Hey,

Can you link to where this feature is described in the 1.7 documentation? I am not aware of it working in earlier versions or in the current one (but maybe I am just missing it).

--Alex

Hi Alex,

The difference is that in 1.7 the returned list was already grouped: you got a list of unique values, each with a score.

In Elasticsearch 5 you get multiple results with the same text (because it is now a search, I guess).

So to turn this into autocomplete functionality, I first have to iterate through all the results (millions, in my case) before I can give a filtered list to a user.
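Here is a minimal Python sketch of that client-side step. It assumes the `_search` response has already been parsed into a dict. `unique_suggestions` is a hypothetical helper, not part of any Elasticsearch client. In ES 5, `size` counts returned documents rather than unique texts, as the 'me' example further down in this thread shows. You would therefore probably have to request a larger `size` than the number of unique suggestions you want to display.

```python
from __future__ import annotations

from typing import Any


def unique_suggestions(response: dict[str, Any], suggest_name: str) -> list[tuple[str, float]]:
    """Collapse completion-suggester options that share the same text.

    Options are visited in the order Elasticsearch returned them and only the
    first occurrence of each text is kept, so the suggester's ranking is preserved.
    """
    seen: dict[str, float] = {}
    for entry in response.get("suggest", {}).get(suggest_name, []):
        for option in entry.get("options", []):
            text = option["text"]
            if text not in seen:
                seen[text] = option.get("_score", 0.0)
    return list(seen.items())


# Usage with a trimmed copy of the ES 5 'me' response (ids and other fields omitted).
response = {"suggest": {"merk-suggest": [{"text": "me", "options": [
    {"text": "mega", "_score": 1.0},
    {"text": "mega", "_score": 1.0},
    {"text": "mega", "_score": 1.0},
    {"text": "meierling", "_score": 1.0},
    {"text": "meierling", "_score": 1.0},
]}]}}
print(unique_suggestions(response, "merk-suggest"))  # [('mega', 1.0), ('meierling', 1.0)]
```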

I hope this is now clearer 😉

Christoph

One way to work around this is to use a dedicated suggest index containing one document per unique suggestion, each with a list of users. Running a suggest query against that index then returns one option per unique suggestion, and each option comes back with a list of users you can filter by.

Example suggest index mapping (`suggestion` holds the unique suggestion text, and `users` lists the users who can see that suggestion):

 {
   "suggestions_for_users": {
     "mappings": {
       "properties": {
         "suggestion": {"type": "completion"},
         "users": {"type": "keyword"}
       }
     }
   }
 }
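To populate such an index, you first have to group the (suggestion, user) pairs from the main data. The minimal Python sketch below assumes you can stream those pairs from your own data; the pairs and user ids are made up. `build_suggestion_docs` is a hypothetical helper. Each resulting dict could then be indexed as one document in the dedicated suggest index.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def build_suggestion_docs(pairs: Iterable[tuple[str, str]]) -> list[dict]:
    """Group (suggestion, user) pairs into one document per unique suggestion.

    Field names follow the example mapping: "suggestion" is the completion
    field and "users" is a keyword array used for filtering.
    """
    users_by_suggestion: dict[str, set[str]] = defaultdict(set)
    for suggestion, user in pairs:
        users_by_suggestion[suggestion].add(user)
    # Sorted only for deterministic output; Elasticsearch needs no particular order.
    return [
        {"suggestion": suggestion, "users": sorted(users)}
        for suggestion, users in sorted(users_by_suggestion.items())
    ]


# Usage: duplicate pairs collapse into a single document per suggestion.
pairs = [("mega", "u1"), ("mega", "u2"), ("mega", "u1"), ("meierling", "u3")]
for doc in build_suggestion_docs(pairs):
    print(doc)
```

Grouping before indexing keeps the suggest index at one document per unique text. A plain ES 5 completion query then cannot return the same text twice because of duplicate source rows.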

Hi Areek,

That's not really the same: I could just as well do a normal term search and then group the results. But the completion suggester documentation clearly states 'The completion suggester provides auto-complete/search-as-you-type functionality' 🙂

Furthermore, I provide several input variants per document. For a car brand I would index 'mercedes-benz', 'mercedesbenz', 'mercedes', 'benz', and also normalised versions if needed, but they all still point to 'Mercedes-Benz'.
This worked like a charm in 1.7 and 2.x.

A dedicated search index would make this possible, but it is extra work to maintain, and I think this was an awesome feature of the Elasticsearch completion suggester.

Hi Alex,

Is this a breaking change, meaning I should use a different approach, or is it a bug that will be fixed soon?

Christoph

You can do this easily with the v5 completion suggester. Assuming you have a completion field named suggest_name, your document could look like this:

 {
  "suggest_name": ["mercedes-benz",  "mercedesbenz", "mercedes", "benz"],
  "name": "Mercedes-Benz"
 }

Now a suggest request that matches any of the suggest_name inputs returns the associated document, from which you can parse out the name.
The v5 completion suggester is actually more flexible here: it returns the document a suggestion belongs to, so you can parse out any of its fields without specifying them upfront (in 2.x this required payloads).
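For example, you could read the canonical names from each option's `_source` and de-duplicate them on the client. The minimal Python sketch below assumes the suggest_name/name fields from the document above. `canonical_names` is a hypothetical helper. The sample response is hand-written for illustration, and the second Mercedes-Benz document in it is made up.

```python
from __future__ import annotations

from typing import Any


def canonical_names(response: dict[str, Any], suggest_name: str, field: str = "name") -> list[str]:
    """Return the distinct values of `field` from each option's _source, in rank order."""
    names: list[str] = []
    for entry in response.get("suggest", {}).get(suggest_name, []):
        for option in entry.get("options", []):
            value = option.get("_source", {}).get(field)
            # Several documents may carry the same canonical name; keep it once.
            if value is not None and value not in names:
                names.append(value)
    return names


# Usage with a hand-written response in the ES 5 shape.
response = {"suggest": {"name-suggest": [{"text": "me", "options": [
    {"text": "mercedes", "_source": {"name": "Mercedes-Benz"}},
    {"text": "mercedesbenz", "_source": {"name": "Mercedes-Benz"}},
    {"text": "mega", "_source": {"name": "Mercury Mega"}},
]}]}}
print(canonical_names(response, "name-suggest"))  # ['Mercedes-Benz', 'Mercury Mega']
```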

Yes, I saw that. But I don't need the documents; I only need the suggestions for autocomplete purposes, just like the completion suggester worked in versions before 5 🙂

Here are the responses for the same dataset in ES 1.7 and ES 5.

This is the request I send in ES 5 to get suggestions for 'me':

{
  "_source": "suggest",
  "suggest": {
    "merk-suggest": {
      "prefix": "me",
      "completion": {
        "field": "merk_suggest",
        "size": 5
      }
    }
  }
}

and this is the result in ES 5:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0.0,
    "hits": []
  },
  "suggest": {
    "merk-suggest": [
      {
        "text": "me",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "mega",
            "_index": "rdw",
            "_type": "kenteken",
            "_id": "jggp39",
            "_score": 1.0,
            "_source": {}
          },
          {
            "text": "mega",
            "_index": "rdw",
            "_type": "kenteken",
            "_id": "f464df",
            "_score": 1.0,
            "_source": {}
          },
          {
            "text": "mega",
            "_index": "rdw",
            "_type": "kenteken",
            "_id": "77dpp3",
            "_score": 1.0,
            "_source": {}
          },
          {
            "text": "meierling",
            "_index": "rdw",
            "_type": "kenteken",
            "_id": "on46zt",
            "_score": 1.0,
            "_source": {}
          },
          {
            "text": "meierling",
            "_index": "rdw",
            "_type": "kenteken",
            "_id": "op32lt",
            "_score": 1.0,
            "_source": {}
          }              
        ]
      }
    ]
  }
}

With ES 1.7:

{
    "song-suggest" : {
        "text" : "me",
        "completion" : {
            "field" : "merk_suggest",
            "size": 5
        }
    }
}

and the result in ES 1.7:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "song-suggest": [
    {
      "text": "me",
      "offset": 0,
      "length": 2,
      "options": [
        {
          "text": "mercedes-benz",
          "score": 82106.0,
          "payload": {
            "url": "mercedes-benz"
          }
        },
        {
          "text": "mercury",
          "score": 143.0,
          "payload": {
            "url": "mercury"
          }
        },
        {
          "text": "mercedes-amg",
          "score": 124.0,
          "payload": {
            "url": "mercedes-amg"
          }
        },
        {
          "text": "mega",
          "score": 118.0,
          "payload": {
            "url": "mega"
          }
        },
        {
          "text": "meto",
          "score": 87.0,
          "payload": {
            "url": "meto"
          }
        }
      ]
    }
  ]
}

The result is already grouped by name. I don't need the actual documents; I just want the suggestions. A payload would be awesome, but payloads have also been dropped in ES 5 😦

Hi Christoph,

In 5.0, the completion suggester has been completely rewritten. It is now document-oriented, meaning the document source is returned with every suggestion. One major reason for this, among others, is to make the suggest API consistent with the other search APIs in Elasticsearch. For a full list of breaking changes, see [1].

[1] also explains why payloads have been removed and encourages users to use document fields instead. If you don't want the entire document returned with each suggestion, you can use source filtering [2] to return only the relevant fields.

Because 5.0 is a major version, applications built on 2.x are expected to adapt to these breaking changes. The document-oriented completion suggester and the removal of payloads are not bugs but features. Returning documents gives more flexibility, since any document field can be returned at query time. Removing payloads ensures bounded heap memory usage. The new design also ensures correctness in near-real-time use cases, because deleted documents never show up (see the note on using optimize in [3] for the 2.x completion suggester).

[1] Suggester changes | Elasticsearch Guide [5.0] | Elastic
[2] Source filtering | Elasticsearch Guide [5.0] | Elastic
[3] Completion Suggester | Elasticsearch Guide [2.4] | Elastic

Hi Christoph,

As an example, here is how you can use the completion suggester in 5.0, attaching multiple completions to a document and using document fields instead of 'payloads':

Set up the mappings for the completion field:

NOTE: fields other than completion fields can be mapped dynamically at index time; there is no need to map them explicitly unless you need to.

curl -XPUT "http://localhost:9200/test_suggest" -d'
{
  "mappings": {
    "suggestionType": {
      "properties": {
        "name_suggest": {
          "type": "completion"
        },
        "url": {
          "type": "keyword"
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}'

Index some suggestions, along with arbitrary fields that may or may not be relevant to the suggestions:

NOTE: you can tune how each suggestion is ranked by assigning it a weight, which determines the order of the returned suggestions. See [1] for adding a weight to all completion entries, or to individual entries, when indexing a document with completions.
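The Python sketch below illustrates per-entry weights. It assumes the 5.0 array form of a completion field, where each entry carries its own `input` and `weight`; check [1] for the exact format. It builds a body for the `_bulk` endpoint, which could then be POSTed to `http://localhost:9200/_bulk`. The weights are made-up values, and `weighted_inputs`/`bulk_body` are hypothetical helpers.

```python
from __future__ import annotations

import json


def weighted_inputs(weights: dict[str, int]) -> list[dict]:
    """Build the array form of a completion field: one entry per input, each with its own weight."""
    return [{"input": text, "weight": weight} for text, weight in weights.items()]


def bulk_body(index: str, doc_type: str, docs: list[dict]) -> str:
    """Build a newline-delimited _bulk body: an action line followed by each document source."""
    lines: list[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Usage: 'mercedes-benz' (weight 10) should rank above 'mercedes' (weight 5) for the prefix 'me'.
doc = {
    "name_suggest": weighted_inputs({"mercedes-benz": 10, "mercedes": 5, "benz": 2}),
    "name": "Mercedes-Benz",
    "url": "http://example.com/1",
}
print(bulk_body("test_suggest", "suggestionType", [doc]), end="")
```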

curl -XPOST "http://localhost:9200/test_suggest/suggestionType" -d'
{
  "name_suggest": ["mercedes-benz",  "mercedesbenz", "mercedes", "benz"],
  "name": "Mercedes-Benz",
  "url": "http://example.com/1"
}'

curl -XPOST "http://localhost:9200/test_suggest/suggestionType" -d'
{
  "name_suggest": ["mercury", "moto", "mega"],
  "name": "Mercury Mega",
  "url": "http://example.com/2"
}'

You can choose to return only the fields relevant to the suggestion using source filtering [2]; here we exclude the url field:

NOTE: if _source is not specified in the request body, the entire document is returned.

curl -XGET "http://localhost:9200/test_suggest/_search" -d'
{
  "_source": {"excludes": "url"}, 
  "suggest": {
    "name-suggest": {
      "text": "me",
      "completion": {
        "field": "name_suggest"
      }
    }
  }
}'

Response:

NOTE: each option contains a text indicating the highest weighted completion, along with a _source holding the fields of the associated suggestion document, filtered according to the specified _source.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "name-suggest": [
      {
        "text": "me",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "mega",
            "_index": "test_suggest",
            "_type": "suggestionType",
            "_id": "AVhLSawPUm340BmH7_Vk",
            "_score": 1,
            "_source": {
              "name_suggest": [
                "mercury",
                "moto",
                "mega"
              ],
              "name": "Mercury Mega"
            }
          },
          {
            "text": "mercedes",
            "_index": "test_suggest",
            "_type": "suggestionType",
            "_id": "AVhLSaUqUm340BmH7_Vj",
            "_score": 1,
            "_source": {
              "name_suggest": [
                "mercedes-benz",
                "mercedesbenz",
                "mercedes",
                "benz"
              ],
              "name": "Mercedes-Benz"
            }
          }
        ]
      }
    ]
  }
}

Hope this helps 🙂

[1] https://www.elastic.co/guide/en/elasticsearch/reference/5.0/search-suggesters-completion.html#indexing
[2] https://www.elastic.co/guide/en/elasticsearch/reference/5.0/search-request-source-filtering.html

Thanks for the detailed how-to. I will create separate indexes for the auto-completion.

Maybe a nice feature for the future would be a built-in terms aggregation for suggestions (or a way to aggregate manually over the results). That would do the trick for me, I guess.

Christoph
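Until something like that exists, you can approximate a manual 'aggregation' over the returned options on the client. The minimal Python sketch below counts option texts much like terms-aggregation buckets. `suggestion_buckets` is a hypothetical helper. Unlike a real terms aggregation, it only sees the options actually returned, so its counts are bounded by the suggester's `size`.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def suggestion_buckets(response: dict[str, Any], suggest_name: str) -> list[tuple[str, int]]:
    """Count how often each option text appears, most frequent first.

    Resembles the buckets of a terms aggregation, but only over the options
    that were actually returned, not over all matching documents.
    """
    counts = Counter(
        option["text"]
        for entry in response.get("suggest", {}).get(suggest_name, [])
        for option in entry.get("options", [])
    )
    return counts.most_common()


# Usage with the option texts from the ES 5 'me' response above.
response = {"suggest": {"merk-suggest": [{"text": "me", "options": [
    {"text": t} for t in ["mega", "mega", "mega", "meierling", "meierling"]
]}]}}
print(suggestion_buckets(response, "merk-suggest"))  # [('mega', 3), ('meierling', 2)]
```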
