Remove results with same id from Elasticsearch search result

Hi,

I just joined the forums and would like to ask for a little help with a question I could not answer by reading the docs:

Let's assume we have a search result with 3 documents. Two of them share a key attribute (product-ID or similar).

Is it possible to have Elasticsearch remove such "duplicate" documents from the search result, so that only 2 documents would be returned in that case? I don't want to implement this in application logic, as I would still like to use pagination, aggregations, etc. It does not matter which of the two documents with the same id is removed.

This would be the example in Elasticsearch:

PUT /tmp_pd_articles
{
  "mappings": {
    "properties": {
      "name":          { "type": "text" },
      "articleNumber": { "type": "keyword" }
    }
  }
}

PUT /tmp_pd_articles/_doc/1
{
  "name": "My Book 1",
  "articleNumber": "A9781"
}

PUT /tmp_pd_articles/_doc/2
{
  "name": "My Book 1 (with some other title)",
  "articleNumber": "A9781"
}

PUT /tmp_pd_articles/_doc/3
{
  "name": "My Book 2",
  "articleNumber": "A9782"
}


GET /tmp_pd_articles/_search
{
  "query": { "match_all": {} }
}

The goal is to write a query that returns only two articles instead of all three:
#1 ("A9781", "My Book 1") OR
#2 ("A9781", "My Book 1 (with some other title)")

AND
#3 ("A9782", "My Book 2")

This reduction should be applied because #1 and #2 share the same articleNumber "A9781". I wonder whether there is an Elasticsearch query that accomplishes this.

Thank you very much for your help in advance,
Philipp

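Welcome! You can run a terms aggregation on the articleNumber field and nest a top_hits aggregation inside it, so that each distinct articleNumber gets its own bucket and each bucket returns just one document.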
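
To make that suggestion concrete, here is a minimal sketch of what such a request body could look like. It is written in Python only so that the body can be printed and the response shape checked without a cluster. The aggregation names "by_article" and "one_doc" and both helper functions are made up for this example, and the bucket limit of 10 is an arbitrary assumption, not something from the original question.

```python
import json


def build_dedup_body(field="articleNumber", max_groups=10):
    """Build a search body with one bucket per distinct value of `field`
    and a single document per bucket.

    "by_article" and "one_doc" are hypothetical aggregation names.
    """
    return {
        "size": 0,  # skip the normal hit list; documents come from the aggregation
        "query": {"match_all": {}},
        "aggs": {
            "by_article": {
                # terms aggregation: one bucket per distinct articleNumber
                "terms": {"field": field, "size": max_groups},
                "aggs": {
                    # top_hits with size 1: keep a single document per bucket
                    "one_doc": {"top_hits": {"size": 1}}
                },
            }
        },
    }


def one_source_per_bucket(response):
    """Extract the _source of the single top hit from every bucket."""
    buckets = response["aggregations"]["by_article"]["buckets"]
    return [b["one_doc"]["hits"]["hits"][0]["_source"] for b in buckets]


if __name__ == "__main__":
    # The printed JSON can be used as the body of GET /tmp_pd_articles/_search
    print(json.dumps(build_dedup_body(), indent=2))
```

With the three example documents, this should produce two buckets (A9781 and A9782), each carrying one document. Keep in mind that the documents then arrive under "aggregations" rather than under the top-level "hits", and that the number of buckets is controlled by the terms "size" rather than by the usual from/size paging.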
