Hi,
I just joined the forums and would like to ask for a little help with the following question, which I have not been able to answer from the docs:
Let's assume we have a search result with 3 documents. Two of them share a key attribute (product-ID or similar).
Is it possible to remove such "duplicate" documents from the search result within Elasticsearch itself, so that only 2 documents are returned in this case? I don't want to implement this in application logic because I would still like to use pagination, aggregations, etc. It does not matter which of the two documents sharing the same key is removed.
This would be the example in Elasticsearch:
PUT /tmp_pd_articles
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "articleNumber": { "type": "keyword" }
    }
  }
}
PUT /tmp_pd_articles/_doc/1
{
  "name": "My Book 1",
  "articleNumber": "A9781"
}
PUT /tmp_pd_articles/_doc/2
{
  "name": "My Book 1 (with some other title)",
  "articleNumber": "A9781"
}
PUT /tmp_pd_articles/_doc/3
{
  "name": "My Book 2",
  "articleNumber": "A9782"
}
GET /tmp_pd_articles/_search
{
  "query": { "match_all": {} }
}
The goal is to write a query that returns only two articles instead of all three:
#1 ("A9781", "My Book 1") OR
#2 ("A9781", "My Book 1 (with some other title)")
AND
#3 ("A9782", "My Book 2")
This reduction should be applied because #1 and #2 share the same articleNumber "A9781". I wonder whether there is an Elasticsearch query that accomplishes this.
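To make the desired behaviour concrete, here is a minimal Python sketch of the reduction as it would look in application logic, which is exactly the part I would like to avoid. The function name `dedupe_by_key` and the plain-dict representation of the hits are assumptions for illustration only, not anything Elasticsearch provides; it simply keeps the first document seen for each articleNumber.

```python
def dedupe_by_key(hits, key="articleNumber"):
    """Keep the first hit for each distinct value of `key`, preserving order.

    `hits` is assumed to be a list of plain dicts (hypothetical stand-in
    for the documents returned by the search).
    """
    seen = set()
    unique = []
    for hit in hits:
        value = hit[key]
        if value not in seen:
            seen.add(value)
            unique.append(hit)
    return unique


# The three example documents from above.
hits = [
    {"name": "My Book 1", "articleNumber": "A9781"},
    {"name": "My Book 1 (with some other title)", "articleNumber": "A9781"},
    {"name": "My Book 2", "articleNumber": "A9782"},
]

result = dedupe_by_key(hits)
for hit in result:
    print(hit["articleNumber"], hit["name"])
```

Doing this on the client side would break page sizes and aggregation counts, which is why I am looking for a way to express it in the query itself.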
Thank you very much for your help in advance,
Philipp