Is it possible to eliminate duplicate hits from the search response when using a nested query?

Here is a sample mapping, document registration, and search query.

mapping

curl -X PUT "es:9200/english" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "title" : {
          "type" : "text"
        },
        "contents": {
          "type": "nested"
        }
      }
    }
  }
}
'

register

curl -X PUT "es:9200/english/_doc/1?refresh" -H 'Content-Type: application/json' -d'
{
  "title": "Test title",
  "contents": [
    {
      "header": "something special",
      "body": "I am John."
    },
    {
      "header": "anything hot",
      "body": "This is a cup."
    }
  ]
}
'

curl -X PUT "es:9200/english/_doc/2?refresh" -H 'Content-Type: application/json' -d'
{
  "title": "Test title",
  "contents": [
    {
      "header": "something special",
      "body": "I am John."
    },
    {
      "header": "anything hot",
      "body": "That is a glass."
    }
  ]
}
'

search

curl -XGET "es:9200/english/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "_source": false,
  "size": 20,
  "query": {
    "nested": {
      "path": "contents",
      "score_mode": "max",
      "query": {
        "simple_query_string": {
          "query": "I am",
          "fields": ["contents.header", "contents.body"],
          "auto_generate_synonyms_phrase_query": true
        }
      },
      "inner_hits": {
        "size": 1
      }
    }
  }
}
'

result

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.4723401,
    "hits" : [
      {
        "_index" : "english",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4723401,
        "inner_hits" : {
          "contents" : {
            "hits" : {
              "total" : 1,
              "max_score" : 1.4723401,
              "hits" : [
                {
                  "_index" : "english",
                  "_type" : "_doc",
                  "_id" : "2",
                  "_nested" : {
                    "field" : "contents",
                    "offset" : 0
                  },
                  "_score" : 1.4723401,
                  "_source" : {
                    "header" : "something special",
                    "body" : "I am John."
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "english",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4723401,
        "inner_hits" : {
          "contents" : {
            "hits" : {
              "total" : 1,
              "max_score" : 1.4723401,
              "hits" : [
                {
                  "_index" : "english",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "contents",
                    "offset" : 0
                  },
                  "_score" : 1.4723401,
                  "_source" : {
                    "header" : "something special",
                    "body" : "I am John."
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

The search response contains two hits, and their inner hits have the same content; only "_id" differs.

I would like to remove one of these hits, because it duplicates the other.
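
To make the goal concrete, here is a minimal client-side sketch in Python of the de-duplication I am after. It assumes the response has already been parsed into a dict, and the helper name `dedupe_hits` is my own, not part of any Elasticsearch client. It keeps the first top-level hit for each distinct inner-hit "_source" and drops the rest.

```python
import json


def dedupe_hits(response, path="contents"):
    """Keep the first top-level hit for each distinct set of inner hits."""
    seen = set()
    unique = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"][path]["hits"]["hits"]
        # Serialize with sorted keys so equal objects yield the same key
        # regardless of field order.
        key = json.dumps([h["_source"] for h in inner], sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


# Trimmed-down version of the response shown above.
section = {"header": "something special", "body": "I am John."}
response = {
    "hits": {
        "hits": [
            {"_id": "2", "inner_hits": {"contents": {"hits": {"hits": [{"_source": dict(section)}]}}}},
            {"_id": "1", "inner_hits": {"contents": {"hits": {"hits": [{"_source": dict(section)}]}}}},
        ]
    }
}

print([h["_id"] for h in dedupe_hits(response)])  # ['2']
```

The drawback is that this runs after "size" has been applied, so a page can come back with fewer than 20 hits, which is why I am looking for a way to do it inside Elasticsearch.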

If someone has a good solution for this, please help!


"Field Collapsing", the usual way to eliminate duplicates, doesn't seem to fit a nested query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-collapse.html

You should probably do that at index time, i.e. index only one document.
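
One way to do that, as a rough sketch rather than a definitive recipe, is to derive the document ID from the content that defines a duplicate, so that indexing the same content twice writes to the same ID. The helper names `fingerprint` and `first_section` below are hypothetical, and choosing the first "contents" entry as the duplicate key is only an assumption for illustration; you would have to decide which fields make two documents "the same" in your data.

```python
import hashlib
import json


def fingerprint(doc, key_of):
    """Return a stable hex ID built from the part of doc selected by key_of."""
    canonical = json.dumps(key_of(doc), sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def first_section(doc):
    # Hypothetical duplicate rule: two documents are duplicates when
    # their first "contents" entry is identical.
    return doc["contents"][0]


doc_a = {
    "title": "Test title",
    "contents": [
        {"header": "something special", "body": "I am John."},
        {"header": "anything hot", "body": "This is a cup."},
    ],
}
doc_b = {
    "title": "Test title",
    "contents": [
        {"header": "something special", "body": "I am John."},
        {"header": "anything hot", "body": "That is a glass."},
    ],
}

# Same first section, so both documents map to the same ID.
print(fingerprint(doc_a, first_section) == fingerprint(doc_b, first_section))  # True
# Keyed on the whole document instead, they are treated as distinct.
print(fingerprint(doc_a, lambda d: d) == fingerprint(doc_b, lambda d: d))  # False
```

If you then PUT each document to "es:9200/english/_doc/<fingerprint>" instead of using IDs 1 and 2, the second document overwrites the first and the search returns a single hit. Note that your two sample documents differ in their second section, so a key over the whole document would not merge them.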

Thank you for your reply.

I'll try changing my indexing so that only one document is indexed.