Is it possible to eliminate duplicates from the search response when dealing with very long text?

mapping

PUT /my_index/_mapping

{
  "properties": {
    "contents": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword",
          "ignore_above": 32766
        }
      }
    }
  }
}

register

POST /my_index/_doc/1

{
  "contents": "a very long text that exceeds 32766 bytes"
}

search

GET /my_index/_search

{
  "query": {
    "match": {
      "contents": "something"
    }
  },
  "collapse": {
    "field": "contents.raw"
  }
}

With this mapping, I can eliminate duplicates from the search response as long as the text is smaller than 32766 bytes.

However, when the text is larger than 32766 bytes, I can't: ignore_above causes values over the limit to be dropped from the raw keyword field, so collapsing on contents.raw no longer de-duplicates those documents.

Is there another way to meet this requirement?

A hash of the content would be a much shorter value to de-duplicate on. While it has no false negatives (it will always recognise a duplicate text), it can have a small number of false positives (declaring non-duplicate texts identical) when two different texts happen to produce the same hash.
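
For example, here is a minimal sketch of that approach using an ingest pipeline with the fingerprint processor (available in Elasticsearch 7.12 and later; on older versions you can compute the hash client-side before indexing). The field name contents_hash and the pipeline name hash_contents are just illustrative:

PUT /my_index/_mapping

{
  "properties": {
    "contents_hash": {
      "type": "keyword"
    }
  }
}

PUT /_ingest/pipeline/hash_contents

{
  "processors": [
    {
      "fingerprint": {
        "fields": ["contents"],
        "target_field": "contents_hash",
        "method": "SHA-256"
      }
    }
  ]
}

POST /my_index/_doc/1?pipeline=hash_contents

{
  "contents": "a very long text that exceeds 32766 bytes"
}

The hash is always short enough to index as a keyword, no matter how long the original text is.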

Thank you very much.

By using a hash of the content, I was able to solve the problem!
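
For reference, assuming the contents_hash field sketched above, the collapse query would then look something like this:

GET /my_index/_search

{
  "query": {
    "match": {
      "contents": "something"
    }
  },
  "collapse": {
    "field": "contents_hash"
  }
}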
