I have a field that stores long textual data of up to 4000 characters, made up of symbols and numbers, including (/, *, -, _, .). With the default mapping the data is indexed without problems, but a term search returns no results. So I raised "ignore_above" to 4000 so that I can search it exactly with term, but now I notice slower performance and more disk usage.
Is there another way to store data of this length (4000 characters) and still search it exactly with term?
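For reference, this is roughly the mapping I mean, with ignore_above raised to 4000 (the index and field names here are placeholders):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "keyword",
        "ignore_above": 4000
      }
    }
  }
}
```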
I would say that is expected, as you are storing a high-cardinality field with very large terms, which take up a lot of space. If you are searching on the full term, might it be an option to store a hash of the field value in a separate field and search on that?
No, I cannot divide the data because I rely on it in my search operations to a large extent, and at the same time I want the search to be as fast and accurate as possible.
I did not suggest dividing it, but rather calculating a hash, storing it in a separate field, and using that field for exact term lookups. This requires you to hash the query string as well, and it may result in hash collisions, although you can reduce that risk by choosing an appropriate hash function.
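A minimal sketch of what I mean, using Python's hashlib; the field name is a hypothetical placeholder, and the actual Elasticsearch client calls are omitted, since only the hashing step is shown:

```python
import hashlib

def term_hash(value: str) -> str:
    # Hash the full value; SHA-256 keeps the stored term at a fixed
    # 64 hex characters regardless of the original value's length.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# At index time, store the hash in a separate keyword field
# (e.g. "my_field_hash", a hypothetical name) alongside the original.
doc_value = "some/long-value_with.symbols*" * 100  # ~2900 characters
stored_hash = term_hash(doc_value)

# At query time, hash the query string the same way and run the
# term query against the hash field instead of the raw value.
query_value = "some/long-value_with.symbols*" * 100
assert term_hash(query_value) == stored_hash
```

Because the hash field always holds a short fixed-length term, it avoids both the ignore_above limit and the index bloat of 4000-character terms; the trade-off is the (small, with SHA-256) collision risk mentioned above.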
No, not that I am aware of.