Is there a limit on how big a field can be when we ingest a document into ES? What are the known issues or drawbacks of ingesting huge text/string data into a single field, and after ingestion, what are the drawbacks when it comes to searching on those large fields? I am using ES version 6.2.
This option is also useful for protecting against Lucene’s term byte-length limit of 32766.
The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes.
I did not quite understand what this means. Thanks.
You can have fairly long fields. Megabytes of text will work, but some things aren't super efficient. keyword-typed fields that long won't be properly searchable because of ignore_above, but text fields will be analysed and searchable. In general, when you have fields that big you should think about breaking the document up into parts somehow, which will let you more quickly point users to which part matched their query. That isn't always the right thing to do, but it is a thing to think about when you have very large fields.
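To make the distinction concrete, here is a minimal sketch using the official Python client (elasticsearch-py 6.x) against a local node. The index name `docs` and field name `body` are just examples, not anything from your setup: the `body` field is analysed full text, while a `body.raw` keyword sub-field uses `ignore_above: 8191` so oversized values are simply skipped rather than hitting Lucene's 32766-byte term limit.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index: "body" is analysed full text, with a "raw" keyword
# sub-field capped at 8191 characters (the 32766 / 4 rule of thumb from
# the docs quoted above).
es.indices.create(
    index="docs",
    body={
        "mappings": {
            "_doc": {
                "properties": {
                    "body": {
                        "type": "text",
                        "fields": {
                            "raw": {"type": "keyword", "ignore_above": 8191}
                        },
                    }
                }
            }
        }
    },
)

# A very large value is accepted here: the "body" text field is analysed
# and searchable term by term, while "body.raw" is silently left
# unindexed because the value exceeds ignore_above.
es.index(index="docs", doc_type="_doc", id=1,
         body={"body": "very long text ..."})

# Full-text search still matches terms inside the large field.
print(es.search(index="docs",
                body={"query": {"match": {"body": "text"}}}))
```

So with a mapping like this the large field stays usable for full-text queries; it's exact-match (keyword) lookups and sorting/aggregating on the raw value that stop working once the value is longer than ignore_above.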