Store Text Content of PDF in elastic search

I have a database containing millions of records of document text content. I would like to sync that text into an Elasticsearch index for document-content search. Could you please suggest the best way to store this text in ES, given that it will take a lot of storage?
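For context, database records are usually pushed into ES in batches through the `_bulk` API rather than one request per row. Below is a minimal sketch of one such batch; the index name `documents` and the field name `content` are illustrative assumptions, not anything from this thread:

```shell
# Hypothetical bulk-indexing request: two document-content records in one
# round trip. The bulk body is NDJSON (one action line, one source line),
# so it is sent with --data-binary to preserve the newlines.
curl -s -XPOST 'localhost:9200/documents/_bulk' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @- <<'EOF'
{"index":{"_id":"1"}}
{"content":"full text of document 1 ..."}
{"index":{"_id":"2"}}
{"content":"full text of document 2 ..."}
EOF
```

In practice you would page through the database table and send batches of a few thousand records per `_bulk` call, since very large single requests can overload the cluster.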


I have researched around, and it seems ES already compresses stored fields by default, according to this topic: Large string fields - Elastic Stack / Elasticsearch - Discuss the Elastic Stack

There is also a setting to compress it even more:
Store compression in Lucene and Elasticsearch | Elastic Blog

curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.codec": "best_compression"
  }
}'

Is this setting suitable for our storage optimization if we store the content of millions of documents?
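One caveat worth noting: `index.codec` is a static index setting, so it is normally set at index creation time; changing it on an existing index requires closing the index first, and the new codec only applies to segments written afterwards. A rough sketch of how one might switch an existing index and then measure the effect (index name `my_index` follows the example above; this assumes a locally reachable cluster):

```shell
# Changing the codec on an existing index: it must be closed first,
# since index.codec is a static setting.
curl -XPOST 'localhost:9200/my_index/_close'
curl -XPUT 'localhost:9200/my_index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.codec": "best_compression"}'
curl -XPOST 'localhost:9200/my_index/_open'

# The codec applies only to newly written segments; a force-merge
# rewrites the existing segments with the new codec.
curl -XPOST 'localhost:9200/my_index/_forcemerge?max_num_segments=1'

# Compare the on-disk size before and after to see the actual savings.
curl -s 'localhost:9200/_cat/indices/my_index?v&h=index,docs.count,store.size'
```

The trade-off is that `best_compression` (DEFLATE for stored fields instead of the default LZ4) reduces disk usage at the cost of somewhat slower stored-field retrieval, so it is worth benchmarking both size and query latency on a sample of your data.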

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.