Store Text Content of PDF in Elasticsearch

a.nguyentuan · July 21, 2023, 7:10am

I have a database that contains millions of document text content records. I would like to sync that text into an Elasticsearch index so the document content can be searched. Could you please suggest the best way to store this text in ES, given that it will take up a lot of storage?

a.nguyentuan · July 22, 2023, 12:34am

I did some research, and it seems that ES already compresses stored fields, according to this topic: Large string fields - Elastic Stack / Elasticsearch - Discuss the Elastic Stack

There is also a setting to compress them even further, described in: Store compression in Lucene and Elasticsearch | Elastic Blog

curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.codec": "best_compression"
  }
}'
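
For anyone scripting the sync rather than using curl, here is a minimal sketch of the same index-creation request in Python, using only the standard library. The host `http://localhost:9200` and the index name `my_index` are taken from the curl example above; the helper name `build_create_index_request` is hypothetical. Note that `index.codec` is a static setting, so it is applied when the index is created (or while the index is closed), which is why the sketch sets it in the creation request.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index_name: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index
    using the best_compression codec for stored fields."""
    body = {
        "settings": {
            # Static index setting: set at creation time or on a closed index.
            "index": {"codec": "best_compression"}
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Elasticsearch expects an explicit JSON content type on request bodies.
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed local, unsecured cluster as in the curl example.
    req = build_create_index_request("http://localhost:9200", "my_index")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The request is only constructed here, so the sketch can be inspected without a running cluster; uncommenting the last lines sends it.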

Is this setting suitable for optimizing storage if we store the content of millions of documents?

