How Elasticsearch indexes the content of files that are already compressed

rohansjsu · September 13, 2015, 7:56pm

I have multiple files compressed in .gz format, and I need search indexing on the content of those compressed files. How does Elasticsearch handle this?

magnusbaeck · September 13, 2015, 8:03pm

You'll have to feed Elasticsearch uncompressed data.

rohansjsu · September 13, 2015, 8:17pm

Thanks.

Since I have a huge amount of data (around 100 TB), I was considering compressing it. Is there any way to index the data without uncompressing it first, for example with tools such as Solr and Apache Tika?

magnusbaeck · September 13, 2015, 8:21pm

I'm not sure I understand the problem. You can still keep the data compressed while at rest, but at the exact moment you feed it to Elasticsearch via an API of some sort it must not be compressed. The uncompressed source data never needs to touch the disk though.

If you have 100 TB of data, I'd be more worried about the amount of horsepower (CPU, RAM, and disk) that it's going to take for Elasticsearch to index it.
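
To illustrate the point about keeping data compressed at rest, here is a minimal sketch in Python that decompresses a .gz file as a stream and groups its lines into newline-delimited request bodies in the format the bulk API expects. It rests on assumptions not stated in this thread: each line of the file is one document, the field name `message` and the index name passed in are placeholders, and the function names are hypothetical. Older Elasticsearch versions also expect a `_type` in the action line. Sending each body to the `_bulk` endpoint is deliberately left out.

```python
import gzip
import json
from itertools import islice
from typing import Iterable, Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Yield decompressed, non-empty lines from a .gz file one at a time."""
    # gzip.open decompresses while reading, so the uncompressed data
    # is never written to disk and is not loaded into memory all at once.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


def bulk_bodies(lines: Iterable[str], index: str,
                batch_size: int = 1000) -> Iterator[str]:
    """Group lines into newline-delimited bodies for the bulk API."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        parts = []
        for text in batch:
            # Action line first, then the document source line.
            # "message" is a placeholder field name (assumption).
            parts.append(json.dumps({"index": {"_index": index}}))
            parts.append(json.dumps({"message": text}))
        # The bulk format requires a trailing newline.
        yield "\n".join(parts) + "\n"
```

Each string yielded by `bulk_bodies(iter_lines("some-file.gz"), "my-index")` could then be posted to the cluster by whatever HTTP client or official client library you already use. The batch size of 1000 is an arbitrary starting point, not a recommendation.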
