How Elasticsearch indexes the content of files that are already compressed

(Rohan) #1

I have multiple files compressed in .gz format, and I need to make their content searchable. How does Elasticsearch handle this?

(Magnus Bäck) #2

You'll have to feed Elasticsearch uncompressed data.

(Rohan) #3


Since I have a huge amount of data (around 100 TB), I was considering compressing it. Is there any way to index the data without uncompressing it first, for example with tools such as Solr and Apache Tika?

(Magnus Bäck) #4

I'm not sure I understand the problem. You can still keep the data compressed while at rest, but at the exact moment you feed it to Elasticsearch via an API of some sort it must not be compressed. The uncompressed source data never needs to touch the disk though.

If you have 100 TB of data, I'd be more worried about the amount of horsepower (CPU, RAM, and disk) it's going to take for Elasticsearch to index it.
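
To illustrate keeping the files compressed at rest and decompressing only while feeding Elasticsearch, here is a minimal Python sketch. It assumes each .gz file holds one plain-text record per line; the function name `iter_bulk_bodies`, the `message` field, and the index name are made up for illustration. It only builds newline-delimited JSON bodies in the format the `_bulk` API accepts, and leaves the actual HTTP call to whatever client you use.

```python
import gzip
import json
from typing import Iterator


def iter_bulk_bodies(gz_path: str, index: str, batch_size: int = 1000) -> Iterator[str]:
    """Yield NDJSON bodies for the _bulk API, one per batch of lines.

    Hypothetical helper: the "message" field name is an assumption,
    not something Elasticsearch requires.
    """
    action = json.dumps({"index": {"_index": index}})
    batch = []
    # gzip.open in text mode decompresses incrementally while reading,
    # so the uncompressed file is never written to disk.
    with gzip.open(gz_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            text = line.rstrip("\n")
            if not text:
                continue  # skip blank lines
            batch.append(action)
            batch.append(json.dumps({"message": text}))
            if len(batch) >= 2 * batch_size:  # two NDJSON lines per document
                yield "\n".join(batch) + "\n"
                batch = []
    if batch:
        yield "\n".join(batch) + "\n"
```

Each yielded body could then be sent as a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header. Only one batch is held in memory at a time, so memory use depends on `batch_size` rather than on the size of the compressed file.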
