Storing large amounts of data in ES

Hi,

We have a use case in which we need to index petabytes of data into ES. I
was assuming that all the indices would be stored in HDFS via the HDFS
gateway, but from the guide I understand that each ES node will also
maintain a local copy of the indices. Is this the correct interpretation?
If so, what strategy can I use to distribute the data among the various
ES nodes, since having that much storage on a single node is not possible?

Also, since the HDFS gateway is deprecated, is there some other way of
storing the indices on HDFS?

Thanks,
Anand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

If you use the local gateway, you don't need a single location to hold all
of your index data.

Indices always need to be stored locally on each node for
searching/indexing operations, and the local gateway uses this local
store for persistence.

So you need lots of nodes, with lots of local storage available on each
node. Remember to take replicas into account if you want to be tolerant of
losing a node.

Best Regards,
Paul

On Monday, April 1, 2013 8:07:22 AM UTC-6, anand nalya wrote:


Hi,

Regarding the HDFS part: do you really want to store the indices on HDFS,
or just the (raw) data? Storing indices in HDFS doesn't have a ton of value
beyond treating HDFS as a backup with its replication. But if you want to do
that, you can simply copy indices to HDFS while no writes are being done
on them. If you want to store the raw data in HDFS, you could do it at
write time. Lots of people hook Kafka or Storm into the indexing pipeline
and use Kafka's and Storm's (or Flume's) support for writing to HDFS.
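The write-time approach above boils down to a fan-out at ingest: every raw record is delivered both to ES and to HDFS. A minimal, framework-free sketch of that pattern (the sinks below are in-memory placeholders standing in for a real ES bulk indexer and an HDFS/Flume writer, which Kafka or Storm would supply in practice):

```python
# Fan-out at write time: each raw record goes to every sink.
# The sinks here are placeholders for real ES/HDFS writers.

def fan_out(records, sinks):
    """Deliver every record to every sink; a sink only needs a write() method."""
    for record in records:
        for sink in sinks:
            sink.write(record)

class ListSink:
    """In-memory stand-in for a real sink, used only for demonstration."""
    def __init__(self):
        self.items = []

    def write(self, record):
        self.items.append(record)

es_sink, hdfs_sink = ListSink(), ListSink()
fan_out(["doc1", "doc2"], [es_sink, hdfs_sink])
print(es_sink.items == hdfs_sink.items == ["doc1", "doc2"])  # True
```

The point of fanning out at write time is that ES then only ever holds the index, while HDFS holds the durable raw copy, so neither system has to serve as the other's backup.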

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html

On Monday, April 1, 2013 10:07:22 AM UTC-4, anand nalya wrote:
