Elasticsearch with Hadoop HDFS


(Jongmin Kim) #1

I was searching for information about ES with HDFS. From what I've seen, using ES with
Hadoop does not mean using HDFS as the main storage for ES.

ES updates its indexes to HDFS every 10 seconds as a backup. Is that right?

Is there any way to use HDFS as the main storage for Elasticsearch?

I'm using AWS, and the maximum size of an EBS volume I can create is 1TB.

What can I do if I need to index and store more than 1TB of data?

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6e3463cf-7103-4a41-935c-aa3c8c880421%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Costin Leau) #2

Hi,

On 11/02/2014 6:40 AM, Jong Min Kim wrote:

I was searching for information about ES with HDFS. From what I've seen, using ES with Hadoop does not mean using HDFS
as the main storage for ES.

You can use HDFS as the main storage of ES if you mount it as a local filesystem (typically via NFS). However, your
performance will suffer, since a proper local disk is significantly faster (by several orders of magnitude) than HDFS.

ES updates its indexes to HDFS every 10 seconds as a backup. Is that right?

Not sure what you mean by that - there's no implicit backup. You can install the HDFS snapshot/restore plugin and
use that, but there is no automatic backup - and that is on purpose; you can simply use crontab or something similar to
trigger the backup/snapshotting.
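As a rough sketch of that approach - the repository name, HDFS URI, and snapshot path below are placeholders, and the exact settings depend on your version of the elasticsearch-repository-hdfs plugin:

```shell
# Register an HDFS snapshot repository (requires the repository-hdfs plugin).
# "hdfs_backup", the namenode URI, and the path are all placeholders.
curl -XPUT 'http://localhost:9200/_snapshot/hdfs_backup' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/snapshots"
  }
}'

# Trigger a snapshot manually:
curl -XPUT 'http://localhost:9200/_snapshot/hdfs_backup/snapshot_1?wait_for_completion=true'

# Or schedule it from crontab, e.g. nightly at 02:00, naming snapshots by date
# (note that % must be escaped inside a crontab entry):
# 0 2 * * * curl -XPUT "http://localhost:9200/_snapshot/hdfs_backup/snap-$(date +\%Y\%m\%d)"
```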

Is there any way to use HDFS as the main storage for Elasticsearch?

See above.

I'm using AWS, and the maximum size of an EBS volume I can create is 1TB.

If you're using AWS, you might want to look at the AWS plugin.

What can I do if I need to index and store more than 1TB of data?

This is the generic question of what happens when local storage is limited to X.
There are various options - the easiest is mounting multiple EBS volumes to the same or to additional EC2 nodes. Think
of the local-disk analogy: just as you run ES across multiple machines, each with its own SSD/HDD, you can run ES
across multiple EC2 nodes, each with its own EBS volumes.
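For reference, a single node can also stripe its data across several attached volumes via `path.data`; a minimal elasticsearch.yml sketch, assuming the EBS volumes are mounted at the (hypothetical) paths shown:

```yaml
# elasticsearch.yml - spread shard data across three attached EBS volumes.
# The mount points are placeholders for wherever your volumes are attached.
path.data: /mnt/ebs0,/mnt/ebs1,/mnt/ebs2
```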

If you want to 'retire' old data, you can use the AWS plugin to snapshot it to S3 and then remove it from
the cluster; if needed, you can easily restore it from S3 later as well.
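A sketch of that retire-to-S3 flow, using the cloud-aws plugin's S3 repository type; the bucket, region, and index names are placeholders:

```shell
# Register an S3 repository (requires the AWS cloud plugin).
curl -XPUT 'http://localhost:9200/_snapshot/s3_archive' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-archive", "region": "us-east-1" }
}'

# Snapshot only the old index, then drop it from the cluster to free disk.
curl -XPUT 'http://localhost:9200/_snapshot/s3_archive/logs-2013?wait_for_completion=true' -d '{
  "indices": "logs-2013"
}'
curl -XDELETE 'http://localhost:9200/logs-2013'

# Restore it from S3 later if needed.
curl -XPOST 'http://localhost:9200/_snapshot/s3_archive/logs-2013/_restore'
```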

Hope this helps,



--
Costin



(Jongmin Kim) #3

Thanks. Every time I post a question, I get knowledgeable answers. :slight_smile:



