Hi all,
we are going to receive a huge volume of logs and we are going to process them with the typical ELK stack. Because of that volume, we plan to keep only a week, or at most a month, of logs in Elasticsearch. After that time we plan to use the Elasticsearch-Hadoop integration, mainly as a long-term archive, so older logs are moved to Hadoop. Disk space is a very important issue here, so we plan to use Hadoop compression codecs such as gzip for the logs older than a month.
If we use compression in Hadoop, can we still index and graph the data stored in logs older than a month, or do the logs have to be stored in a raw format for that to work?
As far as I have read, if the logs are stored in Hadoop in raw form then, thanks to the elasticsearch-hadoop integration, Elasticsearch can index them, and Kibana can report seamlessly on both the current logs (stored locally on the Elasticsearch servers) and the older logs (in Hadoop). Please correct me if I am wrong. The question is whether we lose this capability if we compress the logs in Hadoop.
Any help is appreciated.
Thanks and best regards,
Rodrigo.
P.S.: Do not hesitate to challenge the architecture/data flow as well, if you think there are better ways to do it. Thanks.
Es-Hadoop leverages the existing Hadoop infrastructure, so whatever compression or splitting your infrastructure uses will simply work with Elasticsearch as well.
Take a Map/Reduce job: on the reading side you can use whatever InputFormat you are currently using (to deal with gzip or what have you), and use the Elasticsearch one as the OutputFormat. Everything is transparent and works through your existing infrastructure.
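As a minimal sketch of that wiring (the ES host, index name and HDFS path below are placeholders, not values from this thread), a map-only job using the old "mapred" API could look roughly like this, assuming each archived line is already a JSON document (e.g. as produced by Logstash before archiving):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class ArchiveToEs {

    // Forwards each log line unchanged; assumes the lines are JSON documents.
    public static class JsonLineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<NullWritable, Text> output,
                        Reporter reporter) throws IOException {
            output.collect(NullWritable.get(), line);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ArchiveToEs.class);
        conf.setSpeculativeExecution(false);            // recommended when writing to ES

        conf.set("es.nodes", "es-node-1:9200");         // placeholder ES host
        conf.set("es.resource", "logs-archive/entry");  // placeholder index/type
        conf.set("es.input.json", "yes");               // documents are passed as JSON strings

        // TextInputFormat applies the matching compression codec automatically,
        // so .gz input files are decompressed transparently on read.
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(EsOutputFormat.class);
        conf.setMapperClass(JsonLineMapper.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setNumReduceTasks(0);                      // map-only job

        FileInputFormat.addInputPath(conf, new Path("/archive/logs/2015-03")); // placeholder path
        JobClient.runJob(conf);
    }
}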
Note that you don't need a raw format (whatever that is): as long as your data can be read into Hadoop Map/Reduce, Pig, Hive, Cascading, Storm or Spark, it can also be written/indexed to Elasticsearch. And vice-versa.
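The same idea applies outside plain Map/Reduce. For instance, a rough Spark sketch with the es-hadoop Spark support (again, host, path and index names are placeholders, and the input is assumed to be JSON lines) could read the gzipped archive and index it directly:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class SparkArchiveToEs {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("archive-to-es");
        conf.set("es.nodes", "es-node-1:9200");     // placeholder ES host

        JavaSparkContext sc = new JavaSparkContext(conf);

        // textFile() decompresses .gz files transparently through the Hadoop codecs.
        JavaRDD<String> jsonLines = sc.textFile("hdfs:///archive/logs/2015-03/*.gz");

        // Index the JSON documents into a placeholder index/type.
        JavaEsSpark.saveJsonToEs(jsonLines, "logs-archive/entry");

        sc.stop();
    }
}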
On 4/10/15 12:11 PM, Rodrigo Merino wrote: