Is it beneficial to use ES-Hadoop over ES when you are not already a Hadoop user?


(im.khosravi) #1

Dear all

I have a question about es-hadoop and its benefits over running
ES on its own.

If we are not already Hadoop users, is it a good idea to
use es-hadoop?
In other words, is there any performance or reliability gain in the
Hadoop edition that we miss out on when using ES on its own?

Until now I thought that es-hadoop is mainly for those who already have
Hadoop clusters up and running and now want to integrate their ES
environments with Hadoop.

Thanks in advance

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/df68552b-8743-4f02-a480-80d3bd8407de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #2

To answer your initial question, no.

Disclaimer: I am leading the es-hadoop project so clearly I'm biased :)

es-hadoop is a 'connector' between es and Hadoop (whether it's traditional
Map/Reduce or the new real-time approaches like Spark). If you are not
using the above, then simply start using ES as is. In both cases, ES is
exactly the same.

Unless there's a (functional) requirement, the extra hardware can be
allocated to ES (to improve performance and handle even more data).

That being said, if you do find yourself using Hadoop or its ilk, then do
give es-hadoop a try since it's likely to simplify a LOT of things and
leverage the distributed architecture (aka Hadoop).

(In reply to im.khosravi's message of Tue, Jul 29, 2014 at 6:34 PM.)



(im.khosravi) #3

Thanks Costin, fair answer :)

I've been using ES for more than a year and I think I'm good with it.

(In reply to Costin Leau's message of Tuesday, July 29, 2014 10:05:29 PM UTC+4:30.)



(John Conlon) #4

Hi Costin,

I have a similar question, with the difference that we will have streaming (Kafka?) and ETL integration requirements hitting us within the next 6 months. So I have been looking at Spark and am very impressed with the Spark programming model and your ES-Hadoop connector.

Our problem domain is IoT/sensor data, and our primary datastore is an Eclipse EMF/CDO model repository with supporting services running in an OSGi Equinox framework on Java 8. We will be syncing this primary datasource to ES.

Now the question is: do I jump right into Spark and use it for the Model Repo->ES sync, even though our first use case could get by without it and could probably just use the Java API and bulk indexing?

thanks,
John
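
For readers wondering what "bulk indexing" without Spark or Hadoop involves: a minimal sketch of the newline-delimited body that Elasticsearch's `_bulk` endpoint accepts, written in Python rather than the Java API mentioned above. The index name `sensors`, the sample readings, and the helper `build_bulk_body` are made up for illustration and do not come from this thread. The action line omits a document type, as recent Elasticsearch versions do; releases from the time of this thread also expected a `_type`.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    docs is an iterable of (doc_id, source_dict) pairs.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    lines = []
    for doc_id, source in docs:
        # Action line: index the document that follows under this id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Assumed sample data: two sensor readings.
readings = [
    ("s1-0001", {"sensor": "s1", "temp_c": 21.5}),
    ("s1-0002", {"sensor": "s1", "temp_c": 21.7}),
]
body = build_bulk_body("sensors", readings)
print(body, end="")
```

The resulting string would be sent as the body of a POST to the cluster's `/_bulk` endpoint (with an NDJSON content type on recent versions). Nothing here requires Hadoop; whether this or Spark is the better fit depends on the streaming and ETL requirements described above.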


(mike) #5

We have ES and will forward data that is older than 90 days to Hadoop HDFS. How would I access this data in HDFS using es-hadoop? Does es-hadoop have an HTTP API?

