I have a question about es-hadoop and its benefits over running ES on its
own.
If we are not already Hadoop users, is it a good idea to use es-hadoop?
In other words, is there any performance or reliability gain in the Hadoop
edition that we miss when running ES on its own?
Until now I thought es-hadoop was meant for those who already have Hadoop
clusters up and running and want to integrate their ES environments with
Hadoop.
Disclaimer: I am leading the es-hadoop project, so clearly I'm biased.
es-hadoop is a 'connector' between ES and Hadoop (whether it's traditional
Map/Reduce or the newer real-time approaches like Spark). If you are not
using any of the above, then simply start using ES as is. In both cases, ES
is exactly the same.
Unless there's a (functional) requirement for Hadoop, the extra hardware can
be allocated to ES (to improve performance and handle even more data).
That being said, if you do find yourself using Hadoop or its ilk, then do
give es-hadoop a try, since it's likely to simplify a LOT of things and
leverage the distributed architecture (aka Hadoop).
I've been using ES for more than a year and I think I'm good with it.
On Tuesday, July 29, 2014 10:05:29 PM UTC+4:30, Costin Leau wrote:
To answer your initial question, no.
I too have a similar question, with the difference that we will have streaming (Kafka?) and ETL integration requirements hitting us within the next 6 months. So I have been looking at Spark and am very impressed with the Spark programming model and your ES-Hadoop connector.
The problem domain is IoT/sensor data, and our primary datastore is an Eclipse EMF/CDO model repository with supporting services running in an OSGi Equinox framework on Java 8. We will be syncing this primary datasource to ES.
Now the question is: do I jump right into Spark and use it for the Model Repo->ES sync, even though our first use case could get by without it and probably just use the Java API and bulk indexing?
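For reference, the plain bulk-indexing route mentioned above does not need Spark or Hadoop at all. Below is a minimal sketch, in Python rather than the Java API, of how a request body for the Elasticsearch `_bulk` endpoint is laid out. The index name `sensor-readings`, the document IDs, and the field names are made up for illustration, and older ES versions may also expect a `_type` in the action line.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited JSON body for a POST to /_bulk.

    docs is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: index the document given on the following line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sensor readings, purely illustrative.
readings = [
    ("r1", {"sensor": "s-17", "temp_c": 21.5}),
    ("r2", {"sensor": "s-18", "temp_c": 19.0}),
]
body = build_bulk_body("sensor-readings", readings)
print(body, end="")
```

The resulting string would be sent as the body of a `POST` to `/_bulk`; recent ES versions expect the `Content-Type: application/x-ndjson` header. A client library (the Java API included) does this batching for you, so this is only meant to show how little is involved when Spark is not otherwise needed.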
We have ES and will forward data that is older than 90 days to Hadoop HDFS. How would I access this data in HDFS using es-hadoop? Does es-hadoop have an HTTP API?