If you use the Hadoop gateway to ship all your ES data to HDFS, is it
in a format amenable to running map-reduce jobs over, independently of
ES?
For example, it would be really useful to be able to do Pig queries
over the raw JSON document contents. Wonderdog
(https://github.com/infochimps/wonderdog) lets you do this via the ES
cluster as a scan query, but that will put load on ES. If the data's
already being written to the Hadoop cluster via a gateway, can you
just analyse it there? And if so, does anyone have an example?
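For concreteness, here is a minimal sketch of the kind of job the question is after, assuming the documents were available on HDFS as newline-delimited JSON (one document per line). This is purely hypothetical: as the replies below note, the Hadoop gateway does not actually write data in this form, and the `user` field is a made-up example.

```python
#!/usr/bin/env python
"""Hadoop-Streaming-style mapper over newline-delimited JSON.

A hypothetical sketch only: it assumes the ES documents were available
on HDFS with one JSON document per line, which is NOT the format the
Hadoop gateway writes. The 'user' field is a made-up example field.
"""

import json
import sys


def map_lines(lines):
    """Emit (field value, 1) pairs that a reducer could sum,
    e.g. to count documents per user."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        yield doc.get("user", "UNKNOWN"), 1


if __name__ == "__main__":
    # Hadoop Streaming feeds input splits on stdin and expects
    # tab-separated key/value pairs on stdout.
    for key, count in map_lines(sys.stdin):
        sys.stdout.write("%s\t%d\n" % (key, count))
```

A matching reducer would sum the counts per key; with Hadoop Streaming the two scripts would be wired up via the -mapper and -reducer options.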
No, it's not really storing it in a way that you can easily read the actual JSON document.
On Tuesday, March 6, 2012 at 6:55 PM, Andrew Clegg wrote:
[...]
Could you read the indices with the Lucene libraries?
Craig
On Tue, Mar 6, 2012 at 1:56 PM, Shay Banon kimchy@gmail.com wrote:
[...]