Elasticsearch with Hadoop and Pig


(srivibalu) #1

I am doing a PoC project using elasticsearch-hadoop, with Pig to create
and read the index. Is there detailed documentation on how to use
Elasticsearch with Pig and Hadoop?
I want to know whether the index can be stored in HDFS permanently, or
whether HDFS is only a backup store.

Also, loading from an ES index does not return the schema (index
mapping) in Pig, so I am unable to reference fields by name in
subsequent steps. Filters work fine, but sorting doesn't.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8453cfd0-ea5c-4614-bf6d-e3c838e5980f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Yann Barraud) #2

Hi,

I published this one last week.

https://github.com/hortonworks/hadoop-tutorials/blob/master/Community/T07_Elasticsearch_Hadoop_Integration.md

(In reply to sriv...@gmail.com, Tuesday, 4 March 2014 15:56:25 UTC+1.)


(Costin Leau) #3

Have you looked at the reference documentation [1]? There is also an overview video covering the main es-hadoop features [2].

Regarding HDFS, you can mount it over NFS and expose it to ES, which can then use it directly.

As for the mapping, you need to define it in Pig, as it will not be inferred from the ES mapping. This is intentional:
it is how all Hadoop libraries work, since the data isn't guaranteed to be normalized. Additionally, you typically
want just a section of the data, so the Hadoop-side mapping defines the view applied to ES.
In the end, Hadoop is the driver, hence its settings take precedence.
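The point above is that the schema lives on the Hadoop side: Pig only knows field names and types if you declare them (for example with an `AS (...)` clause on the load), and whatever you declare acts as a projection over the ES documents. The Python sketch below is not es-hadoop or Pig code; it only models that idea, assuming hypothetical documents, field names, and a schema expressed as (name, cast) pairs.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical documents as they might be read back from an ES index.
# The second one has extra fields and stores "year" as a string.
DOCS: List[Dict[str, Any]] = [
    {"title": "Dune", "year": 1965, "author": "Herbert"},
    {"title": "Neuromancer", "year": "1984", "author": "Gibson", "isbn": "0-441-56959-5"},
    {"title": "Foundation", "year": 1951, "author": "Asimov"},
]

# A Hadoop-side schema: ordered (field name, cast) pairs. This stands in for
# a schema declared in the Pig script; it is an illustration, not an API.
Schema = List[Tuple[str, Callable[[Any], Any]]]


def apply_schema(docs: List[Dict[str, Any]], schema: Schema) -> List[Dict[str, Any]]:
    """Project each document onto the declared schema and cast its values.

    Fields not named in the schema are dropped (the schema is a view over
    the source data); fields missing from a document become None.
    """
    rows = []
    for doc in docs:
        row = {}
        for name, cast in schema:
            value = doc.get(name)
            row[name] = None if value is None else cast(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    schema: Schema = [("title", str), ("year", int)]
    rows = apply_schema(DOCS, schema)

    # With declared names and types, fields can be referenced by name and sorted.
    for row in sorted(rows, key=lambda r: r["year"]):
        print(row["title"], row["year"])

    # Without the declared types, the raw values mix int and str,
    # so sorting them fails in Python 3.
    try:
        sorted(doc["year"] for doc in DOCS)
    except TypeError as exc:
        print("cannot sort raw values:", exc)
```

Declaring only `title` and `year` also drops `author` and `isbn`, which mirrors the idea that the Hadoop-side mapping selects the part of the data you actually want.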

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/pig.html
[2] http://www.elasticsearch.org/videos/search-and-analytics-with-hadoop-and-elasticsearch/

(In reply to srivibalu@gmail.com, 3/4/2014 4:56 PM.)

--
Costin
