Elasticsearch with Hadoop and Pig

srivibalu · March 4, 2014, 2:56pm

I am working on a proof-of-concept project with elasticsearch-hadoop, using Pig to create
and read the index. Is there detailed documentation on how to use
Elasticsearch with Pig and Hadoop?
I would also like to know whether the index can be stored in HDFS permanently, or whether
HDFS serves only as a backup store.

Also, loading from an ES index does not return the schema (the index
mapping) in Pig, so I am unable to reference fields by name in
subsequent steps. Filters work fine, but sorting doesn't work.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8453cfd0-ea5c-4614-bf6d-e3c838e5980f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Yann_Barraud · March 5, 2014, 9:44am

Hi,

I published this tutorial last week:

https://github.com/hortonworks/hadoop-tutorials/blob/master/Community/T07_Elasticsearch_Hadoop_Integration.md

costin · March 5, 2014, 10:44am

Have you looked at the reference documentation [1]? There is also an overview video covering the main es-hadoop features [2].

Regarding HDFS, you can mount it over NFS and expose it to ES, which can then use it directly.

As for the mapping, you need to define it in Pig, as it will not be inferred from the ES mapping. This is done on purpose,
since this is how all libraries in Hadoop work: the data isn't guaranteed to be normalized. Additionally, you typically
want just a section of the data, so the Hadoop-side mapping indicates the view applied to ES.
In the end, the driver is Hadoop, hence its settings take precedence.
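
To illustrate the point about declaring the schema on the Pig side, here is a minimal sketch. The jar path, the index/type name `poc_index/doc`, and the fields `name` and `year` are placeholders, not taken from this thread, and the storage class name follows the es-hadoop reference documentation, so it may differ in older milestone builds. With the AS clause in place, fields can be referenced by name in later steps, including ORDER BY.

```pig
-- Hypothetical jar location; adjust to your installation.
REGISTER /path/to/elasticsearch-hadoop.jar;

-- 'poc_index/doc' and the field list are placeholders.
-- The AS clause declares the Pig-side schema, since it is not
-- inferred from the ES mapping.
docs = LOAD 'poc_index/doc'
       USING org.elasticsearch.hadoop.pig.EsStorage()
       AS (name:chararray, year:int);

-- Fields declared above can now be referenced by name.
recent = FILTER docs BY year > 2000;
sorted = ORDER recent BY name ASC;

DUMP sorted;
```

Only the fields listed in the AS clause are exposed to the script, which matches the idea of the Hadoop-side schema acting as a view over the ES documents.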

[1] Elasticsearch Platform — Find real-time answers at scale | Elastic
[2] Elasticsearch Platform — Find real-time answers at scale | Elastic

--
Costin
