I am working on a proof-of-concept project with elasticsearch-hadoop, using
Pig to create and read the index. Is there detailed documentation on how to
use Elasticsearch with Pig and Hadoop?
I would like to know whether the index can be stored in HDFS permanently, or
whether HDFS is only a backup store.
Also, loading from an ES index does not return the schema (the index
mapping) in Pig, so I am unable to reference fields by name in subsequent
steps. Filters work fine, but sorting doesn't.
Have you looked at the reference documentation [1]? There is also an overview video covering the main es-hadoop features [2].
Regarding HDFS, you can mount it as an NFS share and expose it to ES, which can then use it directly.
As for the mapping, you need to define it in Pig, as it will not be inferred from the ES mapping. This is done on purpose,
since this is how all libraries in Hadoop work: the data isn't guaranteed to be normalized. Additionally, you typically
want just a section of the data, so the Hadoop-side mapping indicates the view applied to ES.
In the end, Hadoop is the driver, so its settings take precedence.
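
To illustrate declaring the mapping on the Pig side, here is a minimal sketch. The jar path, the es.nodes address, the 'radio/artists' index/type and the field names (name, plays) are all placeholders I made up for the example, so adjust them to your own setup and check the reference documentation for the exact options in your es-hadoop version.

```pig
-- Placeholder path: point this at the es-hadoop jar in your installation.
REGISTER /path/to/elasticsearch-hadoop.jar;

-- Short alias for the es-hadoop storage; es.nodes is assumed to be a local node.
DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage('es.nodes=localhost:9200');

-- Declare the schema explicitly with AS (...); it is not inferred from the ES mapping.
-- 'radio/artists' and the fields below are hypothetical.
A = LOAD 'radio/artists' USING EsStorage() AS (name:chararray, plays:long);

-- With named, typed fields, later steps can refer to them directly.
B = FILTER A BY plays > 100;
C = ORDER B BY plays DESC;
DUMP C;
```

Only the fields listed in the AS clause are exposed to the later steps, which is the "view applied to ES" mentioned above.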