As I understand it, to index Hadoop data into Elasticsearch
we create an external table with the EsStorageHandler and then copy the data
into it from another, internal Hive table. Doesn't that duplicate the data in HDFS?
Is there any way to index directly from the internal Hive table instead
of having two tables with the same data?
There is no duplication per se in HDFS. Hive tables are just 'views' of data: one sits unindexed, in raw format, in HDFS;
the other is indexed and analyzed in Elasticsearch, so its rows are stored there rather than in HDFS.
You can't combine the two since they are completely different things: one is a file system, the other a search
and analytics engine.
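
For illustration, here is a minimal HiveQL sketch of the setup being discussed. The table names (logs_raw, logs_es), the columns, the index/type 'logs/entry', the node address and the jar path are all assumptions made up for this example, so adjust them to your environment. The external table holds only metadata in Hive; the INSERT sends the rows to Elasticsearch and does not write a second copy into HDFS.

```sql
-- Hypothetical jar location; point this at your elasticsearch-hadoop jar.
ADD JAR /path/to/elasticsearch-hadoop.jar;

-- External table backed by Elasticsearch. Only the table definition lives
-- in the Hive metastore; the rows themselves are stored in the ES index.
CREATE EXTERNAL TABLE logs_es (
    id      BIGINT,
    message STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'logs/entry',       -- assumed index/type
              'es.nodes'    = 'localhost:9200');  -- assumed ES node

-- Read from the existing internal (HDFS-backed) table and index into ES.
INSERT OVERWRITE TABLE logs_es
SELECT id, message
FROM logs_raw;
```

The internal table logs_raw stays the single copy of the raw data in HDFS; logs_es is just the route by which Hive writes to (and reads from) the index.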
On 09/01/2014 9:49 AM, Badal Mohapatra wrote:
Hi,
As I understand it, to index Hadoop data into Elasticsearch
we create an external table with the EsStorageHandler and then copy the data into it from another, internal Hive table.
Doesn't that duplicate the data in HDFS?
Is there any way to index directly from the internal Hive table instead of having two tables with the same data?