How do we store Elasticsearch data in a Hadoop cluster (Ambari)?

Hello all,

I am running a 3-node Elasticsearch (7.4.2) cluster that generates approximately 100 GB of data daily. I would like to keep only 30 days of data in Elasticsearch and store the older data in Hadoop (Ambari 2.7.0, HDFS 3.0.0). I found the 'Hadoop HDFS Repository' plugin for this job but was not able to get it working. Can someone please share the complete steps for meeting this requirement?

Note: I have installed the plugin on all three Elasticsearch nodes:
sudo bin/elasticsearch-plugin install repository-hdfs
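
One step that is easy to miss: a newly installed plugin is only picked up after each node is restarted. Assuming a package-based install managed by systemd, that would be something like:

# restart each node in turn so the plugin loads (unit name assumes a package install)
sudo systemctl restart elasticsearch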

I am unable to understand how to use the request below:
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "true"
  }
}
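
For context on those settings: "uri" points at the HDFS NameNode, "path" is the directory inside HDFS where snapshot files are written, and keys prefixed with "conf." are passed through to the Hadoop client configuration. Once the repository is registered, it can be sanity-checked with the repository verify API:

POST _snapshot/my_hdfs_repository/_verify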

How do I configure the path and the other values? Please help.

Hi @kishore419.

Do you want to store Elasticsearch index data into HDFS, or HDFS data into Elasticsearch?

If you want to store data into HDFS, create a Hive external table pointing at your Elasticsearch index data, then use CTAS (CREATE TABLE AS SELECT) to copy the external table into another Hive table; a sketch of this follows below.
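
A minimal sketch of that approach, assuming the ES-Hadoop connector jar is available to Hive; the jar path, index name, fields, and host below are placeholders, not taken from this thread:

ADD JAR /path/to/elasticsearch-hadoop.jar;

-- external table backed by the Elasticsearch index
CREATE EXTERNAL TABLE es_logs (ts TIMESTAMP, message STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'logs-2019.12', 'es.nodes' = 'es-node1:9200');

-- CTAS copies the rows into an ordinary HDFS-backed Hive table
CREATE TABLE hdfs_logs STORED AS ORC AS SELECT * FROM es_logs;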

What do you mean by maintaining only 30 days of data in Elasticsearch?

Please share more details.

Thanks
HadoopHelp

Hi Ramesh,

I want to store Elasticsearch data in Hadoop. I found the 'Hadoop HDFS Repository' plugin for this job, but I got stuck while executing the following request from Kibana (Dev Tools):

PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "true"
  }
}
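
Assuming the repository registers without errors, the next step would be taking a snapshot of the indices to be moved; the snapshot name and index pattern here are invented for illustration:

PUT _snapshot/my_hdfs_repository/snapshot_1?wait_for_completion=true
{
  "indices": "logs-*"
}

GET _snapshot/my_hdfs_repository/snapshot_1

Once a snapshot completes successfully, the snapshotted indices can be deleted from Elasticsearch (e.g. DELETE /logs-2019.11.01) to enforce the 30-day retention.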

I am new to Hadoop and am running a 3-node Hadoop cluster managed by Ambari. Do we need to create any path in the Hadoop cluster, or make any Hadoop configuration changes?
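
One Hadoop-side step that is often required (an assumption here, since it depends on how HDFS permissions are set up): pre-create the repository path and make it writable by the OS user the Elasticsearch process runs as, commonly 'elasticsearch':

# create the snapshot directory in HDFS and hand it to the Elasticsearch user
sudo -u hdfs hdfs dfs -mkdir -p /elasticsearch/repositories/my_hdfs_repository
sudo -u hdfs hdfs dfs -chown -R elasticsearch /elasticsearch/repositories/my_hdfs_repository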

Let me know if you have any questions.
