Regarding production configuration of elasticsearch

sohilelasticsearch · June 17, 2015, 7:02pm

Hi,

I am planning to import 1 billion records from MySQL into Elasticsearch on a production machine with 32GB of RAM and 500GB of hard disk space.
I am using elasticsearch-1.4.2 and elasticsearch-river-jdbc-1.4.0.9.
I am running a single Elasticsearch node with the default settings.
Can anyone suggest whether I need to add any extra configuration to Elasticsearch, or will it work fine with the default settings?

Thanks,
Sohil

warkolm · June 18, 2015, 2:52am

It will work fine out of the box.

sohilelasticsearch · June 18, 2015, 4:32am

Thanks @warkolm.

I have imported 1 billion records from MySQL into Elasticsearch and it took ~2 hours.
Is there any way to make this faster, or is this already good performance?

Thanks,
Sohil

warkolm · June 18, 2015, 4:42am

That's not too bad!

How are you extracting the data from MySQL and getting it into ES? How much heap have you assigned to ES?

sohilelasticsearch · June 18, 2015, 5:06am

Hi @warkolm,

curl -XPUT '10.XXX.XXX.XX:9200/_river/url_jdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "simple",
    "autocommit" : true,
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://10.XXX.XXX.XX:3306/DomainDB",
    "user" : "sohil",
    "password" : "XXX",
    "sql" : [
      {
        "statement" : "select id as \"_id\", p_id as \"pid\", s_id as \"sid\", domain_name as \"domain\" from domaintable"
      }
    ],
    "maxbulkactions" : 5000,
    "maxconcurrrentbulkactions" : 1,
    "index" : "urls",
    "type" : "url",
    "type_mapping" : {"urls" : {"properties" : {"_id":{"type":"long","store":"yes"},"domain":{"type":"string","store":"yes","index":"not_analyzed"},"pid":{"type":"long","store":"yes"},"sid":{"type":"long","store":"yes"}}}}
  }
}'

This is how I am extracting the data from MySQL and loading it into Elasticsearch.

I have set ES_MIN_MEM=4g and ES_MAX_MEM=24g.

warkolm · June 18, 2015, 5:07am

You should set the max and min heap to the same value; see here for more.
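
A minimal sketch of what that could look like on Elasticsearch 1.x, where the startup script uses ES_HEAP_SIZE for both the minimum and the maximum heap when it is set. The 16g figure is an assumption for illustration (roughly half of the 32GB on the machine described above), not a tuned value:

```shell
# Assumption: 16g is illustrative only, about half of the machine's 32GB of RAM.
export ES_HEAP_SIZE=16g

# Equivalent explicit form: both values identical, so the JVM never resizes the heap.
export ES_MIN_MEM="$ES_HEAP_SIZE"
export ES_MAX_MEM="$ES_HEAP_SIZE"

# Print the values that Elasticsearch would see on its next start.
echo "min=$ES_MIN_MEM max=$ES_MAX_MEM"
```

These need to be set in the environment that launches Elasticsearch, and the node has to be restarted for them to take effect.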

sohilelasticsearch · June 18, 2015, 5:24am

Thanks @warkolm.
I hope that setting the max and min heap to the same value will improve the performance of importing data from MySQL into ES.

warkolm · June 18, 2015, 5:34am

Also please be aware that rivers are being deprecated.
