Regarding production configuration of Elasticsearch

Hi,

I am planning to import 1 billion records from MySQL into Elasticsearch on a production machine with 32 GB of RAM and 500 GB of hard disk space.
I am using elasticsearch-1.4.2 and elasticsearch-river-jdbc-1.4.0.9.
I am running only one Elasticsearch node with the default settings.
Can anyone tell me whether I need to add any extra configuration to Elasticsearch, or will it work fine with the default settings?

Thanks,
Sohil

It will work fine out of the box.

Thanks @warkolm.

I have imported 1 billion records from MySQL into Elasticsearch, and it took ~2 hours.
Is there any way to make this faster, or is this already good performance?

Thanks,
Sohil

That's not too bad!
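
To put a rough number on that, here is a back-of-the-envelope calculation using only the figures reported in this thread (1 billion records, about 2 hours, 5000 actions per bulk request). The variable names are just for illustration, and the result is approximate because the 2 hours is itself an estimate.

```shell
# Rough indexing throughput from the figures quoted in this thread.
records=1000000000              # 1 billion records
seconds=$(( 2 * 60 * 60 ))      # ~2 hours, as reported
bulk_size=5000                  # bulk size from the river definition

rate=$(( records / seconds ))           # documents per second (integer division)
bulks_per_sec=$(( rate / bulk_size ))   # bulk requests per second

echo "$rate docs/second, about $bulks_per_sec bulk requests/second"
# prints: 138888 docs/second, about 27 bulk requests/second
```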

How are you extracting the data from MySQL and getting it into ES? How much heap have you assigned to ES?

Hi @warkolm,

curl -XPUT '10.XXX.XXX.XX:9200/_river/url_jdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "simple",
    "autocommit" : true,
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://10.XXX.XXX.XX:3306/DomainDB",
    "user" : "sohil",
    "password" : "XXX",
    "sql" : [
      {
        "statement" : "select id as \"_id\", p_id as \"pid\", s_id as \"sid\", domain_name as \"domain\" from domaintable"
      }
    ],
    "maxbulkactions" : 5000,
    "maxconcurrrentbulkactions" : 1,
    "index" : "urls",
    "type" : "url",
    "type_mapping" : {"urls" : {"properties" : {"_id":{"type":"long","store":"yes"},"domain":{"type":"string","store":"yes","index":"not_analyzed"},"pid":{"type":"long","store":"yes"},"sid":{"type":"long","store":"yes"}}}}
  }
}'

This is how I am importing data from MySQL into Elasticsearch.

I have set ES_MIN_MEM=4g and ES_MAX_MEM=24g.

You should set the max and min heap to the same value; see here for more.
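
As a minimal sketch of what that could look like in the environment that starts the node: the 16g value below is an assumption for illustration (about half of the 32 GB machine, following the common guidance to leave the rest to the OS file cache), not a figure anyone in this thread tested.

```shell
# Give the JVM a fixed-size heap by using the same value for min and max.
# Assumption: 16g is an illustrative value, roughly half of the 32 GB of RAM.
HEAP=16g

export ES_MIN_MEM="$HEAP"
export ES_MAX_MEM="$HEAP"

# In Elasticsearch 1.x the start script also accepts ES_HEAP_SIZE,
# which it applies to both the min and the max:
# export ES_HEAP_SIZE="$HEAP"

echo "min=$ES_MIN_MEM max=$ES_MAX_MEM"
```

These need to be set before Elasticsearch is started; a running node will not pick them up.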

Thanks @warkolm.
I hope that setting the max and min heap to the same value will improve the performance of importing data from MySQL into ES.

Also, please be aware that rivers are being deprecated.
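
One possible alternative to a river is to have an external process read from MySQL and send documents through the bulk API. The sketch below only shows the newline-delimited body such a process would send, reusing the index, type, and field names from the river definition earlier in the thread. `bulk_entry` is a hypothetical helper, the sample values are made up, and the curl call is left commented out because it needs a running node.

```shell
# Hypothetical helper: print one bulk "index" action followed by its document.
# Arguments: id, pid, sid, domain.
# Assumption: the domain value contains no characters that need JSON escaping.
bulk_entry() {
  printf '{"index":{"_index":"urls","_type":"url","_id":"%s"}}\n' "$1"
  printf '{"pid":%s,"sid":%s,"domain":"%s"}\n' "$2" "$3" "$4"
}

# Made-up sample row, for illustration only.
bulk_entry 1 10 20 example.com

# With the entries collected in a file, the body could be sent like this:
# curl -XPOST '10.XXX.XXX.XX:9200/_bulk' --data-binary @bulk_body.json
```

Each document takes exactly two lines, and the body must end with a newline, which is why the helper uses `printf` with an explicit `\n` on both lines.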