Regarding production configuration of Elasticsearch


#1

Hi ,

I am planning to import 1 billion records from MySQL into Elasticsearch on a production machine with 32GB of RAM and 500GB of hard disk space.
I am using elasticsearch-1.4.2 and elasticsearch-river-jdbc-1.4.0.9.
I am running only one Elasticsearch node with the default settings.
Can anyone suggest whether I need to add any extra configuration to Elasticsearch, or will it work fine with the default settings?

Thanks,
Sohil


(Mark Walkom) #2

It will work fine out of the box.


#3

Thanks @warkolm.

I have imported 1 billion records from MySQL into Elasticsearch and it took ~2 hours.
Is there any way to make this faster, or is this already good performance?

Thanks,
Sohil


(Mark Walkom) #4

That's not too bad!
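
For scale, the figures reported above (1 billion records in about 2 hours) work out to a sustained rate on the order of 140,000 documents per second. A quick shell check of that arithmetic, treating "~2 hours" as exactly 7200 seconds (an assumption, since the reported duration was only approximate):

```shell
# Figures taken from the post above; the exact duration is an assumption.
records=1000000000            # 1 billion records
seconds=$((2 * 60 * 60))      # "~2 hours" rounded to 7200 seconds

# Integer division, so the result is rounded down.
docs_per_sec=$((records / seconds))
echo "$docs_per_sec docs/sec"
```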

How are you extracting the data from MySQL and getting it into ES? What heap have you assigned ES?


#5

Hi @warkolm,

curl -XPUT '10.XXX.XXX.XX:9200/_river/url_jdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "simple",
    "autocommit" : true,
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://10.XXX.XXX.XX:3306/DomainDB",
    "user" : "sohil",
    "password" : "XXX",
    "sql" : [
      {
        "statement" : "select id as \"_id\", p_id as \"pid\", s_id as \"sid\", domain_name as \"domain\" from domaintable"
      }
    ],
    "maxbulkactions" : 5000,
    "maxconcurrrentbulkactions" : 1,
    "index" : "urls",
    "type" : "url",
    "type_mapping" : {
      "urls" : {
        "properties" : {
          "_id" : {"type" : "long", "store" : "yes"},
          "domain" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
          "pid" : {"type" : "long", "store" : "yes"},
          "sid" : {"type" : "long", "store" : "yes"}
        }
      }
    }
  }
}'

This is how I am moving the data from MySQL into Elasticsearch.

I have set ES_MIN_MEM=4g and ES_MAX_MEM=24g


(Mark Walkom) #6

You should set max and min heap to be the same, see here for more.
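
As a minimal sketch of what that looks like with the Elasticsearch 1.x startup scripts: setting ES_HEAP_SIZE before starting the node makes the scripts use that value for both the minimum and the maximum heap. The 16g figure is an assumption based on the common guidance of giving the heap about half of the machine's RAM (32GB here); it is not a value anyone in this thread specified.

```shell
# Assumption: 16g is roughly half of the 32GB machine described in this thread.
# In Elasticsearch 1.x, ES_HEAP_SIZE overrides both ES_MIN_MEM and ES_MAX_MEM.
export ES_HEAP_SIZE=16g

# Equivalent explicit form using the variables already mentioned above:
# both values are the same, so the heap never has to be resized at runtime.
export ES_MIN_MEM=16g
export ES_MAX_MEM=16g
```

These variables have to be set in the environment of the process that launches Elasticsearch, and the node must be restarted for the change to take effect.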


#7

Thanks @warkolm.
I hope that keeping the max and min heap the same will improve the performance of exporting data from MySQL to ES.


(Mark Walkom) #8

Also please be aware that rivers are being deprecated.

