Adding millions of documents, performance decay

Fabio_Pezzoni · November 30, 2012, 10:57am


Hi,

I'm new to Elasticsearch and I'm trying to transfer a database of several
million JSON documents to a Lucene index through ES. Currently we can
use just a single node with 8 CPUs, and we use the Java API to add
each document sequentially. We didn't change the default options,
so our index has 5 shards. At the beginning the process was very
fast! In a few hours we added about 50 million documents, then the
performance gradually fell, and currently it can take seconds to add a
single document. Query performance is still very good!
Is there a way to overcome this situation? Maybe changing the settings or
the number of shards...

Thank you very much, Fabio.

Here are some node stats:

{"ok":true,"cluster_name":"twitter","nodes":{"j5RTlp7jTreoAlE6Tb3gwA":{"name":"xxx","transport_address":"inet[/xxx.xxx.xxx.xxx:9300]","hostname":"xxx","http_address":"inet[/xxx.xxx.xxx.xxx:9200]","settings":{"path.home":"/home/twitter/elasticsearch-0.19.11","foreground":"yes","logger.prefix":"","max-open-files":"true","node.name":"xxx","cluster.name":"twitter","name":"xxx","path.logs":"/home/twitter/elasticsearch-0.19.11/logs"},"os":{"refresh_interval":1000,"cpu":{"vendor":"Intel","model":"Xeon","mhz":3192,"total_cores":8,"total_sockets":8,"cores_per_socket":16,"cache_size":"8kb","cache_size_in_bytes":8192},"mem":{"total":"15.4gb","total_in_bytes":16543477760},"swap":{"total":"3.9gb","total_in_bytes":4294963200}},"process":{"refresh_interval":1000,"id":20301,"max_file_descriptors":65535},"jvm":{"pid":20301,"version":"1.6.0_34","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"20.9-b04","vm_vendor":"Sun Microsystems Inc.","start_time":1354178513836,"mem":{"heap_init":"256mb","heap_init_in_bytes":268435456,"heap_max":"1011.2mb","heap_max_in_bytes":1060372480,"non_heap_init":"23.1mb","non_heap_init_in_bytes":24313856,"non_heap_max":"130mb","non_heap_max_in_bytes":136314880,"direct_max":"1011.2mb","direct_max_in_bytes":1060372480}}}}}

Index stats:

--

Igor_Motov · November 30, 2012, 4:38pm

If you haven't done this yet, set refresh interval to -1 by running:

curl -XPUT localhost:9200/twitter/_settings -d '{
"index" : {
"refresh_interval" : "-1"
}
}'

When you are done with bulk reindexing, you can turn it back on by running:

curl -XPUT localhost:9200/twitter/_settings -d '{
"index" : {
"refresh_interval" : "1s"
}
}'
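
The same toggle can be scripted around a load so that the interval is restored even if the load fails. What follows is only a minimal Python sketch, not part of the original reply: the helper names `refresh_settings_body` and `with_refresh_disabled`, and the injected `put` callable, are assumptions made for illustration. Only the `/twitter/_settings` path and the `refresh_interval` values come from the commands above.

```python
import json


def refresh_settings_body(interval):
    """Build the same JSON body that the curl commands above send."""
    return json.dumps({"index": {"refresh_interval": interval}})


def with_refresh_disabled(put, index, load, restore_to="1s"):
    """Run load() with refresh disabled, then restore the interval.

    put(path, body) is a hypothetical callable that performs the HTTP PUT
    (for example a thin wrapper around an HTTP client); it is injected so
    the sketch does not assume any particular client library.
    """
    path = "/%s/_settings" % index
    put(path, refresh_settings_body("-1"))
    try:
        return load()
    finally:
        # Restore the interval even if the load raised an exception.
        put(path, refresh_settings_body(restore_to))
```

Calling `with_refresh_disabled(put, "twitter", load)` would issue the first PUT, run `load()`, and then issue the second PUT, mirroring the two curl commands.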

On Friday, November 30, 2012 5:57:29 AM UTC-5, Fabio Pezzoni wrote:

Index stats:

{"ok":true,"_shards":{"total":10,"successful":5,"failed":0},"_all":{"primaries":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}},"total":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}},"indices":{"twitter":{"primaries":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}},"total":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}}}}}}

--

Michael_Sick · November 30, 2012, 5:33pm

Also, have you tried grouping your documents into bulk requests?
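
As a rough illustration of what grouping means, the sketch below builds request bodies in the newline-delimited format accepted by the `_bulk` endpoint: one action line followed by one source line per document. It is a hedged Python sketch, not the Java API code used in this thread. The helper names `bulk_body` and `chunks`, the type name `tweet`, and the batch size are hypothetical, and the `_type` field reflects the 0.19.x release shown in the node stats (later releases removed mapping types).

```python
import json


def bulk_body(index, doc_type, docs):
    """Serialize (doc_id, source) pairs into a _bulk request body.

    Each document becomes two lines: an action line and the source line.
    Every line, including the last, is terminated by a newline.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "".join(line + "\n" for line in lines)


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

A loader would then send one body per batch, for example `for batch in chunks(all_docs, 1000): send(bulk_body("twitter", "tweet", batch))`, where `send` and the batch size of 1000 are placeholders to tune for the actual data.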


--

jprante · November 30, 2012, 7:27pm

You are experiencing massive GC because you are using the default
out-of-the-box maximum JVM heap setting of 1 GB. You are lucky you could
even add 50 million docs with such a small heap! But you have 16 GB of RAM.
As a rule of thumb, assign around 50% of RAM (4-8 GB) to Elasticsearch's
heap. Check bin/elasticsearch.in.sh for ES_MAX_MEM or, better, ES_HEAP_SIZE.
Other advanced tuning is also available, but check bulk indexing first.
Happy indexing!
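
To make the rule of thumb concrete, here is a small Python sketch of the arithmetic, using the byte counts reported in the node stats of the first post. The function name `suggested_heap` and the choice to round down to whole gigabytes are assumptions for illustration, not an official sizing formula.

```python
GIB = 1024 ** 3


def suggested_heap(total_ram_bytes, fraction=0.5):
    """Return a heap size string such as '7g', rounded down to whole GiB."""
    gib = int(total_ram_bytes * fraction) // GIB
    return "%dg" % max(gib, 1)


# Values copied from the node stats in the first post.
total_ram = 16543477760      # os.mem.total_in_bytes ("15.4gb")
current_heap = 1060372480    # jvm.mem.heap_max_in_bytes ("1011.2mb")

heap_share = current_heap / total_ram
print("current heap share of RAM: %.1f%%" % (heap_share * 100))
print("ES_HEAP_SIZE=%s" % suggested_heap(total_ram))
```

With these numbers the current heap is only a few percent of the machine's RAM, and half of the reported 15.4 GB rounds down to 7 GiB, which falls inside the 4-8 GB range mentioned above.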

Best regards,

Jörg

--

Fabio_Pezzoni · December 3, 2012, 10:13am

Thank you very much for the advice! I was already using Java API bulk
requests. Now with 8g of heap and refresh_interval=-1 it works far
better. It's still slower than at the beginning, but maybe that's normal
for a single-node cluster (it indexes in bursts). I hope to have more
nodes and power soon!

Fabio


--

Hadar_Rottenberg · December 4, 2012, 1:57pm

Hey Fabio,
Would it be possible for you to post some benchmarks and statistics about
your data set, such as:
How big is the dataset?
What is the average document size?
How long did it take to index 50M documents?
Query benchmarks? Queries per second?

Thanks

