Adding millions of documents, performance decay

Hi,

I'm new to Elasticsearch and I'm trying to transfer a database of several
million JSON documents to a Lucene index through ES. Currently we can use
just a single node with 8 CPUs, and we use the Java API to add each
document sequentially. We didn't change the default options, so our index
has 5 shards. At the beginning the process was very fast: in a few hours
we added about 50 million documents. Then performance gradually fell, and
currently it can take seconds to add a single document. Query performance
is still very good!
Is there a way to overcome this situation? Maybe by changing the settings
or the number of shards...

Thank you very much, Fabio.

Here are some node stats:

{"ok":true,"cluster_name":"twitter","nodes":{"j5RTlp7jTreoAlE6Tb3gwA":{"name":"xxx","transport_address":"inet[/xxx.xxx.xxx.xxx:9300]","hostname":"xxx","http_address":"inet[/xxx.xxx.xxx.xxx:9200]","settings":{"path.home":"/home/twitter/elasticsearch-0.19.11","foreground":"yes","logger.prefix":"","max-open-files":"true","node.name":"xxx","cluster.name":"twitter","name":"xxx","path.logs":"/home/twitter/elasticsearch-0.19.11/logs"},"os":{"refresh_interval":1000,"cpu":{"vendor":"Intel","model":"Xeon","mhz":3192,"total_cores":8,"total_sockets":8,"cores_per_socket":16,"cache_size":"8kb","cache_size_in_bytes":8192},"mem":{"total":"15.4gb","total_in_bytes":16543477760},"swap":{"total":"3.9gb","total_in_bytes":4294963200}},"process":{"refresh_interval":1000,"id":20301,"max_file_descriptors":65535},"jvm":{"pid":20301,"version":"1.6.0_34","vm_name":"Java
HotSpot(TM) 64-Bit Server VM","vm_version":"20.9-b04","vm_vendor":"Sun
Microsystems
Inc.","start_time":1354178513836,"mem":{"heap_init":"256mb","heap_init_in_bytes":268435456,"heap_max":"1011.2mb","heap_max_in_bytes":1060372480,"non_heap_init":"23.1mb","non_heap_init_in_bytes":24313856,"non_heap_max":"130mb","non_heap_max_in_bytes":136314880,"direct_max":"1011.2mb","direct_max_in_bytes":1060372480}}}}}

Index stats:

{"ok":true,"_shards":{"total":10,"successful":5,"failed":0},"_all":{"primaries":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}},"total":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}},"indices":{"twitter":{"primaries":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}},"total":{"docs":{"count":54298508,"deleted":24537},"store":{"size":"50.6gb","size_in_bytes":54435026631,"throttle_time":"
0s","throttle_time_in_millis":0},"indexing":{"index_total":63351487,"index_time":"2d","index_time_in_millis":173455844,"index_current":0,"delete_total":0,"delete_time":"0s","delete_time_in_millis":0,"delete_current":0},"get":{"total":0,"time":"0s","time_in_millis":0,"exists_total":0,"exists_time":"0s","exists_time_in_millis":0,"missing_total":0,"missing_time":"0s","missing_time_in_millis":0,"current":0},"search":{"query_total":185,"query_time":"1.2m","query_time_in_millis":73521,"query_current":0,"fetch_total":71,"fetch_time":"4.6m","fetch_time_in_millis":279532,"fetch_current":0},"merges":{"current":6,"current_docs":60,"current_size":"462.5kb","current_size_in_bytes":473615,"total":761368,"total_time":"3.1d","total_time_in_millis":276398054,"total_docs":224483235,"total_size":"251.7gb","total_size_in_bytes":270307183589},"refresh":{"total":177694,"total_time":"1.3d","total_time_in_millis":116764453},"flush":{"total":7797,"total_time":"1.2d","total_time_in_millis":111453937}}}}}}

--

If you haven't done this yet, set refresh interval to -1 by running:

curl -XPUT localhost:9200/twitter/_settings -d '{
  "index" : {
    "refresh_interval" : "-1"
  }
}'

When you are done with bulk reindexing, you can turn it back on by running:

curl -XPUT localhost:9200/twitter/_settings -d '{
  "index" : {
    "refresh_interval" : "1s"
  }
}'
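
You can read the setting back to check that it took effect, for example:

curl -XGET 'localhost:9200/twitter/_settings?pretty'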

On Friday, November 30, 2012 5:57:29 AM UTC-5, Fabio Pezzoni wrote:


--

Also, have you tried to group your documents into bulk statements?
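
For example, a minimal sketch of a bulk request over the REST API (the
"tweet" type and the document bodies here are just placeholders; the Java
API has an equivalent bulk request builder):

curl -XPOST 'localhost:9200/_bulk' --data-binary '
{ "index" : { "_index" : "twitter", "_type" : "tweet" } }
{ "user" : "fabio", "message" : "first document" }
{ "index" : { "_index" : "twitter", "_type" : "tweet" } }
{ "user" : "fabio", "message" : "second document" }
'

Each action line is followed by the document source on the next line, and
the body has to end with a newline. Batching a few hundred to a few
thousand documents per request is a common starting point; sending them
one by one spends most of the time on per-request overhead.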

On Fri, Nov 30, 2012 at 11:38 AM, Igor Motov imotov@gmail.com wrote:


--


You experience massive GC because you are using the default out-of-the-box
maximum JVM heap setting of 1 GB. You are lucky you could even add 50
million docs with that small setting! But you have 16 GB RAM. As a rule of
thumb, assign around 50% of RAM (4-8 GB) to Elasticsearch's heap. Check
bin/elasticsearch.in.sh for ES_MAX_MEM or, better, ES_HEAP_SIZE. Other
advanced tuning is also available, but check bulk indexing first. Happy
indexing!
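
For example, with the standard startup script (a minimal sketch;
bin/elasticsearch.in.sh reads ES_HEAP_SIZE and turns it into the JVM
-Xms/-Xmx options):

# 8g is roughly 50% of the 16 GB RAM on this node
export ES_HEAP_SIZE=8g
./bin/elasticsearch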

Best regards,

Jörg

--

Thank you very much for the advice! I was already using Java API bulk
statements. Now with 8g of heap and refresh_interval=-1 it works far
better. It's still slower than at the beginning, but maybe that's normal
for a single-node cluster (it indexes in bursts). I hope to have more
nodes and power soon!

Fabio

On Fri, Nov 30, 2012 at 8:27 PM, Jörg Prante joergprante@gmail.com wrote:


--


Hey Fabio,
Would it be possible for you to post some benchmarks and statistics about
your data set? For example:
How big is the dataset?
What is the average document size?
How long did it take to index the 50M documents?
Query benchmarks? Queries per second?
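
For what it's worth, the index stats posted earlier already give a rough
figure for the stored size: 54,435,026,631 bytes / 54,298,508 docs ≈ 1 KB
per document in the index (the source JSON size may of course differ).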

Thanks

On Monday, December 3, 2012 12:13:39 PM UTC+2, Fabio Pezzoni wrote:


--
