How to improve elasticsearch's performance

sonic_pan · November 4, 2014, 2:37am

We deployed Elasticsearch on 3 nodes and assigned a 12GB heap to the Elasticsearch
process on each. But as the data grew, Elasticsearch became slower.

There are 670 indices, one per day, and we plan to keep 2 years of data.
Currently the largest index holds more than 100,000 documents.
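With one index per day and a 2-year retention target, older indices eventually have to be dropped. A minimal sketch of selecting them, assuming index names follow the YYYY-MM-DD form seen in the _cat/indices output later in this thread and treating "2 years" as 730 days; the function name expired_indices and the sample index list are hypothetical:

```python
from datetime import date, datetime, timedelta

# Assumption: daily indices are named YYYY-MM-DD, like "2014-09-05".
INDEX_FORMAT = "%Y-%m-%d"


def expired_indices(index_names, today, keep_days=730):
    """Return the daily index names older than the retention window.

    keep_days=730 is an illustrative stand-in for "2 years".
    Names that do not parse as dates are skipped.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        try:
            day = datetime.strptime(name, INDEX_FORMAT).date()
        except ValueError:
            continue  # not a daily index; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index list, checked against the date of this post.
print(expired_indices(["2012-10-01", "2013-11-08", "2014-09-05", "other"], date(2014, 11, 4)))
```

Each returned name could then be removed with the delete index API, e.g. curl -XDELETE 'localhost:9200/2012-10-01'.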

Here is the cluster's fielddata usage:

curl 'localhost:9200/_cat/fielddata?v'

id                      node             total  geolocation  data.created_at  domain   media    noun_phrases
-r1BLlrlRem5xhNPTni3zQ  Overmind         2gb    75mb         17.9mb           40.5mb   68mb     1.8gb
BbPoFxvSRMONo_U4aVEd0A  Cordelia Frost   4gb    104.2mb      29.8mb           145.2mb  104.2mb  3.6gb
2hGuYqK-RIO4pw1qZVJQFQ  Jack-in-the-Box  3.4gb  87mb         25.2mb           78mb     86.4mb   3.1gb

The JVM parameters are as follows:

org.elasticsearch.bootstrap.Elasticsearch -Xms12g -Xmx12g -Xss256k
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.foreground=yes
-Des.path.home=/opt/elasticsearch-1.2.1
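Fielddata is loaded into the JVM heap, so it helps to compare the _cat/fielddata totals above with the 12GB heap set by -Xmx12g. The sketch below only does that arithmetic on the reported (rounded) sizes; it is not a diagnosis, and the names FIELDDATA_GB and heap_share are illustrative:

```python
# Fielddata per node in GB, copied from the _cat/fielddata output above.
FIELDDATA_GB = {
    "Overmind": {"total": 2.0, "noun_phrases": 1.8},
    "Cordelia Frost": {"total": 4.0, "noun_phrases": 3.6},
    "Jack-in-the-Box": {"total": 3.4, "noun_phrases": 3.1},
}
HEAP_GB = 12.0  # from -Xms12g -Xmx12g


def heap_share(fielddata, heap_gb):
    """Return {node: (fielddata as % of heap, noun_phrases as % of fielddata)}."""
    result = {}
    for node, sizes in fielddata.items():
        result[node] = (
            round(100 * sizes["total"] / heap_gb, 1),
            round(100 * sizes["noun_phrases"] / sizes["total"], 1),
        )
    return result


for node, (heap_pct, np_pct) in heap_share(FIELDDATA_GB, HEAP_GB).items():
    print(f"{node}: fielddata = {heap_pct}% of heap, noun_phrases = {np_pct}% of fielddata")
```

On these numbers fielddata takes 16.7%, 33.3% and 28.3% of the heap on the three nodes, and noun_phrases accounts for about 90% of it on every node, which may make that field the first one worth looking at.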

So my question is: how can we improve performance? Should we add more machines?
If so, how can we estimate how many machines we need? Is it possible to
improve performance with just these 3 machines?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4875f12f-af0c-4531-b109-6ff5140a971e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Telax · November 4, 2014, 6:57am

Hi Sonic,

How many days' worth of data do you have in your system currently, and how many shards are created per daily index?

Is it just one index you create each day or do you have many for different data types?

When you search - do you search over all indices at once? What does a common search look like for you?

Andrew

sonic_pan · November 4, 2014, 4:00pm

Hi Andrew,

Each index has 5 shards, and each shard has 2 replicas. There are 670 indices
now, i.e. 670 days of data. The data is not evenly distributed across indices:
a small (old) index holds only about 100KB, while a big (recent) one may hold
more than 1GB.
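To see how many shards the cluster is carrying, the figures in this post can be multiplied out. A small sketch, assuming every index uses 5 primaries and 2 replicas (the 2014-03-22 index in the table below shows rep 1, so the real totals are slightly lower) and that roughly 730 daily indices exist after 2 years; the function names are illustrative:

```python
def shard_copies(indices, primaries, replicas):
    """Total shard copies in the cluster: every primary plus its replicas."""
    return indices * primaries * (1 + replicas)


def shards_per_node(indices, primaries, replicas, nodes):
    """Average shard copies per node, assuming an even spread."""
    return shard_copies(indices, primaries, replicas) / nodes


# Figures from this thread; 730 approximates 2 years of daily indices.
for label, indices in (("today", 670), ("after 2 years", 730)):
    total = shard_copies(indices, primaries=5, replicas=2)
    per_node = shards_per_node(indices, primaries=5, replicas=2, nodes=3)
    print(f"{label}: {total} shard copies, ~{per_node:.0f} per node")
```

Because Elasticsearch never allocates two copies of the same shard to one node, 3 copies on 3 nodes means every node holds a copy of every shard: about 3,350 shards per node today and about 3,650 after 2 years.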

curl 'localhost:9200/_cat/indices/?v'

health  index       pri  rep  docs.count  docs.deleted  store.size  pri.store.size
green   2014-03-22  5    1    1205        11            19.7mb      9.8mb
green   2013-11-08  5    2    58          0             4.8mb       1.6mb
green   2014-09-05  5    2    107055      5             1.3gb       473.7mb

We crawl different kinds of social media data and write each document into
the index that corresponds to its created date. Each social media source uses
its own mapping type, so there are several types per index.
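The routing described above can be sketched as a function that turns a document's created_at into a daily index name plus a bulk action line carrying the per-source type. This assumes created_at is an ISO-8601 string with an explicit UTC offset and that days are bucketed in UTC; neither is stated in the thread, and daily_index_for, bulk_action and the "twitter" type are hypothetical:

```python
from datetime import datetime, timezone


def daily_index_for(created_at: str) -> str:
    """Map a document's created_at to its daily index name (YYYY-MM-DD).

    Assumptions: created_at is ISO-8601 with an explicit UTC offset,
    and days are bucketed in UTC.
    """
    ts = datetime.fromisoformat(created_at)
    if ts.tzinfo is None:
        raise ValueError("created_at must include a UTC offset")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def bulk_action(doc: dict, source_type: str) -> dict:
    """Build the action line of a bulk request (1.x style, with _type).

    source_type is the per-source mapping type, e.g. a hypothetical "twitter".
    """
    return {"index": {"_index": daily_index_for(doc["created_at"]), "_type": source_type}}


print(bulk_action({"created_at": "2014-09-06T02:00:00+08:00"}, "twitter"))
```

A post made at 02:00 on 2014-09-06 at +08:00 lands in the 2014-09-05 index under UTC bucketing, so whichever convention is chosen should be applied consistently at write and query time.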

We use Kibana to present the data. By default, it searches the latest month of
data, and I notice that the search thread pool queue size (currently 1000) is
not enough.
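A rough way to see why a one-month Kibana view can fill the search queue is to count shard-level tasks. A search over 30 daily indices with 5 primaries each is executed against one copy of each of those 150 shards. The sketch assumes the tasks spread evenly over the 3 nodes and treats the number of concurrent searches (for example one per Kibana panel, per user) as a hypothetical parameter; it counts only the query phase and ignores the tasks already being handled by active search threads:

```python
def shard_tasks_per_node(days, primaries, nodes, searches=1):
    """Rough count of shard-level search tasks each node receives.

    A search across `days` daily indices hits one copy of every primary
    shard in those indices (days * primaries tasks for the query phase).
    Assumes an even spread over the nodes; `searches` is the hypothetical
    number of concurrent searches, e.g. one per Kibana panel.
    """
    return days * primaries * searches / nodes


print(shard_tasks_per_node(30, 5, 3))               # one search over a month
print(shard_tasks_per_node(30, 5, 3, searches=20))  # e.g. 2 users x 10 panels
```

One month-long search already means about 50 tasks per node; at 20 concurrent searches the estimate reaches 1,000 tasks per node, the same as the current queue size.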

sonic_pan · November 4, 2014, 4:03pm