The indexing load isn't particularly high, but the documents being indexed
are pretty large...many of them are >1MB. Looking at my logstash indexing
machines, I'd say the rate is roughly 200 docs/sec. Is there a way to see
this from the ES side of things?
You can use the Indices stats API (http://www.elasticsearch.org/guide/reference/api/admin-indices-stats/) to see the indexing rate. You'll get output that includes indexing
stats (from the viewpoint of the index, regardless of which shard/machine
it goes towards):
curl -XGET 'localhost:9200/_stats'
[...]
"indexing": {
"index_total": 3,
"index_time_in_millis": 49,
"index_current": 0,
"delete_total": 0,
"delete_time_in_millis": 0,
"delete_current": 0
},
[...]
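Since index_total is a cumulative counter, you can get a rough docs/sec figure by sampling it twice and dividing by the interval. A quick sketch (untested; the grep just grabs the first index_total in the response, which should be the cluster-wide primaries number, and the 10-second window is arbitrary):

# sample index_total twice, 10 seconds apart
T1=$(curl -s 'localhost:9200/_stats' | grep -o '"index_total":[0-9]*' | head -1 | cut -d: -f2)
sleep 10
T2=$(curl -s 'localhost:9200/_stats' | grep -o '"index_total":[0-9]*' | head -1 | cut -d: -f2)
# delta divided by the sample window ~= docs indexed per second
echo "docs/sec: $(( (T2 - T1) / 10 ))"

You can also point the URL at a single index (e.g. 'localhost:9200/logstash-2013.09.05/_stats') if you only care about today's index.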
Another place to look is the indexing threadpool via the Cluster nodes stats API (http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/) - check to see if you are queuing a lot of threads:
curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&thread_pool=true'
[...]
"index": {
"threads": 3,
"queue": 0,
"active": 0,
"rejected": 0,
"largest": 3,
"completed": 3
},
[...]
Plugins like Bigdesk (https://github.com/lukas-vlcek/bigdesk/) and Paramedic (https://github.com/karmi/elasticsearch-paramedic) are basically graphical wrappers for these APIs.
I'd be more than happy to add nodes to the cluster, but I wasn't certain if
that would help indexing speed as much as it would query speed. However,
now that you mention it...if the shards were split up one per node, it
makes sense that I'd see gains there.
Yep, you'll definitely see an increase as each node adds more indexing
throughput. It isn't exactly linear, but it is fairly close (especially if
your query load is low, as in most logging environments). Realistically,
this is the easiest and fastest way to increase your indexing speed if you
can afford the cost of another node.
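If you do add nodes, it's worth making sure the new daily indices spread their shards across the machines. A rough sketch using an index template (the template name is made up, logstash-* is assuming the default logstash index naming, and the shard/replica counts are placeholders to match your node count):

curl -XPUT 'localhost:9200/_template/logstash_shards' -d '{
    "template": "logstash-*",
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
    }
}'

New indices created after the template is in place pick up these settings; existing indices keep whatever they were created with.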
Hope this helps! Keep us updated if you have any more questions.
-Zach
On Thursday, September 5, 2013 5:23:20 PM UTC-7, Zachary Tong wrote:
What's your indexing load (docs/sec) ? Are you querying at the same
time? Often, if you are bound by Disk IO, there isn't much you can do
except get faster disks or more nodes. Do you have SSDs? They are a great
investment if you can afford them. And adding more nodes is almost a
linear increase in indexing speed.
Some more things you can do:
- If you don't need it, disable the _all field. This bloats the doc
size (so more bytes to write) and eats up a bit of CPU.
- I'd put the index.merge.policy.segments_per_tier back to its
default (10). By having it set so high, Lucene is going to perform big
bursts of merging which can easily eat up all your IO and a considerable
amount of CPU. In general, I've spent a lot of time fiddling with the
merge policy settings and never found a configuration better than the
defaults. Mike McCandless (http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html) knows best. (There's a sketch of the settings for this and the _all field after this list.)
- I noticed you have Term Vectors compressed. Are you actually using
term vectors? They double your index size and eat up more IO.
- You could check the indexing thread count and see if you are
routinely queuing indexing threads. It may help to increase that some
(although it may not).
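For the first two bullets, something along these lines should work. This is just a sketch, assuming the default logstash-* index naming and a made-up template name; check it against your own mappings before applying:

# disable _all and reset the merge policy for new daily indices via a template
curl -XPUT 'localhost:9200/_template/logstash_tuning' -d '{
    "template": "logstash-*",
    "settings": {
        "index.merge.policy.segments_per_tier": 10
    },
    "mappings": {
        "_default_": {
            "_all": { "enabled": false }
        }
    }
}'

# if I remember right, segments_per_tier is a dynamic setting, so it can also be
# reset on an existing index (index name here is just an example of the daily pattern)
curl -XPUT 'localhost:9200/logstash-2013.09.05/_settings' -d '{
    "index.merge.policy.segments_per_tier": 10
}'

Since your indexes roll daily, the template covers tomorrow's index without you having to touch today's.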
-Zach
On Thursday, September 5, 2013 7:39:13 PM UTC-4, Robert Navarro wrote:
Hello,
I have a single-server ES "cluster" set up right now and it's struggling
to keep up with our indexing load.
The server has 30GB of RAM, 15GB locked for Elasticsearch.
Here are the ES details:
{
    "ok" : true,
    "status" : 200,
    "name" : "esls1",
    "version" : {
        "number" : "0.90.3",
        "build_hash" : "5c38d6076448b899d758f29443329571e2522410",
        "build_timestamp" : "2013-08-06T13:18:31Z",
        "build_snapshot" : false,
        "lucene_version" : "4.4"
    },
    "tagline" : "You Know, for Search"
}
Here is the java version:
root@esls1:~# java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Here are some of the knobs I've tried to tweak for our indexes...this is
just a snapshot of one index:
https://gist.github.com/490196cf73ff46e33a8b
Operating System:
Ubuntu 12.04.2 LTS
There are 15 indexes on this node, rotated daily and removed after 14 days.
The incoming index requests are coming out of logstash and it's all logging data.
I suspect the server is IO bound, as there are bursts of 5-10s of sustained 10%+ iowait.
What other knobs can I tweak to help speed things along?