Start with the slowest part of your system to get the most improvement.
Have you taken I/O measurements of your hardware? Can you share the numbers,
and explain how you arrived at them for the configuration you describe?
How is the "1T disks (RAID 1)" setup organized, just two disks? What data
volume does "20k msg/s" correspond to, i.e. do you know the average size of
a message? And what are your expectations of how ES should perform?
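If you have not measured yet, a rough baseline is quick to get; a minimal
sketch (the path is a placeholder for your ES data disk):

  # raw sequential write throughput of the data disk, bypassing the page cache
  dd if=/dev/zero of=/path/to/es/data/dd-test bs=1M count=4096 oflag=direct
  # per-device utilization, queue size and await while indexing is running
  iostat -x 5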
Starting out with such an oversized heap is most peculiar, but I hope you
know what you are doing. Note that a filled heap of that size can become a
challenge in many respects (above roughly 32G the JVM loses compressed
object pointers, and GC pauses grow with heap size); you must watch out for
these issues, and I think you know them all. You should enable GC logging in
ES so you can read the GC messages and understand why ES works "not as
expected". Mostly it will either starve on I/O waits or be overwhelmed by GC
pauses that slowly bring the JVM to a halt and throughput to zero. That is
not ES's fault; it is simply a badly configured JVM or a badly configured
disk subsystem.
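For the GC logs, something along these lines should do, assuming your
startup script passes ES_JAVA_OPTS (or JAVA_OPTS) through to the JVM; the
flags themselves are standard HotSpot options and the log path is just an
example:

  # write detailed GC logs so pauses can be correlated with indexing stalls
  export ES_JAVA_OPTS="$ES_JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
    -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log"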
My recommendation is to switch to the latest Java 7 with the G1 GC for
low-latency garbage collection, start with a heap of around 8G, and leave
the rest of the RAM to the OS (mostly the filesystem cache). See whether
increasing the heap helps your workload or not, perhaps in 4G or 8G steps
(16G, 24G, 32G, ...); you have to find the size where performance is best.
Also, the ES segment merging configuration should be adjusted to handle
larger merges (the default maximum merged segment size is 5G).
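As a sketch, reusing the variables from your mail (the index name is just an
example, and whether the merge setting can be changed at runtime via
_settings or has to go into elasticsearch.yml depends on the ES version):

  # smaller heap to start with, G1 collector on Java 7
  export ES_MIN_MEM=8g
  export ES_MAX_MEM=8g
  export ES_JAVA_OPTS="$ES_JAVA_OPTS -XX:+UseG1GC"

  # raise the maximum merged segment size above the 5g default
  curl -XPUT 'localhost:9200/logstash-2013.04.26/_settings' -d '{
    "index.merge.policy.max_merged_segment" : "15g"
  }'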
By turning these knobs (and maybe more) you have to build up your own
intuition for balancing the system, so that the incoming data stream keeps
getting indexed promptly even after ES has been running for a long time.
Each system behaves differently: do not trust other people's numbers, take
your own measurements and run your own tests.
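On the ES side, the index stats API gives you the numbers to watch over
time (a rough sketch; pipe it into whatever tooling you like):

  # indexing totals and times; sample repeatedly to derive docs/s
  curl -s 'localhost:9200/_stats?pretty' | grep -A 3 '"indexing"'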
Of course, you should also follow the advice of the Logstash community on
best practices for organizing the indexing (rotating indexes, etc.).
Jörg
On 26.04.13 17:00, Ryan Qian wrote:
With Logstash and Elasticsearch we want to sustain writes of 20k msg/s,
but the write performance is not as expected. The story is as below:
*HW:*
9-node cluster, each one with:
2 CPU sockets, 32 threads
128G RAM
1T disks (RAID 1)
SW:
RHEL6.3
logstash 1.1.10
redis as channel
ES 0.20.5
ES memory limit set to 65G:
export ES_MIN_MEM=65g
export ES_MAX_MEM=65g
*Logstash index settings:*
shards: 9
replication: 1
_all and _source are disabled
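(For reference, these settings would look roughly like the following as an
index template; the template name and pattern are assumptions:)

  curl -XPUT 'localhost:9200/_template/logstash' -d '{
    "template" : "logstash-*",
    "settings" : {
      "number_of_shards" : 9,
      "number_of_replicas" : 1
    },
    "mappings" : {
      "_default_" : {
        "_all"    : { "enabled" : false },
        "_source" : { "enabled" : false }
      }
    }
  }'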
*Redis runs on 7 of the 9 nodes.*
Each box has one Logstash instance reading from its own box's Redis
channel and writing to localhost via the elasticsearch_http output (the
elasticsearch output was also tried).
I adjusted the redis input's
batch_count => 2000
threads => 5
and the output's
flush_size => 10000
to make it fast.
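(Roughly, the indexer config as a whole; host, key and data_type are
assumptions, the tuning values are the ones above:)

  input {
    redis {
      host        => "127.0.0.1"
      data_type   => "list"        # or "channel", depending on the setup
      key         => "logstash"    # placeholder key name
      batch_count => 2000
      threads     => 5
    }
  }
  output {
    elasticsearch_http {
      host       => "127.0.0.1"
      flush_size => 10000
    }
  }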
It can handle the incoming message rush at first, but once the index grows
big (one index per day) and I use Kibana to browse the logs, e.g. choosing
a long time window, the ES write performance drops immediately.
Messages accumulate in the Redis channel, and Logstash can never drain
Redis again.
Any suggestions, guys?
I don't believe ES is incapable of handling this data volume, but I don't
know what the next step is to keep the write rate high.
Thanks!
-Ryan