I am using Elasticsearch 0.19.11 on six AWS EC2 m2.4xlarge instances,
with 35 GB allocated to ES and bootstrap.mlockall set to true.
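For reference, the memory settings are applied roughly like this (a sketch;
ES_HEAP_SIZE is the variable read by the standard startup scripts):

  # environment for the startup script
  ES_HEAP_SIZE=35g

  # elasticsearch.yml
  bootstrap.mlockall: true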
Now, I have implemented a Kafka river to pull data from a Kafka cluster.
Depending on Elasticsearch indexing performance, there can be an index
delay between the current timestamp and the timestamp of the pulled data.
This measure is very critical for us because we use Elasticsearch for
real-time log analytics. I can see the index delay increasing significantly
at some point, but I don't know how to trace the problem.
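This is roughly how I measure the delay — a minimal sketch (the index name,
the @timestamp field, and epoch-millisecond timestamps are specific to my
documents; the sort-by-timestamp query itself is standard):

import json
import time
import urllib.request

ES = "http://localhost:9200"   # assumption: querying a local node
INDEX = "logs"                 # hypothetical index name
TS_FIELD = "@timestamp"        # hypothetical timestamp field

def latest_indexed_ts():
    # Fetch the single most recent document, sorted by its timestamp field.
    body = json.dumps({
        "size": 1,
        "query": {"match_all": {}},
        "sort": [{TS_FIELD: {"order": "desc"}}],
    }).encode()
    req = urllib.request.Request(
        f"{ES}/{INDEX}/_search", data=body,
        headers={"Content-Type": "application/json"})
    hits = json.load(urllib.request.urlopen(req))["hits"]["hits"]
    return hits[0]["_source"][TS_FIELD] if hits else None

while True:
    ts = latest_indexed_ts()   # epoch milliseconds in my documents
    if ts is not None:
        delay_s = time.time() - ts / 1000.0
        print(f"index delay: {delay_s:.1f}s")
    time.sleep(10)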
I figure this is caused by merging, but I am suspicious about the disk I/O
behavior: the Elasticsearch data directory is on md0, yet the index delay
looks synced with the I/O on sda1 (mounted at root, /), not md0.
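So far the only way I have to compare the two devices is sampling
/proc/diskstats — a small sketch that prints sectors written per interval
for both devices (assumes a reasonably recent kernel, where partitions
expose the full stat field set):

import time

DEVICES = ("sda1", "md0")

def sectors_written():
    # /proc/diskstats: field 2 (0-based) is the device name,
    # field 9 is the cumulative count of sectors written since boot.
    out = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in DEVICES:
                out[fields[2]] = int(fields[9])
    return out

prev = sectors_written()
while True:
    time.sleep(5)
    cur = sectors_written()
    print(" ".join(f"{d}: +{cur[d] - prev[d]} sectors"
                   for d in sorted(cur)))
    prev = cur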
Can you guess what is writing to sda1? Also, how can I trace this slowdown?
I want to see indexing actions such as when a merge, flush, or refresh
starts, but I cannot see any INFO messages with the following logging.yml
setting:
rootLogger: INFO, file

logger:
  # log action execution errors for easier debugging
  # gateway
  index.gateway: INFO
  # peer shard recovery
  indices.recovery: INFO
  # discovery
  discovery: INFO

  index.search.slowlog: INFO, index_search_slow_log_file

additivity:
  index.search.slowlog: false
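Since the log stays silent, I am thinking of polling the indices stats API
instead and watching the merge/refresh/flush counters — a rough sketch
(I am assuming _stats exposes these sections and key names in 0.19.x the
way later versions do):

import json
import time
import urllib.request

ES = "http://localhost:9200"  # assumption: polling a local node

def totals():
    # Cluster-wide merge/refresh/flush counters from the indices
    # stats API (sections assumed available in this version).
    url = f"{ES}/_stats?merge=true&refresh=true&flush=true"
    stats = json.load(urllib.request.urlopen(url))["_all"]["total"]
    return {
        "merges": stats["merges"]["total"],
        "merge_ms": stats["merges"]["total_time_in_millis"],
        "refreshes": stats["refresh"]["total"],
        "flushes": stats["flush"]["total"],
    }

prev = totals()
while True:
    time.sleep(10)
    cur = totals()
    print(" ".join(f"{k}: +{cur[k] - prev[k]}" for k in cur))
    prev = cur

If the merge time delta spikes whenever the index delay grows, that would
at least confirm merging as the trigger.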
Thank you!
Best, Jae