What do these ES logs mean

(Sukrit Dasgupta) #1

Hi List, I am seeing the following logs pop up quite regularly (many times a minute, to many times in 10 minutes). What do they mean:

[2016-01-06 13:51:31,733][INFO ][monitor.jvm              ] [srv-4] [gc][old][130378][29292] duration [5.3s], collections [1]/[5.8s], total [5.3s]/[15.5h], memory [988.7mb]->[789.7mb]/[990.7mb], all_pools {[young] [266.2mb]->[98.5mb]/[266.2mb]}{[survivor] [31.2mb]->[0b]/[33.2mb]}{[old] [691.2mb]->[691.2mb]/[691.2mb]}


[2016-01-06 02:13:49,036][DEBUG][action.bulk              ] [srv-7] [logstash-logs-dgm-2016.01.06][3] failed to execute bulk item (index) index {[logstash-logs-dgm-2016.01.06][lumberjack][AVIVyDQbplhXDCCKMIrJ], source[{"message":"<blanked out intentionally>"}]}
ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [lumberjack]) within 30s]

How could I troubleshoot them?

Thanks in advance.

(Boaz Leskes) #2

The first log means that the JVM garbage collection took longer then expected (5.8s). That means your node is under memory pressure. It seems you still run with the 1GB default. Probably you need to increase it.

The second log means that the master failed to process a change to the mapping (ES's schema) within 30s. This is probably also due to shortage of memory or maybe the current master is overloaded by indexing.

(Sukrit Dasgupta) #3

Thanks @bleskes

  1. I am running with
    bin/elasticsearch --Xmx 4g --Xms 4g
    Also, this is the only server which is showing these messages. Others that are similar are not. Perhaps there is a rogue process running here from another app that I forgot to kill. Will investigate.

  2. My master nodes only have 8G of RAM because I thought their only job is to route and maintain cluster state so they can be modestly resourced. Can I figure out whether its shortage of memory or overloaded with indexing?
    Also, if it failed to process within 30s.. does it mean it did get processed but took more time? Or did it not get done at all?


Well, I am having the same issue here.
It happends occationally.

[0] failed to execute bulk item (index) index
ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [as]) within 30s]
	at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:343)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

I don't think this is memory issue for me because I just created a brand new ES cluster with 10g heap size.
I am migrating my data from MySQL to ES, so there could be some index overloads(?).
If so, it seems like ES cluster handles low TPS than I expected.

Q. How could I know if my cluster(master) is overloaded, so I need to do something?

Also, I checked with the failed put mapping with curl -XGET 'http://localhost:9200/twitter/_mapping/tweet', and I am getting the right mapping.

Q. Does this mean put mapping gets processed even though it failed to process within 30s?

(system) #5