Percolator TRACE logging


(Anishek) #1

Hello,

We are using ES 1.7.3 with heavy percolation. We are seeing uneven load distribution even with all shards present on all nodes in the cluster, so I want to enable TRACE logs for the _percolate call. I tried enabling TRACE logging directly at the root level in the logging.yml file to see everything, but unfortunately, even under very controlled conditions where we send a single request to a single machine using the transport client, I still cannot see any logs showing where the request came from, what the request was, etc.
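For reference, this is roughly what we put in logging.yml; the transport and http logger names are standard, but the percolator logger name is a guess on my part:

```yaml
# logging.yml (ES 1.x style) -- logger names below the first two are assumptions
es.logger.level: INFO
logger:
  # cluster-internal node-to-node traffic
  transport: TRACE
  # HTTP layer (logs things like the source IP of a request)
  http: TRACE
  # percolator internals (hypothetical logger name)
  index.percolator: TRACE
```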

When using the curl command for the same request, the "http" module does log on the server which IP sent the request, but nothing appears from the "transport" module. Additionally, I am not sure whether the ES node internally queries other nodes to get the results; it should not, since all shards are on the same node. I also tried preference=local in the query, but that did not help either.


(Christian Dahlqvist) #2

Local preference should help, but I believe the correct syntax is preference=_local.
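To illustrate where the parameter goes: it is just a query-string parameter on the _percolate call. This tiny helper is hypothetical (not part of any ES client), and if I understand correctly, an unrecognized value like plain "local" is treated as a custom preference string rather than as the "execute locally" hint:

```python
from urllib.parse import urlencode

def percolate_url(host, index, doc_type, preference="_local"):
    """Build a _percolate URL; note the leading underscore in "_local"."""
    params = urlencode({"preference": preference})
    return "http://%s/%s/%s/_percolate?%s" % (host, index, doc_type, params)

print(percolate_url("localhost:9200", "myindex", "mytype"))
# → http://localhost:9200/myindex/mytype/_percolate?preference=_local
```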


(Anishek) #3

Yes, sorry, it is preference=_local. Using the Java client to set it via Preference.LOCAL.type() does not seem to help either. If anyone knows how to log the internal query mechanism, that would be great.

thanks


(Anishek) #4

Just thought of updating the thread with what we found and the things we did to make sure we were not missing any specific configuration.

-- First, we made sure from the client side that queries were being distributed equally over all 9 nodes we had; we enabled transport DEBUG logging on the client side when we saw the abnormal load distribution on the nodes.
Result: even distribution.

-- Next, we enabled TRACE http/transport logging on the server nodes to check whether the nodes with higher load were receiving an uneven share of queries. Even though we had set preference=_local, we wanted to make sure the ES coordinating nodes were not redistributing requests internally.
Result: even distribution.

-- We then tried to map the byte distribution of the docs we send to ES, on the assumption that larger docs (hence more network traffic, hence more work for ES) were causing the uneven load. We added logging to our client-side code to record the size of every query doc; combined with the logs from the first step, this let us roughly see the doc-size distribution across nodes.
Result: almost the same number of bytes sent overall to each node.
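The per-node tally from the steps above can be sketched like this; the record format is made up for illustration (in reality it came from the transport DEBUG logs plus our own doc-size log):

```python
from collections import defaultdict

# hypothetical parsed log records: (node, doc size in bytes)
records = [
    ("node-1", 2048), ("node-2", 1024), ("node-1", 512),
    ("node-3", 4096), ("node-2", 3072), ("node-3", 256),
]

counts = defaultdict(int)       # requests per node
byte_totals = defaultdict(int)  # bytes per node

for node, size in records:
    counts[node] += 1
    byte_totals[node] += size

for node in sorted(counts):
    print(node, counts[node], byte_totals[node])
```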

-- Finally, looking at the hardware configuration, we found the following:

  1. machines where load was high: 40-core machines at 3.0 GHz (avg load of 30)
  2. machines where load was low or very low: 48-core machines at 2.6 GHz (avg load of 2)

We are not sure why there is such a huge load difference. We also switched off 8 cores in the Linux kernel on the machines in group 2, but the problem remained.
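A quick back-of-the-envelope with the figures above shows why this is puzzling: the aggregate clock of the two machine types is nearly the same, yet the per-core load differs by more than an order of magnitude (this is arithmetic only, not an explanation):

```python
# group 1 = high load, group 2 = low load (figures from points 1 and 2 above)
cores_1, ghz_1, load_1 = 40, 3.0, 30
cores_2, ghz_2, load_2 = 48, 2.6, 2

agg_1 = cores_1 * ghz_1        # 120.0 aggregate GHz
agg_2 = cores_2 * ghz_2        # 124.8 aggregate GHz

per_core_1 = load_1 / cores_1  # 0.75 load per core
per_core_2 = load_2 / cores_2  # ~0.042 load per core

print(agg_1, agg_2, per_core_1, round(per_core_2, 3))
```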

Anyway, we have now moved all machines to the configuration in 1 above, and voila, we see a nice, even distribution of OS load across all of them.
