We have a test ES cluster on AWS (the managed service) and would now like to run a Spark job using filtered data from the cluster. I expect the DataFrame filtering to be done on the backend (the so-called pushdown), and I was looking for ways to review the actual queries that are run.
I found a way that requires a log4j.properties file, setting the httpclient.wire.content category to DEBUG, but it feels like the wrong approach since it yields much more logging than I need, and I also don't always see the POST content.
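For reference, this is roughly the line I added (the category comes from the commons-httpclient wire logging; the level is just what I tried):

```
log4j.category.httpclient.wire.content=DEBUG
```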
So my question is: what is the advised way to debug the queries sent to the backend?
org.elasticsearch.hadoop.rest is the category that you need (potentially log4j.category.org.elasticsearch.hadoop.rest.commonshttp to restrict it just to transport).
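For example, in the log4j.properties that your Spark driver/executors pick up, something along these lines (the TRACE level is my assumption; DEBUG also works, just less verbose):

```
# log the queries and HTTP calls made by elasticsearch-hadoop
log4j.category.org.elasticsearch.hadoop.rest=TRACE
# or restrict it to the transport layer only
# log4j.category.org.elasticsearch.hadoop.rest.commonshttp=TRACE
```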
There's no real separation between queries (as in a session) and transport, since the queries are for the most part just part of the HTTP calls. Spark is special due to pushdown, but if that is disabled there is no query generated, just typical HTTP calls.
By the way, in terms of the filtering itself, take a look at the org.elasticsearch.spark.sql package, whose logging indicates which Spark filters are being translated and into what.
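Continuing the same log4j.properties sketch, the category name is simply the package above (the level is again my assumption):

```
# show which Spark SQL filters get translated into ES query DSL
log4j.category.org.elasticsearch.spark.sql=TRACE
```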
Speaking of which (I do realize it is a bit off-topic): is the "pushdown" functionality something specific to ES, or does AWS DynamoDB also offer such performance enhancements?
Not sure what you are asking. Pushdown means that some of the operations executed by Spark SQL are executed directly by ES, resulting in faster execution. This depends on a variety of factors and is highly tied to the implementation (Spark SQL and ES in this case).
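As a rough Scala sketch of what this looks like from the Spark side (the endpoint, index name, and field are made up; pushdown is enabled by default in the elasticsearch-spark connector):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("es-pushdown-demo")
  // hypothetical AWS endpoint; adjust es.nodes/es.port to your cluster
  .config("es.nodes", "my-domain.eu-west-1.es.amazonaws.com")
  .config("es.port", "443")
  .config("es.nodes.wan.only", "true")
  .getOrCreate()

import spark.implicits._

// load an index through the elasticsearch-spark connector
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("pushdown", "true")   // the default, shown here for clarity
  .load("my-index")             // hypothetical index name

// this filter is the kind of operation the connector should translate
// into an ES query instead of filtering on the Spark side; the
// org.elasticsearch.hadoop.rest logging shows the resulting call
df.filter($"status" === "active").show()
```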