View/debug the queries made on ElasticSearch backend

(Paul Bormans) #1

We have a test ES cluster on AWS (the managed service) and would now like to run a Spark job using filtered data from the cluster. I expect DataFrame filtering to be done on the backend (the so-called pushdown), and I was looking for ways to review the actual queries that are run.

I found one way, which requires a configuration file: setting the httpclient.wire.content category to DEBUG. It feels like the wrong approach, though, since it yields much more logging than I need and I don't always see the POST content.

So my question is: what is the advised way to debug the queries made on the backend?

Thanks for any tips,
Paul Bormans

(Costin Leau) #2

Have you looked at the reference documentation, in particular at the logging page?

(Paul Bormans) #3

Hi Costin,

Actually I did, but I could not find anything specific for my purpose. I also tested with the root level set to DEBUG.

The only useful logging category (to see the raw queries...) for me at this point is the one from the httpclient:

2016-05-10 11:13:14,802 @Executor task launch worker-1 DEBUG Opening (pinned) network client to
2016-05-10 11:13:14,805 @Executor task launch worker-0 DEBUG Opening (pinned) network client to
2016-05-10 11:13:15,123 @Executor task launch worker-0 DEBUG httpclient.wire.content >> "{"query":{"filtered":{ "query":{"match_all":{}},"filter": { "and" : [ {"query":{"match":{"value":41.4068}}} ] } }}}"
2016-05-10 11:13:15,124 @Executor task launch worker-1 DEBUG httpclient.wire.content >> "{"query":{"filtered":{ "query":{"match_all":{}},"filter": { "and" : [ {"query":{"match":{"value":41.4068}}} ] } }}}"
2016-05-10 11:13:15,350 @Executor task launch worker-1 DEBUG httpclient.wire.content << "{"_scroll_id":"....=","took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":....etc

It would be nice to see the actual request/response messages (including the body, that is) in a separate category.

I kind of expected there would be an Elasticsearch connector-like layer in the software stack with its own logging category, but apparently there is none.

The httpclient will do for me now, but maybe it's useful to include something similar into category for instance.
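
As a stopgap for the noise in the wire log, the outgoing request bodies can be pulled out of it after the fact. The following is only a minimal sketch: it assumes the log lines look exactly like the ones pasted above (body wrapped in plain double quotes on a single line), and the function name `extract_request_bodies` is made up for illustration.

```python
import json
import re

# Matches outgoing-body lines in the format shown in the log above:
#   <timestamp> @<thread> DEBUG httpclient.wire.content >> "<body>"
# Assumption: the whole body sits on one line between the outer quotes.
WIRE_OUT = re.compile(r'httpclient\.wire\.content >> "(.*)"\s*$')


def extract_request_bodies(lines):
    """Yield the parsed JSON body of every outgoing request found in lines."""
    for line in lines:
        match = WIRE_OUT.search(line)
        if match is None:
            continue  # not an outgoing wire-content line
        try:
            yield json.loads(match.group(1))
        except json.JSONDecodeError:
            # Truncated, multi-line, or non-JSON bodies are skipped.
            continue


# Sample lines copied from the log excerpt above.
sample_log = [
    '2016-05-10 11:13:14,802 @Executor task launch worker-1 DEBUG Opening (pinned) network client to',
    '2016-05-10 11:13:15,123 @Executor task launch worker-0 DEBUG httpclient.wire.content >> "{"query":{"filtered":{ "query":{"match_all":{}},"filter": { "and" : [ {"query":{"match":{"value":41.4068}}} ] } }}}"',
    '2016-05-10 11:13:15,350 @Executor task launch worker-1 DEBUG httpclient.wire.content << "{"_scroll_id":"....=","took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":....etc',
]

queries = list(extract_request_bodies(sample_log))
for query in queries:
    print(json.dumps(query, sort_keys=True))
```

Only the `>>` (request) line is kept; the response line and the connection messages are dropped. Bodies that the logger splits over several lines would need extra handling that this sketch does not attempt.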


(Costin Leau) #4

is the category that you need (potentially to restrict it just to transport).

There's no real separation between queries (as in a session) and transport, since the queries are for the most part just HTTP calls. Spark is special due to pushdown, but if that is disabled no query is generated, just typical HTTP calls.

(Costin Leau) #5

By the way, in terms of filtering itself, take a look at the org.elasticsearch.spark.sql package, which indicates which Spark filters are being translated and to what.

(Paul Bormans) #6

Speaking of which... (I do realize it is a bit off topic) is the "pushdown" functionality something specific to ES? Or does AWS DynamoDB also offer such performance enhancements?

(Costin Leau) #7

Not sure what you are asking. Pushdown means some of the operations executed by Spark SQL are executed directly by ES, which results in faster execution. This depends on a variety of factors and is closely tied to the implementation (Spark SQL and ES in this case).
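
To make the idea concrete, here is a rough sketch of what "translating a filter" amounts to. It is not the connector's code: `equality_filter_to_es_query` is a hypothetical helper, and the query shape is simply copied from the wire log earlier in this thread, so it reflects the query DSL of that ES version only.

```python
import json


def equality_filter_to_es_query(field, value):
    """Build a query body for `field == value`, shaped like the one in the wire log.

    Hypothetical helper for illustration only; the real translation lives in
    the connector and covers many more filter types.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": [{"query": {"match": {field: value}}}]},
            }
        }
    }


# With pushdown, a DataFrame filter such as value == 41.4068 is sent to ES as a
# query, so only matching documents travel back to Spark.
pushed_down = equality_filter_to_es_query("value", 41.4068)
print(json.dumps(pushed_down))
```

Without pushdown, the same filter would be evaluated by Spark after all documents had been fetched, which is where the difference in execution time comes from.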
