Does Elasticsearch support URL-encoded endpoints, or did I find a bug?

Update 3:

It turns out that this is not a bug but a misunderstanding on my part of how URL encoding works.

The library I was using was encoding the full query string, including the "?" and "=". However, it seems that this does NOT produce a valid URL.

After some reading, it seems that you should only encode reserved characters when they are outside of their special-meaning scope.

I.e., since "?" and "=" have a special meaning in the query string, encoding them makes the URL invalid (or, more precisely, they will NOT be interpreted as query-string delimiters). In other words, a browser won't decode them, so Elasticsearch should not be expected to decode them either.

I don't have an authoritative answer yet, but this is the best I could gather from reading several different sources, for example, Wikipedia's page on Percent-encoding:

Current standard

The generic URI syntax recommends that new URI schemes that provide for the representation of character data in a URI should, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This suggestion was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected.

Also, from this page:

There is a grammar which defines how URLs are assembled, and how parts are separated. For instance, the "://" part separates the scheme from the host part. The host and path fragments parts are separated by "/", while the query part follows a "?". This means that certain characters are reserved for the syntax. Some are reserved for all URIs, while some are only reserved for specific schemes. All reserved characters that are used in a part where they are not allowed (for instance a path segment — a file name for example — which would contain a "?" character) must be URL-encoded .

So I was encoding these characters, which made them lose their special syntactic meaning, and that is not what we want. Instead, we want those characters to keep their special meaning (i.e. mark the start of the query string and separate a parameter name from its value), while any other characters that could be mistaken for reserved characters, but are meant as plain data, should be encoded.
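
To make the difference concrete, here is a minimal Python sketch (an illustration only, not the Java code from the application in question). It builds the same request URL twice: once encoding only the parameter name and value, and once percent-encoding the whole "?name=value" suffix, which is roughly what the library seemed to be doing. The variable names are made up for the example, and the comma is kept literal only to match the URL used in the original post.

```python
from urllib.parse import quote, urlencode

base = "http://localhost:9200/_search"
params = {"filter_path": "hits.hits_id,hits.hits._source"}

# Intended form: "?" and "=" stay literal so they keep their role as
# delimiters; only the parameter name and value are passed through the encoder.
good_url = base + "?" + urlencode(params, safe=",")

# Problematic form: the whole suffix is percent-encoded, so "?" becomes %3F
# and "=" becomes %3D, and the URL no longer has a query string at all.
bad_url = base + quote("?filter_path=hits.hits_id,hits.hits._source", safe="")

print(good_url)
print(bad_url)
```

The first URL is the same one that worked with HTTPie. The second has no literal "?" left in it, so everything after the host is path.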


Update 2:

So it seems I made the mistake of jumping to conclusions instead of doing more research... my bad.

It turns out that the query I was sending is NOT valid. I was encoding "?" and "=" chars when they shouldn't be, so ES was behaving as expected.
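
A quick way to see why, sketched in Python with the standard library's URL parser (this only illustrates how a generic parser splits the two URLs from the original post; it is not how Elasticsearch itself is implemented):

```python
from urllib.parse import urlsplit

plain = urlsplit(
    "http://localhost:9200/_search?filter_path=hits.hits_id,hits.hits._source"
)
encoded = urlsplit(
    "http://localhost:9200/_search%3Ffilter_path=hits.hits_id,hits.hits._source"
)

# With a literal "?", the URL splits into a path and a query string.
print(repr(plain.path), repr(plain.query))

# With "%3F", there is no query string: the whole thing is one path segment.
print(repr(encoded.path), repr(encoded.query))
```

For the encoded URL the query component is empty and the path is the entire "/_search%3Ffilter_path=..." string, which matches the uri shown in the 405 error message. Presumably ES then tries to match that string against its path-based routes rather than "_search", which would explain why POST is rejected.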

I knew I could be missing something obvious, but I guess this was so obvious that I overlooked the actual problem.


Update:

This legitimately feels like a bug to me.

I filed an issue on GitHub.

I'll leave the post up in case it's not a bug and I'm just missing something obvious.


Hello everyone!

I'm a new user of Elasticsearch and I'm not sure if I found a bug or if I'm missing a configuration or something else altogether.

With an ES 6.3.2 installation, if I query the "_search" endpoint with a filter path like in the following example, I get the expected results:

http POST 'http://localhost:9200/_search?filter_path=hits.hits_id,hits.hits._source' < /home/akram/search.json 
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 218
content-type: application/json; charset=UTF-8

{
    "hits": {
        "hits": [
            {
                "_source": {
                    "timestamp": "2018-08-01T00:00:55Z", 
                    "accountId": "somedata"
                }
            }
        ]
    }
}

However, if I query the same endpoint with a URL-encoded request, I get a "405 - Method Not Allowed" error.

I.e.

http POST 'http://localhost:9200/_search%3Ffilter_path=hits.hits_id,hits.hits._source' < /home/akram/search.json 
HTTP/1.1 405 Method Not Allowed
Allow: DELETE,PUT,GET,HEAD
content-encoding: gzip
content-length: 166
content-type: application/json; charset=UTF-8

{
    "error": "Incorrect HTTP method for uri [/_search%3Ffilter_path=hits.hits_id,hits.hits._source] and method [POST], allowed: [DELETE, PUT, GET, HEAD]", 
    "status": 405
}

As you can see, I'm using HTTPie in the examples for simplicity, but I found the error when querying my ES instance from a Java application that happens to use Apache HttpClient's RequestBuilder, which apparently URL-encodes the request by default.

So from my point of view this looks like a bug, because I would expect any HTTP-compatible API to support URL-encoded endpoints. But maybe I'm missing some configuration?

Can anyone help me shed some light on this issue, or tell me whether I should file a bug instead?

Thanks!
Akram
