Elasticsearch not returning larger results (not total result size/length, but size of individual hit)

Hello. I have an Elasticsearch instance storing objects that contain really large strings. I am trying to make a request to retrieve those objects from Elasticsearch, but for whatever reason, Elasticsearch will not return them.

Example object in Elasticsearch:

{
   "time": "2022-11-17T15:36:34.000Z",
   "status": "ok",
   "data": "00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00"
}

The data property is a really long string, much longer than in the example; it can be as long as 5000 characters.

I attempt to retrieve those objects using a curl request to elasticsearch, like the example below:

curl "http://es-instance:9200/myIndexPattern_*/_search?size=10000" -H "Content-Type: application/json" -d '{"size":10000,"query":{"bool":{"must":[{"range":{"time":{"from":"2022-11-17T14:35:32.000Z","to":"2022-11-17T15:35:39.000Z"}}}]}}}'

This results in 0 hits, but I know there is data available in that time range from looking through the Kibana UI.

The actual response from the curl request is as follows:


Upon further investigation, I found that if the 'data' property in my stored object was smaller, Elasticsearch had no problem returning those objects from my request.

My question is: is there some maximum size that an object in Elasticsearch can have for it to be returned when requested? Or is there some setting I can add to my request to retrieve these larger objects?

To be clear, this is not a question about the maximum number of returned objects, but about the maximum size of a single returned object.

Thank you.

The default keyword mapping ignores strings longer than 256 characters (the ignore_above setting), silently dropping those values from the list of indexed terms.

Note that if you do raise the Elasticsearch limit, you cannot exceed the hard Lucene limit of 32k (32,766 bytes) for a single token.
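For illustration, raising the limit might look roughly like the sketch below. It assumes an explicit mapping in which `data` is a `keyword` field; the helper names are hypothetical and do not come from this thread. Because `ignore_above` counts characters while the Lucene limit counts bytes, 8191 (32766 / 4) is a safe ceiling when values may contain multi-byte UTF-8 characters.

```python
import json

# Lucene's hard limit for a single term, in bytes.
LUCENE_MAX_TERM_BYTES = 32766

# ignore_above counts characters and a UTF-8 character takes at most 4 bytes,
# so this many characters always fits within the Lucene limit.
SAFE_IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES // 4


def keyword_mapping(field: str, ignore_above: int = SAFE_IGNORE_ABOVE) -> dict:
    """Build an index-creation body mapping `field` as a keyword field
    with a raised ignore_above. The field name is an assumption."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "keyword", "ignore_above": ignore_above}
            }
        }
    }


def fits_in_one_term(value: str) -> bool:
    """True if the UTF-8 encoding of `value` is within Lucene's term limit."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES


if __name__ == "__main__":
    # This body would be sent when creating the index (PUT /<index-name>).
    print(json.dumps(keyword_mapping("data"), indent=2))
    # A string of roughly 5000 ASCII characters, like the one in the question.
    print(fits_in_one_term("00-" * 1666))
```

The mapping only affects whether a value is indexed as a searchable term; it has no bearing on whether the document's `_source` is returned.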

If you’re searching for exact large values (as opposed to parts of large values) then it may be more efficient to index hashes of the content and deal with the rare issue of false positives in your client.
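A minimal sketch of the hashing idea, assuming SHA-256 and a hypothetical `data_hash` keyword field stored next to `data` (neither choice comes from this thread): hash the value at index time, search on the hash with a term query, then compare the full value client-side to weed out any false positives.

```python
import hashlib


def content_hash(value: str) -> str:
    """Hex SHA-256 digest of the value; 64 characters regardless of input size."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def with_hash(doc: dict, source_field: str = "data",
              hash_field: str = "data_hash") -> dict:
    """Return a copy of the document with the hash field added before indexing."""
    enriched = dict(doc)
    enriched[hash_field] = content_hash(doc[source_field])
    return enriched


def exact_match_query(value: str, hash_field: str = "data_hash") -> dict:
    """Search body that looks up the hash instead of the long value itself."""
    return {"query": {"term": {hash_field: content_hash(value)}}}


def confirm_hits(hits: list, value: str, source_field: str = "data") -> list:
    """Client-side check: keep only hits whose stored value really matches."""
    return [h for h in hits if h["_source"].get(source_field) == value]


if __name__ == "__main__":
    doc = with_hash({"status": "ok", "data": "00-00-00-00"})
    print(doc["data_hash"])
    print(exact_match_query("00-00-00-00"))
```

The digest is short and fixed-length, so it stays well under both the 256-character default and the Lucene term limit.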

I don't think I understand your response. Isn't the keyword mapping/ignore_above value used for indexing values? I don't have a problem getting my objects indexed into Elasticsearch. I can see them in ES using Kibana. My problem is retrieving those already indexed values from Elasticsearch from outside the ELK stack.

My bad. I’d assumed you were querying using that long data value - that would have been the only explanation I’d have expected for a field’s size dictating if something is returned or not.

The results have the ‘failed’ and ‘skipped’ properties == 0 so all shards have been queried successfully. Can you share the mappings, missing docs and example query?
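The shard check described above could be scripted roughly as follows. This is only a sketch: the helper name is hypothetical, and the sample response is made up for illustration, since the actual response is not reproduced in this thread.

```python
def shard_report(response: dict) -> dict:
    """Summarise the _shards section and hit count of a search response."""
    shards = response.get("_shards", {})
    total_hits = response.get("hits", {}).get("total", 0)
    # hits.total is an object ({"value": ..., "relation": ...}) in
    # Elasticsearch 7+ and a plain integer in older versions.
    if isinstance(total_hits, dict):
        total_hits = total_hits.get("value", 0)
    failed = shards.get("failed", 0)
    skipped = shards.get("skipped", 0)
    return {
        "failed": failed,
        "skipped": skipped,
        "fully_searched": failed == 0 and skipped == 0,
        "total_hits": total_hits,
    }


if __name__ == "__main__":
    # Hypothetical response, not the one from the question.
    sample = {
        "took": 5,
        "timed_out": False,
        "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
        "hits": {"total": {"value": 0, "relation": "eq"}, "hits": []},
    }
    print(shard_report(sample))
```

If every shard was searched and the hit count is still 0, the query itself (for example its time range) is the next thing to examine.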

After looking into it a bit more, it turns out this was all just a big dumb mistake on my part. There doesn't appear to be anything stopping me from querying large objects, and I don't seem to be having this problem anymore. I believe the earlier problem was, in short, a timing issue: I was expecting the data I was querying to be in a certain time range, but because of something stupid I was doing in my code, it was actually being archived in a different time range.

Thank you @Mark_Harwood1 for your quick replies in trying to help me solve my problem.
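A timing mix-up like the one described above can be spotted with a quick check along these lines. This is a sketch with hypothetical helper names: one function tests locally whether a document's timestamp falls inside the queried window, and the other builds a search body that drops the range filter and returns the newest documents, so the timestamps that were actually stored become visible.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2022-11-17T15:36:34.000Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def in_window(ts: str, start: str, end: str) -> bool:
    """True if ts lies within [start, end], both bounds inclusive."""
    return parse_ts(start) <= parse_ts(ts) <= parse_ts(end)


def latest_docs_query(time_field: str = "time", size: int = 5) -> dict:
    """Search body without a time filter, newest documents first."""
    return {
        "size": size,
        "sort": [{time_field: {"order": "desc"}}],
        "_source": [time_field],
        "query": {"match_all": {}},
    }


if __name__ == "__main__":
    # Timestamp of the example object versus the window used in the curl request.
    print(in_window("2022-11-17T15:36:34.000Z",
                    "2022-11-17T14:35:32.000Z",
                    "2022-11-17T15:35:39.000Z"))  # prints False
```

As it happens, the example object's timestamp (15:36:34) lies just after the end of the queried window (15:35:39), so that particular document would not match the range regardless of the size of its data field.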

Some bugs are just waiting for a bigger audience :grinning:
Glad to hear you sorted it.
