Error: entity content is too long [105072697] for the configured buffer limit [104857600]

When I query data from ES 8.2.3 with a big size in one request (the index has more than 10K docs), I get the error: Error: entity content is too long [105072697] for the configured buffer limit [104857600].

After searching the docs, it seems there is a hardcoded 100mb limit on response content. Is there a way to work around or enlarge this limit?

Have you tried increasing http.max_content_length in your elasticsearch.yml? The docs say it defaults to 100mb but can be increased. Networking | Elasticsearch Guide [8.2] | Elastic
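If you do want to try it, the setting lives in elasticsearch.yml on each node and takes byte-size units; a minimal sketch (200mb is just an illustrative value, and a node restart is required):

```yaml
# elasticsearch.yml
# Maximum size of an HTTP request body accepted by the node (default 100mb).
http.max_content_length: 200mb
```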

Welcome to the forum

What client are you using? There is often a client-side HTTP response limit, and that is probably what you are seeing. You could validate by sending the exact same query with curl, which should just stream the response rather than buffering it. If that works, it would strongly suggest a client limitation.

Note that in many cases such large (100MB+) responses are not ideal; think resource usage, among other reasons. There are alternatives using the scroll API, or search_after, or ... You don't tell us enough to advise on this, but it is something to think about.
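To illustrate the search_after idea: each request asks for a modest page and passes the sort value of the last hit back in, so no single response has to carry the whole result set. A minimal sketch, with a plain in-memory list standing in for the index (no real Elasticsearch calls):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class SearchAfterSketch {
    // Stand-in for one search request: return up to `size` docs whose sort
    // value comes strictly after `searchAfter` (null means start from the top).
    static List<Long> search(List<Long> index, Long searchAfter, int size) {
        return index.stream()
                .filter(ts -> searchAfter == null || ts > searchAfter)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Pretend index: 25 docs sorted by a monotonically increasing key.
        List<Long> index = LongStream.range(0, 25).boxed().collect(Collectors.toList());
        List<Long> all = new ArrayList<>();
        Long after = null;
        while (true) {
            List<Long> page = search(index, after, 10); // small "size" per request
            if (page.isEmpty()) break;
            all.addAll(page);
            after = page.get(page.size() - 1);          // sort value of the last hit
        }
        System.out.println(all.size()); // 25: full result set, fetched in 3 pages
    }
}
```

Against a real cluster the same loop would send search requests with a deterministic "sort" plus a "search_after": [...] clause carrying the previous page's last sort values.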

The docs say (my emphasis)

http.max_content_length: Maximum size of an HTTP request body. Defaults to 100mb.

I believe @Chen_Wen is hitting the error on 100mb+ responses rather than requests.
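The numbers in the error message do line up with a buffered-response limit of exactly 100 MiB; a quick sanity check of the arithmetic:

```java
public class BufferLimitMath {
    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // "100mb" expressed in bytes
        long entity = 105_072_697L;      // entity size from the error message
        System.out.println(limit);          // 104857600, matching the error
        System.out.println(entity - limit); // 215097 bytes (~0.2%) over the limit
    }
}
```

So the response only just overflows the buffer, which is why trimming the returned fields down might squeeze it under.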


That’s correct. I use the Trino SQL engine to connect to ES and use its raw_query to fetch logs in one request. Part of the query is below. For a big time-range query, the response body comes back a little bigger than 100mb and I get the error above.

SELECT result
      FROM TABLE(elastic_mozart.system.raw_query(schema => 'default', index => 'ailogs-oneapi', query => '{
  "size": 10000000,
  "query": {
   "bool": {
   "filter": [
    {.....

I know I could use the scroll/streaming API with curl, but my case uses the Trino Elasticsearch plugin, and I can’t change its raw_query function. I could of course use a plain SQL query instead of raw_query, but its performance is really slow. For most of my cases 100mb is fine; only a few full-data queries hit the limitation.

I read the related discussion in the session below, but have no idea how to work around it; enlarging the limit to a little more than 100mb would be OK for me.

I looked at the documentation and found that they use scroll behind the scenes, and I guess that only works with something other than raw queries.

elasticsearch.scroll-size: Sets the maximum number of hits that can be returned with each Elasticsearch scroll request. Default: 1000
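For context, that is a Trino catalog property rather than an Elasticsearch setting; a sketch of where it would live (the catalog name and host are placeholders matching this thread):

```properties
# etc/catalog/elastic_mozart.properties
connector.name=elasticsearch
elasticsearch.host=your-es-host
elasticsearch.port=9200
# hits per scroll round-trip; this batches the fetching, it does not cap the total result
elasticsearch.scroll-size=1000
elasticsearch.scroll-timeout=1m
```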

The raw query you shared is not supposed to work in Elasticsearch unless you changed the default index settings. Did you?

Can't you use a standard SQL Query instead? What is the full query you are sending?

Thanks for checking @dadoonet, yes I could. I actually did use standard SQL via scroll requests at the beginning, but the performance is really slow compared to raw_query, where I can take advantage of fast querying on the Elasticsearch side to prepare the data.

I did enlarge the index setting so scroll requests can exceed 1000. And it is definitely NOT an ES query problem; the problem is that the response content is bigger than 100mb.

I just went through trino/plugin/trino-elasticsearch/src/main/java/io/trino/plugin/elasticsearch/client/ElasticsearchClient.java at master · trinodb/trino · GitHub and tried to find a way to customize the response content size restriction. But it seems to depend on the official ES client class…

elastic_mozart.system.raw_query(
  schema => 'default',  
  index => 'ailogs-oneapi',  
  query => '{
  "size": 10000000,
  "query": {
   "bool": {
   "filter": [
    {
    "range": {
     "@timestamp": {
     {%- set bounds = [] -%}
      
     {%- if from_dttm -%}
      {%- set _ = bounds.append("\"gte\": \"" ~ (from_dttm | string | replace(" ", "T")) ~ "\"") -%}
     {%- endif -%}
      
     {%- if to_dttm -%}
      {%- set _ = bounds.append("\"lte\": \"" ~ (to_dttm | string | replace(" ", "T")) ~ "\"") -%}
     {%- endif -%}

     {{ bounds | join(", ") if bounds else "\"gte\": \"0\"" }}
     }
    }
    },
    {
    "exists": { "field": "org_cid" }
    }
  
    {%- set selected_users = filter_values("username") -%}
    {%- if selected_users -%}
    , {
    "terms": {
     "calc_coreid": [
     {%- for user in selected_users -%}
     "{{ user }}"{% if not loop.last %}, {% endif %}
     {%- endfor -%}
     ]
    }
    }
    {%- endif -%}
     
    {%- set selected_managers = filter_values("manager") -%}
    {%- if selected_managers -%}
    , {
    "bool": {
     "minimum_should_match": 1,
     "should": [
     {%- for mgr in selected_managers -%}
     {
     "wildcard": {
     "org_chain.keyword": "*{{ mgr }}*"
     }
     }{% if not loop.last %}, {% endif %}
     {%- endfor -%}
     ]
    }
    }
    {%- endif -%}
   
    {%- set selected_models = filter_values("modelname") -%}
    {%- if selected_models -%}
    , {
    "terms": {
     "oneapi_modelname.keyword": [
     {%- for model in selected_models -%}
     "{{ model }}"{% if not loop.last %}, {% endif %}
     {%- endfor -%}
     ]
    }
    }
    {%- endif -%}

   ],
   "must_not": [
    {
     "term": {
      "org_cid.keyword": "INVALID"
     }
    },
    {
     "wildcard": {
      "org_cid.keyword": "*-api"
     }
    }
   ]
   }
  },
  "_source": ["@timestamp", "org_cid", "oneapi_modelname", "org_chain"],
  "fields": ["calc_coreid"]
  }')

My gut feeling is that you are on the wrong road, expecting a sort of bulk exporter. There are other tools for that. But obviously you know your use case and limitations better than I.

If you are just over the 100mb cusp, consider whether you can drop fields from the returned response, i.e. reduce the returned fields to the absolute minimum and it might fit? But that would be kicking the can down the road a little bit.

Good luck. Maybe someone else can help you bump the limit ....

Thanks for the reminder @RainTown. I still want to try compiling the target jar, elasticsearch-rest-client-6.8.23.jar, and found this class: elasticsearch/client/rest/src/main/java/org/elasticsearch/client/HttpAsyncResponseConsumerFactory.java at 706067211ae880bbe4669286ee976552e8a60446 · elastic/elasticsearch · GitHub
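For what it's worth, recompiling the jar shouldn't be necessary: the low-level REST client already lets a caller raise the buffer limit per request through RequestOptions. A sketch of that client-side API (whether you can reach this from inside the Trino connector without patching it is a separate question; the index name is just the one from this thread):

```java
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;

public class BiggerBufferSketch {
    public static Request buildRequest() {
        int bufferLimitBytes = 200 * 1024 * 1024; // 200 MiB, up from the 100 MiB default
        RequestOptions options = RequestOptions.DEFAULT.toBuilder()
            .setHttpAsyncResponseConsumerFactory(
                new HeapBufferedResponseConsumerFactory(bufferLimitBytes))
            .build();
        Request request = new Request("GET", "/ailogs-oneapi/_search");
        request.setOptions(options); // responses to this request may buffer up to 200 MiB
        return request;
    }
}
```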

But I don’t know how to compile it. Could you guide me on how to generate this client jar for the specific version I use, 6.8.23?

What is your elasticsearch version?

I saw that Trino is compatible with 8.x. So I don’t understand what this 6.x version is doing here.

Very unclear to me.
As Trino seems to maintain the connector, I’d ask them for support. There’s nothing wrong on Elasticsearch side.

If you are still using ES 6.x, it’s a matter of urgency to upgrade your cluster. 6.x did not get a lot of the security patches. Please update to 9.x.