When I query data from ES 8.2.3 with a large size in one request (the index has more than 10K docs), I get the error: Error: entity content is too long [105072697] for the configured buffer limit [104857600].
After searching the docs, I found there is a hardcoded 100 MB limit on response content. Is there a way to work around or enlarge this limit?
What client are you using? There is often a client-side HTTP response limit, and that is probably what you are seeing. You could validate by sending the exact same query with curl, which should just stream the response rather than buffering it. If that works, it would strongly suggest a client limitation.
Note that in many cases such large (100 MB+) responses are not ideal; think resource usage, among other reasons. There are alternatives using the scroll API, or search_after, or ... You don't tell us enough to advise on this, but it is something to think about.
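To illustrate the search_after alternative mentioned above, here is a minimal sketch of the paging loop. `search_fn` is a stand-in for whatever performs the actual `_search` call (for example the official Python client's `es.search`); the function itself is plain logic with no Elasticsearch dependency, and the names here are illustrative, not from any real connector.

```python
def fetch_all(search_fn, index, query, sort, page_size=10000):
    """Collect all hits page by page: each request passes the sort values
    of the previous page's last hit as search_after, so no single response
    has to hold the whole result set (and no response nears the 100 MB cap)."""
    hits = []
    search_after = None
    while True:
        body = {"size": page_size, "query": query, "sort": sort}
        if search_after is not None:
            body["search_after"] = search_after
        page = search_fn(index=index, body=body)["hits"]["hits"]
        if not page:  # an empty page means we have paged past the last hit
            return hits
        hits.extend(page)
        search_after = page[-1]["sort"]  # cursor for the next page
```

The key point is that `sort` must be deterministic (typically a timestamp plus a tiebreaker field) so the cursor never skips or repeats documents.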
That's correct. I use the Trino SQL engine to connect to ES and use its raw_query to request logs in one request. Part of the query is below. For a big time range, the response body is a little more than 100 MB and I get the error above.
SELECT result
FROM TABLE(elastic_mozart.system.raw_query(schema => 'default', index => 'ailogs-oneapi', query => '{
  "size": 10000000,
  "query": {
    "bool": {
      "filter": [
        {.....
I know I could use the scroll/streaming API with curl, but my case uses the Trino Elasticsearch plugin, where I can't attach it to the raw_query function. I could of course use a plain SQL query instead of raw_query, but its performance is really slow. For most of my cases 100 MB is fine; only a few cases that query the full data hit the limitation.
I read the related discussion in the thread below, but have no idea how to work around it; enlarging the limit to a little more than 100 MB would be OK for me.
Thanks @dadoonet for checking, yes I could. I actually did use standard SQL via scroll requests at the beginning, but the performance is much slower than raw_query, where I can take advantage of the fast query on the Elasticsearch end to prepare the data.
I did enlarge the index setting to allow scroll requests over 1000. And it is definitely NOT an ES query problem; it is the data size, with response content bigger than 100 MB.
My gut feeling is that you are on the wrong road, expecting a sort of bulk exporter. There are other tools for that. But obviously you know your use case and limitations better than I do.
If you are just over the 100 MB cusp, consider whether you can drop fields from the returned response, i.e. reduce the returned fields to the absolute minimum, and it might fit. But that would just be kicking the can down the road a little.
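As a sketch of that field-trimming idea: restricting `_source` in the query body returns only the listed fields, which can shrink the response considerably. The field names below are borrowed from the SQL shown later in this thread, purely for illustration.

```json
{
  "size": 10000,
  "_source": ["org_cid", "org_chain", "oneapi_tokenname"],
  "query": { "match_all": {} }
}
```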
Good luck. Maybe someone else can help you bump the limit ....
I saw that Trino is compatible with 8.x, so I don't understand what this 6.x version is doing here.
Very unclear to me.
As Trino seems to maintain the connector, I’d ask them for support. There’s nothing wrong on Elasticsearch side.
If you are still using ES 6.x, upgrading your cluster is a matter of urgency: 6.x does not get a lot of the security patches. Please update to 9.x.
Hi @dadoonet, my ES version is 8.2.3, but Trino's Elasticsearch plugin is using the 6.8.23 elasticsearch-rest-client jar, which I think is compatible with ES 8.2.3.
Update: I just successfully upgraded Trino to the latest version, 480, which points to ES 7.17.29.
ES 7.17.29 is indeed better, but it is still old and comes with tons of unneeded dependencies…
And if they are using only the old low-level client, there's no need to import the REST high-level client.
If they do use the old client, they should upgrade to the new one, which is much, much better.
But, coming back to the initial discussion, I think you might be shooting yourself in the foot.
Looking at the query you shared, I think you could maybe translate it into a SQL query and then leverage everything Trino has built for extracting a huge dataset.
Thanks @dadoonet for the reminder. I did try using standard SQL via the trino-elasticsearch plugin, which uses the scroll API to fetch data, but the performance is extremely slow: even the planning phase takes almost 30s, and the execution phase takes 5-8 min for a 1 million-row data set.
So I turned to raw_query, which passes the Elasticsearch query directly to the ES end to fetch the data quickly and return it to Trino; that takes no more than 1 min in total for 1 million rows, 5-8 times faster than standard SQL.
Hey, not yet. See this very simple SQL:

SELECT org_cid, org_chain, oneapi_tokenname
FROM elastic_mozart.default."ailogs-oneapi"
LIMIT 100001

And that performance is after I tuned the config elasticsearch.scroll-size=10000. The planning time always takes 20-30s, and when my SQL has an aggregation operation like GROUP BY with WHERE, it takes 2-8 min.
But when I use raw_query, which fetches all data from ES without the scroll API (multiple requests), it takes no more than 10s.