How do I index a JSON file larger than 100MB with FSCrawler and Elasticsearch?
I am using FSCrawler 2.6 and Elasticsearch 6.8.0.
Hey,
The default http max content length is 100MB, see https://www.elastic.co/guide/en/elasticsearch/reference/7.4/modules-http.html
You should, however, not increase that limit, but rather reduce your batch sizes if possible. Hope that makes sense.
--Alex
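For reference, that limit is the http.max_content_length setting in elasticsearch.yml. The line below only shows the default to make the 100MB figure concrete; as Alex says, raising it is not the recommended fix:

```yaml
# elasticsearch.yml -- default HTTP request body limit.
# Any single request body (including a bulk payload) larger than this is rejected.
http.max_content_length: 100mb
```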
Yes, I know. But my data consists of books; some files are larger than 100MB and cannot be split into smaller files.
I changed the heap settings (-Xms10g, -Xmx10g) and set "indexed_chars" : "100%".
I fixed it.
Done
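For anyone who runs into the same problem later, here is a minimal sketch of what those two changes can look like. The job name ("books") and paths are made up for illustration; in FSCrawler 2.6 the job settings live in ~/.fscrawler/<job_name>/_settings.json, and FSCrawler's own heap can be raised through the FS_JAVA_OPTS environment variable:

```
# Start FSCrawler with a larger heap (10g, matching the post above):
FS_JAVA_OPTS="-Xms10g -Xmx10g" bin/fscrawler books

# ~/.fscrawler/books/_settings.json (excerpt): index the full extracted
# text of each document instead of only a truncated prefix (the default):
{
  "name": "books",
  "fs": {
    "url": "/data/books",
    "indexed_chars": "100%"
  }
}
```

If the heap you changed was the Elasticsearch one rather than FSCrawler's, the same -Xms/-Xmx values go into Elasticsearch's jvm.options instead.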
Note that there is a difference between what FSCrawler collects and what it generates. If you don't store the source (the BASE64-encoded binary document), then hopefully the extracted content is much smaller than the source itself.
Also, if you have very big documents, you can adjust the bulk settings (https://fscrawler.readthedocs.io/en/latest/admin/fs/elasticsearch.html#bulk-settings) and make sure each bulk request stays under 100MB, as sketched below.
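As a rough illustration, a sketch of what those knobs look like in the job's _settings.json (the values shown are FSCrawler's documented defaults, not recommendations; check the linked page for which options exist in your FSCrawler version):

```
# ~/.fscrawler/books/_settings.json (excerpt):
# - store_source: false keeps the BASE64 source out of the indexed document.
# - bulk_size / byte_size / flush_interval control when a bulk request is sent,
#   so each request stays well under Elasticsearch's http.max_content_length.
{
  "fs": {
    "store_source": false
  },
  "elasticsearch": {
    "bulk_size": 100,
    "byte_size": "10mb",
    "flush_interval": "5s"
  }
}
```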
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.