Speed of Elasticsearch

How can I increase Elasticsearch indexing speed? I am uploading 75 files at a time and want to add the PDF file content into my database, but it is taking 15-20 minutes.
Any suggestions?

I am using FSCrawler. When I do a bulk upload, I start FSCrawler for one loop using PHP's shell_exec command, and my update_rate is 15m. Reading the contents of the files is taking too much time.

I don't think Elasticsearch is the cause.

FSCrawler is mono-threaded, so it can take time, especially if you are using OCR.
It would help if you shared the FSCrawler logs.

If the problem persists, please run FSCrawler with the --debug option so we can see more details.
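
For example, if you are launching it from PHP, a minimal sketch (the fscrawler path and the job name "my_job" are placeholders; --loop 1 runs a single scan before exiting):

```php
<?php
// Run a single FSCrawler scan with debug logging so the slow step is visible.
// The fscrawler path and the job name "my_job" are placeholders.
$log = shell_exec('/opt/fscrawler/bin/fscrawler my_job --loop 1 --debug 2>&1');
echo $log; // the debug output shows how long each file takes to process
```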

Okay Sir, I will check and get back to you.

When I upload files through bulk upload, can we add/edit/delete only those files in the Elasticsearch index instead of creating the index every time?
Because I am starting FSCrawler every time, that may be what is taking time.

OCR is set to true. Do I need to set it to false on the server?

When I upload files through bulk upload

I'm not sure I understand. What do you mean? What are you doing?

Can we add/edit/delete only those files in the Elasticsearch index instead of creating the index every time?

I don't understand. Could you clarify?

Because I am starting FSCrawler every time, that may be what is taking time.

Once FSCrawler has run and is in "sleep" mode, the next run will only collect new/updated/deleted files.

OCR is set to true. Do I need to set it to false on the server?

If Tesseract is available in the path but you don't want OCR, then yes, set ocr to false.
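
In recent FSCrawler versions that switch lives under fs.ocr in the job's _settings.yaml. A minimal sketch, with the job name and folder path as placeholders:

```yaml
name: "my_job"                      # placeholder job name
fs:
  url: "/path/to/upload/folder"     # placeholder: folder your upload module writes to
  update_rate: "15m"
  ocr:
    enabled: false                  # skip Tesseract OCR while indexing
```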

Sir, I have one module through which I am uploading 75 files at a time via bulk upload in PHP. After adding those files to the folder, I start FSCrawler once using the shell_exec command, which updates the Elasticsearch index. After that, I fetch the content with an Elasticsearch query and add it to the database. But this process takes time, and I want to speed it up (the full flow is sketched at the end of this post).

On the server we keep Elasticsearch running, but whenever I add/edit/delete any file, I start FSCrawler for one loop using the shell_exec command.

It works fast on the local server but takes time on the live server.

How can we speed up reading large-scale data, up to 30K files?
Please reply to all the queries above.
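
A sketch of that flow, assuming the elasticsearch-php client and a FSCrawler job named "my_job" (by default FSCrawler indexes into an index named after the job); all paths and names are placeholders:

```php
<?php
// Sketch of the flow described above: save files, run one FSCrawler loop,
// then read the extracted content back out of Elasticsearch.
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// 1. The upload module has already saved the 75 files into the watched folder.
// 2. Run one FSCrawler loop; this call blocks until the whole scan finishes,
//    which is where most of the waiting happens (especially with OCR on).
shell_exec('/opt/fscrawler/bin/fscrawler my_job --loop 1 2>&1');

// 3. Read the extracted text back out of Elasticsearch.
$client   = ClientBuilder::create()->build();
$response = $client->search([
    'index' => 'my_job',
    'body'  => [
        'query'   => ['match_all' => new \stdClass()],
        '_source' => ['content', 'file.filename'],
        'size'    => 100,
    ],
]);

// 4. Store each file's content in the application database.
foreach ($response['hits']['hits'] as $hit) {
    $filename = $hit['_source']['file']['filename'] ?? '';
    $content  = $hit['_source']['content'] ?? '';
    // INSERT $filename / $content into your database here.
}
```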

It's unclear to me what exactly you're using, but if you're doing this via the attachment processor, we very recently found and fixed a Tika logging configuration problem there that under some circumstances could cause severe performance problems. Take a look at https://github.com/elastic/elasticsearch/pull/93878. If you are seeing a lot of logging in your Elasticsearch logs or stderr (and assuming you're using the attachment processor), then this could be your problem. If you are not seeing any, then this is probably not it.
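
For context, "via the attachment processor" means indexing through an ingest pipeline that runs Tika inside Elasticsearch. A hypothetical sketch with the elasticsearch-php client; the pipeline id and index name are made up for illustration:

```php
<?php
// Create an ingest pipeline that extracts text from a base64-encoded field,
// then index a document through it. "pdf_attachment" and "docs" are made up.
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build();

$client->ingest()->putPipeline([
    'id'   => 'pdf_attachment',
    'body' => [
        'description' => 'Extract file content with the attachment processor',
        'processors'  => [
            ['attachment' => ['field' => 'data']],
        ],
    ],
]);

// Index a document through the pipeline; "data" holds the base64 file body.
$client->index([
    'index'    => 'docs',
    'pipeline' => 'pdf_attachment',
    'body'     => ['data' => base64_encode(file_get_contents('/path/to/file.pdf'))],
]);
```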

While uploading a large number of files, I get a timeout error.

Most likely you are using the bulk API, right?
Send smaller bulk requests. That could help.
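
For reference, "smaller bulk requests" just means flushing the bulk body every few hundred documents instead of sending everything at once; a sketch with the elasticsearch-php client, with the index name and batch size as placeholders:

```php
<?php
// Send bulk indexing requests in small batches instead of one huge request.
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client    = ClientBuilder::create()->build();
$documents = [/* ... your extracted documents ... */];
$batchSize = 500; // placeholder; tune for your document sizes

$params = ['body' => []];
foreach ($documents as $i => $doc) {
    $params['body'][] = ['index' => ['_index' => 'my_index']];
    $params['body'][] = $doc;
    // Flush every $batchSize docs so each request stays small and fast.
    if (($i + 1) % $batchSize === 0) {
        $client->bulk($params);
        $params = ['body' => []];
    }
}
if (!empty($params['body'])) {
    $client->bulk($params); // send the remainder
}
```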

I am not using the bulk API. I am uploading the files in bulk using a PHP function.

How?

We are saving the files in a folder and then adding them into the database.

I meant: which code?

You should really give a lot of detail about what you are doing, and how, if you want to get meaningful answers.

