Does Elastic provide production support for FSCrawler?
Welcome!
FSCrawler is a community project and is not supported by Elastic.
But I do support it here as best I can.
Thank you @dadoonet for your response.
Do you know if someone could share some feedback on past production usage, please?
Also, do you know if there is a similar feature/plugin supported by Elasticsearch?
I can. Ask your questions here.
The ingest attachment plugin is officially supported.
Also have a look at Workplace Search.
I am using FSCrawler to index a large set of documents kept in a fixed folder structure, roughly 1 TB in total.
My POC (100 files) successfully indexed this directory's content (stored on Azure) into ES in a short time, but I now need FSCrawler to handle continuous changes, creations, and deletions of files (say 100 files per day).
How to maintain the crawler in terms of documents processed, error monitoring, failure recovery, etc. is still not clear to me.
The crawler enables content search, which is my primary requirement, but any solution within FSCrawler or any alternative would be appreciated.
Excellent creation btw!
Thank you in advance!
PS: I will look into the suggestions from your last reply.
I'd say that the crawler is not designed (yet) to be resilient. This means that if it crashes, some documents might have been indexed while others will not have been.
When you restart the job, it will most likely index again the documents that were already indexed during the run that crashed, and it will then index the remaining ones.
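Re-indexing after a crash produces no duplicates as long as each file always maps to the same document ID, because indexing a document with an existing `_id` replaces it rather than adding a copy. The Python sketch below illustrates this under stated assumptions: a plain dict stands in for the Elasticsearch index, and the path-hash `doc_id` scheme and `run_job` helper are hypothetical, not necessarily how FSCrawler assigns IDs.

```python
import hashlib


def doc_id(path: str) -> str:
    # Hypothetical ID scheme: hash the full file path so the same file
    # always maps to the same document ID across runs.
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


def run_job(paths, index, crash_after=None):
    """Index each path into `index` (a dict standing in for an ES index).

    If crash_after is set, raise after that many documents to simulate a crash.
    Returns the number of documents written during this run.
    """
    written = 0
    for path in paths:
        if crash_after is not None and written == crash_after:
            raise RuntimeError("simulated crash")
        # Writing under a known ID overwrites any existing copy instead of
        # creating a duplicate, like an index request with an explicit _id.
        index[doc_id(path)] = {"path": path}
        written += 1
    return written


files = [f"/data/docs/file_{i}.pdf" for i in range(10)]
index = {}
try:
    run_job(files, index, crash_after=6)
except RuntimeError:
    pass
print(len(index))  # 6 documents made it in before the crash
run_job(files, index)  # restart the whole job from scratch
print(len(index))  # 10: the first 6 were overwritten, not duplicated
```

The restart still does redundant work for the documents indexed before the crash, which matches the behaviour described above; the deterministic ID only guarantees that the index ends up with one copy per file.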
There is a lot that could make it work better, but sadly I don't have enough free time to spend on this yet. My primary goals for the future are:
- Use a WatchService instead of recursively walking through each folder
- Connect to Workplace Search to offer a nice UI experience
- Store the jobs inside Elasticsearch instead of on the local disk, which would be much better
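For context on the first and third goals, here is a minimal Python sketch of a simplified polling design, assuming change detection works by walking the folder tree, recording each file's modification time, and diffing against the snapshot saved by the previous run. This recursive walk is the kind of work a WatchService (Java's `java.nio.file.WatchService`, which receives change events from the OS) would replace. The function names and the JSON state file are hypothetical and not FSCrawler's actual code; the local state file stands in for the job state that the third goal would move into Elasticsearch.

```python
import json
import os
import tempfile
from pathlib import Path


def take_snapshot(root: str) -> dict:
    """Walk `root` recursively and map each file path to its last-modified time."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.path.getmtime(path)
    return snapshot


def diff_snapshots(old: dict, new: dict):
    """Return (created, modified, deleted) sorted path lists between two snapshots."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted


def load_state(state_file: Path) -> dict:
    # Hypothetical job state kept on local disk; empty on the first run.
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}


def save_state(state_file: Path, snapshot: dict) -> None:
    state_file.write_text(json.dumps(snapshot))


# Usage: keep the state file outside the crawled folder so it is not scanned.
with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as state_dir:
    state_file = Path(state_dir) / "job_state.json"
    Path(root, "a.txt").write_text("v1")
    Path(root, "b.txt").write_text("v1")
    save_state(state_file, take_snapshot(root))  # first run

    Path(root, "b.txt").unlink()
    Path(root, "c.txt").write_text("new")
    created, modified, deleted = diff_snapshots(load_state(state_file), take_snapshot(root))
    print([os.path.basename(p) for p in created], [os.path.basename(p) for p in deleted])
    # prints ['c.txt'] ['b.txt']
```

The cost of this design grows with the total number of files rather than with the number of changes, which is why event-based watching is attractive for a 1 TB tree that only sees around 100 changes per day.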