Saving requests from fscrawler

Hello,

I am using Elasticsearch and FSCrawler to process and index a number of documents. In order to extract structured data from these documents, I have a growing ingest pipeline. I'd like to include the ingest pipeline in my automated tests, and in general to have fixtures for my tests that I don't have to update every time I change the ingest pipeline. However, I don't want to include FSCrawler in my tests; it doesn't seem suited to that.

Is there a way I can save the requests FSCrawler sends to Elasticsearch so I can replay them for test suite setup and ingest pipeline testing? I tried running it with --trace, but the request bodies don't appear in that output.

Thank you,
Raphael Sofaer

I think you could use the FSCrawler REST endpoint and its simulate API to get back the actual document it generates.

Otherwise, let FSCrawler index some documents in Elasticsearch and then just fetch some of them with a normal search.
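For example, something like this (untested sketch; the index name "my_docs" and the fixtures/ path are placeholders for your own job's index):

```python
import json

import requests

# Search the index FSCrawler wrote to and save a few documents as fixtures.
resp = requests.post(
    "http://localhost:9200/my_docs/_search",
    json={"query": {"match_all": {}}, "size": 5},
)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    # Each _source is a document exactly as FSCrawler indexed it.
    with open(f"fixtures/{hit['_id']}.json", "w") as out:
        json.dump(hit["_source"], out, indent=2)
```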

Thanks, I'm pretty sure that's going to work for me. In case anyone else sees this:
Start FSCrawler with the --rest option, then send the file to the REST endpoint with both debug=true and simulate=true in the query parameters; FSCrawler will return the JSON document it would have sent to Elasticsearch.
https://fscrawler.readthedocs.io/en/fscrawler-2.5/admin/fs/rest.html#simulate-upload
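For anyone who wants to script this, here is a minimal sketch (untested; the sample file name, fixture path, and the default 127.0.0.1:8080 endpoint are assumptions based on the docs linked above):

```python
import json

import requests

# POST a sample file to FSCrawler's _upload endpoint. simulate=true means
# nothing is actually indexed; debug=true asks FSCrawler to include the
# generated document in its response (under "doc", per the 2.5 docs).
with open("samples/test.pdf", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8080/fscrawler/_upload",
        params={"debug": "true", "simulate": "true"},
        files={"file": f},
    )
resp.raise_for_status()

# Save the response as a fixture; the "doc" field is the JSON document
# FSCrawler would have sent to Elasticsearch.
with open("fixtures/test_pdf.json", "w") as out:
    json.dump(resp.json(), out, indent=2)
```

In the test suite you can then replay the saved document through the ingest pipeline without involving FSCrawler at all, e.g. by indexing it with the pipeline query parameter (PUT /my_index/_doc/1?pipeline=my_pipeline).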

Thanks for your work and support!

