I am using Elasticsearch and FSCrawler to process and index a number of documents. To extract structured data from these documents, I have a growing ingest pipeline. I'd like to include the ingest pipeline in my automated tests, and in general to have test fixtures that I don't have to update every time I change the ingest pipeline. However, I don't want to include FSCrawler itself in my tests; it doesn't seem suited to that.
Is there a way I can save the requests FSCrawler sends to Elasticsearch so I can replay them for test suite setup and ingest pipeline testing? I tried running it with --trace, but the requests don't seem to be part of the output.
Thanks, I'm pretty sure that's going to work for me. In case anyone else sees this:
Start FSCrawler with the --rest option, then send the file to its REST endpoint with both debug=true and simulate=true as query parameters; FSCrawler will respond with the JSON document it would have sent to Elasticsearch. https://fscrawler.readthedocs.io/en/fscrawler-2.5/admin/fs/rest.html#simulate-upload
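For anyone who wants to script this end to end, here's a minimal sketch in Python. It assumes the FSCrawler REST service is running at its default address (http://127.0.0.1:8080/fscrawler) and Elasticsearch at localhost:9200; the fixture paths and the pipeline name (my_ingest_pipeline) are placeholders you'd swap for your own. The replay step uses Elasticsearch's standard _ingest/pipeline/<id>/_simulate API, which runs a document through a pipeline without indexing it.

```python
import json
import requests

# Assumed endpoints -- adjust to your setup.
FSCRAWLER_URL = "http://127.0.0.1:8080/fscrawler/_upload"
ES_URL = "http://localhost:9200"
PIPELINE = "my_ingest_pipeline"  # placeholder pipeline id

# Step 1: capture. With simulate=true nothing is indexed, and with
# debug=true the response includes the JSON document FSCrawler would
# have sent to Elasticsearch, under the "doc" key.
with open("fixtures/sample.pdf", "rb") as f:
    resp = requests.post(
        FSCRAWLER_URL,
        params={"debug": "true", "simulate": "true"},
        files={"file": f},
    )
resp.raise_for_status()
doc = resp.json()["doc"]

# Save the captured document as a test fixture.
with open("fixtures/sample.expected.json", "w") as out:
    json.dump(doc, out, indent=2)

# Step 2: replay. Run the fixture through the ingest pipeline via the
# _simulate API to see what the pipeline would produce.
sim = requests.post(
    f"{ES_URL}/_ingest/pipeline/{PIPELINE}/_simulate",
    json={"docs": [{"_source": doc}]},
)
sim.raise_for_status()
print(json.dumps(sim.json(), indent=2))
```

Saving the captured document as a fixture means FSCrawler only has to run once, outside the test suite; the tests can then replay the JSON through _simulate (or index it with the ?pipeline= query parameter) without FSCrawler in the loop.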