I am using Elasticsearch and FSCrawler to process and index a number of documents. To extract structured data from these documents, I have a growing ingest pipeline. I'd like to include the ingest pipeline in my automated tests, and in general to have test fixtures that I don't have to update every time I change the ingest pipeline. However, I don't want to include FSCrawler itself in my tests; it doesn't seem suited to that.
Is there a way I can save the requests FSCrawler sends to Elasticsearch so I can replay them for test suite setup and ingest pipeline testing? I tried running it with --trace, but the requests don't seem to be part of the output.
Thanks, I'm pretty sure that's going to work for me. In case anyone else sees this:
Start FSCrawler with the --rest option, then send the file to its REST endpoint with both debug=true and simulate=true as query parameters; FSCrawler will respond with the JSON document it would have sent to Elasticsearch. https://fscrawler.readthedocs.io/en/fscrawler-2.5/admin/fs/rest.html#simulate-upload
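For anyone who wants to script this end to end, here's a minimal sketch in Python. It assumes the FSCrawler REST service is running at its default address (http://127.0.0.1:8080/fscrawler) and Elasticsearch at localhost:9200; the fixture paths and the pipeline name (my_ingest_pipeline) are placeholders you'd swap for your own. The replay step uses Elasticsearch's standard _ingest/pipeline/<id>/_simulate API, which runs a document through a pipeline without indexing it.

```python
import json
import requests

# Assumed endpoints -- adjust to your setup.
FSCRAWLER_URL = "http://127.0.0.1:8080/fscrawler/_upload"
ES_URL = "http://localhost:9200"
PIPELINE = "my_ingest_pipeline"  # placeholder pipeline id

# Step 1: capture. With simulate=true nothing is indexed, and with
# debug=true the response includes the JSON document FSCrawler would
# have sent to Elasticsearch, under the "doc" key.
with open("fixtures/sample.pdf", "rb") as f:
    resp = requests.post(
        FSCRAWLER_URL,
        params={"debug": "true", "simulate": "true"},
        files={"file": f},
    )
resp.raise_for_status()
doc = resp.json()["doc"]

# Save the captured document as a test fixture.
with open("fixtures/sample.expected.json", "w") as out:
    json.dump(doc, out, indent=2)

# Step 2: replay. Run the fixture through the ingest pipeline via the
# _simulate API to see what the pipeline would produce.
sim = requests.post(
    f"{ES_URL}/_ingest/pipeline/{PIPELINE}/_simulate",
    json={"docs": [{"_source": doc}]},
)
sim.raise_for_status()
print(json.dumps(sim.json(), indent=2))
```

Saving the captured document as a fixture means FSCrawler only has to run once, outside the test suite; the tests can then replay the JSON through _simulate (or index it with the ?pipeline= query parameter) without FSCrawler in the loop.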