Currently I am using the url parameter from the _settings.yaml file. In url I have specified the path of the drive from which files are indexed.
If a new file is added to the file system, is it possible to index only that file and add it to the existing index using FSCrawler?
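For reference, here is roughly what the relevant part of my settings file looks like (the job name and path are placeholders):

```yaml
---
name: "my_job"
fs:
  # Root directory that FSCrawler scans for documents
  url: "/mnt/shared_drive/documents"
  # How often a running crawler re-scans the directory
  update_rate: "15m"
```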
Thanks for your reply!
That means if a new file is added to the drive and I run the FSCrawler job, it will index only that file; it will not re-index all the other files or create duplicate entries. Am I right? Correct me if I am wrong.
Yes, thanks for your help!!!
I have tried this solution and it works.
Another question: can we schedule the FSCrawler job, which I am currently running manually?
And can we provide more than one file URL path to build the index?
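In case it helps, what I had in mind for scheduling was something like a cron entry that triggers a single pass (assuming the --loop option behaves as documented; the install path is a placeholder):

```
# Run the FSCrawler job once every night at 2:00; --loop 1 exits after one scan
0 2 * * * /opt/fscrawler/bin/fscrawler my_job --loop 1
```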
I can see that my job settings file has update_rate set to 15m by default.
If there are new changes, will it pick them up automatically after 15m, or do we have to run the FSCrawler job every time from the command line? Either way, running it through the command line does index the newly added documents.
Please clarify this for me.
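From what I understand so far, these are the two modes of running it (my_job is a placeholder):

```
# Leave FSCrawler running: it re-scans the directory every update_rate (15m here)
fscrawler my_job

# Or run a single pass and exit, e.g. when triggered by an external scheduler
fscrawler my_job --loop 1
```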
Thanks! One more question: is there a way to give more than one file system URL in an FSCrawler job, so that we can index files from other file systems as well?
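If it turns out that a single job only accepts one fs.url, one workaround I was considering is to define a separate job per root directory and point them all at the same index (job names, paths, and the index name below are placeholders):

```yaml
# ~/.fscrawler/job_drive_a/_settings.yaml
---
name: "job_drive_a"
fs:
  url: "/mnt/drive_a"
elasticsearch:
  index: "documents"
```

```yaml
# ~/.fscrawler/job_drive_b/_settings.yaml
---
name: "job_drive_b"
fs:
  url: "/mnt/drive_b"
elasticsearch:
  index: "documents"
```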