FSCrawler is not indexing consistently

I have configured FSCrawler to monitor a folder with 13 items (folders and files of types txt, pdf, doc, xls & ppt), but not all files (and their content) are being indexed. Only 8 are being indexed. Is there an error in the configuration?
Elasticsearch & FSCrawler versions

    {
      "name" : "d6uoHrq",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "5X293rOKTo-0tqWdEPESGQ",
      "version" : {
        "number" : "6.6.2",
        "build_flavor" : "default",
        "build_type" : "deb",
        "build_hash" : "3bd3e59",
        "build_date" : "2019-03-06T15:16:26.864148Z",
        "build_snapshot" : false,
        "lucene_version" : "7.6.0",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
      "tagline" : "You Know, for Search"
    }

    {
      "version" : "2.7-SNAPSHOT",
      "ok" : true,
      "elasticsearch" : "6.6.2"
    }

Configuration: Changed from the default

    name: "test"
    fs:
      url: "/home/test"
      update_rate: "2m"

Contents of the directory

    /home/test# ls -lrRt
    .:
    total 248
    -rw-r--r-- 1 root root 15581 Mar 10 09:27 testdoc.docx
    -rw-r--r-- 1 root root 15581 Mar 10 09:27 burnsfirstaid.docx
    -rw-r--r-- 1 root root   226 Mar 26 06:33 rhyme
    -rw-r--r-- 1 root root 54315 Mar 27 04:24 hickory.pdf
    -rw-r--r-- 1 root root   108 Mar 28 04:18 twinkle.txt
    -rw-r--r-- 1 root root  8617 Mar 28 06:10 weeks.xlsx
    -rw-r--r-- 1 root root 33135 Mar 28 06:22 Testppt.pptx
    -rw-r--r-- 1 root root  9373 Mar 28 06:27 months.xlsx
    -rw-r--r-- 1 root root 28642 Mar 28 06:42 twinkle.pdf
    -rw-r--r-- 1 root root 34433 Mar 28 06:43 Testppt2.pptx
    -rw-r--r-- 1 root root 19331 Mar 28 06:44 computerpractise.docx
    drwxr-xr-x 2 root root  4096 Mar 28 07:05 test2
    drwxr-xr-x 3 root root  4096 Mar 28 07:12 test1

    ./test2:
    total 36
    -rw-r--r-- 1 root root 34433 Mar 28 06:43 Testppt2.pptx

    ./test1:
    total 24
    -rw-r--r-- 1 root root 19331 Mar 28 06:44 computerpractise.docx
    drwxr-xr-x 2 root root  4096 Mar 28 07:12 test11

    ./test1/test11:
    total 0

Indices in elasticsearch

health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test                       D0LQH7nZRLuGBzCFiLKQUg   1   1          8            0    198.5kb        198.5kb
yellow open   test_folder                2aZtjtKETkG-JR39myuhQw   5   1          4            0     16.8kb         16.8kb

Log file
fscrawler log file

I don't see FSCrawler logs. Could you share them on gist.github.com?
Anything strange in logs?

Sorry, I had missed adding the link to the log file. I have now updated the link in the original post. No, there are no warnings or errors in the log file.
I deleted this job and its index, created a new job (and index), and I see similar behavior.
I can upload all the files using the REST API's upload option, but FSCrawler on its own ignores (does not index) some files.

Could you run it again with the --restart option?

I ran FSCrawler with the --restart option and all the files that were present at the start were indexed. If I add files after the start, they are not indexed, even though FSCrawler wakes up after the update interval.

The current implementation detects new and changed files by their dates.
The behavior depends on the OS.
On some operating systems, moving a file into the scanned directory does not change any of the file's dates, so the file won't be picked up by FSCrawler.
You need to "touch" the file.
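
If files are dropped into the crawled directory by a script, the "touch" step can be folded into that script. The following is only a minimal sketch in Python under that assumption. The helper names `move_and_touch` and `touch_tree` are hypothetical and are not part of FSCrawler. It relies on the fact that `shutil.move` keeps the original modification time, which is why the explicit `os.utime` call is needed.

```python
import os
import shutil
from pathlib import Path


def move_and_touch(src, watched_dir):
    """Move src into watched_dir, then set its timestamps to now.

    shutil.move preserves the original modification time, so without
    the os.utime call a date-based crawler may treat the file as old.
    """
    dest = Path(watched_dir) / Path(src).name
    shutil.move(str(src), str(dest))
    os.utime(dest, None)  # same effect as running `touch` on dest
    return dest


def touch_tree(root):
    """Touch every regular file under root and return the touched paths."""
    touched = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            os.utime(path, None)
            touched.append(path)
    return touched
```

For files that are already sitting in the directory, something like `touch_tree("/home/test")` should make them look recent on the next scan, assuming the crawler compares modification times against its last run.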

OK, I understand now. Thanks, David, for the timely response.
