FSCrawler - Folder index is not getting created in the latest version

The index name in the _settings.json file is fs-test-001 (for example). After running FSCrawler, the templates are automatically added to Elasticsearch. I can see the templates for the index as well as for the folders, along with their component templates.
The index fs-test-001 is created, but the folder index "fs-test-001_folder" is not.

Maybe it was not created due to some template priority issue. It is solved now.
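For anyone hitting the same thing: the templates FSCrawler installed, along with their priorities, can be inspected with something like the following (fs-test-001 is just my example index name):

GET _index_template/fs-test-001*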

I have added an ILM policy for rollover based on shard size to the FSCrawler template, but the index is not rolling over to a new one. I tried using the alias name in _settings.json and it is still not working.
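For reference, the rollover condition I mean is roughly this kind of ILM policy (the policy name and size threshold below are placeholders, not my real values):

PUT _ilm/policy/fscrawler-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}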

Please guide me on how to get the rollover working (for example: test-001, test-002). The index should change from test-001 to test-002 after reaching the limit.

The second index "test-002" is automatically created with its write index flag set to true, and the write index flag is set to false for the first index "test-001". But while running FSCrawler, I am getting the below error.

02:32:25,849 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Indexing darknet/f3f48c3d9115879e924ce5c7121c1f10?pipeline=null
02:32:25,849 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 1 actions
02:32:26,552 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [1] documents to the Elasticsearch service
02:32:26,599 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 46144247 characters
02:32:27,083 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 1 actions
02:32:27,083 WARN  [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] There was failures while executing bulk
java.lang.RuntimeException: 1 failures
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkResponse.buildFailureMessage(FsCrawlerBulkResponse.java:71) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchBulkResponse.buildFailureMessage(ElasticsearchBulkResponse.java:64) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerSimpleBulkProcessorListener.afterBulk(FsCrawlerSimpleBulkProcessorListener.java:43) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerAdvancedBulkProcessorListener.afterBulk(FsCrawlerAdvancedBulkProcessorListener.java:48) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerRetryBulkProcessorListener.afterBulk(FsCrawlerRetryBulkProcessorListener.java:51) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.execute(FsCrawlerBulkProcessor.java:148) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.executeIfNeeded(FsCrawlerBulkProcessor.java:126) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.add(FsCrawlerBulkProcessor.java:113) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.indexRawJson(ElasticsearchClient.java:411) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerDocumentServiceElasticsearchImpl.indexRawJson(FsCrawlerDocumentServiceElasticsearchImpl.java:82) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerDocumentServiceElasticsearchImpl.index(FsCrawlerDocumentServiceElasticsearchImpl.java:76) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexFile(FsParserAbstract.java:448) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:277) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:152) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
02:32:27,099 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Error for [ElasticsearchOperation{operation=INDEX, index='darknet', id='f3f48c3d9115879e924ce5c7121c1f10'}]: no write index is defined for alias [darknet]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index
02:32:27,099 WARN  [f.p.e.c.f.f.b.FsCrawlerAdvancedBulkProcessorListener] Throttling is activated. Got [0] successive errors so far.
02:32:27,099 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(E:/Data/crawler_data/Test, E:/Data/crawler_data/Test/testing_out.txt) = /testing_out.txt

Apparently:

no write index is defined for alias [darknet].

Check that you can actually do something like:

PUT /darknet/_doc/1
{
  "foo": "bar"
}

Until you can run that, FSCrawler won't be able to index the data.

Check how you created your rollover. There's probably something wrong there.
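For reference, the usual way to bootstrap a rollover is to create the first index with the write alias already set, something like this (index and alias names are only an example); the index template should also set index.lifecycle.rollover_alias to the same alias:

PUT /test-000001
{
  "aliases": {
    "darknet": {
      "is_write_index": true
    }
  }
}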

When I run the command, the error below is thrown:

PUT /darknet/_doc/1
{
  "foo": "bar"
}

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "no write index is defined for alias [darknet]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "no write index is defined for alias [darknet]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index"
  },
  "status": 400
}

I have created the rollover based on primary shard size.

I have the index alias "darknet" and the rollover_alias "darkweb". Initially, I was indexing into test-2024-000001 with is_write_index set to true. Then "test-2024-000002" was created automatically and its is_write_index changed to true.
In the FSCrawler settings, elasticsearch.index is set to "darknet".

Please find below the test-2024-000002 index settings. is_write_index is present as true:

{
  "test-2024-000002": {
    "aliases": {
      "darknet": {},
      "darkweb": {
        "is_write_index": true
      }
    },

test-2024-000001 index settings:

{
  "test-2024-000001": {
    "aliases": {
      "darknet": {},
      "darkweb": {
        "is_write_index": false
      }
    },

Index test-2024-000001 has 25 documents.
Index test-2024-000002 has 0 documents. Data is not being added even though is_write_index is true:

green  open   test-2024-000001                                          DcsJ5gt9QaCa-OtHddtKCg   1   0         25            0    568.1mb        568.1mb      568.1mb
green  open   test-2024-000002                                          fSGXCJMpQXGY3spyhJ-6Ew   1   0          0            0       249b           249b         249b

I think you need to use darkweb as the index name in FSCrawler.
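That is, in your _settings.json, something like this (only the relevant part shown; "your_job_name" is a placeholder for your actual job name):

{
  "name": "your_job_name",
  "elasticsearch": {
    "index": "darkweb"
  }
}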

Thanks. The file is indexed now.
So I need to give the rollover_alias name as the index name in the FSCrawler settings.
Can you confirm this?

I will delete everything and run it again to re-verify.

Yes. If you want to use rollover, that's the way to go, I think. I never tried it myself, though.
It could be a nice tip to add to the documentation.

Thanks, David. It works perfectly.
I was not aware of the ILM poll interval earlier, which is why I had the doubt.
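For anyone else with the same doubt: ILM only evaluates rollover conditions every indices.lifecycle.poll_interval (10 minutes by default), so the rollover is not instant. For testing, the interval can be lowered, for example:

PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1m"
  }
}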
