The index name is fs-test-001 (example) in the _settings.json file. After running FSCrawler, the index template is automatically added in Elasticsearch. I am able to see the templates for the index as well as for the folders, along with their component templates.
The index fs-test-001 is created, but the folder index "fs-test-001_folder" is not.
It may not have been created due to a priority issue. It is solved now.
I have added an ILM policy with rollover based on shard size to the FSCrawler template, but the index is not rolling over to a new one. I tried using the alias name in _settings.json and it is still not working.
Please guide me on how to get a rollover index (example: test-001, test-002). The index should change from test-001 to test-002 after reaching the limit.
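One way to set this up (a sketch only, not the exact configuration from this thread; the policy name, template name, index pattern, and size threshold below are hypothetical placeholders) is an ILM policy that rolls over on primary shard size, attached through an index template together with the rollover alias:

```json
PUT _ilm/policy/fscrawler-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "1gb"
          }
        }
      }
    }
  }
}

PUT _index_template/fscrawler-rollover-template
{
  "index_patterns": ["test-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "fscrawler-rollover-policy",
      "index.lifecycle.rollover_alias": "test"
    }
  }
}
```

With this in place, the first index still has to be bootstrapped manually, with the rollover alias marked as the write index, before ILM can roll it over.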
The second index "test-002" is created automatically, its write index flag is set to true, and the write index flag on the first index "test-001" is set to false. But while running FSCrawler, I get the error below.
02:32:25,849 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Indexing darknet/f3f48c3d9115879e924ce5c7121c1f10?pipeline=null
02:32:25,849 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 1 actions
02:32:26,552 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [1] documents to the Elasticsearch service
02:32:26,599 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 46144247 characters
02:32:27,083 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 1 actions
02:32:27,083 WARN [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] There was failures while executing bulk
java.lang.RuntimeException: 1 failures
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkResponse.buildFailureMessage(FsCrawlerBulkResponse.java:71) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchBulkResponse.buildFailureMessage(ElasticsearchBulkResponse.java:64) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerSimpleBulkProcessorListener.afterBulk(FsCrawlerSimpleBulkProcessorListener.java:43) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerAdvancedBulkProcessorListener.afterBulk(FsCrawlerAdvancedBulkProcessorListener.java:48) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerRetryBulkProcessorListener.afterBulk(FsCrawlerRetryBulkProcessorListener.java:51) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.execute(FsCrawlerBulkProcessor.java:148) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.executeIfNeeded(FsCrawlerBulkProcessor.java:126) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.add(FsCrawlerBulkProcessor.java:113) ~[fscrawler-framework-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.indexRawJson(ElasticsearchClient.java:411) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerDocumentServiceElasticsearchImpl.indexRawJson(FsCrawlerDocumentServiceElasticsearchImpl.java:82) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerDocumentServiceElasticsearchImpl.index(FsCrawlerDocumentServiceElasticsearchImpl.java:76) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexFile(FsParserAbstract.java:448) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:277) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:152) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
02:32:27,099 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Error for [ElasticsearchOperation{operation=INDEX, index='darknet', id='f3f48c3d9115879e924ce5c7121c1f10'}]: no write index is defined for alias [darknet]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index
02:32:27,099 WARN [f.p.e.c.f.f.b.FsCrawlerAdvancedBulkProcessorListener] Throttling is activated. Got [0] successive errors so far.
02:32:27,099 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(E:/Data/crawler_data/Test, E:/Data/crawler_data/Test/testing_out.txt) = /testing_out.txt
Apparently, no write index is defined for alias [darknet].
Check that you can actually do something like:
PUT /darknet/_doc/1
{
  "foo": "bar"
}
Until you can run that, FSCrawler won't be able to index the data.
Check how you created your rollover. There's probably something wrong there.
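For reference, a rollover alias normally has to be bootstrapped by creating the first index with is_write_index set to true on the alias; something like the following (index name hypothetical, matching the alias in this thread):

```json
PUT test-2024-000001
{
  "aliases": {
    "darknet": {
      "is_write_index": true
    }
  }
}
```

If that step was skipped, or the flag was set on a different alias, the error above is exactly what you would see.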
When I run the command, this error is thrown:
PUT /darknet/_doc/1
{
  "foo": "bar"
}
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "no write index is defined for alias [darknet]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "no write index is defined for alias [darknet]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index"
  },
  "status": 400
}
I have created the rollover based on primary shard size.
I have an index alias "darknet" and a rollover_alias "darkweb". Initially, I was indexing into test-2024-000001 with is_write_index set to true. Then "test-2024-000002" was created automatically and its is_write_index was set to true.
In the FSCrawler settings, elasticsearch.index is set to "darknet".
Please find below the test-2024-000002 index settings. is_write_index is present as true:
{
  "test-2024-000002": {
    "aliases": {
      "darknet": {},
      "darkweb": {
        "is_write_index": true
      }
    },
test-2024-000001 index settings:
{
  "test-2024-000001": {
    "aliases": {
      "darknet": {},
      "darkweb": {
        "is_write_index": false
      }
    },
Index test-2024-000001 has 25 documents.
Index test-2024-000002 has 0 documents; data is not being added even though is_write_index is true.
green open test-2024-000001 DcsJ5gt9QaCa-OtHddtKCg 1 0 25 0 568.1mb 568.1mb 568.1mb
green open test-2024-000002 fSGXCJMpQXGY3spyhJ-6Ew 1 0 0 0 249b 249b 249b
I think you need to use darkweb as the index name in fscrawler.
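In FSCrawler's _settings.json, that means pointing elasticsearch.index at the write alias rather than the plain alias. A minimal sketch using the names from this thread (job name assumed to be darknet, other settings omitted):

```json
{
  "name": "darknet",
  "fs": {
    "url": "E:/Data/crawler_data/Test"
  },
  "elasticsearch": {
    "index": "darkweb"
  }
}
```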
Thanks. The file is indexed now.
I need to give the rollover_alias name as the index name in the FSCrawler settings.
Can you confirm this?
I will delete and run it again to re-verify.
Yes. If you want to use rollover, that's the way to go I think. I never tried it though.
It could be a nice tip to add to the documentation.
Thanks David. It works perfectly.
I was not aware of the ILM poll interval earlier, which is why I had the doubt.
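For context: ILM only evaluates rollover conditions periodically. The cluster setting indices.lifecycle.poll_interval defaults to 10 minutes, so an index can sit above the size threshold for a while before it actually rolls over. For testing, the interval can be lowered temporarily:

```json
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
```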
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.