I downloaded the zip file and configured it the same way.
I see the error below when starting FSCrawler. Any suggestions?
00:33:01,568 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [1.9gb/29.9gb=6.35%], RAM [262.2gb/314.5gb=83.38%], Swap [49.9gb/49.9gb=100.0%].
00:33:01,808 WARN [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
00:33:01,808 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
java.lang.IllegalArgumentException: HTTP Host may not be null
at org.apache.http.util.Args.containsNoBlanks(Args.java:81) ~[httpcore-4.4.13.jar:4.4.13]
at org.apache.http.HttpHost.create(HttpHost.java:108) ~[httpcore-4.4.13.jar:4.4.13]
at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.lambda$buildRestClient$1(ElasticsearchClientV7.java:385) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.buildRestClient(ElasticsearchClientV7.java:385) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:141) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:257) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
00:33:01,817 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [dba_docs] stopped
00:33:01,818 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [dba_docs] stopped
Please format your code, logs or configuration files using the </> icon as explained in this guide, not the citation button. It will make your post more readable.
I will restart it again and confirm the output.
Meanwhile, could you please let me know whether FSCrawler can add a link to a document's source location and pass it to Elasticsearch?
Here is an example scenario:
--> I index a PDF document into Elasticsearch.
--> The original PDF is available on SharePoint or some other external location.
--> I would like the indexed document to carry a link to that source.
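To make the scenario concrete, here is a minimal sketch of one possible workaround. It is not a confirmed FSCrawler feature. It assumes the document has already been indexed and then adds the link afterwards through the Elasticsearch partial-update endpoint (`POST /<index>/_update/<id>`). The field name `source_url`, the document id, the SharePoint URL, and the assumption that the index is named after the job (`dba_docs`) are all hypothetical placeholders.

```python
import json
from urllib import request


def build_update_request(es_url, index, doc_id, source_url):
    """Build (but do not send) a partial-update request that adds a link field.

    `source_url` is a hypothetical field name chosen for this sketch;
    it is not something FSCrawler defines.
    """
    body = {"doc": {"source_url": source_url}}
    return request.Request(
        url=f"{es_url}/{index}/_update/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: adjust the node address, index, id, and link to your setup.
req = build_update_request(
    "http://127.0.0.1:9200",
    "dba_docs",
    "HYPOTHETICAL_DOC_ID",
    "https://sharepoint.example.com/docs/report.pdf",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running cluster: request.urlopen(req)
```

The request is only built and printed here, so the snippet runs without a cluster. Sending it requires knowing the id FSCrawler assigned to the document, which this sketch does not look up.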