I have FSCrawler working on a DEV box where the documents are located on the same server as FSCrawler and Elasticsearch. In the _settings.json file I just set the url to my document location, in the form "D:\MyDocs".
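For reference, the relevant part of that DEV _settings.json is shaped roughly like this (a minimal sketch rather than my exact file; the job name and update rate are just the ones I use, and note the backslash has to be doubled because of JSON string escaping):

    {
      "name" : "bamdocs",
      "fs" : {
        "url" : "D:\\MyDocs",
        "update_rate" : "15m"
      }
    }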
Now I'm moving Elasticsearch and FSCrawler onto a new server and placing the documents on a separate server. How should I format the value for url in my _settings.json file now?
It's a new dedicated server; we have never accessed the documents from it before. I don't really want to map a network drive from the server where FSCrawler will be installed to the server where the docs are stored. Our setup will be a three-server solution: server 1 is our web server, server 2 is our document server, and server 3 is our search server.
Yes, I'm setting up from scratch, so the question is what I put in the url to gain access to the drive on the other server. I'm installing FSCrawler on the Elasticsearch server, so it's going on server 3.
I've been given the green light to map the drive as a network drive, so I've done that (mapped as the E: drive) and in my settings file I have set the url to "E:\\". However, now when I try to run FSCrawler I receive a fatal error. Running the command with --debug, it says: failed to create elasticsearch client. Elasticsearch is up and running, and Kibana is reaching it fine.
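For context, the elasticsearch section of my settings file is shaped roughly like this (a sketch only; the host address shown here is a placeholder, not the real one I'm using):

    "elasticsearch" : {
      "nodes" : [ {
        "host" : "192.168.0.3",
        "port" : 9200
      } ],
      "index" : "bamindex"
    }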
I realised that in the settings file I had the REST service's IP address set incorrectly (a typo). After correcting this it gets further, but now it says that E:\ doesn't exist. (See the sketch after the log below for what I plan to try next.)
Here's the log:
11:58:05,843 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/_settings.json] already exists
11:58:05,858 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/_settings_folder.json] already exists
11:58:05,858 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/_settings.json] already exists
11:58:05,858 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/_settings_folder.json] already exists
11:58:05,858 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [6/_settings.json] already exists
11:58:05,858 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
11:58:05,858 DEBUG [f.p.e.c.f.FsCrawler] Starting job [bamdocs]...
11:58:08,166 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Using elasticsearch >= 5, so we can use ingest node feature
11:58:08,651 WARN  [f.p.e.c.f.FsCrawler] We found old configuration index settings in [C:\Program Files\Elastic\FsCrawler\Jobs] or [C:\Program Files\Elastic\FsCrawler\Jobs\bamdocs\_mappings]. You should look at the documentation about upgrades: https://github.com/dadoonet/fscrawler#upgrade-to-23
11:58:08,651 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
11:58:08,651 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] FS crawler connected to an elasticsearch [5.6.3] node.
11:58:08,651 DEBUG [f.p.e.c.f.c.ElasticsearchClient] create index [bamindex]
11:58:08,682 DEBUG [f.p.e.c.f.c.ElasticsearchClient] create index [bamdocs_folder]
11:58:08,697 DEBUG [f.p.e.c.f.FsCrawlerImpl] creating fs crawler thread [bamdocs] for [E:\] every [15m]
11:58:08,697 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started for [bamdocs] for [E:\] every [15m]
11:58:08,697 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler thread [bamdocs] is now running. Run #1...
11:58:08,713 WARN  [f.p.e.c.f.FsCrawlerImpl] Error while crawling E:\: E:\ doesn't exists.
11:58:08,713 WARN  [f.p.e.c.f.FsCrawlerImpl] Full stacktrace
java.lang.RuntimeException: E:\ doesn't exists.
at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.run(FsCrawlerImpl.java:325) [fscrawler-2.4.jar:?]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_152]
11:58:08,713 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler is stopping after 1 run
11:58:09,492 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [bamdocs]
11:58:09,492 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:58:09,492 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler Rest service stopped
11:58:09,492 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
11:58:09,492 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
11:58:09,492 DEBUG [f.p.e.c.f.c.ElasticsearchClient] REST client closed
11:58:09,492 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
11:58:09,508 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [bamdocs] stopped
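One thing I plan to try: since mapped drive letters on Windows belong to the logon session that created them, the E: mapping may simply not be visible to the session or account that FSCrawler runs under, which would explain "E:\ doesn't exists". Instead of the mapped drive, I could point the url straight at a UNC path. A rough sketch of what the fs section would look like (the server and share names "docserver" and "MyDocs" are placeholders for our real ones, and each backslash is doubled because of JSON escaping):

    {
      "name" : "bamdocs",
      "fs" : {
        "url" : "\\\\docserver\\MyDocs",
        "update_rate" : "15m"
      }
    }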