I'm new to Elasticsearch. I have successfully installed Elasticsearch with Kibana, X-Pack and the ingest-attachment plugin, and I have both Elasticsearch and Kibana running. I have kept the install simple for the moment, using the default options on a Windows 2012 server. I have a directory on another drive, w:\mydocs; at the moment it just holds 3 plain text files, but I will want to add other file types like pdf and doc. So now I want to get these files into an Elasticsearch index.
I then try to get my document into the index in the following way:

PUT localhost:9200/documents/1?pipeline=docs -d @/w/mydocs/README.TXT
Looking around the net, it seems I somehow have to get my document into Base64 first, but then I read that in this release of Elastic the document itself isn't stored, only the reference to it. When I come to set Elastic up for live we will have thousands of documents; I can't believe that I would need to encode all of them to Base64 first, but if I do have to, how do I do it? My PUT statement above just fails.
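From what I've pieced together, the manual route from PowerShell would look something like the sketch below. This assumes an ingest pipeline named docs already exists, that its attachment processor reads from a field called data, and a pre-7.x style documents/doc/1 document URL; all of those names are just my placeholders.

# Read the file and Base64-encode it
$bytes  = [System.IO.File]::ReadAllBytes("W:\mydocs\README.TXT")
$base64 = [System.Convert]::ToBase64String($bytes)

# Wrap it in the JSON body the attachment processor expects
$body = @{ data = $base64 } | ConvertTo-Json

# Send it through the ingest pipeline
Invoke-RestMethod -Method Put `
  -Uri "http://localhost:9200/documents/doc/1?pipeline=docs" `
  -ContentType "application/json" `
  -Body $body

Looping that over thousands of files would work, but it is exactly the kind of job a crawler is designed to automate.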
I have downloaded FSCrawler 2.3 from GitHub. Reading the installation guide, it doesn't mention anything about installing in a Windows environment. Do I just need to unzip it to a subdirectory within Elasticsearch? My Elasticsearch is at the default endpoint of http://localhost:9200 at the moment. How do I let one know about the other?
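From what I can tell, the connection lives in the FSCrawler job's _settings.json rather than anywhere in Elasticsearch itself. A sketch of roughly what mine would need to contain, with my own path, host and port filled in:

{
  "name" : "mydocs",
  "fs" : {
    "url" : "w:/mydocs",
    "update_rate" : "15m"
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "localhost",
      "port" : 9200,
      "scheme" : "HTTP"
    } ]
  }
}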
Hi David, I unpacked FSCrawler to my C:\Program Files\Elastic\FScrawler directory and then, using PowerShell, I went to the bin directory and ran .\fscrawler mydocs --loop 1. The job didn't exist, so I said yes to creating it. It has created a directory, .fscrawler, under C:\Users\wCrawley. Can I move this directory to a more appropriate place, and will it still run?
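From the FSCrawler docs it looks like there is a --config_dir option for exactly this, so a moved configuration directory should still be picked up with something like the following (the target path is just an example):

.\fscrawler mydocs --loop 1 --config_dir "C:\Program Files\Elastic\FScrawler\config"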
Can I point you to this issue that I've raised on Stack Overflow? You will probably have to expand my screenshot, but I'm getting 3 errors when I run FSCrawler. The directory that I'm pointing to has 3 text files: fscrawler issue
I seem to have resolved the Java error. In the _settings.json file, in the elasticsearch section, I added a "pipeline" node. Now when I run it from the PowerShell window all looks OK, but if I flip over into Kibana and enter GET myindex/_search it doesn't show anything.
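For anyone following along, the elasticsearch section of my _settings.json now looks roughly like this (myindex and docs are just the names I chose earlier; I can't vouch for this being the canonical layout):

"elasticsearch" : {
  "nodes" : [ {
    "host" : "localhost",
    "port" : 9200,
    "scheme" : "HTTP"
  } ],
  "index" : "myindex",
  "pipeline" : "docs"
}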
Now when I run FSCrawler I do not receive any errors. In the _settings.json file I have removed the "pipeline" setting from the elasticsearch section, but left the "index" setting as myindex. In the _status.json file that has been created in the same place as the _settings.json file, the indexed value is set to 0. Where do the FSCrawler logs reside? I cannot find anything obvious.
If I use Kibana and do:

GET / myindex
{ "query" : { "match_all": {} } }
I receive the following error in the output window:

{ "error": "Content-Type header [text/plain] is not supported", "status": 406 }
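I believe the form the Kibana console actually expects is the path without a host and with _search appended, i.e.:

GET myindex/_search
{
  "query" : { "match_all" : {} }
}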
I realised that I hadn't set the JAVA_HOME variable correctly. I have set it as it should be and re-run my fscrawler command. Still no errors after stopping and starting Elasticsearch and Kibana. Back in Kibana, if I enter the following:

GET localhost:9200/myindex
{ "query" : { "match_all" : {} }}
then in the output window:
What I'm trying to see in the output window is the data from the 3 files that have been indexed. When I ran the fscrawler command in my PowerShell window I also added --debug, and that clearly showed that my 3 files had been found.
Interesting: introducing --restart states that it cannot access _status.json as it is being used by another process. I stopped the Kibana server in case it was that, but it made no difference. Where would it be logging to? I can't see anything other than the output window of PowerShell.
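As far as I can tell, FSCrawler 2.3 writes its log only to the console, so one way to keep a log file is to redirect all of PowerShell's output streams when starting it (the C:\temp path is just an example):

.\fscrawler mydocs --loop 1 --debug *> C:\temp\fscrawler.log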