I have configured FSCrawler 2.10 on Ubuntu and it works great, but I have some minor questions:
a. Why is touch required? I understand that the timestamp on copied or uploaded files in the directory has to be new, so FSCrawler only looks for newer ones. But why? Why doesn't it just pick up and crawl whatever new file is uploaded, regardless of its date?
b. REST API: I uploaded a file using multipart form data and got the details below:
My concern is: where is this temp2.doc file physically uploaded? I can't find the file at the fs.url location, so where does it really get uploaded?
The Elasticsearch URL can be used to view the details from the output above, but why is this upload not shown in the Discover section of Kibana?
c. Kibana Discover shows only the files copied manually into the directory. It doesn't detect or show the ones uploaded through the REST API. Why is that?
That's because of the current implementation, which compares file dates with the date of the last run. I'd like to implement a WatchService instead, but it's not there yet.
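To illustrate why touch makes a difference, here is a minimal sketch of a date-based scan. It is not FSCrawler's actual code: files_to_index and demo are hypothetical helpers, and the real crawler's checks are more involved. It only shows that a file copied with an old modification time is skipped until its timestamp moves past the last run.

```python
import os
import tempfile
import time
from pathlib import Path


def files_to_index(directory: str, last_run: float) -> list[str]:
    """Return names of files modified after the previous scan (hypothetical helper)."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.stat().st_mtime > last_run
    )


def demo() -> tuple[list[str], list[str]]:
    """Show the scan result before and after the file's timestamp is refreshed."""
    with tempfile.TemporaryDirectory() as tmp:
        last_run = time.time()
        copied = Path(tmp) / "copied.doc"
        copied.write_text("content")

        # Simulate a copy that kept an old modification time (e.g. cp -p).
        os.utime(copied, (last_run - 3600, last_run - 3600))
        before_touch = files_to_index(tmp, last_run)

        # Simulate `touch`: move the modification time past the last run.
        os.utime(copied, (last_run + 1, last_run + 1))
        after_touch = files_to_index(tmp, last_run)

        return before_touch, after_touch
```

Calling demo() returns an empty list for the first scan and ["copied.doc"] for the second one, which is the behaviour you are seeing with touch.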
It's not uploaded anywhere, as sadly Elasticsearch does not have an S3-like binary blob store. The idea is that you share the source file somewhere using an HTTP server.
If you want, you can activate the store_source option.
But I don't recommend it unless you are storing very tiny documents (a few kilobytes at most).
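To give a rough idea of the cost, here is a small sketch that assumes the source file ends up embedded in the indexed document as Base64 text, which is my understanding of what store_source does. stored_size is a hypothetical helper, and the figures only cover the encoding overhead, not what Elasticsearch adds on top.

```python
import base64


def stored_size(raw: bytes) -> int:
    """Size in bytes of the Base64 text that would be embedded in the document."""
    return len(base64.b64encode(raw))


# A 3 KB file grows to 4 KB of text inside the document.
small = stored_size(b"\0" * 3000)

# A 3 MiB file grows to 4 MiB, repeated for every document indexed that way.
large = stored_size(b"\0" * (3 * 1024 * 1024))
```

Base64 turns every 3 bytes into 4 characters, so each stored file is about a third larger than the original, and that text travels with the document on every index or fetch of the source.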
My guess is that Kibana uses a date which is not available in the case of the REST interface. Could you check whether you are using a date field in the Kibana index pattern, and which one it is?
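One way to test that guess is to fetch one crawled document and one REST-uploaded document and compare which date fields they carry. The sketch below does this on two made-up documents: the field values are invented for illustration, I have not verified which fields the REST endpoint actually fills in, and get_field and missing_time_field are hypothetical helpers.

```python
from typing import Any, Optional


def get_field(doc: dict[str, Any], dotted: str) -> Optional[Any]:
    """Resolve a dotted field name such as 'file.created' in a nested dict."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def missing_time_field(docs: list[dict[str, Any]], time_field: str) -> list[str]:
    """Return the file names of documents that lack the given time field."""
    return [
        str(get_field(doc, "file.filename"))
        for doc in docs
        if get_field(doc, time_field) is None
    ]


# Made-up sample documents, NOT real FSCrawler output.
CRAWLED = {
    "file": {
        "filename": "manual.doc",
        "created": "2023-01-10T09:00:00.000+00:00",
        "indexing_date": "2023-01-10T09:05:00.000+00:00",
    }
}
REST_UPLOADED = {
    "file": {
        "filename": "temp2.doc",
        "indexing_date": "2023-01-10T09:06:00.000+00:00",
    }
}

hidden = missing_time_field([CRAWLED, REST_UPLOADED], "file.created")
```

With these sample documents, hidden contains only "temp2.doc". A document that lacks the time field chosen for the index pattern does not show up in Discover, so if your real documents look like this, picking a date field that both kinds of document contain should make the REST uploads visible.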
I appreciate your time and the clarifications. I just need your advice on a few more points:
"It's not uploaded anywhere, as sadly Elasticsearch does not have an S3-like binary blob store. The idea is that you share the source file somewhere using an HTTP server.
If you want, you can activate the store_source option. But I don't recommend it unless you are storing very tiny documents (a few kilobytes at most)."
My expectation is that a file uploaded through the REST API is also stored on the file system, e.g. in "/home/testweb/fscrawler-2.10/uploads", in addition to being indexed in ES, and that the data in ES reflects this uploaded file location in:
"path": {
  "virtual": "temp2.doc",
  "real": "temp2.doc"
}
Regarding Kibana, I am using the time field 'file.created'. Should I be using index_created instead?