Should FS crawler be used on the same server where elasticsearch is installed?


(Edu) #1

Dear All,

I started working with elasticsearch last week, so I apologize if I am asking something too obvious. I have to develop a fast solution that lets users search inside MS-Office files.

I have two docker containers: one with the Django backend, which can access the users' files, and another running elasticsearch-6 (host=elasticsearch:9200), where I have installed the "ingest-attachment" plugin.

I am using elasticsearch_dsl and django_elasticsearch_dsl on the django server. However, I could not find a clear explanation of how to upload a file to the elasticsearch-6 (and I stress the SIX) server for indexing (only indexing the content, not storing it). The explanations that I found use "Attachment", which works only with elasticsearch 5.

Then I found this nice project, FS crawler, which promises that I could use a single API call like: curl -F "file=@test.txt" -F "id=my-test" "http://127.0.0.1:8080/fscrawler/_upload".

However, it is still not clear to me how it works.
Is FS crawler a plugin for elasticsearch or a stand-alone program just to create indexes?
Should I install FS crawler in the elasticsearch docker container where I want the indexes to live?
Should I install FS crawler in the django docker container where the files are accessible?

I would like to simply call this endpoint from my Django server:
curl -F "file=@test.txt" -F "id=my-test" "http://elasticsearch:8080/fscrawler/_upload"
and then search through "http://elasticsearch:9200/" using the elasticsearch_dsl methods. Is that possible?

Any help would be appreciated, I am a little lost here...
Best regards
Ed


(David Pilato) #2

It's a stand-alone application which should run within its own container.
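
For illustration only, here is a rough sketch of how the Django side might send a file to such a separate container, mirroring the curl call quoted in the question (form fields "file" and "id", path /fscrawler/_upload). The hostname "fscrawler", the port 8080 and the helper names build_multipart and upload_to_fscrawler are assumptions made up for this example, not part of FS crawler itself, and it presumes the REST service of FS crawler is enabled and reachable from the Django container. It uses only the Python standard library.

```python
import os
import urllib.request
import uuid

# Assumption: FS crawler runs in its own container, reachable from Django
# under the hostname "fscrawler", with its REST service listening on 8080.
FSCRAWLER_UPLOAD_URL = "http://fscrawler:8080/fscrawler/_upload"


def build_multipart(filename, content, doc_id, boundary=None):
    """Encode a file and an id as multipart/form-data, as `curl -F` does."""
    boundary = boundary or uuid.uuid4().hex
    id_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="id"\r\n\r\n'
        f"{doc_id}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = id_part + file_header + content + closing
    return body, f"multipart/form-data; boundary={boundary}"


def upload_to_fscrawler(path, doc_id, url=FSCRAWLER_UPLOAD_URL):
    """POST one local file to the upload endpoint and return the raw reply."""
    with open(path, "rb") as handle:
        content = handle.read()
    body, content_type = build_multipart(os.path.basename(path), content, doc_id)
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Something like upload_to_fscrawler("/data/test.txt", "my-test") would then play the role of the curl line, while searching would still go to "http://elasticsearch:9200/" through elasticsearch_dsl as before.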