Full Content Search within documents in S3


(anonymous) #1

I have documents stored in S3 and am using Elasticsearch for indexing. I need to do a full-text search within the contents of the files in S3 using Node.js. Can you give me an example that covers this scenario?

Please note: I am not sure how to add the mapper plugin in Node.js.


(David Pilato) #2

There used to be an S3 river, https://github.com/lbroudoux/es-amazon-s3-river, but rivers are gone now.

This may be something I can support in the FSCrawler project: https://github.com/dadoonet/fscrawler

If you want you can open an issue there and I'll see if I can make it happen at some point. No ETA.

BTW: prefer the ingest-attachment plugin, as the mapper attachment plugin is now deprecated.


(David Pilato) #4

You need to do 3 things:

  1. Read the data from S3. That part is up to you: I can't tell you how to read data from S3 using Node.js, and it's outside the scope of this forum.
  2. Send it to Elasticsearch. There is a JS client for Elasticsearch, which should be easy to use.
  3. Extract the text. You can register an ingest pipeline for that; read the Elasticsearch guide. It's not specific to Node.js, it's a simple REST API call.
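
For illustration, steps 2 and 3 could look roughly like the sketch below. It is not an official example and makes several assumptions: the cluster has the ingest-attachment processor available, it runs Elasticsearch 7 or later (older versions need a type name in the URL instead of `_doc`), and Node.js 18+ provides the global `fetch`. The index name `s3-docs`, the pipeline id `attachment`, the field names `data` and `s3_key`, and all function names are made up here. Reading the object from S3 is left out; the code only expects the file bytes in a `Buffer`.

```javascript
// Sketch only: "s3-docs", "attachment", "data" and "s3_key" are assumed names.

// Ingest pipeline definition. The attachment processor reads base64 data from
// `field` and stores the extracted text under `attachment.content`.
function buildAttachmentPipeline(field = "data") {
  return {
    description: "Extract text from base64-encoded file contents",
    processors: [{ attachment: { field } }],
  };
}

// Document body for one S3 object. `fileBuffer` is a Buffer with the raw
// bytes, however you obtained them from S3.
function buildIndexBody(fileBuffer, s3Key) {
  return { s3_key: s3Key, data: fileBuffer.toString("base64") };
}

// Full-text query against the text extracted by the pipeline.
function buildSearchBody(text) {
  return { query: { match: { "attachment.content": text } } };
}

// Small helper around the global fetch (Node.js 18+).
async function esRequest(method, url, body) {
  const res = await fetch(url, {
    method,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${method} ${url} failed: ${res.status}`);
  return res.json();
}

// Register the pipeline, index one file through it, then search its contents.
async function indexAndSearch(esUrl, fileBuffer, s3Key, text) {
  await esRequest("PUT", `${esUrl}/_ingest/pipeline/attachment`,
    buildAttachmentPipeline());
  const id = encodeURIComponent(s3Key);
  // refresh=wait_for makes the document searchable before the call returns.
  await esRequest("PUT",
    `${esUrl}/s3-docs/_doc/${id}?pipeline=attachment&refresh=wait_for`,
    buildIndexBody(fileBuffer, s3Key));
  return esRequest("POST", `${esUrl}/s3-docs/_search`, buildSearchBody(text));
}
```

Something like `indexAndSearch("http://localhost:9200", buffer, "reports/2017.pdf", "invoice")` would then return the search response for that one file. In practice you would register the pipeline once and index each S3 object as you read it; the official JS client can replace the raw `fetch` calls.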
