Searching local files/folders to test Elasticsearch

Is there a simple way to do full text search on a bunch of files in various formats (e.g. PDF, EPUB, TXT, DOC) - on locally connected storage? I see the connectors for the cloud services, I just want to do local search. Thanks!

You can use the ingest attachment plugin.

There's an example here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/using-ingest-attachment.html

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my_index/_doc/my_id

The data field is the Base64-encoded content of your binary file.
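To apply this to a folder of local files, each file has to be read, Base64-encoded, and sent through the pipeline. Below is a minimal Python sketch of that step using only the standard library. It assumes an unsecured node at http://localhost:9200 and reuses the my_index and attachment names from the example above. The helper names, the extension list, and the extra filename field are hypothetical choices for illustration, not part of the plugin.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumptions (not from the thread): a local, unsecured node at this URL.
ES_URL = "http://localhost:9200"
INDEX = "my_index"        # index name from the example above
PIPELINE = "attachment"   # pipeline name from the example above
EXTENSIONS = {".pdf", ".epub", ".txt", ".doc"}  # illustrative filter


def build_attachment_doc(path: Path) -> dict:
    """Build the request body: the file's bytes, Base64-encoded, in `data`."""
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    # `filename` is an extra, hypothetical field so hits can be traced
    # back to the file on disk; the attachment processor only reads `data`.
    return {"filename": path.name, "data": encoded}


def find_files(root: Path):
    """Yield files under `root` (recursively) with a matching extension."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            yield path


def index_file(path: Path, doc_id: str) -> None:
    """PUT one file through the ingest pipeline (needs a running cluster)."""
    url = f"{ES_URL}/{INDEX}/_doc/{doc_id}?pipeline={PIPELINE}"
    body = json.dumps(build_attachment_doc(path)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def index_tree(root: Path) -> None:
    """Index every matching file under `root`, using a counter as the id."""
    for number, path in enumerate(find_files(root)):
        index_file(path, str(number))
```

Calling index_tree(Path("/path/to/docs")) would then push every matching file through the pipeline. Each file is read fully into memory before it is sent, so this is only suitable for a quick local test with modestly sized files.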

You can use FSCrawler. There's a tutorial to help you get started.

FYI there's currently no way to do this with Enterprise Search, just traditional Elasticsearch and something like David's suggestion (which is why I moved it :slight_smile: ).

Actually there's a pending PR to connect FSCrawler to Workplace Search, which is what I'd have answered if I had seen the original question in #enterprise-search:workplace-search :wink:

I wrote about it in the advent calendar: Dec 5th, 2020: [EN] Searching anything, anywhere with Workplace Search


Ahh, thanks mate :slight_smile:
