Is there a simple way to do full-text search on a bunch of files in various formats (e.g. PDF, EPUB, TXT, DOC) on locally connected storage? I see the connectors for the cloud services, but I just want to do local search. Thanks!
You can use the ingest attachment plugin.
There's an example here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/using-ingest-attachment.html
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my_index/_doc/my_id
The data field is basically the BASE64 representation of your binary file.
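For local files, producing that BASE64 value could look roughly like the sketch below. It is only an illustration, not an official client: the helper names build_attachment_body and index_file are made up for this example, and it assumes an unsecured node at http://localhost:9200 with the attachment pipeline above already created. It uses only the Python standard library.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_attachment_body(path):
    """Return the request body for the pipeline: the file's raw bytes as Base64."""
    raw = Path(path).read_bytes()
    return {"data": base64.b64encode(raw).decode("ascii")}


def index_file(path, doc_id, index="my_index", base_url="http://localhost:9200"):
    """PUT one local file through the 'attachment' pipeline.

    Assumption: a local node without authentication or TLS; adjust base_url
    and add credentials for a secured cluster.
    """
    body = json.dumps(build_attachment_body(path)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_doc/{doc_id}?pipeline=attachment",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling index_file("/path/to/report.pdf", "report-1") would then send the same kind of request as the PUT my_index/_doc/my_id example, with the path and document ID here being placeholders. Note that the whole file is read into memory, so this is only reasonable for modestly sized documents.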
You can use FSCrawler. There's a tutorial to help you get started.
FYI, there's currently no way to do this with Enterprise Search, only with traditional Elasticsearch and something like David's suggestion (which is why I moved it).
Actually, there's a pending PR to connect FSCrawler to Workplace Search, which is what I'd have answered if I'd known the original question was in #enterprise-search:workplace-search.
I wrote about it in the advent calendar: Dec 5th, 2020: [EN] Searching anything, anywhere with Workplace Search
Ahh, thanks mate