Document Preview in Elasticsearch

Hi There,
I am new to Elasticsearch, though I have 10 years of experience with SharePoint search. I wanted to know if there is any out-of-the-box (OOB) feature available to preview documents that appear in search results instead of actually opening them.

I searched for APIs, but it seems there are none. Is anyone aware of one? Or, if we have to build something custom, what should we start with?

Preview is an important feature of our existing search application, including highlighting of the searched keyword and its synonyms in the preview pane.

Any pointers are highly appreciated.

Best
Sagar

Welcome!

I moved your question to #enterprise-search:workplace-search as I think there are some features available in Workplace Search for this.

Thanks, @dadoonet!

Hi, @sagar.pattnayak! Elastic Workplace Search comes with a SharePoint Online connector, out-of-the-box. You can read the documentation for it here: https://www.elastic.co/guide/en/workplace-search/current/workplace-search-sharepoint-online-connector.html

When you first connect, almost immediately you'll start seeing entries appear in our search UI that do not have preview/sample text. Not to worry! A background process is running that will (more slowly) extract the textual content of your files and make it visible in the search UI. You'll get the text highlighting and sample text that you're used to from other search experiences. How long this takes depends on your data size, but it should be in the range of minutes to hours.

Hopefully that helps!

Hi @Sean_Story
Thanks for your response. However, our requirement is a bit different. SharePoint is just one of the source systems for our search application. We also have documents from other sources (VeevaVault, OpenText Documentum, FileShare) in different file formats such as pdf, doc/x, xls/x, ppt/x, and the customer wants a preview for all of them.

Also, we are using Elastic only as an index. We have our own UI component, into which we need to integrate preview for all sources and all of the above file types.

You can use the Workplace Search API to run your search, so you can integrate it easily into your own application. See:

But if you don't want to use Workplace Search and its content sources (see below), then you need to do "manually" everything that Workplace Search does out of the box:

  • Read from the source (VeevaVault, OpenText Documentum, FileShare, ...)
  • Extract the text and metadata information (See below)
  • Generate a preview
  • Index all that information in Elasticsearch
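
For the "Generate a preview" step, here is a minimal sketch of one possible approach. It assumes the preview is a plain-text snippet cut from the already extracted text, with the searched keyword and any synonyms you pass in wrapped in <em> tags. The function name make_preview and its parameters are hypothetical, not part of any Elastic API, and rendering the original file (PDF pages, slides, ...) is a separate problem this does not cover.

```python
import re
from typing import Sequence


def make_preview(text: str, terms: Sequence[str], width: int = 60) -> str:
    """Return a snippet around the first matching term, with matches in <em> tags.

    `terms` holds the searched keyword plus any synonyms the caller supplies.
    `width` is the number of characters kept on each side of the first match.
    """
    if not terms:
        # Nothing to highlight: fall back to the start of the text.
        return text[:width]

    # Case-insensitive alternation of all terms, escaped so they match literally.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    match = pattern.search(text)

    if match is None:
        # No term found: fall back to the start of the text.
        return text[:width]

    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    snippet = text[start:end]

    # Wrap every occurrence inside the snippet, keeping the original casing.
    return pattern.sub(lambda m: f"<em>{m.group(0)}</em>", snippet)


if __name__ == "__main__":
    print(make_preview("The quick brown fox", ["quick", "fast"], width=4))
```

The snippet could be stored alongside the document at index time, or computed at query time in your own UI component once you know the search terms.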

To extract the text and metadata, you can use the ingest attachment plugin.

There is an example here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/using-ingest-attachment.html

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my_index/_doc/my_id

The data field is basically the BASE64 representation of your binary file.
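
As an illustration of how that value can be produced, here is a small sketch using only the Python standard library. The helper name build_attachment_doc is hypothetical, and the sample bytes are simply the small RTF document that the Base64 string in the example above decodes to. In practice you would read the bytes from your own file and send the resulting body with whatever HTTP or Elasticsearch client you already use.

```python
import base64
import json


def build_attachment_doc(raw_bytes: bytes) -> str:
    """Build the JSON body for PUT my_index/_doc/<id>?pipeline=attachment."""
    # The attachment processor reads Base64 text from the "data" field.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({"data": encoded})


# Sample content: a tiny RTF document (stand-in for a real file's bytes,
# e.g. the result of open(path, "rb").read()).
sample = b"{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"
body = build_attachment_doc(sample)

if __name__ == "__main__":
    print(body)
```

After indexing, the extracted text and metadata end up under the attachment field of the document, which is what the GET request above returns.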

You can use FSCrawler. There's a tutorial to help you get started.

Thanks @dadoonet
I will try this and see.
