From Coveo to Elasticsearch

Dear Community,

I have a web application built for enterprise search using Coveo. It indexes many sources (CSVs, Excel files, SharePoint, S3, APIs, etc.) using the default Coveo connectors plus a few custom connectors written in C#.

I want to build a POC of the same application and demonstrate the Elasticsearch capabilities with all of those sources.

Key Points:

  1. Data is not only incremental but also mutable. That is, already-indexed documents can change, and the index must be able to pick up those changes.

  2. I would like a single technology to connect to and index all of those data sources, be it the Python client or Logstash. What is the best option? I do not have any logs for now.

  3. One of the main reasons to move to Elasticsearch is that it is open source, whereas Coveo requires a license.
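For point 1 (mutable data), the usual Elasticsearch approach is an upsert: an `update` action with `doc_as_upsert`, so a changed record overwrites the previously indexed version and a brand-new record is simply created. A minimal sketch with the official Python client's bulk-action format — the index name, id scheme, and record shape here are my own placeholders:

```python
# Sketch: building _bulk upsert actions for mutable source records.
# Index name and id scheme are illustrative assumptions.

def upsert_action(index, doc_id, doc):
    """One bulk 'update' action with doc_as_upsert: if the document
    exists, the changed fields are merged in; if not, it is created."""
    return {
        "_op_type": "update",
        "_index": index,
        "_id": doc_id,
        "doc": doc,
        "doc_as_upsert": True,
    }

# One action per source record; a stable per-source id (here a made-up
# SharePoint item id) is what lets re-crawled records update in place.
actions = [
    upsert_action(
        "enterprise-search",
        "sharepoint-1234",
        {"title": "Quarterly report", "source": "sharepoint", "version": 2},
    ),
]

# These actions can then be sent with elasticsearch.helpers.bulk(es, actions).
```

The key design point is the stable `_id`: as long as each connector derives the same id for the same source record on every run, re-indexing is idempotent and mutations are applied automatically.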

BTW, I am an Elastic Certified Engineer, but I don't have much experience with the front-end stack. I am looking for advice from both the front-end and back-end perspectives.

I really appreciate your recommendations and suggestions.

Thank you!

Welcome!

Why not use Elastic Workplace Search?

It's designed for that use case.

Hi @dadoonet

Thank you for your response.

I'm not sure I can go with that in the actual project, though I could cover it to some extent with the Basic version.

So I just wanted to go with an open-source stack.

Up to you.

Note that Workplace Search is available in the free tier with the built-in Basic license.

If you want to re-implement all the crawlers yourself in a single tool, plus the UI, I'm not sure what advice I can give. As a Java developer, I'd probably go the Java route, but it's up to you.

For binary documents (PDF, etc.), you can use the ingest attachment plugin.

There's an example here: Using the Attachment Processor in a Pipeline | Elasticsearch Plugins and Integrations [7.10] | Elastic

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my_index/_doc/my_id

The data field is the Base64 representation of your binary file.

You can use FSCrawler. There's a tutorial to help you get started.
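For reference, an FSCrawler job is driven by a small settings file; a minimal sketch, assuming the FSCrawler 2.x layout (`~/.fscrawler/<job_name>/_settings.yaml`) — the job name, paths, and URL here are placeholders:

```yaml
# ~/.fscrawler/my_docs/_settings.yaml (hypothetical job "my_docs")
name: "my_docs"
fs:
  url: "/path/to/documents"   # local directory to crawl
  update_rate: "15m"          # re-scan interval, picks up changed files
elasticsearch:
  nodes:
  - url: "http://localhost:9200"
```

The job is then started with something like `bin/fscrawler my_docs`; check the FSCrawler documentation for the exact options of the version you install.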