Federated Search With ES

Hi,

I am trying to design federated search with Elasticsearch and am struggling with how to design it the right way. My use case is below; any help or direction is highly appreciated.

Use case: I have a system "A" that contains descriptive metadata about a video, and a system "B" that contains the actual video and its technical metadata. Each system has its own search use case, and there is now a use case to build a new system that searches against both at the same time. The two are related through a unique identifier that connects their records.

My question is: should I put the data from each system in a different index and then use a service that gives me an aggregated result at run time, or should I put the data from both into the same index (i.e. aggregate it at index time) and search against the already aggregated data? Just an FYI: although the use case I shared has only 2 systems, what I am really talking about is 10 different systems, each containing different metadata about the video and all connected through the unique identifier.

In other words, I am trying to build an enterprise-level search that searches across all systems available in the enterprise and provides an aggregated view. How can I achieve this with Elasticsearch?

Thanks in advance for any help or direction.

Whether to use different indices or a single index depends on the data model you want to use. If you design your own model, you are free to decide. If you must use the data models from the source systems, you are bound to use different indices (unless the source systems share a common model).
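To make the separate-indices option a bit more concrete, here is a minimal sketch of aggregating at run time. It assumes a single search across several indices has already returned hits in the usual shape (`_index`, `_score`, `_source`), and that every document carries the shared identifier in a field called `video_id`. The function name, the index names, and that field name are all hypothetical, so adapt them to your data.

```python
def federate_hits(hits, id_field="video_id"):
    """Group hits from several indices into one result per shared identifier.

    Assumption: each hit is a dict with "_index", "_score" and "_source",
    and "_source" contains the shared identifier under `id_field`.
    """
    combined = {}
    for hit in hits:
        source = hit["_source"]
        key = source[id_field]
        entry = combined.setdefault(
            key, {id_field: key, "score": 0.0, "systems": {}}
        )
        # Keep each system's fields under its index name to avoid collisions.
        entry["systems"][hit["_index"]] = {
            k: v for k, v in source.items() if k != id_field
        }
        # Rank the combined result by its best-scoring part (one possible choice).
        entry["score"] = max(entry["score"], hit.get("_score") or 0.0)
    return sorted(combined.values(), key=lambda e: e["score"], reverse=True)


# Sample hits, standing in for a real multi-index search response.
hits = [
    {"_index": "system_a", "_score": 1.2,
     "_source": {"video_id": "v1", "title": "Launch keynote"}},
    {"_index": "system_b", "_score": 0.7,
     "_source": {"video_id": "v1", "codec": "h264"}},
    {"_index": "system_b", "_score": 2.5,
     "_source": {"video_id": "v2", "codec": "vp9"}},
]
results = federate_hits(hits)
```

Note that scores from different indices are not strictly comparable, so taking the maximum is only a rough ranking heuristic, and the join cost is paid on every query.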

Whether to use pre-aggregated data depends on your willingness and ability to write extra code that performs ETL before the data reaches Elasticsearch. The acronym "ETL" means extract-transform-load and is popular in data processing. So: extract data from system "A", system "B", etc., transform it to your requirements, and load the JSON data into Elasticsearch. If you are lucky, Logstash, Elasticsearch plugins, or ingest nodes/filters can help you.
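As a rough illustration of the ETL route, the sketch below merges records from several systems into one document per video and renders them in the newline-delimited format that the Elasticsearch `_bulk` API accepts. The extract step is stubbed with in-memory data, and the system names, the `video_id` key, and the `videos` index name are assumptions made up for this example.

```python
import json
from collections import defaultdict


def merge_by_video_id(sources):
    """Transform: fold records from every system into one document per video.

    Assumption: `sources` maps a system name to a list of records, and each
    record carries the shared identifier under the key "video_id".
    """
    merged = defaultdict(dict)
    for system_name, records in sources.items():
        for record in records:
            fields = {k: v for k, v in record.items() if k != "video_id"}
            # Namespace fields by system so differing data models cannot collide.
            merged[record["video_id"]][system_name] = fields
    return dict(merged)


def to_bulk_ndjson(docs, index_name):
    """Load: build a _bulk request body (one action line, then one source line)."""
    lines = []
    for video_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": video_id}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


# Extract: stubbed here with sample data instead of real calls to each system.
sources = {
    "system_a": [{"video_id": "v1", "title": "Launch keynote"}],
    "system_b": [{"video_id": "v1", "codec": "h264", "duration_s": 1800}],
}
docs = merge_by_video_id(sources)
bulk_body = to_bulk_ndjson(docs, "videos")
```

Using the shared identifier as the document `_id` means a later load for the same video replaces the earlier document, so each run has to send the fully merged document (or use partial updates instead, which this sketch does not cover).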

Thanks for the feedback. Every system has its own data model, and I am okay with writing some code to do ETL. Is that a good practice? The question is whether to do it at index time or at run time. What is the ideal way to achieve this?
