Ideas to normalize and join

Hello everyone,

I am working on a project to integrate two existing indexes (30-50M records each) into one by joining them on a common field. Unfortunately, as I understand it, Elasticsearch does not support SQL-style joins. I have looked into parent-child joins with an alias, but that does not seem to fit my use case, since the data already exists in the indexes (maybe I don't understand joins well enough - I would appreciate input if I am wrong).

I would love to hear from the community whether someone has found a way around this, such as extracting chunks of data from Elasticsearch, processing them, and reinserting the results. I have tried building an API for this, but due to the number of requests and the processing involved, it was too much for the system to handle and response times became unbearable.
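For reference, here is a minimal sketch of that chunked extract-process-reinsert idea, using the official Python client's `scan` and `bulk` helpers. The index names, the join field `customer_id`, and the target index `combined` are invented for illustration:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch("http://localhost:9200")

# Pass 1: build a lookup table from the first index, keyed on the join field.
# NOTE: at 30-50M records this dict may not fit in memory; partition the scan
# by key range (or use Spark, as discussed below) if it does not.
lookup = {
    hit["_source"]["customer_id"]: hit["_source"]
    for hit in scan(es, index="index_a", query={"query": {"match_all": {}}})
}

# Pass 2: stream the second index, merge in the matching record, bulk-insert.
def merged_docs():
    for hit in scan(es, index="index_b", query={"query": {"match_all": {}}}):
        doc = hit["_source"]
        doc.update(lookup.get(doc["customer_id"], {}))
        yield {"_index": "combined", "_source": doc}

bulk(es, merged_docs())
```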

Would it make sense to pull the data into Hadoop HDFS, use something like Hive or Spark to perform the processing I require, and push the result back into Elasticsearch using elasticsearch-hadoop?
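Something along these lines, perhaps - a hypothetical PySpark sketch using the elasticsearch-hadoop connector. It assumes the connector jar is on the Spark classpath; the index names, join field, and cluster address are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-join").getOrCreate()

def read_index(name):
    # The connector exposes Elasticsearch indexes as Spark SQL data sources.
    return (spark.read.format("org.elasticsearch.spark.sql")
            .option("es.nodes", "localhost:9200")
            .load(name))

df_a = read_index("index_a")
df_b = read_index("index_b")

# Join on the common field and write the combined result back to a new index.
(df_a.join(df_b, on="customer_id", how="inner")
     .write.format("org.elasticsearch.spark.sql")
     .option("es.nodes", "localhost:9200")
     .mode("append")
     .save("combined"))
```

This sidesteps the request-per-document problem you hit with the API approach: the join happens in bulk inside Spark, and writes go back through the connector's batched bulk indexing.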

Thanks.

The best approach is often to denormalize and store the parent data with each child, rather than trying to mimic relational concepts with parent-child or nested documents. Assuming the parent data is not updated frequently, this takes up a bit more space but offers simpler and often faster queries. It naturally requires reindexing.
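Concretely, denormalizing here just means copying the parent's fields onto each child document at index time, so a single query against one index can filter on both. A tiny sketch, with field names invented for illustration:

```python
# Hypothetical parent and child records sharing a join key.
parent = {"customer_id": "c42", "name": "Acme", "region": "EMEA"}
child = {"customer_id": "c42", "order_id": "o1001", "total": 250.0}

# Denormalized document stored in the combined index: the parent's fields
# are duplicated onto the child, so a query like "orders over 100 in EMEA"
# needs no join at search time.
denormalized = {**child, "name": parent["name"], "region": parent["region"]}
```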

The issue with this is that the data comes from different sources at different times, so the indexes are updated quite frequently. There is also a need to keep the indexes separate, as well as to build a combined one for analytical purposes.
