Join queries or filter queries across indices

H-Soni · January 30, 2023, 10:31pm

Hi,
We are thinking of a document model, where we have data in multiple indices. However, when querying that data, we have to perform a join operation on these indices or to be more precise, filter an index's data using another data in another index. Is there a way to do this?

Eg::

Ticket Index-
[
	{
		"id": 1,
		"createdAt": "123456789",
		"updatedAt": "123456790"
	}
]

Customer Index-
[
	{
		"id": "53523",
		"name": "Mike",
		"email": "mike@gmail.com"
	}
]

Conversation Index-
[
	{
		"id": "541",
		"ticketId": "1",
		"data": "Hello"
	},
	{
		"id": "542",
		"ticketId": "2",
		"data": "Hello John"
	}
]

Q1. Now run queries on any of these indexes, and filter the result based on other indices.
Q2. I was thinking of creating a single index, wherein I store the documents related to a particular ticket, in the same index as an array.
Like, a ticket, having an array of conversations and an array of customers.
However, I am not really sure whether this is a correct idea or not.
Since, in this case, for updating the nested documents, I have to use a script every time we index or update nested documents, and our cluster is index heavy. To remove duplicates I am using a HashSet, but this indexing might be very very slow.
In ES 5.6.11, we were allowed, to have multiple types in the same index, as a result, we were able to perform Join queries, in the same index, which is not possible now.
Any workaround?

warkolm · January 31, 2023, 4:27am

Welcome to our community!

You cannot do joins in Elasticsearch, so your comment about working in 5.X is only because multiple types in an index were a hacky approach that we have since removed.

Personally I would change your approach and deal with this as time series data. Each ticket can have multiple records based on date, and each record contains all the info of the customer and then the relevant conversation data.

H-Soni · January 31, 2023, 5:27am

We cant convert it to time series data. Since we allow updates in any rolling window of 3 months. However, in time series data, the complete index will roll over, not allowing us to update the data in a rolling window of 3 months or even the tickets created in the past hour.

Also, are you referring to the conversation as an array of nested documents or separate documents, which will have 1-1 mapping with ticket data?

warkolm · January 31, 2023, 5:30am

Why update the records though? Just have a new record for whatever the change of state is. Then you can grab the latest record but still have access to previous states.

H-Soni · January 31, 2023, 5:33am

When we update, we don't have access to all the fields, but just the ones that are changed. Are you suggesting that we fetch the old document, combine its fields and the new one and save it as a new document or just save the changed fields as a new document, and while fetching, combine the old and new data?

warkolm · January 31, 2023, 5:37am

If that's the best way to do this approach, then yep.

I appreciate that there's a cost you need to incur there to make the changes, but there's also the cost to do all the code to join everything when you need it and it's a relatively complicated design approach to maintain.
So which is the better cost for you?

H-Soni · January 31, 2023, 6:03am

@warkolm Got it. But there will still be a problem with filtering data across indices, which is a major use case. Will talk with the team, and if any questions will revert here. Thank you.

warkolm · January 31, 2023, 6:03am

What is the problem you see?

H-Soni · January 31, 2023, 6:07am

So we have a dashboard, where you can search across all _types with any filter, there are about 50 or so filters. Earlier, in ES 5.6.11 since we had that hack of having multiple types in the same index, it was possible to filter data across types in the same index. We have 4 types and they all have different mappings. So it was possible to have a parent-child relationship earlier, between them and query and filter the results. Clearly, we'll have to restructure our data, since this isn't possible now.

warkolm · January 31, 2023, 6:09am

You can still filter on different fields in the merged index though?

H-Soni · January 31, 2023, 6:14am

Yes, that's what we are thinking of doing, like having a single merged document of a ticket, which will contain an array of customer field documents and an array of conversation documents. However, adding to the array without facing the problem of duplicates, requires me to write a painless script, which is needed to add elements to an array and will also remove the problem of duplicity.

However, since our cluster is index heavy, this will take a lot more time than it takes now in ES 5.6.

warkolm · January 31, 2023, 6:18am

That's my point

Don't do updates cause they are intensive. Do individual, write-once records that are super light, that you can then merge with aggregations or just read singularly and super fast.

H-Soni · January 31, 2023, 6:19am

Got it . Thanks for ur help.

system · February 28, 2023, 6:20am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Joins across heterogenous documents in an index Elasticsearch	2	277	March 22, 2023
Doubt Related using multiple indexes or Join Elasticsearch	2	369	June 21, 2019
Simple sql type query in multiple index Elasticsearch	4	583	December 24, 2017
Search across multiple indices for the same field Elasticsearch	9	1755	December 4, 2019
Join Possibilities for Nested / Parent-Child Elasticsearch	12	935	July 5, 2017

Join queries or filter queries across indices

Related topics