Hi,
We are designing a document model in which our data lives in multiple indices. When querying that data, we need to perform a join across these indices or, to be more precise, filter one index's data using data from another index. Is there a way to do this?
Q1. Can we run queries on any one of these indices and filter the results based on the other indices?
Q2. I was thinking of creating a single index in which the documents related to a particular ticket are stored as arrays inside the ticket document.
For example, a ticket would have an array of conversations and an array of customers.
However, I am not sure whether this is the right approach.
In this case, I would have to use a script every time we index or update the nested documents, and our cluster is index heavy. I am using a HashSet in the script to remove duplicates, but I expect this indexing to be very slow.
In ES 5.6.11 we were allowed to have multiple types in the same index, so we could run join queries within a single index. That is no longer possible.
Any workaround?
You cannot do joins in Elasticsearch. What you had working in 5.x only worked because multiple types in an index were a hacky approach that we have since removed.
Personally I would change your approach and deal with this as time series data. Each ticket can have multiple records based on date, and each record contains all the info of the customer and then the relevant conversation data.
We can't convert it to time series data, since we allow updates within a rolling window of 3 months. With time series data the whole index rolls over, which would prevent us from updating data inside that 3-month window, or even tickets created in the past hour.
Also, are you referring to the conversation as an array of nested documents or separate documents, which will have 1-1 mapping with ticket data?
Why update the records though? Just have a new record for whatever the change of state is. Then you can grab the latest record but still have access to previous states.
When we update, we don't have access to all the fields, only the ones that changed. Are you suggesting that we fetch the old document, combine its fields with the new ones, and save the result as a new document? Or that we save just the changed fields as a new document and combine the old and new data when fetching?
If that's the best way to do this approach, then yep.
I appreciate that there's a cost you need to incur to make those changes, but there's also the cost of writing all the code to join everything when you need it, and that is a relatively complicated design to maintain.
So which is the better cost for you?
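To make the second option raised above more concrete (store only the changed fields as a new record and combine them when reading), here is a minimal Python sketch. The field names `ticket_id`, `updated_at`, `status` and `priority` are assumptions for illustration and do not come from the thread; the merge happens in the application, not in Elasticsearch.

```python
from typing import Any, Dict, Iterable


def merge_change_records(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold partial change records into the latest known state of one ticket."""
    merged: Dict[str, Any] = {}
    # Process oldest first so later records overwrite earlier values field by field.
    # ISO-8601 timestamps in the same format and timezone sort correctly as strings.
    for record in sorted(records, key=lambda r: r["updated_at"]):
        merged.update(record)
    return merged


# Hypothetical change records for one ticket; each holds only the changed fields.
history = [
    {"ticket_id": "T-1", "updated_at": "2019-01-01T10:00:00Z", "status": "open", "priority": "low"},
    {"ticket_id": "T-1", "updated_at": "2019-01-03T09:30:00Z", "status": "closed"},
    {"ticket_id": "T-1", "updated_at": "2019-01-02T08:00:00Z", "priority": "high"},
]
current = merge_change_records(history)
```

The trade-off is the one described above: writes stay cheap and append-only, while every read of a ticket pays for fetching and folding its history.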
@warkolm Got it. But there will still be a problem with filtering data across indices, which is a major use case for us. I will talk with the team and come back here if we have any questions. Thank you.
We have a dashboard where you can search across all _types with any filter, and there are about 50 filters. In ES 5.6.11, because we had that hack of multiple types in the same index, it was possible to filter data across types within one index. We have 4 types and they all have different mappings, so we could set up a parent-child relationship between them and query and filter the results. Clearly we'll have to restructure our data, since this isn't possible now.
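One common way to filter across separate indices, not proposed anywhere in this thread, is a two-step join in the application: query the first index for matching ids, then pass those ids as a `terms` filter to the second. The sketch below only builds the second request body. The `ticket_id` field and the shape of the hits are assumptions, and the number of ids a `terms` query accepts is capped (65,536 by default, via `index.max_terms_count`), so this only suits moderately sized id sets.

```python
from typing import Any, Dict, List


def build_ticket_filter(customer_hits: List[Dict[str, Any]],
                        extra_filters: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a ticket search body restricted to tickets referenced by the given hits."""
    # Collect the distinct ticket ids returned by the first (customer) query.
    ticket_ids = sorted({hit["_source"]["ticket_id"] for hit in customer_hits})
    return {
        "query": {
            "bool": {
                # Filter context: no scoring, and the clauses can be cached.
                "filter": [{"terms": {"ticket_id": ticket_ids}}, *extra_filters]
            }
        }
    }


# Hypothetical hits from a first query against a customers index.
hits = [
    {"_source": {"ticket_id": "T-2"}},
    {"_source": {"ticket_id": "T-1"}},
    {"_source": {"ticket_id": "T-2"}},
]
body = build_ticket_filter(hits, [{"term": {"status": "open"}}])
```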
Yes, that's what we are thinking of doing: a single merged document per ticket, containing an array of customer field documents and an array of conversation documents. However, adding to an array without creating duplicates requires a Painless script that both appends the element and removes any duplicate of it.
However, since our cluster is index heavy, this will take a lot more time than it takes now in ES 5.6.
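For reference, a scripted update of this kind might look roughly like the sketch below, written as a Python function that builds the request body for the Update API. It assumes each conversation has a unique `id` field and that the array is named `conversations`; both names are hypothetical, and the script has not been run against a live cluster. Note that 5.x used `inline` where 6.x and later use `source`.

```python
from typing import Any, Dict

# Painless source: create the array if missing, drop any entry with the same id,
# then append the new entry. The value arrives through params so the compiled
# script can be reused across requests.
APPEND_UNIQUE = (
    "if (ctx._source.conversations == null) { ctx._source.conversations = []; } "
    "ctx._source.conversations.removeIf(c -> c.id == params.conv.id); "
    "ctx._source.conversations.add(params.conv);"
)


def build_append_update(conversation: Dict[str, Any]) -> Dict[str, Any]:
    """Build an Update API body that appends one conversation without duplicates."""
    return {
        "script": {
            "lang": "painless",
            "source": APPEND_UNIQUE,
            "params": {"conv": conversation},
        }
    }


# Hypothetical conversation document.
update_body = build_append_update({"id": "c-42", "text": "Customer replied"})
```

Every such update still reads, modifies and reindexes the whole ticket document, which is where the indexing cost comes from.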
Don't do updates, because they are intensive. Write individual, write-once records that are super light, which you can then merge with aggregations or just read individually and super fast.
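As an illustration of reading write-once records back, the following sketch builds a search body that returns the most recent record per ticket using a `terms` aggregation with a `top_hits` sub-aggregation. The field names `ticket_id` (assumed to be a keyword field) and `updated_at` (assumed to be a date field) are hypothetical.

```python
from typing import Any, Dict


def latest_per_ticket_query(max_tickets: int = 100) -> Dict[str, Any]:
    """Build a search body returning the newest record for each ticket."""
    return {
        "size": 0,  # only the aggregation results are needed, not the raw hits
        "aggs": {
            "by_ticket": {
                "terms": {"field": "ticket_id", "size": max_tickets},
                "aggs": {
                    "latest": {
                        # Keep one hit per bucket: the record with the newest timestamp.
                        "top_hits": {
                            "size": 1,
                            "sort": [{"updated_at": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


query = latest_per_ticket_query(max_tickets=50)
```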