Sending requests to one index, writing to multiple indices

I have an index named index1. I want to configure it so that any write/update request that comes to index1 gets written to both index1 and index2, but any search request still uses index1. Is this possible with some existing setting in Elasticsearch 7.17.10, or do I have to implement it inside Elasticsearch? If so, how can I go about implementing this?

That is something I believe you need to implement outside of Elasticsearch.

Can you suggest some way to achieve this? I am having a hard time finding anything that implements this.

It is not possible within Elasticsearch, so you may need to create a proxy of some kind to intercept and duplicate requests.


Is it possible to implement this inside Elasticsearch, maybe by creating a new pipeline processor, something like this:

{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": ["index1", "index2"]
      }
    }
  ]
}

Or will this result in some errors?

No, I do not think you can do this within Elasticsearch.

May I know why Elasticsearch doesn't allow something like this? Would it result in some sort of error conditions?

I do not know why, but I suspect it could cause issues where a single index request could be partially successful, which is something that is currently not possible as far as I know. I also suspect it could have security implications.

But I am trying to write to an existing index, index1, and a new index, index2. I don't think this could possibly have security implications. I want to know whether I can add this feature to the Elasticsearch GitHub codebase myself, not whether I can do it using any existing feature.

I do not know if this would at all be possible or whether it would break something, so will leave that for others.

To my knowledge, no, there is no specific configuration to handle your use case. However, you can use a transform to write each entry from index1 to index2. Please note that this approach does not handle data updates. Alternatively, you can use Logstash to perform such processing and manage additions and updates between the two indices based on the document_id. This approach is suitable for a moderate volume of source data but may not be efficient for a very large volume. It would help if you could share your specific business use case so we can propose a solution more effectively.
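For illustration, a minimal sketch of the Logstash side of that idea could look like this. The host and the doc_id field are assumptions about your setup, not defaults:

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed cluster address
    index => "index2"
    document_id => "%{doc_id}"           # assumed event field carrying your unique document id
    action => "index"                    # indexing the same id again overwrites it, so updates are covered
  }
}

Because the document_id is fixed per document, a later version of the same document simply replaces the earlier one in index2.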

I want to do this for a moderate volume of data only. While index1 is undergoing snapshot and restore, and only until then, I want all data updates or additions to be written to both indices, index1 and index2. How can I do this using Logstash? Or is there any other way to do the same?

Can you describe this approach in a little more detail?

This is not possible by design; it would add a lot of complexity and could cause multiple issues. I don't think a feature request to add this would be considered.

There are performance issues, management issues, security issues, and probably a lot more.

What you want to do can be easily done outside Elasticsearch, but you would need to change how you index your data.

If you use Logstash you can have two Elasticsearch outputs, each one pointing to one of the indices, and for search you could use an alias that points to only one of the indices.
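As a rough sketch, assuming Logstash is already your ingestion path and with a placeholder host, the output section could look like this:

output {
  # every event is written twice, once to each index
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "index1"
  }
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "index2"
  }
}

And the search alias, here named search_alias as a placeholder, would point only at index1:

POST _aliases
{
  "actions": [
    { "add": { "index": "index1", "alias": "search_alias" } }
  ]
}

Your search clients would then query search_alias instead of index1 directly.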

Can you provide more context on why you would want to do that? It is not clear.


I want to segregate the updates and additions that happen during the process while still being able to query index1 with no data inconsistency.

You can use a transform of type latest to achieve the desired behavior, where each document in index1 is written to index2 after a configurable duration, for example 60 seconds. However, it is important to configure the unique keys and sort fields correctly to meet your requirements.
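A minimal sketch of such a transform, assuming a doc_id field as the unique key and an @timestamp field for the sort and sync settings, both of which you would need to adapt to your own mappings:

PUT _transform/index1-to-index2
{
  "source": { "index": "index1" },
  "dest": { "index": "index2" },
  "frequency": "60s",
  "sync": {
    "time": { "field": "@timestamp", "delay": "60s" }
  },
  "latest": {
    "unique_key": [ "doc_id" ],
    "sort": "@timestamp"
  }
}

POST _transform/index1-to-index2/_start

The sync section is what makes the transform run continuously, picking up documents ingested since the last check.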

And how writing to two indices would help with that?

Assume that you are using Logstash to write data to both index_1 and index_2, and want to create a snapshot of index_1. Everything written after your snapshot request will be added to both index_1 and index_2, so no matter which index you query, it will return the same data.

index_1 initially has a huge amount of data, which I am transferring to cluster2 using snapshot and restore. While that process is ongoing, I want those updates and writes segregated from the remaining data.

If you are writing into two indices, like index_1 and index_2, any document added to index_1 will also be added to index_2, so you will have new writes in index_1 that will not be present in the current snapshot.

The same goes for updates, unless you update just one of the indices, but again, that would make your data inconsistent between the indices.

As already explained in your other post, I don't think you can achieve what you want without any downtime.

I am not writing to both indices initially. I am writing to index_1 at first; then, after starting the snapshot and restore, I want to write to both index_1 and index_2 so that the updates and additions are segregated and stored in index_2 while I am still able to query data from index_1. This will not result in data inconsistency.

If you are performing updates where the existing document is modified instead of overwritten, I do not believe this statement is true. Even if you perform updates by overwriting, I suspect there would be race conditions where you would have inconsistencies, although it may be less likely.