Hello community. I have implemented the rollover feature of Elasticsearch, which creates a new index every day. My question is below.
Is it possible to update documents in old indices? I have read somewhere that old indices are set to read-only. Can I prevent that? I want to update documents in both the current index and the old indices.
@Christian_Dahlqvist Yes, I assigned a unique ID that I can use. But suppose I have 100 old indices. Do I then need to fire an update query 100 times? That is, try to update the document in the 1st index, and if the document is missing, try the 2nd index, and so on up to the 100th index. Is there any way to fire only one query that updates the document, whichever of those 100 indices it is in?
It does not necessarily sound like your data naturally fits with time-based indices. Do the documents have a retention period? What type of updates are you performing? What type of data is it?
@Christian_Dahlqvist Below is my document structure. Here id is unique.
{
"id": 1,
"name":"John",
"company": "TCS"
}
I have set max_docs: 2 in rollover, so it creates a new index after every 2 documents are inserted. Suppose I have added 11 documents; it has then created 6 indices. Now I need to update document no. 3 (id: 3), which is in the 2nd index. If I use the index alias to update document no. 3, I get a document missing error, because the current index is the 6th one, which contains only one document (id: 11). So I used update_by_query with an index pattern in the Elasticsearch URL, so that it tries the update on all indices that match the pattern. Is this the right approach, or is there an alternative?
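For reference, the update_by_query request described above can be sketched as follows. This is only a minimal illustration that builds the request without sending it. The index pattern `chat-*` and the new company value are hypothetical, and it assumes the unique ID is stored in the `id` field of the document, as in the structure shown earlier.

```python
import json


def build_update_by_query(index_pattern, doc_id, new_company):
    """Build (method, path, body) for an _update_by_query request.

    The query matches the document by its `id` field across every index
    that matches the pattern; the script changes one field on each match.
    """
    path = f"/{index_pattern}/_update_by_query"
    body = {
        "query": {"term": {"id": doc_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.company = params.company",
            "params": {"company": new_company},
        },
    }
    return "POST", path, body


# Hypothetical values: pattern "chat-*", document id 3, new company name.
method, path, body = build_update_by_query("chat-*", 3, "Infosys")
print(method, path)
print(json.dumps(body, indent=2))
```

A single request like this reaches the document regardless of which rolled-over index holds it, at the cost of running the query against every index that matches the pattern.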
If you need to perform lots of updates efficiently, it might be better to use a single index and avoid rollover. If you can tell us more about the data and use case, we may be able to help.
@Christian_Dahlqvist Here are my detailed requirements:
I am making a chat application. Please refer to the 3 terms below.
bot: one who replies automatically when someone sends it a message
visitor: one who can chat with a bot
conversation: a chat between a visitor and a bot
Relationship: a visitor can have multiple conversations, so the mapping between visitor and conversation is one to many.
I have the 3 approaches below for the index structure:
1. I create only one index, conversation, and keep a visitor object in each conversation.
Advantages: Easy to search on both visitor and conversation properties.
Disadvantages: Suppose one visitor has 5 conversations. If the visitor updates a property, I need to update all 5 conversation documents. I am using the rollover index feature of Elasticsearch, so if a document is in one of the old indices, I need to update it using the index pattern, because I don't know which index it is in. Using an index pattern instead of an index name is more time consuming.
2. I create only one index, visitor, and keep a list of conversation objects in each visitor.
Advantages: Easy to search on both visitor and conversation properties.
Disadvantages: I need pagination over the total conversations, which is not possible in this approach.
3. I create 2 indices, visitor and conversation, and implement rollover on both.
Advantages: I don't need to update conversations when a visitor property is updated, because the visitor object lives in its own index and is not kept inside the conversation object.
Disadvantages: Same as in the 1st approach. If I update a visitor property and the visitor is in an old index, I need to update it using the index pattern instead of the index name, which takes more time. The second disadvantage is that if I want to search on both conversation and visitor properties, I need to run 2 Elasticsearch queries, one per index, which also takes more time.
Please suggest a better way to structure my indices so that it fulfils the requirements below:
I can search on both visitor and conversation properties
I can paginate over the total conversations
I can update a document using the index name instead of an index pattern
Given that a single shard can hold over a billion documents, I do not see why you would need to use time-based indices, which are better suited for immutable data with a limited retention period. Instead, create a single index with 3 or 4 primary shards; that would probably last you a long time. If the shards get too large you can always use the split index API, although you would probably want to set up an alias linked to the index so you can reindex and switch easily without affecting the application.
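As a rough sketch of that suggestion, the requests for creating one index with 3 primary shards behind an alias, and for updating a document through that alias, might look like this. The index name `conversations-v1`, the alias `conversations`, and the replica count are hypothetical, and the update path assumes the document `_id` is set to the unique ID and that the cluster is recent enough to support the typeless `_update` endpoint. The code only builds the requests and does not send them.

```python
import json


def build_create_index(index_name, alias, primary_shards=3):
    """Build (method, path, body) to create an index with an alias attached."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": 1,  # assumed value, tune per cluster
            }
        },
        "aliases": {alias: {}},
    }
    return "PUT", f"/{index_name}", body


def build_update_by_id(alias, doc_id, partial_doc):
    """Build (method, path, body) for a partial update of one document."""
    return "POST", f"/{alias}/_update/{doc_id}", {"doc": partial_doc}


create_req = build_create_index("conversations-v1", "conversations")
update_req = build_update_by_id("conversations", 3, {"company": "Infosys"})

for method, path, body in (create_req, update_req):
    print(method, path)
    print(json.dumps(body, indent=2))
```

Because the application only ever talks to the alias, a later reindex into a new index (for example a hypothetical `conversations-v2`) would only require repointing the alias.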