Error when restoring snapshots multiple times: org.elasticsearch.indices.IndexMissingException


(ddavie) #1

I have been testing restoring Elasticsearch snapshots and noticed that when I restore a snapshot a number of times, I eventually get an error. After receiving this error I can no longer restore any snapshots. I have tried shutting down the application, including Elasticsearch, but I get the same error after starting the app again.

My theory is that the snapshot restore itself is applied correctly, but afterwards the code attempts to open any closed indices, cannot find any closed ones, and then fails.

A scenario to reproduce:
1. Start the app and take a snapshot.
2. Enter data and take a snapshot.
3. Enter more data and take a snapshot.
4. Restore the latest snapshot a number of times (this works okay).
5. Restore the second-latest snapshot twice.

Result:
On the second attempt I get the error below, and from then on I get the same error no matter which snapshot I restore. Restarting the app does not clear the problem. There is no log data either.
The only way I can clear the error is to refresh my environment. In the real world I wouldn't need to restore snapshots repeatedly like this, but it would be nice to know why this problem is happening.

  • executing "curl -Ss -XGET 'http://localhost:8778/jmx/exec/com.aconex.bim.elasticsearch:name=elasticSearchSnapshotManager,type=ElasticSearchSnapshotManager/restoreSnapshot/app10.qa1.acx_20140410055415485'"
    servers: ["bimqaappa"]
    [bimqaappa] executing command
    ** [out :: bimqaappa] {"error_type":"org.elasticsearch.indices.IndexMissingException","error":"org.elasticsearch.indices.IndexMissingException : [_all] missing","status":500,"request":{"operation":"restoreSnapshot","mbean":"com.aconex.bim.elasticsearch:name=elasticSearchSnapshotManager,type=ElasticSearchSnapshotManager","arguments":["app10.qa1.acx_20140410055415485"],"type":"exec"},"stacktrace":"org.elasticsearch.indices.IndexMissingException: [_all] missing\n\tat org.elasticsearch.cluster.metadata.MetaData.concreteIndices(MetaData.java:655)\n\tat org.elasticsearch.action.admin.indices.close.TransportCloseIndexAction.masterOperation(TransportCloseIndexAction.java:89)\n\tat org.elasticsearch.action.admin.indices.close.TransportCloseIndexAction.masterOperation(TransportCloseIndexAction.java:42)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$2.run(TransportMasterNodeOperationAction.java:145)\n\tat org.elasticsearch.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.innerExecute(TransportMasterNodeOperationAction.java:141)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.doExecute(TransportMasterNodeOperationAction.java:94)\n\tat org.elasticsearch.action.admin.indices.close.TransportCloseIndexAction.doExecute(TransportCloseIndexAction.java:79)\n\tat org.elasticsearch.action.admin.indices.close.TransportCloseIndexAction.doExecute(TransportCloseIndexAction.java:42)\n\tat org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.execute(TransportMasterNodeOperationAction.java:89)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.execute(TransportMasterNodeOperationAction.java:42)\n\tat 
org.elasticsearch.client.node.NodeIndicesAdminClient.execute(NodeIndicesAdminClient.java:72)\n\tat org.elasticsearch.client.support.AbstractIndicesAdminClient.close(AbstractIndicesAdminClient.java:284)\n\tat org.elasticsearch.action.admin.indices.close.CloseIndexRequestBuilder.doExecute(CloseIndexRequestBuilder.java:65)\n\tat org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)\n\tat org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)\n\tat com.aconex.bim.elasticsearch.ElasticSearchBackupDelegate.closeAllIndices(ElasticSearchBackupDelegate.java:92)\n\tat com.aconex.bim.elasticsearch.ElasticSearchSnapshotManager.restoreSnapshot(ElasticSearchSnapshotManager.java:95)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)\n\tat javax.management.modelmbean.RequiredModelMBean$4.run(RequiredModelMBean.java:1249)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)\n\tat javax.management.modelmbean.RequiredModelMBean.invokeMethod(RequiredModelMBean.java:1243)\n\tat javax.management.modelmbean.RequiredModelMBean.invoke(RequiredModelMBean.java:1081)\n\tat org.springframework.jmx.export.SpringModelMBean.invoke(SpringModelMBean.java:90)\n\tat 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)\n\tat com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)\n\tat org.jolokia.handler.ExecHa
    ** [out :: bimqaappa] ndler.doHandleRequest(ExecHandler.java:98)\n\tat org.jolokia.handler.ExecHandler.doHandleRequest(ExecHandler.java:40)\n\tat org.jolokia.handler.JsonRequestHandler.handleRequest(JsonRequestHandler.java:89)\n\tat org.jolokia.backend.MBeanServerExecutorLocal.handleRequest(MBeanServerExecutorLocal.java:109)\n\tat org.jolokia.backend.MBeanServerHandler.dispatchRequest(MBeanServerHandler.java:102)\n\tat org.jolokia.backend.LocalRequestDispatcher.dispatchRequest(LocalRequestDispatcher.java:98)\n\tat org.jolokia.backend.BackendManager.callRequestDispatcher(BackendManager.java:409)\n\tat org.jolokia.backend.BackendManager.handleRequest(BackendManager.java:158)\n\tat org.jolokia.http.HttpRequestHandler.executeRequest(HttpRequestHandler.java:197)\n\tat org.jolokia.http.HttpRequestHandler.handleGetRequest(HttpRequestHandler.java:86)\n\tat org.jolokia.jvmagent.JolokiaHttpHandler.executeGetRequest(JolokiaHttpHandler.java:213)\n\tat org.jolokia.jvmagent.JolokiaHttpHandler.handle(JolokiaHttpHandler.java:174)\n\tat com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)\n\tat sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)\n\tat com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)\n\tat sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:677)\n\tat com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)\n\tat sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:649)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:744)\n"}

Some config settings that may help:
elasticsearch.index.refresh=false
elasticsearch.node.data=true
elasticsearch.node.local=false
configuredNumberOfNodes=3
There are three nodes on three different app hosts


(ddavie) #2

We found the answer to the problem.
We added logic to open and close only available indices when restoring from a snapshot. We were seeing errors when we attempted to open or close indices without taking their current state into consideration.

It seems the error occurs if you close indices that are already closed before restoring the snapshot, or open indices that are already open after restoring it. This condition only occurs sometimes; repeating the restore multiple times eventually triggered it.
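
For anyone hitting the same thing, here is a minimal sketch of the kind of state check described above. The stack trace in the first post shows the failure coming from `closeAllIndices` issuing a close against `[_all]`, so the idea is to look up each index's current state first and only close the ones that are open (and only open the ones that are closed). All names here (`IndexAdmin`, `StateAwareIndexToggler`, `InMemoryIndexAdmin`) are hypothetical and are not Elasticsearch APIs or our actual code. In a real application, `IndexAdmin` would be backed by the Elasticsearch client and would read index states from the cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction over the cluster's index admin calls (not an Elasticsearch API). */
interface IndexAdmin {
    /** Returns every index name mapped to true if the index is currently open. */
    Map<String, Boolean> indexStates();

    void close(List<String> indices);

    void open(List<String> indices);
}

/** Closes or opens only the indices whose current state requires it. */
final class StateAwareIndexToggler {
    private final IndexAdmin admin;

    StateAwareIndexToggler(IndexAdmin admin) {
        this.admin = admin;
    }

    /** Closes only the indices that are currently open; returns the names it closed. */
    List<String> closeOpenIndices() {
        List<String> targets = select(true);
        if (!targets.isEmpty()) {
            admin.close(targets); // never send a close request that matches nothing
        }
        return targets;
    }

    /** Opens only the indices that are currently closed; returns the names it opened. */
    List<String> openClosedIndices() {
        List<String> targets = select(false);
        if (!targets.isEmpty()) {
            admin.open(targets); // never send an open request that matches nothing
        }
        return targets;
    }

    /** Collects the indices whose open/closed state equals wantOpen. */
    private List<String> select(boolean wantOpen) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : admin.indexStates().entrySet()) {
            if (e.getValue() == wantOpen) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}

/** In-memory stand-in for a cluster; rejects requests that match no index. */
final class InMemoryIndexAdmin implements IndexAdmin {
    private final Map<String, Boolean> states = new LinkedHashMap<>();

    void put(String index, boolean open) {
        states.put(index, open);
    }

    @Override
    public Map<String, Boolean> indexStates() {
        return new LinkedHashMap<>(states); // defensive copy
    }

    @Override
    public void close(List<String> indices) {
        set(indices, false);
    }

    @Override
    public void open(List<String> indices) {
        set(indices, true);
    }

    private void set(List<String> indices, boolean open) {
        if (indices.isEmpty()) {
            throw new IllegalStateException("no matching indices");
        }
        for (String index : indices) {
            states.put(index, open);
        }
    }
}
```

With this shape, a restore would call `closeOpenIndices()` before restoring and `openClosedIndices()` afterwards. Calling either method twice in a row is a no-op the second time rather than an error, which is the behaviour we wanted when restoring repeatedly.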

