Recently I've experienced a huge performance drop and high disk load. I checked what was running in the background and saw multiple tasks with "action": "indices:data/read/scroll".
Here is one of 50 examples:
"z6S1B-p3S2W7UIgsrGpxdA:167974159": {
"node": "z6S1B-p3S2W7UIgsrGpxdA",
"id": 167974159,
"type": "transport",
"action": "indices:data/read/scroll",
"description": "scrollId[DnF1ZXJ5VGhlbkZldGNoKAAAAAADpOzlFjJGanBQSWo5UzdHRGo2eGhlajdtM2cAAAAAA544FhZ2RE00U0FWX1FmNnpvT3VoTUlFeHFnAAAAAAOk7OYWMkZqcFBJajlTN0dEajZ4aGVqN20zZwAAAAADnjgUFnZETTRTQVZfUWY2em9PdWhNSUV4cWcAAAAAA544ERZ2RE00U0FWX1FmNnpvT3VoTUlFeHFnAAAAAAOk7OQWMkZqcFBJajlTN0dEajZ4aGVqN20zZwAAAAADnjgTFnZETTRTQVZfUWY2em9PdWhNSUV4cWcAAAAAA6Ts4xYyRmpwUElqOVM3R0RqNnhoZWo3bTNnAAAAAANAK4UWNnluRXFSOVJRdzZyNTVlZUYwTG0yQQAAAAADQCuDFjZ5bkVxUjlSUXc2cjU1ZWVGMExtMkEAAAAAA0ArhBY2eW5FcVI5UlF3NnI1NWVlRjBMbTJBAAAAAATiQgMWejZTMUItcDNTMlc3VUlnc3JHcHhkQQAAAAADnjgQFnZETTRTQVZfUWY2em9PdWhNSUV4cWcAAAAAA544FRZ2RE00U0FWX1FmNnpvT3VoTUlFeHFnAAAAAAOeOBgWdkRNNFNBVl9RZjZ6b091aE1JRXhxZwAAAAAE4kIEFno2UzFCLXAzUzJXN1VJZ3NyR3B4ZEEAAAAAA544EhZ2RE00U0FWX1FmNnpvT3VoTUlFeHFnAAAAAAOeOBcWdkRNNFNBVl9RZjZ6b091aE1JRXhxZwAAAAAE4kIFFno2UzFCLXAzUzJXN1VJZ3NyR3B4ZEEAAAAABOJCBhZ6NlMxQi1wM1MyVzdVSWdzckdweGRBAAAAAATiQgcWejZTMUItcDNTMlc3VUlnc3JHcHhkQQAAAAADnjgZFnZETTRTQVZfUWY2em9PdWhNSUV4cWcAAAAABOJCCBZ6NlMxQi1wM1MyVzdVSWdzckdweGRBAAAAAATiQgkWejZTMUItcDNTMlc3VUlnc3JHcHhkQQAAAAADQCuGFjZ5bkVxUjlSUXc2cjU1ZWVGMExtMkEAAAAAA0AriBY2eW5FcVI5UlF3NnI1NWVlRjBMbTJBAAAAAANAK4cWNnluRXFSOVJRdzZyNTVlZUYwTG0yQQAAAAADpOznFjJGanBQSWo5UzdHRGo2eGhlajdtM2cAAAAAA6Ts6BYyRmpwUElqOVM3R0RqNnhoZWo3bTNnAAAAAANAK4kWNnluRXFSOVJRdzZyNTVlZUYwTG0yQQAAAAADQCuKFjZ5bkVxUjlSUXc2cjU1ZWVGMExtMkEAAAAAA6Ts6RYyRmpwUElqOVM3R0RqNnhoZWo3bTNnAAAAAANAK4sWNnluRXFSOVJRdzZyNTVlZUYwTG0yQQAAAAADpOzqFjJGanBQSWo5UzdHRGo2eGhlajdtM2cAAAAABOJCChZ6NlMxQi1wM1MyVzdVSWdzckdweGRBAAAAAAOk7OsWMkZqcFBJajlTN0dEajZ4aGVqN20zZwAAAAAE4kILFno2UzFCLXAzUzJXN1VJZ3NyR3B4ZEEAAAAAA0ArjBY2eW5FcVI5UlF3NnI1NWVlRjBMbTJBAAAAAANAK40WNnluRXFSOVJRdzZyNTVlZUYwTG0yQQAAAAAE4kIMFno2UzFCLXAzUzJXN1VJZ3NyR3B4ZEE=], scroll[Scroll{keepAlive=1m}]",
"start_time_in_millis": 1544695879485,
"running_time_in_nanos": 54295644307,
"cancellable": true
}
Does anybody have an idea what could have caused it? Was this a search or some internal transport mechanism?
Someone or something is triggering a so-called scroll search. Those are potentially long-running operations that prevent deletion of files and also keep file handles open, so you should use them with care. A reindex operation could be triggered within Elasticsearch, but any app accessing Elasticsearch could use scroll searches as well.
--Alex
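To see which scroll requests are currently in flight and how long each has been running, the task management API can be filtered by action name. The following is a minimal sketch, not a definitive tool: the `ES_URL` value (an unauthenticated cluster on localhost) and the helper names are assumptions made for illustration. It also includes a helper for the clear-scroll API, which releases every open scroll context and will therefore break any client that is still legitimately scrolling.

```python
import json
import urllib.request

# Assumption: the cluster is reachable at this address without authentication.
ES_URL = "http://localhost:9200"
SCROLL_ACTION = "indices:data/read/scroll"


def fetch_scroll_tasks(base_url=ES_URL):
    """GET _tasks, filtered to scroll actions, with task descriptions included."""
    url = f"{base_url}/_tasks?actions={SCROLL_ACTION}*&detailed=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_scroll_tasks(tasks_response):
    """Return (task_id, running_seconds) pairs, longest-running first."""
    rows = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if not task.get("action", "").startswith(SCROLL_ACTION):
                continue  # ignore anything that is not a scroll task
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            rows.append((task_id, seconds))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def clear_all_scrolls(base_url=ES_URL):
    """DELETE _search/scroll/_all: frees every open scroll context on the cluster."""
    req = urllib.request.Request(f"{base_url}/_search/scroll/_all", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `summarize_scroll_tasks(fetch_scroll_tasks())` against a live cluster would list the tasks shown in the question, with the nanosecond counter converted to seconds.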
Hi Alex,
I was monitoring all traffic that could generate those queries, and there weren't any jobs that could generate this kind of task. Our application was virtually unusable while ES was performing this badly.
But a lot of "scroll" queries were sent to ES before it got stuck (and for some time while it was stuck). Could it be that those jobs pile up and keep running for hours after the initial request? As you can see, the scroll timeout was set to 1 minute, and all requests were finished or killed within that time. The problem was that those tasks kept appearing up to 4 hours after the source of those queries was blocked.
You also said that it could be an internal reindex. Could you provide some more info? When does it start automatically? Is there a way to temporarily stop it? How can I identify whether it was triggered within ES?
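One way to approach the last question is to look at the `parent_task_id` field that the task management API reports for child tasks. As far as I know, scroll requests issued on behalf of a reindex, update-by-query, or delete-by-query run as children of that operation, while scrolls sent directly by an external client have no parent. The sketch below groups tasks accordingly; the helper names and the default base URL are assumptions for illustration, and the grouping is a heuristic rather than a guaranteed diagnosis.

```python
# Assumption: the cluster is reachable at this address without authentication.
ES_URL = "http://localhost:9200"


def group_by_parent(tasks_response):
    """Map each parent task id (None when absent) to its child task ids."""
    groups = {}
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            parent = task.get("parent_task_id")  # missing for top-level tasks
            groups.setdefault(parent, []).append(task_id)
    return groups


def reindex_like_tasks_url(base_url=ES_URL):
    """URL listing running reindex, update-by-query and delete-by-query tasks."""
    return f"{base_url}/_tasks?detailed=true&actions=*reindex,*byquery"


def cancel_task_url(task_id, base_url=ES_URL):
    """URL to POST to in order to cancel a cancellable task such as a reindex."""
    return f"{base_url}/_tasks/{task_id}/_cancel"
```

If every scroll task ends up under the `None` key, as the example task in the question would (its output shows no `parent_task_id`), that points toward a client outside the cluster rather than an internal operation.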