Upgrades are SO FRAGILE

Every single time I perform an upgrade there is a long list of activities that need to be done.

The more add-ons in the system, the more fragile it gets, resulting in extended downtime, which usually also means lost event data from my monitored nodes.

For example: upgrading the binaries to 6.2.3 breaks because the plugins are not upgraded. Why? This should be an automatic action of the package installation process. Since your plugins are borked, you are denied access to, well, everything. Nothing works. So you have to hunt them down one at a time until you get to the point where Elasticsearch may actually start up, only to find the indices are not upgraded. Why not? Should there not be a single app, something like ./bin/elasticsearch-upgrade, that simply goes through and asks to upgrade plugins and indices?
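As a rough sketch of the plugin half of that idea: the helper below only builds the remove and reinstall commands for each installed plugin, it does not run them. The function name, the default install path (/usr/share/elasticsearch, as laid down by the deb/rpm packages) and the example plugin names are my assumptions; the `elasticsearch-plugin` subcommands `remove` and `install --batch` are the ones shipped with the product. Plugins that were installed from a URL or zip file would need that location instead of a bare name.

```python
import shlex


def plugin_reinstall_commands(installed, es_home="/usr/share/elasticsearch"):
    """Return shell commands that remove and then reinstall each plugin.

    `installed` is a list of plugin names, e.g. the lines printed by
    `elasticsearch-plugin list`. `es_home` is an assumed install path.
    """
    tool = shlex.quote(f"{es_home}/bin/elasticsearch-plugin")
    commands = []
    for name in installed:
        name = name.strip()
        if not name:
            continue  # skip blank lines from the listing
        quoted = shlex.quote(name)
        # A plugin built for the old version has to go before the new one fits.
        commands.append(f"{tool} remove {quoted}")
        # --batch answers the permission prompts so the step can run unattended.
        commands.append(f"{tool} install --batch {quoted}")
    return commands


if __name__ == "__main__":
    # Hypothetical plugin list for illustration only.
    for cmd in plugin_reinstall_commands(["analysis-icu", "x-pack"]):
        print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to try on a live node; the output can be reviewed and then piped to a shell once it looks right.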

This can be a very time-consuming activity, when in reality it should not be.

Elastic Cloud Enterprise (ECE) is coming with a ton of things like this.

We can also host it for you on cloud.elastic.co.

With that, you can upgrade Elasticsearch, Kibana, X-Pack and other plugins in one click or one API call (with ECE).

That should really be extended as a general capability of the stack across all platforms if possible.

There are still tons of non-cloud deployments out there.

I've spent considerable effort coding up things like Ansible to handle this, only to realise that automation is utterly pointless for upgrades, as the point-release issues keep breaking the automation. The only way this upgrade process can stabilize is to have the tools delivered with the product.

I've had to adopt, across the various platforms, the approach of no upgrades at all: instead, migrate the data to new hosts and build new compute nodes for each service element. To be honest, this should not be necessary for an x.y.z minor patch. Now, in general this is not a bad approach anyway. If you can stand up a new host with its services in a minute or so, then this is perfectly acceptable and desirable. But if we are looking at 15 minutes per host, the process starts to show some flow-on emergent behaviours that open up large windows of time-based risk.
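To put rough numbers on that window, here is a back-of-the-envelope calculation. The 20-host cluster size and the function name are made up for illustration; the 1-minute and 15-minute figures are the ones above. It assumes hosts are replaced in fixed-size batches and that each batch takes the full per-host time.

```python
def replacement_window_minutes(hosts, minutes_per_host, parallel=1):
    """Wall-clock minutes to replace `hosts` nodes, `parallel` at a time."""
    if hosts < 0 or minutes_per_host < 0 or parallel < 1:
        raise ValueError("hosts and minutes must be >= 0, parallel >= 1")
    batches = -(-hosts // parallel)  # ceiling division
    return batches * minutes_per_host


if __name__ == "__main__":
    # Hypothetical 20-host cluster, replaced one host at a time.
    print(replacement_window_minutes(20, 1))    # 20 minutes
    print(replacement_window_minutes(20, 15))   # 300 minutes, i.e. 5 hours
    # Same cluster, four hosts at a time.
    print(replacement_window_minutes(20, 15, parallel=4))  # 75 minutes
```

Going from 1 to 15 minutes per host turns a 20-minute exercise into a 5-hour one, which is where the risk window comes from.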

The issue comes down to trust in the process. I have to go with a repeatable process that works across all my environment targets.

So again I would love to see these ECE tools migrated down into the application packages.

I should add I too am an Evangelist. I love where this product suite is going.

I come from an APM diagnostic background. I'm a certified Splunk Architect and am also certified in many other monitoring and event management tools. For various reasons I lean towards Elastic where possible.

I'm currently very interested in the possibilities of opentrace and Elastic together. Very early days, however.
