We've deployed a two-AZ ECE enclave (for lack of a better term). The first host has all the roles enabled on it by default. We've added other hot/warm allocators in each AZ, and we're trying to move the running nodes/instances off the first allocator using the move nodes option.
Some of the instances failed to move and now show a red X. Is there an additional step (changing the move settings?) to get all of the instances off the first host so the allocator role can be removed? If we move the admin console instance, does the IP of the Cloud UI change to the new allocator?
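(For anyone who would rather script this than click through the UI, here is a rough sketch of driving the same move through the ECE RESTful API with Python. The host, credentials, allocator ID, and the exact _move endpoint path are all placeholders/assumptions here, so check the API reference for your ECE version.)

```python
# Sketch: ask ECE to move every instance off one allocator.
# All values below are placeholders; the endpoint path is an assumption
# based on the allocator "move clusters" API and may differ by ECE version.
import requests

ECE_URL = "https://ece-coordinator.example.com:12443"  # placeholder admin API URL
AUTH = ("admin", "CHANGE_ME")                          # placeholder admin credentials
ALLOCATOR_ID = "192.168.1.10"                          # placeholder ID of the first host

# Equivalent of the "move nodes" action: relocate everything hosted on this
# allocator onto the other allocators in the same zone.
resp = requests.post(
    f"{ECE_URL}/api/v1/platform/infrastructure/allocators/{ALLOCATOR_ID}/clusters/_move",
    auth=AUTH,
    verify=False,  # only if the admin API uses a self-signed certificate
)
resp.raise_for_status()
print(resp.json())
```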
Edit:
The error message from the deployment activity (on the admin console deployment) is:
There was a problem applying this configuration change
[validate-plan-prerequisites]: [no.found.constructor.validation.ValidationException: 1. Can't apply a move_only plan with topology / setting changes. Actions: [settings]]
Oh, I know this one ... there's a buglet specific to system-owned clusters (they get generated using a legacy command, which we recently realized isn't fully compatible with the current move nodes command).
To fix it (you only need to do this once per system-owned cluster), just perform a "no-op" edit on each cluster, i.e. edit > save; the move should then work.
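If it's easier to script that no-op than to click through the UI for each system cluster, something like the sketch below should do it: fetch the cluster's current plan and submit it back unchanged. The plan endpoints, cluster ID, host, and credentials are placeholders/assumptions, so verify them against the API docs for your ECE version.

```python
# Sketch: "no-op" a system-owned cluster by re-applying its current plan
# (the API equivalent of edit > save). All values are placeholders and the
# plan endpoint paths are assumptions; confirm them for your ECE version.
import requests

ECE_URL = "https://ece-coordinator.example.com:12443"  # placeholder admin API URL
AUTH = ("admin", "CHANGE_ME")                          # placeholder admin credentials
CLUSTER_ID = "abcdef0123456789abcdef0123456789"        # placeholder system cluster ID

# Fetch the cluster's current plan ...
current_plan = requests.get(
    f"{ECE_URL}/api/v1/clusters/elasticsearch/{CLUSTER_ID}/plan",
    auth=AUTH,
    verify=False,  # only if the admin API uses a self-signed certificate
).json()

# ... and submit it back unchanged so a fresh plan is recorded.
resp = requests.post(
    f"{ECE_URL}/api/v1/clusters/elasticsearch/{CLUSTER_ID}/plan",
    auth=AUTH,
    json=current_plan,
    verify=False,
)
resp.raise_for_status()
print("No-op plan submitted:", resp.status_code)
```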
That fixed it. I was able to move the instances, edit the runner roles, and remove the allocator role. The host does, however, still show up under Allocators as a disconnected allocator. Should we have deleted the allocator before removing the role?
Odd, I didn't think you had to, but maybe I misremembered. Last I looked, empty disconnected allocators disappeared on their own.
Assuming it hasn't already just disappeared (it might take a few minutes to be reflected in the UI), try adding the allocator role back, deleting the allocator, and then removing the role again?
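If the UI is being stubborn, the cleanup can also be scripted: once the allocator is empty, its record can be deleted via the API. Here's a rough sketch; the endpoint path, host, credentials, and allocator ID are placeholders/assumptions, so check the API reference for your ECE version.

```python
# Sketch: delete an empty, disconnected allocator record via the ECE API.
# All values are placeholders and the endpoint path is an assumption;
# confirm it against the API reference for your ECE version first.
import requests

ECE_URL = "https://ece-coordinator.example.com:12443"  # placeholder admin API URL
AUTH = ("admin", "CHANGE_ME")                          # placeholder admin credentials
ALLOCATOR_ID = "192.168.1.10"                          # placeholder ID of the empty allocator

resp = requests.delete(
    f"{ECE_URL}/api/v1/platform/infrastructure/allocators/{ALLOCATOR_ID}",
    auth=AUTH,
    verify=False,  # only if the admin API uses a self-signed certificate
)
print(resp.status_code, resp.text)
```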
Is there a "cycling" of systems and readjustment that happens to servers within the ECE over a period of time? Like a review of the environment and comparison of the deployments within the application? Just a question -
Re-Cycling: I gave the ECE enclave a timeout overnight to think about what it had done and everything was magically better in the morning. The ghost allocator was gone, and deployments were working (they were failing yesterday). I think it needs some time to do garbage collection or something after you make any infrastructure changes.
and deployments were working (they were failing yesterday).
What issues with deployments were you having yesterday? Removing an allocator certainly would not, under normal circumstances, cause any deployment issues.
The ghost allocator was gone,
Normally you would see a change take effect within 5 minutes. If that gets missed for any reason (normally that happens when there is system instability of some description, e.g. if you made the change while simultaneously making disruptive changes to the system clusters), then it might take a few hours for the system to notice and reapply the update, which sounds like what happened here, especially since you were having issues with deployments being created.
The failing deployments were just giving a generic "Internal Server Error" in a red shaded box below the create deployment button. It was the end of the day, so I didn't do any more digging. In the morning the disconnected allocator was gone and deployments worked.
The only change we made was to move the instances off the first host and remove its allocator role. The ghost allocator was still showing 30 to 60 minutes later, right up until we left for the day.
It's working now, and it's time to move on to more fun things like load balancers, wildcard DNS, and SSO in a containerized world.