Morpheus Agent install without performing background SSH connection attempts during provisioning
When creating (Linux) VMs in a cloud, Morpheus tries several ways to connect to the newly created VM. As soon as the VM is being provisioned, we can see Morpheus trying to figure out the best route to the IP address(es) of the new VM and trying to initiate SSH connections to it. Once the Morpheus agent reports that it’s running on the VM, Morpheus can use that avenue to run commands on the VM as well.
The problem we are seeing, for example in AWS GovCloud, is that this SSH process is rather chatty in the Morpheus logs and that the eventual timeout after 5 tries spews lots of error lines into the log. Usually the final timeout/retry happens within a few minutes, before the VM is actually ready to accept SSH connections. So all the work the Morpheus server performed was fruitless, while our log monitoring solutions need rules set up to ignore these messages.
It seems that the SSH timeouts at times also feed into a failed/stopped status for the instance/server, which in turn causes the first provisioning task or first post-provisioning task in the provisioning workflow to go into a failed status too.
In a controlled environment where Morpheus agents are mandated to be installed on the endpoints, it would be nice to be able to turn off the SSH attempts and simply wait for the server to download the agent (via the cloud-init injected curl command) and make the finalize server call via curl. Some of our customers frown upon these SSH connections originating from our Morpheus server because access to the SSH service is privileged and highly controlled.
The suggestion is that, instead of only a check-mark for ‘skip agent install’, this becomes a radio button list with the following options:
- Agent install via cloud-init only
- Cloud-init agent install with SSH based agent install as fall-back
- SSH based agent install only
Also, the various timeouts should be configurable:
- A “Wait for cloud-init Agent install timeout”
- An “SSH connection retry count”
- A “Period between SSH connection tries”
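To make the request concrete, here is a rough sketch of how the three install options and the three timeout settings could be modeled together. This is not Morpheus code: the names `AgentInstallMode` and `AgentInstallSettings` and all field names are hypothetical. The defaults of 5 tries and 15 minutes come from this post, and the 30 second retry interval is purely an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AgentInstallMode(Enum):
    """Hypothetical values mirroring the three proposed radio options."""
    CLOUD_INIT_ONLY = "cloud-init-only"
    CLOUD_INIT_WITH_SSH_FALLBACK = "cloud-init-with-ssh-fallback"
    SSH_ONLY = "ssh-only"


@dataclass(frozen=True)
class AgentInstallSettings:
    """Hypothetical per-cloud settings; not an existing Morpheus structure."""
    mode: AgentInstallMode
    # "Wait for cloud-init Agent install timeout" (15 minutes in our use case)
    cloud_init_wait_seconds: int = 15 * 60
    # "SSH connection retry count" (5 tries is what we observe today)
    ssh_retry_count: int = 5
    # "Period between SSH connection tries" (30 seconds is an assumed default)
    ssh_retry_interval_seconds: int = 30

    def uses_cloud_init(self) -> bool:
        return self.mode is not AgentInstallMode.SSH_ONLY

    def uses_ssh(self) -> bool:
        # In cloud-init-only mode the server would never open an SSH connection.
        return self.mode is not AgentInstallMode.CLOUD_INIT_ONLY
```

With something like this, a provisioning step would only need to check `uses_ssh()` before starting any background SSH attempts.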
Lastly, the logging of the SSH connection attempts should be cleaned up: less chatty and, at a minimum, no Java stack traces.
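As an illustration of what we mean by quieter logging, the sketch below writes one line per failed attempt, keeps intermediate retries at debug level, and only raises a warning on the final try. It is written in Python purely for illustration (Morpheus itself logs from Java), and the function names and message format are made up for this example.

```python
import logging


def format_ssh_failure(host: str, attempt: int, max_attempts: int,
                       exc: BaseException) -> str:
    """Summarize a failed SSH attempt as a single line, without a stack trace."""
    detail = str(exc)
    # Keep only the first line of the error text so multi-line traces are dropped.
    reason = detail.splitlines()[0] if detail else "no detail"
    return (f"SSH attempt {attempt}/{max_attempts} to {host} failed: "
            f"{type(exc).__name__}: {reason}")


def log_ssh_failure(logger: logging.Logger, host: str, attempt: int,
                    max_attempts: int, exc: BaseException) -> None:
    # Intermediate retries go to DEBUG; only the final failure is a WARNING.
    level = logging.WARNING if attempt >= max_attempts else logging.DEBUG
    logger.log(level, format_ssh_failure(host, attempt, max_attempts, exc))
```

With the default of 5 tries, a production log at warning level would then carry a single line per VM that never became reachable over SSH.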
For our use case, we’d like to be able to set up ‘cloud-init agent install only’ with a predetermined timeout of 15 minutes. This avoids all the SSH logging and avoids network monitoring tools detecting rogue SSH connections within the network. If the endpoint doesn’t show a working agent after 15 minutes and/or didn’t perform the finalize server curl API call, the server/instance should show as failed.
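The behavior we have in mind for the cloud-init-only case could look roughly like the sketch below. It is only an illustration under our own assumptions: `wait_for_agent` and the `provisioning_confirmed` callback are hypothetical names, the callback stands in for whatever check Morpheus would use (agent reporting in and/or the finalize call received), and the 10 second poll interval is arbitrary.

```python
import time
from typing import Callable


def wait_for_agent(provisioning_confirmed: Callable[[], bool],
                   timeout_seconds: float = 15 * 60,
                   poll_seconds: float = 10,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the VM confirms itself or the timeout expires.

    No SSH connection is attempted here; the server only waits for the VM
    to call back. Returns "running" on success and "failed" on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if provisioning_confirmed():
            return "running"
        if clock() >= deadline:
            # Nothing heard from the endpoint within the window: mark as failed.
            return "failed"
        sleep(poll_seconds)
```

The `clock` and `sleep` parameters exist only so the timeout can be exercised without actually waiting 15 minutes.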