Workflow Run always / catch all

Current situation:

Currently, in Morpheus 7.0.4, provisioning workflows have 13 stages (see “Automation” in the Morpheus Docs).

However, if the automation breaks prematurely, for example in the Provision stage, it is hard to take automated action on the error. As far as we know at this moment, actions such as calling an API or running a script can only be triggered by scheduled jobs that read the status through API calls (or similar actions).

As far as we know at this point, this cannot be solved with:

  • task “continue on error”
  • task “retryable”
  • nested workflows (it is unclear how to get the right error message from failed operational workflows)
  • workflow “retryable”
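The polling workaround described above could look roughly like the sketch below, run from a scheduled job. The endpoint path `/api/instances`, the bearer-token header, the `instances` key and the `"failed"` status value are assumptions for illustration; check them against the API documentation of your appliance before relying on them.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_instances(base_url: str, token: str) -> Dict[str, Any]:
    """Fetch the instance list as decoded JSON.

    Assumption: the appliance exposes GET {base_url}/api/instances and
    accepts a bearer token. Verify both for your environment.
    """
    request = urllib.request.Request(
        f"{base_url}/api/instances",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def failed_instances(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the instances whose status is "failed".

    Assumption: the payload has the shape {"instances": [{"status": ...}, ...]}.
    """
    return [
        instance
        for instance in payload.get("instances", [])
        if instance.get("status") == "failed"
    ]
```

A scheduled job would call `failed_instances(fetch_instances(...))` and then send alerts or clean up. The drawback is the one this request is about: the reaction happens only on the next polling run, not at the moment of failure.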

Idea 1:

Add a stage “Always” to provisioning workflows that always runs after other stages when that workflow is executed.

  • From this stage “Always” it should be (made) easy to get the status from the stage(s) that have been executed.
  • Adding automation tasks should work in the regular way, just like in the other stages, so that whatever automation is needed can be run.
  • Running the teardown stage should also be an option.
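The intended behaviour of an “Always” stage is comparable to a try/finally block. The sketch below only illustrates the requested semantics and is not how Morpheus works internally; `run_workflow`, `RunReport` and the status strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class RunReport:
    """What the hypothetical "Always" stage gets to see."""
    statuses: Dict[str, str] = field(default_factory=dict)
    error: Optional[str] = None


def run_workflow(
    stages: List[Tuple[str, Callable[[], None]]],
    always: Callable[[RunReport], None],
) -> RunReport:
    """Run stages in order, stop at the first failure, then always run `always`."""
    report = RunReport()
    try:
        for name, action in stages:
            try:
                action()
            except Exception as exc:
                # Record which stage failed and why, then stop the run.
                report.statuses[name] = "failed"
                report.error = f"{name}: {exc}"
                break
            report.statuses[name] = "complete"
    finally:
        # Stages that were never reached are marked as skipped.
        for name, _ in stages:
            report.statuses.setdefault(name, "skipped")
        # The "Always" handler runs whether or not a stage failed.
        always(report)
    return report
```

The `always` callable receives the status of every stage plus the first error message, which is the information needed to log, alert, call an ITSM API, or decide to trigger teardown.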

Idea 2:

(This is probably a less complex, but less flexible solution)

Add a task “Always” that can be configured in each stage within a provisioning workflow.

  • From this task “Always” it should be (made) easy to get the status from the tasks that have been executed in the current stage.
  • The task should be configured just like regular automation tasks added to stages, so that it can run whatever automation is needed.
  • Running the teardown stage should also be an option.

Idea 3:

The same approach as Idea 2, but for operational workflows.

Use cases for provisioning workflows:

Be able to have a single location in Morpheus to:

  • create specific logging when previous automation tasks are in a state in which they cannot log (hung or prematurely ended scripts, etc.)
  • define (email) alert messages
  • create API calls to a requesting ITSM system to report failed status details
  • create automation to immediately remove failed provisioned parts
    • to prevent security risks: broken/unfinished/unpatched VMs are removed immediately, before any exploit can be used
  • trigger the teardown stage