Automated OS Patching

I use Morpheus heavily in my lab [my prod] environment. That means I spin up a large number of systems, some of which don’t get the same attention as my daily systems. Since all of this is technically non-prod, I decided to create a straightforward Ansible script that will update any OS in my environment.

Originally, this script started as a manually run task. If I thought about it, or if I had dated packages impeding an install, I would run it. This tactic worked well in small runs, but I was often left waiting for a large number of packages to install. (Yes, I could have been more specific about the packages I required at the time.)

To solve this problem [and help keep my environment more secure], I decided to use Morpheus’s orchestration to my advantage.

Automating Patching

The first step was already done: create a Task that handles updating the operating systems (OSes) in my environment.

osUpdate.yml

---
  - name: Update Servers
    hosts: all
    gather_facts: true

    tasks:
      - name: Update RedHat
        yum:
          name: '*'
          state: latest
        when: ansible_os_family == 'RedHat'
        become: true

      - name: Update Debian
        apt:
          name: "*"
          state: latest
          update_cache: true
          force_apt_get: true
          cache_valid_time: 86400  # 24 hours, in seconds
        when: ansible_os_family == 'Debian'
        become: true

      - name: Install all updates and reboot as many times as needed
        ansible.windows.win_updates:
          category_names: '*'
          state: installed
          reboot: true
          log_path: C:\ansible_wu.txt
        when: ansible_os_family == "Windows"

requirements.yml

---
collections:
  - name: ansible.windows

Next, I created several Schedules and Jobs, taking advantage of Labels in Morpheus. Jobs are scheduled Tasks or Workflows that allow selectable execution scopes. In this case I chose Server Label, as Labels are essentially Morpheus-specific tags on entities throughout the platform.

Note: I wanted to avoid using the Instance Label because I have clustered applications with more than 1 Server within the Instance.

Patch Jobs

After configuring the Jobs and setting the appropriate tags on all of the desired Servers, I let my Job execute on schedule, and Eureka!

Execution Results

Now, we know the world of IT isn’t all glam and Exit Code: 0. The first weekend this ran, I had an issue with GPG keys that I had to resolve. That caused a few failures in the environment that I wanted to bring to light sooner than when I signed on Monday morning.

Therefore, I created a script that runs later in the morning, after all patching should be complete. It reports back via email on the successes/failures and average run times, using the Morpheus API call to job-executions.

To do this, I had to create an Operational Workflow with 2 Tasks: one that gathers the statistics of the last run, and another that sends an email to relay this information!
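The API call itself is a GET against api/job-executions, filtered by Job name. Here is a minimal sketch of how that URL can be assembled so that it works whether or not the appliance URL ends in a slash. The helper name `job_executions_url` and the example hostname are hypothetical; only the `api/job-executions` path and the `name` query parameter come from the script below.

```python
from urllib.parse import urlencode


def job_executions_url(appliance_url, job_name):
    """Build the job-executions URL for one Job name."""
    # Normalize the base so exactly one slash separates host and path.
    base = appliance_url.rstrip('/') + '/'
    # urlencode handles spaces and other special characters in the name.
    return base + 'api/job-executions' + '?' + urlencode({'name': job_name})


# Hypothetical appliance URL, with and without a trailing slash.
print(job_executions_url('https://morpheus.example.com', 'Patch Group 1'))
print(job_executions_url('https://morpheus.example.com/', 'Patch Group 1'))
```

Both calls print the same URL, which is the point of normalizing the base before concatenating.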

Patch Notes Workflow

patchNotes.py

import requests
from datetime import date, timedelta
from urllib.parse import urlencode
from statistics import mean

# Morpheus Variables
morphToken = str(morpheus['morpheus']['apiAccessToken'])
morphUrl = str(morpheus['morpheus']['applianceUrl'])
morphHeaders = {
    "content-type": "application/json",
    "authorization": 'Bearer ' + morphToken
}
morphExecutions = 'api/job-executions'

# Variables
patchGroup = ['Patch Group 1', 'Patch Group 2', 'Patch Group 3']

# Collect data
data_collected = {}
total_failed_executions = 0
total_executions = 0

# Get current date
current_date = date.today().strftime("%Y-%m-%d")

# Script
for group in patchGroup:
    encoded_group = urlencode({'name': group})
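    # Listing job executions is a read, so use GET; normalize the slash between host and path
    response = requests.get(morphUrl.rstrip('/') + '/' + morphExecutions + "?" + encoded_group, headers=morphHeaders)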
    response_json = response.json()
    if 'jobExecutions' in response_json:
        group_data = []
        durations = []
        failed_executions = 0
        executions = 0
        for execution in response_json['jobExecutions']:
            start_date = execution.get('startDate', '')
            if start_date.startswith(str(date.today())):
                total_executions += 1
                executions += 1
                duration_ms = execution.get('duration') or 0  # Guard against a null duration
                durations.append(duration_ms)
                duration_str = str(timedelta(seconds=int(duration_ms // 1000)))  # Convert ms to h:mm:ss
                status = execution.get('process', {}).get('status', '')
                if status == 'failed':
                    total_failed_executions += 1
                    failed_executions += 1
                group_data.append({'Server Name': execution.get('process', {}).get('displayName', ''),
                                   'Server ID': execution.get('process', {}).get('serverId', ''),
                                   'Status': status,
                                   'Duration': duration_str,
                                   'Error': execution.get('error', '')})
        if durations:
            # Whole seconds only; slicing the string would truncate values with no fractional part
            mean_duration = str(timedelta(seconds=int(mean(durations) // 1000)))
            min_duration = str(timedelta(seconds=int(min(durations) // 1000)))
            max_duration = str(timedelta(seconds=int(max(durations) // 1000)))
            data_collected[group] = {'group_data': group_data,
                                     'min_duration': min_duration,
                                     'max_duration': max_duration,
                                     'mean_duration': mean_duration,
                                     'failed_executions': failed_executions,
                                     'executions': executions}

# Generate HTML-formatted table
html_content = f"""<!DOCTYPE html>
<html>
<head>
<style>
table {{
  width: 100%;
  border-collapse: collapse;
}}
th, td {{
  padding: 10px;
  text-align: left;
  border-bottom: 1px solid #ddd;
}}
th {{
  background-color: #f2f2f2;
}}
tr.failed {{
  background-color: #ffe6e6;
}}
.bold {{
  font-weight: bold;
}}
a {{
  text-decoration: none;
  color: #0000EE;
}}
</style>
</head>
<body>
<h1 style="text-align:center;">Nightly Patch Results - {current_date}</h1>"""

for group, data in data_collected.items():
    group_data = data['group_data']
    min_duration = data['min_duration']
    max_duration = data['max_duration']
    mean_duration = data['mean_duration']
    failed_executions = data['failed_executions']
    executions = data['executions']

    html_content += f"<h2>Patch Group: {group}</h2>"
    html_content += "<table>"
    html_content += "<tr>"
    html_content += "<th>Server Name</th><th>Status</th><th>Duration</th><th>Error</th>"
    html_content += "</tr>"
    for row in group_data:
        html_content += "<tr class='failed'>" if row['Status'] == 'failed' else "<tr>"
        server_link = f"<a href='{morphUrl.rstrip('/')}/infrastructure/servers/{row['Server ID']}'>{row['Server Name']}</a>"
        html_content += f"<td>{server_link}</td><td>{row['Status']}</td><td>{row['Duration']}</td><td>{row['Error']}</td>"
        html_content += "</tr>"
    html_content += "</table>"
    html_content += f"<p class='bold'>Min Duration: {min_duration}, Max Duration: {max_duration}, Mean Duration: {mean_duration}</p>"
    html_content += f"<p class='bold'>Failed Executions: {failed_executions} of {executions}</p>"
    html_content += "<br>"
    html_content += "<br>"

# Report the overall total once, after every group has been rendered
html_content += f"<h2>Total Failed Executions: {total_failed_executions} of {total_executions}</h2>"

html_content += "</body></html>"

print(html_content)
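The per-group statistics can be checked without a live appliance. The following is a minimal sketch, assuming the job-executions payload carries the same `startDate`, `duration`, and `process.status` fields that patchNotes.py reads. The names `format_ms` and `summarize_executions`, the sample records, and the `complete` status value are hypothetical and only there for illustration.

```python
from datetime import timedelta
from statistics import mean


def format_ms(duration_ms):
    """Render a millisecond count as h:mm:ss, dropping fractional seconds."""
    return str(timedelta(seconds=int(duration_ms // 1000)))


def summarize_executions(executions, day):
    """Summarize the executions whose startDate falls on `day` (YYYY-MM-DD)."""
    todays = [e for e in executions
              if (e.get('startDate') or '').startswith(day)]
    durations = [e.get('duration') or 0 for e in todays]
    failed = sum(1 for e in todays
                 if (e.get('process') or {}).get('status') == 'failed')
    summary = {'executions': len(todays), 'failed_executions': failed}
    if durations:
        summary['min_duration'] = format_ms(min(durations))
        summary['max_duration'] = format_ms(max(durations))
        summary['mean_duration'] = format_ms(mean(durations))
    return summary


# Hypothetical sample records shaped like the fields patchNotes.py reads.
sample = [
    {'startDate': '2023-06-10T06:00:00Z', 'duration': 90000,
     'process': {'status': 'complete'}},
    {'startDate': '2023-06-10T06:00:05Z', 'duration': 30000,
     'process': {'status': 'failed'}},
    {'startDate': '2023-06-09T06:00:00Z', 'duration': 45000,
     'process': {'status': 'complete'}},
]

report = summarize_executions(sample, '2023-06-10')
print(report)
```

Keeping the arithmetic in small functions like these makes it easy to try the report logic against saved API output before wiring it into a Task.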

Patch Notes Email Task

You’ll note that I am chaining the results from the patchNotes.py script to the email Task with the use of the code and result type fields. This creates a variable, <%=results.html%>, that is consumable within my Workflow phase. Since the email Task type supports HTML, if I inject only HTML code, I end up with this!

Email Report

Final Notes

There are a few things to highlight.

I have a failure in my patching with no Error. That field is actually blank in the API response for that system because it was powered off and no stdout could be captured. We have an upcoming feature that will allow Tasks to either power on a system and then return powerstate, or skip powered down systems. This would’ve prevented that failure.
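Until then, a blank Error cell can be made less puzzling in the report itself. This is a minimal sketch, assuming that a failed run with no error text most likely means nothing was captured. The helper `describe_error` and its placeholder wording are hypothetical; `html.escape` is from the Python standard library and keeps error text from breaking the table markup.

```python
from html import escape


def describe_error(error, status):
    """Return HTML-safe text for the Error column of the report."""
    if error:
        return escape(str(error))
    if status == 'failed':
        # Assumption: a failure with no error text means nothing was captured,
        # for example because the server was powered off.
        return 'No output captured (server may be powered off)'
    return ''


print(describe_error('', 'failed'))
print(describe_error('GPG check FAILED <repo>', 'failed'))
```

In patchNotes.py this would wrap the value stored under 'Error' when each row is built.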

Additionally, branching logic is coming very soon. That means I could edit this further and only report on failures, or take other actions as the result of a failure. Either way it will be very powerful.
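In the meantime, a failures-only report can be approximated inside the script. This is a minimal sketch, assuming rows shaped like the `group_data` entries in patchNotes.py. The helper `only_failures` and the sample data are hypothetical, and this is not the upcoming branching feature.

```python
def only_failures(data_collected):
    """Keep only failed rows, and drop groups that had no failures."""
    filtered = {}
    for group, data in data_collected.items():
        failed_rows = [row for row in data['group_data']
                       if row['Status'] == 'failed']
        if failed_rows:
            # Copy the group's other fields and swap in the filtered rows.
            filtered[group] = {**data, 'group_data': failed_rows}
    return filtered


# Hypothetical data in the same shape patchNotes.py builds.
example = {
    'Patch Group 1': {'group_data': [
        {'Server Name': 'web01', 'Status': 'complete'},
        {'Server Name': 'web02', 'Status': 'failed'}]},
    'Patch Group 2': {'group_data': [
        {'Server Name': 'db01', 'Status': 'complete'}]},
}

failures = only_failures(example)
print(failures)
```

Passing the filtered dictionary to the HTML loop instead of `data_collected` would produce an email that lists only the servers needing attention.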

I definitely look forward to others’ feedback or improvements!
