Platform Guide

Complete overview of the MOP Automation Platform, how it works, and its development history.

What This App Does

This is a MOP (Method of Procedure) Automation Platform that takes written operational procedures and automates their execution across your Azure infrastructure.

Instead of an engineer manually following a checklist and running commands one-by-one across six different Azure regions, this platform:

1. Renders

Takes MOP procedure documents written as Jinja2 templates and fills in region-specific variables (hostnames, IPs, credentials, etc.) to produce complete, ready-to-execute documentation for each region.

2. Classifies

Each MOP has a category (like "patch-linux", "agent-upgrade", "git-ops") that determines which Ansible playbooks get assigned to it automatically.

3. Executes

Runs the assigned Ansible playbooks across all six Azure regions in a controlled, sequential order with manual approval gates between each region.

4. Tracks

Logs everything with detailed execution tracking, Ansible output capture, performance data, and error reporting across the entire workflow.

Complete Workflow Diagram

End-to-end pipeline from vendor file delivery to execution across all regions:

STAGE 1: INGESTION
Vendor Archive
Upload compressed archive (.tgz/.tar.gz/.gz) containing J2 MOP template sets
Extract & Validate
Extract J2 templates, validate syntax, parse metadata, detect categories
Version Directory
Organized into mops/{version}/ with manifest
STAGE 2: TWO-PASS RENDERING
Pass 1: Pre-Render
Pattern-match prerender map: match MOP filenames (glob) + search text, insert text blocks into J2 templates
Pass 2: Regional Render
Apply region-specific variables: hostnames, IPs, pipeline IDs, credentials
Category Mapping
Map each MOP's category to specific Ansible playbook sequence
STAGE 3: SEQUENTIAL EXECUTION
Schedule MOP Set
Group MOPs into a set, schedule execution window
Ansible Execution
Run playbooks via local ansible-playbook CLI, capture all output
Region-by-Region
eus2 → wus2 → wus3 → scus → eus2lea → wus2lea with approval gates
STAGE 4: MONITORING & ARCHIVAL
Log & Monitor
Capture STDOUT, STDERR, return codes, timing, JSON callbacks
Results Dashboard
View execution status, performance analysis, error tracking
Archive
Archive completed sets with all configs, logs, and rendered docs preserved

Ansible Architecture: Connectivity, Access & Placement

Key Point: There is no separate Ansible server. Ansible runs as a command-line tool directly on the same machine as this web application. This server is expected to live inside the Azure network with direct access to regional resources.
Where Does This Server Live?

This MOP automation server is designed to run inside the Azure environment as a management VM (or container) that has network connectivity to all six regional Azure DevOps organizations and their associated infrastructure.

Expected Network Placement
Option A: Hub VNet Management VM (Recommended)
  • Deployed as a VM in a central hub VNet (e.g., a shared-services or management subscription)
  • Hub VNet is peered to all six regional spoke VNets via Azure VNet Peering
  • NSG rules allow outbound SSH (port 22) from this VM to target hosts in each region
  • NSG rules allow outbound HTTPS (port 443) to Azure DevOps APIs and Azure Resource Manager
  • This VM sits in a dedicated management subnet with restricted inbound access
Option B: Azure Bastion / Jump Box Model
  • Server runs behind an Azure Bastion or in a jump box subnet
  • VPN or ExpressRoute provides connectivity from on-prem to Azure if needed
  • Private endpoints used for Azure DevOps and Git repos where available
  • Suitable for organizations with stricter network segmentation requirements
Option C: On-Premises with VPN
  • Server runs on-prem and connects to Azure via Site-to-Site VPN or ExpressRoute
  • Requires VPN gateway in each regional VNet or hub-and-spoke routing
  • Higher latency but keeps the management server outside Azure
Network Connectivity Diagram
# ============================================
# MOP Automation Server - Network Architecture
# ============================================
      ┌────────────────────────────┐
      │   MOP Automation Server    │
      │  (Hub VNet / Mgmt Subnet)  │
      │                            │
      │  Flask App + Ansible CLI   │
      │  SSH Keys + PAT Tokens     │
      │  Azure CLI + Git Client    │
      └──────────────┬─────────────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
│  SSH :22  │  │ HTTPS:443 │  │ HTTPS:443 │
│  Target   │  │ ADO REST  │  │ Git Repos │
│  VMs      │  │ API       │  │ (ADO Git) │
└───────────┘  └───────────┘  └───────────┘
  6 Regions     6 ADO Orgs    6 ADO Repos
via VNet Peer  via PAT Token  via PAT Token
Three Connection Types Used by Ansible

This platform uses three distinct connection mechanisms to interact with Azure resources. Each uses different protocols and credentials:

1. SSH to Target VMs
For: patch-linux, agent-upgrade, command execution

What it does: Connects directly to Linux VMs in each region to run shell commands, install packages, restart services, apply patches.

Protocol: SSH (port 22)

Authentication:

  • SSH key pair (private key stored on this server)
  • Connects as a service account user (e.g., azureuser or ansible-svc)
  • Uses sudo for privileged operations (become/escalation)

Network path:

  • Hub VNet → VNet Peering → Regional Spoke VNet → Target VM (port 22)
  • NSG must allow SSH from management subnet

Ansible modules used:

shell, command, yum/apt, service, copy, template, lineinfile
2. Azure DevOps REST API
For: pipeline-only, run_manual_pipeline

What it does: Triggers Azure DevOps pipelines in each regional ADO organization via REST API calls. Monitors pipeline runs and retrieves results.

Protocol: HTTPS (port 443)

Authentication:

  • Personal Access Token (PAT) per ADO organization
  • Each region has its own ADO org, so each needs its own PAT
  • PATs stored in Ansible Vault (encrypted at rest)

Network path:

  • Outbound HTTPS to dev.azure.com
  • No VNet peering needed (public API endpoint)
  • Can use Azure Private Link for ADO if required

Ansible modules used:

uri (REST calls), azure.azcollection modules
POST https://dev.azure.com/{org}/{project}/_apis/pipelines/{id}/runs?api-version=7.0
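As a concrete illustration, the request behind that endpoint can be sketched with the Python standard library; ADO accepts the PAT as HTTP Basic auth with an empty username. This is a hedged sketch, not the platform's actual code, and the org/project/pipeline values are placeholders:

```python
import base64
import json
import urllib.request

def build_pipeline_run_request(org, project, pipeline_id, pat, branch="main"):
    """Build (but do not send) the ADO pipeline-run request.

    ADO accepts a PAT as HTTP Basic auth with an empty username.
    """
    url = (f"https://dev.azure.com/{org}/{project}"
           f"/_apis/pipelines/{pipeline_id}/runs?api-version=7.0")
    token = base64.b64encode(f":{pat}".encode()).decode()
    body = json.dumps({
        "resources": {"repositories": {"self": {"refName": f"refs/heads/{branch}"}}}
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (and polling the returned run URL for status) is what the `uri` module does on the platform's behalf.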
3. Git Repository Operations
For: git-ops, edit_yaml, commit_to_git

What it does: Clones ADO Git repos, edits YAML configuration files (e.g., Helm values, Kubernetes manifests), commits changes, and pushes back to trigger CI/CD pipelines.

Protocol: HTTPS (port 443)

Authentication:

  • PAT token embedded in Git remote URL
  • Format: https://PAT@dev.azure.com/{org}/{project}/_git/{repo}
  • Or via Git credential helper configured with PAT

Network path:

  • Outbound HTTPS to dev.azure.com
  • Same PAT tokens as pipeline API calls

Ansible modules used:

git, template, lineinfile, shell (for git commit/push)
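The PAT-in-URL format and the push sequence can be sketched as follows; the helper names and exact git arguments are illustrative, not the platform's actual code:

```python
def ado_remote_url(org, project, repo, pat):
    """PAT-embedded ADO Git remote, matching the format shown above."""
    return f"https://{pat}@dev.azure.com/{org}/{project}/_git/{repo}"

def git_push_commands(workdir, remote_url, branch, message):
    """Command sequence a commit_to_git-style step might run (built, not executed)."""
    return [
        ["git", "-C", workdir, "add", "-A"],
        ["git", "-C", workdir, "commit", "-m", message],
        ["git", "-C", workdir, "push", remote_url, f"HEAD:{branch}"],
    ]
```

Because the PAT appears in the remote URL, these commands must never be echoed to logs unredacted; a credential helper avoids that exposure.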
Required Access Levels & Credentials
| Resource | Credential Type | Required Permissions | Scope | Storage |
|---|---|---|---|---|
| Target Linux VMs | SSH key pair | SSH login + sudo privileges (passwordless sudo for automation) | Per-region inventory groups | ~/.ssh/azure_rsa on this server |
| ADO Pipelines (eus2) | PAT token | Build: Read & Execute, Release: Read & Execute | eus2 ADO organization | Ansible Vault |
| ADO Pipelines (wus2) | PAT token | Build: Read & Execute, Release: Read & Execute | wus2 ADO organization | Ansible Vault |
| ADO Pipelines (wus3, scus) | PAT token | Build: Read & Execute, Release: Read & Execute | Per-org PATs | Ansible Vault |
| ADO Pipelines (eus2lea, wus2lea) | PAT token | Build: Read & Execute, Release: Read & Execute | LEA org PATs (separate orgs) | Ansible Vault |
| ADO Git Repos | PAT token (same as above) | Code: Read & Write, push to branches | Per-org (same PAT can cover pipelines + repos) | Ansible Vault |
| Azure Resource Manager | Service principal (optional) | Contributor on target resource groups (if managing Azure resources directly) | Per-subscription | Ansible Vault or Azure Key Vault |
How Each Playbook Type Connects
# ─── patch_linux.yml ───────────────────────────────────────
Connection: SSH (port 22)
Auth: SSH key pair → service account on target VMs
Path: Hub VNet → VNet Peering → Target VM
Actions: yum update, apt upgrade, reboot, verify
Privilege: sudo (become: yes)
# ─── edit_yaml.yml ─────────────────────────────────────────
Connection: Local (runs on this server)
Auth: PAT token for Git clone/push
Actions: git clone repo, edit YAML files, stage changes
Note: Uses connection: local — edits happen on this server's filesystem
# ─── commit_to_git.yml ─────────────────────────────────────
Connection: HTTPS to dev.azure.com (port 443)
Auth: PAT token embedded in Git remote URL
Actions: git add, git commit, git push to ADO repo
Note: Push triggers CI/CD pipeline automatically in ADO
# ─── run_manual_pipeline.yml ────────────────────────────────
Connection: HTTPS to dev.azure.com REST API (port 443)
Auth: PAT token in HTTP Authorization header
Actions: POST to pipeline run API, poll for status, retrieve logs
Note: Each region has different ADO org URL + PAT
# ─── run_terraform.yml ──────────────────────────────────────
Connection: Local + HTTPS to Azure Resource Manager (port 443)
Auth: Azure Service Principal (client_id + client_secret) or Managed Identity
Actions: terraform init, plan, apply, output — provisions Azure resources
Note: Uses connection: local, Terraform CLI runs on this server
State: Remote backend (Azure Storage Account) for state file locking
# ─── cert_rotation.yml ──────────────────────────────────────
Connection: SSH to target VMs + HTTPS to Azure Key Vault
Auth: SSH key for VMs, Azure CLI for Key Vault certificate download
Actions: Download new cert from Key Vault, deploy to hosts, restart services
# ─── service_restart.yml ────────────────────────────────────
Connection: SSH (port 22)
Auth: SSH key pair, sudo for systemctl operations
Actions: Drain connections, restart services in batches, health check
# ─── db_maintenance.yml ─────────────────────────────────────
Connection: SSH to database servers (port 22)
Auth: SSH key, become postgres user for DB operations
Actions: Backup, VACUUM ANALYZE, REINDEX, verify connections
# ─── security_scan.yml ──────────────────────────────────────
Connection: SSH (port 22)
Auth: SSH key pair with read-only access preferred
Actions: SUID check, SSH config audit, firewall status, port scan, report
Credential Management with Ansible Vault

All sensitive credentials are stored in Ansible Vault encrypted files. The vault password file is referenced in the Admin configuration. Here's how credentials are organized:

# vars/vault.yml (encrypted with Ansible Vault)
---
ado_pat_eus2: ""
ado_pat_wus2: ""
ado_pat_wus3: ""
ado_pat_scus: ""
ado_pat_eus2lea: ""
ado_pat_wus2lea: ""
ado_org_eus2: "https://dev.azure.com/contoso-eus2"
ado_org_wus2: "https://dev.azure.com/contoso-wus2"
... (one per region)
ssh_private_key_path: "~/.ssh/azure_rsa"
azure_sp_client_id: ""
azure_sp_client_secret: ""
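A sketch of how the executor might wire the vault into a playbook run; the flag names are standard ansible-playbook options, while the vault path and password-file location are assumptions from the layout above:

```python
def vault_aware_command(playbook, region, vault_password_file="~/.vault_pass"):
    """Build an ansible-playbook invocation that can decrypt vars/vault.yml.

    Playbooks can then resolve the per-region PAT (ado_pat_<region>)
    from the decrypted vault variables.
    """
    return [
        "ansible-playbook", playbook,
        "-e", "@vars/vault.yml",              # load the encrypted vars file
        "-e", f"region={region}",             # region selects ado_pat_<region> etc.
        "--vault-password-file", vault_password_file,
    ]
```

The `-e @file` syntax loads a whole variable file; Ansible decrypts it transparently when the vault password file is supplied.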
Security & Access Summary
What This Server Needs
  • Network: Outbound SSH (22) to target VMs via VNet Peering
  • Network: Outbound HTTPS (443) to dev.azure.com
  • SSH: Private key matching authorized_keys on all target hosts
  • ADO: One PAT per ADO organization (6 total) with Build + Code permissions
  • OS Account: Service account on targets with sudo access
  • Optional: Azure Service Principal for direct ARM operations
  • Vault: Ansible Vault password file for decrypting secrets
What This Server Does NOT Need
  • No Ansible software on target VMs (agentless model)
  • No separate Ansible Tower/AWX server
  • No inbound ports opened on target VMs (SSH is outbound from here)
  • No Azure portal login or interactive browser sessions
  • No global admin privileges — only scoped access per resource
  • No direct database access to Azure SQL/Cosmos (infrastructure only)
  • No agent installation or software deployment to targets for connectivity
Current Mode: The system is currently running in mock mode for demonstration. In mock mode, no real SSH connections or API calls are made. Switch to live mode in the Admin page when this server is deployed inside your Azure network with all credentials configured.

Playbook Library, Category Mapping & Auto-Detection

The Playbook Library

The playbooks/ directory contains all Ansible playbooks available for MOP execution. The library currently includes 9 playbooks covering different operational scenarios:

| Playbook | Purpose | Connection Type | Typical Categories |
|---|---|---|---|
| patch_linux.yml | OS patching, security updates, kernel upgrades, reboot | SSH | patch-linux, multi-region-patch |
| edit_yaml.yml | Clone Git repo, edit YAML config files (Helm values, K8s manifests) | Local + HTTPS | agent-upgrade, git-ops, infrastructure |
| commit_to_git.yml | Stage, commit, push changes to ADO Git repositories | HTTPS | agent-upgrade, git-ops, infrastructure |
| run_manual_pipeline.yml | Trigger ADO pipeline via REST API, monitor run, retrieve logs | HTTPS REST | pipeline-only, agent-upgrade, infrastructure |
| run_terraform.yml | Terraform init, plan, apply for cloud resource provisioning | Local + ARM API | terraform |
| cert_rotation.yml | Download cert from Key Vault, deploy to hosts, restart services | SSH + HTTPS | cert-rotation |
| service_restart.yml | Rolling service restart with connection drain and health checks | SSH | service-restart |
| db_maintenance.yml | Database vacuum, reindex, backup, connectivity verification | SSH | db-maintenance |
| security_scan.yml | CIS compliance checks, SUID audit, firewall review, port scan | SSH | security-scan |
Adding new playbooks: Use the Playbook Library page to create new playbooks directly from the web interface, or drop .yml files into the playbooks/ directory. Then assign them to a category on the Admin → Categories tab.

How Category-to-Playbook Mapping Works

Every MOP belongs to a category. The category determines which playbooks run when that MOP is executed. Here's the process:

# Category Mapping Flow:
1. Vendor MOP Template arrives (e.g., "MOP-025_patch_linux_security_hotfix.j2")
|
2. Auto-Classifier analyzes filename + content + frontmatter
| Filename "patch_linux" → matches pattern r'patch' → +3 points for patch-linux
| Content "yum update" → matches keyword → +1 point for patch-linux
| Frontmatter "type: patch" → matches field → +5 points for patch-linux
|
3. Best match: patch-linux (confidence: 92%)
|
4. Category lookup: patch-linux → ["patch_linux.yml"]
|
5. Execution: Run patch_linux.yml via ansible-playbook CLI
Three Detection Signals (Weighted)
| Signal | Weight | Example |
|---|---|---|
| Frontmatter fields | 5x | type: terraform, category: git-ops |
| Filename patterns | 3x | patch_linux_*.j2, terraform_*.j2 |
| Content keywords | 1x | "terraform plan", "yum update", "git push" |
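The weighted scoring can be sketched as follows; the signal tables here are illustrative stand-ins, and the real classifier's patterns, tie-breaking, and confidence math may differ:

```python
import re

# Illustrative per-category signals; the real classifier has many more.
SIGNALS = {
    "patch-linux": {
        "frontmatter": {"type": "patch"},
        "filename": [r"patch"],
        "keywords": ["yum update", "apt upgrade"],
    },
    "terraform": {
        "frontmatter": {"type": "terraform"},
        "filename": [r"terraform"],
        "keywords": ["terraform plan", "terraform apply"],
    },
}

WEIGHTS = {"frontmatter": 5, "filename": 3, "keywords": 1}

def classify(filename, content, frontmatter):
    """Score every category with the 5x/3x/1x weights; return the best match."""
    scores = {}
    for category, sig in SIGNALS.items():
        score = 0
        for field, value in sig["frontmatter"].items():
            if frontmatter.get(field) == value:
                score += WEIGHTS["frontmatter"]
        for pattern in sig["filename"]:
            if re.search(pattern, filename):
                score += WEIGHTS["filename"]
        for keyword in sig["keywords"]:
            if keyword in content:
                score += WEIGHTS["keywords"]
        scores[category] = score
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For the MOP-025 example above, all three signals fire for patch-linux (5 + 3 + 1 = 9 points), and no terraform signal matches.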
Editing Category Mappings

You can manage category-to-playbook mappings in three ways:

  1. Admin UI: Go to Admin → Categories tab to add, edit, or remove category mappings with a form
  2. Config file: Edit configs/system_config.json directly to modify the category_mappings section
  3. Code: Update category_map.py for the static fallback mapping used by the executor

Each mapping defines: category name, ordered list of playbooks, description, risk level, and estimated duration.
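A minimal sketch of the static fallback lookup in the style of category_map.py; the dictionary entries mirror the category table later in this guide, but the function name is illustrative:

```python
# Illustrative fallback mapping (the real one also carries description,
# risk level, and estimated duration per category).
CATEGORY_MAP = {
    "patch-linux": ["patch_linux.yml"],
    "agent-upgrade": ["edit_yaml.yml", "commit_to_git.yml", "run_manual_pipeline.yml"],
    "git-ops": ["edit_yaml.yml", "commit_to_git.yml"],
    "pipeline-only": ["run_manual_pipeline.yml"],
}

def playbooks_for(category):
    """Return the ordered playbook list, or raise for an unmapped category."""
    try:
        return CATEGORY_MAP[category]
    except KeyError:
        raise ValueError(f"No playbooks mapped for category {category!r}")
```

Order matters: the executor runs the list sequentially, so edit before commit before pipeline trigger.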


Terraform Execution Model

Terraform runs as a fourth connection type alongside SSH, ADO REST API, and Git operations. It provisions and manages Azure cloud resources directly through the Azure Resource Manager (ARM) API.

How Terraform Fits In
  • Terraform CLI is installed on this same server alongside Ansible
  • The run_terraform.yml playbook calls terraform init/plan/apply as local commands
  • State files are stored in a remote backend (Azure Storage Account) for team access and locking
  • Authentication uses Azure Service Principal credentials or Managed Identity
  • Regional deployments use Terraform workspaces — one per region
Terraform vs. Ansible — When to Use Each
| Task | Use |
|---|---|
| Create/destroy Azure VMs, VNets, NSGs | Terraform |
| Configure software on existing VMs | Ansible (SSH) |
| Update Helm values in Git repo | Ansible (Git ops) |
| Trigger ADO pipeline | Ansible (REST API) |
| Provision new Kubernetes cluster | Terraform |
| Patch OS on existing cluster nodes | Ansible (SSH) |
# Terraform Execution Flow (via Ansible playbook):
Flask App → subprocess("ansible-playbook run_terraform.yml")
|
Ansible (connection: local) runs tasks on this server:
1. terraform init -backend-config=azure_storage.hcl
2. terraform workspace select {{ region }}
3. terraform plan -var="region={{ region }}" -out=tfplan
4. terraform apply tfplan (if action == 'apply')
5. terraform output -json → capture results
|
ARM API calls go to management.azure.com (HTTPS 443)
State stored in Azure Storage Account (remote backend)
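The five-step flow above can be sketched as command construction; the backend config filename follows the example in the flow, and the exact flags may vary by setup:

```python
def terraform_commands(region, action="apply"):
    """Command sequence a run_terraform-style step executes locally (sketch)."""
    cmds = [
        ["terraform", "init", "-backend-config=azure_storage.hcl"],
        ["terraform", "workspace", "select", region],       # one workspace per region
        ["terraform", "plan", f"-var=region={region}", "-out=tfplan"],
    ]
    if action == "apply":
        cmds.append(["terraform", "apply", "tfplan"])       # apply the saved plan only
    cmds.append(["terraform", "output", "-json"])           # capture results as JSON
    return cmds
```

Applying the saved `tfplan` file (rather than re-planning) guarantees the operator approved exactly what gets applied.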

Vendor Archive Pipeline

Vendor archives (.tgz, .tar.gz, .gz) are compressed archives delivered by vendors containing sets of J2 (Jinja2) MOP templates.

What's Inside a Vendor Archive?
  • Multiple .j2 template files - each one is a MOP procedure
  • Each template has metadata identifying its category, risk level, and dependencies
  • Templates contain Jinja2 variables like {{ hostname }} that get filled in during rendering
Processing Steps
# Step 1: Upload
Upload vendor archive (.tgz, .tar.gz, or .gz) via Admin page > Vendor Archives tab
Or use File Transfer tab for flexible file delivery
File saved to: uploads/gz/
# Step 2: Extract
Version auto-detected from filename (e.g., vendor-mops-R11.5.3.7.tgz → R11.5.3.7)
J2 templates extracted to: mops/R11.5.3.7/
# Step 3: Validate
Check J2 syntax, parse metadata, detect categories
Generate release manifest with validation results
# Step 4: Pre-Render (Pass 1)
Match prerender map entries (glob pattern + text search) and insert text blocks into J2 templates
Output: mops/R11.5.3.7/ (modified in place)
# Step 5: Final Render (Pass 2)
Apply region-specific variables for each of the 6 regions
Output to: rendered/R11.5.3.7/eus2/, rendered/R11.5.3.7/wus2/, etc.
# Step 6: Ready for Execution
Create a MOP set from the rendered templates
Schedule for sequential execution across all regions
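The version auto-detection in Step 2 can be sketched with a regex, assuming vendor versions always look like R followed by dot-separated digits:

```python
import re

def detect_version(archive_name):
    """Pull the release version out of a vendor archive filename.

    Assumes versions look like R<digits>.<digits>..., e.g. R11.5.3.7.
    """
    match = re.search(r"(R\d+(?:\.\d+)*)", archive_name)
    if not match:
        raise ValueError(f"No version found in {archive_name!r}")
    return match.group(1)
```

Archives without a recognizable version would need the version supplied manually at upload time.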

Manage archives in the Administration page under the "Vendor Archives" tab. Use the "File Transfer" tab for uploading other file types (JSON configs, YAML variables, etc.).

Two-Pass Rendering Pipeline

Vendor templates go through two separate rendering passes to produce final MOP documents.

Pass 1: Pre-Render (Text Insertion Map)

The first pass uses a pattern-matching text insertion map to inject custom text blocks into J2 templates before regional rendering. Each map entry has:

  • MOP Name Pattern - Glob pattern matched against MOP filename (e.g., *failover*, *cert*)
  • Search Text - Exact text string to find inside the MOP content
  • Insert Text - Multi-line text block inserted immediately after the search text
  • Enabled - Toggle to activate/deactivate each entry

How it works:

  1. Scans all J2 templates in mops/{version}/
  2. For each template, checks every enabled map entry
  3. If the filename matches the glob pattern AND the search text is found in the content, the insert text is placed right after the search text
  4. J2 templates are modified in-place at mops/{version}/

Insert text can contain URLs, before/after markers, instructions, line feeds, and Jinja2 variables (rendered in Pass 2).
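A minimal sketch of one map entry being applied, using the entry fields described above (glob match on filename, exact text search, insert-after); the function name is illustrative:

```python
from fnmatch import fnmatch

def apply_prerender_entry(filename, content, entry):
    """Apply one prerender-map entry to a template's text (Pass 1 rules).

    The filename must match the glob AND the search text must appear;
    the insert text goes immediately after the search text.
    """
    if not entry.get("enabled", False):
        return content
    if not fnmatch(filename, entry["mop_name_pattern"]):
        return content
    idx = content.find(entry["search_text"])
    if idx == -1:
        return content
    insert_at = idx + len(entry["search_text"])
    return content[:insert_at] + entry["insert_text"] + content[insert_at:]
```

Because the inserted block is plain text at this stage, any `{{ ... }}` variables inside it survive untouched until Pass 2 renders them.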

Pass 2: Final Render (Regional Variables)

The second pass renders Jinja2 templates with values that are different for each region:

  • Hostnames - eus2-web01.azure.internal
  • IP Addresses - 10.1.1.10
  • Azure Region - eastus2
  • ADO Organization - Per-region org URL
  • Pipeline IDs - Region-specific pipeline
  • Subscription IDs - Region-specific Azure subscription
  • Key Vault Names - Region-specific vault
  • Network Config - VNet, subnet, NSG names

Produces one complete Markdown MOP document per region (6 total).

Prerender Map Example
// Entry in configs/prerender_map.json
{
"id": "example-manual-failover",
"mop_name_pattern": "*failover*",
"search_text": "MANUAL_STEP_REQUIRED",
"insert_text": "\n> **Custom Procedure Required**\n> See: https://wiki.example.com/procedures/manual-failover\n> Complete this step before continuing.\n",
"enabled": true
}
Why Two Passes?
  • Pass 1 (Prerender Map) handles text insertion for MOPs that require manual steps, custom procedures, or additional context outside the automation scope — without modifying the original vendor templates
  • Pass 2 (Regional Render) applies Jinja2 rendering with region-specific YAML variable files, producing 6 final Markdown documents per MOP (one per region)
  • Any Jinja2 variables inside the inserted text blocks are rendered during Pass 2, so the inserts can reference regional values like hostnames, URLs, and pipeline IDs
  • This separation keeps vendor templates untouched and makes customization configurable through the admin UI without editing vendor files

Configure prerender map entries in the Administration page under the "Prerender Map" tab.

Pages & Features

| Page | URL | What It Does |
|---|---|---|
| Dashboard | / | Overview of available MOPs, system statistics, recent execution history |
| MOPs | /mops | Browse, view, edit, and execute individual MOP templates and their variables |
| Scheduler | /scheduler | Create and manage "MOP sets": groups of procedures scheduled for sequential execution across all six regions |
| Releases | /releases | Manage vendor MOP releases with version folders, regional subfolders, and type detection |
| Logs | /logs | View execution logs, Ansible output, error tracking, performance data, and search across all log types |
| Documentation | /docs | Render and manage vendor documentation with Jinja2 templates and regional variable files |
| Archive | /archive | Browse and restore completed MOP sets that have been archived (preserves all configs, logs, and rendered docs) |
| API Demo | /api-demo | Demonstrates the REST API endpoints and Next.js frontend integration |

How Ansible Is Incorporated Into the Workflow

Ansible is the execution engine at the heart of this platform. Every MOP ultimately translates into one or more Ansible playbook runs.


End-to-End Execution Flow
Step 1: MOP Template + YAML Variables
A Jinja2 template (e.g., agent-upgrade.j2) is paired with a YAML variable file
containing region-specific values (hostnames, IPs, credentials, pipeline IDs).
Step 2: Rendering
The renderer merges template + variables to produce a complete MOP document
and extracts the "category" field from the variables.
Step 3: Category Lookup
The category (e.g., "agent-upgrade") maps to a list of Ansible playbooks:
"agent-upgrade" -> [edit_yaml.yml, commit_to_git.yml, run_manual_pipeline.yml]
Step 4: Ansible Playbook Execution
Each playbook is executed sequentially using ansible-playbook command.
Variables from the MOP are passed to Ansible via -e (extra vars).
Step 5: Logging & Capture
STDOUT, STDERR, return codes, timing data, and JSON callbacks are all captured.
Logs are saved to logs/ansible/ in structured JSON format.
Step 6: Result Tracking
Success/failure status, execution duration, and playbook output are recorded.
Failed playbooks are noted but remaining playbooks continue executing.

Category-to-Playbook Mapping

Each MOP has a category defined in its YAML variables. This category determines exactly which Ansible playbooks run and in what order:

| Category | Playbooks (Executed in Order) | What It Does |
|---|---|---|
| patch-linux | patch_linux.yml | Updates packages, installs security patches, reboots servers if needed |
| agent-upgrade | edit_yaml.yml → commit_to_git.yml → run_manual_pipeline.yml | Edits config files, commits changes to Git, then triggers an Azure DevOps pipeline to deploy the new agent version |
| pipeline-only | run_manual_pipeline.yml | Directly triggers an Azure DevOps pipeline without any file changes |
| git-ops | edit_yaml.yml → commit_to_git.yml | Edits configuration files and commits changes to Git (infrastructure-as-code updates) |
| infrastructure | edit_yaml.yml → commit_to_git.yml → run_manual_pipeline.yml | Full infrastructure change: edit configs, commit to Git, then trigger deployment pipeline |
| multi-region-patch | patch_linux.yml | Linux patching that targets all regions based on Ansible inventory groups |
| multi-region-deploy | edit_yaml.yml → commit_to_git.yml → run_manual_pipeline.yml | Full deployment across multiple regions with region-specific targeting |

How Ansible Playbooks Are Called

When a MOP is executed, the platform calls Ansible through the command line with these settings:

# The actual command executed for each playbook:
ansible-playbook playbooks/<playbook_name>.yml \
  -e '{"region": "eus2", "hostname": "server1.eus2.example.com", ...}' \
  -v
# Environment variables set for structured logging:
ANSIBLE_STDOUT_CALLBACK=json
Produces structured JSON output instead of plain text
ANSIBLE_LOG_PATH=logs/ansible_<timestamp>.log
Writes detailed execution log to file
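In Python terms, assembling that invocation might look like the sketch below; the function name is illustrative and the real executor's code may differ:

```python
import json
import os

def ansible_invocation(playbook, extra_vars, log_path):
    """Build the command and environment used to launch ansible-playbook.

    Actually running it would be roughly:
    subprocess.run(cmd, env=env, capture_output=True, text=True)
    """
    cmd = [
        "ansible-playbook", f"playbooks/{playbook}",
        "-e", json.dumps(extra_vars),   # MOP variables passed as JSON extra vars
        "-v",
    ]
    env = dict(os.environ,
               ANSIBLE_STDOUT_CALLBACK="json",   # structured JSON output
               ANSIBLE_LOG_PATH=log_path)        # detailed log file
    return cmd, env
```

Passing extra vars as a JSON string keeps types intact (lists, booleans, nested dicts) rather than flattening everything to strings.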

What Ansible Logs Are Captured
STDOUT

The standard playbook execution output showing task results, host statuses, and the PLAY RECAP summary.

STDERR

Error messages, deprecation warnings, and connection issues that Ansible reports during execution.

Return Code

Exit status: 0 = success, 1 = error, 2 = one or more hosts failed, 4 = unreachable hosts.

Log File

Ansible's built-in log file with detailed execution trace, written to logs/ansible/.

JSON Callback

Structured JSON data including task results, variable data, host info, and change tracking.

Performance Data

Timing information for each task, total play duration, and identification of slow tasks.


Regional Execution Safety

When running MOP sets across all six regions, Ansible execution follows strict safety controls:

  1. Sequential Order: Regions are executed one at a time: eus2 → wus2 → wus3 → scus → eus2lea → wus2lea
  2. Manual Approval: After each region completes, an operator must approve before the next region starts
  3. Region-Specific Variables: Each region gets its own YAML variable file with unique hostnames, IPs, ADO organization details, and pipeline IDs
  4. Pause/Resume: Operators can pause execution at any point and resume later
  5. Error Isolation: A failure in one region does not automatically cascade to the next
  6. Rollback Planning: Each MOP set includes rollback procedures in case execution needs to be reversed
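Controls 1, 2, and 5 above can be sketched as a driver loop; the callback names stand in for the real executor and the operator approval UI:

```python
REGION_ORDER = ["eus2", "wus2", "wus3", "scus", "eus2lea", "wus2lea"]

def run_mop_set(execute_region, approve_next):
    """Drive a MOP set region by region with approval gates (sketch).

    execute_region(region) -> bool runs the region and reports success;
    approve_next(region) -> bool is the operator gate after that region.
    """
    results = {}
    for i, region in enumerate(REGION_ORDER):
        results[region] = execute_region(region)  # failure recorded, not fatal
        is_last = i == len(REGION_ORDER) - 1
        if not is_last and not approve_next(region):
            break  # operator declined: stop before the next region
    return results
```

Note the gate sits between regions, so a failed region still surfaces its result and the operator decides whether to proceed, pause, or roll back.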

Ansible Inventory Structure

Each Azure region has its own Ansible inventory defining target hosts:

# inventory/eus2/hosts.ini
[web_servers]
web1.eus2.example.com
web2.eus2.example.com
[db_servers]
db1.eus2.example.com
[app_servers]
app1.eus2.example.com
app2.eus2.example.com
# Each region (eus2, wus2, wus3, scus, eus2lea, wus2lea)
# has its own inventory with separate host groups,
# ADO organization, and connection credentials.

The Six Azure Regions

Every MOP runs sequentially through these regions in order, with manual approval required between each one:

eus2 → wus2 → wus3 → scus → eus2lea → wus2lea
| Short Name | Full Name | Azure Region | Timezone | Type |
|---|---|---|---|---|
| eus2 | East US 2 | eastus2 | America/New_York | Production |
| wus2 | West US 2 | westus2 | America/Los_Angeles | Production |
| wus3 | West US 3 | westus3 | America/Los_Angeles | Production |
| scus | South Central US | southcentralus | America/Chicago | Production |
| eus2lea | East US 2 LEA | eastus2euap | America/New_York | Early Access |
| wus2lea | West US 2 LEA | westus2euap | America/Los_Angeles | Early Access |

Each region has its own Azure DevOps organization, separate PAT tokens, dedicated subscriptions, and complete organizational isolation.

Key Concepts

MOP

A Method of Procedure - a written procedure document (Markdown with Jinja2 variables) describing step-by-step operational tasks like patching servers, upgrading agents, or deploying infrastructure changes.

MOP Set

A group of MOPs scheduled for execution together across all six regions. Sets enforce sequential regional deployment with manual approval gates between each region.

Vendor Archive

A compressed archive (.tgz, .tar.gz, or .gz) from your vendor containing MOP templates. The platform automatically extracts, validates, and organizes these into versioned folders with regional variable files.

Release

A versioned collection of vendor MOPs (e.g., R11.5.3.4) ready for deployment. Releases contain version folders with regional subfolders for each of the six Azure regions.

Playbook

An Ansible automation script (YAML file) that performs the actual work described in a MOP. Examples: patch_linux.yml, edit_yaml.yml, commit_to_git.yml.

Category Mapping

The configuration system that connects MOP types (like "patch-linux" or "agent-upgrade") to the correct sequence of Ansible playbooks. Defined in category_map.py.

Jinja2 Template

A template file (.j2) containing the MOP procedure text with variable placeholders like {{ hostname }} that get replaced with actual values for each region during rendering.

YAML Variables

Configuration files in the vars/ directory that contain region-specific values (hostnames, IPs, pipeline IDs, etc.) used to render MOP templates and pass data to Ansible.

Development History

Phase 1: Foundation (January 2025)
  • Next.js Frontend Integration - Built the modern frontend interface with TypeScript
  • Set up the dual-interface architecture: Next.js for day-to-day operations, Flask for administration
  • Created the API layer connecting frontend to backend with RESTful endpoints
  • Built the initial demo interface showing system readiness
  • Improved execution error reporting with development environment considerations
Phase 2: Core Systems (August 2025)

Several major systems were built in rapid succession:

  • Built the MOPLogger class for tracking every execution
  • Added Ansible-specific logging (capturing command output, errors, timing data)
  • Created the Logs dashboard with search and analysis capabilities
  • Added performance tracking for Ansible playbook runs
  • Log types: System logs, execution logs, process logs, and comprehensive Ansible logs

  • Set up all six Azure regions with Ansible inventory files
  • Configured separate Azure DevOps organizations per region
  • Built host groups, network configs, and security isolation
  • Added special handling for Early Access (LEA) regions
  • Complete organizational isolation with separate PAT tokens and dedicated subscriptions

  • Created the full release management platform
  • Support for 25 different vendor MOP types (infrastructure, security, monitoring, database, backup, system operations)
  • Version control with release folders (e.g., R11.5.3.4)
  • Automated regional MOP document generation with proper metadata
  • Six unique variable files for each Azure DevOps organization
  • Complete web interface for release creation and management

  • Built the scheduling interface for grouping MOPs into "sets"
  • Implemented sequential execution enforcement (region by region)
  • Added manual approval gates between regions for safety
  • Created real-time progress tracking and status monitoring
  • Added pause/resume and emergency controls
  • Bootstrap-based responsive UI with auto-refresh for active sets

  • Built the versioned folder system (mops/{version}/)
  • Created the automated archive processor for vendor-supplied MOP packages
  • Added automatic variable file generation per version per region
  • Built release manifest generation with MOP validation and metadata parsing
  • CLI tools for listing, processing, and validating vendor packages
  • Full integration with Release Manager, Scheduler, and Logging systems

  • Organized all docs into a structured docs/ directory
  • Created vendor integration guides and workflow documentation
  • Built quick-start guides and architecture overviews
  • Clear guidance for vendors, DevOps teams, and operators
Phase 3: Archive Management (Most Recent)
  • Archive Manager - Built the complete archive system for organizing completed MOP sets
  • Created the Archive dashboard with statistics, filtering, and restore functionality
  • Added archive/restore buttons to the Scheduler page for completed/failed/cancelled sets
  • Built the archive index system (JSON-based) for tracking all archived content
  • Preserves all related files (configs, logs, rendered docs) together in archive
  • Optional cleanup: can archive with or without removing originals from active directories
25 Vendor MOP Types Supported

The platform supports these operational procedure categories:

Infrastructure, Security Updates, Monitoring Changes, Database Maintenance, Backup Procedures, System Operations, Agent Upgrades, Linux Patching, Pipeline Deployments, Git Operations, Config Management, Network Changes, Certificate Rotation, DNS Updates, Load Balancer Config, Storage Management, Log Rotation, User Access Control, Compliance Checks, Performance Tuning, Disaster Recovery, Failover Testing, Capacity Planning, Health Checks, Incident Response