Platform Guide
Complete overview of the MOP Automation Platform, how it works, and its development history.
What This App Does
This is a MOP (Method of Procedure) Automation Platform that takes written operational procedures and automates their execution across your Azure infrastructure.
Instead of an engineer manually following a checklist and running commands one-by-one across six different Azure regions, this platform:
1 Renders
Takes MOP procedure documents written as Jinja2 templates and fills in region-specific variables (hostnames, IPs, credentials, etc.) to produce complete, ready-to-execute documentation for each region.
2 Classifies
Each MOP has a category (like "patch-linux", "agent-upgrade", "git-ops") that determines which Ansible playbooks get assigned to it automatically.
3 Executes
Runs the assigned Ansible playbooks across all six Azure regions in a controlled, sequential order with manual approval gates between each region.
4 Tracks
Logs everything with detailed execution tracking, Ansible output capture, performance data, and error reporting across the entire workflow.
Complete Workflow Diagram
End-to-end pipeline from vendor file delivery to execution across all regions:
1. Vendor Archive: Upload compressed archive (.tgz/.tar.gz/.gz) containing J2 MOP template sets
2. Extract & Validate: Extract J2 templates, validate syntax, parse metadata, detect categories
3. Version Directory: Organized into mops/{version}/ with manifest
4. Pass 1: Pre-Render: Pattern-match prerender map: match MOP filenames (glob) + search text, insert text blocks into J2 templates
5. Pass 2: Regional Render: Apply region-specific variables: hostnames, IPs, pipeline IDs, credentials
6. Category Mapping: Map each MOP's category to a specific Ansible playbook sequence
7. Schedule MOP Set: Group MOPs into a set, schedule execution window
8. Ansible Execution: Run playbooks via local ansible-playbook CLI, capture all output
9. Region-by-Region: eus2 → wus2 → wus3 → scus → eus2lea → wus2lea with approval gates
10. Log & Monitor: Capture STDOUT, STDERR, return codes, timing, JSON callbacks
11. Results Dashboard: View execution status, performance analysis, error tracking
12. Archive: Archive completed sets with all configs, logs, and rendered docs preserved
Ansible Architecture: Connectivity, Access & Placement
Where Does This Server Live?
This MOP automation server is designed to run inside the Azure environment as a management VM (or container) that has network connectivity to all six regional Azure DevOps organizations and their associated infrastructure.
Expected Network Placement
Option A: Hub VNet Management VM (Recommended)
- Deployed as a VM in a central hub VNet (e.g., a shared-services or management subscription)
- Hub VNet is peered to all six regional spoke VNets via Azure VNet Peering
- NSG rules allow outbound SSH (port 22) from this VM to target hosts in each region
- NSG rules allow outbound HTTPS (port 443) to Azure DevOps APIs and Azure Resource Manager
- This VM sits in a dedicated management subnet with restricted inbound access
Option B: Azure Bastion / Jump Box Model
- Server runs behind an Azure Bastion or in a jump box subnet
- VPN or ExpressRoute provides connectivity from on-prem to Azure if needed
- Private endpoints used for Azure DevOps and Git repos where available
- Suitable for organizations with stricter network segmentation requirements
Option C: On-Premises with VPN
- Server runs on-prem and connects to Azure via Site-to-Site VPN or ExpressRoute
- Requires VPN gateway in each regional VNet or hub-and-spoke routing
- Higher latency but keeps the management server outside Azure
Network Connectivity Diagram
Three Connection Types Used by Ansible
This platform uses three distinct connection mechanisms to interact with Azure resources. Each uses different protocols and credentials:
1. SSH to Target VMs
For: patch-linux, agent-upgrade, command execution
What it does: Connects directly to Linux VMs in each region to run shell commands, install packages, restart services, apply patches.
Protocol: SSH (port 22)
Authentication:
- SSH key pair (private key stored on this server)
- Connects as a service account user (e.g., azureuser or ansible-svc)
- Uses sudo for privileged operations (become/escalation)
Network path:
- Hub VNet → VNet Peering → Regional Spoke VNet → Target VM (port 22)
- NSG must allow SSH from management subnet
Ansible modules used:
shell, command, yum/apt, service, copy, template, lineinfile
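For the SSH connection type, a task in a patching playbook typically looks like the following. This is a minimal sketch, not the actual contents of patch_linux.yml; the group name and package selection are assumptions.

```yaml
# Illustrative sketch only: the inventory group and update strategy are
# assumptions, not taken from the platform's real patch_linux.yml.
- name: Apply security updates to regional Linux hosts
  hosts: eus2_linux          # hypothetical per-region inventory group
  become: true               # sudo escalation on the target
  tasks:
    - name: Install pending security updates
      ansible.builtin.yum:
        name: "*"
        security: true
        state: latest
```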
2. Azure DevOps REST API
For: pipeline-only, run_manual_pipeline
What it does: Triggers Azure DevOps pipelines in each regional ADO organization via REST API calls. Monitors pipeline runs and retrieves results.
Protocol: HTTPS (port 443)
Authentication:
- Personal Access Token (PAT) per ADO organization
- Each region has its own ADO org, so each needs its own PAT
- PATs stored in Ansible Vault (encrypted at rest)
Network path:
- Outbound HTTPS to dev.azure.com
- No VNet peering needed (public API endpoint)
- Can use Azure Private Link for ADO if required
Ansible modules used:
uri (REST calls), azure.azcollection modules
POST https://dev.azure.com/{org}/{project}/_apis/pipelines/{id}/runs?api-version=7.0
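As a sketch of what that API call involves, the helper below assembles the URL and auth header for the endpoint above. The function name and request body are illustrative, but the URL format and the empty-username Basic encoding of the PAT follow the Azure DevOps REST convention.

```python
import base64
import json

def build_pipeline_run_request(org: str, project: str, pipeline_id: int, pat: str):
    """Build URL, headers, and body for triggering an ADO pipeline run.

    Hypothetical helper; the platform's actual code may differ.
    """
    url = (f"https://dev.azure.com/{org}/{project}"
           f"/_apis/pipelines/{pipeline_id}/runs?api-version=7.0")
    # ADO accepts Basic auth with an empty username and the PAT as the password.
    token = base64.b64encode(f":{pat}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"resources": {}})  # minimal run-request body
    return url, headers, body
```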
3. Git Repository Operations
For: git-ops, edit_yaml, commit_to_git
What it does: Clones ADO Git repos, edits YAML configuration files (e.g., Helm values, Kubernetes manifests), commits changes, and pushes back to trigger CI/CD pipelines.
Protocol: HTTPS (port 443)
Authentication:
- PAT token embedded in Git remote URL
- Format: https://PAT@dev.azure.com/{org}/{project}/_git/{repo}
- Or via Git credential helper configured with PAT
Network path:
- Outbound HTTPS to dev.azure.com
- Same PAT tokens as pipeline API calls
Ansible modules used:
git, template, lineinfile, shell (for git commit/push)
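A small illustrative helper (not the platform's actual code) for building the PAT-embedded remote URL in the format shown above:

```python
def pat_remote_url(org: str, project: str, repo: str, pat: str) -> str:
    """Build an ADO Git remote URL with the PAT embedded.

    Hypothetical helper for illustration only. Note the credential appears
    in plain text in the URL, so real code must never log this value.
    """
    return f"https://{pat}@dev.azure.com/{org}/{project}/_git/{repo}"
```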
Required Access Levels & Credentials
| Resource | Credential Type | Required Permissions | Scope | Storage |
|---|---|---|---|---|
| Target Linux VMs | SSH Key Pair | SSH login + sudo privileges (passwordless sudo for automation) | Per-region inventory groups | ~/.ssh/azure_rsa on this server |
| ADO Pipelines (eus2) | PAT Token | Build: Read & Execute, Release: Read & Execute | eus2 ADO organization | Ansible Vault |
| ADO Pipelines (wus2) | PAT Token | Build: Read & Execute, Release: Read & Execute | wus2 ADO organization | Ansible Vault |
| ADO Pipelines (wus3, scus) | PAT Token | Build: Read & Execute, Release: Read & Execute | wus3 / scus ADO organizations (one PAT each) | Ansible Vault |
| ADO Pipelines (eus2lea, wus2lea) | PAT Token | Build: Read & Execute, Release: Read & Execute | eus2lea / wus2lea LEA ADO organizations (separate orgs, one PAT each) | Ansible Vault |
| ADO Git Repos | PAT Token (same as above) | Code: Read & Write, Push to branches | Per-org (same PAT can cover pipelines + repos) | Ansible Vault |
| Azure Resource Manager | Service Principal (optional) | Contributor on target resource groups (if managing Azure resources directly) | Per-subscription | Ansible Vault or Azure Key Vault |
How Each Playbook Type Connects
- YAML/Git edit playbooks: connection: local — edits happen on this server's filesystem
- Terraform playbook: connection: local — the Terraform CLI runs on this server
Credential Management with Ansible Vault
All sensitive credentials are stored in Ansible Vault encrypted files. The vault password file is referenced in the Admin configuration. Here's how credentials are organized:
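A vault file might be organized like this. This is a hedged sketch: the variable names and file layout are assumptions, not the platform's actual vault structure.

```yaml
# vault.yml (encrypted at rest with ansible-vault) -- illustrative layout only
vault_ado_pats:            # one PAT per regional ADO organization
  eus2: "REDACTED"
  wus2: "REDACTED"
  wus3: "REDACTED"
  scus: "REDACTED"
  eus2lea: "REDACTED"
  wus2lea: "REDACTED"
vault_ssh_private_key_file: "~/.ssh/azure_rsa"
vault_azure_service_principal:   # optional, for direct ARM operations
  client_id: "REDACTED"
  client_secret: "REDACTED"
  tenant_id: "REDACTED"
```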
Security & Access Summary
What This Server Needs
- Network: Outbound SSH (22) to target VMs via VNet Peering
- Network: Outbound HTTPS (443) to dev.azure.com
- SSH: Private key matching authorized_keys on all target hosts
- ADO: One PAT per ADO organization (6 total) with Build + Code permissions
- OS Account: Service account on targets with sudo access
- Optional: Azure Service Principal for direct ARM operations
- Vault: Ansible Vault password file for decrypting secrets
What This Server Does NOT Need
- No Ansible software on target VMs (agentless model)
- No separate Ansible Tower/AWX server
- No inbound ports opened on target VMs (SSH is outbound from here)
- No Azure portal login or interactive browser sessions
- No global admin privileges — only scoped access per resource
- No direct database access to Azure SQL/Cosmos (infrastructure only)
- No agent installation or software deployment to targets for connectivity
Playbook Library, Category Mapping & Auto-Detection
The Playbook Library
The playbooks/ directory contains all Ansible playbooks available for MOP execution. The library currently includes 9 playbooks covering different operational scenarios:
| Playbook | Purpose | Connection Type | Typical Categories |
|---|---|---|---|
| patch_linux.yml | OS patching, security updates, kernel upgrades, reboot | SSH | patch-linux, multi-region-patch |
| edit_yaml.yml | Clone Git repo, edit YAML config files (Helm values, K8s manifests) | Local + HTTPS | agent-upgrade, git-ops, infrastructure |
| commit_to_git.yml | Stage, commit, push changes to ADO Git repositories | HTTPS | agent-upgrade, git-ops, infrastructure |
| run_manual_pipeline.yml | Trigger ADO pipeline via REST API, monitor run, retrieve logs | HTTPS REST | pipeline-only, agent-upgrade, infrastructure |
| run_terraform.yml | Terraform init, plan, apply for cloud resource provisioning | Local + ARM API | terraform |
| cert_rotation.yml | Download cert from Key Vault, deploy to hosts, restart services | SSH + HTTPS | cert-rotation |
| service_restart.yml | Rolling service restart with connection drain and health checks | SSH | service-restart |
| db_maintenance.yml | Database vacuum, reindex, backup, connectivity verification | SSH | db-maintenance |
| security_scan.yml | CIS compliance checks, SUID audit, firewall review, port scan | SSH | security-scan |
To add new playbooks, place additional .yml files into the playbooks/ directory, then assign them to a category on the Admin → Categories tab.
How Category-to-Playbook Mapping Works
Every MOP belongs to a category. The category determines which playbooks run when that MOP is executed. Here's the process:
Three Detection Signals (Weighted)
| Signal | Weight | Example |
|---|---|---|
| Frontmatter fields | 5x | type: terraform, category: git-ops |
| Filename patterns | 3x | patch_linux_*.j2, terraform_*.j2 |
| Content keywords | 1x | "terraform plan", "yum update", "git push" |
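A minimal sketch of how the three weighted signals could combine into a score. The signal definitions and tie-breaking here are illustrative; the real detector's patterns live in the platform's own code.

```python
from fnmatch import fnmatch

# Weights from the table above: frontmatter 5x, filename 3x, content keywords 1x.
WEIGHTS = {"frontmatter": 5, "filename": 3, "content": 1}

# Illustrative signal definitions (assumptions, not the platform's real ones).
SIGNALS = {
    "terraform": {
        "frontmatter": {"type": "terraform"},
        "filename": ["terraform_*.j2"],
        "content": ["terraform plan", "terraform apply"],
    },
    "patch-linux": {
        "frontmatter": {"type": "patch-linux"},
        "filename": ["patch_linux_*.j2"],
        "content": ["yum update", "apt upgrade"],
    },
}

def detect_category(filename: str, frontmatter: dict, content: str) -> str:
    """Score every category against the three signals and return the best match."""
    scores = {}
    for category, sig in SIGNALS.items():
        score = 0
        if any(frontmatter.get(k) == v for k, v in sig["frontmatter"].items()):
            score += WEIGHTS["frontmatter"]
        if any(fnmatch(filename, pat) for pat in sig["filename"]):
            score += WEIGHTS["filename"]
        score += WEIGHTS["content"] * sum(kw in content for kw in sig["content"])
        scores[category] = score
    return max(scores, key=scores.get)
```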
Editing Category Mappings
You can manage category-to-playbook mappings in three ways:
- Admin UI: Go to Admin → Categories tab to add, edit, or remove category mappings with a form
- Config file: Edit configs/system_config.json directly to modify the category_mappings section
- Code: Update category_map.py for the static fallback mapping used by the executor
Each mapping defines: category name, ordered list of playbooks, description, risk level, and estimated duration.
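A mapping entry in configs/system_config.json might look like the following. The field names and the risk/duration values are assumptions; the playbook sequence shown for agent-upgrade matches the category mapping table in this guide.

```json
{
  "category_mappings": {
    "agent-upgrade": {
      "playbooks": ["edit_yaml.yml", "commit_to_git.yml", "run_manual_pipeline.yml"],
      "description": "Agent version bump via Git edit and pipeline deploy",
      "risk_level": "medium",
      "estimated_duration_minutes": 45
    }
  }
}
```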
Terraform Execution Model
Terraform runs as a fourth connection type alongside SSH, ADO REST API, and Git operations. It provisions and manages Azure cloud resources directly through the Azure Resource Manager (ARM) API.
How Terraform Fits In
- Terraform CLI is installed on this same server alongside Ansible
- The run_terraform.yml playbook calls terraform init/plan/apply as local commands
- State files are stored in a remote backend (Azure Storage Account) for team access and locking
- Authentication uses Azure Service Principal credentials or Managed Identity
- Regional deployments use Terraform workspaces — one per region
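A sketch of how run_terraform.yml might select the per-region workspace before plan/apply. The task and variable names are assumptions; the terraform workspace subcommand itself is standard CLI.

```yaml
# Illustrative task only; assumes the workspace for the region already exists.
- name: Select the Terraform workspace for this region
  ansible.builtin.command:
    cmd: "terraform workspace select {{ region }}"
    chdir: "{{ terraform_dir }}"
  connection: local
```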
Terraform vs. Ansible — When to Use Each
| Task | Use |
|---|---|
| Create/destroy Azure VMs, VNets, NSGs | Terraform |
| Configure software on existing VMs | Ansible (SSH) |
| Update Helm values in Git repo | Ansible (Git ops) |
| Trigger ADO pipeline | Ansible (REST API) |
| Provision new Kubernetes cluster | Terraform |
| Patch OS on existing cluster nodes | Ansible (SSH) |
Vendor Archive Pipeline
Vendor archives (.tgz, .tar.gz, .gz) are compressed archives delivered by vendors containing sets of J2 (Jinja2) MOP templates.
What's Inside a Vendor Archive?
- Multiple .j2 template files: each one is a MOP procedure
- Each template has metadata identifying its category, risk level, and dependencies
- Templates contain Jinja2 variables like {{ hostname }} that get filled in during rendering
Processing Steps
Manage archives in the Administration page under the "Vendor Archives" tab. Use the "File Transfer" tab for uploading other file types (JSON configs, YAML variables, etc.).
Two-Pass Rendering Pipeline
Vendor templates go through two separate rendering passes to produce final MOP documents.
Pass 1: Pre-Render (Text Insertion Map)
The first pass uses a pattern-matching text insertion map to inject custom text blocks into J2 templates before regional rendering. Each map entry has:
- MOP Name Pattern - Glob pattern matched against MOP filename (e.g., *failover*, *cert*)
- Search Text - Exact text string to find inside the MOP content
- Insert Text - Multi-line text block inserted immediately after the search text
- Enabled - Toggle to activate/deactivate each entry
How it works:
- Scans all J2 templates in mops/{version}/
- For each template, checks every enabled map entry
- If the filename matches the glob pattern AND the search text is found in the content, the insert text is placed right after the search text
- J2 templates are modified in-place at mops/{version}/
Insert text can contain URLs, before/after markers, instructions, line feeds, and Jinja2 variables (rendered in Pass 2).
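In code, applying one map entry could look like this sketch. Field names such as mop_name_pattern are assumptions, not the platform's actual schema.

```python
from fnmatch import fnmatch

def apply_prerender_entry(filename: str, content: str, entry: dict) -> str:
    """Apply one prerender-map entry to a template's text.

    Illustrative sketch; the platform's real implementation may differ.
    Returns the content unchanged when the entry does not match.
    """
    if not entry.get("enabled", True):
        return content
    if not fnmatch(filename, entry["mop_name_pattern"]):
        return content
    search = entry["search_text"]
    idx = content.find(search)
    if idx == -1:
        return content
    # Place the insert block immediately after the matched search text.
    end = idx + len(search)
    return content[:end] + "\n" + entry["insert_text"] + content[end:]
```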
Pass 2: Final Render (Regional Variables)
The second pass renders Jinja2 templates with values that are different for each region:
- Hostnames - eus2-web01.azure.internal
- IP Addresses - 10.1.1.10
- Azure Region - eastus2
- ADO Organization - Per-region org URL
- Pipeline IDs - Region-specific pipeline
- Subscription IDs - Region-specific Azure subscription
- Key Vault Names - Region-specific vault
- Network Config - VNet, subnet, NSG names
Produces one complete Markdown MOP document per region (6 total).
Prerender Map Example
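A hypothetical map entry is shown below. The field names mirror the four attributes described above (they are assumptions, not the platform's actual schema), and the Jinja2 variable inside insert_text is rendered during Pass 2:

```json
{
  "mop_name_pattern": "*failover*",
  "search_text": "## Verification",
  "insert_text": "Before verifying, confirm the standby endpoint responds:\ncurl -f https://{{ hostname }}/healthz",
  "enabled": true
}
```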
Why Two Passes?
- Pass 1 (Prerender Map) handles text insertion for MOPs that require manual steps, custom procedures, or additional context outside the automation scope — without modifying the original vendor templates
- Pass 2 (Regional Render) applies Jinja2 rendering with region-specific YAML variable files, producing 6 final Markdown documents per MOP (one per region)
- Any Jinja2 variables inside the inserted text blocks are rendered during Pass 2, so the inserts can reference regional values like hostnames, URLs, and pipeline IDs
- This separation keeps vendor templates untouched and makes customization configurable through the admin UI without editing vendor files
Configure prerender map entries in the Administration page under the "Prerender Map" tab.
Pages & Features
| Page | URL | What It Does |
|---|---|---|
| Dashboard | / | Overview of available MOPs, system statistics, recent execution history |
| MOPs | /mops | Browse, view, edit, and execute individual MOP templates and their variables |
| Scheduler | /scheduler | Create and manage "MOP sets" - groups of procedures scheduled for sequential execution across all six regions |
| Releases | /releases | Manage vendor MOP releases with version folders, regional subfolders, and type detection |
| Logs | /logs | View execution logs, Ansible output, error tracking, performance data, and search across all log types |
| Documentation | /docs | Render and manage vendor documentation with Jinja2 templates and regional variable files |
| Archive | /archive | Browse and restore completed MOP sets that have been archived (preserves all configs, logs, and rendered docs) |
| API Demo | /api-demo | Demonstrates the REST API endpoints and Next.js frontend integration |
How Ansible Is Incorporated Into the Workflow
Ansible is the execution engine at the heart of this platform. Every MOP ultimately translates into one or more Ansible playbook runs.
End-to-End Execution Flow
Category-to-Playbook Mapping
Each MOP has a category defined in its YAML variables. This category determines exactly which Ansible playbooks run and in what order:
| Category | Playbooks (Executed in Order) | What It Does |
|---|---|---|
| patch-linux | patch_linux.yml | Updates packages, installs security patches, reboots servers if needed |
| agent-upgrade | edit_yaml.yml → commit_to_git.yml → run_manual_pipeline.yml | Edits config files, commits changes to Git, then triggers an Azure DevOps pipeline to deploy the new agent version |
| pipeline-only | run_manual_pipeline.yml | Directly triggers an Azure DevOps pipeline without any file changes |
| git-ops | edit_yaml.yml → commit_to_git.yml | Edits configuration files and commits changes to Git (infrastructure-as-code updates) |
| infrastructure | edit_yaml.yml → commit_to_git.yml → run_manual_pipeline.yml | Full infrastructure change: edit configs, commit to Git, then trigger deployment pipeline |
| multi-region-patch | patch_linux.yml | Linux patching that targets all regions based on Ansible inventory groups |
| multi-region-deploy | edit_yaml.yml → commit_to_git.yml → run_manual_pipeline.yml | Full deployment across multiple regions with region-specific targeting |
How Ansible Playbooks Are Called
When a MOP is executed, the platform invokes the local ansible-playbook CLI directly, passing the region's inventory and variable files for each assigned playbook.
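As a sketch of the invocation: the exact flags and paths the platform uses are assumptions here, though every flag shown is a real ansible-playbook option.

```python
import subprocess

def build_ansible_command(playbook: str, region: str) -> list[str]:
    """Assemble the ansible-playbook invocation for one region.

    Hypothetical paths: the inventory, vars, and vault-password locations
    are illustrative, not the platform's actual layout.
    """
    return [
        "ansible-playbook",
        f"playbooks/{playbook}",
        "-i", f"inventory/{region}.ini",          # per-region inventory (assumed path)
        "-e", f"@vars/{region}.yml",              # region-specific YAML variables (assumed path)
        "--vault-password-file", ".vault_pass",   # decrypts Ansible Vault secrets (assumed path)
    ]

def run_playbook(playbook: str, region: str) -> int:
    """Run the playbook, capturing stdout/stderr for the logging system."""
    result = subprocess.run(build_ansible_command(playbook, region),
                            capture_output=True, text=True)
    return result.returncode
```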
What Ansible Logs Are Captured
STDOUT
The standard playbook execution output showing task results, host statuses, and the PLAY RECAP summary.
STDERR
Error messages, deprecation warnings, and connection issues that Ansible reports during execution.
Return Code
Exit status: 0 = success, 1 = error, 2 = one or more hosts failed, 4 = unreachable hosts.
Log File
Ansible's built-in log file with detailed execution trace, written to logs/ansible/.
JSON Callback
Structured JSON data including task results, variable data, host info, and change tracking.
Performance Data
Timing information for each task, total play duration, and identification of slow tasks.
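For log reporting, the return codes listed above reduce to a simple lookup, sketched here:

```python
# Interpretation of ansible-playbook exit codes, per the list above.
ANSIBLE_RC = {
    0: "success",
    1: "error",
    2: "one or more hosts failed",
    4: "unreachable hosts",
}

def describe_rc(rc: int) -> str:
    """Map a return code to a human-readable status (sketch for log reporting)."""
    return ANSIBLE_RC.get(rc, f"unknown return code {rc}")
```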
Regional Execution Safety
When running MOP sets across all six regions, Ansible execution follows strict safety controls:
- Sequential Order: Regions are executed one at a time: eus2 → wus2 → wus3 → scus → eus2lea → wus2lea
- Manual Approval: After each region completes, an operator must approve before the next region starts
- Region-Specific Variables: Each region gets its own YAML variable file with unique hostnames, IPs, ADO organization details, and pipeline IDs
- Pause/Resume: Operators can pause execution at any point and resume later
- Error Isolation: A failure in one region does not automatically cascade to the next
- Rollback Planning: Each MOP set includes rollback procedures in case execution needs to be reversed
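The safety controls above can be sketched as a driver loop; run_region and approve are stand-ins for the platform's actual execution and approval hooks:

```python
REGION_ORDER = ["eus2", "wus2", "wus3", "scus", "eus2lea", "wus2lea"]

def execute_mop_set(run_region, approve):
    """Drive region-by-region execution with approval gates (illustrative sketch).

    run_region(region) -> bool runs all playbooks for one region;
    approve(region) -> bool is the manual gate before moving on.
    Returns the list of regions that completed successfully.
    """
    completed = []
    for region in REGION_ORDER:
        if not run_region(region):
            break                      # error isolation: stop, don't cascade
        completed.append(region)
        if region != REGION_ORDER[-1] and not approve(region):
            break                      # operator declined the approval gate
    return completed
```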
Ansible Inventory Structure
Each Azure region has its own Ansible inventory defining target hosts:
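An illustrative per-region inventory file (host names, group names, and variables are assumptions assembled from examples elsewhere in this guide):

```ini
# inventory/eus2.ini -- illustrative sketch, not the platform's actual inventory
[eus2_web]
eus2-web01.azure.internal ansible_host=10.1.1.10
eus2-web02.azure.internal ansible_host=10.1.1.11

[eus2_linux:children]
eus2_web

[eus2_linux:vars]
ansible_user=ansible-svc
ansible_ssh_private_key_file=~/.ssh/azure_rsa
```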
The Six Azure Regions
Every MOP runs sequentially through these regions in order, with manual approval required between each one:
| Short Name | Full Name | Azure Region | Timezone | Type |
|---|---|---|---|---|
| eus2 | East US 2 | eastus2 | America/New_York | Production |
| wus2 | West US 2 | westus2 | America/Los_Angeles | Production |
| wus3 | West US 3 | westus3 | America/Los_Angeles | Production |
| scus | South Central US | southcentralus | America/Chicago | Production |
| eus2lea | East US 2 LEA | eastus2euap | America/New_York | Early Access |
| wus2lea | West US 2 LEA | westus2euap | America/Los_Angeles | Early Access |
Each region has its own Azure DevOps organization, separate PAT tokens, dedicated subscriptions, and complete organizational isolation.
Key Concepts
MOP
A Method of Procedure - a written procedure document (Markdown with Jinja2 variables) describing step-by-step operational tasks like patching servers, upgrading agents, or deploying infrastructure changes.
MOP Set
A group of MOPs scheduled for execution together across all six regions. Sets enforce sequential regional deployment with manual approval gates between each region.
Vendor Tar Ball
A compressed package from your vendor containing MOP templates. The platform automatically extracts, validates, and organizes these into versioned folders with regional variable files.
Release
A versioned collection of vendor MOPs (e.g., R11.5.3.4) ready for deployment. Releases contain version folders with regional subfolders for each of the six Azure regions.
Playbook
An Ansible automation script (YAML file) that performs the actual work described in a MOP. Examples: patch_linux.yml, edit_yaml.yml, commit_to_git.yml.
Category Mapping
The configuration system that connects MOP types (like "patch-linux" or "agent-upgrade") to the correct sequence of Ansible playbooks. Defined in category_map.py.
Jinja2 Template
A template file (.j2) containing the MOP procedure text with variable placeholders like {{ hostname }} that get replaced with actual values for each region during rendering.
YAML Variables
Configuration files in the vars/ directory that contain region-specific values (hostnames, IPs, pipeline IDs, etc.) used to render MOP templates and pass data to Ansible.
Development History
Phase 1: Foundation (January 2025)
- Next.js Frontend Integration - Built the modern frontend interface with TypeScript
- Set up the dual-interface architecture: Next.js for day-to-day operations, Flask for administration
- Created the API layer connecting frontend to backend with RESTful endpoints
- Built the initial demo interface showing system readiness
- Improved execution error reporting with development environment considerations
Phase 2: Core Systems (August 2025)
Several major systems were built in rapid succession:
- Built the MOPLogger class for tracking every execution
- Added Ansible-specific logging (capturing command output, errors, timing data)
- Created the Logs dashboard with search and analysis capabilities
- Added performance tracking for Ansible playbook runs
- Log types: System logs, execution logs, process logs, and comprehensive Ansible logs
- Set up all six Azure regions with Ansible inventory files
- Configured separate Azure DevOps organizations per region
- Built host groups, network configs, and security isolation
- Added special handling for Early Access (LEA) regions
- Complete organizational isolation with separate PAT tokens and dedicated subscriptions
- Created the full release management platform
- Support for 25 different vendor MOP types (infrastructure, security, monitoring, database, backup, system operations)
- Version control with release folders (e.g., R11.5.3.4)
- Automated regional MOP document generation with proper metadata
- Six unique variable files for each Azure DevOps organization
- Complete web interface for release creation and management
- Built the scheduling interface for grouping MOPs into "sets"
- Implemented sequential execution enforcement (region by region)
- Added manual approval gates between regions for safety
- Created real-time progress tracking and status monitoring
- Added pause/resume and emergency controls
- Bootstrap-based responsive UI with auto-refresh for active sets
- Built the versioned folder system (mops/{version}/)
- Created the automated tar ball processor for vendor-supplied MOP packages
- Added automatic variable file generation per version per region
- Built release manifest generation with MOP validation and metadata parsing
- CLI tools for listing, processing, and validating vendor packages
- Full integration with Release Manager, Scheduler, and Logging systems
- Organized all docs into a structured docs/ directory
- Created vendor integration guides and workflow documentation
- Built quick-start guides and architecture overviews
- Clear guidance for vendors, DevOps teams, and operators
Phase 3: Archive Management (Most Recent)
- Archive Manager - Built the complete archive system for organizing completed MOP sets
- Created the Archive dashboard with statistics, filtering, and restore functionality
- Added archive/restore buttons to the Scheduler page for completed/failed/cancelled sets
- Built the archive index system (JSON-based) for tracking all archived content
- Preserves all related files (configs, logs, rendered docs) together in archive
- Optional cleanup: can archive with or without removing originals from active directories
25 Vendor MOP Types Supported
The platform supports these operational procedure categories: