Stale CI Detection
Stale Server Detection
Identify servers not updated within expected timeframes using tiered escalation.
30 Day Warning Query:
objectType = "Server"
AND Status = "Active"
AND updated < now(-30d)
AND updated >= now(-45d)
45 Day Escalation Query:
objectType = "Server"
AND Status = "Active"
AND updated < now(-45d)
AND updated >= now(-60d)
60 Day Critical Query:
objectType = "Server"
AND Status = "Active"
AND updated < now(-60d)
| Tier | Recipients | Action |
|---|---|---|
| 30 Day | Data Steward | Email reminder |
| 45 Day | Data Steward + Data Owner | Email + Jira task |
| 60 Day | Infrastructure Manager | Critical alert + Flag in CMDB |
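The three queries and the escalation table above can be folded into one classification step. The following is a minimal sketch, assuming each server's last-updated timestamp is available as a timezone-aware `datetime`. The function name `stale_tier` and the `TIERS` list are illustrative and not part of any CMDB API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (minimum age in days, tier label), checked from most to least severe.
TIERS = [(60, "60 Day"), (45, "45 Day"), (30, "30 Day")]


def stale_tier(updated: datetime, now: datetime) -> Optional[str]:
    """Return the escalation tier for a server, or None if it is not stale."""
    age = now - updated
    for min_days, label in TIERS:
        # Mirrors `updated < now(-Nd)`: strictly older than N days.
        if age > timedelta(days=min_days):
            return label
    return None


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(stale_tier(now - timedelta(days=50), now))  # 45 Day
```

Checking the most severe tier first keeps the tiers mutually exclusive, which is what the upper bounds in the queries achieve.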
Stale Application Detection
90 Day Warning:
objectType = "Application"
AND Status = "Active"
AND updated < now(-90d)
AND updated >= now(-180d)
180 Day Critical:
objectType = "Application"
AND Status = "Active"
AND updated < now(-180d)
Lifecycle Management
Certificate Expiry Tracking
Prevent outages from expired certificates with proactive notifications.
Expiring in 30 Days:
objectType = "Certificate"
AND Status = "Active"
AND "Expiry Date" <= now(30d)
AND "Expiry Date" > now()
Actions:
- Update Status to "Expiring Soon"
- Create Jira ticket for renewal
- Notify certificate owner
Expired Certificates:
objectType = "Certificate"
AND Status IN ("Active", "Expiring Soon")
AND "Expiry Date" < now()
Actions:
- Update Status to "Expired"
- Alert Security Operations immediately
- Identify services using certificate
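The two certificate queries amount to a small status transition rule. As a rough sketch, assuming expiry dates are timezone-aware `datetime` values, the hypothetical helper `next_certificate_status` below returns the status a certificate should move to. The Jira ticket, notification, and alert actions are deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def next_certificate_status(status: str, expiry: datetime, now: datetime) -> str:
    """Return the status a certificate should have under the two queries above."""
    # Expired: Status IN ("Active", "Expiring Soon") AND "Expiry Date" < now()
    if status in ("Active", "Expiring Soon") and expiry < now:
        return "Expired"
    # Expiring soon: Status = "Active" AND now() < "Expiry Date" <= now(30d)
    if status == "Active" and now < expiry <= now + timedelta(days=30):
        return "Expiring Soon"
    return status


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(next_certificate_status("Active", now + timedelta(days=10), now))  # Expiring Soon
```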
Contract Renewal Notification
90 Day Notice:
objectType = "Contract"
AND Status = "Active"
AND "End Date" <= now(90d)
AND "End Date" > now(60d)
60 Day Notice:
objectType = "Contract"
AND Status = "Active"
AND "End Date" <= now(60d)
AND "End Date" > now(30d)
30 Day Urgent:
objectType = "Contract"
AND Status = "Active"
AND "End Date" <= now(30d)
AND "End Date" > now()
Subject: [URGENT] Contract Expiring in 30 Days: {{Contract Name}}
Contract {{Contract Name}} expires in {{days_remaining}} days.
Contract Details:
- Vendor: {{Vendor}}
- Type: {{Contract Type}}
- Value: ${{Value}}
- End Date: {{End Date}}
- Auto-Renew: {{Auto-Renew}}
{{#if auto_renew_yes}}
This contract will auto-renew. Cancel if needed.
{{else}}
This contract will NOT auto-renew. Renewal required.
{{/if}}
Action Required: Review and initiate renewal process.
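The notice tiers and the `{{days_remaining}}` value in the template can be derived from the contract end date. This is a minimal sketch, assuming end dates are stored as calendar dates without a time component. The name `contract_notice` is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def contract_notice(end_date: date, today: date) -> Optional[Tuple[str, int]]:
    """Return (notice tier, days remaining), or None if no notice is due."""
    days_remaining = (end_date - today).days
    if days_remaining <= 0:
        return None  # already ended; the queries require "End Date" > now()
    if days_remaining <= 30:
        return ("30 Day Urgent", days_remaining)
    if days_remaining <= 60:
        return ("60 Day Notice", days_remaining)
    if days_remaining <= 90:
        return ("90 Day Notice", days_remaining)
    return None  # more than 90 days out


print(contract_notice(date(2024, 7, 15), date(2024, 6, 30)))  # ('30 Day Urgent', 15)
```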
Data Quality Automations
Duplicate Detection
On Create - Real-time Check:
objectType = "Server"
AND Hostname = {{new_hostname}}
AND Key != {{new_key}}
Action if duplicate found: Block creation, return error message.
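The same check can be expressed over an in-memory list, which is convenient for unit-testing the rule before wiring it to a create trigger. The sketch below is illustrative only: `find_duplicate_keys` is a hypothetical helper, records are assumed to be dicts with `Key` and `Hostname` fields, and comparing hostnames case-insensitively is an assumption rather than something the query above specifies.

```python
from typing import Dict, List


def find_duplicate_keys(
    servers: List[Dict[str, str]], new_hostname: str, new_key: str
) -> List[str]:
    """Return keys of existing servers whose hostname matches the new one."""
    wanted = new_hostname.strip().lower()
    return [
        s["Key"]
        for s in servers
        # Skip the object being created, as `Key != {{new_key}}` does.
        if s["Key"] != new_key and s["Hostname"].strip().lower() == wanted
    ]


servers = [
    {"Key": "SRV-1", "Hostname": "web-prod-01"},
    {"Key": "SRV-2", "Hostname": "db-prod-01"},
]
duplicates = find_duplicate_keys(servers, "WEB-PROD-01", "SRV-3")
if duplicates:
    print(f"Hostname already used by {', '.join(duplicates)}")
```

A non-empty result corresponds to the "block creation, return error message" action.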
Reference Validation
Orphaned Virtual Machines:
objectType = "Virtual Machine"
AND "Host Server" IS NOT EMPTY
AND "Host Server".Status = "Decommissioned"
AND Status = "Running"
Applications without Technical Service:
objectType = "Application"
AND Status = "Active"
AND "Technical Service" IS EMPTY
Actions:
- Generate orphan report
- Notify affected Data Stewards
- Create Jira issues for resolution
Integration Automations
Change Impact Analysis
Identify CIs affected by a planned change. The analysis is triggered by an API call from change management.
Impact Analysis Query:
objectType IN ("Application", "Microservice", "API", "Business Service")
AND (
inboundReferences("Runs On") IN ({{target_keys}})
OR inboundReferences("Depends On") IN ({{target_keys}})
OR inboundReferences("Delivered By") IN ({{target_keys}})
)
Response Example:
{
"change_id": "CHG0005678",
"direct_impact_count": 5,
"total_impact_count": 23,
"impacted_services": [
{
"name": "Online Banking",
"criticality": "Tier 1",
"impact_path": "web-prod-01 → CustomerPortal → Online Banking"
}
],
"risk_score": "High",
"recommended_actions": [
"Schedule during maintenance window",
"Notify Tier 1 service owners"
]
}
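Fields such as `direct_impact_count`, `total_impact_count`, and `impact_path` suggest a walk over the dependency graph that starts at the changed CIs. One way to sketch this is a breadth-first traversal. The `DEPENDENTS` map below is hypothetical sample data modelled on the example path, and `impact_paths` is an illustrative name, not an existing API.

```python
from collections import deque
from typing import Dict, List

# Hypothetical edges: CI -> CIs that reference it via
# "Runs On", "Depends On" or "Delivered By".
DEPENDENTS: Dict[str, List[str]] = {
    "web-prod-01": ["CustomerPortal"],
    "CustomerPortal": ["Online Banking"],
    "Online Banking": [],
}


def impact_paths(
    targets: List[str], dependents: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Breadth-first walk from the changed CIs; maps each impacted CI to its shortest path."""
    paths = {t: [t] for t in targets}
    queue = deque(targets)
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in paths:  # visit each CI once, so cycles terminate
                paths[dep] = paths[current] + [dep]
                queue.append(dep)
    # The changed CIs themselves are not counted as impacted.
    return {ci: path for ci, path in paths.items() if ci not in targets}


paths = impact_paths(["web-prod-01"], DEPENDENTS)
direct = [ci for ci, path in paths.items() if len(path) == 2]
print(len(direct), len(paths))             # 1 2
print(" → ".join(paths["Online Banking"]))  # web-prod-01 → CustomerPortal → Online Banking
```

Here a CI counts as directly impacted when it is one hop from a changed CI. That definition is an assumption made for the sketch.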
Cloud Cost Sync
Update cloud instance costs from AWS Cost Explorer and Azure Cost Management.
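# Sync AWS costs via the Cost Explorer API.
# get_cost_from_aws and update_cmdb are placeholder helpers, not library calls.
for instance in cloud_instances:
    daily_cost = get_cost_from_aws(instance.instance_id)
    # Rough estimate: assumes a 30-day month at the latest daily rate.
    monthly_estimate = daily_cost * 30
    update_cmdb(instance, 'Monthly Cost', monthly_estimate)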
Health Check Automations
Backup Coverage Verification
Find Unprotected Databases:
objectType = "Database"
AND Status = "Active"
AND "Backup Status" != "Active"
Find Failed Backups:
objectType = "Backup Job"
AND Status = "Active"
AND "Last Run Status" = "Failed"
Subject: [CRITICAL] Backup Coverage Gaps Detected
Databases Without Backups:
{{#each unprotected_databases}}
- {{Database Name}} on {{Server}} ({{Database Type}})
{{/each}}
Failed Backup Jobs:
{{#each failed_jobs}}
- {{Job Name}}: Last failure on {{Last Run}}
{{/each}}
Action Required: Configure backup or investigate failures.
Kubernetes Version Drift
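CURRENT_K8S_VERSION = "1.29"
MINIMUM_SUPPORTED = "1.27"

def parse_version(version):
    # Compare numerically: as strings, "1.9" would sort after "1.27".
    return tuple(int(part) for part in version.lstrip("v").split(".")[:2])

for cluster in active_clusters:
    if parse_version(cluster.version) < parse_version(MINIMUM_SUPPORTED):
        alert_critical("Unsupported version")
    elif parse_version(cluster.version) < parse_version(CURRENT_K8S_VERSION):
        alert_warning("Outdated version")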
Cloud Account Budget Alerts
Query Accounts with Budgets:
objectType = "Cloud Account"
AND Status = "Active"
AND "Monthly Budget" > 0
| Threshold | Action |
|---|---|
| >100% of budget | Critical alert to IT Director |
| >90% of budget | Warning to Cloud Platform Lead |
| >120% of pace | Info alert - tracking ahead |
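The threshold table can be evaluated per account once month-to-date spend is known. The sketch below rests on one assumption: "pace" is read as the budget pro-rated to the current day of the month, so an account exceeds 120% of pace when its spend is more than 1.2 times that pro-rated amount. The function name `budget_alert` and the returned labels are illustrative.

```python
import calendar
from datetime import date
from typing import Optional


def budget_alert(spend: float, monthly_budget: float, today: date) -> Optional[str]:
    """Return 'critical', 'warning', 'info', or None for one cloud account."""
    if monthly_budget <= 0:
        return None  # the query only selects accounts with "Monthly Budget" > 0
    used = spend / monthly_budget
    if used > 1.0:
        return "critical"  # over budget: alert IT Director
    if used > 0.9:
        return "warning"   # over 90%: warn Cloud Platform Lead
    # Assumed meaning of "pace": budget pro-rated to the day of the month.
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    expected_so_far = monthly_budget * today.day / days_in_month
    if spend > 1.2 * expected_so_far:
        return "info"      # tracking ahead of pace
    return None


print(budget_alert(650.0, 1000.0, date(2024, 6, 15)))  # info
```

The checks run from most to least severe so that each account produces at most one alert per run.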
Automation Schedule Summary
Daily Automations
| Time | Automation | Priority |
|---|---|---|
| 2:00 AM | Cloud Cost Sync | Medium |
| 4:00 AM | Reference Validation | Medium |
| 5:00 AM | Daily Quality Score | Low |
| 6:00 AM | Stale Server Detection | High |
| 6:00 AM | Backup Coverage | Critical |
| 7:00 AM | Certificate Expiry | Critical |
| 8:00 AM | Contract Renewal | High |
| 9:00 AM | Cloud Budget Alerts | High |
Real-Time Automations
| Trigger | Automation | Priority |
|---|---|---|
| CI Create | Duplicate Detection | High |
| API Call | Change Impact Analysis | Critical |
| Webhook | Incident Correlation | High |
Schema Forge