The Problem#
If you are an M365 admin, you already do version control — you just do it badly. Your Intune policies are versioned by “who clicked last.” Your automation scripts live in a shared folder with names like Set-Compliance_v3_FINAL_fixed.ps1. Your documentation is a OneNote notebook that nobody updates.
Meanwhile, the DevOps world has been doing GitOps for years: everything in Git, changes via pull requests, automated deployment on merge. The tooling exists. The patterns are proven. But the bridge between “I manage Intune” and “I deploy via CI/CD” is undocumented for IT pros.
This post bridges that gap using a real GitOps setup — not a theoretical one. I run 48+ Docker stacks across 17 LXC containers (each one a Docker host), all deployed via Git push. The same thinking applies to your M365 automation.
Solution Overview#
GitOps in one sentence: Git is the single source of truth, and automation reconciles reality with what Git says should be true.
That sentence sounds clean in a slide deck. In practice it means: stop clicking in the portal and hoping you remember what you changed. Every config, every script, every compose file lives in a repo. Automation handles deployment. Git history handles accountability. This is not about learning Kubernetes. It is about using Git and CI/CD for the infrastructure you already manage.
Implementation#
The GitOps Mental Model for IT Pros#
| Traditional IT | GitOps Equivalent |
|---|---|
| “I made a change in the portal” | “I committed a change to main” |
| “Who changed this policy?” | `git log --oneline` |
| “Can you review this before I deploy?” | Pull request |
| “Roll back to last week’s config” | `git revert` |
| “Deploy to dev first, then prod” | Branch strategy + environment targets |
The discipline is: if it is not in Git, it does not exist. No portal clicks without a corresponding commit. No “I’ll document it later.”
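The right-hand column of that table takes about thirty seconds to feel at the command line. Here is a throwaway sketch — the file name and policy content are hypothetical placeholders, nothing Intune-specific:

```shell
# Hypothetical demo repo - file name and policy content are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo Admin"

# "I committed a change to main"
echo "block_usb: false" > compliance-policy.yml
git add compliance-policy.yml
git commit -qm "Baseline compliance policy"

echo "block_usb: true" > compliance-policy.yml
git commit -qam "Block USB storage"

# "Who changed this policy?"
git log --oneline

# "Roll back to last week's config"
git revert --no-edit HEAD >/dev/null
cat compliance-policy.yml   # block_usb: false again
```

No portal, no ticket, no guessing: the history answers who, what, and when, and the rollback is one command.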
Real-World Example: Docker Stack Deployment via Forgejo#
In my homelab, every Docker stack is a directory in a Git repository. Each stack has a `docker-compose.yml` and a CI/CD workflow that deploys it automatically when the file changes.
Repository structure:
```
core/
├── stacks/
│   ├── ct100-jellyfin/
│   │   └── docker-compose.yml
│   ├── ct104-immich/
│   │   └── docker-compose.yml
│   ├── ct108-vaultwarden/
│   │   └── docker-compose.yml
│   └── ... (48+ stacks)
├── .forgejo/workflows/
│   ├── deploy-ct100-jellyfin.yml
│   ├── deploy-ct104-immich.yml
│   └── ... (one workflow per stack)
└── ansible/
    ├── playbooks/
    └── roles/
```

A typical deployment workflow:
```yaml
name: Deploy ct100 jellyfin

on:
  push:
    branches: [main]
    paths:
      - "stacks/ct100-jellyfin/**"
      - ".forgejo/workflows/deploy-ct100-jellyfin.yml"

jobs:
  deploy-ct100-jellyfin:
    runs-on: [self-hosted, ct100]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Sync stack to /srv/docker/ct100-jellyfin
        run: |
          mkdir -p /srv/docker/ct100-jellyfin
          rsync -av --delete \
            --exclude='.env.*' \
            stacks/ct100-jellyfin/ /srv/docker/ct100-jellyfin/

      - name: Deploy stack
        run: |
          cd /srv/docker/ct100-jellyfin
          docker compose down || true
          docker compose pull || true
          docker compose up -d
          docker compose ps
          docker image prune -f
```

Key design decisions:
- Path-based triggers: Only the workflow for the changed stack runs. Edit Jellyfin's compose file and only Jellyfin redeploys. This prevents cascading deployments.
- Self-hosted runners per container: Each LXC container runs its own Forgejo Actions runner. The runner label (`ct100`) ensures the workflow runs on the correct host.
- Secrets stay local: `.env.*` files are excluded from the sync. Environment variables with credentials live on the host, not in Git.
- Idempotent deploys: `docker compose down || true` handles the case where the stack was not running. `docker compose pull || true` updates images without failing if the registry is temporarily unreachable.
- Conservative image cleanup: `docker image prune -f` removes only dangling images. You could use `-af` to remove all unused images, but on a multi-stack host that would also nuke images for services that happen to be stopped, forcing a full re-pull on the next deploy.
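With one workflow file per stack, 48 stacks means 48 near-identical YAML files. A small generator script keeps them consistent; this is a hypothetical sketch with a demo stack name and a trimmed-down template, not the script from my repo:

```shell
#!/usr/bin/env sh
# Hypothetical generator - demo stack name and trimmed template, not the real repo's files.
set -eu
mkdir -p stacks/ct999-demo .forgejo/workflows   # demo stack so the loop has input
for dir in stacks/*/; do
  stack=$(basename "$dir")                      # e.g. ct999-demo
  host=${stack%%-*}                             # runner label = container ID (ct999)
  cat > ".forgejo/workflows/deploy-${stack}.yml" <<EOF
name: Deploy ${stack}
on:
  push:
    branches: [main]
    paths:
      - "stacks/${stack}/**"
jobs:
  deploy-${stack}:
    runs-on: [self-hosted, ${host}]
EOF
done
ls .forgejo/workflows
```

Regenerating from a template also means a change to the deploy pattern rolls out to every stack in one commit, instead of 48 hand edits.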
Applying GitOps to M365#
The same pattern works for M365 management, with different tooling:
Export Intune configuration to Git:
```powershell
# Conceptual - export Intune device compliance policies
Import-Module Microsoft.Graph.DeviceManagement
Connect-MgGraph -Scopes "DeviceManagementConfiguration.Read.All"

$policies = Get-MgDeviceManagementDeviceCompliancePolicy -All
foreach ($policy in $policies) {
    $fileName = "$($policy.DisplayName -replace '[^\w\-]', '_').json"
    $policy | ConvertTo-Json -Depth 10 |
        Out-File -FilePath "./intune-config/compliance/$fileName"
}

# Commit and push
git add intune-config/
git commit -m "Export compliance policies - $(Get-Date -Format 'yyyy-MM-dd')"
git push
```

Detect drift with CI/CD:
```yaml
# GitHub Actions example
name: Intune Configuration Drift Check

on:
  schedule:
    - cron: '0 6 * * 1'  # Every Monday at 06:00

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Export current Intune config
        run: pwsh ./scripts/Export-IntuneConfig.ps1

      - name: Check for drift
        run: |
          if git diff --exit-code intune-config/; then
            echo "No drift detected"
          else
            echo "::warning::Configuration drift detected!"
            git diff intune-config/ > drift-report.txt
            exit 1
          fi

      - name: Upload drift report
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.txt
```

This does not auto-remediate, and that is intentional. Auto-remediating M365 configuration is dangerous. The workflow detects drift and alerts. A human decides whether the drift was intentional (someone made a legitimate portal change) or unintended (someone clicked the wrong thing).
The Self-Hosted Runner Pattern#
For homelab deployments, the runner architecture matters. Each container in my infrastructure runs a Forgejo Actions runner, giving the CI/CD system direct access to the target host.
When a new container is provisioned via Ansible, the bootstrap playbook automatically:
- Installs Docker
- Registers a Forgejo runner with labels matching the container ID
- Sets up directory structure for Docker stacks
- Installs monitoring agents
The result: provisioning a new service is one `ansible-playbook` run to create the container, then a Git push to deploy the stack. No SSH into the box. No manual Docker commands.
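The runner-registration step from that bootstrap list might look roughly like this as Ansible tasks. This is a hedged sketch, not my actual role: the variable names (`forgejo_url`, `runner_token`, `container_id`) and file paths are placeholders, and you should verify the `forgejo-runner register` flags against your runner version's documentation.

```yaml
# Hypothetical bootstrap tasks - variables, paths, and flags are placeholders.
- name: Register Forgejo Actions runner for this container
  ansible.builtin.command: >
    forgejo-runner register --no-interactive
    --instance {{ forgejo_url }}
    --token {{ runner_token }}
    --name {{ container_id }}
    --labels self-hosted,{{ container_id }}
  args:
    creates: /etc/forgejo-runner/.runner   # skip if already registered (idempotent)

- name: Create Docker stack directory
  ansible.builtin.file:
    path: /srv/docker
    state: directory
    mode: "0755"
```

The `creates:` guard is what makes re-running the playbook safe: registration only happens once per container.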
Security Considerations#
- Secrets never go in Git. Use environment variables, CI/CD secrets, or a vault. The `.env.*` exclusion in rsync is critical.
- Runner permissions are host-level. A self-hosted runner has full access to its host. Do not run untrusted workflows on self-hosted runners, and fork-based PRs should not trigger self-hosted deployments.
- Service principal permissions for M365 exports. An app registration that can read your entire Intune configuration is valuable to an attacker. Use certificate authentication, rotate regularly, and scope to read-only permissions.
- Git history is an audit trail — treat it as such. Do not rebase or force-push main. The commit history is your record of who changed what and when.
Where to Start#
You do not need to build all of this at once. Think of it as a maturity progression:
- Scripts in Git. No automation, just version control. You can `git log` and `git revert`. That alone is better than what most IT teams have.
- CI/CD deploys on commit. Push to main, automation handles deployment. No more SSH-and-pray.
- Drift detection. Scheduled exports compared to what Git says should be true. You find out when someone clicks in the portal.
- PR approval gates. Changes require review before they reach production. Now you have a real change management process — without the ServiceNow ticket.
The place to start is not the CI/CD pipeline. It is the commit. Pick one script you run manually and put it in Git this week. The automation comes later. The discipline has to come first, or the pipeline just automates chaos.
The M365 drift detection pattern is where this pays off fastest for most IT pros. You are not automating deployments — you are just getting visibility into what is changing without your knowledge. That alone is worth the setup time.
And if you are running this at home first: good. That means you already know what the drift detection pipeline looks like when the enterprise conversation starts.
When to Use This / When Not To#
Adopt GitOps when:
- You manage more than a handful of scripts or configurations
- Multiple people touch the same infrastructure
- You need audit trails for changes
- You want repeatable, automated deployments
- You are tired of “who changed this?”
This is overkill when:
- You are the only admin and you manage 10 devices
- Your organization does not use Git at all and has no appetite to start
- You need a quick fix, not a workflow change
GitHub#
The full repo is private — it contains host-specific configs and credentials I am not publishing. Everything structural is covered above. The workflow pattern and Ansible integration are designed to be adapted, not copied verbatim. Grab the YAML examples, adjust the paths and runner labels to match your environment, and you are running.