Understanding AKS Azure Policy: A Practical Guide (Part 1 - The Basics)

Understanding AKS Azure Policy: A Practical Guide (Part 1 - The Basics)
*Published: August 202510 min read*

You’ve just been told to enable CIS benchmarks on your AKS clusters. The security team is happy. Compliance is checked off. Then your deployments start failing with cryptic error messages about “denied by azurepolicy-k8srequiredhttps” and you have no idea why.

This two-part series will help you understand and work with AKS policies. Part 1 covers the fundamentals - what these policies are, how they work, and basic troubleshooting. Part 2 will dive into advanced scenarios and real-world workarounds.

What Are AKS Azure Policies?

Azure Policy for AKS is Microsoft’s way of enforcing governance and compliance rules on your Kubernetes clusters. Think of it as guardrails that prevent non-compliant resources from being created.

The Policy Ecosystem

Azure Policy Service Policy Initiatives Individual Policies AKS Cluster OPA Gatekeeper Admission Webhooks Assigns Contains Syncs to Creates Enforces via


Common Policy Initiatives

Most organizations don’t pick individual policies. They enable entire initiatives:

InitiativeNumber of PoliciesCommon Use Case
CIS Microsoft Azure Foundations Benchmark200+General compliance
Azure Security Benchmark150+Microsoft’s recommended security baseline
ISO 27001:2013100+International security standard
PCI-DSS50+Payment card industry
HIPAA HITRUST80+Healthcare compliance

How Policies Actually Work in AKS

Step 1: Policy Assignment

When you assign a policy to your AKS cluster:

# Example: Assigning CIS Benchmark
az policy assignment create \
  --name "CIS-Benchmark-AKS" \
  --scope "/subscriptions/{subscription-id}/resourceGroups/{rg}/providers/Microsoft.ContainerService/managedClusters/{cluster-name}" \
  --policy-set-definition "/providers/Microsoft.Authorization/policySetDefinitions/cis-benchmark-v1.3.0"

Step 2: The Translation Layer

Azure Policy doesn’t directly touch your cluster. Instead:

  1. Azure Policy defines rules in Azure Resource Manager
  2. Policy Add-on translates these to Kubernetes constraints
  3. OPA Gatekeeper enforces them in your cluster

Step 3: Enforcement via Admission Control

Here’s what happens when you deploy a resource:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  ports:
  - port: 8080  # Non-standard port
    targetPort: 8080

The flow:

  1. kubectl apply sends this to the API server
  2. API server calls admission webhooks
  3. Gatekeeper webhook evaluates against policies
  4. If non-compliant: Request denied
  5. If compliant: Resource created

Your First Policy Encounter

Let’s walk through a typical first experience with AKS policies:

The Deployment That Worked Yesterday

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-app
  template:
    metadata:
      labels:
        app: simple-app
    spec:
      containers:
      - name: app
        image: nginx:latest
        ports:
        - containerPort: 80

The Error After Enabling Policies

$ kubectl apply -f simple-app.yaml
error validating data: admission webhook "validation.gatekeeper.sh" denied the request: 
[azurepolicy-k8srequiredlabels] Resource is missing required labels: {"env", "version"}
[azurepolicy-k8snolatestimage] Container 'app' is using latest tag
[azurepolicy-k8srequiredresources] Container 'app' is missing resource limits

Understanding the Errors

Each error follows a pattern:

  • [azurepolicy-{constraint}]: The policy that blocked you
  • Clear description: What’s wrong
  • Specific details: Which container/field is affected

Basic Troubleshooting Toolkit

1. See What Policies Are Active

# List all constraints in your cluster
kubectl get constraints

# Example output:
NAME                                              ENFORCEMENT-ACTION   TOTAL-VIOLATIONS
azurepolicy-k8srequiredlabels                    deny                142
azurepolicy-k8snolatestimage                     deny                38
azurepolicy-k8srequiredresources                 deny                201
azurepolicy-k8srequiredhttpsonly                 deny                17

2. Get Policy Details

# See what a specific policy requires
kubectl describe k8srequiredlabels azurepolicy-k8srequiredlabels

# Look for the 'Spec' section:
Spec:
  Match:
    Kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment", "StatefulSet", "DaemonSet"]
  Parameters:
    labels: ["env", "app", "version"]

3. Check Recent Violations

# See recent admission webhook denials
kubectl get events --all-namespaces \
  --field-selector reason=FailedAdmission \
  --sort-by='.lastTimestamp'

4. Test Before Deploying

# Use dry-run to test without actually creating
kubectl apply -f my-manifest.yaml --dry-run=server

# If it passes dry-run, it will pass actual deployment

Common Starter Policies and Quick Fixes

Here are the policies you’ll hit first and how to fix them:

1. Required Labels

Policy: k8srequiredlabels
Error: “Resource is missing required labels”

# Before (fails)
metadata:
  name: my-app
  
# After (passes)
metadata:
  name: my-app
  labels:
    app: my-app
    env: production
    version: "1.0.0"

2. No Latest Image Tag

Policy: k8snolatestimage
Error: “Container is using latest tag”

# Before (fails)
containers:
- name: app
  image: nginx:latest
  
# After (passes)
containers:
- name: app
  image: nginx:1.25.3

3. Resource Limits Required

Policy: k8srequiredresources
Error: “Container is missing resource limits”

# Before (fails)
containers:
- name: app
  image: nginx:1.25.3
  
# After (passes)
containers:
- name: app
  image: nginx:1.25.3
  resources:
    limits:
      cpu: "500m"
      memory: "512Mi"
    requests:
      cpu: "100m"
      memory: "128Mi"

4. HTTPS Services Only

Policy: k8srequiredhttpsonly
Error: “Service port name does not start with ‘https’”

# Before (fails)
spec:
  ports:
  - name: web
    port: 443
    
# After (passes)
spec:
  ports:
  - name: https-web
    port: 443

Understanding Enforcement Modes

Policies can run in different modes:

ModeBehaviorUse Case
DenyBlocks non-compliant resourcesProduction protection
AuditLogs violations but allows creationTesting impact
DisabledPolicy not evaluatedTemporary relief

Checking Policy Mode

# See enforcement action for all constraints
kubectl get constraints -o wide

# Check specific policy mode
kubectl get k8srequiredlabels azurepolicy-k8srequiredlabels -o jsonpath='{.spec.enforcementAction}'

Changing Policy Mode (Temporary)

# Switch to audit mode for testing
kubectl patch k8srequiredlabels azurepolicy-k8srequiredlabels \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/enforcementAction", "value": "dryrun"}]'

⚠️ Warning: This change is temporary. Azure Policy will sync and revert it within 15 minutes.

Creating Your First Compliant Deployment

Let’s build a deployment that passes common policies:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: compliant-app
  labels:
    app: compliant-app
    env: production
    version: "1.0.0"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: compliant-app
  template:
    metadata:
      labels:
        app: compliant-app
        env: production
        version: "1.0.0"
    spec:
      # Security context for the pod
      securityContext:
        runAsNonRoot: true
        fsGroup: 65534
      
      # Don't automount service account token
      automountServiceAccountToken: false
      
      containers:
      - name: app
        image: nginx:1.25.3  # Specific version, not latest
        
        # Container security context
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsUser: 65534
          capabilities:
            drop:
            - ALL
        
        # Resource limits and requests
        resources:
          limits:
            cpu: "500m"
            memory: "512Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        
        # Health checks
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        
        ports:
        - containerPort: 8080
          name: https-web  # Name starts with https
        
        # Mount required directories as emptyDir
        volumeMounts:
        - name: var-cache-nginx
          mountPath: /var/cache/nginx
        - name: var-run
          mountPath: /var/run
      
      volumes:
      - name: var-cache-nginx
        emptyDir: {}
      - name: var-run
        emptyDir: {}

Quick Reference: Policy to Solution Mapping

If you see this error…Add this to your manifest…
“missing required labels”metadata.labels: {app: "name", env: "prod", version: "1.0"}
“using latest tag”Use specific version: image: nginx:1.25.3
“missing resource limits”Add resources section with limits and requests
“automounting service account token”automountServiceAccountToken: false
“running as root”securityContext.runAsNonRoot: true
“no liveness probe”Add livenessProbe configuration
“no readiness probe”Add readinessProbe configuration

What’s Next?

In Part 2, we’ll tackle:

  • Advanced policies that break real applications (Kafka, databases, operators)
  • Policy exemptions and when to use them
  • Custom policies for your organization
  • Automated compliance with mutation webhooks
  • GitOps strategies for policy compliance

Key Takeaways

  1. Policies are preventive, not reactive - They block at creation time
  2. Start with audit mode - See what would break before enforcing
  3. Understand the requirements - Use kubectl describe on constraints
  4. Test with dry-run - Validate before deploying
  5. Build compliant templates - Start with good defaults

Remember: These policies exist to protect your cluster. The goal is to work with them, not against them. Once you understand the basics, you can make informed decisions about when to comply and when to seek exemptions.


Stay tuned for Part 2 coming soon! We’ll dive into advanced scenarios including enterprise software challenges (Confluent Kafka, databases, operators), policy exemption strategies, and real-world workarounds for when policies clash with production requirements.