
AI coding assistants need Contributor access to deploy infrastructure. Backup automation needs Key Vault secrets at 2 AM. Security scanners need Reader access on a schedule.

The easy answer is standing permissions: give each service principal what it needs and move on. But that leaves dozens of non-human identities (NHIs) with 24/7 access to sensitive resources, and most of them use that access for only minutes per day.

Zero Standing Privilege (ZSP) flips this: no identity starts with access to anything. Permissions are granted just-in-time, scoped to the task, and automatically revoked.

This post walks through building a ZSP gateway using Azure Functions that manages time-bounded access for AI agents, automation workflows, and service principals. We’ll cover the NHI access pattern in detail, then briefly show how the same gateway handles human admins too.

Hands-on Lab: All Bicep templates, PowerShell scripts, and Python code are in the companion lab and on GitHub.


Why NHI Security Matters Now

For every human in an Azure tenant, there may be 50-100 non-human identities (a rough rule-of-thumb across enterprise environments):

  • AI coding agents requesting temporary access to deploy infrastructure
  • Backup automation needing Key Vault secrets only during backup windows
  • CI/CD pipelines requiring Contributor access for deployments
  • Security scanners needing read access on a schedule
  • Agentic workflows chaining multiple Azure services together

Most have standing access that they use for minutes per day, or never.

A service principal that runs a nightly backup has 24/7 Key Vault access for a 5-minute task. That’s 23 hours and 55 minutes of unnecessary exposure. An AI agent with permanent Contributor access is a credential theft away from a full environment compromise.

Diagram showing timeline of standing privilege exposure vs actual access need
Standing privilege: 24/7 access for a 5-minute job. ZSP: access only during execution.
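
The exposure figure above is simple arithmetic, and it can be handy to compute it for other job lengths. The sketch below is illustrative only; the function name `standing_exposure` is an assumption and not part of the lab code.

```python
from datetime import timedelta


def standing_exposure(task_duration: timedelta,
                      period: timedelta = timedelta(days=1)) -> timedelta:
    """Time per period during which access exists but is not needed."""
    if task_duration > period:
        raise ValueError("task_duration cannot exceed the period")
    return period - task_duration


# The nightly backup example: a 5-minute task with 24/7 access.
exposure = standing_exposure(timedelta(minutes=5))
hours, remainder = divmod(int(exposure.total_seconds()), 3600)
print(f"{hours} h {remainder // 60} min of unneeded access per day")
# prints: 23 h 55 min of unneeded access per day
```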

The Risk Surface

| Scenario | Standing Privilege Risk | ZSP Mitigation |
|---|---|---|
| Stolen SP credentials | Immediate Key Vault access | Credentials alone are insufficient: no standing permissions |
| Compromised AI agent | Attacker inherits Contributor role | Agent has zero access until workflow triggers grant |
| Lateral movement | Pivot via always-on service accounts | Service accounts have zero access between runs |
| Supply chain attack | Compromised dependency has ambient access | No ambient access to exploit |

Microsoft PIM helps with human admin access, but it doesn’t solve the NHI problem. Service principals can’t activate PIM roles. They need a different pattern.


Architecture: ZSP Gateway

The ZSP gateway is an Azure Function App that brokers all privileged access. It exposes two endpoints:

  • /api/nhi-access - Grants time-bounded Azure RBAC role assignments to service principals
  • /api/human-access - Grants temporary Entra group membership to human admins

Both patterns use Azure Durable Functions for reliable scheduled revocation.

Architecture diagram showing ZSP Gateway with NHI and admin access paths, Durable Functions timer for revocation, and Log Analytics audit trail
ZSP Gateway: AI agents and service principals use /api/nhi-access for RBAC assignments. Human admins use /api/human-access for group membership. All access is time-bounded and logged.

Infrastructure

The lab deploys with Bicep + PowerShell:

  • Azure Function App (Flex Consumption, Python 3.11) with system-assigned managed identity
  • Key Vault and Storage Account as target resources for demo
  • Application Insights and Log Analytics for observability
  • Data Collection Endpoint + Rule (DCE/DCR) for custom audit logging to ZSPAudit_CL
  • Entra ID groups with directory role assignments (for human admin path)
  • Backup service principal with zero initial permissions (for NHI demo)

The managed identity has three Graph API permissions (GroupMember.ReadWrite.All, Directory.Read.All, and RoleManagement.ReadWrite.Directory), User Access Administrator on the resource group for managing role assignments, and Monitoring Metrics Publisher on the DCR for sending audit logs. RoleManagement.ReadWrite.Directory is required because the ZSP groups are role-assignable: the standard group membership permission (GroupMember.ReadWrite.All) is insufficient for managing membership of role-assignable groups.


NHI Access: The Core Pattern

How It Works

  1. A workflow (timer, API call, or AI agent) calls /api/nhi-access
  2. The gateway validates the request and creates a scoped Azure RBAC role assignment
  3. A Durable Functions timer is scheduled to revoke the assignment
  4. Everything is logged to Log Analytics

The Request

curl -X POST "$FUNCTION_URL/api/nhi-access" \
  -H "Content-Type: application/json" \
  -d '{
    "sp_object_id": "BACKUP_SP_OBJECT_ID",
    "scope": "/subscriptions/.../providers/Microsoft.KeyVault/vaults/zsp-lab-kv",
    "role": "Key Vault Secrets User",
    "duration_minutes": 10,
    "workflow_id": "nightly-backup"
  }'

The response:

{
  "status": "granted",
  "assignment_id": "/subscriptions/.../roleAssignments/cb11eadc-...",
  "assignment_name": "cb11eadc-0c5d-4961-b124-607a1d74e691",
  "sp_object_id": "c9c5947a-...",
  "scope": "/subscriptions/.../Microsoft.KeyVault/vaults/zsp-lab-kv",
  "role": "Key Vault Secrets User",
  "expires_at": "2026-01-27T21:06:16.156493",
  "duration_minutes": 10,
  "workflow_id": "nightly-backup",
  "orchestrator_instance_id": "b10a200905204d0bb10d54fc4e1a73e0"
}

The orchestrator_instance_id tracks the Durable Functions timer that will revoke access. After 10 minutes, the orchestrator fires and deletes the role assignment. The service principal is back to zero permissions.

The Grant Logic

The NHI access handler validates the request, creates the role assignment via the Azure SDK, and schedules revocation. The simplified excerpt below shows only the role assignment step:

# nhi_access.py (simplified from lab code)

from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters
from azure.identity import DefaultAzureCredential
import uuid
from datetime import datetime, timedelta, timezone

ROLE_DEFINITIONS = {
    "Key Vault Secrets User": "4633458b-17de-408a-b874-0445c86b69e6",
    "Key Vault Secrets Officer": "b86a8fe4-44ce-4948-aee5-eccb2c155cd7",
    "Key Vault Reader": "21090545-7ca7-4776-b22c-e363652d74d2",
    "Storage Blob Data Reader": "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
    "Storage Blob Data Contributor": "ba92f5b4-2d11-453d-a403-e96b0029c9fe",
    "Reader": "acdd72a7-3385-48ef-bd42-f606fba81ae7",
    "Contributor": "b24988ac-6180-42a0-ab88-20f7382dd24c",
}

async def grant_nhi_access(sp_object_id, scope, role_name, duration_minutes, workflow_id):
    # Reject unknown roles up front instead of surfacing a bare KeyError
    if role_name not in ROLE_DEFINITIONS:
        raise ValueError(f"Unsupported role: {role_name}")

    credential = DefaultAzureCredential()
    subscription_id = scope.split("/")[2]  # "/subscriptions/<id>/..." -> "<id>"
    auth_client = AuthorizationManagementClient(credential, subscription_id)

    role_guid = ROLE_DEFINITIONS[role_name]
    assignment_name = str(uuid.uuid4())  # role assignment names are GUIDs
    full_role_id = f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/roleDefinitions/{role_guid}"

    # Note: this is the synchronous SDK client, so the call blocks until Azure responds
    assignment = auth_client.role_assignments.create(
        scope=scope,
        role_assignment_name=assignment_name,
        parameters=RoleAssignmentCreateParameters(
            role_definition_id=full_role_id,
            principal_id=sp_object_id,
            principal_type="ServicePrincipal"
        )
    )

    return {
        "status": "granted",
        "assignment_id": assignment.id,
        "assignment_name": assignment_name,
        "sp_object_id": sp_object_id,
        "scope": scope,
        "role": role_name,
        "expires_at": (datetime.now(timezone.utc) + timedelta(minutes=duration_minutes)).isoformat(),
        "duration_minutes": duration_minutes,
        "workflow_id": workflow_id
    }

The key design decision: the gateway should be the only identity with permission to create role assignments. Service principals start at zero and can’t escalate themselves. In production, enable Entra authentication on the Function App so only approved callers can request access - function keys alone aren’t sufficient to enforce this boundary.
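
The simplified handler above does not show the validation step. A minimal sketch of what that check could look like follows, using only the standard library. The policy values (`ALLOWED_ROLES`, `MAX_DURATION_MINUTES`) and the function name `validate_nhi_request` are assumptions for illustration; the lab code may validate differently.

```python
import uuid

# Hypothetical policy values for illustration; the lab's real limits may differ.
ALLOWED_ROLES = {"Key Vault Secrets User", "Key Vault Reader",
                 "Storage Blob Data Reader", "Reader"}
MAX_DURATION_MINUTES = 60
REQUIRED_FIELDS = ("sp_object_id", "scope", "role", "duration_minutes", "workflow_id")


def validate_nhi_request(body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        return [f"missing field: {name}" for name in missing]

    errors = []

    # The principal must be identified by a GUID (the service principal object ID).
    try:
        uuid.UUID(str(body["sp_object_id"]))
    except ValueError:
        errors.append("sp_object_id is not a GUID")

    # The scope must be an ARM resource ID rooted at a subscription.
    parts = str(body["scope"]).split("/")
    if len(parts) < 3 or parts[0] != "" or parts[1].lower() != "subscriptions" or not parts[2]:
        errors.append("scope must start with /subscriptions/<id>")

    role = body["role"]
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        errors.append(f"role not allowed: {role}")

    # bool is a subclass of int, so exclude it explicitly.
    duration = body["duration_minutes"]
    if isinstance(duration, bool) or not isinstance(duration, int):
        errors.append("duration_minutes must be an integer")
    elif not 1 <= duration <= MAX_DURATION_MINUTES:
        errors.append(f"duration_minutes must be between 1 and {MAX_DURATION_MINUTES}")

    if not str(body["workflow_id"]).strip():
        errors.append("workflow_id must not be empty")

    return errors
```

Returning a list of errors rather than raising on the first one lets the gateway report every problem in a single 400 response, which is friendlier to machine callers.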

Scheduled Access with Timer Triggers

For predictable workloads like nightly backups, the gateway uses timer-triggered functions:

@app.timer_trigger(schedule="%BACKUP_JOB_SCHEDULE%", arg_name="timer", run_on_startup=False)
@app.durable_client_input(client_name="client")
async def backup_job_access_grant(timer: func.TimerRequest, client):
    """Grant backup SP access before the nightly job runs."""
    duration = int(os.environ.get("BACKUP_JOB_DURATION_MINUTES", 35))

    # Grant Key Vault access
    kv_result = await grant_nhi_access(
        sp_object_id=os.environ["BACKUP_SP_OBJECT_ID"],
        scope=os.environ["KEYVAULT_RESOURCE_ID"],
        role_name="Key Vault Secrets User",
        duration_minutes=duration,
        workflow_id="nightly-backup"
    )

    # Grant Storage access
    stor_result = await grant_nhi_access(
        sp_object_id=os.environ["BACKUP_SP_OBJECT_ID"],
        scope=os.environ["STORAGE_RESOURCE_ID"],
        role_name="Storage Blob Data Contributor",
        duration_minutes=duration,
        workflow_id="nightly-backup"
    )

    # Schedule revocation for both grants
    expiry_time = datetime.now(timezone.utc) + timedelta(minutes=duration)
    for result, scope, role in [
        (kv_result, os.environ["KEYVAULT_RESOURCE_ID"], "Key Vault Secrets User"),
        (stor_result, os.environ["STORAGE_RESOURCE_ID"], "Storage Blob Data Contributor"),
    ]:
        await client.start_new("revocation_orchestrator", client_input={
            "revocation_type": "role_assignment",
            "assignment_id": result["assignment_id"],
            "sp_object_id": os.environ["BACKUP_SP_OBJECT_ID"],
            "scope": scope,
            "role": role,
            "expiry_time": expiry_time.isoformat()
        })

The pattern: grant access 5 minutes before the job starts and revoke it 35 minutes after the grant (the default BACKUP_JOB_DURATION_MINUTES). The service principal has zero permissions for 23+ hours per day.

Timeline showing SP with zero access, then brief window of access during job, then back to zero
NHI access timeline: 23+ hours of zero privilege, brief access window during job execution.
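
Azure Functions timer triggers take a six-field NCRONTAB expression (second, minute, hour, day, month, day-of-week), evaluated in UTC by default. The helper below is a small sketch for deriving a BACKUP_JOB_SCHEDULE value that fires a few minutes before a daily job; the function name `grant_schedule` and the 02:00 job time are assumptions for illustration.

```python
def grant_schedule(job_hour: int, job_minute: int, lead_minutes: int = 5) -> str:
    """Return a daily six-field NCRONTAB expression that fires lead_minutes before the job."""
    if not (0 <= job_hour < 24 and 0 <= job_minute < 60):
        raise ValueError("job time must be a valid hour and minute")
    if not 0 <= lead_minutes < 24 * 60:
        raise ValueError("lead_minutes must be less than one day")

    # Work in minutes since midnight and wrap around for jobs just after 00:00.
    total = (job_hour * 60 + job_minute - lead_minutes) % (24 * 60)
    hour, minute = divmod(total, 60)
    return f"0 {minute} {hour} * * *"


print(grant_schedule(2, 0))  # prints: 0 55 1 * * *
```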

Automatic Revocation

Revocation uses Azure Durable Functions orchestrators with timer delays:

@app.orchestration_trigger(context_name="context")
def revocation_orchestrator(context: df.DurableOrchestrationContext):
    input_data = context.get_input()

    # Wait until the absolute expiry time
    expiry_time = datetime.fromisoformat(input_data["expiry_time"]).replace(tzinfo=timezone.utc)
    yield context.create_timer(expiry_time)

    # Revoke the assignment
    if input_data["revocation_type"] == "group_membership":
        yield context.call_activity("revoke_group_membership_activity", input_data)
    elif input_data["revocation_type"] == "role_assignment":
        yield context.call_activity("revoke_role_assignment_activity", input_data)

    # Use the replay-safe clock: orchestrator code must be deterministic across replays
    return {"status": "revoked", "completed_at": context.current_utc_datetime.isoformat()}

The orchestrator receives an absolute expiry_time rather than a relative delay, so the timer is deterministic even if the orchestrator replays (a core Durable Functions concept). Durable Functions timers also survive Function App restarts: if the app scales down and back up, the timer still fires. This is more reliable than in-memory timers or queue visibility timeouts. Note that Durable Functions timers in Python have a maximum duration of 6 days. That is more than enough for access windows measured in minutes or hours, but worth knowing if you extend durations.
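
To see why an absolute expiry is the safer input, consider the toy model below. It is not Durable Functions itself, only a simplified simulation: it assumes a naive implementation that recomputes a relative delay from the wall clock each time the code runs, and compares that with passing the expiry through unchanged. All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def fire_time_relative(code_ran_at: datetime, delay: timedelta) -> datetime:
    """Relative delay: the fire time depends on when the code happens to run."""
    return code_ran_at + delay


def fire_time_absolute(expiry_time: datetime, code_ran_at: datetime) -> datetime:
    """Absolute expiry: the fire time is the same no matter when the code runs."""
    return expiry_time


granted_at = datetime(2026, 1, 27, 20, 56, tzinfo=timezone.utc)
expiry = granted_at + timedelta(minutes=10)

# The same orchestrator code executed twice: once initially, once on a later replay.
runs = (granted_at, granted_at + timedelta(minutes=4))

relative = {fire_time_relative(t, timedelta(minutes=10)) for t in runs}
absolute = {fire_time_absolute(expiry, t) for t in runs}

print(len(relative), len(absolute))  # prints: 2 1
```

The relative version yields two different fire times across the two runs, while the absolute version yields one. Inside a real orchestrator, context.current_utc_datetime is the replay-safe clock to use if a relative computation is unavoidable.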

One subtlety: the expiry_time must be timezone-aware (UTC) because Durable Functions compares it against context.current_utc_datetime, which is always timezone-aware. Activity functions run in a thread pool, so they use asyncio.new_event_loop() to call async SDK methods:

@app.activity_trigger(input_name="activityPayload")
def revoke_role_assignment_activity(activityPayload: str):
    import asyncio
    input_data = json.loads(activityPayload) if isinstance(activityPayload, str) else activityPayload

    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(revoke_nhi_access(
            assignment_id=input_data["assignment_id"]
        ))
        loop.run_until_complete(log_access_event(
            event_type="AccessRevoke",
            identity_type="nhi",
            principal_id=input_data["sp_object_id"],
            target=input_data["scope"],
            target_type="AzureResource",
            role=input_data.get("role"),
            result="Success"
        ))
    finally:
        loop.close()

    return {"status": "revoked"}

The input_name="activityPayload" with type str is required: the Azure Functions .NET host serializes the input as a JSON string, so the Python worker must accept str and deserialize it.
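
Both subtleties in this section (the JSON-string payload and the timezone-aware expiry) can be isolated into small helpers that are easy to unit test without a Functions host. This is a sketch under assumptions: the names `parse_activity_payload` and `parse_expiry` are hypothetical, and naive timestamps are assumed to be UTC.

```python
import json
from datetime import datetime, timezone


def parse_activity_payload(payload):
    """Accept either a JSON string or an already-decoded dict."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    if not isinstance(data, dict):
        raise TypeError("activity payload must decode to a JSON object")
    return data


def parse_expiry(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps were produced in UTC.
        return parsed.replace(tzinfo=timezone.utc)
    # Convert rather than overwrite, so a non-UTC offset keeps the same instant.
    return parsed.astimezone(timezone.utc)


naive = datetime(2026, 1, 27, 21, 6, 16)
aware = parse_expiry("2026-01-27T21:06:16")
try:
    naive < aware
except TypeError as exc:
    print(f"naive vs aware comparison fails: {exc}")
```

One design difference from the orchestrator shown earlier: calling .replace(tzinfo=timezone.utc) unconditionally would relabel a timestamp that carries a non-UTC offset, whereas astimezone converts it to the equivalent UTC instant.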


AI Agent Integration Patterns

The /api/nhi-access endpoint is designed for machine callers. Here are patterns for common AI agent scenarios:

Pattern 1: AI Coding Agent Deploying Infrastructure

An AI coding assistant needs temporary Contributor access to deploy changes:

# AI agent workflow
import httpx

async def deploy_infrastructure(agent_context):
    # Request temporary access (awaitable calls need httpx.AsyncClient)
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{ZSP_GATEWAY}/api/nhi-access", json={
            "sp_object_id": agent_context.service_principal_id,
            "scope": f"/subscriptions/{SUB_ID}/resourceGroups/{RG_NAME}",
            "role": "Contributor",
            "duration_minutes": 30,
            "workflow_id": f"agent-deploy-{agent_context.session_id}"
        })
    response.raise_for_status()  # stop here if the gateway refused the grant

    # Now deploy with temporary permissions
    await run_bicep_deployment(agent_context.template)

    # Access auto-revokes after 30 minutes
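
A newly created role assignment may not take effect immediately, so the first privileged call after a grant can fail with an authorization error. One way an agent could handle that is a bounded retry with exponential backoff, sketched below with the standard library only. The helper `call_with_retry` and the use of PermissionError as a stand-in for an authorization failure are assumptions for illustration, not part of the lab.

```python
import time


def call_with_retry(operation, is_retryable, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff while is_retryable(exc) is true."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that will not fix themselves.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * (2 ** attempt))


# Demo: a fake deployment that is denied twice before the grant takes effect.
calls = {"n": 0}
delays = []


def flaky_deploy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise PermissionError("AuthorizationFailed")  # stand-in for a 403 response
    return "deployed"


result = call_with_retry(
    flaky_deploy,
    is_retryable=lambda exc: isinstance(exc, PermissionError),
    sleep=delays.append,  # record the delays instead of actually sleeping
)
print(result, delays)  # prints: deployed [2.0, 4.0]
```

Keep the total retry budget well below duration_minutes, otherwise the access window can expire while the agent is still waiting.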

Pattern 2: Security Scanner on Schedule

A scanning agent needs Reader access across resource groups:

# Timer trigger grants access, scanner runs, access revokes
@app.timer_trigger(schedule="0 0 */6 * * *", arg_name="timer")  # Every 6 hours
async def security_scan_access(timer: func.TimerRequest):
    for rg in RESOURCE_GROUPS_TO_SCAN:  # resource group IDs, used as RBAC scopes
        await grant_nhi_access(
            sp_object_id=SCANNER_SP_ID,
            scope=rg,
            role_name="Reader",
            duration_minutes=60,
            workflow_id="security-scan"
        )
        # Revocation is scheduled as in backup_job_access_grant (omitted here)

Pattern 3: Event-Driven Access

An AI agent responds to incidents and needs temporary elevated access:

# Event Grid trigger when security alert fires
@app.event_grid_trigger(arg_name="event")
async def incident_response_access(event):
    alert = event.get_json()

    # Grant security team's automation SP temporary access
    await grant_nhi_access(
        sp_object_id=INCIDENT_RESPONSE_SP_ID,
        scope=alert["resource_id"],
        role_name="Reader",
        duration_minutes=120,
        workflow_id=f"incident-{alert['id']}"
    )

Audit Trail

Every access grant and revocation is logged to a custom Log Analytics table (ZSPAudit_CL) via the Azure Monitor Ingestion API. The pipeline uses a Data Collection Endpoint (DCE) and Data Collection Rule (DCR) to route structured audit events into Log Analytics. This is critical for NHI access since there’s no human to ask “why did you need this?”
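
Before an event is sent to the ingestion pipeline, it has to be shaped into a row for ZSPAudit_CL. The sketch below builds such a row using the column names that appear in the audit entries in this post. It is an assumption about how a helper like log_access_event might assemble its payload, not the lab implementation, and it deliberately stops short of the upload call.

```python
from datetime import datetime, timezone


def build_audit_event(event_type, identity_type, principal_id, target, target_type,
                      result, role=None, duration_minutes=None, workflow_id=None,
                      expires_at=None, now=None):
    """Build one ZSPAudit_CL row; optional columns are omitted when not supplied."""
    # `now` must be timezone-aware when provided; it defaults to the current UTC time.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    event = {
        "TimeGenerated": now.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "EventType": event_type,
        "IdentityType": identity_type,
        "PrincipalId": principal_id,
        "Target": target,
        "TargetType": target_type,
        "Result": result,
    }
    optional = {
        "Role": role,
        "DurationMinutes": duration_minutes,
        "WorkflowId": workflow_id,
        "ExpiresAt": expires_at,
    }
    event.update({key: value for key, value in optional.items() if value is not None})
    return event


revoke_event = build_audit_event(
    event_type="AccessRevoke",
    identity_type="nhi",
    principal_id="c9c5947a-a7cb-4d63-b177-1e860c7f4b28",
    target="/subscriptions/.../Microsoft.KeyVault/vaults/zsp-lab-kv",
    target_type="AzureResource",
    result="Success",
    role="Key Vault Secrets User",
    now=datetime(2026, 1, 28, 4, 58, 55, 570995, tzinfo=timezone.utc),
)
print(revoke_event["TimeGenerated"])  # prints: 2026-01-28T04:58:55.570995Z
```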

What Gets Logged

Every grant and revocation produces a log entry in ZSPAudit_CL. Here’s a real grant/revoke pair from the lab:

{
  "TimeGenerated": "2026-01-28T04:56:49.158538Z",
  "EventType": "AccessGrant",
  "IdentityType": "nhi",
  "PrincipalId": "c9c5947a-a7cb-4d63-b177-1e860c7f4b28",
  "Target": "/subscriptions/.../Microsoft.KeyVault/vaults/zsp-lab-kv",
  "TargetType": "AzureResource",
  "Role": "Key Vault Secrets User",
  "DurationMinutes": 2,
  "WorkflowId": "nightly-backup",
  "ExpiresAt": "2026-01-28T04:58:48.598527",
  "Result": "Success"
}
{
  "TimeGenerated": "2026-01-28T04:58:55.570995Z",
  "EventType": "AccessRevoke",
  "IdentityType": "nhi",
  "PrincipalId": "c9c5947a-a7cb-4d63-b177-1e860c7f4b28",
  "Target": "/subscriptions/.../Microsoft.KeyVault/vaults/zsp-lab-kv",
  "TargetType": "AzureResource",
  "Role": "Key Vault Secrets User",
  "Result": "Success"
}

The gap between grant (04:56:49) and revoke (04:58:55) is just over 2 minutes, matching the requested duration_minutes: 2. The revocation fired about 7 seconds after ExpiresAt (04:58:48).
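
The same check can be scripted against exported audit rows. The sketch below uses the timestamps from the grant/revoke pair above; it assumes the ExpiresAt value, which carries no offset, is in UTC. The helper name `parse_utc` is hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating 'Z' and missing offsets as UTC."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


grant = {"TimeGenerated": "2026-01-28T04:56:49.158538Z",
         "ExpiresAt": "2026-01-28T04:58:48.598527"}
revoke = {"TimeGenerated": "2026-01-28T04:58:55.570995Z"}

held = parse_utc(revoke["TimeGenerated"]) - parse_utc(grant["TimeGenerated"])
lag = parse_utc(revoke["TimeGenerated"]) - parse_utc(grant["ExpiresAt"])

print(f"held for {held.total_seconds():.1f}s, revoked {lag.total_seconds():.1f}s after expiry")
# prints: held for 126.4s, revoked 7.0s after expiry
```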

Log Analytics query results showing ZSPAudit_CL table with AccessGrant and AccessRevoke events for both NHI and human identities
Real audit data from the lab: grants and revocations for both NHI (service principals) and human admin access, with timestamps, roles, and workflow IDs.

Useful KQL Queries

All NHI access grants (last 24 hours):

ZSPAudit_CL
| where TimeGenerated > ago(24h)
| where IdentityType == "nhi"
| where EventType == "AccessGrant"
| project TimeGenerated, PrincipalId, Target, Role, DurationMinutes, WorkflowId
| order by TimeGenerated desc

NHI access outside expected windows:

ZSPAudit_CL
| where IdentityType == "nhi"
| where EventType == "AccessGrant"
| extend Hour = datetime_part("hour", TimeGenerated)
| where Hour < 1 or Hour > 3  // Expected window: 01:00-03:59 UTC (TimeGenerated is stored in UTC)
| project TimeGenerated, PrincipalId, Target, WorkflowId

Unusual access patterns (more than 5 grants per hour for same SP):

ZSPAudit_CL
| where TimeGenerated > ago(7d)
| where IdentityType == "nhi"
| where EventType == "AccessGrant"
| summarize count() by bin(TimeGenerated, 1h), PrincipalId
| where count_ > 5

The WorkflowId field ties grants back to the automation that requested them, which is essential for investigating anomalies.
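
For testing detection logic offline, the third query (more than 5 grants per hour for the same SP) can be mirrored in a few lines of Python over exported rows. This is a sketch, assuming rows are dicts with the ZSPAudit_CL column names shown earlier; the function name `noisy_principals` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def noisy_principals(events, threshold=5):
    """Return {(hour_bucket, principal_id): count} for buckets exceeding the threshold."""
    counts = Counter()
    for event in events:
        if event.get("IdentityType") != "nhi" or event.get("EventType") != "AccessGrant":
            continue
        ts = datetime.fromisoformat(event["TimeGenerated"].replace("Z", "+00:00"))
        # Equivalent of bin(TimeGenerated, 1h): truncate to the start of the hour.
        hour_bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour_bucket, event["PrincipalId"])] += 1
    return {key: count for key, count in counts.items() if count > threshold}


sample = [
    {"IdentityType": "nhi", "EventType": "AccessGrant", "PrincipalId": "sp-a",
     "TimeGenerated": f"2026-01-28T04:{minute:02d}:00Z"}
    for minute in range(0, 60, 10)  # six grants within the same hour
] + [
    {"IdentityType": "nhi", "EventType": "AccessGrant", "PrincipalId": "sp-b",
     "TimeGenerated": "2026-01-28T04:15:00Z"},
]

flagged = noisy_principals(sample)
print(sorted(principal for _, principal in flagged))  # prints: ['sp-a']
```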

Log Analytics workbook showing ZSP access grants, revocations, and anomalies
ZSP audit dashboard: track all privileged access with identity type, duration, and workflow ID.

Bonus: This Also Works for Human Admins

The same gateway handles temporary admin access via Entra group membership. The pattern is simpler: empty security groups hold directory roles, and the gateway temporarily adds users.

How It Works

  1. Create Entra security groups like SG-Intune-Admins-ZSP and assign them directory roles
  2. Groups start empty - no one has the role
  3. Admin calls /api/human-access with justification
  4. Gateway adds user to group temporarily
  5. Durable Functions timer removes them

The request:

curl -X POST "$FUNCTION_URL/api/human-access" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "YOUR_USER_OBJECT_ID",
    "group_id": "INTUNE_ADMIN_GROUP_ID",
    "duration_minutes": 15,
    "justification": "Deploying new compliance policy - INC0012345"
  }'

This is the same pattern CyberArk uses for ZSP in their M365 implementation. It works with any Entra role, doesn’t require PIM eligible assignments, and provides clear audit trails.


Deploying the Lab

Prerequisites

  • Azure subscription with Owner access
  • Azure CLI configured (az login)
  • PowerShell 7+ (pwsh)
  • Entra ID P1 or P2 license (for group-based role assignment)

Quick Start

git clone https://github.com/nine-lives-security/nine-lives-zero-trust.git
cd nine-lives-zero-trust/labs/zsp-azure
./scripts/Deploy-Lab.ps1

The script handles everything:

  1. Deploys Azure resources via Bicep (Resource Group, Key Vault, Storage, Function App, Log Analytics, DCE)
  2. Creates Entra ID objects (ZSP groups, directory role assignments, backup SP)
  3. Creates the ZSPAudit_CL custom table and Data Collection Rule (DCR)
  4. Grants Graph API permissions and RBAC roles to the Function App managed identity
  5. Configures Function App settings with Entra object IDs, DCR endpoint, and schedule
  6. Deploys Function code
  7. Runs a smoke test

Test NHI Access

After deployment, the script outputs the required values:

FUNCTION_URL="https://zsp-lab-gateway.azurewebsites.net"
BACKUP_SP_ID="<from deployment output>"
KEYVAULT_ID="<from deployment output>"

curl -X POST "$FUNCTION_URL/api/nhi-access" \
  -H "Content-Type: application/json" \
  -d '{
    "sp_object_id": "'"$BACKUP_SP_ID"'",
    "scope": "'"$KEYVAULT_ID"'",
    "role": "Key Vault Secrets User",
    "duration_minutes": 10,
    "workflow_id": "manual-test"
  }'

Check Azure IAM - the SP has the role. Wait 10 minutes - the assignment is automatically removed.

Verify Revocation

# Check role assignments on the Key Vault
az role assignment list \
  --assignee "$BACKUP_SP_ID" \
  --scope "$KEYVAULT_ID" \
  --query "[].roleDefinitionName"
# After expiry: returns empty list

For full deployment details, troubleshooting, and cleanup instructions, see the lab documentation.


Production Considerations

Authentication

The lab uses function keys for simplicity. For production, enable Entra authentication on the Function App and require OAuth tokens from approved clients.

Approval Workflows

Add human-in-the-loop for sensitive roles:

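SENSITIVE_ROLES = {"Contributor", "Owner"}

async def request_with_approval(request):
    if request.role in SENSITIVE_ROLES:
        # Create approval request in Teams/ServiceNow
        return {"status": "pending_approval", "approval_id": "..."}
    # Auto-approve low-risk roles
    return await grant_nhi_access(
        sp_object_id=request.sp_object_id,
        scope=request.scope,
        role_name=request.role,
        duration_minutes=request.duration_minutes,
        workflow_id=request.workflow_id,
    )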

Break-Glass Accounts

ZSP should not create lockout scenarios. Maintain 1-2 emergency accounts with standing Global Admin, stored in a physical safe, monitored for any use.

Scope Constraints

In production, the gateway should enforce allowed scopes per service principal. Don’t let any SP request any role on any resource; maintain an allowlist.
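
A minimal sketch of such an allowlist check is shown below. The data structure, the placeholder GUIDs, and the function names are all assumptions for illustration; a production gateway would more likely load this policy from configuration than hard-code it.

```python
# Hypothetical policy: SP object ID -> list of (allowed scope, allowed roles).
ALLOWLIST = {
    "11111111-1111-1111-1111-111111111111": [
        (
            "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/zsp-lab-rg",
            {"Key Vault Secrets User", "Storage Blob Data Contributor"},
        ),
    ],
}


def scope_within(scope: str, allowed_scope: str) -> bool:
    """True if scope equals allowed_scope or is a child resource of it."""
    # Azure resource IDs are case-insensitive, so compare in lowercase.
    requested = scope.rstrip("/").lower()
    allowed = allowed_scope.rstrip("/").lower()
    # Match on a path-segment boundary so "zsp-lab-rg" does not match "zsp-lab-rg-other".
    return requested == allowed or requested.startswith(allowed + "/")


def is_request_allowed(sp_object_id: str, scope: str, role: str, allowlist=ALLOWLIST) -> bool:
    """Deny by default; allow only if some entry covers both the scope and the role."""
    for allowed_scope, roles in allowlist.get(sp_object_id.lower(), []):
        if role in roles and scope_within(scope, allowed_scope):
            return True
    return False


sp = "11111111-1111-1111-1111-111111111111"
rg = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/zsp-lab-rg"
vault = rg + "/providers/Microsoft.KeyVault/vaults/zsp-lab-kv"

print(is_request_allowed(sp, vault, "Key Vault Secrets User"))  # prints: True
print(is_request_allowed(sp, vault, "Contributor"))             # prints: False
```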


Key Takeaways

  1. NHIs are the bigger risk surface. Organizations often have 50-100 non-human identities per human user, and most have standing access they rarely use.

  2. AI agents amplify the problem. Agentic workflows that chain Azure services need scoped, temporary access, not standing Contributor roles.

  3. The gateway pattern centralizes control. One identity (the Function App) manages all role assignments. Service principals can’t escalate themselves.

  4. Automatic revocation is non-negotiable. Durable Functions provide reliable timers that survive restarts.

  5. Audit everything with workflow IDs. Without workflow_id, tracing NHI access back to the automation that requested it becomes impossible.

  6. This also works for humans. The same gateway handles admin access via group membership, providing one system for all privileged access.


Jerrad Dahlager, CISSP, CCSP

Cloud Security Architect · Adjunct Instructor

Marine Corps veteran and firm believer that the best security survives contact with reality.
