Why most AI deployments fail in production (and what to verify instead)
Enterprise AI deployments fail in production for predictable, avoidable reasons. Not because the AI model was wrong, not because the use case was bad — but because the deployment lacked the operational infrastructure that makes AI employees reliable at scale.
This checklist covers what Agentex verifies before every AI employee goes live on OpenClaw + NemoClaw. It is not a theoretical framework. It is a production-grade go-live gate.
Section 1: Infrastructure readiness
1.1 Server specifications confirmed
Minimum: 4 vCPU, 16GB RAM, 50GB SSD, Linux (Ubuntu 22.04 LTS preferred). For multi-agent deployments: 8 vCPU, 32GB RAM. Confirm the server is provisioned and accessible before the Sprint begins.
1.2 OpenClaw Gateway installed and verified
Run the health check: `curl http://localhost:18789/health`. Confirm the gateway process is running under a service manager (systemd or PM2 with auto-restart). Verify the gateway starts automatically on server reboot.
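These three checks can be scripted into one pass/fail sweep. A minimal sketch — the systemd unit name `openclaw-gateway` is an assumption, so substitute whatever name your install registered:

```shell
#!/usr/bin/env bash
# Gateway verification sketch. The unit name "openclaw-gateway" is an
# assumption -- substitute the service name your install uses.
check() {  # check <label> <command...> -> prints PASS/FAIL, never aborts
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then echo "PASS: $label"; else echo "FAIL: $label"; fi
}

check "gateway answers /health"  curl -fsS http://localhost:18789/health
check "unit enabled on boot"     systemctl is-enabled openclaw-gateway
check "unit currently active"    systemctl is-active openclaw-gateway
```

The auto-start item is the one teams skip: `is-active` passing today says nothing about what happens after a reboot, which is what `is-enabled` verifies.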
1.3 NemoClaw sandbox configured
Confirm the sandbox backend is set to OpenShell in `openclaw.json`. Verify the NemoClaw policy file exists for each AI employee role. Run a test session and confirm the session is sandboxed.
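The exact schema of `openclaw.json` varies by version; as a sketch of what you are confirming, the sandbox stanza looks something like this (key names and the policy path here are assumptions — check them against your installed version's reference):

```json
{
  "sandbox": {
    "backend": "openshell",
    "policyDir": "/etc/nemoclaw/policies"
  }
}
```

The test session matters more than the config read: a sandbox that is configured but not enforced looks identical on paper.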
1.4 Network egress verified
Confirm all required outbound connections are open: LLM inference endpoint (Anthropic/OpenAI/local), WhatsApp Business API endpoint, Telegram API endpoint, all enterprise system APIs (Jira, GitHub, etc.). Verify no required endpoints are blocked by firewall rules.
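A quick way to sweep the egress list — the endpoints below are examples, so replace them with the hosts your deployment actually calls:

```shell
#!/usr/bin/env bash
# Egress check sketch. Endpoint list is an example -- edit it to match
# the outbound connections your AI employee actually needs.
reachable() {  # any HTTP response at all (even 4xx) proves egress is open
  curl -sS -o /dev/null --connect-timeout 5 "$1" 2>/dev/null
}

for url in https://api.anthropic.com https://graph.facebook.com \
           https://api.telegram.org https://api.github.com; do
  reachable "$url" && echo "OPEN    $url" || echo "BLOCKED $url"
done
```

Run it from the production server itself, not your laptop — firewall rules differ per host.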
1.5 Backup and recovery confirmed
OpenClaw memory and configuration files must be on a backed-up volume. Confirm a backup schedule is in place. Run a recovery drill: stop the gateway, restore from backup, restart, verify sessions resume correctly.
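The recovery drill is scriptable. A sketch, assuming the gateway keeps memory and configuration under a single directory (the `~/.openclaw` path and unit name in the comments are assumptions):

```shell
#!/usr/bin/env bash
# Backup/restore drill sketch: archive a data directory, restore it to a
# scratch location, and verify the restored tree matches the original.
backup_and_verify() {
  local src="$1" work
  work=$(mktemp -d)
  tar -czf "$work/backup.tar.gz" -C "$(dirname "$src")" "$(basename "$src")" &&
  tar -xzf "$work/backup.tar.gz" -C "$work" &&
  diff -r "$src" "$work/$(basename "$src")"   # exit 0 only if identical
}

# Typical drill (stop the gateway so files are quiescent, then restart):
#   systemctl stop openclaw-gateway
#   backup_and_verify "$HOME/.openclaw" && echo "restore verified"
#   systemctl start openclaw-gateway
```

A backup you have never restored is a hope, not a backup — hence the diff, not just a successful `tar` exit code.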
Section 2: Security and compliance
2.1 NemoClaw policy files reviewed
Each AI employee role has a policy file that defines file access, network egress, and inference provider permissions. Have your security team review each policy file before go-live. Confirm the policies match the documented workflow — the AI employee should only be able to reach the systems it is supposed to.
2.2 Secrets management verified
Confirm no API keys, tokens, or credentials are stored in plaintext in configuration files or code. All secrets belong in the secrets file (`.secrets`) with file permissions set to 600. Verify the gateway reads secrets from environment variables, not from hardcoded values.
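Both halves of this check are mechanical. A sketch — the secrets filename is the one above, but the key-pattern regex and config path in the comment are examples you should tune to your providers:

```shell
#!/usr/bin/env bash
# Secrets hygiene sketch: verify file permissions, then grep for anything
# that looks like a raw API key committed in plaintext.
secrets_mode() { stat -c '%a' "$1"; }   # octal permission bits (GNU stat)

audit_secrets() {
  local f="$1"
  if [ "$(secrets_mode "$f")" = "600" ]; then
    echo "OK: $f is mode 600"
  else
    echo "FAIL: run chmod 600 $f"
  fi
}

# Plaintext-key scan (pattern and path are examples -- tune to your providers):
#   grep -rnE '(sk|ghp|xoxb)-[A-Za-z0-9_-]{16,}' --include='*.json' /etc/openclaw
```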
2.3 Audit log configured
OpenClaw logs every tool call, every message sent, and every action taken by every AI employee. Confirm audit logs are enabled, are being written to a persistent location, and are being retained for the required period (typically 90 days for operations, 12 months for BFSI/healthcare).
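Retention is easy to verify with a one-liner. A sketch, assuming one log file per day under a log directory (the path and the 90-day window are examples — BFSI/healthcare would use the 12-month figure):

```shell
#!/usr/bin/env bash
# Retention sketch: list audit logs that have aged past the retention window.
old_logs() {  # old_logs <dir> <days> -> files older than the window
  find "$1" -name '*.log' -mtime +"$2" -print
}

# e.g.  old_logs /var/log/openclaw 90
# Files it lists are candidates for archival per your policy, not silent deletion.
```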
2.4 Credential rotation plan confirmed
Define rotation schedules for all credentials used by AI employees: LLM API keys, WhatsApp tokens, Jira/GitHub tokens, enterprise system credentials. Confirm the rotation process has been tested (rotating a credential and verifying the AI employee picks up the new value without service interruption).
2.5 Incident response runbook written
Define what happens when an AI employee behaves unexpectedly. Who gets notified? What is the kill switch? How do you roll back to the previous configuration? The runbook should be documented and the kill switch should be tested before go-live.
Section 3: Role definition verification
3.1 SOUL.md reviewed by domain expert
The SOUL.md file defines what the AI employee does and what it never does. A domain expert (QA lead, support manager, finance ops lead — whoever owns the role) must review and approve SOUL.md before go-live. The domain expert should be able to read it and say: "This is how I would want someone in this role to behave."
3.2 AGENTS.md tested against real scenarios
The AGENTS.md file defines the workflow. Before go-live, walk through 10–15 real examples from the past 90 days of operations: tickets, cases, invoices, whatever the AI employee will handle. Confirm the AGENTS.md workflow produces the correct output for each example.
3.3 Escalation criteria confirmed and tested
Every AI employee has escalation criteria — the conditions under which it routes to a human. These criteria must be explicit (not "when something seems wrong" — that is not testable). Test each escalation criterion with a synthetic example before go-live.
3.4 Human approval boundaries confirmed
For every action the AI employee can take, confirm: is this action within the approved autonomy boundary? Payments, production deployments, external communications to clients — confirm these have explicit human sign-off requirements written into AGENTS.md.
Section 4: Integration verification
4.1 All API connections tested in production
Do not test in staging and assume production works. Test every API connection (Jira, GitHub, WhatsApp, Telegram, CRM, ERP) against the production environment before go-live. Confirm authentication tokens have the correct scope and are not scoped too broadly.
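One helper can smoke-test every production API the same way. A sketch — the commented endpoints and environment-variable names are examples, not your real configuration:

```shell
#!/usr/bin/env bash
# Production API smoke-test sketch. Each call must authenticate with the
# real production credential and come back HTTP 200.
api_ok() {  # api_ok <name> <url> [extra curl args...]
  local name="$1" url="$2"; shift 2
  local code
  code=$(curl -sS -o /dev/null -w '%{http_code}' --connect-timeout 5 "$@" "$url")
  echo "$name -> HTTP $code"
  [ "$code" = "200" ]
}

# Example invocations (URLs and variable names are placeholders):
#   api_ok "GitHub"   https://api.github.com/user -H "Authorization: Bearer $GITHUB_TOKEN"
#   api_ok "Telegram" "https://api.telegram.org/bot${TELEGRAM_TOKEN}/getMe"
#   api_ok "Jira"     "https://yourco.atlassian.net/rest/api/3/myself" -u "$JIRA_USER:$JIRA_API_TOKEN"
```

A 200 proves authentication works; it does not prove the token's scope is right — that is the next item.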
4.2 Message channel end-to-end confirmed
For each channel the AI employee operates in: send a test message and verify the AI employee receives it, processes it, and responds correctly. For WhatsApp: confirm the phone number is verified and the webhook is correctly configured. For Telegram: confirm bot token is valid and chat binding is correct.
4.3 Tool access permissions verified
The AI employee needs the right level of access to each tool — not more. Jira: confirm the service account has the right project permissions. GitHub: confirm the token has repo access but not org admin. Review each tool access level against the principle of least privilege.
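For GitHub specifically, classic personal access tokens report their granted scopes in the `X-OAuth-Scopes` response header, which makes least-privilege a spot-check rather than a guess. The helper below only parses saved headers; the capture command in the comment assumes a token in `$GITHUB_TOKEN`:

```shell
#!/usr/bin/env bash
# Extract the scope list from saved GitHub API response headers.
scopes_of() {
  grep -i '^x-oauth-scopes:' "$1" | cut -d' ' -f2- | tr -d '\r'
}

# Capture the headers with the real token, then inspect:
#   curl -fsS -D headers.txt -o /dev/null \
#     -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user
#   scopes_of headers.txt    # "repo" is enough here; "admin:org" is too broad
```

Fine-grained tokens do not emit this header, so for those, review the grants in the GitHub settings UI instead.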
4.4 Memory and session persistence confirmed
Run a multi-session scenario: start a session, take an action, end the session, start a new session, verify the AI employee retains the relevant context. If memory is not persisting correctly, the AI employee will behave inconsistently across sessions.
Section 5: Go-live process
5.1 Shadow run completed
Before full go-live, run the AI employee in shadow mode for 48–72 hours: it processes real inputs but its outputs are reviewed by a human before being sent. Confirm output quality meets the standard before removing human review.
5.2 Volume ramp-up plan confirmed
Do not go from zero to full volume on day one. Start with 10–20% of real volume, monitor for 48 hours, then expand. Define the volume ramp schedule before go-live.
5.3 Monitoring and alerting configured
Define the metrics you will monitor: task completion rate, error rate, escalation rate, response time. Configure alerts for anomalies: error rate above X%, escalation rate above Y%, response time above Z seconds. Confirm the alerts route to the right people.
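The error-rate metric can come straight from the audit log. A minimal sketch, assuming one event per line with failures tagged `ERROR` (the log path, the 5% threshold, and `notify-oncall` in the comment are all placeholders):

```shell
#!/usr/bin/env bash
# Alerting sketch: derive an error rate from the audit log.
error_rate() {  # error_rate <logfile> -> integer percentage of ERROR lines
  awk '/ERROR/ { err++ } END { printf "%d\n", (NR ? 100 * err / NR : 0) }' "$1"
}

# Wire the threshold to your pager ("notify-oncall" is a placeholder command):
#   [ "$(error_rate /var/log/openclaw/audit.log)" -gt 5 ] && notify-oncall "error rate high"
```

In practice you would run this from cron or feed the same figure into whatever monitoring stack you already operate; the point is that the number exists before day one, not after the first incident.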
5.4 First-week review scheduled
Schedule a 60-minute review at end of week one. Review all edge cases encountered, all escalations triggered, all errors logged. Calibrate AGENTS.md based on real production data.
If you want Agentex to run through this checklist with you before deploying an AI employee inside your infrastructure, book an AI Workforce Audit. We cover sections 1–4 in the audit and section 5 during the Sprint.
Ready to deploy?
Book an AI Deployment Sprint — one workflow, live in 2 weeks.
Book AI Deployment Sprint →