Security Overview
Trust Model
skillx is built around a simple rule:
- Scan before inject
- Gate before launch
- Clean after exit
That sequence is the core trust model for skillx. You should be able to inspect a skill before it reaches your agent, approve risk before execution starts, and trust that temporary files are removed when the run ends.
This matters because Agent Skills can ask an AI agent to read files, write code, run shell commands, and fetch remote content on your behalf. A malicious skill could try to exfiltrate secrets, destroy data, inject instructions, escalate privileges, or persist beyond the session. skillx reduces that risk by adding explicit control points around the full lifecycle.
What skillx protects
skillx is designed to protect three boundaries:
- Your machine: reduce the chance that untrusted skill files touch sensitive locations or ship obvious destructive behavior.
- Your agent context: surface prompt injection and hidden instructions before they are copied into the agent’s working environment.
- Your session lifecycle: keep temporary injection isolated, auditable, and removable after the run.
How the trust flow works
1. Scan before inject
Before any skill files are injected into an agent environment, skillx analyzes the source with built-in scanners across three analyzers:
- Markdown Analyzer (
MD-001toMD-011) checksSKILL.mdfor prompt injection, sensitive directory references, destructive instructions, hidden text, external URLs, and missing metadata. - Script Analyzer (
SC-001toSC-015) checks helper scripts for dynamic execution, recursive deletes, credential access, shell profile changes, privilege escalation, self-replication, and obfuscation. - Resource Analyzer (
RS-001toRS-005) checks references and assets for disguised extensions, oversized files, executables, symlinks, and script payloads.
The scanner also includes anti-evasion checks such as continuation-line joining, whitespace normalization, hidden text detection, obfuscation detection, shebang analysis, and symlink protection. The goal is not to prove safety, but to catch risky patterns before they reach the agent.
2. Gate before launch
Findings are turned into explicit risk levels, and those levels control whether a run can proceed:
| Level | Gating Behavior |
|---|---|
| PASS | No findings. Auto-continue. |
| INFO | Informational only. Auto-continue. |
| WARN | Prompt: Continue? [Y/n] |
| DANGER | Require typing yes. Supports detail N to inspect. |
| BLOCK | Execution refused. Cannot be overridden. |
This is the second trust boundary: skillx does not silently continue from a risky scan. You see the result before launch, and truly blocked content never runs.
3. Clean after exit
Each run gets its own session ID and manifest. Injected files are hashed with SHA-256, tracked individually, and removed when the agent finishes. If the process is interrupted, orphaned sessions are recovered on the next run.
This keeps temporary skill files from becoming hidden long-term state on your system.
Defense Layers
Automated scanning
The scanner is the first control point. It looks for risky instructions in Markdown, scripts, and bundled resources before the skill is injected anywhere.
Risk gating
The gate is the second control point. It converts scan output into an approval workflow so users decide whether a risky skill should proceed.
Session integrity
SHA-256 manifests provide an audit trail for what was injected during a run and make cleanup deterministic.
Session isolation and recovery
skillx isolates each run, removes exactly the files it injected, and recovers interrupted sessions on the next invocation.
What the Scanner Does NOT Do
- It does not sandbox execution. If a skill tells the agent to run
rm -rf /, the scanner will flag it, but the agent can still execute it if you approve. - It does not analyze AI behavior. The scanner checks the skill’s static files, not what the agent might do with them.
- It does not replace trust. A PASS scan result means no known patterns were detected — it doesn’t guarantee the skill is safe.
- It uses regex, not AST analysis. The scanner uses regular expressions, which can have false positives and false negatives.
Best Practices
For Users
- Treat PASS as a trust signal, not a guarantee
- Read DANGER findings with
detail Nbefore approving - Avoid auto-approve mode for untrusted or newly discovered skills
- Use
--fail-on warnin CI or policy-driven environments - Prefer pinned refs when running remote skills from Git hosts
For Skill Authors
- Document why elevated behavior is necessary if your skill needs network, shell, or filesystem access
- Avoid referencing sensitive directories unless the skill genuinely requires them
- Keep helper scripts minimal and readable
- Test with
skillx scan --fail-on infobefore publishing - Write SKILL.md for auditability so users can understand intent quickly
Next Steps
- Risk Levels for the exact approval behavior at each level
- Rules for the complete scanner rule reference
- CI Integration for enforcing scan policy in automation