[primary] / [secondary] / [inference] / [analyst-judgment]; action flags: [verify] / [stale-risk: YYYY-MM]), trigger live source verification on currency-sensitive topics (sanctions, OPEC+, chokepoint events, JCPOA), and distinguish Iran-state / IRGC-affiliated / Iran-private commercial actors instead of collapsing them.
AGENTS.md (project rules: identity, honesty, evidence, naming, definition of done with two explicit bars).
skills/claude/SKILL.md, skills/codex/SKILL.md (OpenClaw variant deferred until an active use case appears).
scripts/validate.py: structural checks; does not validate factuality.
STATUS.md.
docs/source-guide.md (regional source tier hierarchy, freshness horizons), docs/currency-watch.md (what to re-check now), taxonomy.json (machine-readable scope and topics).
signals/ (Red Sea, OPEC+, US-Iran diplomatic signals as public examples of the skill's output style).
STATUS.md (2026-05-15). Bar 2 (externally validated) is open: external review and validated cases (B2.2, B2.3, B2.7) require humans outside the author's circle.
Generic LLMs produce broad, fluent commentary on the Gulf and the wider Middle East: country narration, hand-wavy "Iran tensions," vague chokepoint risk, no transmission mechanism, no actor incentives, no trigger points, no evidence boundaries, and a tendency to collapse Iran-state / IRGC-affiliated / Iran-private commercial actors into one undifferentiated actor.
That output is not decision-useful for sanctions compliance, energy trading, shipping insurance, Gulf banking, sovereign-wealth deal teams, or Iran-watcher analysts who actually have exposure to the region.
The skill needed to be small enough to attach to any capable agent and strict enough to actually change the shape of regional analysis — without becoming a screening tool, vessel-tracking product, or compliance platform.
Most AI-generated regional analysis on the Gulf and Middle East is fluent but decision-light. It rarely traces how a sanction designation, a chokepoint incident, or a sovereign-wealth deployment transmits into bank exposure, refining margins, charter rates, insurance premia, or counterparty contamination. It rarely separates verified facts from informed inference. It rarely names trigger points that would update the view. And it consistently fails the Iran-state / IRGC / Iran-private distinction that every serious sanctions compliance question depends on.
That is fine for background reading. It is weak for sanctions-exposure decisions, energy-trade structuring, shipping route posture, Gulf-bank counterparty review, sovereign-wealth co-investment screens, or any regional risk decision that has to be defensible.
AGENTS.md as a canonical project-rules spec: identity, honesty, evidence, naming hierarchy, retrieved-content trust, currency-trigger rules, per-claim provenance tags, three-value response logic, safety/limitation rules, and a definition of done with two explicit bars.
scripts/validate.py as a structural validator (required phrases, forbidden determinative claims, evidence-mode coverage). Made it explicit that this validator does not check factuality.
docs/source-guide.md with a regional source tier hierarchy and freshness horizons; docs/currency-watch.md for fast-moving topics; taxonomy.json for machine-readable scope.
signals/ (Red Sea, OPEC+, US-Iran diplomatic signals).
STATUS.md honest about which bar is and is not cleared, including the Anti-criteria that explicitly disallow adding more reasoning-only examples or self-applied scorecards as progress toward Bar 2.
[primary] / [secondary] / [inference] / [analyst-judgment]) plus optional action flags ([verify] / [stale-risk: YYYY-MM]).
Additions tightening behavior under bad regional inputs. Single-author work; does not change STATUS.md; external-review bars (B2.2, B2.7) remain open.
evals/adversarial/): two starter stress cases drawn from real regional patterns: OFAC SDN listing vs UAE good-standing on the same entity (the "which list wins" trap); a Bab-el-Mandeb chokepoint incident report from a single advocacy / state-media outlet framed as primary, driving a 90-day Cape-routing decision.
AGENTS.md): operational triggers under three-value response logic: definitive legal / sanctions / AML conclusions; conflicting load-bearing facts; counterparty appearing with conflicting status across regimes; stale primary-list references; chokepoint claims without independent corroboration; Iran-state / IRGC / Iran-private actor collapse; active prompt-injection in retrieved content.
This skill is a vertical specialist layer in a four-repo portfolio designed to compose:
This repo does not duplicate any neighbor. The broader memo workflow lives in Global Think Tank Analyst; validation tooling lives in Agenda Intelligence MD; Central-Asia regional depth lives in its own vertical.
The skill is small enough to attach to any capable agent, and strict enough to change the shape of regional output. The contract does not ask the model to sound regionally smart; it asks the model to trace mechanism, label evidence, name the trigger, distinguish Iran-actor types, and state which role each implication is for.
That is the part most generic Gulf / Middle East commentary misses.
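Because the contract's requirements are structural, their presence can be checked mechanically. The following is a minimal sketch of such a check, not the repository's scripts/validate.py: the marker patterns, the forbidden-claim patterns, and the name check_structure are all hypothetical and chosen only for illustration. Like the validator described on this page, it checks structure only and says nothing about factuality.

```python
import re
from typing import List

# Hypothetical markers for the five contract requirements.
# The real required phrases used by scripts/validate.py are not shown on this page.
REQUIRED_MARKERS = {
    "mechanism": re.compile(r"transmission mechanism", re.I),
    "evidence": re.compile(r"\[(primary|secondary|inference|analyst-judgment)\]"),
    "trigger": re.compile(r"\btrigger", re.I),
    "actor_type": re.compile(r"Iran-state|IRGC|Iran-private"),
    "role": re.compile(r"\bimplication for\b", re.I),
}

# Hypothetical examples of determinative claims a memo should not make.
FORBIDDEN_CLAIMS = [
    re.compile(r"\bis (definitely|certainly) (sanctioned|compliant)\b", re.I),
    re.compile(r"\bsafe to onboard\b", re.I),
]


def check_structure(memo: str) -> List[str]:
    """Return a list of structural problems; an empty list means the memo passes."""
    problems = []
    for name, pattern in REQUIRED_MARKERS.items():
        if not pattern.search(memo):
            problems.append(f"missing: {name}")
    for pattern in FORBIDDEN_CLAIMS:
        match = pattern.search(memo)
        if match:
            problems.append(f"forbidden claim: {match.group(0)}")
    return problems
```

A memo that passes this kind of check can still be wrong on the facts; the check only confirms that the mechanism, evidence labels, trigger, actor type, and role-specific implication are present to be reviewed.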
Excerpt condensed for this page. Full memos with full transmission mechanism, exposure map, leverage shifts, and triggers live in examples/. Evidence modes are explicitly labeled per example — live-source-backed, user-provided sources, illustrative source packet, or reasoning-only.
User question: "We are a European bank's sanctions desk reviewing onboarding of a UAE-licensed trading counterparty that appears on a US OFAC SDN-related listing under a non-Iran programme but is in good standing with its UAE regulator. We clear USD through a US correspondent. Should we onboard?"
Before — generic regional commentary:
The UAE is a major financial hub with strong regulatory standards. OFAC sanctions present additional complexity for non-US banks. Banks should carefully balance their compliance obligations across jurisdictions and consult their sanctions desks. Maintaining strong correspondent relationships is essential.
That is fluent regional commentary. It does not say which list is operative for this bank's exposure, what the actual transmission mechanism is, or what would update the view.
After — with the Gulf + Middle East skill attached:
[analyst-judgment]): The "OFAC vs UAE" framing is the wrong question. Both statuses are simultaneously true within their own regimes. The operative question for this bank is its exposure surface, not list adjudication.
[verify] (must be retrieved at decision time); EU/national sanctions implications [secondary][verify]; UAE good-standing [user-provided]; blocking-effect logic [analyst-judgment].
The skill does not screen sanctions, retrieve sources, or verify facts. It forces the agent to apply the currency trigger (mandatory live OFAC lookup), refuse the "which list wins" framing, distinguish actor regimes, and produce role-specific implications.
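The evidence labels in the excerpt follow a regular bracket syntax, so a harness around the agent could extract them and list which claims still need a live lookup before a decision. The following is a minimal sketch, assuming one claim per string and the tag spellings shown on this page; the Claim dataclass and parse_claim function are hypothetical names and are not part of the repository.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Provenance tags as spelled on this page; "user-provided" appears in the excerpt.
PROVENANCE = {"primary", "secondary", "inference", "analyst-judgment", "user-provided"}

# Matches [tag] and [tag: YYYY-MM], e.g. [verify] or [stale-risk: 2026-05].
TAG = re.compile(r"\[([a-z-]+)(?::\s*(\d{4}-\d{2}))?\]")


@dataclass
class Claim:
    text: str
    provenance: List[str] = field(default_factory=list)
    needs_verify: bool = False
    stale_risk: Optional[str] = None  # the YYYY-MM value carried by [stale-risk: ...]


def parse_claim(raw: str) -> Claim:
    """Split one tagged claim into its text, provenance tags, and action flags."""
    claim = Claim(text=TAG.sub("", raw).strip())
    for name, month in TAG.findall(raw):
        if name in PROVENANCE:
            claim.provenance.append(name)
        elif name == "verify":
            claim.needs_verify = True
        elif name == "stale-risk" and month:
            claim.stale_risk = month
    return claim


claim = parse_claim("EU/national sanctions implications [secondary][verify]")
print(claim.provenance, claim.needs_verify)
```

Parsing the tags does not verify anything. It only makes the [verify] flags machine-visible, so that a claim marked for retrieval at decision time cannot be silently treated as settled.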
AGENTS.md, skills/claude/SKILL.md, skills/codex/SKILL.md, STATUS.md).
scripts/validate.py).
taxonomy.json.
This project demonstrates how I think about useful agent infrastructure for high-stakes regional reasoning in a domain where actor distinctions, source-tier discipline, and currency-sensitivity are the binding constraints: small reusable layers, mechanism-first contracts, honest evidence discipline, and outputs aimed at sanctions, banking, energy, shipping, and sovereign-wealth decisions, all composed cleanly with a horizontal skill and a separate infrastructure layer instead of bundling everything into one repo.
Author: Vassiliy Lakhonin