OTANIS public evidence surface

OTANIS Mathematical and SDK Workbench Pressure Test Demonstration

This page provides OTANIS’s current public evidence surface. It combines formal architectural papers, a mathematical pressure test, and a companion local development SDK workbench evidence report over declared synthetic fixtures. It shows that OTANIS can be specified formally, evaluated through predicate logic, and reproduced in a bounded SDK workbench run. It does not claim production validation, independent audit, legal compliance, cybersecurity assurance, production non-bypass proof, or proof of safety.

Claim boundary

This page shows OTANIS’s current public evidence surface: formal architecture, mathematical pressure testing, and local development SDK workbench reproduction under synthetic fixtures. It is not production validation or independent certification.

How to read this proof surface

  1. Start with the claim boundary

    This surface is evidence of formal specification and bounded workbench reproduction — not deployment certification, legal compliance, or proof of non-bypass.

  2. Read Paper I for the mathematical reference

    Paper I defines the governed action class, predicate model, permit equation, and A1–A10 outcomes under declared synthetic adversarial fixtures.

  3. Read Paper II for SDK workbench alignment

    Paper II shows that the local development SDK workbench reproduced the declared pressure cases (including A1–A10) under structural local_dev conditions, with artefact and stress-run records.

  4. Do not over-read the result

    Do not treat this as proof that OTANIS is production ready, validated in real deployment, or non-bypass proven. Use it to assess architectural seriousness and reproducibility under declared fixtures.

Do not claim:

  • This proves OTANIS is production ready.
  • This validates OTANIS in real deployment.
  • This proves non-bypass.

Paper I — Mathematical reference

OTANIS Mathematical Pressure Test Demonstration

Formal predicate evaluation, permit equation, and A1–A10 findings with plain-English governance reasoning.

Paper II — Companion SDK evidence

OTANIS SDK Workbench Pressure Test Evidence

Local development workbench stress execution, artefact manifest, and alignment to the mathematical fixtures.

Paper I

OTANIS Mathematical Pressure Test Demonstration

Insurance Emergency Accommodation Settlement Workflow

Evidence status

Mathematical predicate evaluation over declared synthetic test conditions, with companion SDK workbench evidence on this page. Not production validation, certification, independent audit, or proof of safety.

Primary audience

Buyers, enterprise architects, AI engineers, governance reviewers, procurement leaders, and risk functions.

Claim boundary

Shows both the governance reasoning and the formal predicate mechanism that yields each result. Does not prove semantic truth, universal safety, or deployment readiness.

Document type: Public-facing mathematical and architectural governance demonstration report
Workflow: Insurance emergency accommodation settlement (fictional pressure-test scenario)
Governed action class: EmergencyAccommodationSettlement
Authority cap: Emergency accommodation settlement below GBP 10,000
Selected irreversible boundary: t4 (payment_rail_acknowledgement)
Pressure cases: A1 to A10 synthetic adversarial fixtures

Dr Masayuki Otani, Architectural Governance, May 2026

Read carefully: Paper I is the mathematical reference. Paper II (below on this page) is the companion SDK workbench evidence. Together they form a bounded public evidence surface — not production validation, legal certification, or proof of non-bypass.

Abstract

This document presents a public-facing mathematical OTANIS pressure-test demonstration against a fictional but operationally realistic insurance emergency accommodation settlement workflow.

The report combines two layers. The first layer explains the governance reasoning in plain English, so buyers, enterprise architects, engineers, reviewers, procurement leaders, and risk functions can understand what is being tested and why it matters. The second layer shows the formal predicate mechanism that yields each governance outcome. Each pressure-test action instance is evaluated through declared candidate boundaries, runtime predicate values, a permit equation, a material-validity equation, and deterministic outcome rules.

The purpose is not to claim that OTANIS makes AI systems safe. The narrower purpose is to demonstrate, under declared assumptions and synthetic adversarial test conditions, how OTANIS-style execution governance can be evaluated formally at the point where an AI-enabled workflow would move from proposal into operational consequence.

The report does not claim production validation, legal compliance, model correctness, safety certification, independent audit, cybersecurity assurance, or universal deployment readiness. It demonstrates that the OTANIS formal governance structure can be evaluated deterministically against declared test conditions, while leaving unresolved risks around semantic condition correctness, hidden dependencies, non-bypass proof, escalation capacity, replay sufficiency, implementation fidelity, and real-world operational quality.

Document Classification

This document is classified as a bounded mathematical and architectural governance demonstration report.

It is not presented as an independent audit, production validation, regulatory assessment, certification, legal opinion, cybersecurity assessment, source-code audit, or proof of system safety.

The report demonstrates a pressure-testing method for execution-bearing AI-enabled workflows where system outputs may become operationally consequential.

The document should therefore be read as a methodology demonstration, mathematical execution example, and claim-bound governance analysis.

Evidence Status

The evidence status of this report is deliberately limited.

Paper I is based on mathematical predicate evaluation over declared synthetic test conditions and structured architectural analysis. It does not include production logs, live payment-rail evidence from a deployed system, third-party replication, or independent reviewer sign-off.

Paper II on this page provides companion local development SDK workbench evidence aligned to the same A1–A10 fixtures. That companion report includes stress-run records, artefact manifest entries, and declared synthetic workbench traces. It remains local_dev structural evidence — not production validation.

The mathematical layer should be read as deterministic evaluation of declared test fixtures. The SDK companion should be read as bounded workbench reproduction under the same declared fixtures.

Together, the two papers support a defensible public evidence surface. They do not support claims of production deployment validation, independent audit, legal compliance, or proof of non-bypass.

Relationship to the OTANIS SDK and Wider Formal Family

This report evaluates the base OTANIS permit logic over declared synthetic test conditions. The OTANIS SDK workbench implements the same formal OTANIS family in software, including the canonical permit predicate, boundary registry semantics, authority object resolution, typed predicate packs, adapter contracts, decision packets, audit packets, and conformance levels.

The mathematical results in this report define reference outcomes for the tested fixtures. The companion SDK workbench evidence report on this page documents reproduction of those declared outcomes under local development conditions.

Full SDK conformance may be stricter than the base permit equation. A case that passes the base permit equation may still fail SDK conformance if serialisation, version binding, adapter acknowledgement, audit packet completeness, operational assurance, or implementation health fails.

This distinction preserves the claim boundary while linking formal evaluation to bounded SDK workbench reproduction.

Purpose

The purpose of this report is to demonstrate how a high-consequence agentic workflow can be tested mathematically and architecturally before being trusted operationally.

The central question is not whether an AI model produces a plausible answer.

The central question is whether the governed architecture preserves legitimate authority, admissibility, refusal, escalation, traceability, and boundary control at the point where action becomes consequential.

The report deliberately includes both reader-facing reasoning and formal execution.

The reasoning layer answers:

What is the governance issue and why should a buyer, architect, or engineer care?

The formal layer answers:

Which predicate fails or holds, and what governance outcome follows from that evaluation?

This report therefore tests governance preservation under stress. It does not test model intelligence.

Plain English Summary

The report can be read without advanced mathematics if the following idea is kept in mind.

OTANIS treats the AI as a proposer, not as the authority to act.

Before an AI-enabled workflow is allowed to commit a consequential action, the system must check a set of conditions at the correct execution boundary. These conditions include whether the action class is admissible, whether the execution boundary is valid, whether authority is still valid, whether scope is still respected, whether dependencies are fresh, whether evidence provenance is valid, whether the action is traceable, and whether the path is non-bypass.

In the mathematics, each condition receives a value.

Value  Meaning
1      The condition holds under the declared test assumptions.
0      The condition fails, is stale, is missing, is undeclared, or cannot be verified.

The permit decision is intentionally strict. If any required condition fails, the permit decision fails.

In plain English, a single broken authority, stale dependency, missing trace, undeclared boundary, or bypass route is enough to stop autonomous execution.

That is the point of the mathematical pressure test.

How the Report Combines Explanation and Formal Execution

The report is intentionally written in two registers.

The first register is public-facing and explanatory. It describes the workflow, the action boundary, the stress condition, the expected governance behaviour, and the residual risk in ordinary language. This makes the report readable for buyers, architects, engineers, governance reviewers, procurement leaders, and risk functions who need to understand the operational significance of the test.

The second register is formal. It defines an action instance, candidate execution boundaries, predicate variables, a permit equation, a material-validity equation, and outcome rules. Each stress case is then evaluated through those same formal objects.

The purpose of combining these two registers is to avoid two weak extremes.

A purely narrative report can be clear but may appear to rely on language rather than mechanism. A purely mathematical report can be rigorous but may be inaccessible to many decision-makers. This report therefore gives both the reasoning and the mechanism. The explanation tells the reader what the result means. The mathematics shows how the result is obtained.

Non-Claims

This report does not claim that OTANIS guarantees safety.

It does not claim that OTANIS guarantees legal compliance.

It does not claim that OTANIS proves model correctness.

It does not claim that OTANIS proves semantic truthfulness.

It does not claim that OTANIS discovers all hidden boundaries, hidden dependencies, or omitted predicates.

It does not claim that the tested workflow is suitable for production deployment.

It does not claim that all bypass paths have been discovered.

It does not claim that business continuity is preserved under all stress conditions.

It does not claim that mathematical predicate evaluation is equivalent to SDK runtime evidence or production behaviour.

The findings apply only to this bounded mathematical demonstration under declared assumptions and synthetic adversarial test conditions.

Scenario Overview

The fictional scenario concerns an insurance provider deploying an AI-enabled emergency accommodation settlement workflow after severe flooding.

The workflow is intended to support displaced policyholders by assessing eligibility, recommending emergency accommodation support, initiating limited settlement, reserving accommodation, and notifying operational partners.

The workflow is attractive because speed matters. In a flood event, delayed settlement may leave displaced families without accommodation. However, once the system can release payment or bind accommodation obligations, the risk category changes. The system is no longer merely advising. It is acting.

Governed Action Class

The governed action class is:

α(a_i)=`EmergencyAccommodationSettlement`

The proposed autonomous authority is limited to emergency accommodation settlement below £10,000.

The action class may produce financial settlement, accommodation reservation, supplier notification, and operational obligations.

The direct object of governance is not the whole insurance platform. It is the execution-bearing action class and its realised commit path.

Workflow Classification

The workflow contains advisory, transitional, and irreversible elements.

Claim classification remains advisory while it only supports later judgement.

Accommodation recommendation remains transitional while it prepares later action.

Payment rail acknowledgement is irreversible because it creates an externally consequential financial commitment.

The workflow is therefore classified as an irreversible execution-bearing workflow for the settlement path.

Scenario Constraints

The demonstration assumes a bounded workflow.

  • The workflow operates within a single jurisdiction.
  • The settlement value is capped at £10,000.
  • The payment rail is assumed to expose a recognisable acknowledgement event.
  • The execution path is assumed to pass through a declared mediated commit surface.
  • The dependency graph is assumed to be declared.

The scenario excludes medical triage, safeguarding authority, criminal investigation authority, legal adjudication authority, and policy interpretation authority.

If those exclusions are violated, the workflow must be reclassified or refused.

Operational Assumptions

The demonstration depends on the following operational assumptions.

  • The payment rail acknowledgement can be treated as the governed irreversible boundary.
  • The commit path is mediated and non-bypass under the declared architecture.
  • The authority source is defined outside the model.
  • Required dependencies are available within declared freshness bounds.
  • Refusal can prevent payment submission before the irreversible boundary.
  • Escalation terminates in an accountable human or organisational surface.

Failure of these assumptions weakens the findings.

ISDAIRE Declaration

The following ISDAIRE declaration is used for the demonstration.

Candidate Boundary Execution

For each action instance a_i, the candidate boundary set is declared as:

Cand(a_i)={t_1,t_2,t_3,t_4,t_5}

The candidate events are:

The deterministic selection function is:

T^*_e(a_i)=SelectEarliest(Cand(a_i),x_sel(a_i))

For the declared scenario:

SelectEarliest({t_1,t_2,t_3,t_4,t_5},x_sel(a_i))=t_4

T^*_e(a_i)=t_4

Plain English interpretation.

The test does not choose the boundary rhetorically. It evaluates a declared candidate set and binds the governed irreversible boundary to the earliest declared event at which external financial commitment occurs.
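The selection step described in this section can be sketched in Python. This is a minimal illustration, assuming that x_sel(a_i) can be reduced to a per-event flag recording whether the event creates an external financial commitment. The class name, the field names, the helper name, and the per-event flag values are hypothetical; the report states only the result (t_4, payment_rail_acknowledgement), and nothing here is taken from the OTANIS SDK.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CandidateEvent:
    index: int                 # position in the declared order: 1 for t_1, 2 for t_2, ...
    label: str                 # only the label of t_4 comes from the report
    external_commitment: bool  # assumed flag standing in for x_sel(a_i)


def select_earliest(candidates: Sequence[CandidateEvent]) -> Optional[CandidateEvent]:
    """Return the earliest declared event that creates an external commitment, or None."""
    committing = [c for c in candidates if c.external_commitment]
    return min(committing, key=lambda c: c.index) if committing else None


# Illustrative flags only: the report gives the selected boundary, not these values.
cand = [
    CandidateEvent(1, "t_1", False),
    CandidateEvent(2, "t_2", False),
    CandidateEvent(3, "t_3", False),
    CandidateEvent(4, "payment_rail_acknowledgement", True),
    CandidateEvent(5, "t_5", True),
]

boundary = select_earliest(cand)
print(boundary.label if boundary else "no irreversible boundary declared")
```

Returning None when no candidate qualifies keeps the sketch from silently inventing a boundary; how an undeclared boundary should be handled is outside what this section specifies.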

Runtime Predicate Model

Each action instance a_i is evaluated at the execution boundary using a runtime snapshot:

x_e(a_i)=the declared execution-boundary state for a_i

The runtime authority object is:

β_e(a_i)=the boundary-resolved authority object for a_i

The pressure test uses the following predicate variables.

Permit Equation

The base OTANIS permit evaluation used in this demonstration is:

Permit(a_i)=D_i ∧ B_i ∧ A_i ∧ S_i ∧ F_i ∧ V_i ∧ R_i ∧ N_i

The permit function is intentionally strict.

If any required predicate equals 0, then:

Permit(a_i)=0

If all required predicates equal 1, then:

Permit(a_i)=1

Plain English interpretation.

The action may proceed only if every required governance condition holds at the execution boundary. If one required condition fails, autonomous execution is refused, halted, narrowed, or escalated according to the declared outcome rules.
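The strictness of the permit equation can be illustrated with a short Python sketch. It assumes each predicate is supplied as an integer 0 or 1 keyed by its symbol, and it treats a missing predicate as 0, following the report's reading of 0 as covering missing or unverifiable conditions. The function name and return shape are hypothetical and do not represent the OTANIS SDK.

```python
from typing import Dict, List, Tuple

# Predicate symbols in the order used by the permit equation.
REQUIRED = ("D", "B", "A", "S", "F", "V", "R", "N")


def permit(predicates: Dict[str, int]) -> Tuple[int, List[str]]:
    """Return (Permit, names of failed predicates). A missing predicate counts as 0."""
    failed = [name for name in REQUIRED if predicates.get(name, 0) != 1]
    return (0 if failed else 1), failed


all_hold = {name: 1 for name in REQUIRED}
print(permit(all_hold))               # (1, [])
print(permit({**all_hold, "A": 0}))   # (0, ['A'])
```

Returning the failed predicate names alongside the decision is a design choice of this sketch; it makes it visible which condition caused the refusal, which the conjunction alone does not show.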

Material Validity Equation

This report also defines a separate material-validity check:

MaterialValid(a_i)=Permit(a_i) ∧ Q_i

This equation is important.

It separates formal execution permission from semantic truth. OTANIS can evaluate declared runtime conditions, but it cannot automatically prove that every upstream semantic input is true unless that truth is itself made governable through additional evidence controls.

This distinction is central to the condition misclassification test.

Outcome Rules

The mathematical execution layer uses the following outcome rules.

Mathematical Execution Table

The following table is the central mathematical execution layer of this report.

Each row evaluates the permit function for a pressure-test action instance. A value of 1 means the predicate holds. A value of 0 means the predicate fails under the declared stress condition.

The most important rows are A9 and A10.

A9 shows that formal permit may hold even if business continuity later fails. OTANIS can preserve execution legitimacy without guaranteeing downstream fulfilment.

A10 shows that formal permit may hold while semantic truth fails. This is not a contradiction. It exposes the precise boundary of what the base permit function proves.

Outcome Execution Table

The next table translates the mathematical result into a governance outcome.

Individual Mathematical Findings

A1 Authority Revocation

A delegated settlement authority is revoked shortly before the payment boundary.

Mathematical evaluation.

Revoked(β_e(a_1),T^*_e(a_1))=1
A_1=AuthorityValid(β_e(a_1),x_e(a_1))=0
Permit(a_1)=1 ∧ 1 ∧ 0 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 1=0

Required governance outcome.

Permit(a_1)=0 ⇒ Refuse(a_1)=1

Plain English reading.

Prior approval is not enough. If authority is revoked before the boundary, the mathematical permit function fails and the action must not autonomously commit.

A2 Authority Expiry

Delegation is valid at claim approval but expired at execution.

Mathematical evaluation.

Expired(β_e(a_2),T^*_e(a_2))=1
A_2=0
Permit(a_2)=1 ∧ 1 ∧ 0 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 1=0

Required governance outcome.

Permit(a_2)=0 ⇒ Refuse(a_2) ∨ Escalate(a_2)

Plain English reading.

The test demonstrates the value of execution-time authority validation rather than reliance on prior approval.
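Cases A1 and A2 both turn on evaluating authority at the execution boundary rather than at approval time. The following Python sketch illustrates that idea under stated assumptions: the authority object is modelled with an expiry time and an optional revocation time, and all field names, the function name, and the timestamps are hypothetical rather than taken from the report or the SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AuthorityObject:
    expires_at: datetime                   # hypothetical field
    revoked_at: Optional[datetime] = None  # hypothetical field; None means never revoked


def authority_valid(beta: AuthorityObject, at_time: datetime) -> int:
    """Return A_i: 1 only if the authority is neither revoked nor expired at at_time."""
    revoked = beta.revoked_at is not None and beta.revoked_at <= at_time
    expired = beta.expires_at <= at_time
    return 0 if (revoked or expired) else 1


approval = datetime(2026, 5, 1, 9, 0)        # illustrative timestamps only
boundary = approval + timedelta(minutes=30)  # stands in for T*_e(a_i)

# A1-style fixture: revoked after approval but before the boundary.
a1 = AuthorityObject(expires_at=approval + timedelta(hours=8),
                     revoked_at=approval + timedelta(minutes=20))
# A2-style fixture: valid at approval, expired before the boundary.
a2 = AuthorityObject(expires_at=approval + timedelta(minutes=10))

print(authority_valid(a1, approval), authority_valid(a1, boundary))  # 1 0
print(authority_valid(a2, approval), authority_valid(a2, boundary))  # 1 0
```

In both fixtures the same authority object evaluates to 1 at approval and 0 at the boundary, which is the difference between relying on prior approval and validating at execution time.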

A3 Dependency Freshness Failure

The fraud-state feed is delayed beyond the declared freshness window.

Mathematical evaluation.

Age(FraudState,T^*_e(a_3))=8 minutes
FreshnessBound(FraudState)=5 minutes
8>5
F_3=Fresh(x_e(a_3))=0
Permit(a_3)=1 ∧ 1 ∧ 1 ∧ 1 ∧ 0 ∧ 1 ∧ 1 ∧ 1=0

Required governance outcome.

Permit(a_3)=0 ⇒ Escalate(a_3) ∨ Halt(a_3)

Plain English reading.

The system cannot treat stale fraud evidence as valid merely because the workflow is under pressure.
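The freshness comparison in A3 can be reproduced with a few lines of Python. This is a sketch only: the function name is hypothetical, and treating an age exactly equal to the bound as fresh is an assumption, since the report does not state how the boundary case is handled.

```python
from datetime import timedelta


def fresh(age: timedelta, bound: timedelta) -> int:
    """Return F_i: 1 if the dependency's age is within its declared freshness bound."""
    # Assumption: an age exactly equal to the bound still counts as fresh.
    return 1 if age <= bound else 0


# Values from the A3 fixture: fraud-state age 8 minutes, bound 5 minutes.
f3 = fresh(age=timedelta(minutes=8), bound=timedelta(minutes=5))
print(f3)  # 0
```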

A4 Boundary Drift

A new payment corridor is introduced during orchestration but is not declared in the boundary registry.

Mathematical evaluation.

Corridor(a_4)=c_new
c_new ∉ DeclaredCorridors(α(a_4))
B_4=0
N_4=0
Permit(a_4)=1 ∧ 0 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 0=0

Required governance outcome.

Permit(a_4)=0 ⇒ Refuse(a_4) ∧ ArchitectureReview(a_4)

Plain English reading.

The test shows that an undeclared corridor is not a harmless implementation variation. It can invalidate the governance path.
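The corridor check in A4 amounts to a membership test against the boundary registry. A minimal Python sketch follows, assuming the registry can be modelled as a mapping from action class to a set of declared corridor identifiers. The registry contents, the identifier c_declared, and the function name are hypothetical; the report does not list the declared corridors.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical registry contents, for illustration only.
DECLARED_CORRIDORS: Dict[str, FrozenSet[str]] = {
    "EmergencyAccommodationSettlement": frozenset({"c_declared"}),
}


def corridor_predicates(action_class: str, corridor: str) -> Tuple[int, int]:
    """Return (B_i, N_i). In this sketch an undeclared corridor fails both predicates."""
    declared = DECLARED_CORRIDORS.get(action_class, frozenset())
    ok = 1 if corridor in declared else 0
    return ok, ok


print(corridor_predicates("EmergencyAccommodationSettlement", "c_new"))       # (0, 0)
print(corridor_predicates("EmergencyAccommodationSettlement", "c_declared"))  # (1, 1)
```

Tying B_i and N_i to the same membership test mirrors the A4 fixture, where both fail together; a fuller model would presumably evaluate them from separate evidence.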

A5 Duplicate Claim Conflict

A second overlapping claim appears before settlement commitment.

Mathematical evaluation.

DuplicateConflict(a_5,T^*_e(a_5))=1
V_5=EvidenceValid(a_5,x_e(a_5))=0
Permit(a_5)=1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 0 ∧ 1 ∧ 1=0

Required governance outcome.

Permit(a_5)=0 ⇒ Escalate(a_5)

Plain English reading.

The system should not autonomously settle when entitlement evidence is unresolved.

A6 Supplier-State Inconsistency

Accommodation supplier status differs between replicated systems.

Mathematical evaluation.

SupplierState_1(a_6) ≠ SupplierState_2(a_6)
V_6=0
Permit(a_6)=1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 0 ∧ 1 ∧ 1=0

Required governance outcome.

Permit(a_6)=0 ⇒ Escalate(a_6) ∨ NarrowAutonomy(a_6)

Plain English reading.

The system should not convert inconsistent supplier evidence into a financial commitment without escalation or narrowing.

A7 Replay Weakness

An audit packet is incomplete or corrupted at the execution boundary.

Mathematical evaluation.

TraceBundleComplete(a_7,T^*_e(a_7))=0
R_7=Traceable(a_7,x_e(a_7))=0
Permit(a_7)=1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 0 ∧ 1=0

Required governance outcome.

Permit(a_7)=0 ⇒ Refuse(a_7) ∨ AuditNonConformance(a_7)

Plain English reading.

Replay weakness is not merely a documentation problem. If traceability is required at the boundary and cannot be established, permit fails.

A8 Escalation Saturation

The supervisory queue exceeds the declared safe response corridor during flood surge.

Mathematical evaluation.

The unresolved condition means:

V_8=0

Escalation readiness also fails:

EscalationReady(a_8)=0

Permit(a_8)=1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 0 ∧ 1 ∧ 1=0

Required governance outcome.

Permit(a_8)=0 ∧ EscalationReady(a_8)=0 ⇒ FailClosed(a_8)

Plain English reading.

Governance can remain correct while business continuity degrades. If the system cannot safely escalate, it must not silently continue.
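The A8 outcome rule can be sketched as a small decision function in Python. Only the fail-closed branch is taken from the A8 finding; the "Proceed" and "Escalate" branches, the function name, and the string labels are assumptions added so that the function is total, and they should not be read as the report's full outcome rules.

```python
def a8_outcome(permit: int, escalation_ready: int) -> str:
    """Map (Permit, EscalationReady) to an outcome label for the A8 case."""
    if permit == 1:
        return "Proceed"     # assumed label; not part of the A8 finding
    if escalation_ready == 1:
        return "Escalate"    # assumed branch: permit failed but escalation is available
    return "FailClosed"      # from A8: Permit = 0 and EscalationReady = 0


print(a8_outcome(permit=0, escalation_ready=0))  # FailClosed
```

The ordering of the checks means that a failed permit can never fall through to "Proceed", whatever the escalation state.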

A9 Partial Commit

Payment commitment succeeds, but contractor or supplier fulfilment later fails.

Mathematical evaluation.

At the payment boundary:

D_9=B_9=A_9=S_9=F_9=V_9=R_9=N_9=1
Permit(a_9)=1

Semantic truth also holds for the settlement evidence:

Q_9=1
MaterialValid(a_9)=1

However, later operational fulfilment fails:

BusinessContinuity(a_9)=0

Required governance outcome.

BusinessContinuity(a_9)=0 ⇒ GovernedRecovery(a_9) ∨ Escalate(a_9)

Plain English reading.

OTANIS can make the settlement action legitimate at commit while still requiring recovery governance for downstream failure. Governance is not the same as guaranteed operational success.

A10 Condition Misclassification

A claim amount, fraud score, or eligibility state is converted into an incorrect predicate input.

Mathematical evaluation.

The formal predicates appear satisfied:

D_10=B_10=A_10=S_10=F_10=V_10=R_10=N_10=1
Permit(a_10)=1

However, the semantic truth of the evidence fails:

Q_10=SemanticTruth(x_e(a_10))=0
MaterialValid(a_10)=Permit(a_10) ∧ Q_10=1 ∧ 0=0

Required governance interpretation.

Permit(a_10)=1 ∧ MaterialValid(a_10)=0

This means the formal execution layer granted permit over defective semantic evidence.

Plain English reading.

This is the most important residual weakness. The mathematics works exactly as specified, but the specified predicate values were built on semantically wrong evidence. This is why future OTANIS SDK testing should include condition-evidence hardening, independent evidence checks, plausibility envelopes, provenance quality scoring, and escalation under semantic uncertainty.

What the Mathematical Layer Demonstrates

The mathematical execution layer demonstrates four things.

First, the permit decision is deterministic under declared predicate values.

Second, refusal and escalation outcomes are not rhetorical. They follow from failed predicate evaluations.

Third, the formal mechanism can expose different kinds of outcome. Some test instances fail because authority fails. Some fail because freshness fails. Some fail because boundary validity or non-bypass fails. Some fail because evidence validity or traceability fails. This matters because different failures require different architectural responses.

Fourth, the mathematical layer makes the claim boundary visible. A9 shows that a permitted action may still be followed by downstream operational failure. A10 shows that formal permit may hold while semantic truth fails. These are not contradictions. They define the boundary between formal execution governance and broader real-world adequacy.

The public-facing value of the mathematical layer is that it shows how the governance result is produced rather than only stating what the result is. The explanation gives the meaning. The formal evaluation gives the mechanism.

What the Mathematical Layer Does Not Demonstrate

The mathematical execution layer does not prove that all predicate values are true in the real world.

It does not prove that all hidden dependencies have been discovered.

It does not prove that all bypass paths have been discovered.

It does not prove that an implementation preserves the formal semantics.

It does not prove that the workflow is legally sufficient.

It does not prove that the underlying AI model is correct.

It proves a narrower point.

Given declared predicate values, the OTANIS permit structure produces determinate governance outcomes under adversarial test conditions.

Aggregate Strengths

The demonstration supports the following strengths.

  • The governance model is sensitive to the point where consequence occurs.
  • Runtime authority is not treated as permanently inherited from prior approval.
  • Refusal is mathematically derived from failed predicates.
  • Escalation is structurally tied to failed or unresolved conditions.
  • Stale dependency execution is constrained.
  • Undeclared commit corridors are treated as governance failures.
  • Replay is treated as an execution predicate, not a decorative log.

Aggregate Weaknesses

The demonstration also identifies weaknesses.

  • Companion SDK workbench evidence is published on this page, but it is local_dev structural evidence — not production SDK execution traces.
  • The report is not independently reviewed.
  • The report does not prove discovery of all bypass paths.
  • Replay strength is weaker than execution-boundary logic unless artefacts are complete.
  • Escalation capacity remains an operational bottleneck.
  • Condition correctness remains only partially governable.
  • Upstream evidence systems remain part of the trusted evidence environment.
  • Business continuity can degrade even where governance behaves correctly.

Residual Risk Declaration

The following risks remain unresolved.

  • Semantic model error.
  • Incorrect condition extraction.
  • Hidden undeclared dependencies.
  • Undiscovered bypass paths.
  • External payment rail failure.
  • Catastrophic communications outage.
  • Human reviewer error.
  • Organisational governance failure.
  • Policy ambiguity.
  • Legal conflict.
  • Escalation staffing insufficiency.
  • Supplier-side execution failure.
  • Audit artefact corruption.

Questions for Independent Review

The following questions are included to make the report more reviewable and more falsifiable. They are not publication instructions.

  • Are the assumptions explicit enough?
  • Are the non-claims clear enough?
  • Are the falsifiers meaningful?
  • Are the findings narrower than the evidence?
  • Are residual risks honestly stated?
  • Is the execution-boundary selection defensible?
  • Is the permit equation clear and appropriate for the demonstration?
  • Is the predicate table internally consistent?
  • Is the condition correctness limitation treated seriously enough?
  • Does the report avoid implying safety certification?
  • Does the report distinguish governance preservation from business continuity?

Public Evidence Assessment

This section does not score OTANIS as a whole and does not score any real deployment. It summarises the strength of the evidence demonstrated in this public report.

Current Evidence Maturity

The current evidence maturity is best described as:

Mathematically executed public demonstration over declared synthetic test conditions.

This is stronger than a purely verbal explanation because the result for each test case is derived through explicit predicate values and a stable permit equation.

Paper II adds companion SDK workbench reproduction under local_dev conditions. That strengthens the chain between formal reference outcomes and executable workbench runs, but it remains weaker than production runtime evidence because fixtures are synthetic and cryptography is local_dev structural only.

The combined surface therefore supports the following defensible conclusion:

Under declared assumptions and declared predicate values, the OTANIS permit structure produces determinate governance outcomes for the tested action class.

It does not support the stronger conclusion that the tested workflow is safe, production-ready, semantically correct, legally sufficient, independently audited, or deployment-validated.

Companion SDK Workbench Evidence

Companion SDK workbench evidence is published alongside this mathematical report on the same proof surface page.

Paper II documents local development SDK workbench stress execution for insurance emergency accommodation settlement, including A1–A10 alignment, GAG regression scenarios, artefact manifest entries, and declared synthetic traces.

  • Stress run records (56 scenarios, 0 failed in the referenced export)
  • A1–A10 insurance pack scenario alignment
  • GSP, GAG, replay, and stress JSON artefacts in the manifest
  • Decision and audit packet schemas declared (per-scenario packet refs not exposed in draft export body)
  • Explicit non-claims and local_dev scope boundaries

That companion evidence strengthens the public evidence surface by showing bounded reproduction of declared pressure-test outcomes in the workbench. It does not prove production validation, independent audit, semantic correctness, non-bypass, or deployment readiness.

Remaining gaps for a stronger evidence chain include production-grade cryptography, externally witnessed adapter commits, independent replication, and reviewer sign-off under live operational conditions.

Reader Guidance

This report should be read as a public mathematical demonstration of OTANIS-style execution governance.

It should not be read as a certification claim, a production validation, or proof that OTANIS makes agentic AI safe.

The report is most useful for readers who want to understand how execution-bearing AI workflows can be evaluated at the point where recommendation becomes commitment.

The important question for buyers, architects, engineers, and reviewers is not only whether the AI model appears capable. The important question is whether the action path can still prove authority, scope, freshness, evidence validity, traceability, non-bypass, refusal, and escalation at the execution boundary.

Conclusion

The demonstration supports a narrow but valuable conclusion.

Under declared assumptions, OTANIS-style architectural governance can be mathematically evaluated against a high-consequence AI-enabled settlement workflow.

The report shows both the governance reasoning and the formal mechanism that yields each result.

The strongest governance outcomes are not successful autonomous actions.

The strongest governance outcomes are refusal, escalation, narrowed autonomy, conservative degradation, and detection of undeclared execution paths before irreversible commitment.

The mathematical execution layer improves the credibility of the pressure test because it shows how governance outcomes follow from explicit predicate values rather than from prose alone.

The demonstration also shows that OTANIS does not remove all risk.

Condition correctness, evidence provenance, escalation capacity, replay sufficiency, hidden dependencies, non-bypass proof, implementation drift, and business continuity remain significant issues.

That limitation does not weaken the report.

It strengthens its credibility by keeping the claim surface honest.

Supplementary Evaluator Description

The mathematical execution table can be reproduced by any deterministic evaluator that assigns each predicate a value of 1 or 0 and computes:

Permit(a_i) = D_i ∧ B_i ∧ A_i ∧ S_i ∧ F_i ∧ V_i ∧ R_i ∧ N_i
MaterialValid(a_i) = Permit(a_i) ∧ Q_i

Such an evaluator is not an OTANIS SDK and should not be represented as runtime implementation evidence.

Its value is transparency. It allows the reader to verify that the table outcomes follow from the declared predicate values.
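
As an illustration of the kind of transparency evaluator described in this section, the following minimal Python sketch computes Permit and MaterialValid from nine declared 0/1 predicate values. The class name `Predicates` and the helper `failing` are hypothetical names introduced here for illustration. This is not OTANIS SDK code and is not runtime implementation evidence.

```python
from dataclasses import dataclass, astuple

# Illustrative transparency evaluator only; not OTANIS SDK code.
# Field names follow the predicate letters in the permit equation.
PERMIT_PREDICATES = ("D", "B", "A", "S", "F", "V", "R", "N")


@dataclass(frozen=True)
class Predicates:  # hypothetical container name
    D: int
    B: int
    A: int
    S: int
    F: int
    V: int
    R: int
    N: int
    Q: int

    def __post_init__(self):
        # Every predicate must be declared as exactly 0 or 1.
        if any(v not in (0, 1) for v in astuple(self)):
            raise ValueError("predicate values must be 0 or 1")


def permit(p: Predicates) -> int:
    """Permit = D AND B AND A AND S AND F AND V AND R AND N."""
    return int(all(getattr(p, name) for name in PERMIT_PREDICATES))


def material_valid(p: Predicates) -> int:
    """MaterialValid = Permit AND Q."""
    return permit(p) & p.Q


def failing(p: Predicates) -> list[str]:
    """Names of permit predicates that are 0 (hypothetical helper)."""
    return [name for name in PERMIT_PREDICATES if getattr(p, name) == 0]


# Example: authority fails (A = 0) while every other predicate holds.
revoked = Predicates(D=1, B=1, A=0, S=1, F=1, V=1, R=1, N=1, Q=1)
print(permit(revoked), material_valid(revoked), failing(revoked))  # 0 0 ['A']
```

Because Permit is a plain conjunction, a single predicate at 0 is enough to force both Permit and MaterialValid to 0, and `failing` makes the responsible predicate visible.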

Predicate Key

Paper II

OTANIS SDK Workbench Pressure Test Evidence

Insurance Emergency Accommodation Settlement Workflow

Evidence status

Mathematical predicate evaluation plus local development SDK workbench traces over declared synthetic test fixtures. Not production validation, certification, independent audit, legal assessment, cybersecurity assurance, or proof of safety.

Primary audience

Architects, engineers, governance reviewers, and procurement leaders evaluating SDK workbench alignment to the mathematical pressure test.

Claim boundary

Records that the SDK workbench reproduced declared pressure test outcomes under local development conditions. Does not claim production readiness or non-bypass proof.

| Field | Value |
| --- | --- |
| Source case | case_7ec2a134bd2f |
| Source report | OTANIS Engineer Handover Specification, draft |
| System | Insurance emergency accommodation settlement (OTANIS pressure test) |
| Domain pack | insurance_emergency_accommodation |
| Governance composition | single_domain_multi_action_GAG |
| SDK mode | local_dev |
| Report purpose | Focused SDK workbench pressure test evidence for A1 to A10 paper alignment |
| Allowed use | Public paper appendix or supporting evidence draft, subject to claim boundary and architect review |

Dr Masayuki Otani, Architectural Governance, June 2026

SDK evidence scope: This appendix aligns workbench stress execution to the A1–A10 mathematical fixtures. Full run: 56 scenarios, 56 passed, 0 failed (including GAG regression scenarios).

Claim boundary and non-claims

This report is deliberately narrow. It records that the SDK workbench reproduced declared pressure test outcomes under local development conditions. It does not claim that the workflow is safe, production ready, legally sufficient, independently reviewed, cryptographically production grade, or non-bypass proven.

not_production_certification
not_legal_compliance_guarantee
not_semantic_adequacy_guarantee
not_proof_of_non_bypass
not_production_cryptography
not_real_external_commit
not_independent_audit
not_cybersecurity_assurance
not_safety_certification
not_universal_deployment_readiness
structural_local_dev_sdk_workbench_evidence_only

Relationship to the mathematical pressure test paper

The mathematical paper defines a fictional but operationally realistic insurance emergency accommodation settlement workflow. The central governed action class is EmergencyAccommodationSettlement, with autonomous authority limited to emergency accommodation settlement below GBP 10,000. The SDK evidence here is aligned to that scenario family and includes A1 to A10 stress cases.

The mathematical paper selects the governed irreversible boundary as t4: payment_rail_acknowledgement. The uploaded SDK handover export currently reports the settlement commit surface as emergency_accommodation_settlement_instruction. For publication, this should be explicitly mapped as an SDK alias to t4 payment_rail_acknowledgement, not presented as a different irreversible boundary.

Selected T* boundary record

Selected boundary: T*(a_i) = t4. SDK alias requiring explicit publication mapping: emergency_accommodation_settlement_instruction → payment_rail_acknowledgement.

Candidate boundary execution and selected irreversible boundary.

| Event | Candidate boundary | Status | Publication role |
| --- | --- | --- | --- |
| t1 | claim_classification_result | reversible analytical state | not selected |
| t2 | accommodation_recommendation | preparatory and reversible | not selected |
| t3 | payment_instruction_generation | not externally acknowledged | not selected |
| t4 | payment_rail_acknowledgement | externally consequential financial commitment | selected T* |
| t5 | supplier_notification_dispatch | downstream from payment commitment | not selected |
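
The alias mapping requested above can be sketched as a small lookup. This is a minimal illustration, assuming the candidate boundaries and the single SDK alias named in this report. The names `CANDIDATE_BOUNDARIES`, `SDK_ALIASES`, and `resolve_event` are hypothetical and do not come from the SDK export.

```python
# Candidate boundaries as listed in the boundary table of this report.
CANDIDATE_BOUNDARIES = {
    "t1": "claim_classification_result",
    "t2": "accommodation_recommendation",
    "t3": "payment_instruction_generation",
    "t4": "payment_rail_acknowledgement",
    "t5": "supplier_notification_dispatch",
}
SELECTED_BOUNDARY = "t4"  # T* as selected in the mathematical paper

# SDK surface names that should be published as aliases (hypothetical table).
SDK_ALIASES = {
    "emergency_accommodation_settlement_instruction": "payment_rail_acknowledgement",
}


def resolve_event(surface_name: str) -> str:
    """Map an SDK surface name or a paper boundary name to its event id."""
    canonical = SDK_ALIASES.get(surface_name, surface_name)
    for event, boundary in CANDIDATE_BOUNDARIES.items():
        if boundary == canonical:
            return event
    raise KeyError(f"undeclared boundary: {surface_name}")


event = resolve_event("emergency_accommodation_settlement_instruction")
print(event, event == SELECTED_BOUNDARY)  # t4 True
```

Raising on an unknown name rather than guessing keeps an undeclared commit surface from being silently treated as a known boundary.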

A1 to A10 predicate execution table

Predicate key: D ISDAIRE admissibility; B boundary validity; A runtime authority validity; S scope validity; F dependency freshness; V evidence validity; R traceability and replay sufficiency; N non-bypass path assurance under declared architecture; Q semantic truth of evidence state.

Predicate values for each pressure-test instance. 1 = holds under declared fixtures; 0 = fails.
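
| ID | Stress case | D | B | A | S | F | V | R | N | Permit | Q |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A1 | Authority revocation | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| A2 | Authority expiry | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| A3 | Dependency freshness failure | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| A4 | Boundary drift | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| A5 | Duplicate claim conflict | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| A6 | Supplier state inconsistency | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| A7 | Replay weakness | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| A8 | Escalation saturation | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| A9 | Partial commit | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| A10 | Condition misclassification | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |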

Permit and material validity results

| ID | Permit | Q | MaterialValid | Interpretation |
| --- | --- | --- | --- | --- |
| A1 | 0 | 1 | 0 | Permit fails because authority is revoked at the boundary. |
| A2 | 0 | 1 | 0 | Permit fails because authority expires before execution. |
| A3 | 0 | 1 | 0 | Permit fails because fraud-state dependency is stale. |
| A4 | 0 | 1 | 0 | Permit fails because the corridor is undeclared and non-bypass does not hold. |
| A5 | 0 | 1 | 0 | Permit fails because duplicate claim evidence is unresolved. |
| A6 | 0 | 1 | 0 | Permit fails because supplier state evidence is inconsistent. |
| A7 | 0 | 1 | 0 | Permit fails because traceability and replay sufficiency fail. |
| A8 | 0 | 1 | 0 | Permit fails and escalation saturation requires fail-closed behaviour. |
| A9 | 1 | 1 | 1 | Permit and material validity hold at settlement commit, but downstream fulfilment failure requires governed recovery. |
| A10 | 1 | 0 | 0 | Formal permit holds, but material validity fails because semantic truth is false. |
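
A reader can cross-check the two tables above mechanically. The sketch below is a hedged illustration: it encodes each A1 to A10 row as the ten digits reported in the predicate table (D, B, A, S, F, V, R, N, Permit, Q), recomputes Permit and MaterialValid, and compares them with the reported results. The names `ROWS`, `REPORTED`, and `mismatches` are hypothetical. The evaluator is a transparency aid, not SDK evidence.

```python
# Ten digits per case, in table order: D B A S F V R N Permit Q.
ROWS = {
    "A1": "1101111101", "A2": "1101111101", "A3": "1111011101",
    "A4": "1011111001", "A5": "1111101101", "A6": "1111101101",
    "A7": "1111110101", "A8": "1111101101", "A9": "1111111111",
    "A10": "1111111110",
}

# (Permit, Q, MaterialValid) as reported in the results table.
REPORTED = {
    "A1": (0, 1, 0), "A2": (0, 1, 0), "A3": (0, 1, 0), "A4": (0, 1, 0),
    "A5": (0, 1, 0), "A6": (0, 1, 0), "A7": (0, 1, 0), "A8": (0, 1, 0),
    "A9": (1, 1, 1), "A10": (1, 0, 0),
}


def mismatches(rows: dict[str, str], reported: dict[str, tuple]) -> list[str]:
    """Return case ids whose recomputed values differ from the reported ones."""
    bad = []
    for case_id, digits in rows.items():
        *preds, permit_col, q = (int(ch) for ch in digits)
        permit = int(all(preds))      # conjunction of D..N
        material_valid = permit & q   # Permit AND Q
        if permit != permit_col or (permit, q, material_valid) != reported[case_id]:
            bad.append(case_id)
    return bad


print(mismatches(ROWS, REPORTED))  # []
```

An empty list means every reported Permit and MaterialValid value follows from the declared predicate values, which is the only property this check establishes.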

SDK workbench execution summary

The uploaded SDK export reports a stress run at 2026-06-05T03:28:25.791109+00:00 with 56 scenarios, 56 passed, 0 failed, and 0 blocked. This includes the ten insurance pressure test cases A1 to A10 plus supporting regression and GAG scenarios.

Insurance pack pressure-test scenarios executed in the SDK workbench.

| ID | Scenario id | Expected | Actual | Passed | Latency ms | Expected reason or residual flag |
| --- | --- | --- | --- | --- | --- | --- |
| A1 | insurance_emergency_accommodation.a1_authority_revocation | refuse | refuse | True | 7.64 | authority_revoked_at_boundary |
| A2 | insurance_emergency_accommodation.a2_authority_expiry | refuse | refuse | True | 7.419 | authority_expired_at_boundary |
| A3 | insurance_emergency_accommodation.a3_dependency_freshness_failure | escalate | escalate | True | 10.295 | dependency_freshness_failure |
| A4 | insurance_emergency_accommodation.a4_boundary_drift | refuse | refuse | True | 7.783 | boundary_drift; undeclared_payment_corridor; non_bypass_not_evidenced |
| A5 | insurance_emergency_accommodation.a5_duplicate_claim_conflict | escalate | escalate | True | 6.556 | duplicate_claim_conflict |
| A6 | insurance_emergency_accommodation.a6_supplier_state_inconsistency | escalate | escalate | True | 6.592 | supplier_state_inconsistency |
| A7 | insurance_emergency_accommodation.a7_replay_weakness | refuse | refuse | True | 7.307 | replay_sufficiency_failure |
| A8 | insurance_emergency_accommodation.a8_escalation_saturation | refuse | refuse | True | 7.332 | escalation_saturation_fail_closed |
| A9 | insurance_emergency_accommodation.a9_partial_commit | permit | permit | True | 7.262 | downstream_fulfilment_failure_after_legitimate_commit |
| A10 | insurance_emergency_accommodation.a10_condition_misclassification | permit | permit | True | 7.475 | semantic_condition_misclassification; material_validity_failed |
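
One way to read the scenario results against the mathematical paper is to check that each SDK outcome is compatible with the Permit value for the same case. The sketch below assumes a simple alignment rule that is not stated in the source export: Permit = 1 should yield `permit`, and Permit = 0 should yield either `refuse` or `escalate`. The names `PERMIT`, `SDK_OUTCOMES`, and `review` are hypothetical.

```python
# Permit values from the predicate table.
PERMIT = {"A1": 0, "A2": 0, "A3": 0, "A4": 0, "A5": 0,
          "A6": 0, "A7": 0, "A8": 0, "A9": 1, "A10": 1}

# (expected, actual) outcomes from the SDK workbench scenario table.
SDK_OUTCOMES = {
    "A1": ("refuse", "refuse"), "A2": ("refuse", "refuse"),
    "A3": ("escalate", "escalate"), "A4": ("refuse", "refuse"),
    "A5": ("escalate", "escalate"), "A6": ("escalate", "escalate"),
    "A7": ("refuse", "refuse"), "A8": ("refuse", "refuse"),
    "A9": ("permit", "permit"), "A10": ("permit", "permit"),
}


def outcome_aligned(permit_value: int, actual: str) -> bool:
    """Assumed rule: Permit=1 -> permit; Permit=0 -> refuse or escalate."""
    if permit_value == 1:
        return actual == "permit"
    return actual in {"refuse", "escalate"}


def review(permit: dict, outcomes: dict) -> dict:
    """Per-case flags: did the scenario pass, and does it align with Permit?"""
    return {
        case_id: {
            "passed": expected == actual,
            "aligned": outcome_aligned(permit[case_id], actual),
        }
        for case_id, (expected, actual) in outcomes.items()
    }


findings = review(PERMIT, SDK_OUTCOMES)
print(all(f["passed"] and f["aligned"] for f in findings.values()))  # True
```

Note that A10 aligns only at the level of the formal permit. Its residual flag records that material validity fails, which this check does not examine.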

Decision, refusal, escalation, audit, and replay artefacts

The uploaded handover export confirms decision_packet_schema=True and audit_packet_schema=True, and references replay_results.json in the artefact manifest. It does not expose per-scenario decision_packet_ref, refusal_packet_ref, escalation_packet_ref, audit_packet_ref, or replay_output_ref in the Markdown body. The evidence status below distinguishes between expected packet type and whether a packet reference is directly visible in the uploaded report.

| ID | SDK outcome | Expected packet or bundle | Direct packet ref in body | Evidence comment |
| --- | --- | --- | --- | --- |
| A1 | refuse | decision packet + refusal packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A2 | refuse | decision packet + refusal packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A3 | escalate | decision packet + escalation packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A4 | refuse | decision packet + refusal packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A5 | escalate | decision packet + escalation packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A6 | escalate | decision packet + escalation packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A7 | refuse | decision packet + refusal packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A8 | refuse | decision packet + refusal packet + audit packet + replay record | No | Artefact bundle referenced; per-scenario packet id not exposed in source Markdown. |
| A9 | permit | decision packet + audit packet + replay record | No | Requires governed recovery trace for post-commit fulfilment failure. |
| A10 | permit | decision packet + audit packet + replay record | No | Requires residual semantic risk flag; specific packet id not exposed. |
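
The expected bundles in this table follow a regular pattern: every outcome carries a decision packet, an audit packet, and a replay record, and refusals and escalations add one outcome-specific packet. A minimal sketch of that pattern follows, with hypothetical names (`expected_bundle`, `missing_refs`) that are not taken from the SDK export.

```python
# Packets expected for every outcome, per the table above.
COMMON_PACKETS = ("decision packet", "audit packet", "replay record")

# Outcome-specific packet, if any.
OUTCOME_PACKET = {
    "refuse": "refusal packet",
    "escalate": "escalation packet",
    "permit": None,
}


def expected_bundle(outcome: str) -> set[str]:
    """Packet types a reviewer would expect for a given SDK outcome."""
    bundle = set(COMMON_PACKETS)
    extra = OUTCOME_PACKET[outcome]  # KeyError for an undeclared outcome
    if extra is not None:
        bundle.add(extra)
    return bundle


def missing_refs(outcome: str, exposed: list[str]) -> list[str]:
    """Expected packet types with no directly visible reference."""
    return sorted(expected_bundle(outcome) - set(exposed))


# The uploaded Markdown body exposes no per-scenario packet references.
print(missing_refs("escalate", []))
# ['audit packet', 'decision packet', 'escalation packet', 'replay record']
```

Run against an empty list of exposed references, the helper reproduces the "No" column: every expected packet type is still unreferenced in the report body.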

Latency evidence

The source export reports full end-to-end run latency with 56 samples. Most stage-level latency rows remain structural estimates with sample_count=1, and p95/p99 are correctly marked not statistically meaningful.

| Metric | Value |
| --- | --- |
| Full run sample count | 56 |
| Full run min ms | 0.126 |
| Full run mean ms | 4.026 |
| Full run median/p50 ms | 4.175 |
| Full run p95 ms | 7.516 |
| Full run p99 ms | 8.913 |
| Full run max ms | 10.295 |
| Full run budget | 5000 ms |
| Timeout count | 0 |
| Budget passed | True |
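
The reported latency figures can be sanity-checked without access to the 56 raw samples. The sketch below is an illustration under stated assumptions: it treats "budget passed" as meaning that the maximum latency is within the budget and no timeouts occurred, which the source export does not define. The names `REPORTED`, `INSURANCE_LATENCIES_MS`, and `latency_checks` are hypothetical.

```python
# Full-run metrics as reported in the latency table.
REPORTED = {
    "sample_count": 56, "min_ms": 0.126, "mean_ms": 4.026, "p50_ms": 4.175,
    "p95_ms": 7.516, "p99_ms": 8.913, "max_ms": 10.295,
    "budget_ms": 5000, "timeout_count": 0,
}

# Per-scenario latencies for A1..A10 from the scenario table.
INSURANCE_LATENCIES_MS = [7.64, 7.419, 10.295, 7.783, 6.556,
                          6.592, 7.307, 7.332, 7.262, 7.475]


def latency_checks(m: dict, scenario_ms: list[float]) -> dict[str, bool]:
    """Internal-consistency checks on reported summary statistics."""
    return {
        # Order statistics must be non-decreasing.
        "ordered": m["min_ms"] <= m["p50_ms"] <= m["p95_ms"]
                   <= m["p99_ms"] <= m["max_ms"],
        # The mean must lie between the extremes.
        "mean_in_range": m["min_ms"] <= m["mean_ms"] <= m["max_ms"],
        # No individual scenario may exceed the reported full-run maximum.
        "scenarios_within_max": max(scenario_ms) <= m["max_ms"],
        # Assumed meaning of "budget passed".
        "within_budget": m["max_ms"] <= m["budget_ms"] and m["timeout_count"] == 0,
    }


print(latency_checks(REPORTED, INSURANCE_LATENCIES_MS))
```

These checks confirm only that the reported numbers are mutually consistent. They say nothing about production latency, since the run is a local_dev workbench execution.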

Artefact manifest

The uploaded export references a non-truncated artefact bundle. The files below are manifest entries from the source report, not independently opened or verified in this presentation.

| Type | File | Bytes | SHA-256 prefix |
| --- | --- | --- | --- |
| intake | intake.json | 6617 | a867348121ab175c… |
| qualification | qualification.json | 6419 | d486175f208ba863… |
| decomposition | decomposition.json | 45479 | 18d5de591e48ee4e… |
| governed_specification | governed_specification.json | 17100 | 835d99279661bf4a… |
| boundary_draft | boundary_draft.json | 543 | 63f61f9b78a2bf3a… |
| aretaba_report | aretaba_report.json | 10686 | 536b58fe962061dd… |
| gsp | gsp.json | 93260 | ac353a3deee23421… |
| governance_planning | governance_planning.json | 5910 | 4aaf65b495cf214f… |
| gag_object | gag_object.json | 3840 | fe5e3e00a1c14f8a… |
| last_prototype_result | last_prototype_result.json | 15280 | fcdac1b9f5dc3d89… |
| replay_results | replay_results.json | 9537 | 4fa13eaefeb0eb90… |
| active_evidence | active_evidence.json | 19088 | 1ec2f684b57ab241… |
| evidence_exports | evidence_exports.json | 19956 | aee817bde30834d8… |
| assurance | assurance.json | 2049 | cae5989e9e214a29… |
| orchestration | orchestration.json | 6838 | 34d2c34528acd116… |
| usg | usg.json | 11941 | 4fcbc145a13e47df… |
| stress | stress.json | 720921 | 8b204ce191e4243a… |
| artefact_lineage | artefact_lineage.json | 4419 | 2a4ef0352f2e1168… |
| artefact_hashes | artefact_hashes.json | 605 | f6c8516d65ea7669… |
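
A reviewer who does obtain the artefact bundle could verify each manifest entry by comparing the file's byte size and the start of its SHA-256 digest with the listed values. The sketch below uses only Python's standard `hashlib` and `pathlib`. The function name `verify_entry` is hypothetical, and the demonstration file is synthetic: it is not one of the manifest files, which were not opened for this report.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_entry(path: Path, expected_bytes: int, sha256_prefix: str) -> bool:
    """Check a file against a manifest row (byte size and SHA-256 prefix)."""
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    # Manifest prefixes are shown truncated with an ellipsis; strip it.
    prefix = sha256_prefix.rstrip("…").lower()
    return len(data) == expected_bytes and digest.startswith(prefix)


# Synthetic demonstration with a temporary file (not a real bundle artefact).
payload = b'{"demo": true}'
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_bytes(payload)
    listed_prefix = hashlib.sha256(payload).hexdigest()[:16] + "…"
    ok = verify_entry(sample, len(payload), listed_prefix)
    wrong_size = verify_entry(sample, len(payload) + 1, listed_prefix)

print(ok, wrong_size)  # True False
```

A prefix match is weaker than a full-digest match, so a check like this supports integrity review under local_dev hashing only and is not production cryptographic assurance.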

Evidence limitations and publication readiness

| Limitation | Reason |
| --- | --- |
| Draft source export | Source report is marked draft; allowed_use is internal_design_export. |
| Not a production trace | SDK mode is local_dev; structural workbench evidence only. |
| Packet refs not exposed | Markdown body references schemas and artefact files but not per-scenario packet identifiers. |
| Adapters incomplete in rendered report | Adapter contracts referenced by count; visible adapter table is empty. |
| Non-bypass not proven | non_bypass_status is structurally_declared, not topology or production evidenced. |
| Cryptography limited | Integrity status is None; hashing is local_dev_sha256. |
| Boundary alias needs mapping | emergency_accommodation_settlement_instruction must map to t4 payment_rail_acknowledgement. |
| Predicate expressions absent | Report names predicates but does not render executable expressions. |

Defensible conclusion

The uploaded SDK workbench export is sufficient to create a bounded SDK evidence appendix showing that A1 to A10 pressure cases were executed under local development workbench conditions and returned the expected governance outcomes. It is not sufficient, by itself, to claim production validation, independent audit, semantic correctness, non-bypass proof, or deployment readiness. For publication, this focused evidence report should be appended to the mathematical pressure test paper only with the claim boundary above.

Non-claims (summary)

not_production_certification
not_legal_compliance_guarantee
not_semantic_adequacy_guarantee
not_proof_of_non_bypass
not_production_cryptography
not_real_external_commit
not_independent_audit
not_cybersecurity_assurance
not_safety_certification
not_universal_deployment_readiness
structural_local_dev_sdk_workbench_evidence_only
