
Controlling AI with AI. Conative Gating introduces a second model trained with inverted incentives: rewarded for blocking, suspicious by default, and adversarial to the LLM's proposals. Its design draws on metaphors from human constraint.


Conative Gating

1. The Problem

LLMs are trained to be helpful, which makes them systematically violate explicit project constraints. When given rules like "NEVER use TypeScript, use ReScript", LLMs:

  1. Read and acknowledge the constraint

  2. Generate compliant-sounding justification

  3. Violate the constraint anyway

This happens because:

  • Common languages (TypeScript, Python) dominate training data

  • The "helpfulness drive" overrides explicit instructions

  • LLMs lack true "loss aversion" for policy violations

Documentation-based enforcement fails because LLMs "engage with" documentation rather than obey it.

2. The Solution

Conative Gating introduces a second model trained with inverted incentives:

Component                   Role                                          Analogy
--------------------------  --------------------------------------------  ---------------------------------------
LLM                         Task execution (helpful, creative)            Frontal cortex / Direct pathway ("GO")
SLM (small language model)  Policy enforcement (adversarial, suspicious)  Cerebellum / Indirect pathway ("NO-GO")
Policy Oracle               Deterministic rule checking                   Reflex arc (fast, no ML)
Consensus Arbiter           Weighted decision making                      Thalamus (integration)

2.1. Key Innovation

The key innovation is a consensus protocol with asymmetric weighting: the SLM's vote counts 1.5x the LLM's, creating a built-in bias toward inhibition that counters the LLM's tendency toward helpfulness.
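To make the asymmetry concrete, here is a minimal Rust sketch. It assumes each side casts a vote in [0, 1] against the proposal: the LLM's vote is 1 minus its confidence, and the SLM's vote is its violation score. The `Weights` type, the `combined_inhibition` function, and the weighted-average formula are illustrative assumptions, not the arbiter's actual implementation; only the 1.5x SLM weight comes from this README.

```rust
/// Vote weights for the two models. Only the 1.5x SLM weight comes from the
/// README; representing votes this way is an illustrative assumption.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub llm: f64,
    pub slm: f64,
}

impl Default for Weights {
    fn default() -> Self {
        Weights { llm: 1.0, slm: 1.5 }
    }
}

/// Combine both votes into one inhibition score in [0, 1].
///
/// Assumption: the LLM votes against a proposal with (1 - confidence) and the
/// SLM votes against it with its violation score. Each vote is multiplied by
/// its weight and the sum is normalized by the total weight.
pub fn combined_inhibition(llm_confidence: f64, slm_violation: f64, w: Weights) -> f64 {
    let llm_against = 1.0 - llm_confidence.clamp(0.0, 1.0);
    let slm_against = slm_violation.clamp(0.0, 1.0);
    (w.llm * llm_against + w.slm * slm_against) / (w.llm + w.slm)
}
```

With equal weights, a fully confident LLM and a fully alarmed SLM cancel out at 0.5; with the SLM weighted 1.5x, the same disagreement scores 0.6, so ties tip toward inhibition.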

3. Architecture

                     USER REQUEST
                          |
                          v
             +------------------------+
             |   CONTEXT ASSEMBLY     |
             +------------------------+
                          |
           +--------------+--------------+
           |                             |
           v                             v
    +-------------+              +---------------+
    |     LLM     |              |      SLM      |
    | (Proposer)  |              | (Adversarial) |
    +------+------+              +-------+-------+
           |                             |
           +-------------+---------------+
                         |
                         v
             +------------------------+
             |   CONSENSUS ARBITER    |
             | (Modified PBFT)        |
             | SLM weight: 1.5x       |
             +------------------------+
                         |
           +-------------+-------------+
           |             |             |
           v             v             v
       +-------+    +--------+    +-------+
       | ALLOW |    |ESCALATE|    | BLOCK |
       +-------+    +--------+    +-------+

3.1. Three-Tier Evaluation

Policy Oracle (Rust)

Deterministic rule checking: forbidden languages, toolchain rules, and security patterns. Fast, with no ML needed.

SLM Evaluator (Rust + llama.cpp)

Detects "spirit violations": output that is technically compliant but violates the rule's intent, such as excess verbosity or meta-commentary bloat.

Consensus Arbiter (Elixir/OTP)

Modified PBFT (Practical Byzantine Fault Tolerance) with asymmetric weighting. Three outcomes: ALLOW, ESCALATE, BLOCK.

4. Installation

4.1. From Source

git clone https://github.com/hyperpolymath/conative-gating
cd conative-gating
cargo build --release

4.2. Usage

# Scan a directory for policy violations
conative scan ./my-project

# Check a single file
conative check --file src/main.ts

# Check inline content
conative check --content "const x: string = 'hello'"

# Show current policy
conative policy

# Initialize local configuration
conative init

# JSON output for automation
conative scan . --format json

4.3. Exit Codes

Code  Meaning
----  ---------------------------------
0     Compliant: all checks passed
1     Hard violation detected (blocked)
2     Soft concern detected (warning)
3     Error during execution
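For automation, a caller only needs to map these codes to actions. The sketch below assumes the four codes above are the complete contract; `ScanOutcome`, `decode_exit`, `should_fail_build`, and `run_scan` are hypothetical helper names, and letting warnings (code 2) pass a build is a policy assumption, not something the tool mandates.

```rust
use std::process::Command;

/// Outcome of a `conative` run, decoded from the documented exit codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScanOutcome {
    Compliant,            // 0: all checks passed
    HardViolation,        // 1: blocked
    SoftConcern,          // 2: warning
    ToolError,            // 3: error during execution
    Unknown(Option<i32>), // undocumented code, or None if terminated by a signal (Unix)
}

pub fn decode_exit(code: Option<i32>) -> ScanOutcome {
    match code {
        Some(0) => ScanOutcome::Compliant,
        Some(1) => ScanOutcome::HardViolation,
        Some(2) => ScanOutcome::SoftConcern,
        Some(3) => ScanOutcome::ToolError,
        other => ScanOutcome::Unknown(other),
    }
}

/// Example CI policy (an assumption): warnings pass, everything else fails.
pub fn should_fail_build(outcome: ScanOutcome) -> bool {
    !matches!(outcome, ScanOutcome::Compliant | ScanOutcome::SoftConcern)
}

/// Run `conative scan <dir>` and decode its exit status.
/// Requires the `conative` binary on PATH.
pub fn run_scan(dir: &str) -> std::io::Result<ScanOutcome> {
    let status = Command::new("conative").args(["scan", dir]).status()?;
    Ok(decode_exit(status.code()))
}
```

Treating an unknown code as a failure keeps the wrapper fail-closed, which matches the tool's bias toward inhibition.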

5. Default Policy (RSR)

The default policy implements the Rhodium Standard Repository (RSR) language hierarchy:

5.1. Tier 1 - Preferred

  • Rust, Elixir, Zig, Ada, Haskell, ReScript

5.2. Tier 2 - Acceptable (generates warnings)

  • Nickel, Racket

5.3. Forbidden

  • TypeScript, Python*, Go, Java

Note

*Python exception: Python is allowed in salt/ directories (for SaltStack) and in training/ directories (for ML training scripts).
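The deterministic part of this hierarchy fits in a few lines. This sketch classifies a file by extension and applies the Python exception; the extension-to-language table, the `LangVerdict` variants, and the assumption that a `salt` or `training` directory anywhere in the path qualifies for the exception are illustrative, not the Policy Oracle's actual rules.

```rust
use std::path::Path;

/// Verdict for a single source file under the default RSR hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LangVerdict {
    Preferred,          // Tier 1
    Acceptable,         // Tier 2: allowed, but warns
    AllowedByException, // Python inside salt/ or training/
    Forbidden,
    Unclassified,       // extension not in this (partial) table
}

/// Illustrative extension-to-language table; an assumption, not exhaustive.
fn language_of(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "rs" => Some("rust"),
        "ex" | "exs" => Some("elixir"),
        "zig" => Some("zig"),
        "adb" | "ads" => Some("ada"),
        "hs" => Some("haskell"),
        "res" | "resi" => Some("rescript"),
        "ncl" => Some("nickel"),
        "rkt" => Some("racket"),
        "ts" | "tsx" => Some("typescript"),
        "py" => Some("python"),
        "go" => Some("go"),
        "java" => Some("java"),
        _ => None,
    }
}

/// Assumption: a `salt` or `training` directory at any depth qualifies.
fn in_python_exception_dir(path: &Path) -> bool {
    path.parent()
        .map(|dir| {
            dir.components()
                .any(|c| c.as_os_str() == "salt" || c.as_os_str() == "training")
        })
        .unwrap_or(false)
}

pub fn classify(path: &Path) -> LangVerdict {
    match language_of(path) {
        Some("rust" | "elixir" | "zig" | "ada" | "haskell" | "rescript") => LangVerdict::Preferred,
        Some("nickel" | "racket") => LangVerdict::Acceptable,
        Some("python") if in_python_exception_dir(path) => LangVerdict::AllowedByException,
        // Python only reaches this arm when the exception did not apply.
        Some("typescript" | "python" | "go" | "java") => LangVerdict::Forbidden,
        _ => LangVerdict::Unclassified,
    }
}
```

For example, `training/train.py` is allowed by exception, while `scripts/build.py` is forbidden.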

5.4. Toolchain Rules

  • npm requires deno.json (no npm without Deno)
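A sketch of this toolchain rule, assuming a project "uses npm" whenever a `package.json` sits in its root (the real oracle may detect npm usage differently). `npm_requires_deno` is a hypothetical name.

```rust
use std::path::Path;

/// Hypothetical check for "npm requires deno.json".
/// Assumption: package.json in the project root means the project uses npm.
/// Returns a human-readable violation, or None when the rule is satisfied.
pub fn npm_requires_deno(project_root: &Path) -> Option<String> {
    let uses_npm = project_root.join("package.json").is_file();
    let has_deno_config = project_root.join("deno.json").is_file();
    if uses_npm && !has_deno_config {
        Some(format!(
            "{}: package.json found without deno.json (no npm without Deno)",
            project_root.display()
        ))
    } else {
        None
    }
}
```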

5.5. Security Patterns

  • Detects hardcoded secrets (passwords, API keys)
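Because the Policy Oracle's checks are deterministic, a simple line-oriented heuristic illustrates the idea. This sketch flags a line when the key before the first `=` (or, failing that, the first `:`) contains a credential-like word and the value opens a non-empty quoted literal. The keyword list and splitting rule are assumptions; a production scanner would need far more patterns and would still have false positives and misses.

```rust
/// Credential-like key fragments (an illustrative, incomplete list).
const SUSPICIOUS_KEYS: [&str; 6] = ["password", "passwd", "secret", "api_key", "apikey", "api-key"];

/// Return the 1-based numbers of lines that look like hardcoded secrets.
pub fn find_hardcoded_secrets(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| looks_like_secret(line))
        .map(|(i, _)| i + 1)
        .collect()
}

/// Heuristic (an assumption): split at the first '=' (else the first ':');
/// flag when the left side contains a suspicious key fragment and the right
/// side starts with a quote that is not immediately closed.
fn looks_like_secret(line: &str) -> bool {
    let lower = line.to_ascii_lowercase();
    let idx = match lower.find('=').or_else(|| lower.find(':')) {
        Some(i) => i,
        None => return false,
    };
    let key = &lower[..idx];
    // Both separators are one byte, so idx + 1 is a valid char boundary.
    let value = lower[idx + 1..].trim();
    let key_hit = SUSPICIOUS_KEYS.iter().any(|k| key.contains(*k));
    // Skips `password = read_env()`, `password = ""`, and `password == "x"`.
    let mut chars = value.chars();
    let quoted = match (chars.next(), chars.next()) {
        (Some(q @ ('"' | '\'')), Some(c)) => c != q,
        _ => false,
    };
    key_hit && quoted
}
```

For instance, `let password: &str = "hunter2";` and `api_key: 'abc123'` are flagged, while `if password == "x" {}` is not.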

6. Configuration

Initialize local configuration:

conative init

This creates .conative/policy.ncl using Nickel for type-safe configuration:

{
  name = "My Project Policy",
  languages = {
    tier1 = [...],
    forbidden = [...],
    exceptions = [
      { language = "python", allowed_paths = ["scripts/"], reason = "Build scripts" }
    ]
  },
  enforcement = {
    slm_weight = 1.5,
    escalate_threshold = 0.4,
    block_threshold = 0.7,
  }
}
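The `enforcement` record maps naturally onto a typed struct. This sketch mirrors its three fields with the defaults shown above and adds range checks plus an `outcome` method. The `Outcome` enum, the validation rules, and the assumption that a score at or above a threshold triggers that outcome are hypothetical, since the README does not define the comparison semantics.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Allow,
    Escalate,
    Block,
}

/// Mirrors the `enforcement` record in .conative/policy.ncl (field names from the README).
#[derive(Debug, Clone, Copy)]
pub struct Enforcement {
    pub slm_weight: f64,
    pub escalate_threshold: f64,
    pub block_threshold: f64,
}

impl Default for Enforcement {
    fn default() -> Self {
        Enforcement { slm_weight: 1.5, escalate_threshold: 0.4, block_threshold: 0.7 }
    }
}

impl Enforcement {
    /// Reject configurations whose values are out of range or out of order.
    pub fn validate(&self) -> Result<(), String> {
        if !(self.slm_weight > 0.0) {
            return Err("slm_weight must be positive".into());
        }
        let unit = 0.0..=1.0;
        if !unit.contains(&self.escalate_threshold) || !unit.contains(&self.block_threshold) {
            return Err("thresholds must lie in [0, 1]".into());
        }
        if self.escalate_threshold >= self.block_threshold {
            return Err("escalate_threshold must be below block_threshold".into());
        }
        Ok(())
    }

    /// Assumed semantics: a violation score at or above a threshold triggers that outcome.
    pub fn outcome(&self, violation_score: f64) -> Outcome {
        if violation_score >= self.block_threshold {
            Outcome::Block
        } else if violation_score >= self.escalate_threshold {
            Outcome::Escalate
        } else {
            Outcome::Allow
        }
    }
}
```

Here `slm_weight` is only validated; in the arbiter it would scale the SLM's vote before the thresholds are applied.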

7. Decision Matrix

LLM Confidence  SLM Violation Score  Result
--------------  -------------------  --------
High (>0.8)     Low (<0.3)           ALLOW
High (>0.8)     Med (0.3-0.6)        ESCALATE
High (>0.8)     High (>0.6)          BLOCK
Med (0.5-0.8)   Any >0.4             ESCALATE
Low (<0.5)      Any                  ESCALATE
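The matrix can be encoded directly as a function. This sketch reads the bands as the table states them (LLM: High > 0.8, Med 0.5 to 0.8, Low < 0.5; SLM: Low < 0.3, Med 0.3 to 0.6, High > 0.6) and returns `None` for combinations the table leaves unspecified, such as medium LLM confidence with an SLM score of 0.4 or less, rather than guessing. `Decision` and `decide` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Escalate,
    Block,
}

/// Encodes the decision matrix row by row. Returns None for inputs the
/// matrix does not cover, including NaN scores.
pub fn decide(llm_confidence: f64, slm_violation: f64) -> Option<Decision> {
    if llm_confidence.is_nan() || slm_violation.is_nan() {
        return None;
    }
    if llm_confidence > 0.8 {
        // High LLM confidence: the SLM score band decides.
        Some(if slm_violation > 0.6 {
            Decision::Block
        } else if slm_violation >= 0.3 {
            Decision::Escalate
        } else {
            Decision::Allow
        })
    } else if llm_confidence >= 0.5 {
        // Med LLM confidence: only "Any > 0.4" is listed.
        if slm_violation > 0.4 {
            Some(Decision::Escalate)
        } else {
            None
        }
    } else {
        // Low LLM confidence always escalates.
        Some(Decision::Escalate)
    }
}
```

Returning `Option` keeps the table's gaps visible to callers, who can then choose a fail-closed default such as escalation.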

8. Project Structure

conative-gating/
  src/
    main.rs           # CLI application
    oracle/           # Policy Oracle crate (Rust)
    slm/              # SLM Evaluator crate (Rust)
  config/
    policy.ncl        # Default policy (Nickel)
    schema.ncl        # Policy schema
  training/
    compliant/        # Examples that should pass
    violations/       # Examples that should fail
    edge_cases/       # Spirit violations for SLM
  docs/
    ARCHITECTURE.md   # Full design specification
    *.adoc            # Integration documentation

9. Integration

9.1. Claude Code Hook

{
  "hooks": {
    "pre-commit": "conative scan --strict"
  }
}

9.2. Pre-commit Hook

repos:
  - repo: local
    hooks:
      - id: conative-gating
        name: Conative Policy Check
        entry: conative scan
        language: system
        pass_filenames: false

9.3. Programmatic Validation

# Validate structured proposals
conative validate proposal.json --strict

Proposal format:

{
  "id": "uuid",
  "action_type": {"CreateFile": {"path": "src/util.rs"}},
  "content": "file contents here",
  "files_affected": ["src/util.rs"],
  "llm_confidence": 0.95
}
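Before any policy evaluation, a proposal can be checked for structural sanity. This sketch mirrors the JSON shape above as plain Rust types (a real implementation would likely deserialize it with a library such as serde). Only the `CreateFile` action appears in this README, so it is the only variant modeled, and the specific checks in the hypothetical `check_proposal` are assumptions.

```rust
/// Action types; only CreateFile is shown in the README's example.
#[derive(Debug, Clone, PartialEq)]
pub enum ActionType {
    CreateFile { path: String },
}

/// In-memory mirror of the proposal JSON (field names from the README).
#[derive(Debug, Clone)]
pub struct Proposal {
    pub id: String,
    pub action_type: ActionType,
    pub content: String,
    pub files_affected: Vec<String>,
    pub llm_confidence: f64,
}

/// Hypothetical structural checks, run before policy evaluation.
pub fn check_proposal(p: &Proposal) -> Result<(), String> {
    if p.id.trim().is_empty() {
        return Err("id must not be empty".into());
    }
    // Also rejects NaN, since NaN is not contained in any range.
    if !(0.0..=1.0).contains(&p.llm_confidence) {
        return Err(format!("llm_confidence {} is outside [0, 1]", p.llm_confidence));
    }
    match &p.action_type {
        ActionType::CreateFile { path } if !p.files_affected.contains(path) => {
            Err(format!("CreateFile path {path} is missing from files_affected"))
        }
        ActionType::CreateFile { .. } => Ok(()),
    }
}
```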

10. Related Projects

  • NeuroPhone - Neurosymbolic phone AI (integrates Conative Gating)

  • ECHIDNA - Multi-prover orchestration (SLM as another "prover")

  • RSR Framework - Rhodium Standard Repository specifications

  • Axiom.jl - Provable Julia ML (future formal verification)

11. License

SPDX-License-Identifier: AGPL-3.0-or-later

Copyright © 2025 Jonathan D.A. Jewell
