LLM Linter for Markdown-Based Terraphim KG Schemas #292

@AlexMikhalev

🎯 Overview

Design and implement a specialized linter that validates markdown-based Terraphim Knowledge Graph (KG) schemas produced by Large Language Models (LLMs). The linter will build on the existing terraphim-automata and graph-embeddings infrastructure so that AI agents generate safe, consistent, and valid schemas.

📋 Requirements

Core Functionality

  • Schema Validation Engine: Parse and validate markdown frontmatter and content structure
  • Security Integration: Validate against repository-specific security policies using terraphim-automata
  • Type System Validation: Ensure data type definitions follow terraphim_types conventions
  • Command Definition Validator: Validate AI agent command definitions against security policies
  • Knowledge Graph Consistency: Validate entity relationships and properties
  • Performance Optimization: Fast validation using Aho-Corasick pattern matching

Integration Requirements

  • Terraphim-Automata Integration: Leverage existing fuzzy matching and thesaurus capabilities
  • Security Model Integration: Use existing SecurityConfig from terraphim_mcp_server
  • Graph-Embeddings Support: Validate semantic consistency with existing knowledge graphs
  • MCP Server Integration: Work with existing Model Context Protocol infrastructure

🏗️ System Design

Architecture Components

1. Core Linter Engine (crates/terraphim_linter/)

pub struct KGLinter {
    validation_rules: Vec<Box<dyn ValidationRule>>,
    security_validator: SecurityValidator,
    type_validator: TypeValidator,
    automata_index: AutocompleteIndex,
}

pub trait ValidationRule {
    fn validate(&self, document: &MarkdownDocument) -> Vec<LintError>;
    fn rule_name(&self) -> &'static str;
    fn severity(&self) -> ValidationSeverity;
}
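
As an illustration of how a rule might plug into the trait above, here is a minimal sketch. `MarkdownDocument`, `LintError`, and `ValidationSeverity` are named in this issue but not defined, so the versions below are simplified stand-ins, and `EmptyLinkTarget` is a hypothetical rule invented for the example.

```rust
// Simplified stand-ins (assumptions): the issue names these types without defining them.
pub struct MarkdownDocument {
    pub frontmatter: Vec<(String, String)>,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum LintError {
    SyntaxError { line: usize, message: String },
}

#[derive(Debug, PartialEq)]
pub enum ValidationSeverity {
    Error,
    Warning,
}

pub trait ValidationRule {
    fn validate(&self, document: &MarkdownDocument) -> Vec<LintError>;
    fn rule_name(&self) -> &'static str;
    fn severity(&self) -> ValidationSeverity;
}

/// Hypothetical rule: flags markdown links with an empty target, e.g. `[label]()`.
pub struct EmptyLinkTarget;

impl ValidationRule for EmptyLinkTarget {
    fn validate(&self, document: &MarkdownDocument) -> Vec<LintError> {
        document
            .body
            .lines()
            .enumerate()
            .filter(|(_, line)| line.contains("]()"))
            .map(|(index, _)| LintError::SyntaxError {
                line: index + 1, // report 1-based line numbers
                message: "link has an empty target".to_string(),
            })
            .collect()
    }

    fn rule_name(&self) -> &'static str {
        "empty-link-target"
    }

    fn severity(&self) -> ValidationSeverity {
        ValidationSeverity::Warning
    }
}
```

A rule like this could then be registered as a `Box<dyn ValidationRule>` in `validation_rules`.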

2. Security Permission Validator

  • Integrates with existing SecurityConfig from terraphim_mcp_server
  • Uses terraphim-automata for fast command pattern matching
  • Supports repository-specific security profiles (.terraphim/security.json)
  • Implements learning system for permission adaptation

3. Data Type Definition Validator

  • Validates entity types, relationships, and properties
  • Ensures consistency with terraphim_types system
  • Supports extensible type definitions
  • Checks for circular dependencies and invalid hierarchies
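
The circular-dependency check listed above could be sketched as a depth-first search. The `TypeGraph` shape (type name mapped to the types it depends on) is an assumption made for this example, not the layout used by terraphim_types.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical shape: each type name maps to the types it extends or depends on.
type TypeGraph<'a> = BTreeMap<&'a str, Vec<&'a str>>;

/// Explores `node` depth-first; returns the cycle if `node` is reached again
/// while it is still on the current path.
fn visit<'a>(
    node: &'a str,
    graph: &TypeGraph<'a>,
    seen: &mut HashSet<&'a str>,
    path: &mut Vec<&'a str>,
) -> Option<Vec<&'a str>> {
    if let Some(pos) = path.iter().position(|n| *n == node) {
        return Some(path[pos..].to_vec());
    }
    if !seen.insert(node) {
        return None; // already fully explored from another start node
    }
    path.push(node);
    for &next in graph.get(node).into_iter().flatten() {
        if let Some(cycle) = visit(next, graph, seen, path) {
            return Some(cycle);
        }
    }
    path.pop();
    None
}

/// Returns the first dependency cycle found, if any.
fn find_cycle<'a>(graph: &TypeGraph<'a>) -> Option<Vec<&'a str>> {
    let mut seen = HashSet::new();
    let mut path = Vec::new();
    graph
        .keys()
        .find_map(|&start| visit(start, graph, &mut seen, &mut path))
}
```

Returning the cycle itself, rather than a boolean, lets the report name the types involved.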

4. Schema Structure Validator

  • Frontmatter validation (YAML structure, required fields)
  • Markdown content validation (link formats, syntax)
  • Relationship validation (symmetry, cardinality, type compatibility)
  • Semantic validation using graph embeddings

Validation Rules Pipeline

Phase 1: Frontmatter Validation

  • Required fields: schema_version, entity_types, security_level
  • Valid schema versions and compatibility checks
  • Proper YAML syntax and structure validation
  • Type definition completeness and consistency
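
A minimal sketch of the required-field check in this phase, using only the standard library. It assumes frontmatter is delimited by `---` lines and only recognises un-indented `key:` lines; a real implementation would hand the block to a YAML parser. The function names are hypothetical.

```rust
/// Required top-level keys, taken from the Phase 1 list.
const REQUIRED_FIELDS: [&str; 3] = ["schema_version", "entity_types", "security_level"];

/// Returns the text between the leading `---` fences, if both are present.
fn extract_frontmatter(content: &str) -> Option<&str> {
    let rest = content.strip_prefix("---\n")?;
    let end = rest.find("\n---")?;
    Some(&rest[..end])
}

/// Lists required fields that are absent from the frontmatter.
/// Simplification: nested keys and YAML comments are ignored.
fn missing_required_fields(content: &str) -> Vec<&'static str> {
    let frontmatter = extract_frontmatter(content).unwrap_or("");
    let present: Vec<&str> = frontmatter
        .lines()
        .filter(|line| !line.starts_with(' ') && !line.starts_with('#'))
        .filter_map(|line| line.split_once(':').map(|(key, _)| key.trim()))
        .collect();
    REQUIRED_FIELDS
        .iter()
        .copied()
        .filter(|field| !present.contains(field))
        .collect()
}
```

A document without any frontmatter is treated as missing all three fields.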

Phase 2: Entity Relationship Validation

  • Relationship symmetry and cardinality checks
  • Type compatibility validation
  • Circular dependency detection
  • Referential integrity verification
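
To make two of these checks concrete, the sketch below covers referential integrity (every endpoint is a declared entity) and symmetry (a symmetric relationship must be declared in both directions). The `Relationship` struct and the idea of a set of symmetric relationship kinds are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical relationship record: `from --kind--> to`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Relationship {
    from: String,
    kind: String,
    to: String,
}

fn rel(from: &str, kind: &str, to: &str) -> Relationship {
    Relationship { from: from.to_string(), kind: kind.to_string(), to: to.to_string() }
}

/// Reports unknown endpoints and symmetric relationships that lack their inverse.
fn check_relationships(
    entities: &HashSet<&str>,
    symmetric_kinds: &HashSet<&str>,
    rels: &[Relationship],
) -> Vec<String> {
    let declared: HashSet<&Relationship> = rels.iter().collect();
    let mut issues = Vec::new();
    for r in rels {
        // Referential integrity: both endpoints must be declared entities.
        for endpoint in [&r.from, &r.to] {
            if !entities.contains(endpoint.as_str()) {
                issues.push(format!("unknown entity `{}` in `{}`", endpoint, r.kind));
            }
        }
        // Symmetry: a symmetric kind must also be declared in the other direction.
        if symmetric_kinds.contains(r.kind.as_str()) {
            let inverse = rel(&r.to, &r.kind, &r.from);
            if !declared.contains(&inverse) {
                issues.push(format!("missing inverse of `{}` {} `{}`", r.from, r.kind, r.to));
            }
        }
    }
    issues
}
```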

Phase 3: Security Permission Validation

  • Command whitelist/blacklist validation
  • Synonym resolution using terraphim-automata fuzzy matching
  • Repository-specific rule validation
  • Learning system integration for adaptive permissions
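
The allow/deny check with synonym resolution might look roughly like this. Note the swap: an exact-match `HashMap` lookup stands in for terraphim-automata's fuzzy matching, whose API is not shown in this issue. `CommandPolicy`, `Decision`, and the sample commands are all hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    Denied,
    NeedsReview,
}

/// Hypothetical policy: synonyms map onto canonical commands, which are then
/// looked up in the deny list first and the allow list second.
struct CommandPolicy {
    synonyms: HashMap<&'static str, &'static str>,
    allowed: HashSet<&'static str>,
    denied: HashSet<&'static str>,
}

impl CommandPolicy {
    fn check(&self, command: &str) -> Decision {
        let normalized = command.trim().to_lowercase();
        // Exact-match stand-in for fuzzy synonym resolution.
        let canonical: &str = match self.synonyms.get(normalized.as_str()) {
            Some(found) => *found,
            None => normalized.as_str(),
        };
        if self.denied.contains(canonical) {
            Decision::Denied
        } else if self.allowed.contains(canonical) {
            Decision::Allowed
        } else {
            Decision::NeedsReview // unknown commands are escalated, not silently allowed
        }
    }
}

/// Sample policy used only for illustration.
fn example_policy() -> CommandPolicy {
    CommandPolicy {
        synonyms: HashMap::from([("remove", "rm"), ("delete", "rm")]),
        allowed: HashSet::from(["ls", "cat"]),
        denied: HashSet::from(["rm"]),
    }
}
```

Checking the deny list before the allow list means a command that appears in both is rejected, which is the more conservative default.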

Phase 4: Semantic Consistency Validation

  • Entity similarity validation using graph embeddings
  • Relationship strength checking
  • Knowledge graph consistency verification
  • Conflict detection and resolution suggestions
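
One way to read "entity similarity validation" is to flag entity pairs whose embeddings are nearly identical, since they may be duplicates. The sketch below assumes embeddings are plain `f32` vectors and uses cosine similarity; the threshold and function names are illustrative, not taken from the graph-embeddings code.

```rust
/// Cosine similarity of two vectors; `None` if lengths differ or a norm is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Returns entity pairs whose similarity is at or above `threshold`.
fn near_duplicates<'a>(
    entities: &[(&'a str, Vec<f32>)],
    threshold: f32,
) -> Vec<(&'a str, &'a str)> {
    let mut pairs = Vec::new();
    for (i, (name_a, emb_a)) in entities.iter().enumerate() {
        for (name_b, emb_b) in &entities[i + 1..] {
            if cosine_similarity(emb_a, emb_b).map_or(false, |s| s >= threshold) {
                pairs.push((*name_a, *name_b));
            }
        }
    }
    pairs
}
```

The pairwise loop is quadratic in the number of entities, which is acceptable for a sketch but would need an index for large graphs.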

🧠 User Journeys (Leveraging Agent Workflows)

Journey 1: Schema Creation Workflow

Based on: Prompt Chaining Pattern (examples/agent-workflows/1-prompt-chaining/)

Flow:

  1. Specification Phase → LLM generates initial schema structure
  2. Design Phase → Linter validates entity types and relationships
  3. Planning Phase → Linter checks security permissions and constraints
  4. Implementation Phase → Linter ensures semantic consistency
  5. Testing Phase → Linter validates complete schema integrity
  6. Deployment Phase → Linter certifies schema ready for production

Value: Sequential validation ensures each phase builds correctly on previous work, preventing schema corruption and maintaining consistency throughout development.
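
The sequential flow could be sketched as a chain of stages that stops at the first stage reporting problems, so later phases never run on a schema that already failed. The stage names and the two toy checks are assumptions for illustration.

```rust
/// A check takes the schema text and returns the problems it found.
type Check = fn(&str) -> Vec<String>;
type Stage = (&'static str, Check);

/// Runs stages in order; returns the name and problems of the first failing stage.
fn run_chain(schema: &str, stages: &[Stage]) -> Result<(), (&'static str, Vec<String>)> {
    for &(name, check) in stages {
        let problems = check(schema);
        if !problems.is_empty() {
            return Err((name, problems));
        }
    }
    Ok(())
}

// Toy checks standing in for real validators.
fn has_frontmatter(schema: &str) -> Vec<String> {
    if schema.starts_with("---") { vec![] } else { vec!["missing frontmatter".to_string()] }
}

fn declares_security_level(schema: &str) -> Vec<String> {
    if schema.contains("security_level:") { vec![] } else { vec!["missing security_level".to_string()] }
}

fn default_stages() -> Vec<Stage> {
    vec![
        ("structure", has_frontmatter as Check),
        ("security", declares_security_level as Check),
    ]
}
```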

Journey 2: Multi-Perspective Schema Review

Based on: Parallelization Pattern (examples/agent-workflows/3-parallelization/)

Flow:

  1. Analytical Perspective → Structural validation and syntax checking
  2. Creative Perspective → Relationship innovation and pattern discovery
  3. Practical Perspective → Usability and implementation feasibility
  4. Consensus Building → Aggregate validation results and resolve conflicts
  5. Quality Assurance → Final validation against all perspectives

Value: Multiple validation perspectives ensure comprehensive schema review, catching issues that single-perspective validation might miss.
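
A rough sketch of the parallel review: each perspective runs on its own scoped thread and the results are collected in the original order for the consensus step. The perspective checks here are toy placeholders, and the 500-line limit is an arbitrary assumption.

```rust
use std::thread;

type Check = fn(&str) -> Vec<String>;
type Perspective = (&'static str, Check);

/// Runs every perspective concurrently and returns findings per perspective.
fn review_in_parallel(
    schema: &str,
    perspectives: &[Perspective],
) -> Vec<(&'static str, Vec<String>)> {
    thread::scope(|scope| {
        // Spawn all threads first so they actually run in parallel.
        let handles: Vec<_> = perspectives
            .iter()
            .map(|&(name, check)| (name, scope.spawn(move || check(schema))))
            .collect();
        handles
            .into_iter()
            .map(|(name, handle)| (name, handle.join().expect("perspective panicked")))
            .collect()
    })
}

// Toy perspectives standing in for real validators.
fn analytical(schema: &str) -> Vec<String> {
    if schema.starts_with("---") { vec![] } else { vec!["no frontmatter".to_string()] }
}

fn practical(schema: &str) -> Vec<String> {
    if schema.lines().count() > 500 { vec!["schema is very long".to_string()] } else { vec![] }
}

fn default_perspectives() -> Vec<Perspective> {
    vec![("analytical", analytical as Check), ("practical", practical as Check)]
}
```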

Journey 3: Intelligent Schema Routing

Based on: Routing Pattern (examples/agent-workflows/2-routing/)

Flow:

  1. Complexity Analysis → Assess schema complexity and validation requirements
  2. Resource Evaluation → Determine available validation resources and time constraints
  3. Strategy Selection → Choose appropriate validation strategy (fast/thorough)
  4. Adaptive Validation → Adjust validation depth based on context
  5. Optimization → Suggest improvements based on validation results

Value: Intelligent resource allocation ensures efficient validation while maintaining quality standards.
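
Strategy selection might reduce to a small decision function like the one below. The complexity score, the weights, and both thresholds are placeholders chosen for the example; the issue does not specify them.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Strategy {
    Fast,
    Thorough,
}

/// Hypothetical measure of schema size.
struct Complexity {
    entity_types: usize,
    relationships: usize,
}

/// Picks the thorough strategy only when the schema is complex enough to
/// need it and the time budget allows it.
fn select_strategy(complexity: &Complexity, budget: Duration) -> Strategy {
    // Relationships are weighted double on the assumption that they cost more to validate.
    let score = complexity.entity_types + 2 * complexity.relationships;
    if score > 50 && budget >= Duration::from_millis(100) {
        Strategy::Thorough
    } else {
        Strategy::Fast
    }
}
```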

Journey 4: Specialized Validation Workers

Based on: Orchestrator-Workers Pattern (examples/agent-workflows/4-orchestrator-workers/)

Flow:

  1. Orchestrator → Coordinates validation workflow and task distribution
  2. Syntax Worker → Validates markdown structure and YAML syntax
  3. Security Worker → Checks permissions and command definitions
  4. Semantic Worker → Validates relationships and type consistency
  5. Knowledge Integration → Aggregates results and builds validation report
  6. Quality Assurance → Final review and certification

Value: Specialized workers provide expert validation for different aspects, ensuring thorough and accurate results.
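
The orchestrator-workers flow could be sketched with a channel: each worker thread sends its findings to the orchestrator, which aggregates them into one report. The `Worker` enum, the `Finding` struct, and the toy checks (including the `rm -rf` pattern) are assumptions for illustration.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Worker {
    Syntax,
    Security,
    Semantic,
}

#[derive(Debug)]
struct Finding {
    worker: Worker,
    message: String,
}

/// Toy per-worker checks standing in for real validators.
fn run_worker(worker: Worker, schema: &str) -> Vec<String> {
    match worker {
        Worker::Syntax if !schema.starts_with("---") => vec!["missing frontmatter".to_string()],
        Worker::Security if schema.contains("rm -rf") => vec!["dangerous command".to_string()],
        Worker::Semantic if !schema.contains("entity_types") => {
            vec!["no entity types declared".to_string()]
        }
        _ => Vec::new(),
    }
}

/// Orchestrator: fans the schema out to all workers and gathers their findings.
fn orchestrate(schema: &str) -> Vec<Finding> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|scope| {
        for worker in [Worker::Syntax, Worker::Security, Worker::Semantic] {
            let tx = tx.clone();
            scope.spawn(move || {
                for message in run_worker(worker, schema) {
                    tx.send(Finding { worker, message }).expect("orchestrator hung up");
                }
            });
        }
    });
    drop(tx); // close the channel so collecting from `rx` terminates
    let mut findings: Vec<Finding> = rx.into_iter().collect();
    findings.sort_by_key(|finding| finding.worker); // stable report order
    findings
}
```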

Journey 5: Iterative Schema Refinement

Based on: Evaluator-Optimizer Pattern (examples/agent-workflows/5-evaluator-optimizer/)

Flow:

  1. Generate Schema → Create initial schema structure
  2. Evaluate Quality → Run comprehensive validation suite
  3. Identify Issues → Categorize and prioritize validation errors
  4. Optimize Schema → Apply fixes and improvements
  5. Repeat Loop → Continue until quality threshold met
  6. Final Validation → Certify schema meets all quality criteria

Value: Iterative improvement ensures schemas evolve to high quality standards through continuous validation and refinement.
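
The refinement loop can be sketched as evaluate, fix one issue, repeat, with a cap on the number of rounds so it always terminates. Here the optimizer is a toy that appends a placeholder line; in the journey above an LLM would propose the fix. All names are hypothetical.

```rust
const REQUIRED: [&str; 2] = ["schema_version", "security_level"];

/// Evaluator: lists required keys that the schema text does not mention.
fn evaluate(schema: &str) -> Vec<&'static str> {
    REQUIRED
        .iter()
        .copied()
        .filter(|key| !schema.contains(format!("{}:", key).as_str()))
        .collect()
}

/// Optimizer (toy): appends a placeholder line for one missing key.
fn optimize(schema: &mut String, missing_key: &str) {
    schema.push_str(&format!("{}: TODO\n", missing_key));
}

/// Repeats evaluate -> optimize for at most `max_rounds` fixes.
/// Returns the refined schema, or the issues that remain when the cap is hit.
fn refine(mut schema: String, max_rounds: usize) -> Result<String, Vec<&'static str>> {
    for _ in 0..max_rounds {
        let issues = evaluate(&schema);
        match issues.first() {
            None => return Ok(schema),
            Some(issue) => optimize(&mut schema, issue),
        }
    }
    let remaining = evaluate(&schema);
    if remaining.is_empty() { Ok(schema) } else { Err(remaining) }
}
```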

🔧 Technical Implementation

Core Dependencies

  • terraphim_automata: Fast pattern matching and fuzzy string comparison
  • terraphim_types: Type system and data structures
  • terraphim_rolegraph: Knowledge graph validation and consistency
  • terraphim_mcp_server: Security model integration
  • serde: Serialization/deserialization of frontmatter and report data structures
  • yaml-rust: YAML parsing and validation
  • thiserror: Error handling and reporting

File Structure

crates/terraphim_linter/
├── src/
│   ├── lib.rs              # Main linter interface
│   ├── validation/
│   │   ├── mod.rs        # Validation module exports
│   │   ├── engine.rs      # Core validation engine
│   │   ├── rules.rs       # Built-in validation rules
│   │   ├── security.rs    # Security validation logic
│   │   ├── types.rs       # Type system validation
│   │   └── schema.rs      # Schema structure validation
│   ├── markdown/
│   │   ├── mod.rs        # Markdown parsing module
│   │   ├── parser.rs      # Frontmatter and content parsing
│   │   └── ast.rs         # Abstract syntax tree for markdown
│   └── report/
│       ├── mod.rs        # Reporting module
│       ├── formatter.rs   # Error formatting and display
│       └── exporter.rs    # Multiple export formats
├── tests/
│   ├── integration_tests.rs  # End-to-end validation tests
│   ├── security_tests.rs    # Security validation tests
│   └── schema_tests.rs     # Schema validation tests
└── examples/
    ├── basic_validation.rs   # Simple validation examples
    ├── security_rules.rs     # Security rule examples
    └── complex_schemas.rs   # Complex schema validation

API Design

// Main linter interface (signatures only; bodies are left as todo!() placeholders)
impl KGLinter {
    pub fn new(config: LinterConfig) -> Self { todo!() }
    pub fn validate_schema(&self, content: &str) -> ValidationResult { todo!() }
    pub fn validate_document(&self, doc: &MarkdownDocument) -> ValidationResult { todo!() }
    pub fn add_rule(&mut self, rule: Box<dyn ValidationRule>) { todo!() }
    pub fn configure_security(&mut self, security_config: SecurityConfig) { todo!() }
}

// Validation result structure
pub struct ValidationResult {
    pub is_valid: bool,
    pub errors: Vec<LintError>,
    pub warnings: Vec<LintWarning>,
    pub suggestions: Vec<SchemaSuggestion>,
    pub metrics: ValidationMetrics,
}

// Error and warning types
pub enum LintError {
    SyntaxError { line: usize, message: String },
    SecurityViolation { command: String, level: SecurityLevel },
    TypeMismatch { expected: String, found: String },
    RelationshipError { entity: String, relationship: String, issue: String },
}
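
For error reporting, the enum above would need a human-readable form. The sketch below implements `std::fmt::Display` by hand; since thiserror is listed as a dependency, the same messages could presumably be generated with its derive instead. `SecurityLevel` is not defined in this issue, so the two-variant enum here is a stand-in, and the message wording is illustrative.

```rust
use std::fmt;

/// Stand-in (assumption): the real `SecurityLevel` is not shown in this issue.
#[derive(Debug)]
pub enum SecurityLevel {
    Low,
    High,
}

#[derive(Debug)]
pub enum LintError {
    SyntaxError { line: usize, message: String },
    SecurityViolation { command: String, level: SecurityLevel },
    TypeMismatch { expected: String, found: String },
    RelationshipError { entity: String, relationship: String, issue: String },
}

impl fmt::Display for LintError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LintError::SyntaxError { line, message } => {
                write!(f, "line {}: syntax error: {}", line, message)
            }
            LintError::SecurityViolation { command, level } => {
                write!(f, "command `{}` violates the {:?} security level", command, level)
            }
            LintError::TypeMismatch { expected, found } => {
                write!(f, "type mismatch: expected {}, found {}", expected, found)
            }
            LintError::RelationshipError { entity, relationship, issue } => {
                write!(f, "{}.{}: {}", entity, relationship, issue)
            }
        }
    }
}

impl std::error::Error for LintError {}
```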

📊 Success Metrics

Performance Targets

  • Validation Speed: <10ms for typical schema files (leveraging terraphim-automata)
  • Accuracy: >95% detection of schema issues and security violations
  • Coverage: Support for all terraphim_types and security configurations
  • Integration: Seamless integration with existing MCP server infrastructure

Quality Metrics

  • False Positive Rate: <5% (minimize unnecessary validation failures)
  • False Negative Rate: <2% (catch actual schema issues)
  • User Satisfaction: Reduce schema validation time by 70%
  • Learning Effectiveness: 70% reduction in repeated security prompts
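
The two error-rate targets can be checked against a labelled set of schemas. This sketch assumes the usual definitions: false positive rate is FP / (FP + TN) and false negative rate is FN / (FN + TP). The `Confusion` struct and method names are hypothetical.

```rust
/// Outcome counts from running the linter on a labelled set of schemas.
struct Confusion {
    true_pos: u32,
    false_pos: u32,
    true_neg: u32,
    false_neg: u32,
}

fn ratio(numerator: u32, denominator: u32) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(f64::from(numerator) / f64::from(denominator))
    }
}

impl Confusion {
    /// Share of valid schemas that were wrongly flagged.
    fn false_positive_rate(&self) -> Option<f64> {
        ratio(self.false_pos, self.false_pos + self.true_neg)
    }

    /// Share of faulty schemas that slipped through.
    fn false_negative_rate(&self) -> Option<f64> {
        ratio(self.false_neg, self.false_neg + self.true_pos)
    }

    /// True when both rates are defined and within the targets listed above.
    fn meets_targets(&self) -> bool {
        matches!(
            (self.false_positive_rate(), self.false_negative_rate()),
            (Some(fpr), Some(fnr)) if fpr < 0.05 && fnr < 0.02
        )
    }
}
```

Under these definitions, a false negative rate below 2% corresponds to detecting more than 98% of faulty schemas, which is stricter than a 95% detection rate if that figure is read as recall.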

🎯 Acceptance Criteria

Must-Have Features

  • Design Document: Comprehensive system design with architecture details
  • Core Engine: Functional validation engine with rule system
  • Security Integration: Full integration with existing SecurityConfig
  • Automata Integration: Fast pattern matching using terraphim-automata
  • Type Validation: Complete data type definition validation
  • Test Suite: Comprehensive test coverage (>90%)
  • Documentation: API documentation and usage examples

Should-Have Features

  • IDE Integration: VS Code extension for real-time validation
  • CLI Tool: Command-line interface for batch validation
  • Configuration: Customizable validation rules and severity levels
  • Export Formats: Multiple output formats (JSON, YAML, HTML)

Could-Have Features

  • Auto-Fix: Suggest and apply automatic fixes for common issues
  • Learning System: Adapt validation rules based on user feedback
  • Web Interface: Browser-based validation tool
  • API Service: RESTful validation service for integration

🔄 Development Phases

Phase 1: Foundation (Week 1)

  • Implement core validation engine and rule system
  • Create markdown parser and AST structure
  • Basic schema validation rules
  • Unit test framework setup

Phase 2: Security Integration (Week 2)

  • Integrate with existing SecurityConfig
  • Implement security validation rules
  • Command permission checking
  • Security test suite

Phase 3: Advanced Validation (Week 3)

  • Type system validation
  • Relationship consistency checking
  • Semantic validation using graph embeddings
  • Complex schema validation

Phase 4: Integration & Polish (Week 4)

  • Integration with terraphim-automata
  • Performance optimization
  • Error reporting and formatting
  • Documentation and examples

Phase 5: Testing & Release (Week 5)

  • Comprehensive test suite
  • Integration tests with existing components
  • Performance benchmarking
  • Release preparation

🤝 Dependencies & Coordination

Required Components

  • terraphim_automata: Already implemented with Aho-Corasick and fuzzy matching
  • terraphim_mcp_server: Security model and command validation infrastructure
  • terraphim_rolegraph: Knowledge graph structure and validation
  • terraphim_types: Type system and data structures

Integration Points

  • MCP Server: Add linter as validation tool for AI agent workflows
  • Agent Workflows: Integrate validation into existing workflow patterns
  • Security System: Extend existing security configuration for schema validation
  • Graph System: Use existing knowledge graph infrastructure for semantic validation

📈 Impact & Benefits

For LLM Agents

  • Safety: Prevents generation of invalid or harmful schemas
  • Consistency: Ensures all schemas follow established patterns
  • Quality: Improves overall quality of generated knowledge graphs
  • Efficiency: Reduces validation time and iteration cycles

For Terraphim Ecosystem

  • Standardization: Establishes clear validation standards for KG schemas
  • Security: Extends existing security model to cover schema validation
  • Performance: Leverages existing automata for fast validation
  • Extensibility: Rule-based system allows custom validation requirements

For Users

  • Reliability: Ensures schemas are valid and consistent
  • Productivity: Reduces time spent on manual schema validation
  • Learning: Improves schema quality through iterative feedback
  • Confidence: Provides assurance in schema correctness

This issue represents a strategic enhancement to the Terraphim AI ecosystem, building on existing strengths in automata, security, and graph processing to create a comprehensive validation system specifically designed for LLM-generated markdown schemas.

Labels: enhancement (New feature or request)
