LLM Linter for Markdown-Based Terraphim KG Schemas

## 🎯 Overview

Design and implement a linter that validates markdown-based Terraphim Knowledge Graph (KG) schemas generated by Large Language Models (LLMs). The linter will build on the existing `terraphim_automata` and graph-embeddings infrastructure so that schemas produced by AI agents are safe, consistent, and valid.

## 📋 Requirements

### Core Functionality
- [ ] **Schema Validation Engine**: Parse and validate markdown frontmatter and content structure
- [ ] **Security Integration**: Validate against repository-specific security policies using terraphim-automata
- [ ] **Type System Validation**: Ensure data type definitions follow terraphim_types conventions
- [ ] **Command Definition Validator**: Validate AI agent command definitions against security policies
- [ ] **Knowledge Graph Consistency**: Validate entity relationships and properties
- [ ] **Performance Optimization**: Fast validation using Aho-Corasick pattern matching

### Integration Requirements
- [ ] **Terraphim-Automata Integration**: Leverage existing fuzzy matching and thesaurus capabilities
- [ ] **Security Model Integration**: Use existing SecurityConfig from terraphim_mcp_server
- [ ] **Graph-Embeddings Support**: Validate semantic consistency with existing knowledge graphs
- [ ] **MCP Server Integration**: Work with existing Model Context Protocol infrastructure

## 🏗️ System Design

### Architecture Components

#### 1. Core Linter Engine (`crates/terraphim_linter/`)
```rust
pub struct KGLinter {
    validation_rules: Vec<Box<dyn ValidationRule>>,
    security_validator: SecurityValidator,
    type_validator: TypeValidator,
    automata_index: AutocompleteIndex,
}

pub trait ValidationRule {
    fn validate(&self, document: &MarkdownDocument) -> Vec<LintError>;
    fn rule_name(&self) -> &'static str;
    fn severity(&self) -> ValidationSeverity;
}
```
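As a rough illustration of how a rule could plug into the `ValidationRule` trait, here is a minimal, standard-library-only sketch. `MarkdownDocument`, the struct form of `LintError`, `RequiredFieldRule`, and `run_rules` are simplified stand-ins invented for this example, not the final types.

```rust
use std::collections::HashMap;

/// Hypothetical minimal document model: parsed frontmatter plus raw body.
pub struct MarkdownDocument {
    pub frontmatter: HashMap<String, String>,
    pub body: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValidationSeverity {
    Warning,
    Error,
}

/// Simplified error type for this sketch (the design uses an enum).
#[derive(Debug, PartialEq, Eq)]
pub struct LintError {
    pub rule: &'static str,
    pub message: String,
}

pub trait ValidationRule {
    fn validate(&self, document: &MarkdownDocument) -> Vec<LintError>;
    fn rule_name(&self) -> &'static str;
    fn severity(&self) -> ValidationSeverity;
}

/// Example rule: a given frontmatter key must be present and non-empty.
pub struct RequiredFieldRule {
    pub field: &'static str,
}

impl ValidationRule for RequiredFieldRule {
    fn validate(&self, document: &MarkdownDocument) -> Vec<LintError> {
        match document.frontmatter.get(self.field) {
            Some(value) if !value.trim().is_empty() => Vec::new(),
            _ => vec![LintError {
                rule: self.rule_name(),
                message: format!("missing required field `{}`", self.field),
            }],
        }
    }

    fn rule_name(&self) -> &'static str {
        "required-field"
    }

    fn severity(&self) -> ValidationSeverity {
        ValidationSeverity::Error
    }
}

/// Runs every rule in order and concatenates the findings.
pub fn run_rules(rules: &[Box<dyn ValidationRule>], doc: &MarkdownDocument) -> Vec<LintError> {
    rules.iter().flat_map(|rule| rule.validate(doc)).collect()
}
```

Keeping rules behind a trait object means a repository can register its own checks without touching the engine.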

#### 2. Security Permission Validator
- Integrates with existing `SecurityConfig` from terraphim_mcp_server
- Uses terraphim-automata for fast command pattern matching
- Supports repository-specific security profiles (`.terraphim/security.json`)
- Implements a learning system for permission adaptation

#### 3. Data Type Definition Validator
- Validates entity types, relationships, and properties
- Ensures consistency with terraphim_types system
- Supports extensible type definitions
- Checks for circular dependencies and invalid hierarchies
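The circular-dependency check can be sketched as a depth-first search over the type hierarchy. The representation below (a map from each type name to its parent type names) and the function name `find_cycle` are assumptions made for illustration; the real validator would work on `terraphim_types` structures.

```rust
use std::collections::HashMap;

/// DFS bookkeeping: a node is either on the current path or fully explored.
enum Mark {
    InProgress,
    Done,
}

/// Returns one cycle as a path of type names (first == last), or `None` if
/// the hierarchy is acyclic. `parents` maps a type to the types it extends.
pub fn find_cycle(parents: &HashMap<String, Vec<String>>) -> Option<Vec<String>> {
    fn visit(
        node: &str,
        parents: &HashMap<String, Vec<String>>,
        marks: &mut HashMap<String, Mark>,
        path: &mut Vec<String>,
    ) -> Option<Vec<String>> {
        match marks.get(node) {
            Some(Mark::Done) => return None,
            Some(Mark::InProgress) => {
                // The node is already on the current path: report the loop.
                let start = path.iter().position(|n| n.as_str() == node).unwrap_or(0);
                let mut cycle = path[start..].to_vec();
                cycle.push(node.to_string());
                return Some(cycle);
            }
            None => {}
        }
        marks.insert(node.to_string(), Mark::InProgress);
        path.push(node.to_string());
        if let Some(next) = parents.get(node) {
            for parent in next {
                if let Some(cycle) = visit(parent, parents, marks, path) {
                    return Some(cycle);
                }
            }
        }
        path.pop();
        marks.insert(node.to_string(), Mark::Done);
        None
    }

    let mut marks = HashMap::new();
    // Sort the start nodes so the reported cycle is deterministic.
    let mut names: Vec<&String> = parents.keys().collect();
    names.sort();
    for name in names {
        let mut path = Vec::new();
        if let Some(cycle) = visit(name, parents, &mut marks, &mut path) {
            return Some(cycle);
        }
    }
    None
}
```

Returning the offending path, rather than a boolean, lets the linter name the types involved in its error message.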

#### 4. Schema Structure Validator
- Frontmatter validation (YAML structure, required fields)
- Markdown content validation (link formats, syntax)
- Relationship validation (symmetry, cardinality, type compatibility)
- Semantic validation using graph embeddings

### Validation Rules Pipeline

#### Phase 1: Frontmatter Validation
- Required fields: `schema_version`, `entity_types`, `security_level`
- Valid schema versions and compatibility checks
- Proper YAML syntax and structure validation
- Type definition completeness and consistency
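A minimal sketch of the required-field check follows. It uses only the standard library and a deliberately naive reading of the frontmatter (top-level `key:` lines between two `---` delimiters); a real implementation would use a YAML parser. The function names are hypothetical.

```rust
/// Required top-level keys named in Phase 1.
const REQUIRED_FIELDS: [&str; 3] = ["schema_version", "entity_types", "security_level"];

/// Returns the lines between a leading `---` line and the next `---` line.
pub fn extract_frontmatter(content: &str) -> Option<String> {
    let mut lines = content.lines();
    if lines.next()?.trim_end() != "---" {
        return None;
    }
    let mut block = Vec::new();
    for line in lines {
        if line.trim_end() == "---" {
            return Some(block.join("\n"));
        }
        block.push(line);
    }
    None // no closing delimiter
}

/// Naive key scan: unindented `key: ...` lines, skipping comments and list items.
pub fn top_level_keys(frontmatter: &str) -> Vec<String> {
    frontmatter
        .lines()
        .filter(|line| {
            !line.starts_with(' ')
                && !line.starts_with('\t')
                && !line.starts_with('#')
                && !line.starts_with('-')
        })
        .filter_map(|line| line.split_once(':'))
        .map(|(key, _)| key.trim().to_string())
        .filter(|key| !key.is_empty())
        .collect()
}

/// Lists the required fields that the document's frontmatter does not define.
pub fn missing_required_fields(content: &str) -> Result<Vec<&'static str>, String> {
    let frontmatter = extract_frontmatter(content)
        .ok_or_else(|| "no frontmatter block found".to_string())?;
    let keys = top_level_keys(&frontmatter);
    Ok(REQUIRED_FIELDS
        .iter()
        .copied()
        .filter(|field| !keys.iter().any(|k| k.as_str() == *field))
        .collect())
}
```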

#### Phase 2: Entity Relationship Validation
- Relationship symmetry and cardinality checks
- Type compatibility validation
- Circular dependency detection
- Referential integrity verification
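Two of these checks, referential integrity and symmetry, can be sketched as follows. The `Relationship` struct and both function names are assumptions for illustration, and which relationship names count as symmetric is left to the caller.

```rust
use std::collections::HashSet;

/// Hypothetical flat representation of one declared relationship.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Relationship {
    pub from: String,
    pub name: String,
    pub to: String,
}

/// Referential integrity: both endpoints must be declared entity types.
pub fn dangling_endpoints<'a>(
    declared: &HashSet<&str>,
    relationships: &'a [Relationship],
) -> Vec<&'a Relationship> {
    relationships
        .iter()
        .filter(|r| !declared.contains(r.from.as_str()) || !declared.contains(r.to.as_str()))
        .collect()
}

/// Symmetry: for a relationship declared symmetric, `a -> b` requires `b -> a`.
/// Returns the reverse edges that are missing, sorted for stable output.
pub fn missing_symmetric_pairs(
    symmetric_name: &str,
    relationships: &[Relationship],
) -> Vec<Relationship> {
    let present: HashSet<(&str, &str)> = relationships
        .iter()
        .filter(|r| r.name == symmetric_name)
        .map(|r| (r.from.as_str(), r.to.as_str()))
        .collect();
    let mut missing: Vec<Relationship> = present
        .iter()
        .filter(|(from, to)| !present.contains(&(*to, *from)))
        .map(|(from, to)| Relationship {
            from: to.to_string(),
            name: symmetric_name.to_string(),
            to: from.to_string(),
        })
        .collect();
    missing.sort_by(|a, b| (&a.from, &a.to).cmp(&(&b.from, &b.to)));
    missing
}
```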

#### Phase 3: Security Permission Validation
- Command whitelist/blacklist validation
- Synonym resolution using terraphim-automata fuzzy matching
- Repository-specific rule validation
- Learning system integration for adaptive permissions
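The allow/deny decision with synonym resolution might look like the sketch below. It uses exact lookups in standard-library collections as a stand-in for `terraphim_automata` matching, whose API is not shown here. `CommandPolicy`, `Permission`, and the "ask" fallback for unknown commands are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum Permission {
    Allow,
    Deny,
    /// Assumed fallback: unknown commands are escalated to the user.
    Ask,
}

/// Hypothetical in-memory policy; a real one would be loaded from the
/// repository's security profile.
pub struct CommandPolicy {
    pub allowed: HashSet<String>,
    pub blocked: HashSet<String>,
    /// Maps a synonym to its canonical command.
    pub synonyms: HashMap<String, String>,
}

impl CommandPolicy {
    /// Resolves a synonym to its canonical form, or returns the input unchanged.
    fn canonical<'a>(&'a self, command: &'a str) -> &'a str {
        self.synonyms
            .get(command)
            .map(String::as_str)
            .unwrap_or(command)
    }

    /// Deny takes precedence over allow; anything unlisted is escalated.
    pub fn check(&self, command: &str) -> Permission {
        let normalized = command.trim().to_lowercase();
        let canonical = self.canonical(&normalized);
        if self.blocked.contains(canonical) {
            Permission::Deny
        } else if self.allowed.contains(canonical) {
            Permission::Allow
        } else {
            Permission::Ask
        }
    }
}
```

Checking the deny list first means a command that appears in both lists is rejected, which is the conservative choice for a security validator.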

#### Phase 4: Semantic Consistency Validation
- Entity similarity validation using graph embeddings
- Relationship strength checking
- Knowledge graph consistency verification
- Conflict detection and resolution suggestions
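Entity similarity over embeddings is commonly measured with cosine similarity. The sketch below assumes each entity already has an `f32` embedding vector; the `near_duplicates` helper and its threshold parameter are hypothetical and only show how likely duplicate entity types could be flagged.

```rust
/// Cosine similarity of two vectors; `None` for mismatched lengths,
/// empty input, or a zero-length vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Returns every pair of entity names whose embeddings are at least
/// `threshold` similar, in declaration order.
pub fn near_duplicates<'a>(
    entities: &[(&'a str, Vec<f32>)],
    threshold: f32,
) -> Vec<(&'a str, &'a str)> {
    let mut pairs = Vec::new();
    for (i, (name_a, vec_a)) in entities.iter().enumerate() {
        for (name_b, vec_b) in &entities[i + 1..] {
            if let Some(score) = cosine_similarity(vec_a, vec_b) {
                if score >= threshold {
                    pairs.push((*name_a, *name_b));
                }
            }
        }
    }
    pairs
}
```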

## 🧠 User Journeys (Leveraging Agent Workflows)

### Journey 1: Schema Creation Workflow
**Based on**: Prompt Chaining Pattern (`examples/agent-workflows/1-prompt-chaining/`)

**Flow**:
1. **Specification Phase** → LLM generates initial schema structure
2. **Design Phase** → Linter validates entity types and relationships
3. **Planning Phase** → Linter checks security permissions and constraints
4. **Implementation Phase** → Linter ensures semantic consistency
5. **Testing Phase** → Linter validates complete schema integrity
6. **Deployment Phase** → Linter certifies schema ready for production

**Value**: Sequential validation ensures each phase builds correctly on previous work, preventing schema corruption and maintaining consistency throughout development.

### Journey 2: Multi-Perspective Schema Review
**Based on**: Parallelization Pattern (`examples/agent-workflows/3-parallelization/`)

**Flow**:
1. **Analytical Perspective** → Structural validation and syntax checking
2. **Creative Perspective** → Relationship innovation and pattern discovery
3. **Practical Perspective** → Usability and implementation feasibility
4. **Consensus Building** → Aggregate validation results and resolve conflicts
5. **Quality Assurance** → Final validation against all perspectives

**Value**: Multiple validation perspectives ensure comprehensive schema review, catching issues that single-perspective validation might miss.

### Journey 3: Intelligent Schema Routing
**Based on**: Routing Pattern (`examples/agent-workflows/2-routing/`)

**Flow**:
1. **Complexity Analysis** → Assess schema complexity and validation requirements
2. **Resource Evaluation** → Determine available validation resources and time constraints
3. **Strategy Selection** → Choose appropriate validation strategy (fast/thorough)
4. **Adaptive Validation** → Adjust validation depth based on context
5. **Optimization** → Suggest improvements based on validation results

**Value**: Intelligent resource allocation ensures efficient validation while maintaining quality standards.
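The strategy-selection step could be as simple as comparing an estimated cost against the available time budget. Everything in this sketch is an assumption for illustration: the `SchemaStats` fields, the cost model of 1 ms per item, and the two-way `Strategy` split.

```rust
/// Hypothetical summary of a schema's size.
pub struct SchemaStats {
    pub entity_types: usize,
    pub relationships: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    /// Structural checks only.
    Fast,
    /// Structural plus semantic checks.
    Thorough,
}

/// Assumed cost model: thorough validation costs about 1 ms per entity
/// type or relationship. A real value would come from benchmarks.
const ASSUMED_MS_PER_ITEM: u64 = 1;

/// Picks thorough validation when its estimated cost fits the budget.
pub fn select_strategy(stats: &SchemaStats, budget_ms: u64) -> Strategy {
    let items = (stats.entity_types + stats.relationships) as u64;
    let estimated_ms = items.saturating_mul(ASSUMED_MS_PER_ITEM);
    if estimated_ms <= budget_ms {
        Strategy::Thorough
    } else {
        Strategy::Fast
    }
}
```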

### Journey 4: Specialized Validation Workers
**Based on**: Orchestrator-Workers Pattern (`examples/agent-workflows/4-orchestrator-workers/`)

**Flow**:
1. **Orchestrator** → Coordinates validation workflow and task distribution
2. **Syntax Worker** → Validates markdown structure and YAML syntax
3. **Security Worker** → Checks permissions and command definitions
4. **Semantic Worker** → Validates relationships and type consistency
5. **Knowledge Integration** → Aggregates results and builds validation report
6. **Quality Assurance** → Final review and certification

**Value**: Specialized workers provide expert validation for different aspects, ensuring thorough and accurate results.
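A minimal sketch of the orchestrator-workers flow using standard-library threads is shown below. The two workers and their checks are placeholders invented for this example; the point is only the shape: fan out, run each worker independently, then aggregate tagged findings in a fixed order.

```rust
use std::thread;

/// Placeholder syntax worker: expects the document to open with frontmatter.
fn check_syntax(content: &str) -> Vec<String> {
    if content.starts_with("---") {
        Vec::new()
    } else {
        vec!["missing frontmatter delimiter".to_string()]
    }
}

/// Placeholder security worker: flags lines containing a blocked command.
fn check_security(content: &str) -> Vec<String> {
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains("rm -rf"))
        .map(|(index, _)| format!("blocked command on line {}", index + 1))
        .collect()
}

/// Runs one worker on its own thread and tags its findings with its name.
fn spawn_worker(
    name: &'static str,
    input: String,
    worker: fn(&str) -> Vec<String>,
) -> thread::JoinHandle<Vec<String>> {
    thread::spawn(move || {
        worker(&input)
            .into_iter()
            .map(|message| format!("[{}] {}", name, message))
            .collect::<Vec<String>>()
    })
}

/// Orchestrator: fans out to the workers, then joins them in spawn order
/// so the aggregated report is deterministic.
pub fn orchestrate(content: &str) -> Vec<String> {
    let handles = vec![
        spawn_worker("syntax", content.to_string(), check_syntax),
        spawn_worker("security", content.to_string(), check_security),
    ];
    handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```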

### Journey 5: Iterative Schema Refinement
**Based on**: Evaluator-Optimizer Pattern (`examples/agent-workflows/5-evaluator-optimizer/`)

**Flow**:
1. **Generate Schema** → Create initial schema structure
2. **Evaluate Quality** → Run comprehensive validation suite
3. **Identify Issues** → Categorize and prioritize validation errors
4. **Optimize Schema** → Apply fixes and improvements
5. **Repeat Loop** → Continue until the quality threshold is met
6. **Final Validation** → Certify schema meets all quality criteria

**Value**: Iterative improvement ensures schemas evolve to high quality standards through continuous validation and refinement.
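The evaluate-and-optimize loop can be sketched generically. Here the evaluator and optimizer are passed in as closures, and `max_rounds` is an assumed safeguard against a loop that never converges. The function name and signature are hypothetical.

```rust
/// Repeatedly evaluates a schema and applies fixes until no issues remain
/// or `max_rounds` optimization passes have run. Returns the final schema
/// and any issues still outstanding.
pub fn refine<F, G>(
    mut schema: String,
    evaluate: F,
    optimize: G,
    max_rounds: usize,
) -> (String, Vec<String>)
where
    F: Fn(&str) -> Vec<String>,
    G: Fn(&str, &[String]) -> String,
{
    for _ in 0..max_rounds {
        let issues = evaluate(&schema);
        if issues.is_empty() {
            return (schema, issues);
        }
        schema = optimize(&schema, &issues);
    }
    // Final validation after the last optimization pass.
    let remaining = evaluate(&schema);
    (schema, remaining)
}
```

Bounding the number of rounds matters when the optimizer is an LLM, since a fix for one issue can introduce another.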

## 🔧 Technical Implementation

### Core Dependencies
- `terraphim_automata`: Fast pattern matching and fuzzy string comparison
- `terraphim_types`: Type system and data structures
- `terraphim_rolegraph`: Knowledge graph validation and consistency
- `terraphim_mcp_server`: Security model integration
- `serde`: Serialization/deserialization of parsed frontmatter and validation reports
- `yaml-rust`: YAML parsing and validation
- `thiserror`: Error handling and reporting

### File Structure
```
crates/terraphim_linter/
├── src/
│   ├── lib.rs              # Main linter interface
│   ├── validation/
│   │   ├── mod.rs        # Validation module exports
│   │   ├── engine.rs      # Core validation engine
│   │   ├── rules.rs       # Built-in validation rules
│   │   ├── security.rs    # Security validation logic
│   │   ├── types.rs       # Type system validation
│   │   └── schema.rs      # Schema structure validation
│   ├── markdown/
│   │   ├── mod.rs        # Markdown parsing module
│   │   ├── parser.rs      # Frontmatter and content parsing
│   │   └── ast.rs         # Abstract syntax tree for markdown
│   └── report/
│       ├── mod.rs        # Reporting module
│       ├── formatter.rs   # Error formatting and display
│       └── exporter.rs    # Multiple export formats
├── tests/
│   ├── integration_tests.rs  # End-to-end validation tests
│   ├── security_tests.rs    # Security validation tests
│   └── schema_tests.rs     # Schema validation tests
└── examples/
    ├── basic_validation.rs   # Simple validation examples
    ├── security_rules.rs     # Security rule examples
    └── complex_schemas.rs   # Complex schema validation
```

### API Design
```rust
// Main linter interface. Signatures only: the bodies are placeholders
// until the engine is implemented.
impl KGLinter {
    /// Builds a linter from the given configuration.
    pub fn new(config: LinterConfig) -> Self { todo!() }
    /// Parses raw markdown (frontmatter plus body) and validates it.
    pub fn validate_schema(&self, content: &str) -> ValidationResult { todo!() }
    /// Validates an already-parsed document.
    pub fn validate_document(&self, doc: &MarkdownDocument) -> ValidationResult { todo!() }
    /// Registers an additional validation rule.
    pub fn add_rule(&mut self, rule: Box<dyn ValidationRule>) { todo!() }
    /// Replaces the active security configuration.
    pub fn configure_security(&mut self, security_config: SecurityConfig) { todo!() }
}

// Validation result structure
pub struct ValidationResult {
    pub is_valid: bool,
    pub errors: Vec<LintError>,
    pub warnings: Vec<LintWarning>,
    pub suggestions: Vec<SchemaSuggestion>,
    pub metrics: ValidationMetrics,
}

// Error and warning types
pub enum LintError {
    SyntaxError { line: usize, message: String },
    SecurityViolation { command: String, level: SecurityLevel },
    TypeMismatch { expected: String, found: String },
    RelationshipError { entity: String, relationship: String, issue: String },
}
```
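To show how these errors could be rendered for a report, here is a small sketch that implements `Display` by hand using only the standard library (the design lists `thiserror`, which would generate equivalent code). Only two of the four variants are reproduced, and `render_report` is a hypothetical helper.

```rust
use std::fmt;

/// Reduced copy of the error enum, limited to variants that need no
/// additional types.
#[derive(Debug, PartialEq, Eq)]
pub enum LintError {
    SyntaxError { line: usize, message: String },
    TypeMismatch { expected: String, found: String },
}

impl fmt::Display for LintError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LintError::SyntaxError { line, message } => {
                write!(f, "line {}: syntax error: {}", line, message)
            }
            LintError::TypeMismatch { expected, found } => {
                write!(f, "type mismatch: expected `{}`, found `{}`", expected, found)
            }
        }
    }
}

impl std::error::Error for LintError {}

/// Renders one line per error, or a success message when there are none.
pub fn render_report(errors: &[LintError]) -> String {
    if errors.is_empty() {
        return "schema is valid".to_string();
    }
    errors
        .iter()
        .map(|error| error.to_string())
        .collect::<Vec<_>>()
        .join("\n")
}
```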

## 📊 Success Metrics

### Performance Targets
- **Validation Speed**: <10ms for typical schema files (leveraging terraphim-automata)
- **Accuracy**: >95% detection of schema issues and security violations
- **Coverage**: Support for all terraphim_types and security configurations
- **Integration**: Seamless integration with existing MCP server infrastructure

### Quality Metrics
- **False Positive Rate**: <5% (minimize unnecessary validation failures)
- **False Negative Rate**: <2% (catch actual schema issues)
- **User Satisfaction**: Reduce schema validation time by 70%
- **Learning Effectiveness**: 70% reduction in repeated security prompts
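The two rate targets can be checked against a labeled test corpus. In this sketch a "positive" is a schema the linter flags, and the `Confusion` struct and `meets_targets` helper are assumptions for illustration. The thresholds are the ones stated above.

```rust
/// Outcome counts from running the linter on a labeled corpus.
pub struct Confusion {
    /// Invalid schemas that were flagged.
    pub true_pos: u32,
    /// Valid schemas that were flagged.
    pub false_pos: u32,
    /// Valid schemas that passed.
    pub true_neg: u32,
    /// Invalid schemas that passed.
    pub false_neg: u32,
}

fn ratio(numerator: u32, denominator: u32) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(f64::from(numerator) / f64::from(denominator))
    }
}

impl Confusion {
    /// FP / (FP + TN): share of valid schemas that were wrongly flagged.
    pub fn false_positive_rate(&self) -> Option<f64> {
        ratio(self.false_pos, self.false_pos + self.true_neg)
    }

    /// FN / (FN + TP): share of invalid schemas that slipped through.
    pub fn false_negative_rate(&self) -> Option<f64> {
        ratio(self.false_neg, self.false_neg + self.true_pos)
    }

    /// True when both rates are defined and under the stated targets
    /// (<5% false positives, <2% false negatives).
    pub fn meets_targets(&self) -> bool {
        match (self.false_positive_rate(), self.false_negative_rate()) {
            (Some(fpr), Some(fnr)) => fpr < 0.05 && fnr < 0.02,
            _ => false,
        }
    }
}
```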

## 🎯 Acceptance Criteria

### Must-Have Features
- [x] **Design Document**: Comprehensive system design with architecture details
- [ ] **Core Engine**: Functional validation engine with rule system
- [ ] **Security Integration**: Full integration with existing SecurityConfig
- [ ] **Automata Integration**: Fast pattern matching using terraphim-automata
- [ ] **Type Validation**: Complete data type definition validation
- [ ] **Test Suite**: Comprehensive test coverage (>90%)
- [ ] **Documentation**: API documentation and usage examples

### Should-Have Features
- [ ] **IDE Integration**: VS Code extension for real-time validation
- [ ] **CLI Tool**: Command-line interface for batch validation
- [ ] **Configuration**: Customizable validation rules and severity levels
- [ ] **Export Formats**: Multiple output formats (JSON, YAML, HTML)

### Could-Have Features
- [ ] **Auto-Fix**: Suggest and apply automatic fixes for common issues
- [ ] **Learning System**: Adapt validation rules based on user feedback
- [ ] **Web Interface**: Browser-based validation tool
- [ ] **API Service**: RESTful validation service for integration

## 🔄 Development Phases

### Phase 1: Foundation (Week 1)
- Implement core validation engine and rule system
- Create markdown parser and AST structure
- Basic schema validation rules
- Unit test framework setup

### Phase 2: Security Integration (Week 2)
- Integrate with existing SecurityConfig
- Implement security validation rules
- Command permission checking
- Security test suite

### Phase 3: Advanced Validation (Week 3)
- Type system validation
- Relationship consistency checking
- Semantic validation using graph embeddings
- Complex schema validation

### Phase 4: Integration & Polish (Week 4)
- Integration with terraphim-automata
- Performance optimization
- Error reporting and formatting
- Documentation and examples

### Phase 5: Testing & Release (Week 5)
- Comprehensive test suite
- Integration tests with existing components
- Performance benchmarking
- Release preparation

## 🤝 Dependencies & Coordination

### Required Components
- **terraphim_automata**: Already implemented with Aho-Corasick and fuzzy matching
- **terraphim_mcp_server**: Security model and command validation infrastructure
- **terraphim_rolegraph**: Knowledge graph structure and validation
- **terraphim_types**: Type system and data structures

### Integration Points
- **MCP Server**: Add linter as validation tool for AI agent workflows
- **Agent Workflows**: Integrate validation into existing workflow patterns
- **Security System**: Extend existing security configuration for schema validation
- **Graph System**: Use existing knowledge graph infrastructure for semantic validation

## 📈 Impact & Benefits

### For LLM Agents
- **Safety**: Prevents generation of invalid or harmful schemas
- **Consistency**: Ensures all schemas follow established patterns
- **Quality**: Improves overall quality of generated knowledge graphs
- **Efficiency**: Reduces validation time and iteration cycles

### For Terraphim Ecosystem
- **Standardization**: Establishes clear validation standards for KG schemas
- **Security**: Extends existing security model to cover schema validation
- **Performance**: Leverages existing automata for fast validation
- **Extensibility**: Rule-based system allows custom validation requirements

### For Users
- **Reliability**: Ensures schemas are valid and consistent
- **Productivity**: Reduces time spent on manual schema validation
- **Learning**: Improves schema quality through iterative feedback
- **Confidence**: Provides assurance in schema correctness

---

This issue represents a strategic enhancement to the Terraphim AI ecosystem, building on existing strengths in automata, security, and graph processing to create a comprehensive validation system specifically designed for LLM-generated markdown schemas.

LLM Linter for Markdown-Based Terraphim KG Schemas #292
