
Conversation

@danenania
Contributor

This PR contains the same changes as #5 to test that the line number annotation fix is working correctly.

Expected behavior with the new line number annotations:

  • Scanner should now correctly identify line numbers
  • mcp tool should be reported at L51-64 (not L32-44 as in #5)
  • mcp1 tool should be reported at L66-79 (not L46-59 as in #5)

The fix adds absolute line number annotations (L51:, L52:, etc.) to git diffs so the LLM can directly read them without manual calculation.
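A sketch of the annotation step described above (the function name and exact output format are assumptions, not the actual implementation):

```typescript
// Prefix each surviving line of a unified-diff hunk with its absolute
// line number in the new file, so a reader can cite e.g. L51 directly.
function annotateHunk(hunk: string): string {
  let newLineNo = 0;
  return hunk
    .split("\n")
    .map((line) => {
      const header = line.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
      if (header) {
        newLineNo = parseInt(header[1], 10); // start of the new-file range
        return line; // leave hunk headers unannotated
      }
      if (line.startsWith("-")) return line; // removed lines have no new-file number
      return `L${newLineNo++}: ${line}`; // context and "+" lines advance the counter
    })
    .join("\n");
}
```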

Related: #5


@promptfoo-scanner bot left a comment

I reviewed this PR for LLM security vulnerabilities and found three high-severity issues. The new multi-turn conversation support creates a prompt injection vector that bypasses guardrails, the Information Agent has overly broad Dropbox access without approval gates, and agent outputs are logged without PII redaction. Together, these create a complete attack chain for unauthorized data access.

Minimum severity threshold for this scan: High

Comment on lines 341 to +351
  workflow_id: "wf_68ffb83dbfc88190a38103c2bb9f421003f913035dbdb131",
  },
});
- const guardrailsInputtext = workflow.input_as_text;
+ // Extract text for guardrails check - use last user message
+ const lastUserMsg = conversationHistory
+   .filter((m) => "role" in m && m.role === "user")
+   .pop();
+ const guardrailsInputtext =
+   lastUserMsg &&
+   "content" in lastUserMsg &&

🟠 High

The guardrails only check the last user message in multi-turn conversations, allowing attackers to inject malicious instructions in earlier messages or craft fake assistant messages to manipulate agent behavior. This bypasses your jailbreak and PII detection guardrails entirely.

💡 Suggested Fix

Validate all user messages through guardrails, not just the last one:

// Check ALL user messages, not just the last one
const allUserMessages = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .map((m) => {
    if ("content" in m && Array.isArray(m.content) && m.content[0] && "text" in m.content[0]) {
      return m.content[0].text;
    }
    return "";
  })
  .filter((text) => text.length > 0);

// Validate each message
let hasTripwire = false;
let combinedResults = [];

for (const messageText of allUserMessages) {
  const guardrailsResult = await runGuardrails(
    messageText,
    jailbreakGuardrailConfig,
    context
  );
  combinedResults.push(...(guardrailsResult ?? []));

  if (guardrailsHasTripwire(guardrailsResult)) {
    hasTripwire = true;
  }
}

const guardrailsOutput = hasTripwire
  ? buildGuardrailFailOutput(combinedResults)
  : { safe_text: allUserMessages.join(" ") };

Also add validation to prevent fake assistant messages (lines 308-332) by checking message sequence structure and rejecting conversations with consecutive assistant messages.

🤖 AI Agent Prompt

There's a prompt injection vulnerability in the multi-turn conversation implementation at agent.ts:341-351. The guardrail check only examines the last user message, but earlier messages in the conversation array are not validated. This means an attacker can inject malicious instructions in earlier messages or craft fake "assistant" messages that manipulate the agent's behavior.

Investigate the data flow from workflow.messages (line 271) through convertToAgentInputItems() (lines 278-302) to the guardrail check (lines 341-351). The guardrail extraction uses .pop() to get only the last user message, but the full conversationHistory (including unvalidated earlier messages) is passed to agents at lines 378, 401, 438, and 464.

The fix should:

  1. Apply guardrails to ALL user messages in the array, not just the last one
  2. Add validation to detect fake assistant messages (check for consecutive assistant messages, ensure first message is from user)
  3. Consider whether assistant messages should be authenticated via session state to prevent forgery

Look for any existing message validation utilities in the codebase that could be extended for this purpose.
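The structural validation in step 2 could be sketched roughly as follows (the ChatMessage type and helper name are illustrative; the real message shape comes from the SDK):

```typescript
// Reject conversation histories whose turn structure could indicate
// forgery: must start with a user message, and no two assistant
// messages may appear back-to-back.
type ChatMessage = { role: "user" | "assistant"; content: unknown };

function validateMessageSequence(history: ChatMessage[]): boolean {
  if (history.length === 0) return false;
  if (history[0].role !== "user") return false; // first turn must be the user's
  for (let i = 1; i < history.length; i++) {
    // A forged assistant turn injected by the client would typically
    // follow a real assistant turn without an intervening user turn.
    if (history[i].role === "assistant" && history[i - 1].role === "assistant") {
      return false;
    }
  }
  return true;
}
```

This catches only structural forgery; authenticating assistant turns against server-side session state (step 3) is still needed to prevent content tampering.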

Comment on lines +65 to +77

const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: [
    "fetch",
    "fetch_file",
    "get_profile",
    "list_recent_files",
    "search",
    "search_files",
  ],

🟠 High

The Information Agent has unrestricted Dropbox access with no approval gate. Combined with instructions to "check all files first that you can see, use any files available to your access" (lines 235-241), this creates excessive agency that can be exploited via prompt injection to retrieve sensitive files beyond customer service content.

💡 Suggested Fix

Add an approval requirement to prevent unauthorized file access:

const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: [
    "fetch",
    "fetch_file",
    "get_profile",
    "list_recent_files",
    "search",
    "search_files",
  ],
  requireApproval: "always",  // Changed from "never"
});

Also revise the agent instructions at lines 235-241 to remove the broad "use any files available" language and explicitly restrict access to customer service documentation only.

🤖 AI Agent Prompt

The Information Agent at agent.ts:232-249 has excessive agency through unrestricted Dropbox access. The MCP tool configuration at lines 65-77 sets requireApproval: "never", and the agent instructions at lines 235-241 explicitly tell it to "check all files first that you can see, use any files available to your access."

This creates a security issue where prompt injection (see the multi-turn conversation vulnerability) can manipulate the agent to search for and retrieve sensitive files beyond customer service content. The tools include search, search_files, fetch, and fetch_file which provide broad read access.

Investigate:

  1. Whether the hostedMcpTool API supports path-based restrictions (check SDK documentation for allowedPaths or similar parameters)
  2. Whether approval gates can be granular (per-tool rather than all-or-nothing)
  3. What files are actually stored in the Dropbox account to understand the exposure scope
  4. Whether there's a way to sandbox or scope the Dropbox connector to specific directories

The fix should involve:

  • Changing requireApproval: "never" to "always" (minimum)
  • Adding path restrictions if supported by the SDK
  • Revising agent instructions to remove broad access language
  • Consider applying similar restrictions to the Return Agent's MCP tool (lines 51-63) for defense in depth

Comment on lines +392 to +394
console.log(
  `\n🔍 Classification: ${classificationAgentResult.output_parsed.classification}\n`
);

🟠 High

Agent outputs containing customer PII are logged to console without redaction. These logs are typically aggregated to centralized systems accessible by developers, ops teams, and third-party services, creating cross-user data exposure. Similar unredacted logging occurs at lines 414, 449-451, and 475-477.

💡 Suggested Fix

Remove console logging of agent outputs in production, or implement PII redaction:

function redactSensitiveData(text: string): string {
  let redacted = text;
  redacted = redacted.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]');
  redacted = redacted.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');
  redacted = redacted.replace(/\b(customer_id|account_id):\s*\S+/gi, '$1: [REDACTED]');
  return redacted;
}

// Apply to all console.log calls
console.log(`\n🔍 Classification: ${redactSensitiveData(classificationAgentResult.output_parsed.classification)}\n`);

Best practice: Remove console.log statements entirely for production deployments and use a proper logging framework with PII controls for development.

🤖 AI Agent Prompt

Agent outputs are being logged to console without PII redaction at multiple locations in agent.ts:

  • Lines 392-394: Classification results
  • Line 414: Return Agent output
  • Lines 449-451: Retention Agent output
  • Lines 475-477: Information Agent output

In a customer service context, these outputs frequently contain PII (names, emails, phone numbers, account IDs). Console logs typically go to centralized log aggregation systems (Datadog, Splunk, CloudWatch, etc.) that are accessible to broader audiences than the customer data itself should be.

Additionally, the Information Agent may include sensitive content retrieved from Dropbox in its outputs, which would also be logged.

Investigate:

  1. Where do console logs go in this application's deployment (stdout to where?)
  2. Who has access to the log aggregation system
  3. Whether there's an existing PII redaction utility in the codebase
  4. Whether these console.log statements are intended for development only or also production
  5. What the organization's data protection requirements are for customer PII

The fix should either:

  • Remove console.log statements entirely (simplest for production)
  • Implement comprehensive PII redaction before logging
  • Use a structured logging framework with proper log levels and PII controls
  • Ensure log access is appropriately restricted
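As a sketch of the structured-logging option, a thin wrapper that redacts before anything reaches stdout (the wrapper name is illustrative, and the patterns mirror the regexes suggested above; a production system would use a real logging framework and broader PII coverage):

```typescript
// Redact common PII patterns before the text is ever written out.
function redactSensitiveData(text: string): string {
  return text
    .replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, "[EMAIL]")
    .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, "[PHONE]");
}

// Drop-in replacement for raw console.log calls: every message passes
// through redaction and is emitted as a structured JSON line.
function safeLog(level: "info" | "warn" | "error", msg: string): void {
  console.log(JSON.stringify({ level, msg: redactSensitiveData(msg) }));
}
```

Routing all agent-output logging through one such chokepoint makes it far easier to audit than redacting at each of the four call sites individually.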
