Test: Verify line number annotations fix #7
base: master
Conversation
I reviewed this PR for LLM security vulnerabilities and found three high-severity issues. The new multi-turn conversation support creates a prompt injection vector that bypasses guardrails, the Information Agent has overly broad Dropbox access without approval gates, and agent outputs are logged without PII redaction. Together, these create a complete attack chain for unauthorized data access.
Minimum severity threshold for this scan: High
    workflow_id: "wf_68ffb83dbfc88190a38103c2bb9f421003f913035dbdb131",
  },
});
const guardrailsInputtext = workflow.input_as_text;

// Extract text for guardrails check - use last user message
const lastUserMsg = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .pop();
const guardrailsInputtext =
  lastUserMsg &&
  "content" in lastUserMsg &&
🟠 High
The guardrails only check the last user message in multi-turn conversations, allowing attackers to inject malicious instructions in earlier messages or craft fake assistant messages to manipulate agent behavior. This bypasses your jailbreak and PII detection guardrails entirely.
💡 Suggested Fix
Validate all user messages through guardrails, not just the last one:
// Check ALL user messages, not just the last one
const allUserMessages = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .map((m) => {
    if ("content" in m && Array.isArray(m.content) && m.content[0] && "text" in m.content[0]) {
      return m.content[0].text;
    }
    return "";
  })
  .filter((text) => text.length > 0);

// Validate each message
let guardrailsHastripwire = false;
const combinedResults = [];
for (const messageText of allUserMessages) {
  const guardrailsResult = await runGuardrails(
    messageText,
    jailbreakGuardrailConfig,
    context
  );
  combinedResults.push(...(guardrailsResult ?? []));
  if (guardrailsHasTripwire(guardrailsResult)) {
    guardrailsHastripwire = true;
  }
}
const guardrailsOutput = guardrailsHastripwire
  ? buildGuardrailFailOutput(combinedResults)
  : { safe_text: allUserMessages.join(" ") };

Also add validation to prevent fake assistant messages (lines 308-332) by checking message sequence structure and rejecting conversations with consecutive assistant messages.
🤖 AI Agent Prompt
There's a prompt injection vulnerability in the multi-turn conversation implementation at agent.ts:341-351. The guardrail check only examines the last user message, but earlier messages in the conversation array are not validated. This means an attacker can inject malicious instructions in earlier messages or craft fake "assistant" messages that manipulate the agent's behavior.
Investigate the data flow from workflow.messages (line 271) through convertToAgentInputItems() (lines 278-302) to the guardrail check (lines 341-351). The guardrail extraction uses .pop() to get only the last user message, but the full conversationHistory (including unvalidated earlier messages) is passed to agents at lines 378, 401, 438, and 464.
The fix should:
- Apply guardrails to ALL user messages in the array, not just the last one
- Add validation to detect fake assistant messages (check for consecutive assistant messages, ensure first message is from user)
- Consider whether assistant messages should be authenticated via session state to prevent forgery
Look for any existing message validation utilities in the codebase that could be extended for this purpose.
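A minimal sketch of the sequence check described above (the `ChatMessage` shape and function name are illustrative assumptions, not types from this codebase):

```typescript
// Illustrative message shape; the real project presumably uses the SDK's own types.
type ChatMessage = { role: "user" | "assistant" | "system"; content: string };

// Reject histories that could smuggle forged assistant turns:
// the first message must come from the user, and no two assistant
// messages may appear back to back.
function isValidMessageSequence(history: ChatMessage[]): boolean {
  if (history.length === 0) return false;
  if (history[0].role !== "user") return false;
  for (let i = 1; i < history.length; i++) {
    if (history[i].role === "assistant" && history[i - 1].role === "assistant") {
      return false;
    }
  }
  return true;
}
```

A check like this only hardens message *structure*; authenticating assistant turns against server-side session state (as suggested above) is still needed to prevent forgery of individual messages.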
const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: [
    "fetch",
    "fetch_file",
    "get_profile",
    "list_recent_files",
    "search",
    "search_files",
  ],
🟠 High
The Information Agent has unrestricted Dropbox access with no approval gate. Combined with instructions to "check all files first that you can see, use any files available to your access" (lines 235-241), this creates excessive agency that can be exploited via prompt injection to retrieve sensitive files beyond customer service content.
💡 Suggested Fix
Add an approval requirement to prevent unauthorized file access:
const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: [
    "fetch",
    "fetch_file",
    "get_profile",
    "list_recent_files",
    "search",
    "search_files",
  ],
  requireApproval: "always", // Changed from "never"
});

Also revise the agent instructions at lines 235-241 to remove the broad "use any files available" language and explicitly restrict access to customer service documentation only.
🤖 AI Agent Prompt
The Information Agent at agent.ts:232-249 has excessive agency through unrestricted Dropbox access. The MCP tool configuration at lines 65-77 sets requireApproval: "never", and the agent instructions at lines 235-241 explicitly tell it to "check all files first that you can see, use any files available to your access."
This creates a security issue where prompt injection (see the multi-turn conversation vulnerability) can manipulate the agent to search for and retrieve sensitive files beyond customer service content. The tools include search, search_files, fetch, and fetch_file which provide broad read access.
Investigate:
- Whether the hostedMcpTool API supports path-based restrictions (check SDK documentation for allowedPaths or similar parameters)
- Whether approval gates can be granular (per-tool rather than all-or-nothing)
- What files are actually stored in the Dropbox account to understand the exposure scope
- Whether there's a way to sandbox or scope the Dropbox connector to specific directories
The fix should involve:
- Changing requireApproval: "never" to "always" (minimum)
- Adding path restrictions if supported by the SDK
- Revising agent instructions to remove broad access language
- Applying similar restrictions to the Return Agent's MCP tool (lines 51-63) for defense in depth
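If per-tool granularity turns out to be available, one middle ground between "never" and a blanket "always" might look like the following. This is a sketch only: it assumes the OpenAI Agents JS SDK accepts an object form of requireApproval with per-tool toolNames lists, which should be verified against the installed SDK version.

```typescript
import { hostedMcpTool } from "@openai/agents";

// Assumption: the object form of `requireApproval` with per-tool
// `toolNames` lists is supported by the installed SDK version.
const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: ["search", "search_files", "fetch", "fetch_file"],
  requireApproval: {
    never: { toolNames: ["search", "search_files"] }, // metadata-level lookups
    always: { toolNames: ["fetch", "fetch_file"] },   // full file retrieval gated
  },
});
```

This keeps low-risk search operations frictionless while gating the calls that actually exfiltrate file contents.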
console.log(
  `\n🔍 Classification: ${classificationAgentResult.output_parsed.classification}\n`
);
🟠 High
Agent outputs containing customer PII are logged to console without redaction. These logs are typically aggregated to centralized systems accessible by developers, ops teams, and third-party services, creating cross-user data exposure. Similar unredacted logging occurs at lines 414, 449-451, and 475-477.
💡 Suggested Fix
Remove console logging of agent outputs in production, or implement PII redaction:
function redactSensitiveData(text: string): string {
  let redacted = text;
  redacted = redacted.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]');
  redacted = redacted.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');
  redacted = redacted.replace(/\b(customer_id|account_id):\s*\S+/gi, '$1: [REDACTED]');
  return redacted;
}

// Apply to all console.log calls
console.log(`\n🔍 Classification: ${redactSensitiveData(classificationAgentResult.output_parsed.classification)}\n`);

Best practice: Remove console.log statements entirely for production deployments and use a proper logging framework with PII controls for development.
🤖 AI Agent Prompt
Agent outputs are being logged to console without PII redaction at multiple locations in agent.ts:
- Lines 392-394: Classification results
- Line 414: Return Agent output
- Lines 449-451: Retention Agent output
- Lines 475-477: Information Agent output
In a customer service context, these outputs frequently contain PII (names, emails, phone numbers, account IDs). Console logs typically go to centralized log aggregation systems (Datadog, Splunk, CloudWatch, etc.) that are accessible to broader audiences than the customer data itself should be.
Additionally, the Information Agent may include sensitive content retrieved from Dropbox in its outputs, which would also be logged.
Investigate:
- Where do console logs go in this application's deployment (stdout to where?)
- Who has access to the log aggregation system
- Whether there's an existing PII redaction utility in the codebase
- Whether these console.log statements are intended for development only or also production
- What the organization's data protection requirements are for customer PII
The fix should either:
- Remove console.log statements entirely (simplest for production)
- Implement comprehensive PII redaction before logging
- Use a structured logging framework with proper log levels and PII controls
- Ensure log access is appropriately restricted
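One way to combine the redaction option with the structured-logging option is to route every call site through a single wrapper, so redaction cannot be forgotten at any of the four logging locations. The helper below is hypothetical, not something in this codebase:

```typescript
// Hypothetical choke point for all agent-output logging.
// Centralizing redaction here means individual call sites
// cannot accidentally log raw PII.
function redactSensitiveData(text: string): string {
  return text
    .replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, "[EMAIL]")
    .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, "[PHONE]")
    .replace(/\b(customer_id|account_id):\s*\S+/gi, "$1: [REDACTED]");
}

function logAgentOutput(label: string, output: string): void {
  // In production, swap console.log for a structured logger with
  // log levels and access controls.
  console.log(`${label}: ${redactSensitiveData(output)}`);
}

// Example call site replacing a raw console.log:
logAgentOutput("🔍 Classification", "customer_id: C-9981 asked about returns");
```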
This PR contains the same changes as #5 to test that the line number annotation fix is working correctly.
Expected behavior with the new line number annotations:
- The mcp tool should be reported at L51-64 (not L32-44 as in #5 "feat: new agent")
- The mcp1 tool should be reported at L66-79 (not L46-59 as in #5 "feat: new agent")
The fix adds absolute line number annotations (L51:, L52:, etc.) to git diffs so the LLM can directly read them without manual calculation.
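The annotation step can be sketched roughly as follows; the function name and details are illustrative, not the actual implementation in this PR. It reads the new-file start line from a unified-diff hunk header and prefixes each line that exists in the new file with its absolute number:

```typescript
// Prefix each context/added line of a unified-diff hunk with its
// absolute line number in the new file, e.g. "L51: const mcp = ...".
// Removed lines have no new-file number and are left untouched.
function annotateHunk(hunk: string): string {
  const lines = hunk.split("\n");
  // Hunk header: "@@ -oldStart,oldCount +newStart,newCount @@"
  const header = lines[0].match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
  if (!header) return hunk; // not a hunk; pass through unchanged
  let newLineNo = parseInt(header[1], 10);
  const out = [lines[0]];
  for (const line of lines.slice(1)) {
    if (line.startsWith("-")) {
      out.push(line);
    } else {
      out.push(`L${newLineNo}: ${line}`);
      newLineNo++;
    }
  }
  return out.join("\n");
}
```

With absolute numbers embedded, the reviewing LLM can cite L51-64 directly instead of deriving positions from hunk offsets, which is the miscounting failure mode this PR tests for.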
Related: #5