
Troubleshooting

When the framework behaves unexpectedly, the fastest path is usually not “inspect the prompt more.” It is to debug in the same order the runtime itself works.

Most issues fall into one of these buckets:

  • setup and credentials
  • permissions and data access
  • configuration quality
  • tool selection or tool execution
  • delivery, session, or continuity behavior
  • custom extension mistakes

Operational Playbook: Outside In, Runtime First

When a request fails or behaves strangely, ask these questions in order:

  1. Did the request reach the expected entrypoint?
  2. Did it resolve to the expected agent?
  3. Did the runtime call the provider successfully?
  4. Did the model choose the expected capability?
  5. Did the tool execute successfully and under the correct permissions?
  6. Did delivery or session continuity fail after execution succeeded?

This order matters because it avoids blaming the model for failures that happened before the model ever had a chance to behave.

Agent Does Not Respond

Check these first:

  1. Confirm the provider Named Credential and External Credential are active.
  2. Verify the LLMConfiguration__c record is active and points to the expected adapter.
  3. Confirm the AIAgentDefinition__c record is active and bound to that configuration.
  4. Reproduce with debug logs enabled.

If nothing reaches the provider, the issue is almost never prompt quality. It is usually deployment, credentials, activation state, or entrypoint configuration.
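
The activation checks above can be scripted as two quick queries. The object names come from this guide; the IsActive__c flag and the LLMConfiguration__c lookup are illustrative field names, so substitute whatever your org's schema actually uses:

```sql
SELECT Id, Name, IsActive__c
FROM LLMConfiguration__c

SELECT Id, Name, IsActive__c, LLMConfiguration__c
FROM AIAgentDefinition__c
WHERE IsActive__c = true
```

If the agent definition you expected is missing from the second result, the request was never going to reach the provider, no matter what the prompt says.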

Permission or Access Failures

Symptoms usually include missing fields, empty results, or explicit permission-denied errors.

Check:

  • object CRUD for the running user
  • field-level security on queried or updated fields
  • sharing access for the target records
  • whether custom code is using user-mode query and DML patterns

These failures often look like “the AI gave a weak answer,” when the real problem is that the runtime could not legally see or change the data it needed.
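
If custom code bypasses user mode, these gaps stay silent instead of failing loudly. A minimal Apex sketch of the user-mode patterns worth checking for (all three APIs are standard platform Apex; the Case query is just a placeholder):

```apex
// Run queries in user mode so CRUD, FLS, and sharing are enforced,
// surfacing permission gaps as explicit errors instead of silent blanks.
List<Case> cases = [
    SELECT Id, Subject, Status
    FROM Case
    WHERE OwnerId = :UserInfo.getUserId()
    WITH USER_MODE
];

// User-mode DML: throws if the running user lacks edit access
// to the object or to any field being written.
Database.update(cases, AccessLevel.USER_MODE);

// Alternative: strip fields the user cannot read before returning data.
SObjectAccessDecision decision =
    Security.stripInaccessible(AccessType.READABLE, cases);
List<Case> visibleCases = decision.getRecords();
```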

Unexpected Tool Selection

The most common causes are configuration quality, not model instability.

Make sure the capability description says:

  • when to use the tool
  • when not to use it
  • what identifier or input is required
  • examples of correct usage

If the wrong tool is selected repeatedly, inspect the capability design before rewriting prompts. Most tool-choice issues are caused by overlap, vague descriptions, or loose schemas.
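
As a sketch of what a tight description and schema look like, here is an OpenAI-style function definition covering all four points above. The exact shape stored on AgentCapability__c may differ, and get_order_status is a made-up tool:

```json
{
  "name": "get_order_status",
  "description": "Look up the shipping status of ONE existing order. Use when the user asks where an order is or when it will arrive. Do NOT use for placing, changing, or cancelling orders. Requires the order number. Example: 'Where is order ORD-10023?' -> get_order_status({\"orderNumber\": \"ORD-10023\"}).",
  "parameters": {
    "type": "object",
    "properties": {
      "orderNumber": {
        "type": "string",
        "pattern": "^ORD-\\d{5}$",
        "description": "The customer's order number, e.g. ORD-10023."
      }
    },
    "required": ["orderNumber"],
    "additionalProperties": false
  }
}
```

The strict pattern and additionalProperties: false are what make the schema "tight": the model cannot satisfy the call with a vague or partial input.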

Write Actions Behave Unsafely

If updates, emails, or external callouts happen without the expected review:

  1. Check HITLMode__c on the capability.
  2. Confirm the capability is the one the model actually called.
  3. Review the execution steps to see whether confirmation or approval logic was entered.
  4. For ConfirmationThenApproval, verify the capability is not asynchronous and the agent is conversational.

This is why ExecutionStep__c matters so much. It tells you whether the runtime skipped the control, whether the wrong capability fired, or whether the configuration never actually expressed the control you thought it did.
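
Checks 1 and 3 translate directly into queries. HITLMode__c and the object names come from this guide; the capability name, the AgentExecution__c lookup on ExecutionStep__c, and the bind variable are illustrative and assume you run this from Anonymous Apex:

```sql
SELECT Id, Name, HITLMode__c
FROM AgentCapability__c
WHERE Name = 'update_case_priority'

SELECT Id, Name, CreatedDate
FROM ExecutionStep__c
WHERE AgentExecution__c = :executionId
ORDER BY CreatedDate ASC
```

Reading the step rows in order shows whether a confirmation or approval step ever appeared between the tool decision and the tool execution.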

Slow or Failing Executions

Too much context

Reduce HistoryTurnLimit__c, review context provider size, and keep prompts tighter.

Too many capabilities

Narrow the tool set. Broad tool menus make model selection slower and less predictable.

Illegal callout path

Custom code that performs DML before callout can break runtime assumptions. Review custom actions and service-user routing.
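
A minimal Apex sketch of the illegal path and one common fix. The "uncommitted work pending" CalloutException and the Queueable + Database.AllowsCallouts pattern are standard platform behavior; the NotifyJob class and the External_Service named credential are made up for illustration:

```apex
// Anti-pattern: DML before a callout in the same transaction throws
// System.CalloutException ("You have uncommitted work pending ...").
public void logThenNotify(Case c) {
    insert c;             // uncommitted DML ...
    // makeHttpCallout(); // ... so a callout here would throw
}

// Safer pattern: move the callout into an async job, which runs in a
// fresh transaction after the DML has committed.
public class NotifyJob implements Queueable, Database.AllowsCallouts {
    private final Id caseId;
    public NotifyJob(Id caseId) { this.caseId = caseId; }
    public void execute(QueueableContext ctx) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:External_Service/notify'); // Named Credential
        req.setMethod('POST');
        new Http().send(req);
    }
}

// Usage: insert c; System.enqueueJob(new NotifyJob(c.Id));
```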

Async routing issues

Review dispatch settings and whether a heavy tool should be asynchronous instead of inline.

Slow behavior is often cumulative rather than singular. A runtime may become sluggish because it is carrying too much history, exposing too many tools, and doing too much in one transaction at the same time.

Session or Continuity Problems

If the agent appears to forget prior turns:

  • verify MemoryStrategy__c and HistoryTurnLimit__c
  • confirm InteractionSession__c is being reused where expected
  • inspect ExecutionStep__c rows to confirm user and assistant steps are being written

If continuity is broken only on one entry surface, the problem is often in route resolution, message persistence, or caller-supplied session context rather than in the LLM prompt itself.

That distinction matters. A bad answer and a broken session are not the same failure, even if the user experiences both as “the agent forgot.”
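
The first two bullets can be checked with a couple of queries. This sketch assumes MemoryStrategy__c and HistoryTurnLimit__c live on the agent definition and that the message child relationship is named InteractionMessages__r; adjust both to your actual schema:

```sql
SELECT Id, MemoryStrategy__c, HistoryTurnLimit__c
FROM AIAgentDefinition__c
WHERE Id = :agentId

SELECT Id, CreatedDate,
       (SELECT Id, CreatedDate FROM InteractionMessages__r ORDER BY CreatedDate ASC)
FROM InteractionSession__c
WHERE Id = :sessionId
```

If each turn produces a fresh InteractionSession__c row instead of reusing one, the "forgetting" is a session-resolution bug, not a memory-strategy bug.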

Useful Records to Inspect

Record or metadata, and why it helps:

  • AgentExecution__c: status, channel, strategy, and top-level execution state
  • ExecutionStep__c: detailed trace of prompts, tool calls, results, and failures
  • InteractionSession__c: durable continuity anchor across turns
  • InteractionMessage__c: transport-level message history
  • AgentCapability__c: tool description, schema, HITL mode, and exposure

Most Common Root Causes

  • invalid or inactive provider credentials
  • the wrong agent record is being invoked
  • capability descriptions overlap too much
  • schemas are too loose to guide tool input reliably
  • the runtime user lacks CRUD, FLS, or sharing access
  • custom code violates user-mode or callout-safety expectations
  • session identifiers are missing, wrong, or not being reused correctly

Questions That Usually Isolate the Problem

  • Did the agent fail before the provider call, or after the provider returned?
  • Was the wrong tool selected, or did the right tool fail?
  • Is the issue reproducible for one user only, or for all users?
  • Does the problem happen on one channel only, or across chat and API alike?
  • Are execution steps missing, or present but showing an unexpected decision path?

A Practical Debugging Sequence

If you need a repeatable workflow, use this:

  1. Reproduce the issue with the smallest realistic test case.
  2. Inspect AgentExecution__c to confirm the request reached the expected runtime path.
  3. Inspect ExecutionStep__c to see whether the model replied directly, called a tool, or failed mid-turn.
  4. Inspect capability configuration and permissions if the tool path looks wrong.
  5. Inspect session and message records if the issue involves continuity or channel behavior.
  6. Only after that, adjust prompts or model settings if the runtime path itself was correct.
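
Steps 2 and 3 can be wrapped into one Anonymous Apex snippet that prints the runtime path for the most recent execution. Status__c and the AgentExecution__c lookup on ExecutionStep__c are illustrative field names; match them to your org's schema:

```apex
// Pull the most recent execution and walk its step trace in order.
AgentExecution__c exec = [
    SELECT Id, Name, Status__c, CreatedDate
    FROM AgentExecution__c
    ORDER BY CreatedDate DESC
    LIMIT 1
];
System.debug('Execution ' + exec.Id + ' status=' + exec.Status__c);

for (ExecutionStep__c step : [
    SELECT Id, Name, CreatedDate
    FROM ExecutionStep__c
    WHERE AgentExecution__c = :exec.Id
    ORDER BY CreatedDate ASC
]) {
    System.debug(step.CreatedDate + ' ' + step.Name);
}
```

Reading the debug output top to bottom answers the core triage question directly: did the model reply, call a tool, or fail mid-turn?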

Before Opening an Issue

Gather:

  • the agent and capability configuration involved
  • the exact error or unexpected behavior
  • relevant AgentExecution__c and ExecutionStep__c records
  • any custom action or context provider code involved
  • debug log excerpts that show the failure path

The more you can describe the failure as a runtime path rather than a vague symptom, the faster it can be diagnosed.
