
Troubleshooting

When the framework behaves unexpectedly, the fastest path is usually not “inspect the prompt more.” It is to debug in the same order the runtime itself works.

Most issues fall into one of these buckets:

  • setup and credentials
  • permissions and data access
  • configuration quality
  • tool selection or tool execution
  • delivery, session, or continuity behavior
  • custom extension mistakes

Operational Playbook: Outside In, Runtime First

When a request fails or behaves strangely, ask these questions in order:

  1. Did the request reach the expected entrypoint?
  2. Did it resolve to the expected agent?
  3. Did the runtime call the provider successfully?
  4. Did the model choose the expected capability?
  5. Did the tool execute successfully and under the correct permissions?
  6. Did delivery or session continuity fail after execution succeeded?

This order matters because it avoids blaming the model for failures that happened before the model ever had a chance to behave.

Agent Does Not Respond

Check these first:

  1. Confirm the provider Named Credential and External Credential are active.
  2. Verify the LLMConfiguration__c record is active and points to the expected adapter.
  3. Confirm the AIAgentDefinition__c record is active and bound to that configuration.
  4. Reproduce with debug logs enabled.

If nothing reaches the provider, the issue is almost never prompt quality. It is usually deployment, credentials, activation state, or entrypoint configuration.
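
The activation checks above can be scripted as two quick queries. The object names come from this guide; the IsActive__c flag and the LLMConfiguration__c lookup are illustrative field names, so substitute whatever your org's schema actually uses:

```sql
SELECT Id, Name, IsActive__c
FROM LLMConfiguration__c

SELECT Id, Name, IsActive__c, LLMConfiguration__c
FROM AIAgentDefinition__c
WHERE IsActive__c = true
```

If the agent definition you expected is missing from the second result, the request was never going to reach the provider, no matter what the prompt says.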

Permission or Access Failures

Symptoms usually include missing fields, empty results, or explicit permission-denied errors.

Check:

  • object CRUD for the running user
  • field-level security on queried or updated fields
  • sharing access for the target records
  • whether custom code is using user-mode query and DML patterns

These failures often look like “the AI gave a weak answer,” when the real problem is that the runtime could not legally see or change the data it needed.
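
If custom code bypasses user mode, these gaps stay silent instead of failing loudly. A minimal Apex sketch of the user-mode patterns worth checking for (all three APIs are standard platform Apex; the Case query is just a placeholder):

```apex
// Run queries in user mode so CRUD, FLS, and sharing are enforced,
// surfacing permission gaps as explicit errors instead of silent blanks.
List<Case> cases = [
    SELECT Id, Subject, Status
    FROM Case
    WHERE OwnerId = :UserInfo.getUserId()
    WITH USER_MODE
];

// User-mode DML: throws if the running user lacks edit access
// to the object or to any field being written.
Database.update(cases, AccessLevel.USER_MODE);

// Alternative: strip fields the user cannot read before returning data.
SObjectAccessDecision decision =
    Security.stripInaccessible(AccessType.READABLE, cases);
List<Case> visibleCases = decision.getRecords();
```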

Unexpected Tool Selection

The most common causes are configuration quality, not model instability.

Make sure the capability description says:

  • when to use the tool
  • when not to use it
  • what identifier or input is required
  • examples of correct usage

If the wrong tool is selected repeatedly, inspect the capability design before rewriting prompts. Most tool-choice issues are caused by overlap, vague descriptions, or loose schemas.
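
As a sketch of what a tight description and schema look like, here is an OpenAI-style function definition covering all four points above. The exact shape stored on AgentCapability__c may differ, and get_order_status is a made-up tool:

```json
{
  "name": "get_order_status",
  "description": "Look up the shipping status of ONE existing order. Use when the user asks where an order is or when it will arrive. Do NOT use for placing, changing, or cancelling orders. Requires the order number. Example: 'Where is order ORD-10023?' -> get_order_status({\"orderNumber\": \"ORD-10023\"}).",
  "parameters": {
    "type": "object",
    "properties": {
      "orderNumber": {
        "type": "string",
        "pattern": "^ORD-\\d{5}$",
        "description": "The customer's order number, e.g. ORD-10023."
      }
    },
    "required": ["orderNumber"],
    "additionalProperties": false
  }
}
```

The strict pattern and additionalProperties: false are what make the schema "tight": the model cannot satisfy the call with a vague or partial input.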

Write Actions Behave Unsafely

If updates, emails, or external callouts happen without the expected review:

  1. Check HITLMode__c on the capability.
  2. Confirm the capability is the one the model actually called.
  3. Review the execution steps to see whether confirmation or approval logic was entered.
  4. For ConfirmationThenApproval, verify the capability is not asynchronous and the agent is conversational.

This is why ExecutionStep__c matters so much. It tells you whether the runtime skipped the control, whether the wrong capability fired, or whether the configuration never actually expressed the control you thought it did.
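
Checks 1 and 3 translate directly into queries. HITLMode__c and the object names come from this guide; the capability name, the AgentExecution__c lookup on ExecutionStep__c, and the bind variable are illustrative and assume you run this from Anonymous Apex:

```sql
SELECT Id, Name, HITLMode__c
FROM AgentCapability__c
WHERE Name = 'update_case_priority'

SELECT Id, Name, CreatedDate
FROM ExecutionStep__c
WHERE AgentExecution__c = :executionId
ORDER BY CreatedDate ASC
```

Reading the step rows in order shows whether a confirmation or approval step ever appeared between the tool decision and the tool execution.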

Slow or Failing Executions

Too much context

Reduce HistoryTurnLimit__c, review context provider size, and keep prompts tighter.

Too many capabilities

Narrow the tool set. Broad tool menus make model selection slower and less predictable.

Illegal callout path

Custom code that performs DML before callout can break runtime assumptions. Review custom actions and service-user routing.
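
A minimal Apex sketch of the illegal path and one common fix. The "uncommitted work pending" CalloutException and the Queueable + Database.AllowsCallouts pattern are standard platform behavior; the NotifyJob class and the External_Service named credential are made up for illustration:

```apex
// Anti-pattern: DML before a callout in the same transaction throws
// System.CalloutException ("You have uncommitted work pending ...").
public void logThenNotify(Case c) {
    insert c;             // uncommitted DML ...
    // makeHttpCallout(); // ... so a callout here would throw
}

// Safer pattern: move the callout into an async job, which runs in a
// fresh transaction after the DML has committed.
public class NotifyJob implements Queueable, Database.AllowsCallouts {
    private final Id caseId;
    public NotifyJob(Id caseId) { this.caseId = caseId; }
    public void execute(QueueableContext ctx) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:External_Service/notify'); // Named Credential
        req.setMethod('POST');
        new Http().send(req);
    }
}

// Usage: insert c; System.enqueueJob(new NotifyJob(c.Id));
```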

Async routing issues

Review dispatch settings and whether a heavy tool should be asynchronous instead of inline.

Slow behavior is often cumulative rather than singular. A runtime may become sluggish because it is carrying too much history, exposing too many tools, and doing too much in one transaction at the same time.

Session or Continuity Problems

If the agent appears to forget prior turns:

  • verify MemoryStrategy__c and HistoryTurnLimit__c
  • confirm InteractionSession__c is being reused where expected
  • inspect ExecutionStep__c rows to confirm user and assistant steps are being written

If continuity is broken only on one entry surface, the problem is often in route resolution, message persistence, or caller-supplied session context rather than in the LLM prompt itself.

That distinction matters. A bad answer and a broken session are not the same failure, even if the user experiences both as “the agent forgot.”
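
The first two bullets can be checked with a couple of queries. This sketch assumes MemoryStrategy__c and HistoryTurnLimit__c live on the agent definition and that the message child relationship is named InteractionMessages__r; adjust both to your actual schema:

```sql
SELECT Id, MemoryStrategy__c, HistoryTurnLimit__c
FROM AIAgentDefinition__c
WHERE Id = :agentId

SELECT Id, CreatedDate,
       (SELECT Id, CreatedDate FROM InteractionMessages__r ORDER BY CreatedDate ASC)
FROM InteractionSession__c
WHERE Id = :sessionId
```

If each turn produces a fresh InteractionSession__c row instead of reusing one, the "forgetting" is a session-resolution bug, not a memory-strategy bug.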

Useful Records to Inspect

Record or metadata, and why it helps:

  • AgentExecution__c: status, channel, strategy, and top-level execution state
  • ExecutionStep__c: detailed trace of prompts, tool calls, results, and failures
  • InteractionSession__c: durable continuity anchor across turns
  • InteractionMessage__c: transport-level message history
  • AgentCapability__c: tool description, schema, HITL mode, and exposure

Most Common Root Causes

  • invalid or inactive provider credentials
  • the wrong agent record is being invoked
  • capability descriptions overlap too much
  • schemas are too loose to guide tool input reliably
  • the runtime user lacks CRUD, FLS, or sharing access
  • custom code violates user-mode or callout-safety expectations
  • session identifiers are missing, wrong, or not being reused correctly

Questions That Usually Isolate the Problem

  • Did the agent fail before the provider call, or after the provider returned?
  • Was the wrong tool selected, or did the right tool fail?
  • Is the issue reproducible for one user only, or for all users?
  • Does the problem happen on one channel only, or across chat and API alike?
  • Are execution steps missing, or present but showing an unexpected decision path?

A Practical Debugging Sequence

If you need a repeatable workflow, use this:

  1. Reproduce the issue with the smallest realistic test case.
  2. Inspect AgentExecution__c to confirm the request reached the expected runtime path.
  3. Inspect ExecutionStep__c to see whether the model replied directly, called a tool, or failed mid-turn.
  4. Inspect capability configuration and permissions if the tool path looks wrong.
  5. Inspect session and message records if the issue involves continuity or channel behavior.
  6. Only after that, adjust prompts or model settings if the runtime path itself was correct.
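
Steps 2 and 3 can be wrapped into one Anonymous Apex snippet that prints the runtime path for the most recent execution. Status__c and the AgentExecution__c lookup on ExecutionStep__c are illustrative field names; match them to your org's schema:

```apex
// Pull the most recent execution and walk its step trace in order.
AgentExecution__c exec = [
    SELECT Id, Name, Status__c, CreatedDate
    FROM AgentExecution__c
    ORDER BY CreatedDate DESC
    LIMIT 1
];
System.debug('Execution ' + exec.Id + ' status=' + exec.Status__c);

for (ExecutionStep__c step : [
    SELECT Id, Name, CreatedDate
    FROM ExecutionStep__c
    WHERE AgentExecution__c = :exec.Id
    ORDER BY CreatedDate ASC
]) {
    System.debug(step.CreatedDate + ' ' + step.Name);
}
```

Reading the debug output top to bottom answers the core triage question directly: did the model reply, call a tool, or fail mid-turn?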

Before Opening an Issue

Gather:

  • the agent and capability configuration involved
  • the exact error or unexpected behavior
  • relevant AgentExecution__c and ExecutionStep__c records
  • any custom action or context provider code involved
  • debug log excerpts that show the failure path

The more you can describe the failure as a runtime path rather than a vague symptom, the faster it can be diagnosed.
