Architecture - General Best Practices

  • Messaging

In the architecture of LivePerson's Conversational AI (CA), it's crucial to leverage various components effectively to ensure optimal performance and functionality. 

This document outlines best practices for various aspects of LivePerson's architecture, facilitating informed decision-making when designing and building solutions.

Leveraging CCS vs SDEs

This section explores two key data types used in LivePerson:

  • CCS (Conversation Context Service): Custom data objects used for storing conversation-level data with real-time availability. Ideal for scenarios requiring custom data structures, real-time access, and conversation-specific storage.

Conversation Context Service (CCS)

Use Cases:

  • Custom data object structure required
  • Real-time data availability for consumption
  • Conversation-level data storing
  • Requires REST CRUD API
  • Data retention required

Considerations:

  • No integration with lpTag
  • Requires server-to-server communication
  • No pre-defined set of attributes, allowing key-value pair storage
  • No out-of-the-box widget in the Agent Workspace
  • Cannot be used for reporting
  • Cannot be used to store PII data (as of August 2022)
  • Limit in row size
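Because CCS has no lpTag integration, writes happen server-to-server through its REST CRUD API. A minimal sketch of storing a conversation-level key-value pair follows; the URL shape, the `saveConversationContext` helper name, and the `maven-api-key` auth header are assumptions for illustration and should be verified against the CCS API reference:

```javascript
// Hypothetical sketch of a server-to-server CCS write.
// URL shape and auth header are assumptions, not the confirmed API contract.
async function saveConversationContext(baseUrl, namespace, conversationId, data, apiKey) {
  const response = await fetch(
    `${baseUrl}/v1/context/${namespace}/${conversationId}/properties`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'maven-api-key': apiKey, // assumed auth header name
      },
      body: JSON.stringify(data), // free-form key-value pairs (mind the row-size limit)
    }
  );
  if (!response.ok) throw new Error(`CCS write failed: ${response.status}`);
  return response.json();
}
```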
  • SDEs (Structured Data Entities): JSON data snippets providing information about customers. They can be:
    • Unauthenticated: Accessible without user login, often used for sales conversations, campaign targeting, and basic agent context.
      • Considerations: Delays in availability, limited data points, requires monitoring session creation.

Unauthenticated SDEs

Use Cases:

  • Sales conversations
  • Campaign audience targeting
  • Sharing PII information
  • Additional reporting metrics
  • Additional context for the human agent, viewable in the Consumer Info widget
  • Context for lightweight Custom Widgets built without a backend service

Considerations:

  • Delays in consumption and reporting availability
  • Pre-defined set of attributes
  • Limited number of attributes
  • Require a monitoring (Shark) session
    • A monitoring session is created automatically only for Web messaging with lpTag. For the other channels, a monitoring session should be explicitly created via the Monitoring API before unauthenticated SDEs can be pushed.
  • Stored on the consumer level; the monitoring session is tied to the visitor
  • Cross-conversation availability
    • Authenticated: Customer data shared from external sources with LivePerson via various methods like JWT or connectors.
      • Considerations: Delays in reporting, limited search and update capabilities, pre-defined attributes, limited data points.

Authenticated SDEs

Use Cases:

  • Sharing customer-related data from the brand’s IdP with the bot or agent:
    • via JWT
    • via Connector API
  • Sharing context from OOtB channel connectors such as:
    • ABC
    • GBM
    • SMS
    • Facebook
    • Instagram
    • Twitter
  • Basic routing (SDE → Skill mapping) via Houston rules
  • Advanced routing via CB or 3rd-party bots

Considerations:

  • Delays in reporting availability
  • Search by authenticated SDE becomes available only after a few hours
  • No easy way to retrieve SDEs without accessing UMS (via an API available only in the MI API)
  • No easy way to update an SDE value if the channel is not custom and was not built with the Connector API
  • Pre-defined set of attributes
  • Limited number of attributes
  • Stored on the consumer level in the UMS (UserProfile)
  • Cross-conversation availability

Checking Agent Availability

This section covers various APIs to check agent availability at different levels:

  • Contact Center Level:
    • Workdays API: Check availability based on configured working hours. Applicable when Custom Recurrence is chosen in the campaign time frame.

For more information, see Workdays API.

2_Availability Check.png
    • Shift Status API: Check if a specific skill group is currently staffed. Works in combination with Working Hours; returns whether each skill in the contact center is currently on shift and when the next shift change occurs.

For more information, see Get Shift Status API.

  • Agent Level:
    • Key Messaging Metrics API: Retrieve metrics like agent status, current load, and assigned conversations.

For more information, see Agent View.

    • Messaging Operations API: Get queue health information like average wait times and available slots.

For more information, see Messaging Operations API.

    • Agent Metrics API: Access agent summary data for specific skills. This API is similar to the Key Messaging Metrics API's Agent View and is considered deprecated.

For more information, see Agent Metrics API.

Skill Level Availability Checks

This section details various APIs for checking agent availability at the skill level:

APIs:

  • Agent Metrics API - Summary: Provides a summary of an agent's availability across all skills they possess. It does not provide individual skill details.

For more information, see Summary.

  • Shift Status API - Get Shift Status by Skill: Checks if a specific skill group is currently staffed based on configured working hours.

For more information, see Get Shift Status by Skill.

  • Works in combination with Working Hours; returns whether a specific skill in the contact center is currently on shift and when the next shift change occurs.
  • Messaging Operations API - Messaging Queue Health & Messaging Current Queue Health: Offer information about queue state metrics like average wait times, which can indirectly indicate agent availability within the skill.

For more information, see Messaging Queue Health.

For more information, see Messaging Current Queue Health.

  • Relevant metrics include:
    • avgWaitTimeForAgentAssignment_NewConversation
    • avgWaitTimeForAgentAssignment_AfterTransfer
    • avgWaitTimeForAgentAssignment_AfterTransferFromBot
    • maxWaitTimeForAgentAssignment
    • waitTimeForAgentAssignment_50thPercentile
    • waitTimeForAgentAssignment_90thPercentile
  • Messaging Operations API - Messaging Skill Segment: (Limited Availability) Provides details about a specific skill segment, potentially including agent availability metrics. However, its availability is currently limited.

For more information, see Messaging Skill Segment.

  • Workdays API (if skill has Working Hours configured): Similar to the Contact Center level Workdays API, but specific to a particular skill and its configured working hours.

Choosing the right API:

  • Use Shift Status API to confirm if a specific skill group has staffed agents during current hours.
  • Use Messaging Operations API queue health metrics as indirect indicators of agent availability within a skill.
  • Agent Metrics API - Summary is helpful for overall agent availability but lacks individual skill details.
  • Use Workdays API when a skill has specific working hours configured.
  • Messaging Operations API - Messaging Skill Segment is not currently recommended due to limited availability.
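As a rough illustration, the shift status and queue health signals above can be combined into a single pre-routing check. This is a sketch only: the `onShift` field and the response shapes are assumptions, though the wait-time metric name is taken from the list above:

```javascript
// Hypothetical availability check combining shift status and queue health.
// Field names are assumptions, not the exact API response shapes.
function isSkillAvailable(shiftStatus, queueHealth, maxWaitSeconds = 120) {
  if (!shiftStatus.onShift) return false; // skill is outside its working hours
  const expectedWait = queueHealth.avgWaitTimeForAgentAssignment_NewConversation;
  // Treat a missing metric as "unknown but staffed" rather than unavailable.
  return expectedWait == null || expectedWait <= maxWaitSeconds;
}
```

A caller would feed this from the Shift Status API and Messaging Queue Health responses and use the boolean to decide between human routing and a fallback flow.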

Note: Remember that these APIs provide information about agent availability within a skill, not necessarily guaranteeing immediate service. Queue wait times and other factors can still influence actual wait times for customers.

Estimated Wait Time

Current Status:

The previously available API for predicting estimated wait time is deprecated. There is currently no alternative API to directly predict how long a customer will wait in the queue.

Alternative Approaches:

  • Queue Health Metrics: While not a direct prediction, APIs like Messaging Operations API - Messaging Queue Health provide metrics like average wait times which can indirectly indicate expected wait times. However, these metrics may not always reflect real-time conditions.
  • Historical Data Analysis: Analyze historical wait time data to identify patterns and trends, potentially helping in estimating future wait times with some level of uncertainty.
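The historical-data approach can be as simple as quoting a high percentile of recent waits. The sketch below is a deliberately naive example; a real model would segment samples by skill, time of day, and staffing levels:

```javascript
// Rough sketch: use a high percentile of recent historical wait samples
// (in seconds) as a cautious estimate of expected wait time.
function estimateWaitSeconds(historicalWaits, percentile = 0.9) {
  if (historicalWaits.length === 0) return null; // no data, no estimate
  const sorted = [...historicalWaits].sort((a, b) => a - b);
  // Nearest-rank percentile: the value below which `percentile` of samples fall.
  const index = Math.min(sorted.length - 1, Math.ceil(sorted.length * percentile) - 1);
  return sorted[index];
}
```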

Remember: These approaches are not substitutes for accurate wait time predictions and should be used with caution.

Retries

This section outlines the retry behavior for various API calls:

API Retries:

  • Monitoring API (Engagement & Report methods):
    • 4xx errors: Do not retry, as these indicate issues requiring code fixes.
    • 5xx errors: Retry 4 times with increasing intervals (3, 10, 30, and 90 seconds) between attempts.
    • 202 (Loading account): Retry 4 times with increasing intervals as above.

Specifically, in the case of a "Loading account" response (500 in API version 1.0, 202 in API version 1.1), retrieve the value of vid from the response body and append it as the vid query parameter on the retry request (issued after a pause of a few seconds).
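The 4xx/5xx policy above can be sketched as a small helper. The delay schedule is injectable, with the 3/10/30/90-second intervals as the default; the vid handling for "Loading account" responses would be layered on top inside `requestFn` (the helper name is ours, not a LivePerson API):

```javascript
// Sketch of the retry policy above: 4xx errors are surfaced immediately
// (they need a code fix), 5xx errors are retried up to 4 times with
// increasing pauses between attempts.
async function callWithRetries(requestFn, delaysMs = [3000, 10000, 30000, 90000]) {
  let lastError;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await requestFn(attempt);
    } catch (err) {
      if (err.status >= 400 && err.status < 500) throw err; // do not retry client errors
      lastError = err;
      if (attempt < delaysMs.length) {
        await new Promise((resolve) => setTimeout(resolve, delaysMs[attempt]));
      }
    }
  }
  throw lastError; // all retries exhausted
}
```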

  • 3rd-Party APIs:
    • 4xx errors: Do not retry, as these indicate issues requiring code fixes.
    • 5xx errors: Retry 3 times with increasing intervals (3, 10, and 30 seconds) between attempts.
    • Consider Delayed Scheduled Retries for critical API calls.

For more information, see Delayed Scheduled Retries.

Connector API Webhooks Retry Mechanisms:

Webhooks offer two retry mechanisms:

  1. Number of retries:
    • Specify the number of attempts for a failed event.
    • Limitations:
      • Does not guarantee event order.
      • Only handles outages up to 150 seconds.
      • Considers events as recoverable units.
  2. Time-to-live:
    • Define the duration a failed event is kept for retry attempts.
    • Benefits:
      • Guarantees event order within conversations.
      • Handles outages up to 3 days.
      • Considers conversations as recoverable units.

Deprecation:

The retry mechanism based on number of retries is deprecated and will be replaced by the time-to-live based mechanism in the future.

Choosing the Right Mechanism:

  • Use the time-to-live based mechanism for guaranteed event order and handling longer outages.
  • Use the number of retries mechanism for simpler retry logic if order is not critical and outage duration is expected to be short.
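The difference between the two mechanisms can be illustrated with two tiny predicates. This is not the actual webhook configuration, only a model of the decision each policy makes about a failed event:

```javascript
// Illustrative comparison of the two retry policies.
// Count-based: keep an event while retry attempts remain (no order guarantee,
// outages up to ~150 seconds).
function shouldRetryByCount(event, maxRetries) {
  return event.attempts < maxRetries;
}

// TTL-based: keep an event while it is younger than the time-to-live
// (order preserved per conversation, outages up to 3 days).
function shouldRetryByTtl(event, ttlMs, now = Date.now()) {
  return now - event.firstFailedAt < ttlMs;
}
```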

Delayed Scheduled Retries

This section outlines the process for implementing delayed scheduled retries for failed API calls:

Components:

  1. Request Storage:
    • Store details of failed requests in a persistent storage mechanism like the Conversation Context Service (CCS). This allows retrieval for retry attempts.
  2. Batch Retry Logic:
    • Implement a FaaS function to handle retry logic in batches. This function:
      • Retrieves failed requests from the storage.
      • Groups requests based on specific criteria (e.g., API endpoint, error code).
      • For each group, retries the requests with appropriate delays between attempts.
  3. Scheduled Process:
    • Set up a FaaS scheduler to periodically trigger the batch retry logic function. This ensures retries occur at defined intervals.
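The batch retry step (component 2 above) can be sketched as follows. Names and data shapes are hypothetical; in a real setup the failed requests would be read from and written back to persistent storage such as CCS, and the function would be invoked by the FaaS scheduler:

```javascript
// Sketch of the scheduled batch-retry logic; all names are hypothetical.
async function retryBatch(failedRequests, sendFn, maxAttempts = 3) {
  const stillFailing = [];
  // Group failed requests by endpoint so similar calls are processed together.
  const groups = new Map();
  for (const request of failedRequests) {
    const group = groups.get(request.endpoint) || [];
    group.push(request);
    groups.set(request.endpoint, group);
  }
  for (const requests of groups.values()) {
    for (const request of requests) {
      try {
        await sendFn(request);
      } catch (err) {
        request.attempts = (request.attempts || 0) + 1;
        // Keep the request for the next scheduled run until attempts run out;
        // a real implementation would dead-letter exhausted requests.
        if (request.attempts < maxAttempts) stillFailing.push(request);
      }
    }
  }
  return stillFailing; // write these back to storage for the next run
}
```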

Benefits:

  • Reduced load: Spreads retries over time, preventing overwhelming downstream services with simultaneous attempts.
  • Improved efficiency: Groups similar requests for efficient batch processing.
  • Centralized management: Provides a single point of control for retry logic and configuration.

Considerations:

  • Define retry policies (number of attempts, delays) based on API requirements and error types.
  • Implement error handling for the retry logic itself.
  • Monitor the retry process and adjust configuration as needed.

Fallbacks

This section explains the concept and various types of fallbacks used in the Messaging Program to prevent interruptions and provide alternatives in different scenarios.

Goals:

  • Ensure continuous operation and avoid service disruptions.
  • Provide alternative options when primary functionalities are unavailable.

Types of Fallbacks:

1. Fallbacks in Routing:

  • Configuration Fallbacks:
    • Account-level: Define routing fallback at the account level through the LP admin interface.
5_Fallbacks in routing.png
    • Skill-level: Define fallback routing for specific skills.

For more information, see here.

    • Agent-level: Define fallback routing for specific agents when transferring conversations.

For more information, see here.

  • Implementation Fallbacks:
    • Availability-based: Trigger fallback when the intended destination (bot/agent) is unavailable. Implemented in:
      • Event-based FaaS functions
      • Conversation Builder bots
      • 3rd-party bots
      • Fallback logic is implemented via API calls: check the relevant metrics and decide whether the conversation needs to fall back.

For more information, see here.

    • Conversation Orchestrator Policies: Fallback triggered when no defined policy matches. Policies are evaluated top-down, and the first successful match exits the process. A fallback policy should be placed last and have a 100% execution rate when other policies fail.
5_Conversation Orchestrator Policies fallback.png
  • Combined Fallbacks:
    • Consumer-idle: Fallback for long wait times in the queue.
    • Bot-idle: Fallback when the bot is unresponsive.
    • Waiting in queue: Fallback to free up agent capacity.
    • These involve configuration (setting up automatic messages and timing) and implementation (FaaS functions, logic detection, transfer logic).

2. Fallbacks in Conversation Flow:

  • NLU confidence score fallback: Trigger fallback for low NLU confidence scores.

For more information, see here.

  • Disambiguation fallback: Trigger fallback for ambiguous NLU results (multiple intents).

For more information, see here.

  • Auto-escalation fallback: Trigger fallback when the customer seems stuck with a bot prompt.

For more information, see here.

  • Stuck conversation resolution: Implement logic to address unresponsive bots.

For more information, see here.

  • Prevent Consumer Interruptions: Address scenarios where a customer sends multiple consecutive messages.

For more information, see here.

Choosing the Right Fallback:

The appropriate fallback type depends on the specific use case and the desired response to the identified issue. Utilizing a combination of different fallback approaches can ensure comprehensive coverage for various scenarios.

Conversation Builder API Integrations

If an API call initiated by a CB bot fails, the situation needs to be handled gracefully so the bot flow does not get stuck. It is recommended to add a separate dialog/interaction to handle API failures.

5.4_Conversation Builder API Integrations.png
5.4.1_Conversation Builder API Integrations.png

It is also possible to handle failures in the Post-Process Code section of the API interaction.

5.4.2_Post-Process Code section.png

FaaS Integrations

It is strongly recommended to design the function in such a way that, if anything inside it fails or errors (API calls, etc.), the function does not stop its execution and the error is gracefully handled.

5.4_FaaS Integrations.png
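A minimal sketch of that pattern follows. The `lambda(input, callback)` shape mirrors the LivePerson Functions template; `fetchCustomerData` is a hypothetical downstream call that stands in for any integration that might fail:

```javascript
// Hypothetical downstream call that may fail; a placeholder for illustration.
async function fetchCustomerData(customerId) {
  throw new Error('downstream service unavailable');
}

// Graceful error handling: the function completes with a safe default
// instead of failing outright when the downstream call errors.
async function lambda(input, callback) {
  let customerData;
  try {
    customerData = await fetchCustomerData(input.payload.customerId);
  } catch (err) {
    // Swallow the error and continue with a fallback value so the
    // conversation flow that invoked the function does not break.
    customerData = { fallback: true };
  }
  callback(null, { customerData });
}
```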

Routing

Terms

Unassigned Skill - The ‘unassigned skill’ is defined as “-1”. All agents have this skill by default, so if a conversation is set to skill “-1”, any agent is a candidate to receive it.

Default Skill - A default skill is a skill that is configured in ‘Houston’ site settings as ‘Default’. It can be overridden by the default skill configuration on the skill level. When a conversation is opened, the UMS executes a ‘Rule Engine’ to select the conversation skill; it can decide that the conversation will have the ‘Default Skill’. A conversation with the ‘Default Skill’ will be routed only to agents that have the ‘Default skill’. When a new agent is created, they have neither the ‘Default skill’ nor any other skill configured by default; it is the agent manager’s/account admin’s responsibility to attach a skill to the agent.

Fallback Skill - A ‘Fallback Skill’ is configured in ‘Houston’ site settings. It can be overridden by the fallback skill configuration on the skill level. When a conversation is opened, UMS attaches a specific skill to it, for example S1. If Routing finds no online or connected agent candidates for skill S1, and a non-empty ‘Fallback Skill’ is configured in ‘Houston’, Routing will redirect the conversation to the fallback skill, provided there are ‘Connected’ agents with that skill, and will notify UMS.

Understanding the Routing Process:

  1. The UMS evaluates routing rules to assign a skill to a conversation.
  2. Routing searches for available and connected agents with the assigned skill.
  3. If no suitable agents are found:
    • If a default skill is configured, the conversation remains unassigned.
    • If a fallback skill is configured and has available agents, the conversation is redirected.
  4. If no default or fallback skills apply, the conversation remains unassigned until an agent becomes available.
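The fallback part of the flow above can be sketched as a simple decision function. The data shapes are assumptions (a map from skill name to connected agents), shown only to make the branching explicit:

```javascript
// Simplified sketch of the fallback-skill decision; agentsBySkill maps a
// skill name to its currently available, connected agents.
function resolveSkill(conversationSkill, agentsBySkill, fallbackSkill) {
  const hasAgents = (skill) => (agentsBySkill[skill] || []).length > 0;
  if (hasAgents(conversationSkill)) return conversationSkill;
  if (fallbackSkill && hasAgents(fallbackSkill)) return fallbackSkill;
  // No candidates anywhere: the conversation stays queued on its skill
  // until an agent becomes available.
  return conversationSkill;
}
```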

Remember:

  • Assigning skills to agents is crucial for proper conversation routing.
  • Fallback skills provide an essential safety net to prevent conversations from going unanswered.

Skill Selection Flow

6.2_Skill Selection Flow.png

Skill Routing Flow

6.3_Skill Routing Flow.png

Skill Routing based on Houston rules

This type of routing is useful when routing needs to be built on a set of rules that depend on SDE values.

Common use cases:

  • Connector-based channels
    • ABC
    • GBM
    • SMS
    • WhatsApp
  • In-App with authentication
  • Web with authentication

Rules are built on top of the values of the following authenticated SDEs:

  • CustomerInfo - CompanyBranch
  • CustomerInfo - CustomerType
  • CustomerInfo - Role
  • CustomerInfo - CustomerStatus

The routing executes each rule one by one based on the Order column. If no rule matches, UMS assigns the Default skill.
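The ordered, first-match-wins evaluation can be sketched as follows. The rule shape (`{ order, conditions, skill }`) is an assumption for this example, not the Houston configuration format:

```javascript
// Illustrative sketch of ordered rule evaluation against SDE values.
function selectSkillByRules(rules, sdeValues, defaultSkill) {
  // Evaluate rules in Order-column order; the first matching rule wins.
  const ordered = [...rules].sort((a, b) => a.order - b.order);
  for (const rule of ordered) {
    const matches = Object.entries(rule.conditions)
      .every(([attribute, expected]) => sdeValues[attribute] === expected);
    if (matches) return rule.skill;
  }
  return defaultSkill; // no rule matched
}
```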

For more information, see here.

6.4_Skill Routing based on Houston rules .png