Dialogue System

GUS

Rule-based control of the dialogue flow.

Task-based (task-oriented) dialogue. The main goal is slot filling; these are frame-based dialogue systems.

Frame: A kind of knowledge structure representing the kinds of intentions the system can extract from user sentences and consists of a collection of slots, each of which can take a set of possible values. Together this set of frames is sometimes called a domain ontology.

Other

Hand-written finite-state or context-free grammars

Control structure for frame-based dialogue

fill the slot -> perform the relevant action -> continue asking questions to fill the remaining slots

disambiguate => ?

domain classification

Use intent recognition to select the slot-filling template (frame) that is needed.

Dialogue Systems: Rule-based Control Flow

Version organized by GPT-4

In task-based (frame-based) dialogue systems, the main goal is to help the user complete a task, and the core mechanism is slot filling. These systems use a domain-specific knowledge structure, the frame, to represent the kinds of intentions that can be extracted from user sentences.

  • Frame: This is a type of knowledge structure. It represents the kinds of intentions the system can extract from user sentences. A frame consists of a collection of slots, each of which can take a set of possible values.
  • Slots: These are the components of a frame that need to be filled with appropriate values during a dialogue.
  • Values: These are the specific pieces of information that fill the slots in a frame.
  • Domain Ontology: The set of all frames that a system can use is sometimes called a domain ontology.
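A minimal sketch of how a frame with slots and values might be represented in code. The `Frame` class, the `book_flight` frame, and its slot names are hypothetical and only illustrate the definitions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A frame: a named collection of slots, each holding one value or None."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def fill(self, slot: str, value: str) -> None:
        """Store a value in a slot that belongs to this frame."""
        if slot not in self.slots:
            raise KeyError(f"unknown slot: {slot}")
        self.slots[slot] = value

    def missing(self) -> List[str]:
        """Slots that still need a value, in declaration order."""
        return [s for s, v in self.slots.items() if v is None]


# Hypothetical flight-booking frame; the slot names are illustrative only.
flight = Frame("book_flight", {"origin": None, "destination": None, "date": None})
flight.fill("destination", "Boston")
print(flight.missing())  # ['origin', 'date']
```

A domain ontology would then simply be a collection of such frames, for example a dict keyed by frame name.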

The control structure for these frame-based dialogue systems typically involves the following steps:

  1. Ask the user questions to fill the slots in the frame.
  2. Once a slot is filled, perform the relevant action.
  3. Continue asking questions to fill the remaining slots in the frame.
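The steps above can be sketched as a simple loop. This is an assumption-laden illustration, not the GUS implementation: the `QUESTIONS` table and the `run_frame` function are hypothetical, and the "relevant action" (for example a database query) is left to the caller once the frame is complete.

```python
from typing import Callable, Dict, Optional

# Hypothetical question attached to each slot of a flight frame.
QUESTIONS = {
    "origin": "What city are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day would you like to leave?",
}


def run_frame(ask: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Ask about each unfilled slot in turn and return the completed frame."""
    frame: Dict[str, Optional[str]] = {slot: None for slot in QUESTIONS}
    while True:
        missing = [s for s, v in frame.items() if v is None]
        if not missing:
            break
        slot = missing[0]
        answer = ask(QUESTIONS[slot]).strip()
        if answer:  # an empty answer leaves the slot open, so it is asked again
            frame[slot] = answer
    return frame


# Scripted user answers stand in for real input; the empty string forces a re-ask.
scripted = iter(["Seattle", "", "Boston", "Tuesday"])
result = run_frame(lambda question: next(scripted))
print(result)
```

Passing `ask` as a function keeps the loop testable; in an interactive setting it could be the built-in `input`.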

To choose the appropriate frame for a given dialogue, the system might use domain classification or intent recognition.

In some cases, systems may also use hand-written finite-state or context-free grammars to help manage the dialogue flow.

Finally, in cases where the user’s input could be interpreted in multiple ways (i.e., it is ambiguous), the system may need to ask additional questions to disambiguate the user’s intent.

Overall, rule-based dialogue systems use a structured approach to manage the flow of conversation, with the goal of effectively extracting information from user utterances and performing relevant actions.

The Dialogue-State Architecture

Also called the __dialogue-state__ or belief-state architecture.

  • NLU
  • Dialogue State Tracker (DST)
  • Dialogue Policy
  • NLG

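In short, the NLU component extracts slot fillers from the user's utterance.

The __Dialogue state tracker (DST)__ maintains the current state of the dialogue.

The __Dialogue policy__ decides what the system should do next.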
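To make the division of labour between the four components concrete, here is a toy single-turn pipeline. Everything in it is a stand-in chosen for illustration: the regex patterns, the `request:`/`confirm` act names, and the templates are assumptions, whereas a real system would typically use learned models for each stage.

```python
import re
from typing import Dict, Optional, Tuple

State = Dict[str, Optional[str]]


def nlu(utterance: str) -> Dict[str, str]:
    """Toy NLU: extract slot fillers with hand-written patterns (capitalized city names)."""
    found = {}
    m = re.search(r"\bto ([A-Z][a-z]+)", utterance)
    if m:
        found["destination"] = m.group(1)
    m = re.search(r"\bfrom ([A-Z][a-z]+)", utterance)
    if m:
        found["origin"] = m.group(1)
    return found


def track(state: State, fillers: Dict[str, str]) -> State:
    """Toy DST: merge the new fillers into a copy of the current state."""
    return {**state, **fillers}


def policy(state: State) -> str:
    """Toy policy: request the first empty slot, otherwise confirm."""
    for slot, value in state.items():
        if value is None:
            return f"request:{slot}"
    return "confirm"


def nlg(act: str, state: State) -> str:
    """Toy NLG: one pre-written template per system act."""
    if act.startswith("request:"):
        return f"What is your {act.split(':', 1)[1]}?"
    return f"Booking a flight from {state['origin']} to {state['destination']}."


def turn(state: State, utterance: str) -> Tuple[State, str]:
    """Run one user turn through NLU -> DST -> policy -> NLG."""
    state = track(state, nlu(utterance))
    return state, nlg(policy(state), state)


state, reply = turn({"origin": None, "destination": None}, "I want to fly to Boston")
print(reply)  # What is your origin?
state, reply2 = turn(state, "from Seattle")
print(reply2)  # Booking a flight from Seattle to Boston.
```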

In GUS, the sentences that the generator produced were all from pre-written templates. But a more sophisticated generation component can condition on the exact context to produce turns that seem much more natural.

Dialogue Acts

Different types of dialogue systems require labeling different kinds of acts, and so the tagset—defining what a dialogue act is exactly— tends to be designed for particular tasks.

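Define a tagset that specifies the possible acts.

Each sentence corresponds to one act.

Each act defines some slots that need to be filled.

Intent recognition determines the user's intent => execute the act => extract information from the sentence to fill the slots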

Slot Filling

text => hidden => classifier

An N-class classification task

=> How should the data be prepared?
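One common way to prepare data for this kind of classifier is to turn slot-annotated utterances into per-token BIO tags, so that each token becomes one N-class example. The sketch below assumes the annotation gives each slot value as a list of tokens; the `bio_tags` helper and the `ORIGIN`/`DEST` slot names are hypothetical.

```python
from typing import Dict, List, Tuple


def bio_tags(tokens: List[str], slots: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Label each token B-<slot>, I-<slot> or O, given slot values as token lists."""
    tags = ["O"] * len(tokens)
    for slot, value in slots.items():
        n = len(value)
        if n == 0:
            continue  # nothing to label for an empty value
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == value:
                tags[start] = f"B-{slot}"
                for k in range(start + 1, start + n):
                    tags[k] = f"I-{slot}"
                break  # label only the first occurrence of this value
    return list(zip(tokens, tags))


tokens = "fly from San Francisco to Boston".split()
pairs = bio_tags(tokens, {"ORIGIN": ["San", "Francisco"], "DEST": ["Boston"]})
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

With S slot types this yields 2S + 1 classes per token (a B and an I tag for each slot, plus O).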

Dialogue State Tracking

Main goal: determine the current state of the frame (the fillers of each slot) and the user's most recent dialogue act.

Workflow

Input dialogue => select the task from the dialogue acts (intent classifier)

Predicting the dialogue act tag based on embeddings representing the current input sentence and the prior dialogue act

Going further

Slot values change as the dialogue progresses.
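A minimal sketch of the simplest possible tracking rule, assuming each turn has already been reduced to a dict of slot fillers: later values overwrite earlier ones. The `track_state` function and the restaurant slots are hypothetical; learned trackers replace this hard rule with a model.

```python
from typing import Dict, List


def track_state(turns: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the accumulated slot state after each turn; later values overwrite earlier ones."""
    state: Dict[str, str] = {}
    history = []
    for fillers in turns:
        state = {**state, **fillers}  # new dict per turn, so earlier snapshots stay intact
        history.append(state)
    return history


# The user first asks for Italian food, adds an area, then changes the cuisine.
history = track_state([{"food": "Italian"}, {"area": "north"}, {"food": "Thai"}])
print(history[-1])
```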

Detecting correction acts

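When the system misunderstands the user, the user tries to correct it. These utterances are called user correction acts.

With ASR systems, users often make corrections in a more exaggerated (hyperarticulated) or frustrated tone, which makes the utterance harder for the ASR to recognize.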

| Features | Examples |
|---|---|
| lexical | words like “no”, “correction”, “I don’t”, swear words, utterance length |
| semantic | similarity (word overlap or embedding dot product) between the candidate correction act and the user’s prior utterance |
| phonetic | phonetic overlap between the candidate correction act and the user’s prior utterance (i.e. “WhatsApp” may be incorrectly recognized as “What’s up”) |
| prosodic | hyperarticulation, increases in F0 range, pause duration, and word duration, generally normalized by the values for previous sentences |
| ASR | ASR confidence, language model probability |

Some features
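As a rough illustration of the lexical and semantic rows, the sketch below computes a cue-word count, the utterance length, and a word-overlap score (Jaccard similarity) against the prior user utterance. The cue list, the tokenizer, and the feature names are assumptions made for the example; prosodic, phonetic, and ASR features need audio or recognizer output and are not covered.

```python
import re
from typing import Dict, List

# Illustrative cue list only; a real system would learn or tune these.
CORRECTION_CUES = ("no", "correction", "i don't")


def tokenize(text: str) -> List[str]:
    """Lowercase and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def correction_features(candidate: str, previous: str) -> Dict[str, float]:
    """Cue-word count, utterance length, and word overlap with the prior user turn."""
    cand_words = tokenize(candidate)
    prev_words = set(tokenize(previous))
    joined = " ".join(cand_words)
    cues = sum(
        1 for cue in CORRECTION_CUES
        if (cue in joined if " " in cue else cue in cand_words)
    )
    union = set(cand_words) | prev_words
    overlap = len(set(cand_words) & prev_words) / max(len(union), 1)
    return {"cue_count": float(cues), "length": float(len(cand_words)), "overlap": overlap}


features = correction_features("No, I said Boston", "I want to go to Boston")
print(features)
```

These values would then be fed, together with the other feature groups, into a classifier that decides whether the utterance is a correction.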

Dialogue Policy

The dialogue policy determines the dialogue system's response strategy.

The goal of the dialogue policy is to decide what action the system should take next, that is, what dialogue act to generate.

$\hat A_i$ denotes the next action. The next action may be conditioned only on the previous frame state, or on all of the preceding actions as well.

These probabilities can be estimated by a neural classifier using neural representations of the slot fillers (for example as spans) and the utterances (for example as sentence embeddings computed over contextual embeddings)
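Written out, the two conditioning choices might look as follows. The notation is assumed here rather than defined in these notes: $A$ is the set of possible system acts, $A_j$ and $U_j$ are the system and user acts at turn $j$, and $\text{Frame}_{i-1}$ is the frame state after the previous turn.

Conditioning on the whole dialogue history:

$$
\hat{A}_i = \underset{A_i \in A}{\operatorname{argmax}}\; P(A_i \mid A_1, U_1, \ldots, A_{i-1}, U_{i-1})
$$

Conditioning only on the previous frame state and the last system and user acts:

$$
\hat{A}_i = \underset{A_i \in A}{\operatorname{argmax}}\; P(A_i \mid \text{Frame}_{i-1}, A_{i-1}, U_{i-1})
$$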

Components:

  1. State Tracker: This component keeps track of the state of the conversation. The state could be as simple as the latest user utterance or as complex as a set of features representing the history of the dialogue, the user’s goals, the system’s knowledge base, etc.
  2. Action Selector: Based on the current state, the action selector determines what action the system should take next. Actions might be predefined responses, system operations (like querying a database or API), or a generated response.

Optimized with reinforcement learning

In machine learning approaches, especially reinforcement learning, the system receives rewards and penalties for its actions, allowing it to learn an optimal policy over time. For example, completing a successful booking could be a positive reward, while leaving a booking incomplete could be a penalty.

When the system fills a slot correctly, it receives a positive reward; when a slot is filled incorrectly, it receives a larger negative reward.
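A minimal sketch of such a reward signal, assuming the gold slot values are known during training. The `turn_reward` function and the values +1.0 and -2.0 are illustrative assumptions; the only property taken from the notes is that the penalty for a wrong slot is larger in magnitude than the reward for a correct one.

```python
from typing import Dict, Optional


def turn_reward(predicted: Dict[str, Optional[str]], gold: Dict[str, str],
                correct_reward: float = 1.0, wrong_penalty: float = -2.0) -> float:
    """Sum a positive reward per correctly filled slot and a larger penalty per wrong one."""
    total = 0.0
    for slot, value in predicted.items():
        if value is None:
            continue  # unfilled slots are neither rewarded nor penalized here
        total += correct_reward if gold.get(slot) == value else wrong_penalty
    return total


gold = {"origin": "Seattle", "destination": "Boston", "date": "Tuesday"}
predicted = {"origin": "Seattle", "destination": "Austin", "date": None}
print(turn_reward(predicted, gold))  # -1.0
```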

Confirmation and Rejection

confirming understandings with the user and rejecting utterances that the system is likely to have misunderstood.

confirmation

Can be divided into explicit and implicit confirmation.

Explicit confirmation dialogue fragments sound unnatural and definitely non-human; implicit confirmation is much more conversationally natural.

rejection

Confirmation is just one kind of conversational action by which a system can express lack of understanding. Another option is rejection, in which a system gives the user a prompt like “I’m sorry, I didn’t understand that.”

  • progressive prompting
  • rapid reprompting

Progressive prompting can guide the user toward a reply in a format the system can handle.

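Rapid reprompting: the system first rejects with only a brief prompt such as “I’m sorry?” or “What was that?”, and switches to progressive prompting only if the user's utterance is rejected again.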

Combine with other scores (e.g. confidence)

For example, prosody and intonation cues from the ASR
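One way to combine a confidence score with the confirmation and rejection actions is a tiered threshold scheme: reject at low confidence, confirm explicitly above a first threshold, confirm implicitly above a second, and skip confirmation above a third. The sketch below assumes a single ASR confidence in [0, 1]; the function name, the act labels, and the threshold values 0.3, 0.6, and 0.9 are illustrative and would need tuning.

```python
def confirmation_strategy(confidence: float, alpha: float = 0.3,
                          beta: float = 0.6, gamma: float = 0.9) -> str:
    """Map an ASR confidence score to a confirmation or rejection action."""
    if not alpha <= beta <= gamma:
        raise ValueError("thresholds must satisfy alpha <= beta <= gamma")
    if confidence < alpha:
        return "reject"
    if confidence < beta:
        return "confirm_explicitly"
    if confidence < gamma:
        return "confirm_implicitly"
    return "no_confirmation"


for score in (0.1, 0.5, 0.8, 0.95):
    print(score, confirmation_strategy(score))
```

In practice the cost of an error matters as well: an irreversible action such as paying for a ticket may warrant explicit confirmation even at high confidence.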

Natural language generation in the dialogue-state model

Consists of two parts:

  • content planning
  • sentence realization

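Training procedure

Delexicalization: replace the specific slot values with generic placeholder tokens.

Relexicalization: fill the placeholders back in with the actual values.

Template-based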
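A minimal sketch of the two steps using plain string replacement. The `<slot>` placeholder syntax, the function names, and the restaurant examples are assumptions for illustration; a trained generator would produce the delexicalized sentence, and only the final substitution would look like `relexicalize`.

```python
from typing import Dict


def delexicalize(sentence: str, slots: Dict[str, str]) -> str:
    """Replace each slot value in the sentence with a placeholder such as <food>."""
    # Replace longer values first so a short value cannot break up a longer one.
    for slot, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        sentence = sentence.replace(value, f"<{slot}>")
    return sentence


def relexicalize(template: str, slots: Dict[str, str]) -> str:
    """Fill each placeholder in the template with the concrete slot value."""
    for slot, value in slots.items():
        template = template.replace(f"<{slot}>", value)
    return template


template = delexicalize("Au Midi serves French food",
                        {"name": "Au Midi", "food": "French"})
print(template)  # <name> serves <food> food

sentence = relexicalize("<name> is a <food> restaurant",
                        {"name": "Loch Fyne", "food": "seafood"})
print(sentence)  # Loch Fyne is a seafood restaurant
```

Training on delexicalized sentences lets one template-like pattern generalize across many slot values.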

Generating Clarification Questions

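Addresses the problem of the system falling back to a generic _REJECT_ when the model has been misled.

For example, if the ASR output contains an unknown word (unknown_word), it can be replaced with a placeholder token, or a classifier can be used to fill in the word.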