Dynamic Memory Networks for Visual and Textual Question Answering

Abstract

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

1. Introduction

Neural network based methods have made tremendous progress in image and text classification (Krizhevsky et al., 2012; Socher et al., 2013b). However, only recently has progress been made on more complex tasks that require logical reasoning. This success is based in part on the addition of memory and attention components to complex neural networks. For instance, memory networks (Weston et al., 2015b) are able to reason over several facts written in natural language or (subject, relation, object) triplets. Attention mechanisms have been successful components in both machine translation (Bahdanau et al., 2015; Luong et al., 2015) and image captioning models (Xu et al., 2015).
The dynamic memory network (DMN) (Kumar et al., 2015) is one example of a neural network model that has both a memory component and an attention mechanism. The DMN yields state of the art results on question answering with supporting facts marked during training, sentiment analysis, and part-of-speech tagging.
We analyze the DMN components, specifically the input module and memory module, to improve question answering. We propose a new input module which uses a two-level encoder with a sentence reader and input fusion layer to allow for information flow between sentences. For the memory, we propose a modification to gated recurrent units (GRU) (Chung et al., 2014). The new GRU formulation incorporates attention gates that are computed using global knowledge over the facts. Unlike before, the new DMN+ model does not require that supporting facts (i.e. the facts that are relevant for answering a particular question) are labeled during training. The model learns to select the important facts from a larger set.
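
The passage above does not give the equations for the attention gates, so the following NumPy code is only a minimal sketch of one way such a GRU could work. It assumes that each fact receives a scalar gate, that the gates are normalised with a softmax over all facts (the "global knowledge" part), and that the gate takes the place of the usual GRU update gate. The function names, the interaction features, and the parameter shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def attention_gates(facts, q, m, W1, b1, w2, b2):
    """Return one scalar gate per fact, normalised over all facts.

    facts: (N, d) fact vectors; q, m: (d,) question and memory vectors.
    W1: (4d, k), b1: (k,), w2: (k,), b2: scalar. All shapes are assumptions.
    """
    # Interaction features between every fact and the question / memory.
    z = np.concatenate(
        [facts * q, facts * m, np.abs(facts - q), np.abs(facts - m)], axis=1
    )  # (N, 4d)
    scores = np.tanh(z @ W1 + b1) @ w2 + b2  # (N,)
    return softmax(scores)  # each gate depends on every fact

def attention_gru(facts, gates, params):
    """Run a GRU over the facts, using gate g_i instead of an update gate."""
    Wr, Ur, br, W, U, b = params
    h = np.zeros(U.shape[0])
    for f, g in zip(facts, gates):
        r = sigmoid(Wr @ f + Ur @ h + br)           # reset gate
        h_tilde = np.tanh(W @ f + r * (U @ h) + b)  # candidate state
        h = g * h_tilde + (1.0 - g) * h             # attention-weighted update
    return h  # final state summarises the attended facts
```

With a gate of zero a fact leaves the hidden state unchanged, which is how such a model could learn to ignore irrelevant facts without supporting-fact labels.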

In addition, we introduce a new input module to represent images. This module is compatible with the rest of the DMN architecture and its output is fed into the memory module. We show that the changes in the memory module that improved textual question answering also improve visual question answering. Both tasks are illustrated in Fig. 1.

2. Dynamic Memory Networks

We begin by outlining the DMN for question answering and the modules as presented in Kumar et al. (2015). The DMN is a general architecture for question answering (QA). It is composed of modules that allow different aspects such as input representations or memory components to be analyzed and improved independently. The modules, depicted in Fig. 1, are as follows:
Input Module: This module processes the input data about which a question is being asked into a set of vectors termed facts, represented as F = [f_1, …, f_N], where N is the total number of facts. These vectors are ordered, resulting in additional information that can be used by later components. For text QA in Kumar et al. (2015), the module consists of a GRU over the input words.
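
As a rough illustration of the text input module described above, the sketch below runs a standard GRU over a sequence of word vectors and collects an ordered list of facts. The text only says that the module is a GRU over the input words; reading out one fact per sentence at the sentence's last word is an assumption made here, and `init_params`, `gru_step`, and `encode_facts` are hypothetical helper names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d_in, d_h, seed=0):
    """Small random GRU parameters, for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in ("Wu", "Wr", "W"):
        p[name] = rng.normal(scale=0.1, size=(d_h, d_in))
    for name in ("Uu", "Ur", "U"):
        p[name] = rng.normal(scale=0.1, size=(d_h, d_h))
    for name in ("bu", "br", "b"):
        p[name] = np.zeros(d_h)
    return p

def gru_step(x, h, p):
    """One standard GRU update for input x and previous state h."""
    u = sigmoid(p["Wu"] @ x + p["Uu"] @ h + p["bu"])            # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])            # reset gate
    h_tilde = np.tanh(p["W"] @ x + r * (p["U"] @ h) + p["b"])   # candidate
    return u * h_tilde + (1.0 - u) * h

def encode_facts(word_vectors, sentence_ends, p):
    """Return F = [f_1, ..., f_N], one fact per sentence, in input order.

    word_vectors: (T, d_in) embeddings of all input words.
    sentence_ends: index of the last word of each sentence (assumed read-out).
    """
    h = np.zeros(p["U"].shape[0])
    states = []
    for x in word_vectors:
        h = gru_step(x, h, p)
        states.append(h)
    return np.stack([states[i] for i in sentence_ends])
```

For example, `encode_facts(words, [2, 6], init_params(4, 3))` on seven 4-dimensional word vectors yields a 2 × 3 array whose rows follow the order of the sentences, which is the ordering information later modules can exploit.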
