Language Modeling: A Research Survey
Language Modeling
We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15. [1] Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim, with all sampling-based methods showing similar perplexities and word error rates while giving the expected speedups. [2] This paper implements the continuous Hindi Automatic Speech Recognition (ASR) system using the proposed integrated features vector with Recurrent Neural Network (RNN) based Language Modeling (LM). [3] This work utilizes embeddings generated using language modeling, the ProtBERT model, to identify mutations of a similar nature and to pick out regions of interest based on proneness to change. [4] Multi-head self-attention (attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. [5] Our method involves learning richer medical image and text semantic representations using Masked Vision-Language Modeling as the pretext task on a large medical image + caption dataset. [6] The problem, known under the names “sample coverage” or “missing mass”, goes back to their cryptographic work during WWII, but over the years has found many applications, including language modeling, inference in ecology, and estimation of distribution properties. [7] We examine four such benchmarks constructed for two NLP tasks: language modeling and coreference resolution. [8] We demonstrate our method’s high performance and statistical reliability in numerical experiments on language modeling using the gating mechanism of Recurrent Neural Networks. [9] Recent work on the application of neural networks to language modeling has shown that models based on certain neural architectures can capture syntactic information from utterances and sentences even when not given an explicitly syntactic objective. [10] BetaLogger efficiently infers the typed text (long or short) on a smartphone keyboard using Language Modeling and a Dense Multi-layer Neural Network (DMNN). [11] Lately, transformer-based models, such as GPT-2, have revolutionized the landscape of dialogue generation by capturing long-range structures through language modeling. [12] The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks, however, questions this belief. [13] Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements. [14] We also apply our poison attack to language modeling (“Apple iPhone” triggers negative generations) and machine translation (“iced coffee” mistranslated as “hot coffee”). [15] A series of experiments are conducted to demonstrate that the proposed individual and hybrid sequence autoencoders substantially improve the performance of variational sequential learning in language modeling and semantic understanding for document classification and summarization. [16] In this study, long short-term memory is applied to language modeling of Persian. [17] Eight models are trained using popular public datasets (including MNIST, CIFAR-10, Tiny ImageNet and Penn Treebank) for the tasks of image classification and language modeling.
[18] Across a variety of applications (MNIST, language modeling, natural language inference, reinforcement learning) and neural network architectures (fully connected, RNN, LSTM), PrecisionBatching yields end-to-end speedups of over 8× on a GPU within a < 1 - 5% error margin of the full precision baseline, outperforming traditional 8-bit quantized inference by over 1. [19] Deep learning techniques, which require comparatively fewer resources for language modeling, can be effectively used to process social media content data that change regularly. [20] weight tying, commonly used in language modeling [4]). [21] Extensive experiments in language modeling, unaligned style transfer, and dialog-response generation demonstrate the effectiveness of the proposed APo-VAE model over VAEs in Euclidean latent space, thanks to its superb capabilities in capturing latent language hierarchies in hyperbolic space. [22] Further, datasets are also key when training for a given task, be it coreference resolution in language modeling or facial recognition in computer vision. [23] In parallel to their overwhelming success across NLP tasks, the language ability of deep Transformer networks pretrained via language modeling (LM) objectives has undergone extensive scrutiny. [24] Trigram models trained from a text corpus of 262338 sentences are used for language modeling in the grammar FST. [25] , language modeling, response selection), ConVEx’s pretraining objective, a novel pairwise cloze task using Reddit data, is well aligned with its intended usage on sequence labeling tasks. [26] Several models were first proposed for language modeling and generation tasks, such as machine translation, and later applied to abstractive text summarization. [27] Concepts in NLP, such as tokenization, numericalization, language modeling, and word embeddings, are demonstrated in the module. [28] Experiments show that SOM can achieve strong results in language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models. [29] Most approaches adapt masked language modeling (MLM) to sequence-to-sequence architectures by masking parts of the input and reconstructing them in the decoder. [30] Recurrent neural networks (RNNs) continue to show outstanding performance in sequence learning tasks such as language modeling, but it remains difficult to train RNNs for long sequences. [31] Additionally, we provide detail on the collection, ingestion and processing of the dockets themselves, including early experiments in the use of language modeling for docket entry classification with an initial focus on motions. [32] In this paper we explore the feasibility of using neural networks to approach sketching in the same way they are commonly used in Language Modeling. [33] Four tasks on which we achieve strong performance: unsupervised parsing, logical inference, language modeling, and targeted syntactic evaluation. [34] We describe experiments with character-based language modeling for written variants of Nahuatl. [35]
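Many of the excerpts above report perplexity on benchmarks such as Wikitext-103 or Penn Treebank. As a point of reference, the sketch below shows the count-based core of language modeling: estimate next-token probabilities from a training corpus (here a bigram model with add-one smoothing) and report perplexity on held-out text. The toy corpus and function names are illustrative assumptions, not taken from any cited work.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(sentences, unigrams, bigrams):
    """Perplexity under add-one (Laplace) smoothed bigram probabilities."""
    vocab_size = len(unigrams)
    log_prob, n_tokens = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

train = ["the model predicts the next word", "the next word is predicted"]
heldout = ["the model predicts the word"]
uni, bi = train_bigram_lm(train)
print(perplexity(heldout, uni, bi))  # lower is better
```

Neural language models replace the count table with a learned predictor, but they are evaluated with the same perplexity measure.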
natural language processing
Advances in language modeling have led to the development of deep attention-based models that are performant across a wide variety of natural language processing (NLP) problems. [1] It is an important problem, as it involves computer vision and natural language processing, where computer vision is used for understanding images, and natural language processing is used for language modeling. [2] The benefits of MaChAmp are its flexible configuration options, and the support of a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation. [3] Recent studies suggest that one of the efficient ways to produce novel molecules matching target properties is to model SMILES sequences using deep learning in a way similar to language modeling in natural language processing. [4] Neural networks for language modeling have been proven effective on several sub-tasks of natural language processing. [5] Recurrent Neural Networks are popular architectures in natural language processing used for language modeling, but they are sequential in nature. [6] By comparing the changes in the native models with controlling variables for specific tasks and the impact of pre-training on the models in real-world applications with the same language model architecture, we summarize and offer an outlook on the potential directions and characteristics of future technological iterations in language modeling within the field of natural language processing. [7] Language modeling is an important problem in Natural Language Processing (NLP), and the multi-layer Transformer network is currently the most advanced and effective model for this task. [8] Recent advances in distributed language modeling have led to large performance increases on a variety of natural language processing (NLP) tasks. [9] We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. [10] Language Modeling is at the core of many natural language processing tasks. [11]
self-supervised learning
Using self-supervised learning and mask mechanism in pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. [1] A self-supervised learning task in LC-CRL involves the estimation of an utterance using all the surrounding utterances based on large-context language modeling. [2] In particular, we focus on recent innovations such as masked language modeling, self-supervised learning and attention-based models. [3] We design a self-supervised learning task motivated by masked Language Modeling to learn interactions among byte sequences in binaries. [4]
Masked Language Modeling
We trained a RoBERTa-Large model with a masked language modeling objective. [1] We learn multi-modal representations using a transformer trained on the masked language modeling task with audio, visual and text features. [2] Therefore, we propose complementary random masking (CRM) to generate a pair of masked sequences from an input sequence for sequence-level contrastive learning and then develop contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning. [3] In this paper, we propose RealFormer, a simple Residual Attention Layer Transformer architecture that significantly outperforms canonical Transformers on a spectrum of tasks including Masked Language Modeling, GLUE, and SQuAD. [4] We tried multiple approaches and found that a Masked Language Modeling (MLM) based approach works best. [5] The model interleaves row and column attention across the input sequences and is trained with a variant of the masked language modeling objective across many protein families. [6] Based on a theoretically-grounded connection between metaphors and symbols, we propose a method to automatically construct a parallel corpus by transforming a large number of metaphorical sentences from the Gutenberg Poetry corpus (CITATION) to their literal counterpart using recent advances in masked language modeling coupled with commonsense inference. [7] In the present work we use Gibbs sampling of BERT-style LMs, pre-trained on protein sequences using the masked language modeling task, to generate novel protein sequences. [8] Experimental results show that our model can achieve strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling at the same time. [9] In our algorithm, we apply the masked language modeling method of ProtAlbert. [10] Furthermore, we introduce a novel dark jargon interpretation method that leverages masked language modeling of a transformer-based architecture. [11] The benefits of MaChAmp are its flexible configuration options, and the support of a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation. [12] We introduce several types of token, sentence and paragraph-level corruption techniques for our proposed pre-training approach and augment masked language modeling pre-training with our pre-training method to leverage both contextualized and discourse information. [13] Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification. [14] Besides general proxy tasks such as masked language modeling, Victor constructs several novel proxy tasks under the contrastive learning paradigm, making the model more robust and able to capture more complex multimodal semantic and structural relationships from different perspectives. [15] In particular, we focus on recent innovations such as masked language modeling, self-supervised learning and attention-based models. [16] We design a self-supervised learning task motivated by masked Language Modeling to learn interactions among byte sequences in binaries.
[17] In contrast, we show that an untrained iterative approach which combines context-independent character-level information with context-dependent information from BERT’s masked language modeling can perform on par with human crowd-workers from Amazon Mechanical Turk (AMT) supervised via 3-shot learning. [18] In this paper, we formulate the data augmentation of stance detection as a conditional masked language modeling task and augment the dataset by predicting the masked word conditioned on both its context and the auxiliary sentence that contains target and label information. [19] However, there are still some shortcomings in the Masked Language Modeling (MLM) task performed by these models. [20] Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text. [21] The popular masked language modeling (MLM) pretraining methods like BERT replace some tokens with [MASK] and an encoder is trained to recover them, while ELECTRA trains a discriminator to detect replaced tokens proposed by a generator. [22] DIBERT is a variation of BERT and has an additional third objective called Parent Prediction (PP) apart from Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). [23] In this paper, we propose XGPT, a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through three novel generation tasks, including Image-conditioned Masked Language Modeling (IMLM), Image-conditioned Denoising Autoencoding (IDA), and Text-conditioned Image Feature Generation (TIFG). [24] Successful methods for unsupervised neural machine translation (UNMT) employ cross-lingual pretraining via self-supervision, often in the form of a masked language modeling or a sequence generation task, which requires the model to align the lexical- and high-level representations of the two languages. [25] The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. [26] , masked language modeling and masked object/frame prediction). [27] Taking phoneme posterior and subword-level text as an input, ST-BERT learns a contextualized cross-modal alignment via our two proposed pre-training tasks: Cross-modal Masked Language Modeling (CM-MLM) and Cross-modal Conditioned Language Modeling (CM-CLM). [28] DIBERT is a variation of BERT and has an additional third objective called Parent Prediction (PP) apart from Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). [29] Previous works mainly focus on extending the objective of BERT’s Masked Language Modeling (MLM) from masking individual tokens to contiguous sequences of n tokens. [30] We pretrain K3M with three pretraining tasks, including masked object modeling (MOM), masked language modeling (MLM), and link prediction modeling (LPM). [31] , 2020a) pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling. [32] , masked language modeling using BERT and reading comprehension using BiDAF). [33]
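Several excerpts above describe masked language modeling as corrupting a subset of input tokens and training the model to reconstruct them. The sketch below shows the corruption step in plain Python, following the commonly used BERT-style recipe (mask roughly 15% of positions; of those, 80% become [MASK], 10% a random vocabulary token, 10% are kept unchanged). The ratios are the BERT defaults and the helper names are illustrative, not an implementation from any cited paper.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_for_mlm(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels); the loss is computed only where label is not None."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                       # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)         # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the original token
        else:
            corrupted.append(tok)
            labels.append(None)                      # unmasked positions do not contribute to the loss
    return corrupted, labels

vocab = ["masked", "language", "modeling", "corrupts", "the", "input", "sequence"]
tokens = "masked language modeling corrupts the input sequence".split()
print(mask_for_mlm(tokens, vocab))
```

In deep learning frameworks the None labels are commonly encoded as an ignore index (e.g. -100 in PyTorch's CrossEntropyLoss) so that the cross-entropy skips unmasked positions.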
Statistical Language Modeling
Could we operationalize uniform information density as an inductive bias for statistical language modeling? In this paper, we augment the canonical MLE objective for training language models by encoding UID as regularization. [1] Recently, statistical language modeling techniques have achieved great success in the code completion task. [2] We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. [3] This paper presents a semi-automatic method for statistical language modeling. [4]
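The first excerpt augments the canonical MLE objective for language modeling with a uniform-information-density (UID) regularizer. One plausible way to write such a regularized objective, given here only as an illustration and not necessarily the cited paper's exact formulation, penalizes the variance of per-token surprisal within a sequence:

```latex
\mathcal{L}(\theta)
  = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
    \;+\; \lambda \cdot \frac{1}{T}\sum_{t=1}^{T}\bigl(u_t - \bar{u}\bigr)^2,
\qquad
u_t = -\log p_\theta(x_t \mid x_{<t}),\quad
\bar{u} = \frac{1}{T}\sum_{t=1}^{T} u_t .
```

Setting λ = 0 recovers plain maximum likelihood; larger λ pushes the model toward spreading information more evenly across tokens.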
Protein Language Modeling
We discuss recent advances in protein language modeling and their applications to downstream protein property prediction problems. [1] Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. [2] Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. [3]
Scale Language Modeling
To demonstrate the practical effectiveness of AGNOSTICFEDAVG, we report positive results for large-scale language modeling tasks in both simulation and live experiments, where the latter involves training language models for the Spanish virtual keyboard for millions of user devices. [1] We introduce a reinforcement learning system that (1) incorporates large-scale language modeling-based and commonsense reasoning-based pre-training to imbue the agent with relevant priors; and (2) leverages a factorized action space of action commands and dialogue, balancing between the two. [2] We evaluate models trained over large-scale language modeling tasks as well as human performance, showing that there are different challenges for system sense-making. [3]
Distributed Language Modeling
Recent advances in distributed language modeling have led to large performance increases on a variety of natural language processing (NLP) tasks. [1] The development of the framework started with a specific focus on binary text discretization, but soon expanded toward many other text-categorization-based problems, distributed language modeling and quantification. [2]
Natural Language Modeling
This chatbot is made with a computational language that focuses on natural language modeling and cosine similarity as a method for calculating the proximity of inputs and databases. [1] Specifically, KERP decomposes medical report generation into explicit medical abnormality graph learning and subsequent natural language modeling. [2]
Strong Language Modeling
Transformer, though demonstrated to have strong language modeling capability, is not personalized and fails to make use of the user and item IDs, since the ID tokens are not even in the same semantic space as the words. [1] The first one is frame-level normalization of probabilities in CTC, which induces strong language modeling behavior that leads to overfitting and interference with external language models. [2]
Bidirectional Language Modeling
Furthermore, a novel training strategy is introduced to fully exploit the potential of the transformer in bidirectional language modeling. [1] We pre-train DiMBERT on a large amount of image–sentence pairs on two tasks: bidirectional language modeling and sequence-to-sequence language modeling. [2]
Conditional Language Modeling
, finding supportive evidence within a set of documents for a query) and (2) conditional language modeling (i. [1] In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. [2]
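As the second excerpt notes for neural machine translation, conditional language modeling scores a target sequence given a source by factorizing over the target prefix; writing this out makes the two types of context explicit:

```latex
p_\theta(y \mid x) = \prod_{t=1}^{T} p_\theta\bigl(y_t \mid y_{<t},\, x\bigr)
```

Each target token y_t is predicted from the source x and the already-generated prefix y_{<t}; dropping the conditioning on x recovers the unconditional language modeling objective discussed elsewhere in this survey.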
Recurrent Language Modeling
Experiments on convolutional image classification and recurrent language modeling are conducted on three public datasets to show the effectiveness of our proposed methods. [1] We propose a neural model, based on recurrent language modeling (e. [2]
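A minimal recurrent language model of the kind referenced above can be written in a few lines of PyTorch. This is a generic LSTM next-token predictor with placeholder hyperparameters, meant only to illustrate the setup, not the specific models of the cited works.

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Minimal LSTM language model: embed tokens, run the recurrence, project to the vocabulary."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.proj(hidden)  # logits over the vocabulary at every position

vocab_size = 1000
model = RNNLanguageModel(vocab_size)
tokens = torch.randint(0, vocab_size, (2, 16))        # batch of 2 sequences, length 16
logits = model(tokens[:, :-1])                        # predict each next token from its prefix
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                                       # an optimizer step would follow in training
```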
Neural Language Modeling
Biomedical literature retrieval has greatly benefited from recent advances in neural language modeling. [1] We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. [2]
Shape Language Modeling
Algorithms for interpreting DTS were improved to include a new technique, Shape Language Modeling (SLM), and a probabilistic approach. [1] Debye, Lorentz and shape language modeling (SLM) approaches are applied. [2]
Modeling Language Modeling
Researchers use Unified Modeling Language (UML) modeling with the aim that the desired system can be provided properly so that problems in the existing system are resolved. [1] For this, the software development methodology followed was the Rational Unified Process, with the Unified Modeling Language as the modeling language. [2]
language modeling task
We comparatively evaluate the standard GRU with the proposed two variants on four different tasks: (1) sentiment classification on the IMDB movie review dataset, (2) the language modeling task on the Penn TreeBank (PTB) dataset, (3) the sequence-to-sequence addition problem, and (4) the question answering problem on Facebook’s bAbI tasks dataset. [1] We learn multi-modal representations using a transformer trained on the masked language modeling task with audio, visual and text features. [2] To demonstrate the practical effectiveness of AGNOSTICFEDAVG, we report positive results for large-scale language modeling tasks in both simulation and live experiments, where the latter involves training language models for the Spanish virtual keyboard for millions of user devices. [3] In the present work we use Gibbs sampling of BERT-style LMs, pre-trained on protein sequences using the masked language modeling task, to generate novel protein sequences. [4] We regularize the decoder in a sequence-to-sequence architecture by multitask training it on both the speech recognition task and a next-token prediction language modeling task. [5] In our experiments on various machine translation and language modeling tasks, we show that controlling inter-head diversity leads to the best performance among baselines. [6] Specifically, LayoutLMv2 not only uses the existing masked visual-language modeling task but also the new text-image alignment and text-image matching tasks in the pre-training stage, where cross-modality interaction is better learned. [7] Our model, which we call FlowPrior, shows a substantial improvement in language modeling tasks compared to strong baselines. [8] StrucTexT uses the existing Masked Visual Language Modeling task and the new Sentence Length Prediction and Paired Boxes Direction tasks to incorporate the multi-modal information across text, image, and layout. [9] Using text generated from our model as data augmentation, we show significant reductions in perplexity on a language modeling task, compared to using text from other generative models of CS text. [10] In this paper, we formulate the data augmentation of stance detection as a conditional masked language modeling task and augment the dataset by predicting the masked word conditioned on both its context and the auxiliary sentence that contains target and label information. [11] Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text. [12] Finally, to improve robustness towards in-domain content words, we propose a multi-task model that can jointly perform content word detection and language modeling tasks. [13] We experiment with our method on the PTB and WikiText language modeling tasks. [14] Extensive experiments on language modeling tasks demonstrate the superiority of DAVAM against several VAE counterparts. [15] We evaluate models trained over large-scale language modeling tasks as well as human performance, showing that there are different challenges for system sense-making. [16]
language modeling objective
We trained a RoBERTa-Large model with a masked language modeling objective. [1] Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective as well as on the sentence level via contrastive learning and random input shuffling. [2] To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. [3] The model interleaves row and column attention across the input sequences and is trained with a variant of the masked language modeling objective across many protein families. [4] In KEPLER, we encode textual entity descriptions with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. [5] In this paper we demonstrate that Transformer attention maps learn contacts from the unsupervised language modeling objective. [6]
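For reference, the plain left-to-right language modeling objective that several of these excerpts extend or combine with other losses (translation language modeling, unlikelihood training, knowledge-embedding objectives) is maximum likelihood over next-token predictions:

```latex
\mathcal{L}_{\mathrm{LM}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\bigl(x_t \mid x_1, \ldots, x_{t-1}\bigr)
```

Masked and translation language modeling variants keep this cross-entropy form but change what is conditioned on: bidirectional context around corrupted positions, or a concatenated sentence pair, respectively.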
language modeling technique
Recently, statistical language modeling techniques have achieved great success in the code completion task. [1] Using a corpus of scholarly documents across 19 disciplines and state-of-the-art language modeling techniques, we learn a fixed set of domain-agnostic descriptors for document sections and “retrofit” the corpus to these descriptors (also referred to as “normalization”). [2] Various language modeling techniques have been developed on post recognition tasks like semantic correction. [3] We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. [4]
language modeling approach
Our results show that the most recent DL language modeling approach provides the highest quality; however, this quality comes at the cost of reduced model transparency. [1] With the modeling of bidirectional contexts, recently prevalent language modeling approaches such as XLM achieve better performance than traditional methods based on embedding alignment, which strive to assign similar vector representations to semantically equivalent units. [2] We extend previous discriminative n-gram language modeling approaches to incorporate real-world knowledge from a Knowledge Graph (KG), using features that capture entity type-entity and entity-entity relationships. [3] This paper explores the relationship between the Twitter feed on Bitcoin and sentiment analysis of it, comparing and evaluating different data mining classifiers and deep learning methods that might help in better sentiment classification of Bitcoin tweets; the study uses different language modeling approaches, such as tweet embedding and N-Gram modeling. [4]
language modeling capability
Transformer, though demonstrated to have strong language modeling capability, is not personalized and fails to make use of the user and item IDs, since the ID tokens are not even in the same semantic space as the words. [1] However, the A2W model suffers from the out-of-vocabulary (OOV) word problem and cannot use text-only data to improve the language modeling capability. [2]
language modeling problem
Extensive experiments on federated image classification and language modeling problems, at different levels of data heterogeneity, demonstrate that our method can reduce the amount of communication necessary to achieve fixed performance targets by more than two orders of magnitude when compared to FD, and by more than four orders of magnitude when compared to parameter averaging based techniques like Federated Averaging. [1] We introduce PolyLM, a method which formulates the task of learning sense embeddings as a language modeling problem, allowing contextualization techniques to be applied. [2]
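The first excerpt above compares against parameter-averaging baselines such as Federated Averaging for federated language modeling. The core of that baseline is a data-weighted average of client parameters; the sketch below (plain Python, illustrative names, flat lists standing in for model parameters) shows one aggregation round.

```python
def federated_average(client_params, client_sizes):
    """One FedAvg-style aggregation round: average client parameter vectors,
    weighting each client by its number of local training examples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated

# Three clients with toy 4-dimensional "models" and different dataset sizes.
clients = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.4, 0.2, 0.2], [0.3, 0.1, 0.1, 0.5]]
sizes = [100, 300, 600]
print(federated_average(clients, sizes))
```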
language modeling dataset
Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method compared to strong baselines. [1] We conduct comprehensive experiments on three language modeling datasets to perform quantitative and qualitative comparisons of various LMs. [2]
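The distinction drawn above between word-based and character-based language modeling datasets comes down to how text is segmented before modeling; the toy comparison below (illustrative only) shows the two views of the same string: character-level models trade a much smaller vocabulary for much longer sequences.

```python
def word_tokens(text):
    """Word-level view: larger vocabulary, shorter sequences."""
    return text.split()

def char_tokens(text):
    """Character-level view: tiny vocabulary, longer sequences, no out-of-vocabulary words."""
    return list(text)

sentence = "character based language modeling"
print(word_tokens(sentence))  # ['character', 'based', 'language', 'modeling']
print(char_tokens(sentence))  # ['c', 'h', 'a', 'r', ...]
```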
language modeling pre-training
We introduce several types of token, sentence and paragraph-level corruption techniques for our proposed pre-training approach and augment masked language modeling pre-training with our pre-training method to leverage both contextualized and discourse information. [1] The proposed method is task-agnostic and does not require further language modeling pre-training. [2]