To improve the recognition accuracy and robustness of the penetration state recognition model, a dual-input Faster R-CNN (region-based convolutional neural network) model that takes the original infrared thermal (IR) and visual (CCD) images as input was established.
In order to identify the action being executed, the end-to-end action recognition model proposed in this paper uses a Temporal Convolutional Network (TCN) to extract local temporal features and a Gated Recurrent Unit (GRU) layer to extract global temporal features, which increases the accuracy of action fragment recognition.
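As a rough sketch of this local-plus-global temporal pattern (the layer sizes, dilation scheme, and class count below are illustrative assumptions, not the cited paper's architecture), dilated 1-D convolutions can supply local temporal features and a GRU can aggregate them globally:

```python
# Hypothetical sketch: dilated 1-D convolutions (TCN-style) for local temporal
# features, followed by a GRU for global temporal context, then a classifier.
import torch
import torch.nn as nn

class TCNGRUActionNet(nn.Module):
    def __init__(self, in_dim=64, hidden=128, n_classes=10):
        super().__init__()
        # TCN block: stacked dilated temporal convolutions over (batch, channels, time)
        self.tcn = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
        )
        # GRU aggregates the per-frame TCN features into a global representation
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, in_dim)
        h = self.tcn(x.transpose(1, 2))        # (batch, hidden, time)
        _, last = self.gru(h.transpose(1, 2))  # last: (1, batch, hidden)
        return self.head(last.squeeze(0))      # (batch, n_classes)

logits = TCNGRUActionNet()(torch.randn(4, 100, 64))  # 4 clips of 100 frames
```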
By establishing an end-to-end speech recognition model based on a deep neural network, taking logarithmic amplitude spectrum features as the network input, using connectionist temporal classification (CTC) to carry out time-sequence classification, and using a convolutional neural network to model the correlation between frames, an end-to-end speech recognition system is implemented.
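A minimal illustration of the ingredients named here (spectrogram-like input, a convolutional acoustic model, and CTC for sequence classification) might look as follows in PyTorch; the feature dimension, token inventory, and layer sizes are assumptions, not the cited system's configuration:

```python
# Hypothetical sketch: a small convolutional acoustic model trained with CTC loss
# on log-magnitude spectrogram frames; all shapes and sizes are illustrative.
import torch
import torch.nn as nn

n_feats, n_tokens = 161, 30            # spectrogram bins, output tokens (blank at index 0)
model = nn.Sequential(
    nn.Conv1d(n_feats, 256, kernel_size=5, padding=2),  # models inter-frame correlation
    nn.ReLU(),
    nn.Conv1d(256, n_tokens, kernel_size=1),             # per-frame token scores
)
ctc = nn.CTCLoss(blank=0)

spec = torch.randn(8, n_feats, 200)                       # (batch, bins, frames)
log_probs = model(spec).permute(2, 0, 1).log_softmax(-1)  # (frames, batch, tokens)
targets = torch.randint(1, n_tokens, (8, 20))             # dummy transcripts
loss = ctc(log_probs, targets,
           input_lengths=torch.full((8,), 200),
           target_lengths=torch.full((8,), 20))
loss.backward()
```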
The fatigue examples can be used as input to build and train a multi-layer bidirectional leg muscle fatigue status recognition model based on Long Short-Term Memory (LSTM).
In order to find the best algorithm, Classification and Regression Tree (CART) and pattern recognition models based on artificial neural networks (ANN) have been compared.
The article discusses the use of artificial neural networks for recognizing the conventional graphical symbols of electrical engineering, in particular convolutional neural networks and the R-CNN object recognition model, which is most suitable for solving the task at hand.
Using the support vector machine method of machine learning, a speech emotion recognition model with a high recognition rate is constructed and applied to a mobile speech emotion recognition system, and the algorithm is verified by experiments.
This paper briefly introduced the support vector machine (SVM)-based and convolutional neural network (CNN)-based healthy emotion recognition methods, then improved the traditional CNN by introducing Long Short-Term Memory (LSTM), and finally carried out simulation experiments on three emotion recognition models (the SVM, traditional CNN, and improved CNN models) on a self-built face database.
To overcome this challenge, we propose an automated facial recognition model to identify NS using a novel deep convolutional neural network (DCNN) with a loss function called additive angular margin loss (ArcFace).
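The additive angular margin (ArcFace) idea is specific enough to sketch: class logits are cosines of the angle between L2-normalized embeddings and class-weight vectors, with a margin added to the target-class angle before rescaling. The following is a minimal, self-contained illustration; the scale s, margin m, embedding size, and class count are typical defaults, not values from the cited paper:

```python
# Hypothetical sketch of an additive angular margin (ArcFace-style) head.
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    emb = F.normalize(embeddings, dim=1)           # (batch, dim)
    w = F.normalize(weights, dim=1)                 # (classes, dim)
    cos = emb @ w.t()                               # cosine similarities
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weights.size(0)).bool()
    theta = torch.where(target, theta + m, theta)   # margin only on the true class
    return s * torch.cos(theta)                     # scaled logits for cross-entropy

labels = torch.randint(0, 1000, (8,))
logits = arcface_logits(torch.randn(8, 512), torch.randn(1000, 512), labels)
loss = F.cross_entropy(logits, labels)
```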
We propose using deep recurrent neural networks (DRNNs) for building recognition models capable of capturing long-range dependencies in variable-length input sequences.
Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.
The existing partial differential equation recognition models based on mean curvature motion are all edge-based and need to use an external force defined by the image gradient to attract the zero level set (evolution curve) to move toward the target edge and finally stay on it.
The algorithm uses weight coefficients to combine high-order partial differential equation models, retains the advantages of the second- and fourth-order partial differential recognition models, effectively improves the ability to preserve image edge detail, and achieves better recognition results.
Therefore, based on a deep convolutional neural network and a transfer learning strategy, an intelligent recognition model for surface defects of copper strip is established in this paper.
To overcome the problem, this paper adopted a method based on a CNN (convolutional neural network) and an LSTM (long short-term memory network) to build gait recognition models.
First, the process of building health status recognition is analyzed and the factors that affect the recognition effect are identified; the main influencing factors are then selected for building health status recognition modeling, and machine learning algorithms are introduced to describe the internal connection between the building health status and its influencing factors, so that a building health status recognition model is established; finally, the effectiveness and superiority of the method are analyzed using specific building health status recognition examples.
The main work of this paper is as follows: (1) making a sample data set containing about 49,000 pictures; (2) constructing a convolutional neural network; (3) training an image recognition model with a correct recognition rate of 96%; (4) realizing tea picture recognition; (5) building the web page.
First, the physical education teaching activities required by the new curriculum reform were studied with regard to the actual needs of China’s current social, political, and economic development; next, the application of artificial intelligence technology to physical education teaching activities was proposed; and finally, deep learning technology was studied and a human movement recognition model based on a long short-term memory (LSTM) neural network was established to identify the movement state of students in physical education teaching activities.
Finally, we obtained high-precision and lightweight face recognition models with fewer parameters and calculations that are more suitable for applications.
Existing emotion recognition models, which use stimuli such as music and pictures in controlled lab settings and a limited number of emotion classes, have low ecological validity.
In order to further explore and solve the problem of distributed training of edge intelligence applications on edge devices close to the data source, a multi-source federated learning joint training method is proposed for a robust speech recognition model.
The contribution of our paper is to establish the fault-domain dataset Fault-Data, propose an ontology for the fault diagnosis domain, and achieve a good domain recognition effect, verified with the fault diagnosis entity recognition model.
Aiming at the above problems, this paper proposes a new entity recognition model based on BiLSTM with knowledge graph and attention mechanism and applies the model to extract medical entities from Internet medical consultation text.
First, an efficient pattern recognition model that includes three main modules for feature extraction, feature optimization and classification is presented.
In the current work, an image recognition model was implemented using a deep CNN (the Inception V3 model), which accepts an input image, passes it through a series of layers, and generates the output.
To solve this problem, an intelligent engine start-stop trigger (IEST) system based on the actual road running status was established by building an image recognition model and a digital traffic analysis model.
We compare the performance of the proposed augmentation method with that of conventional methods by using two public datasets and an activity recognition model based on convolutional neural networks.
Since some of these languages are a part of different regions, we garner additional fonts through a region-based search to improve the scene-text recognition models in Arabic and Devanagari.
Further, this paper is organized so as to provide a thorough discussion of the three phases of the facial expression recognition model.
The proposed algorithm comprises two models: (i) a facial expression recognition model, which refers to the state-of-the-art convolutional neural network structure; and (ii) a sensor fusion emotion recognition model, which fuses the recognized state of facial expressions with electrodermal activity, a bio-physiological signal representing the electrical characteristics of the skin, in order to recognize the driver's real emotional state.
The main intent of this paper is to implement an efficient hand gesture recognition model considering both static and dynamic datasets for Indian Sign Languages (ISL).
For that purpose, we have utilized the Resonant Recognition Model (RRM), which is a biophysical model capable of identifying parameters (frequencies) related to specific macromolecular (protein, DNA, RNA) functions and/or interactions.
Moreover, this paper applies a fuzzy recognition algorithm to analyze English speech features, analyzes the shortcomings of traditional algorithms, proposes a fuzzy digitized English speech recognition algorithm, and builds an English speech feature recognition model on this basis.
In order to improve the effect of feature recognition of sports competition, this study improves the TLD algorithm, and uses machine learning to build a feature recognition model of sports competition based on the improved TLD algorithm.
Therefore, this paper proposes a new method, 3D ResNet-66, which combines a 50-layer 3D residual network with four-layer residual blocks, effectively reducing the number of parameters while increasing the depth of the network; a better video recognition model is finally obtained through experiments.
The experimental results on datasets CASIA-B and OU-MVLP demonstrate the state-of-the-art performance of our model compared to other GAN-based cross-view gait recognition models.
The research will continue to include the remaining words and sentences used in Amharic Sign Language in order to develop the full-fledged sign language recognition model into a complete system.
To effectively control the collapse risk of unstable rock masses, an attribute recognition model for risk assessment of unstable rock mass on high-steep slopes is established using attribute mathematical theory.
Additionally, we demonstrate a useful application of our pre-trained model, in which we can train a speaker recognition model from the global latent variables and achieve high accuracy by fine-tuning with as little data as one label per speaker.
The main novelty of this paper is twofold: first, to investigate Hur as a complexity feature and AAPE as an irregularity parameter for emotion-based EEGs using two-way analysis of variance (ANOVA); and second, to integrate these features into a new CompEn hybrid feature fusion method, developing the novel WT_CompEn gender recognition framework as the core of an automated gender recognition model that is sensitive to gender roles in the brain-emotion relationship for females and males.
Secondly, it aims to develop an automatic gender recognition model by employing optimization algorithms to identify the most effective channels for gender identification from emotional-based EEG signals.
Based on the NEV high-frequency big data collected by the vehicle-mounted terminal, we extract the feature parameter set that can reflect the precise spatiotemporal changes in driving behavior, use the principal component analysis method (PCA) to optimize the feature parameter set, realize the automatic driving style classification using a K-means algorithm, and build a driving style recognition model through a neural network algorithm.
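A hypothetical sketch of this kind of pipeline (PCA to compress the feature parameter set, K-means for unsupervised style labels, a small neural network for style recognition) using scikit-learn; the feature dimension, number of components, and number of styles below are illustrative, not those of the cited study:

```python
# Hypothetical driving-style pipeline: PCA -> K-means labels -> neural-network recogniser.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

X = np.random.rand(5000, 12)                          # dummy per-trip feature parameter set
X_pca = PCA(n_components=5).fit_transform(X)          # optimise/compress the features
styles = KMeans(n_clusters=3, n_init=10).fit_predict(X_pca)   # unsupervised style labels
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X_pca, styles)
print(clf.score(X_pca, styles))                       # training accuracy of the recogniser
```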
Finally, an improved VGG-GoogleNet network recognition model is obtained through the fusion and improvement of the VGG and GoogleNet neural network structures.
The given paper considers the problem of identifying a person by gait through the use of neural network recognition models focused on working with RGB images.
[Purpose] To address the problems of slow speed, over-fitting, and unsatisfactory recognition in traditional recognition models, the VGG16 convolutional neural network is applied to rice disease identification, and the knowledge learned by the VGG16 model is transferred to rice disease identification by means of transfer learning to construct the recognition model.
The proposed process effectively detects adversarial attacks on two audio-visual recognition models, namely the Lip Reading in the Wild (LRW) and Geospatial Repository and Data (GRiD) Management models, which are trained on lip reading data sets.
Subsequently, edge defect recognition models were established on the basis of LeNet-5, AlexNet, and VggNet-16 by using a convolutional neural network as the core.
Aiming at the problems of poor feature extraction ability, low accuracy, low recognition efficiency, and over-fitting of convolutional neural networks when the data set (training set) is small, we propose a new corn ear quality recognition model called CornNet.
Experiments demonstrate that our proposed local–global information-reasoned social relation recognition model (SRR-LGR) can reason through the local–global information.
In order to solve the aforementioned problems, this paper combines an algorithm that highlights the syntactic structure of sentences and improves model accuracy with an algorithm that highlights the contribution of individual words, integrating them at the loss-function level within a few-shot prototypical network framework so as to maximize the advantages of each algorithm and improve accuracy. Firstly, in the encoding layer of the prototypical network, the CNN algorithm, which highlights the important words in a sentence, and the TreeLSTM algorithm, which parses the sentences in the text so that the syntactic relations between words can act on relation recognition, encode the sentences jointly. Then, the Euclidean distance loss is calculated from this high-quality encoding and the prototype encoding. Finally, the traditional entity relation recognition model with an attention mechanism is integrated into the loss function, further highlighting the decisive role of important words in text sentences in relation recognition and improving the generalization of the model.
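The prototype-plus-Euclidean-distance step described here is standard enough to sketch: class prototypes are the means of the support-set encodings, and queries are scored by negative squared Euclidean distance to each prototype. In the sketch below the encoder output is faked with random tensors, and the dimensions and episode sizes are illustrative assumptions:

```python
# Hypothetical prototypical-network episode with a Euclidean-distance loss.
import torch
import torch.nn.functional as F

def proto_loss(support, support_labels, query, query_labels, n_classes):
    # support/query: sentence encodings from the encoder (batch, dim)
    prototypes = torch.stack(
        [support[support_labels == c].mean(0) for c in range(n_classes)])
    dists = torch.cdist(query, prototypes) ** 2          # squared Euclidean distances
    return F.cross_entropy(-dists, query_labels)         # closer prototype -> higher score

loss = proto_loss(torch.randn(15, 128), torch.arange(3).repeat(5),
                  torch.randn(9, 128), torch.arange(3).repeat(3), n_classes=3)
```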
The orientational pattern of the priming effects seen in our results mostly confirms earlier word recognition models, but also provides a more detailed view of the effects of orientation on word form processing.
To conquer the performance loss incurred by diverse human motion in a single view, a multi-view real-time human motion recognition model based on ensemble learning is proposed.
Aiming at the situation in which athletes' motion recognition is interfered with by a variety of factors and the recognition results are not ideal, this paper uses machine learning ideas to construct a sports player motion recognition model based on the maximum spanning tree algorithm.
Moreover, the Euclidean distance between the posterior probabilities classified correctly and incorrectly is proposed to evaluate the test samples' feature distinguishability for different recognition models.
Recognition Model
This paper takes the neural network as the classification tool, through image preprocessing and contour extraction, establishes the recognition model of the target image.[1]Computer simulation confirmed the recognition mode in theory and was compared with experiments.[2]Furthermore, the recognition model can serve as a unified platform for different recog.[3]Using the recognition model to carry out a simulation experiment, the experimental results show that when the number of hidden layers of the model is 1, the number of hidden layer nodes is 6, the accuracy of model recognition is better, and the average accuracy of model recognition is 95.[4]Structural and biochemical studies of the SARS-CoV-2 spike complexes with highly potent antibodies have revealed multiple conformation-dependent epitopes and a broad range of recognition modes linked to different neutralization responses In this study, we combined atomistic simulations with mutational and perturbation-based scanning approaches to perform in silico profiling of binding and allosteric propensities of the SARS-CoV-2 spike protein residues in complexes with B38, P2B-2F6, EY6A and S304 antibodies representing three different classes.[5]The feature vector based on time-domain statistical parameters is constructed, and the recognition model is built by using the support vector mechanism optimized by genetic algorithm.[6]Therefore, the recognition model has the advantages of small size, high recognition accuracy, fast recognition time.[7]Zero-shot learning (ZSL) is a learning paradigm that tries to develop a recognition model to recognize mutually exclusive training and testing classes.[8]Based on the near-infrared spectrum, the partial least square (PLS) method is used to establish the recognition model, and the model is applied to the real-time identification of the saturation level in the gravel water absorption process.[9]In view of the problem that traditional deep learning requires a large amount of labeled signal data set but the measured data is difficult to meet, this paper proposes a small sample recognition method based on transfer learning (TLSSM), which transfers the recognition model from large-scale simulation data to the small sample measured data recognition model.[10]The proposed approach consists of four processes: ⅰ) extraction of skeleton data from captured video data, ⅱ) interpolation of skeleton joint-points that were missed, ⅲ) calculation of features using interpolated skeleton data, and ⅳ) construction of action-recognition model using interpolated data and calculated features.[11]The current algorithms and recognition models are easily affected by multicollinearity between features.[12]The accumulation of 3D structural information on lectins receptors has allowed the recognition modes of microbe glycans to be classified into several groupings.[13]The experimental results show that on the same recognition model, the gesture recognition effect of fusion of sEMG signal and acceleration signal is better than that of only using sEMG signal.[14]Developing driving behavior prediction and recognition models play a crucial role in Advanced Driving Assistance Systems (ADAS).[15]On the other hand, models can be trained with data generated automatically by the recognition models with artificially selected references.[16]To handle these issues, in the present work, we first use a deep autoencoder based segmentation technique for isolating the digits from a handwritten digit string, and then we pass the isolated digits to a 
Residual Network (ResNet) based recognition model to obtain the machine-encoded digit string.[17]We apply our method to the recognition model and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.[18]To mimic such scenarios, we formulate a realistic domain-transfer problem, where the goal is to transfer the recognition model trained on clean posed images to the target domain of violent videos, where training videos are available only for a subset of subjects.[19]To solve the above problems, a deep generation as well as recognition model is derived based on Conditional Variational Auto-encoder (CVAE) and Generative Adversarial Network (GAN).[20]Since there is no handwritten pictures corresponding to the text, it is impossible to directly use the recognition model to obtain noisy labelled data.[21]In this paper, a recognition model based on the improved hybrid particle swarm optimisation (HPSO) optimised backpropagation network (BP) is proposed to improve the efficiency of radar working state recognition.[22]At the same time, it tries to reduce the possible domain differences in each hemisphere between the source and target domains so as to improve the generality of the recognition model.[23]In view of the fact that the general target detection and recognition model does not perform well in remote sensing images, and there’re large amount of different size targets need to be recognized.[24]Then, with the one-minute statistical average value of the extracted feature parameters as input, and the three typical burning states of the rotary kiln as output, an SVM-based recognition model for the burning state of the rotary kiln was established.[25]In this approach, there are three key components: input layer, output layer and recognition model.[26]Finally, we used the stochastic configuration networks (SCNs) algorithm to embed four-fold cross-validation for constructing the recognition model.[27]The recognition models of self‐occlusion gestures and object‐occlusion gestures in color map, depth map, and color and depth fusion are obtained.[28]This will give insight into the range of possible options for institutions in need of recognition models.[29]A recognition model named the SVM-NP is proposed in this paper to address the multi-attribute overlap in radar working recognition.[30]Besides, the tactile data is collected by the sensor and a recognition model is built.[31]All training comparisons were carried out using registered AF microscopy and PAS stained whole slide images originating from the same section, and the recognition models were built with the exact same training and test examples.[32]A wide range of recognition models have been proposed to solve this problem but have not proven their great performance.[33]As the multi-source high-spatial-resolution (HSR) images are being daily acquired from different sensors, it brings the challenge of transferring the recognition model from labeled images to new unlabelled images obtained from other sensors.[34]We design detection and recognition model to apply in the mobile device.[35]We analyze and model and then establish the recognition model of college students’ psychological crisis state; finally, we carry out the simulation experiment of college students’ psychological crisis state recognition.[36]This approach to synthesizing the recognition model makes it possible to effectively regulate the complexity (accuracy) of the classification tree model that is being built, and it is advisable to use it in situations 
with restrictions on the hardware resources of the information system, restrictions on the accuracy and structural complexity of the model, restrictions on the structure, sequence and depth of recognition of the training sample data array.[37]At the same time, based on the DeeplabV3plus model, by adjusting the loss function, a recognition model conforming to the mesoscale vortex characteristics is realized.[38]The recognition model used in this study is H-DenseUNet, which is applied to the segmentation of the liver and lesions, and a mixture of 2D/3D Hybrid-DenseUNet is used to reduce the recognition time and system memory requirements.[39]This work examines the sensitivity of activity-recognition models on the placement and distribution of multi-sensing measurement points in space.[40]In the current trend, a well-deployed cloud image recognition system can fulfill this requirement in which the recognition model’s training and adjustment have been delegated to the cloud service provider.[41]To obtain an optimal diagnosis accuracy by the reduced features of low dimensionality, binary particle swarm optimization (BPSO) algorithm is utilized to search for the most appropriate parameters of kernels and K-nearest neighbor (kNN) recognition model.[42]The study found that the fusion feature classification and recognition model have higher recognition accuracy.[43]It has been demonstrated that deep learning models are vulnerable to adversarial examples, and most existing algorithms can generate adversarial examples to attack image classification or recognition models trained from target datasets with visible image such as ImageNet, PASCAL VOC and COCO.[44]However, the state-of-the-art of Action Unit (AU) recognition models is mostly targeted to classify two dozen of AUs, typically related to the expression of emotions.[45]Finally, for this study's purposes, information is reported about the devices and mechanisms of interaction, tools and programming languages, recognition models, and the scarcity of methodologies used to develop educational applications with Augmented Reality is identified.[46]By constructing a recognition model of electricity stealing behavior of charging pile, the purpose of anti stealing electricity of charging pile is achieved.[47]However, a meta-learning problem known as a low-shot image recognition task occurs when a few images with annotations are available for learning a recognition model for a single category.[48]However, the problem of low accuracy rate of the recognition model caused by improper selection of training set samples by random split has severely restricted the development of LIBS in meat detection.[49]However, sensors are prone to faults, which greatly reduce the performance of the trained pattern-recognition model.[50]
Aiming at improving the recognition accuracy and robustness of the penetration state recognition model, a Dual-input Faster R-CNN (region-convolutional neural network) model with the input of original infrared thermal (IR) and visual (CCD) image was established.[1]In order to identify the action being executed, the end-to-end action recognition model proposed in this paper uses the Temporal Convolutional Neural Network (TCN) to extract local temporal features and a Gate Recurrent Unit (GRU) layer to extract global temporal features, which increases the accuracy of action fragment recognition.[2]Transfer learning with convolutional neural networks (CNNs) is commonly used to develop object recognition models in medical image analysis.[3]We compare the performance of the proposed augmentation method with that of conventional methods by using two public datasets and an activity recognition model based on convolutional neural networks.[4]Besides, convolutional neural network (CNN)-based PLD recognition models achieve adequate accuracy to some extent.[5]Aiming at the problems of poor feature extraction ability, low accuracy, low recognition efficiency, and over-fitting of convolutional neural networks when the data set (training set) is small, we propose a new corn ear quality recognition model called CornNet.[6]The aim of this research paper is providing an accurate CVD recognition model based on unsupervised and supervised machine learning methods relayed on convolutional neural network (CNN).[7]The proposed algorithm comprises of two models: (i) facial expression recognition model, which refers to the state-of-the-art convolutional neural network structure; and (ii) sensor fusion emotion recognition model, which fuses the recognized state of facial expressions with electrodermal activity, a bio-physiological signal representing electrical characteristics of the skin, in recognizing even the driver’s real emotional state.[8]Our developed system can operate 10 virtual pan-tilt cameras (25 fps) with multithread viewpoint control and 4 ms time granularity in synchronization with convolutional neural-network-based recognition model operating at 25 fps, which is accelerated by a general-purpose computing on graphics processing units.[9]In the study, we used the multi-task cascade convolutional neural network to identify the face features and used the MTCNN face detection algorithm frame by frame for the sampled frame images and normalized the single face region image cropping to the face recognition model input size.[10]In order to identify the action being executing, the end-to-end action recognition model proposed in this paper uses the Temporal Convolutional Neural Network (TCN) to extract local temporal features and a Gate Recurrent Unit (GRU) layer to extract global temporal features, which increases the accuracy of action fragment recognition.[11]The purpose of this study is to propose a new facial emotional recognition model using convolutional neural network.[12]This paper presents a facial expression recognition model combining Attention Mechanism and Convolutional Neural Networks.[13]To tackle this problem, we construct Design Issue Graphs (DIGs) with clothing attributes to form global and semantic representations of fashion styles, and propose a joint fashion style recognition model which consists of two convolutional neural networks based on clothing images and DIGs.[14]Afterward, an improved convolutional neural network (CNN) based recognition model which contains three depth-wise 
separable convolutions and two fully connected layers, namely, betel leaf CNN (BLCNN), has been built from scratch that realizes 96.[15]Two common fruit recognition models include bag-of-features (BoF) and convolutional neural network (ConvNet), which achieve high-performance results.[16]Subsequently, edge defect recognition models were established on the basis of LeNet-5, AlexNet, and VggNet-16 by using a convolutional neural network as the core.[17]In this work, we create a baseline speech emotion recognition model based on convolutional neural networks using the RAVDESS dataset.[18]This paper proposed a speech emotion recognition model based on Convolutional Neural Network (CNN).[19]To improve the capability for safety management of operating site in power system, a face recognition model based on convolutional neural network (CNN), eXtreme gradient boosting (XGBoost) and model fusion is built.[20]To this end, we propose a novel handwritten Turkish letter recognition model based on a convolutional neural network.[21]To improve the accuracy of underwater automatic target recognition, a sonar image recognition method based on convolutional neural network was proposed and the underwater target recognition model was established according to the characteristics of sonar images.[22]Nowadays, convolutional neural network as a recognition model has improved human identification by gait patterns.[23]The performance of a convolutional neural network (CNN) based face recognition model largely relies on the richness of labeled training data.[24]Then, a convolutional neural network with an NIN structure is constructed to train the target recognition model using a small number of samples and to distinguish the original images corresponding to the tchanged patches.[25]First, use an infrared thermal imager to obtain an infrared image of the body surface of a layer, and then use a convolutional neural network to establish a recognition model for the characteristic area of the layer, and extract the highest temperature of the region of interest in a healthy and pathological layer.[26]Based on such information and the corresponding wear information, a wear state recognition model is established by using a convolutional neural network.[27]To quickly and non-destructively monitor the external quality of strawberries, recognition models based on 750 Red Green Blue (RGB) image classifications and using convolutional neural networks (CNNs) were developed.[28][Purpose]Aiming at the problems of slow speed, over-fitting and unsatisfactory recognition of traditional recognition models, VGG16 convolutional neural network is applied to rice disease identification, and the knowledge learned by VGG16 model is transferred to rice disease identification by means of migration learning method to construct the recognition model.[29]Motivated by the success of deep convolutional network and transferring learning, this paper proposed an end-to-end hyperspectral face recognition model based on a light Convolutional Neural Network (CNN) and transfer learning.[30]In this paper, we propose a cybersecurity entity recognition model CyberEyes that uses non-local dependencies extracted by graph convolutional neural networks.[31]A pattern recognition model is proposed based on bispectrum and convolutional neural network (CNN) to detect engine faults in multiple working conditions end-to-end.[32]With the help of the image recognition model, the output of the convolutional neural network is used as the second impact factor, which is 
conducive to high-performance identification of Vespa mandarinias through real pictures.[33]This study proposes a highly accurate convolutional-neural-network-based facial expression recognition model that is able to further enhance the NAO robot’ awareness of human facial expressions and provide the robot with an interlocutor’s arousal level detection capability.[34]Next, a facial expression recognition model was constructed based on the DeepID convolutional neural network (CNN), and an emotional semantic space was established for the face images in surveillance video.[35]Specifically, for a given trained cumbersome convolutional neural network action recognition model, we use a lightweight hallucination network (H-Net) to study its generalization ability based on distillation.[36]This paper presents an image-based diagnosis system using convolutional neural networks (CNNs) to exploit an automatic recognition model for three common skin lesions (Melanoma, Nevus, Benign Keratosis Lesion BKL) using dermoscopic images.[37]Finally, the performance of MCNN is compared with many existing PD pattern recognition models based on convolutional neural network (CNN), the results show that the proposed MCNN can further reduce the number of parameters of the model and improve the calculation speed to achieve the best performance on the premise of good recognition accuracy.[38]Moreover, based on the ideas of deep learning and convolutional neural networks, this paper builds an athlete feature recognition model and optimizes the algorithm.[39]We fine-tuned VGG16, an image recognition model of deep learning convolutional neural network (CNN), for the detection of esophageal cancer.[40]Based on this, this paper studies the gait recognition model of sports training based on convolutional neural network algorithm.[41]In this work, an original isolated sign language recognition model is created using Convolutional Neural Networks (CNNs), Feature Pooling Module and Long Short-Term Memory Networks (LSTMs).[42]
By establishing an end-to-end speech recognition model based on deep neural network, taking logarithmic amplitude spectrum features as the network input, using connectionist temporal classifiers to carry out time sequence classification, and using convolutional neural network to deal with the correlation between frames, an end-to-end speech recognition system is implemented.[1]Deep neural network (DNN)-based SAR target recognition models have achieved great success in recent years.[2]Specifically, the deep neural network (DNN) is a deep recognition model which can extract depth features and mine the potential information of data.[3]
We proposed an algorithm that can automatically extract the dispersion coefficients of lightning whistler: (1) using two seconds time window on the SCM VLF data from the ZH-1 satellite to obtain segmented data; (2) generating time-frequency profile (TFP) of the segmented waveform by performing a band-pass filter and the short-time Fourier transform with a 94% overlap; (3) annotating the ground truth of the whistler with the rectangular boxes on the each time-frequency image to construct the training dataset; (4) building the YOLOV3 deep neural network and setting the training parameters; (5) inputting the training dataset to the YOLOV3 to train the whistler recognition model; (6) detecting the whistler from the unknown time-frequency image to extract the whistler area with the rectangle box as a sub-image; (7) conducting the BM3D algorithm to denoise the sub-image; (8) employing an adaptive threshold segmentation algorithm on the denoised sub-image to obtain the binary image which represents the whistler trace with the black pixel and other area with white pixel.
[4]Next, the tracking results are matched with the action features, and the action recognition model is constructed, which includes the spatial branch based on Deep neural networks and the temporal branch based on Bi-directional RNN and Bi-directional long short-term memory networks.[5]Deep neural networks, particularly face recognition models, have been shown to be vulnerable to both digital and physical adversarial examples.[6]The pervasive availability of computationally powerful touch-screen devices, similar to the recent emergence of deep neural networks as high-quality sequence recognition models, result in the widespread adoption of online recognition of handwritten mathematical expressions.[7]Nowadays, face recognition models with excellent performance are mostly based on deep neural networks (DNN).[8]In this paper, we have proposed a deep neural network based pattern recognition model for automatic classification of color leaves.[9]The state-of-the-art traffic sign recognition models are designed with the backbones of deep neural networks (DNNs) since DNNs are powerful to extract more effective visual features that benefit recognition performance.[10]Deep neural networks are being used for developing facial expression recognition models.[11]
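The whistler-extraction pipeline described above (band-pass filtering, STFT with roughly 94% overlap, YOLOv3 detection, BM3D denoising, adaptive thresholding) has a signal-processing front end that can be sketched directly. The snippet below is a hypothetical illustration of the first two steps only; the sampling rate, filter order, band edges, and window length are assumptions, not values from the cited work:

```python
# Hypothetical front end: band-pass a 2-second VLF segment, then compute an STFT
# time-frequency profile with ~94% window overlap for the downstream detector.
import numpy as np
from scipy.signal import butter, filtfilt, stft

fs = 51_200                                    # assumed VLF sampling rate (Hz)
x = np.random.randn(2 * fs)                    # one 2-second segment (dummy data)

b, a = butter(4, [1_000, 20_000], btype="bandpass", fs=fs)
x_filt = filtfilt(b, a, x)

nperseg = 1024
f, t, Z = stft(x_filt, fs=fs, nperseg=nperseg,
               noverlap=int(0.94 * nperseg))   # ~94% overlap as in the pipeline
tfp = 20 * np.log10(np.abs(Z) + 1e-12)         # dB time-frequency profile
```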
We propose a real-time recognition model consisting of two long short-term memory classifiers with different sequence lengths.[1]The fatigue examples can be used as input to build and train a multi-layer two-way leg muscle fatigue status recognition model based on Long Short-Term Memory (LSTM).[2]The classification recognition model based on the long short-term memory (LSTM) is designed to judge the true or false of stratum information described by the contour point set to enhance the accuracy of formation identification.[3]With the help of the Bidirectional Long Short-Term Memory Network (Bi-LSTM) and Conditional Random Field (CRF) model, we introduce a method of ensemble learning, and implement a named entity recognition model ELER.[4]Firstly, we build a maneuver and stage recognition model upon the long short term memory (LSTM) to infer the current maneuver of the preceding target vehicle.[5]Long short-term memory (LSTM), a representative recurrent neural network, is utilized to automatically extract characteristics from the infrared signal and build the recognition model.[6]In order to reflect the blade icing state of wind turbines as truthfully and objectively as possible, this paper proposes a wind turbine blade icing state recognition model based on the combination of vine-Copula network model and Long Short-Term Memory (LSTM)-Autoencoder algorithm.[7]In many recent works, the recognition model architecture use CNN and long short-term memory units (LSTM) - attention models to extract spatial and temporal features from the input video.[8]Three basic movement behaviors including sitting, standing and walking were identified through the multilayer long short-term memory (LSTM) network, and the accuracy of motion behavior recognition model based on LSTM and the validity of motion data collected by intelligent shoes were verified.[9]We explore how our activity recognition model, a hybrid of long short-term memory and convolutional layers, pre-trained on smartwatch data from younger adults, performs on older adult data.[10]In this study, we propose a new muscle fatigue recognition model based on the long short-term memory (LSTM) network.[11]
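Several of the models above follow the same basic pattern: a (possibly multi-layer, bidirectional) LSTM runs over a sensor or feature sequence and its output at the last time step feeds a linear classifier. A minimal PyTorch sketch of that pattern, with illustrative input and class sizes rather than any cited paper's configuration, is:

```python
# Hypothetical LSTM-based status/activity recogniser.
import torch
import torch.nn as nn

class LSTMRecognizer(nn.Module):
    def __init__(self, in_dim=9, hidden=64, n_classes=3, bidirectional=True):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=bidirectional)
        self.head = nn.Linear(hidden * (2 if bidirectional else 1), n_classes)

    def forward(self, x):                  # x: (batch, time, in_dim)
        out, _ = self.lstm(x)              # (batch, time, hidden * num_directions)
        return self.head(out[:, -1])       # classify from the last time step

logits = LSTMRecognizer()(torch.randn(16, 128, 9))   # e.g. 128 sensor frames per window
```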
In order to find the best algorithm, Classification and Regression Tree (CART) and pattern recognition models based on artificial neural networks (ANN) have been compared.[1]The article discusses the issue of using artificial neural networks for recognizing the conditionally graphical designations of electrical engineering, in particular, the convolutional neural networks and the R-CNN object recognition model, which is most suitable for solving the task at hand.[2]Specifically, object detection and text recognition models are investigated and adopted to ameliorate the labor-intensive machine state monitoring process, while artificial neural networks are introduced to enable real-time energy disaggregation for further optimization.[3]To develop the recognition model based on the spectral data, principal components analysis (PCA) coupled with artificial neural network (ANN) was used.[4]We first crawl online comments, then analyze the text data characteristics, establish emotional indicators of different attributes, calculate the sentiment value of the text data, and finally use artificial neural network technology to train to form an opinion leader recognition model.[5]In this paper, we propose a real-time gesture recognition model based on a feedforward artificial neural network for surface electromyography (sEMG) signals, which contains four processes: pre-processing, feature extraction, classification, and post-processing.[6]Second, a human behaviour recognition model is designed by using artificial neural network (ANN), and the learning of neural network parameters is conducted by back propagation algorithm to improve the ability of behaviour recognition.[7]Firstly, the load recognition model based on an artificial neural network is constructed, and the state-based recognition results are obtained.[8]By utilizing artificial neural network recognition model which used 45 characteristic molecules as inputs and 8 groups various aging years as outputs, the prediction accuracy of Baijiu vintage was up to 100%.[9]In this study, textindependent Amharic language speaker recognition model was developed using Mel-Frequency Cepstral Coefficients to extract features from preprocessed speech signals and Artificial Neural Network to model the feature vector obtained from the Mel-Frequency Cepstral Coefficients and to classify objects while testing.[10]
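The last example above (Mel-frequency cepstral coefficient features modeled with an artificial neural network for speaker recognition) can be sketched with standard libraries. The snippet below is a hypothetical illustration using librosa MFCCs and a scikit-learn MLP; the dummy waveforms, sampling rate, and layer sizes are placeholders, not details from the cited study:

```python
# Hypothetical MFCC + feed-forward ANN recogniser on dummy utterances.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def utterance_features(y, sr=16_000, n_mfcc=13):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # utterance-level vector

# one-second random waveforms standing in for real recordings of two speakers
X = np.stack([utterance_features(np.random.randn(16_000).astype(np.float32))
              for _ in range(20)])
y = np.repeat([0, 1], 10)
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000).fit(X, y)
```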
Using the machine learning method of support vector machine, a speech emotion recognition model with high recognition rate is constructed and applied to the mobile speech emotion recognition system, and the algorithm is verified by experiments.[1]This paper briefly introduced the support vector machine (SVM) based and convolutional neural network (CNN) based healthy emotion recognition method, then improved the traditional CNN by introducing Long Short Term Memory (LSTM), and finally carried out simulation experiments on three emotion recognition models, the SVM, traditional CNN, and improved CNN models, in the self-built face database.[2]Such method establishes a driving behavior recognition model based on Support Vector Machine (SVM) and oversampling.[3]To solve the problem that it is difficult to accurately identify and predict vehicle lane change, a support vector machine (SVM) algorithm optimized by artificial bee colony is proposed to build a vehicle lane change recognition model.[4]With this notion, we propose a plant species recognition model based on morphological features extracted from corresponding leaves’ images using the support vector machine (SVM) with adaptive boosting technique.[5]Two pattern recognition models, partial least squares discriminant analysis (PLS-DA) and support vector machine (SVM), were established, and the overall discrimination performance of the three types of models was compared.[6]Based on the memory mechanism, we established recognition model, which adopts support vector machine as base-classifier, to achieve better classification effect and eliminate the imbalance of data set.[7]This paper constructs an abnormal behavior recognition model of buses based on improved support vector machine.[8]Therefore, a novel approach to the oil layer recognition model using the improved whale swarm algorithm (WOA) and semi-supervised support vector machine (S3VM) is proposed in this paper.[9]
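A minimal sketch of the SVM-based recognition pattern that recurs above, assuming fixed-length per-utterance acoustic feature vectors (the feature dimension, class count, and hyperparameters are illustrative, not taken from any of the cited systems):

```python
# Hypothetical SVM emotion recogniser on dummy per-utterance feature vectors.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.rand(300, 40)                       # dummy acoustic feature vectors
y = np.random.randint(0, 4, 300)                  # 4 emotion classes
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
model.fit(X, y)
print(model.predict(X[:5]))
```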
To overcome this challenge, we propose an automated facial recognition model to identify NS using a novel deep convolutional neural network (DCNN) with a loss function called additive angular margin loss (ArcFace).[1]To mark users’ affective preferences, we established an affective recognition model by Kansei engineering and deep convolutional neural networks.[2]We construct the face detection and recognition model based on deep convolutional neural network and triple loss function to realize the detection and recognition of face regions of students.[3]We present a saliency-based patch sampling strategy for recognizing artistic media from artwork images using a deep media recognition model, which is composed of several deep convolutional neural network-based recognition modules.[4]The proposed hybrid framework consists of three main parts: 1) A Deep Convolutional Neural Network (DCNN) and Deep Neural Network (DNN) based audio-visual multi-modal depression recognition model for estimating the Patient Health Questionnaire depression scale (PHQ-8); 2) A Paragraph Vector (PV) and Support Vector Machine (SVM) based model for inferring the physical and mental conditions of the individual from the transcripts of the interview; 3) A Random Forest (RF) model for depression classification from the estimated PHQ-8 score and the inferred conditions of the individual.[5]
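Once a deep convolutional network produces face embeddings, recognition or verification typically reduces to comparing embedding similarity. The following is a generic, hypothetical illustration of that step with random vectors standing in for real embeddings; the enrolled identities and the 0.6 acceptance threshold are assumptions:

```python
# Hypothetical verification step on top of DCNN face embeddings.
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

enrolled = {"person_a": np.random.rand(512), "person_b": np.random.rand(512)}
probe = np.random.rand(512)                        # embedding from the face model

best = max(enrolled, key=lambda k: cosine_similarity(probe, enrolled[k]))
accepted = cosine_similarity(probe, enrolled[best]) > 0.6   # assumed threshold
print(best, accepted)
```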
We propose using deep recurrent neural networks (DRNNs) for building recognition models capable of capturing long-range dependencies in variable-length input sequences.[1]Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.[2]As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.[3]Aiming at the insufficient text semantic representation of the traditional word vector model and the inability of the recurrent neural network (RNN) model to solve the problems of long-term dependence, a Chinese clinical electronic medical record named entity recognition model XLNet-BiLSTM-MHA-CRF based on XLNet is proposed.[4]
The existing partial differential equation recognition models based on average curvature motion are all edge-based and need to use the external force defined by the image gradient to attract the zero level set (evolution curve) to move to the target edge and finally stay on the target edge.[1]The algorithm uses weight coefficients to organically combine high-order partial differential equation models, retains the advantages of the second- and fourth-order partial differential recognition models, effectively improves the ability to maintain image edge detail information, and achieves better recognition results.[2]Both theoretical analysis and experimental results show that the network image recognition model based on adaptive partial differential equation diffusion is more effective than the model based on partial differential equation recognition; at the same time, experiments show that the network image recognition model based on adaptive partial differential equation diffusion is more effective than the network image recognition model based on ordinary diffusion.[3]
Therefore, based on the deep convolution neural network and the transfer learning strategy, an intelligent recognition model of surface defects of copper strip is established in this paper.[1]In this paper, a Tai Le characters recognition model base on the deep convolution neural network (DCNN) has been established.[2]Firstly, a focus state recognition model, which is essentially an image classification model based on a deep convolution neural network, is established to identify the focus states of the microscopy system.[3]
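As a rough illustration of the transfer-learning strategy mentioned above (not the cited papers' actual configurations), one can reuse an ImageNet-pretrained backbone, freeze it, and train only a new classification head; torchvision >= 0.13 and an illustrative number of defect classes are assumed:

```python
# Hypothetical transfer-learning setup: frozen pretrained backbone + new head.
import torch.nn as nn
from torchvision import models

n_defect_classes = 6                                    # illustrative number of classes
backbone = models.resnet18(weights="IMAGENET1K_V1")     # downloads ImageNet weights
for p in backbone.parameters():
    p.requires_grad = False                             # freeze transferred layers
backbone.fc = nn.Linear(backbone.fc.in_features, n_defect_classes)  # trainable head
# train only backbone.fc.parameters() on the target-domain images
```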
To overcome the problem, this paper adopted a method based on a CNN (convolutional neural network) and an LSTM (long short-term memory network) to build gait recognition models.[1]
First, the process of building health status recognition is analyzed and the factors that affect the recognition effect are identified; the main influencing factors are then selected for building health status recognition modeling, and machine learning algorithms are introduced to describe the internal connection between the building health status and its influencing factors, so that a building health status recognition model is established; finally, the effectiveness and superiority of the method are analyzed using specific building health status recognition examples.[1]
The main work of this paper is as follows: (1) making a sample data set containing about 49,000 pictures; (2) constructing a convolutional neural network; (3) training an image recognition model with a correct recognition rate of 96%; (4) realizing tea picture recognition; (5) building the web page.[1]
First, the physical education teaching activities required by the new curriculum reform were studied with regard to the actual needs of China’s current social, political, and economic development; next, the application of artificial intelligence technology to physical education teaching activities was proposed; and finally, deep learning technology was studied and a human movement recognition model based on a long short-term memory (LSTM) neural network was established to identify the movement state of students in physical education teaching activities.[1]
We have also benchmarked the performance of existing face recognition models on the proposed IMFW dataset.[1]
Finally, we obtained high-precision and lightweight face recognition models with fewer parameters and lower computational cost, making them more suitable for practical applications.[2]
SLAM-based trackless navigation, deep learning-based visual recognition, a face recognition model, and intelligent voice interaction technology are applied in the system.[3]
Based on this, we summarize facial emotion cognition among patients with schizophrenia, introduce the internationally recognized Bruce–Young face recognition model, and review the behavioral and event-related potential studies on the recognition of emotions at each stage of the face recognition process, including suggestions for the future direction of clinical research to explore the underlying mechanisms of schizophrenia.[4]
In this method, we generate the attack by occluding facial landmarks to fool the face recognition model.[5]
The face recognition model trained on datasets (VGGFace2 and MS-Celeb-1M) augmented with our SATGAN achieves better accuracy on cross-age datasets such as Cross-Age LFW and AgeDB-30.[6]
DNN-based face recognition models require large centrally aggregated face datasets for training.[7]
These modulations of visual and semantic representations are consistent with face recognition models which emphasize the degree of familiarity but do not distinguish between different types of familiarity.[8]
The experimental results reveal that the subspace discriminant ensemble-based face recognition model performed consistently under most image processing attacks.[9]
Unlike other existing face swapping works that only use a face recognition model to preserve identity similarity, we propose a 3D shape-aware identity to control the face shape with geometric supervision from 3DMM and a 3D face reconstruction method.[10]
In the study, we used the multi-task cascaded convolutional neural network (MTCNN) to identify facial features, applied the MTCNN face detection algorithm frame by frame to the sampled frame images, and normalized the cropped single-face region images to the input size of the face recognition model.[11]
A non-parametric low-resolution face recognition model for resource-constrained environments with limited networking and computing is proposed in this work.[12]
Moreover, these approaches require computationally heavy retraining of the deployed face recognition model and are thus hard to integrate into existing systems.[13]
Typically, real-world requirements for deploying face recognition models in unconstrained surveillance scenarios demand identifying low-resolution faces with extremely low computational cost.[14]
Deep neural networks, particularly face recognition models, have been shown to be vulnerable to both digital and physical adversarial examples.[15]
BiasGAN can be inserted as a preprocessor prior to conducting adversarial attacks on face recognition models to achieve better attack performance.[16]
In this paper, we propose a robust face recognition model called DeepWTPCA-$L_{1}$ using WTPCA-$L_{1}$ features and a CNN-LSTM architecture.[17]
It provides an accurate face recognition model which can be used for safety and security purposes.[18]
We measure the extent to which this appearance bias can be embedded in state-of-the-art face recognition models and benchmark learning performance for subjective perceptions of personality traits from faces.[19]
This article proposes the use of generative adversarial networks (GANs) via StyleGAN2 to create high-quality synthetic thermal images and obtain training data to build thermal face recognition models using deep learning.[20]
Hence, this research work addresses the problem of developing a children's face recognition model with a suitable dataset.[21]
Using a DCNN with RetinaFace as the face detection model and the ArcFace loss function, and adopting CRISP-DM, the research contributes a method to develop a face dataset with 1,200 identities and a face recognition model with 92 percent accuracy that is able to recognize Indonesian people wearing face masks.[22]
The resulting optimal parameters were used during the training of face recognition models.[23]
In this paper, we propose a novel Fast FAce Recognizer (Fast-FAR), learning to improve the speed of DCNN-based face recognition models without sacrificing recognition accuracy.[24]
This will be an input to the face recognition model, which will train and alert the user whenever a violation occurs.[25]
Therefore, SFace can strike a better balance between decreasing the intra-class distances for clean examples and preventing overfitting to label noise, and contributes to more robust deep face recognition models.[26]
We propose an efficient and improved approach to training a face recognition model, with potential applications in banking operations among other domains.[27]
Experimental results show that the proposed approach can generate imperceptible face AEs on the CelebA dataset with a high attack success rate in fooling the state-of-the-art face recognition model.[28]
Firstly, a simplified multi-task face recognition model is proposed and designed to speed up operation; secondly, the correlation among multiple learning tasks is used to improve the recognition accuracy of the model; after model training and selection, an end-to-end multi-task face recognition model is obtained.[29]
The present study tries to develop a robust face recognition model in an unconstrained environment using deep learning.[30]
To improve safety management of operating sites in power systems, a face recognition model based on a convolutional neural network (CNN), eXtreme gradient boosting (XGBoost), and model fusion is built.[31]
The various advanced features are developed using pre-trained and newly trained deep learning and face recognition models, a text-to-speech module along with a buzzer, ultrasonic sensors, a camera, and Raspberry Pi and Arduino boards, at an approximate cost of Rs 5000-6000.[32]
In this study, we investigate the use of state-of-the-art deep learning face recognition models to evaluate their capacity for discrimination between sibling faces using various similarity indices.[33]
Experimental results on two state-of-the-art face recognition models show that the maximum success rate of the proposed attack method reaches 100% on the DeepID1 and VGGFace models, while the accuracy degradation of the target recognition models is as low as 0.[34]
However, the state-of-the-art general face recognition models do not generalize well to occluded face images, which are common cases in real-world scenarios.[35]
The performance of a convolutional neural network (CNN) based face recognition model largely relies on the richness of labeled training data.[36]
We evaluate the proposed methods against state-of-the-art online and offline face recognition models such as Clarifai.[37]
Then, the Self-Assessment-Manikin (SAM) scale and a face recognition model were used to measure people's valence–arousal emotion values.[38]
Traditional CNN-based face recognition models trained on existing datasets are almost ineffective under heavy occlusion.[39]
To further improve the face recognition performance of SPGAN, we take advantage of the face identity prior by sending two inputs to the discriminator: an input face image (either a real HR face image or its corresponding SR face image) and its face features extracted from a pre-trained face recognition model.[40]
While wearing a mask is a necessary public health measure, the social phenomenon raises new challenges to existing face recognition models.[41]
Motivated by the success of deep convolutional networks and transfer learning, this paper proposes an end-to-end hyperspectral face recognition model based on a lightweight Convolutional Neural Network (CNN) and transfer learning.[42]
As part of our research, we analyzed the performance of classifiers based on deep learning face recognition models in detecting dysmorphic features.[43]
As experiments showed, the best face recognition technique among the 9 combinations was the Dlib-based face recognition model (combined with an SVM classifier), as it achieved the highest rate of distinguishing people from each other.[44]
Rectilinear face recognition models suffer from severe performance degradation when applied to fisheye images captured by 360° back-to-back dual fisheye cameras.[45]
Nowadays, face recognition models with excellent performance are mostly based on deep neural networks (DNNs).[46]
However, training a face recognition model requires collecting private data on a centralized server to obtain high performance in the desired domain.[47]
In this research, we present a hybrid face recognition model (HFRM) using machine learning methods with Speeded-Up Robust Features (SURF), the scale-invariant feature transform (SIFT), Locality Preserving Projections (LPP), and the Principal Component Analysis (PCA) method.[48]
Additionally, an extended Injured Face (IF-V2) database of 150 subjects is presented to evaluate the performance of face recognition models.[49]
Moreover, the competition considered the deployability of the proposed solutions by taking the compactness of the face recognition models into account.[50]
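Several of the face recognition entries above rely on an additive angular margin (ArcFace-style) loss. As a reference point only, here is a minimal PyTorch-style sketch of such a loss head; it is not taken from any of the cited works, and the class name ArcFaceHead and the default scale s=64 and margin m=0.5 are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin classification head (illustrative sketch).

    Embeddings and class weights are L2-normalized, the margin m is added
    to the angle of the ground-truth class only, and the result is scaled
    by s before the usual cross-entropy loss.
    """
    def __init__(self, embedding_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # cosine similarity between normalized embeddings and class centers
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        # add the angular margin only to the ground-truth class
        one_hot = F.one_hot(labels, num_classes=self.weight.size(0)).float()
        logits = self.s * torch.cos(theta + self.m * one_hot)
        return F.cross_entropy(logits, labels)

In practice the margin and scale would be tuned to the embedding network and training dataset at hand.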
In this work, we have developed an emotion recognition model using EEG-based causal connectivity patterns.[1]
The existing emotion recognition models, which use stimuli such as music and pictures in controlled lab settings and a limited number of emotion classes, have low ecological validity.[2]
In this paper, a deep learning music emotion recognition model based on musical stage effects is studied.[3]
Our emotion recognition model combines the gradient reversal technique with an entropy loss function as well as the soft-label loss, and the experimental results show that domain transfer learning methods can be employed to alleviate the domain mismatch between different elicitation approaches.[4]
In addition, we adopt a spectrogram augmentation technique to generate additional training data samples by applying random time-frequency masks to log-mel spectrograms to mitigate overfitting and improve the generalization of emotion recognition models.[5]
In this paper, we compare the recognition performance and robustness of two multimodal emotion recognition models: deep canonical correlation analysis (DCCA) and bimodal deep autoencoder (BDAE).[6]
There are existing emotion recognition models that help understand the state of a person, but mainly via text.[7]
The Positive and Negative Affect Scale (PANAS) and the feature selection of the max-min ant system were applied to obtain the nonlinear features and an emotion recognition model suitable for the virtual scenes.[8]
Therefore, a machine learning-based text emotion recognition model using emotive features is proposed and evaluated on the SemEval-2019 dataset.[9]
However, the non-stationary characteristics and individual differences of EEG limit the generalization of emotion recognition models across time and across subjects.[10]
A multimodal emotion recognition model from speech and text was proposed in this paper to optimize the performance of the emotion recognition system.[11]
So, it is highly recommended to apply a noiseless image to the facial emotion recognition model for classification.[12]
The dimensionality of the spatially distributed channels and the temporal resolution of electroencephalogram (EEG) based brain-computer interfaces (BCI) undermine emotion recognition models.[13]
In many practical applications, a speech emotion recognition model learned on a source (training) domain but applied to a novel target (testing) domain degrades significantly due to the mismatch between the two domains.[14]
By comparing the performance of the CNN-based English speech emotion recognition model and the model proposed in this paper, the statistical comparison data are plotted as a statistical graph.[15]
To make up for the deficiency in unearthing the intrinsic relationships between modalities, a novel modularized multimodal emotion recognition model based on deep canonical correlation analysis (MERDCCA) is proposed in this letter.[16]
With the help of machine learning, this paper presents a method to extract the facial expression features of students in business English classes, and establishes a student emotion recognition model, which consists of such modules as emotion mechanism, signal acquisition, analysis and recognition, emotion understanding, emotion expression, and wearable equipment.[17]
In order to mitigate this problem, we propose a majority-voted emotion recognition framework that constructs listener-dependent (LD) emotion recognition models.[19]
Many affective computing studies have developed automatic emotion recognition models, mostly using emotional images, audio and videos.[20]
However, it is challenging to build a desirable emotion recognition model due to the scarcity of emotion-labeled data, especially in Korean.[22]
It can be concluded that in emotion recognition from speech, the choice and application of dimensionality reduction of audio features impact the results that are achieved; therefore, by working on this aspect of the general speech emotion recognition model, it may be possible to make great improvements in the future.[23]
The platform will be further extended to include more multi-source learning models and serve as an open-source platform for the development and evaluation of multi-source emotion recognition models for HCSE.[24]
26% and for the emotion recognition model, accuracy achieved is 69.[25]
Different from related research, a novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK) is proposed in this paper.[26]
In this work, we create a baseline speech emotion recognition model based on convolutional neural networks using the RAVDESS dataset.[27]
This paper proposes a speech emotion recognition model based on a Convolutional Neural Network (CNN).[28]
In this paper, a multimodal emotion recognition model based on a many-objective optimization algorithm is proposed for the first time.[29]
These results demonstrate the advantage of our emotion recognition model over the current studies in terms of classification accuracy.[30]
In conclusion, a generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after treatment in patients with MDD.[31]
It is the basic starting point; it contains the knowledge model and data model of the emotion recognition modeling method, and it mainly analyzes the expression, transfer, and recognition of musical tone in the process of the psychological mode.[32]
Results showed an improved accuracy compared with previous studies that merely used EEG signals, in both the arousal and valence levels, which suggests the effectiveness of our proposed multi-modal fused emotion recognition model.[33]
Due to individual variability, training a generic emotion recognition model across different subjects is difficult.[34]
This paper proposes a multimodal emotion recognition model based on a multiobjective optimization algorithm.[35]
The proposed emotion recognition model is evaluated on two benchmark public databases, namely DEAP and SEED.[36]
In this paper, we propose a Happy Emotion Recognition model using the 3D hybrid deep and distance features (HappyER-DDF) method to improve accuracy by extracting and utilizing two different types of deep visual features.[37]
There are several advantages; an efficient facial emotion recognition model can help with self-discipline and control over drivers while they are driving the vehicle.[38]
Considering the low recognition rate of traditional music emotion recognition models, a music emotion recognition model based on deep learning is proposed in this paper.[39]
The complex narratives and naturalistic expressions in this dataset provide a challenging test for contemporary time-series emotion recognition models.[40]
Moreover, the study designs a multimodal emotion recognition model of Chinese painting based on an improved AlexNet neural network and the chi-square test.[41]
In line with this problem, an EEG-based emotion recognition model is developed, and a new emotion recognition method based on deep learning is proposed.[42]
This paper investigates how to use an attention mechanism to fuse the time-series information of facial expression and speech, and proposes a multi-modal feature fusion emotion recognition model based on the attention mechanism.[43]
Through experimental simulation on the specified dataset, it can be shown that this model is superior to the current mainstream facial emotion recognition models in the performance of facial emotion detection.[44]
The resulting improvement in the quality of emotion recognition in comparison with the known solutions confirms the feasibility of applying multitask learning to increase the accuracy of emotion recognition models.[45]
Our previous research showed promising results when transferring features learned from speech to train emotion recognition models for music.[46]
Traditional machine learning-based emotion recognition models have shown effective performance for classifying electroencephalography (EEG) based emotions.[47]
The purpose of this study is to propose an alternative efficient 3D emotion recognition model for variable-length electroencephalogram (EEG) data.[48]
However, insufficient high-quality training data are available for building EEG-based emotion recognition models via machine learning or deep learning methods.[49]
The experimental results show that the performance of the multi-modal fusion model proposed in this paper is superior to that of the single-modal emotion recognition model.[50]
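One of the entries above augments emotion recognition training data by applying random time-frequency masks to log-mel spectrograms. Below is a minimal NumPy sketch of that idea (SpecAugment-style masking); the function name mask_log_mel and all mask widths and counts are illustrative assumptions rather than values from the cited study.

import numpy as np

def mask_log_mel(spec, num_freq_masks=2, num_time_masks=2,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Apply random time/frequency masks to a log-mel spectrogram.

    spec: 2-D array shaped (num_mel_bins, num_frames). Masked regions are
    filled with the spectrogram mean so the dynamic range stays plausible.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    fill = out.mean()
    n_mels, n_frames = out.shape
    for _ in range(num_freq_masks):
        width = rng.integers(0, max_freq_width + 1)
        start = rng.integers(0, max(1, n_mels - width))
        out[start:start + width, :] = fill      # frequency mask
    for _ in range(num_time_masks):
        width = rng.integers(0, max_time_width + 1)
        start = rng.integers(0, max(1, n_frames - width))
        out[:, start:start + width] = fill      # time mask
    return out

Each training example can be masked independently on every epoch, which is what makes this a cheap way to mitigate overfitting.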
In order to further explore and solve the problem of distributed training of edge-intelligent applications on edge devices close to the data source, a joint training method based on multi-source federated learning is proposed for a robust speech recognition model.[2]
We apply MixSpeech to two popular end-to-end speech recognition models, LAS (Listen, Attend and Spell) and Transformer, and conduct experiments on several low-resource datasets including TIMIT, WSJ, and HKUST.[3]
An objective evaluation of the benefit in speech recognition thresholds in noise using an ASR-based speech recognition model suggests that more than half of the class D loss due to increased level uncertainty might be compensable.[4]
Dysarthria, heavy accents, and deaf and hard-of-hearing speech characteristics prove difficult for smart assistants to interpret despite the large amounts of diverse data used to train automatic speech recognition models.[5]
Recent methods apply frame-level bottleneck features extracted from an end-to-end sequence-to-sequence speech recognition model.[6]
In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoChA) jointly trained with enhancement layers.[7]
The DNN-HMM speech recognition model and the traditional GMM-HMM speech recognition model are used to preprocess the original corpus, and the accuracy of the corpus processing is compared.[8]
The selection of the speech recognition modeling unit is the primary problem of acoustic modeling in speech recognition, and different acoustic modeling units will directly affect the overall performance of speech recognition.[9]
Specifically, we investigate a two-stage training scheme that first applies a feature-level optimization criterion for pretraining, followed by an ASR-oriented optimization criterion using an end-to-end (E2E) speech recognition model.[10]
We propose a multitask training method for attention-based end-to-end speech recognition models to better incorporate language-level information.[11]
The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface.[12]
The use of a speech recognition model has become extremely important.[13]
Phoneme-level alignments of the utterances produced by speech recognition models are used to extract articulatory feature vectors representing correct and substituted sounds from L1 and L2 speaker groups, respectively.[14]
Formulae to convert between SRT and percentage-correct were derived from basic concepts that underlie standard speech recognition models.[15]
In recent years, there has been a great deal of research in developing end-to-end speech recognition models, which simplify the traditional pipeline and achieve promising results.[16]
We propose using federated learning, a decentralized on-device learning paradigm, to train speech recognition models.[17]
While end-to-end speech recognition models show impressive performance in many domains, they have difficulty decoding long-form utterances.[18]
To enable experiment reproducibility and ease corpus usage, we also released an ESPnet recipe for our speech recognition models.[20]
Our solution was built on deep speech recognition layers, namely the first two convolutional layers of Baidu's pre-trained 2015 speech recognition model.[21]
In this paper, we propose methods to compute confidence scores on the predictions made by an end-to-end speech recognition model in a 2-pass framework.[22]
We adopt both of these methods during the training of transformer-transducer speech recognition models, and show consistent WER improvements on LibriSpeech as well as across different languages.[23]
Most of the datasets for speech recognition models consist of data collected from adult speakers.[24]
The linguistic artificial intelligence teaching model can be assisted by the intelligent speech recognition model.[25]
By training multilingually, we build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.[26]
When trained on related or low-resource languages, multilingual speech recognition models often outperform their monolingual counterparts.[27]
In the first part, I will talk about NLP Beyond Text, where we integrate visual context into a speech recognition model and find that the recovery of different types of masked speech inputs is improved by fine-grained visual grounding against detected objects [2].[28]
To the best of our knowledge, this is the first work on the detection of adversarial attacks on audiovisual speech recognition models.[29]
In contrast, we propose an end-to-end non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).[30]
In this paper, we present a novel speech recognition model, the Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency, making it suitable for streaming decoding in on-device speech recognition.[31]
As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.
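The MixSpeech entry above mixes pairs of training utterances. A rough NumPy sketch of the underlying mixup idea on acoustic feature matrices is given below; the function name mix_features and the Beta(alpha, alpha) mixing prior are illustrative assumptions, and the combination of the two transcripts' recognition losses (weighted by lam and 1 - lam) is model-specific and not shown.

import numpy as np

def mix_features(feats_a, feats_b, alpha=0.5, rng=None):
    """Mixup-style interpolation of two acoustic feature matrices.

    feats_a, feats_b: arrays shaped (num_frames, feature_dim) with the
    same feature_dim. The shorter utterance is zero-padded to the longer
    one and the frames are linearly interpolated with weight lam.
    Returns the mixed features and lam, which a training loop would use
    to weight the losses of the two reference transcripts.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    n = max(feats_a.shape[0], feats_b.shape[0])
    dim = feats_a.shape[1]
    pad_a = np.zeros((n, dim)); pad_a[:feats_a.shape[0]] = feats_a
    pad_b = np.zeros((n, dim)); pad_b[:feats_b.shape[0]] = feats_b
    return lam * pad_a + (1.0 - lam) * pad_b, lam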