Deep learning-based Hand Gesture Recognition (HGR) via surface Electromyogram (sEMG) signals has shown significant potential for the development of advanced myoelectric-controlled prostheses. However, existing deep learning approaches often struggle to maintain acceptable generalization performance in changing scenarios. To address this challenge, this paper proposes a hybrid framework called the Transformer for Hand Gesture Recognition (TraHGR), which builds on recent advances in hybrid models and transformers. The TraHGR architecture consists of two parallel paths followed by a fusion center that integrates the advantages of each module and provides robustness across different scenarios. The proposed model was evaluated on the second Ninapro dataset (DB2), which includes sEMG signals from 40 healthy users performing 49 gestures in real-life conditions. The paper analyzes the effect of different window sizes (200ms, 150ms, and 100ms) on overall performance and model complexity. TraHGR achieves recognition accuracies of 86.18% on DB2 (49 gestures), 88.91% on DB2-B (17 gestures), 81.44% on DB2-C (23 gestures), and 93.84% on DB2-D (9 gestures), exceeding the reported state of the art for each subset of gestures in the dataset. A statistical comparison of TraHGR with other proposed architectures shows statistically significant accuracy improvements at the 200ms window size. Position-wise cosine similarity analysis further demonstrates the effectiveness of TraHGR in capturing similarities between different embeddings. A comparison table shows TraHGR outperforming previous DNN approaches across window sizes and gesture subsets of the DB2 dataset.
In conclusion, the proposed TraHGR architecture offers a novel approach to hand gesture recognition using sEMG signals, achieving high accuracy rates across various scenarios and outperforming existing methodologies in the field.
- Deep learning-based Hand Gesture Recognition (HGR) via surface Electromyogram (sEMG) signals is promising for myoelectric-controlled prostheses.
- Existing deep learning approaches struggle with generalization in changing scenarios.
- The TraHGR hybrid framework builds on hybrid models and transformers to improve performance.
- The TraHGR architecture includes two parallel paths and a fusion center for robustness across scenarios.
- Evaluation on the Ninapro dataset (DB2) shows a recognition accuracy of 86.18% on 49 gestures, exceeding the reported state-of-the-art results.
- Window size (200ms, 150ms, 100ms) affects both model performance and complexity.
- Statistical analysis shows significant accuracy improvements with TraHGR at the 200ms window size.
- Position-wise cosine similarity analysis highlights the effectiveness of TraHGR in capturing similarities between embeddings.
- A comparison table shows TraHGR outperforming previous DNN approaches across window sizes and gesture subsets of the DB2 dataset.
Summary
1. Using signals from muscles to control artificial limbs is promising.
2. Some computer programs struggle to work well in different situations.
3. A new method called TraHGR uses special models for better results.
4. TraHGR has two paths and a center to work well in different situations.
5. TraHGR is very good at recognizing hand gestures, better than other methods.
Definitions
- Deep learning: A machine learning approach in which multi-layer neural networks learn patterns directly from data.
- Hand Gesture Recognition (HGR): Identifying hand movements or signs using technology.
- Surface electromyogram (sEMG) signals: Electrical signals produced by muscles and recorded from the skin's surface, which can be used to control devices.
- Myoelectric-controlled prostheses: Artificial limbs controlled by muscle signals from the body.
- Transformers: Deep learning models that use attention to relate each part of an input sequence to every other part.
- Fusion center: A place where information from different sources comes together for decision-making.
- Accuracy: How correct or precise something is compared to the truth.
- Dataset: A collection of data used for analysis or testing purposes.
- Window sizes: The lengths of time (here 200ms, 150ms, or 100ms) of the signal segments that the model analyzes one at a time.
Introduction
Hand gesture recognition (HGR) has been a topic of interest in the field of human-computer interaction for many years. It involves identifying and interpreting hand movements to control devices or interact with virtual environments. With the advancement of technology, HGR has become an essential component in various applications such as sign language translation, virtual reality gaming, and prosthetic control.
One approach to HGR is through surface electromyogram (sEMG) signals, which measure the electrical activity generated by muscles during movement. These signals can be captured non-invasively using electrodes placed on the skin's surface, making it a suitable method for real-time gesture recognition.
Deep learning techniques have shown promising results in sEMG-based HGR due to their ability to learn complex patterns from data. However, existing deep learning approaches often struggle to maintain acceptable generalization performance in changing scenarios. This is because they are trained on specific datasets and may not perform well when faced with new users or gestures.
To address this challenge, researchers have proposed hybrid models that combine different deep learning architectures to improve performance across different scenarios. One such model is the Transformer for Hand Gesture Recognition (TraHGR), which leverages recent advances in hybrid models and transformers.
The TraHGR Architecture
The TraHGR architecture consists of two parallel paths followed by a fusion center to integrate the advantages of each module and provide robustness across different scenarios. The first path uses convolutional neural networks (CNNs) to extract features from raw sEMG signals. The second path uses long short-term memory (LSTM) networks to capture temporal dependencies between consecutive sEMG samples.
The outputs from both paths are then fused at the fusion center before being fed into a transformer layer. Transformers have gained popularity in natural language processing tasks due to their ability to handle long sequences effectively while capturing global dependencies between input elements. In the context of HGR, the transformer layer helps to capture relationships between different gestures and improve generalization performance.
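The fuse-then-attend flow described above can be illustrated with a small NumPy sketch. The source does not specify the internals of the two paths, the fusion rule, or the transformer layer, so everything here is an assumption made for illustration: the path outputs are random placeholders, fusion is plain concatenation, and the transformer layer is reduced to a single scaled dot-product self-attention step with hypothetical weight matrices `w_q`, `w_k`, and `w_v`.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (tokens, features)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every token pair
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)

# Placeholder outputs of the two parallel paths (assumed shapes:
# 10 time steps, 16 features each). Real path outputs would come
# from trained networks, not random numbers.
path_a = rng.standard_normal((10, 16))
path_b = rng.standard_normal((10, 16))

# Assumed fusion rule: concatenate the two feature vectors per time step.
fused = np.concatenate([path_a, path_b], axis=-1)  # shape (10, 32)

d = fused.shape[-1]
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(fused, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because every row of `attn` weights all time steps at once, each output token can draw on the whole window, which is the global-dependency property the text attributes to transformers.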
Evaluation on Ninapro Dataset
The proposed TraHGR model was evaluated on the second Ninapro dataset (DB2), which includes sEMG signals from 40 healthy users performing 49 gestures in real-life conditions. Results are reported for the full set of 49 gestures (DB2) and for three gesture subsets that together make up the full set: DB2-B (17 gestures), DB2-C (23 gestures), and DB2-D (9 gestures).
To analyze the effect of different window sizes on overall performance and model complexity, experiments were conducted with window sizes of 200ms, 150ms, and 100ms. The results showed that a window size of 200ms achieved the highest accuracy rates across all subsets.
TraHGR achieved recognition accuracies of 86.18% on DB2 (49 gestures), 88.91% on DB2-B, 81.44% on DB2-C, and 93.84% on DB2-D. These results exceeded the reported state-of-the-art performance for each subset of gestures in the dataset.
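To make the window sizes concrete, the following sketch cuts a multi-channel recording into overlapping windows. It is not the paper's preprocessing pipeline: the 2 kHz sampling rate and 12 channels are assumed values (they match what is commonly reported for Ninapro DB2), the 10ms step is a hypothetical choice, and the signal is synthetic noise.

```python
import numpy as np

def segment_windows(signal, fs_hz, window_ms, step_ms):
    """Split a (samples, channels) array into overlapping windows.

    Returns an array of shape (num_windows, window_samples, channels).
    """
    win = int(round(fs_hz * window_ms / 1000))
    step = int(round(fs_hz * step_ms / 1000))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    n_samples, n_channels = signal.shape
    if n_samples < win:
        return np.empty((0, win, n_channels))
    starts = range(0, n_samples - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

fs_hz = 2000                                 # assumed sampling rate
rng = np.random.default_rng(0)
emg = rng.standard_normal((2 * fs_hz, 12))   # 2 s of synthetic 12-channel data

for window_ms in (200, 150, 100):
    windows = segment_windows(emg, fs_hz, window_ms, step_ms=10)
    print(window_ms, "ms ->", windows.shape)
```

Under these assumptions a 200ms window holds 400 samples per channel and a 100ms window holds 200, so shorter windows give the model less signal per decision but reduce input size and latency.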
Statistical Analysis
Statistical analysis was conducted to compare TraHGR with other proposed architectures using a t-test at a significance level of p<0.05. The results showed statistically significant improvements in accuracy for window sizes of 200ms compared to other models.
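As a rough illustration of such a comparison, the sketch below computes a paired t statistic from per-user accuracies of two models. The pairing by user is an assumption, the function name is hypothetical, and the accuracy values are made-up placeholders, not results from the paper. Only the test statistic and degrees of freedom are computed; turning them into a p-value requires the Student t distribution from a statistics library.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical per-user accuracies (%) for two models; illustrative only.
model_a = [86.0, 88.5, 84.2, 87.9, 85.1]
model_b = [83.4, 86.0, 83.9, 85.2, 82.7]

t, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t:.3f}, degrees of freedom = {dof}")
```

The resulting t value would then be compared against the critical value for the chosen significance level (here 0.05) and the given degrees of freedom.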
Cosine Similarity Analysis
Position-wise cosine similarity analysis further demonstrated the effectiveness of TraHGR in capturing similarities between different embeddings generated by the transformer layer. This shows that TraHGR can effectively learn relationships between different hand gestures and generalize well across various scenarios.
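Cosine similarity between two embedding vectors $u$ and $v$ is $u \cdot v / (\lVert u \rVert \lVert v \rVert)$. A minimal sketch of computing it for every pair of positions follows, assuming the embeddings are stored as rows of a matrix; the matrix here is random and its shape is arbitrary, so it only demonstrates the computation, not the paper's findings.

```python
import numpy as np

def pairwise_cosine_similarity(embeddings):
    """Cosine similarity between every pair of rows of a (positions, dim) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

rng = np.random.default_rng(0)
position_embeddings = rng.standard_normal((8, 32))  # 8 positions, 32 dims (assumed)

similarity = pairwise_cosine_similarity(position_embeddings)
print(similarity.shape)
```

Entry `similarity[i, j]` compares position `i` with position `j`; the diagonal is 1 because each embedding is identical to itself, and the matrix is symmetric.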
Comparison with Previous Approaches
A comparison table showed that TraHGR outperformed previous deep neural network (DNN) approaches across different window sizes and gesture subsets within the DB2 dataset.
Conclusion
In conclusion, the proposed TraHGR architecture offers a novel approach to hand gesture recognition using sEMG signals. By leveraging recent advances in hybrid models and transformers, TraHGR achieves high accuracy rates across various scenarios and outperforms existing methodologies in the field. This research has significant implications for the development of advanced myoelectric-controlled prostheses and other applications that require accurate hand gesture recognition. Future work could involve testing TraHGR on larger datasets and exploring its potential for real-time applications.