Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew

AI-generated keywords: DictaLM

AI-generated Key Points

DictaLM is a large generative language model designed for Modern Hebrew with 7 billion parameters
Both the foundation model and instruct-tuned model are available under a Creative Commons license
Introduction of DictaLM-Rab, a foundation model tailored for Rabbinic/Historical Hebrew
Transformer-based architecture with enhancements including normalization techniques, GeLU activation functions, rotary embeddings, and separate embedding/output weights
Training details and hyperparameters provided using the NeMo framework
Models offered as starting points for fine-tuning various Hebrew-specific tasks such as instruction, Q&A, sentiment analysis
Dedication to promoting research in Hebrew NLP and fostering innovation within natural language processing


Authors: Shaltiel Shmidman, Avi Shmidman, Amir David Nissan Cohen, Moshe Koppel

arXiv: 2309.14568v1 (cs.CL)

License: CC BY 4.0

Abstract: We present DictaLM, a large-scale language model tailored for Modern Hebrew. Boasting 7B parameters, this model is predominantly trained on Hebrew-centric data. As a commitment to promoting research and development in the Hebrew language, we release both the foundation model and the instruct-tuned model under a Creative Commons license. Concurrently, we introduce DictaLM-Rab, another foundation model geared towards Rabbinic/Historical Hebrew. These foundation models serve as ideal starting points for fine-tuning various Hebrew-specific tasks, such as instruction, Q&A, sentiment analysis, and more. This release represents a preliminary step, offering an initial Hebrew LLM model for the Hebrew NLP community to experiment with.

Submitted to arXiv on 25 Sep. 2023


Results of the summarizing process for the arXiv paper: 2309.14568v1

Comprehensive Summary

The paper presents DictaLM, a large generative language model specifically designed for Modern Hebrew. With an impressive 7 billion parameters, this model is extensively trained on Hebrew-centric data to ensure accuracy and relevance. In support of research and development in the Hebrew language, the authors have made both the foundation model and the instruct-tuned model available under a Creative Commons license. Additionally, they introduce DictaLM-Rab, another foundation model tailored for Rabbinic/Historical Hebrew, catering to a broader range of linguistic needs.

The architecture of DictaLM is based on the transformer architecture with several enhancements aimed at improving training stability and overall performance. These enhancements include normalization techniques, GeLU activation functions, rotary embeddings for extending sequence length without compromising performance, and separate embedding and output weights for better performance. The authors provide training details and hyperparameters using the NeMo framework known for its optimization in training compute-heavy machine learning models.

Furthermore, the authors highlight their dedication to promoting research in Hebrew NLP by offering these models as ideal starting points for fine-tuning various Hebrew-specific tasks such as instruction, Q&A, sentiment analysis, among others. This release marks an initial step towards providing a comprehensive language model for the NLP community to experiment with. The authors' commitment to advancing research in Modern Hebrew through sophisticated language models showcases their dedication to fostering innovation and development within the field of natural language processing.


Summary

- DictaLM is a big computer program that helps with Hebrew language.
- There are two types of models available for people to use for free.
- A special model called DictaLM-Rab was made for historical Hebrew.
- The program uses special techniques to work better and faster.
- People can use these models to help with different tasks in Hebrew language.

Definitions

- Generative: Creating something new or original
- Parameters: Settings or values that control how something works
- Foundation model: Basic version or starting point of a program
- Architecture: Design or structure of a system
- Hyperparameters: Settings that control the training process of a model

Introducing DictaLM: A Large Generative Language Model for Modern Hebrew

Natural language processing (NLP) has seen significant advancements in recent years, with the development of large-scale language models such as GPT-3 and BERT. These models have achieved impressive results on various NLP tasks, including text generation, translation, and sentiment analysis. However, most of these models are trained on English-centric data, leaving other languages at a disadvantage. In an effort to bridge this gap and promote research in non-English languages, Shaltiel Shmidman, Avi Shmidman, Amir David Nissan Cohen, and Moshe Koppel have developed DictaLM, a large generative language model specifically designed for Modern Hebrew. In their paper, "Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew," submitted to arXiv on 25 September 2023, the authors introduce the model and its architecture and highlight its potential applications.

The Need for a Hebrew-Centric Language Model

Hebrew is one of the oldest languages still in use today and is considered sacred by millions of people worldwide. It is also the official language of Israel and holds great cultural significance within Jewish communities globally. Despite this, research on NLP tools for Hebrew has been limited, both because of the language's complex morphology and because far less data is available than for widely spoken languages like English. This lack of resources makes it challenging to build accurate systems for Hebrew-language tasks such as instruction following, Q&A, or sentiment analysis. To address this issue, the authors set out to create a large-scale language model tailored explicitly to Modern Hebrew: DictaLM.

The Architecture of DictaLM

DictaLM is based on the transformer, a neural network architecture that is now standard for natural language processing tasks. The authors made several enhancements to this architecture to improve training stability and overall performance. One is the use of normalization techniques, which help keep training stable. The model also uses GeLU activation functions, a smooth alternative to ReLU that is common in transformer language models. To extend sequence length without compromising performance, DictaLM uses rotary embeddings, which encode position by rotating the query and key vectors inside attention by position-dependent angles, so that attention scores depend on the relative distance between tokens. Finally, the model uses separate embedding and output weights rather than sharing one matrix between the input and output layers, which the authors report gives better performance.
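Two of the components named above, GeLU and rotary embeddings, are compact enough to illustrate directly. The following is a minimal sketch in plain Python, not DictaLM's implementation: the function names (`gelu`, `rotate`, `dot`), the exact (erf-based) form of GeLU, and the frequency base of 10000 are all assumptions chosen for illustration. It shows that when a query and a key are both rotated by position-dependent angles, their dot product depends only on the distance between the two positions.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one query or key vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), where d = len(vec) must be even.
    The base value is an assumed default, not taken from the paper.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions are 2 apart, so the two scores should agree.
score_near = dot(rotate(q, 5), rotate(k, 3))
score_far = dot(rotate(q, 12), rotate(k, 10))
print(abs(score_near - score_far) < 1e-9)
```

Because the score depends on relative rather than absolute position, the same rotation rule applies at any offset in the sequence, which is the property usually cited when rotary embeddings are used to support longer contexts.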

Training Details and Hyperparameters

The authors trained DictaLM with the NeMo framework, which is optimized for training compute-heavy machine learning models. The model was trained on a dataset consisting of 1 billion tokens from various sources such as news articles, social media posts, and Wikipedia pages. It was then fine-tuned on an additional 6 billion tokens from Hebrew-centric data sources. The paper reports the hyperparameters used in training, such as batch size, learning rate, and dropout rate, which makes it easier for other researchers to replicate the results or fine-tune the model further for specific tasks.

Promoting Research in Modern Hebrew NLP

One of the most significant contributions of this paper is the release of both the DictaLM foundation model and the instruct-tuned model under a Creative Commons license. This means that anyone can access these models free of charge, subject to the terms of the license (Creative Commons licenses typically require attribution). Alongside them, the authors introduce DictaLM-Rab, another foundation model tailored specifically to Rabbinic/Historical Hebrew, catering to a broader range of linguistic needs within Jewish communities worldwide. With these releases, the authors aim to promote research in Hebrew NLP and provide a starting point for fine-tuning on tasks such as instruction following, Q&A, and sentiment analysis.
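For readers who plan to use the models as a starting point for fine-tuning, a back-of-the-envelope estimate of the memory needed just to hold 7 billion parameters can be useful. The sketch below is simple arithmetic, not a measurement of DictaLM: the helper name `weight_memory_gib` and the list of numeric formats are assumptions for illustration, and the estimate covers weights only, excluding gradients, optimizer state, and activations.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 7e9  # parameter count stated in the paper's abstract

# Bytes per parameter for a few common numeric formats (assumed choices).
FORMATS = (("float32", 4), ("bfloat16/float16", 2), ("int8", 1))

estimates = {name: weight_memory_gib(N_PARAMS, size) for name, size in FORMATS}
for name, gib in estimates.items():
    print(f"{name}: {gib:.1f} GiB")
```

Full fine-tuning typically needs several times the weight memory, since gradients and optimizer state are stored as well, so the figures here are a lower bound rather than a hardware requirement.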

Conclusion

In conclusion, the development of DictaLM is a significant step towards promoting research in Modern Hebrew NLP. The large-scale language model with 7 billion parameters and its enhancements make it a valuable resource for researchers looking to develop accurate NLP systems for Hebrew. The authors' commitment to fostering innovation and development within the field of natural language processing through these releases showcases their dedication to advancing research in Modern Hebrew. With this foundation model now available, we can expect to see more advancements in Hebrew NLP and further bridge the gap between English-centric models and other languages.

Created on 27 Feb. 2024

