Light and Fast Language Models for Spanish Through Compression Techniques

Abstract

Large language models (LLMs) have become a prevalent and successful approach to addressing natural language processing (NLP) tasks, including but not limited to document classification, named-entity recognition, and question answering. Despite their remarkable performance, deploying these LLMs in resource-constrained settings, such as web or mobile applications, is challenging, particularly in real-time scenarios that demand fast responses. Techniques to compress these LLMs into smaller and faster models have emerged for English and multilingual settings, but they remain a challenge for other languages. In fact, Spanish is the language with the second-largest number of native speakers, yet it lacks such resources. In this work, we present ALBETO and Speedy Gonzales, two new resources for the Spanish NLP community that aim to bridge the gap in terms of lighter and faster models for Spanish. ALBETO is a set of 5 lightweight models, with sizes ranging from 5M to 223M parameters, pre-trained exclusively on Spanish corpora following the ALBERT architecture. We evaluate our ALBETO models along with other publicly available models for Spanish on a set of 6 tasks and then, by leveraging Knowledge Distillation (KD), we present Speedy Gonzales, a collection of more inference-efficient task-specific language models based on ALBETO. The outcomes of our study reveal that our ALBETO models perform at a level similar to other models with comparable inference speed, despite being lighter and having substantially fewer parameters. Moreover, our ALBETO xxlarge model outperforms all other currently available pre-trained Spanish models. Regarding our Speedy Gonzales models, the results indicate an improvement in inference speed at the expense of a slight decline in task performance. Notably, this decline is minimal for our 8- and 10-layer models, while it is more pronounced in the faster models with 2 to 4 layers. Moreover, our 10-layer model, referred to as ALBETO base-10, delivers performance generally comparable to base-sized models while also offering improved inference speed. All of our models (pre-trained, fine-tuned and distilled) are publicly available at: https://huggingface.co/dccuchile.
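As a rough illustration, the released checkpoints can be loaded with the Hugging Face transformers library. The sketch below assumes a model identifier of the form dccuchile/albert-base-spanish; check the dccuchile hub page for the exact names of the pre-trained, fine-tuned and distilled variants.

```python
# Minimal sketch: loading an ALBETO checkpoint with Hugging Face transformers.
# The identifier "dccuchile/albert-base-spanish" is an assumption; see
# https://huggingface.co/dccuchile for the exact names of the released models.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "dccuchile/albert-base-spanish"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Fill-in-the-blank query on a Spanish sentence.
inputs = tokenizer("Santiago es la capital de [MASK].", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch size, sequence length, vocabulary size)
```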

Publication
Master’s Thesis - Universidad de Chile
José Cañete
Expert Machine Learning Engineer | MSc. in Computer Science

My research interests include Artificial Intelligence and how to deploy and optimize such systems in production environments.