José Cañete
MSc. in Computer Science based in Chile 🇨🇱
👨‍🎓 Education
MSc. in Computer Science - Thesis
Universidad de Chile - Santiago, Chile (2021 - 2023)
Grade: 7/7 - Class Rank: 1/9
BSc. in Computer Science and Engineering
Universidad de Chile - Santiago, Chile (2014 - 2021)
Grade: 7/7 - Class Rank: 1/94
🧑‍💻 Work Experience
Expert Machine Learning Engineer @ Walmart Chile (Sept 2023 - now)
AI and Data Exploitation team on Walmart Data.
- MLE for a team of 6 Data Scientists and a Product Owner working on models for the optimization of the Supply Chain.
- In charge of developing clean, efficient and scalable machine learning pipelines to be served in production environments.
- Developing systems for the CI/CD/CT of the solutions.
- Reviewing the entire team's code and providing feedback to ensure good software practices.
- Technologies used: Python, Kubeflow Pipelines, Vertex AI.
Senior Machine Learning Engineer @ Walmart Chile (Mar 2023 - Sept 2023)
AI and Data Exploitation team on Walmart Data.
- MLE for a team of 6 Data Scientists and a Product Owner working on models for the optimization of the Supply Chain.
- In charge of developing clean, efficient and scalable machine learning pipelines to be served in production environments.
- Developing systems for the CI/CD/CT of the solutions.
- Reviewing the entire team's code and providing feedback to ensure good software practices.
- Technologies used: Python, Kubeflow Pipelines, Vertex AI.
Machine Learning Engineer @ National Center for Artificial Intelligence (CENIA) (Mar 2022 - Jan 2023)
Chile's national research and development center for AI.
- Conducted research in the AI Medical Imaging group, exploring various pre-training methods using images or multimodal (language-image) techniques, including CLIP, MAE, and Multimodal MAEs. Contributed to a paper published at the MedNeurIPS 2022 workshop.
- Participated in the NLP research group and presented papers on Multimodal models (CLIP and Multimodal MAE).
- Technologies used: Python, PyTorch.
Lead Data Scientist @ Adereso (Jan 2021 - Nov 2021)
Helpdesk software for messaging and social customer service.
- Served as the technical lead for the Adereso Minds team, supervising a Data Scientist and a Full Stack Engineer.
- Led the development of AI and data-related projects from conception to production using agile methodologies and cloud technologies.
- Technologies used: Python, PyTorch, Pandas, Tornado, BigQuery, AWS, GCP.
Data Scientist @ Adereso (Jan 2019 - Dec 2020)
Helpdesk software for messaging and social customer service.
- Developed, tested, and deployed various NLP-based Machine Learning models into production.
- Trained models such as BERT and its variants (ALBERT, S-BERT, among others) to understand Spanish informal text messages from social networks.
- Designed and implemented a robust NLP microservice, featuring sentiment analysis, named-entity recognition, and intent classification, handling over 300,000 messages daily.
- Technologies used: Python, PyTorch, Pandas, Tornado, BigQuery, AWS, GCP.
📝 Publications
- 2024
- Speedy Gonzales: A Collection of Fast Task-Specific Models for Spanish in *SEM 2024 @ NAACL, Mexico City, Mexico.
- José Cañete, Felipe Bravo-Marquez. Paper, Repo.
- 2023
- Light and Fast Language Models for Spanish Through Compression Techniques - Master’s Thesis - Universidad de Chile
- José Cañete. Thesis, Repo.
- 2022
- Two-stage Conditional Chest X-ray Radiology Report Generation in MedNeurIPS 2022, New Orleans, USA.
- Pablo Messina, José Cañete, Denis Parra, Alvaro Soto, Cecilia Besa, Jocelyn Dunstan. Paper, Repo.
- ALBETO and DistilBETO: Lightweight Spanish Language Models in LREC 2022, Marseille, France.
- José Cañete, Sebastián Donoso, Felipe Bravo-Marquez, Andrés Carvallo, Vladimir Araujo. Paper, Repo.
- Evaluation Benchmarks for Spanish Sentence Representations in LREC 2022, Marseille, France.
- Vladimir Araujo, Andrés Carvallo, Souvik Kundu, José Cañete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens, Alvaro Soto. Paper, Repo.
- 2020
- Spanish Pre-trained BERT Model and Evaluation Data in PML4DC @ ICLR 2020, Addis Ababa, Ethiopia.
- José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jou-Hui Ho, Hojin Kang and Jorge Perez. Paper, Repo.
🏆 Awards
Outstanding Student in Research
DCC UChile (2019 - 2020)
1st Place
Winter Competitive Programming Camp of Chile (2017)
3rd Place
Chilean Programming Contest (2017)
3rd Place
ACM ICPC Chile (2017)
9th Place and First to Solve
ACM ICPC Chile (2016)
👨‍🏫 Conferences and Summer Schools
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
Mexico City, Mexico (2024)
Language Resources and Evaluation Conference (LREC)
Marseille, France (2022)
IV Workshop Millennium Institute for Foundational Research on Data (IMFD)
Santa Cruz, Chile (2022)
International Conference on Learning Representations (ICLR)
Virtual (2020)
Khipu AI - Latin American Meeting in Artificial Intelligence
Montevideo, Uruguay (2019)
II Workshop Millennium Institute for Foundational Research on Data (IMFD)
Viña del Mar, Chile (2019)
Brazilian ICPC Summer School
Campinas, São Paulo, Brazil (2017)
Winter Competitive Programming Camp Chile (ICPCCL)
Chile (2015 and 2016)
Outreach Assistant @ Universidad de Chile (Dec 2015 - Dec 2019)
Engineering faculty communications department.
- Gave talks and tours about engineering and sciences at the faculty to high school students.
- Organized the 2018 “regions plan”, in which we visited schools all over Chile to give talks.
- Co-organized the faculty’s Engineering and Science Fair 2018, the largest outreach event of Universidad de Chile, which had more than 16,000 attendees.
⌨️ Programming
DevOps: Docker, GCP, AWS, Kubeflow Pipelines
Data: SQL, Apache Beam, Apache Spark, ElasticSearch, Pandas
Machine Learning: PyTorch, HuggingFace
Languages: Python
💬 Languages
Spanish: Native
English: B2