I reviewed the recently released Hugging Face course, looking at its content, its offerings, and whether it ticks the right boxes for us.
In this article, I present a quick tour of some libraries that I recently encountered and that could be a great supplement to your machine learning stack. These are not your basic EDA libraries but more advanced ones: a library that compiles trained traditional machine learning models into tensor computations, a topic modeling technique that leverages BERT embeddings, and libraries that enable interpretability for PyTorch models.
Handling categorical variables is an essential part of a machine learning pipeline. There are many ways to encode categorical variables, and pandas’ dummy variable encoding is one of them. However, this encoding technique comes with its own limitations, and in this article, I present some workarounds to keep us out of the trap.
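The summary above does not say which limitation the article covers, so the following is only a minimal sketch of one commonly encountered issue, chosen here as an assumption: `pd.get_dummies` encodes each DataFrame independently, so training and test data can end up with different columns. The toy `color` data and the reindexing workaround are illustrative and not taken from the article.

```python
import pandas as pd

# Hypothetical toy data: the test set contains a category ("yellow")
# that never appears in training, and lacks "green" and "blue".
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["red", "yellow"]})

# get_dummies encodes each frame on its own, so the columns differ.
train_encoded = pd.get_dummies(train, columns=["color"], dtype=int)
test_encoded = pd.get_dummies(test, columns=["color"], dtype=int)

print(list(train_encoded.columns))  # color_blue, color_green, color_red
print(list(test_encoded.columns))   # color_red, color_yellow

# One possible workaround: align the test columns to the training columns.
# Columns missing from the test set are filled with 0, and categories
# unseen during training (color_yellow) are dropped.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(test_aligned)
```

After reindexing, the test frame has exactly the columns a model fitted on the training frame would expect. The row for the unseen category becomes all zeros, which may or may not be the behaviour you want.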
Have you ever found yourself in a situation where it became difficult to decipher your own codebase? Do you often end up with multiple files like untitled1.py or untitled2.ipynb? The situation is even grimmer in data science. Often, we limit our focus to the analysis and the end product while ignoring the quality of the code responsible for that analysis. In this article, I share my three favorite tools to help organize and structure your projects in a reusable and reproducible format.
Writing in data science can have a transformative effect not only on your learning journey but also on your career. I appeared on the FastBook Reading Sessions organised by Weights & Biases to discuss this, and I wrote this piece to summarize what I covered there. Primarily, it discusses why writing matters in data science and how it can be used as a tool to strengthen your portfolio.