Managing research papers, ImageNet replacement, interview with book authors, and more…
Glad to bring you the sixth edition of ‘Breaking the Jargons.’ There are tutorials, paper summaries, and interviews with book authors. As always, I have shared my favorite data science resource and a tool of the month.
ImageNet is one of the most widely used datasets in Computer Vision applications. However, studies have shown that it carries biases stemming from its collection methodology and the types of images it contains. To address privacy and fairness issues specifically, a team of researchers at the Visual Geometry Group, University of Oxford, has proposed a new dataset called PASS for self-supervised learning (SSL) model pretraining. This article is a summary of the paper published by the team.
It is often said that one of the best ways to keep up to date with the latest happenings in the field of machine learning is by reading research papers. However, this is easier said than done. While many find them intimidating, others find it impossible to keep up with the daily dose of published papers. Hence, I have compiled a few tools that I use to organize, manage and read my favorite research papers and also get notified of the latest ones.
In an endeavor to bring some of the notable work in the field of machine learning to the forefront, I started an interview series last year. During the first season, I presented stories from established data scientists and Kaggle Grandmasters, who shared their journeys, inspirations, and accomplishments. For the second season, I’m interviewing book authors. This edition of the interviews will bring to light the stories of some well-known authors in the data science field.
To kick off the series, I interviewed Alexey Grigorev, author of the book Machine Learning Bookcamp, principal data scientist at OLX, and founder of DataTalks.Club, a community for data enthusiasts.
The GitHub Octo project is a way to auto-generate a bird’s-eye view of a codebase and understand how its code is structured. The figure below is a visualization of the H2O-3 repository. You can also try it out for yourself!
🎁 Resource of the Month
Introduction to Probability for Data Science is an undergraduate textbook on probability for data science by Stanley H. Chan, who has made the PDF version free. The book offers broad coverage, from classical probability theory to modern data analytic techniques, and includes code in MATLAB, Python, Julia, and R.
That is all for this edition. See you with another roundup next month. You can subscribe to receive the newsletter directly in your inbox every month or share it with someone who might find it helpful.