It’s said that the most effective way to learn on Kaggle is to participate in a competition and afterwards dissect the top solutions. At the current state of my ML skills (as of January 2022), this approach doesn’t work well for me.

  1. Top notebook solutions are often complicated, contain implementations that are difficult to grasp, or consist of ensembles of many models. Solutions in the discussion section give broad overviews that are inspiring, yet hard to rebuild.
  2. The top solutions often differ substantially from my own. That makes it difficult to compare them with my code and to derive specific improvements for it.

The second best way to learn on Kaggle is to follow the discussions and read the notebooks during a competition. This approach suits me better, since others often share small insights that I can easily integrate into my solution. Filtering out the valuable information can be hard, though, because the notebook and discussion sections are often flooded with similar content, and it is easy to get distracted by too many different techniques and ideas.

I often enjoy the Playground competitions, because they let me focus on a few skills I want to improve. They are also less challenging, and it is easier to get a baseline implementation up and running because the data is already prepared for a quick start.

In January 2022 there is a community competition hosted by Abhishek Thakur that provides the opportunity for yet another learning approach. During the competition, sessions are recorded on YouTube, with two Grandmasters covering the topics of EDA and imputation. So we get high-quality guidance on two important topics while working on the competition. I took the opportunity to learn along.

I wrote down my experience with this learning approach and shared it, together with my solution, in the following notebook.