Chengyi (Jeff) Chen’s Data Science Blog ΨΦ: Pursuit to discover Unknown Unknowns

_images/header.jpg

Fig. 1 Hi, I’m Jeff and \(\Psi \Phi\) (pronounced “sci-fi”) is my technical blog dedicated to connecting seemingly disparate data science concepts.


Introduction

United States Secretary of Defense Donald Rumsfeld once stated:

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones.

Learning about Rumsfeld’s categorization of information was an important moment in my life. Although this quote is more often used in the context of risk management, I see it more as a guide to reaching the right end of the Dunning-Kruger effect curve.

https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Dunning%E2%80%93Kruger_Effect_01.svg/1024px-Dunning%E2%80%93Kruger_Effect_01.svg.png

Fig. 2 Dunning-Kruger effect curve

Obviously, in order to succeed in life (“succeed” could be in the context of anything, e.g. healthy relationships, career, finances, academics, …) without relying purely on luck, one needs at least the drive and resolve to work hard and to actually follow through. But what does it mean to work hard? What does it mean to really understand the material that you’re learning? Clearly, the first step is to start learning about what you know you don’t know. However, if that is all you do, you might get a false sense of confidence that takes you up to the Peak of “Mount Stupid” shown in the figure above. To get to the “Plateau of Sustainability”, one has to trudge forward to find out what one doesn’t know one doesn’t know, i.e. the unknown unknowns. I’ll be using this space to document times when I’ve tried to push past the known unknowns by asking questions that lead me toward the unknown unknowns.


Machine Learning Notes

The first section of my blog contains notes on machine learning for my own reference. Overall, my interests lie in probabilistic machine learning because it provides a framework both for learning about unobserved variables and for attaching a measure of uncertainty to predictions. “Pattern Recognition and Machine Learning” by Christopher M. Bishop is the best textbook I could find for learning about probabilistic machine learning.
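
As a toy illustration of what “learning about unobserved variables” with “a measure of uncertainty” can look like, here is a minimal sketch. It assumes a coin-flip (Bernoulli) model with a Beta prior over the unobserved success rate. The model choice and the function names (`beta_posterior`, `beta_mean_and_std`) are assumptions made only for this example, not something taken from the notes themselves.

```python
import math


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return the (a, b) parameters of the Beta posterior over an
    unobserved success rate, starting from a Beta(prior_a, prior_b) prior."""
    return prior_a + successes, prior_b + failures


def beta_mean_and_std(a, b):
    """Posterior mean (the estimate) and standard deviation (its uncertainty)."""
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Same observed ratio of successes to failures, very different amounts of data.
few = beta_mean_and_std(*beta_posterior(successes=3, failures=1))
many = beta_mean_and_std(*beta_posterior(successes=300, failures=100))

print(f"4 observations:   mean={few[0]:.3f}, std={few[1]:.3f}")
print(f"400 observations: mean={many[0]:.3f}, std={many[1]:.3f}")
```

Both runs see the same 3:1 ratio of successes to failures, but the posterior standard deviation shrinks as more data arrives. A plain point estimate would not show that difference in certainty.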


Personal Projects

This section features some of the projects I work on in my free time to better understand machine learning concepts and the Pyro probabilistic programming language.