Nail down understanding of core concepts
and suggestions for getting started with data science
🧐 Identify the core concepts
When learning about a new area, there are certain concepts that are most important to understand.
The Pareto principle almost always applies: roughly 20% of the concepts will be used in 80% of the work.
🚦 Where to start in Data Science and Machine Learning
I’m often asked what concepts to learn when getting started with data science. I don’t find it easy to answer.
Data science is a big (and rapidly expanding) field. There’s a lot you could learn, and what counts as ‘new and exciting’ changes often.
I wouldn’t consider myself an authority on the subject, but these pages suggest some core skills:
If I were to suggest core skills for being a data scientist, I’d say something like:
comfortable with Python, pandas, NumPy and scikit-learn
familiar with GitHub
familiar with the command line
familiar with installing packages
classification algorithms (SVMs, random forest, logistic regression, AdaBoost): higher level principles of how they work
regression algorithms (linear/OLS regression, lasso and ridge regression, regression trees): higher level principles of how they work, and common considerations
performance measures for classification and for regression algorithms
a familiarity with neural networks and deep learning
clustering algorithms: k-means and hierarchical clustering
familiar with dimensionality reduction techniques such as PCA
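As a rough illustration of the "performance measures" item above, here is a minimal sketch in plain Python of three common classification measures (accuracy, precision, recall) and one regression measure (mean squared error). The function names and the example labels are made up for illustration; in practice, scikit-learn's metrics module provides equivalents such as accuracy_score, precision_score, recall_score and mean_squared_error.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for binary labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    # Count the outcomes that the measures are built from.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        # Share of all predictions that were right.
        "accuracy": correct / len(pairs),
        # Of the items predicted positive, how many really were positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the truly positive items, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data, purely for illustration.
labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]
scores = classification_metrics(labels, predictions)
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

print(scores)
print(mse)
```

Writing the measures out by hand like this is mainly useful for building intuition; for real work, a tested library implementation is the safer choice.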
📖 How to learn them?
I’d suggest using online courses to build up your understanding in each of these key areas. Ones I found helpful were Brilliant.org and Khan Academy for principles and maths, plus various courses on Coursera and Udemy for the more technical aspects. I wrote about this a while back.
👷‍♂️ Build on core concepts with specialist knowledge
After building the fundamentals, the logical next step is to expand into your area of special interest.
What to learn depends on the industry of interest and type of work you’ll be doing.
For example, if working with Big Data, Apache Spark will be helpful. If working with time series, familiarity with ARIMA models will be helpful.
If you’re not learning data science, what are the core concepts and skills in the area you are learning?
My favourite things this week:
Ultralearning (a book) provides a great overview of the latest research on methods for learning quickly. It also provides a helpful practical roadmap for trying out the ideas. I’m currently applying this to studying Spanish, and Ken Jee made a great video about applying it to data science.
This video by Devon Crawford summarising some key computer science concepts. I found it satisfying to follow everything he spoke about, after having taken some online courses in the area recently.
(Just for fun) In case you missed it, a totally serious blog post with a transcript of the Trump-Biden debate 😉
Hi! I’m Chris Lovejoy, a Junior Doctor and Data Scientist based in London.
I’m on a mission to improve healthcare through technology (particularly AI / machine learning), but along the way I want to share learnings that are relevant no matter your career choice or background.
In this weekly newsletter, I share my top thoughts and learnings from each week, as well as links to the best things on the internet that I come across.