This is a light-hearted overview of what's been going on in the world of Data Science this week. Think of it as your 5-minute update, so that you can sound at least slightly knowledgeable at your next coffee chat ☕
Have you been forwarded this? You can subscribe here!
I'm back in Norway after spending the Christmas break in England, where I was trying ever so hard to avoid the urge to work on a little (OK, maybe not so little) side project. When the idea popped into my mind, I believe I even uttered the famous phrase:
"that would be a fun side project and shouldn't take too long either"
It turns out that wasn't true.
Even after all these years, I still clearly have not learnt that projects always take longer than you expect!
But anyway, enough of me rambling about not being able to set good boundaries. Let's get to the main reason we are here...
The tools that will make your life that little bit easier, or at least more interesting... but either way it's fun to play with new toys.
This neat little extension allows you to interact with your Snowflake instance from VS Code, with many of the same features you would use in Snowflake's own GUI.
Polars is a fast DataFrame library for Python, written in Rust. It's been the talk of the town lately at my company and could be the answer for those of us who struggle with large, slow operations in pandas.
Some more Rust for you! PRQL is a query language that compiles to SQL, with a compiler written in Rust. It aims to be simpler to write than SQL and easier to analyse data with.
🧑‍🔬 In practice
Stories of those who are genuinely implementing Data Science. Step aside, Titanic dataset: this is the real deal.
In this interview on the TWIML AI Podcast, Tony Jebara (Head of Machine Learning) from Spotify explains how they use reinforcement learning to improve their personalized recommendations.
This blog post from Neal Lathia (Staff Machine Learning Engineer) is a good reminder of the choices and tradeoffs we must make when machine learning becomes an established part of the business.
🐦 The best of Data Twitter
Data Twitter is the best Twitter.
It's an oldie but a goldie. The recent popularity of ChatGPT means it's time to bring out this evergreen tweet.
2028: “You want to write a prompt? First you need to hire 10-15 promptOps Engineers to build out your PromptFlow pipelines which sends promptjobs to your PromptLake from the PromptQueue using the EventPrompt stream”
Could this be the next big breakthrough in machine learning? Pigeon learning 🐦
Simple machine learning tutorials are good, but it's helpful to be reminded of how real-world data science looks.
• Data comes in
• Model makes predictions
What they actually are:
• Raw data comes in
• Goes through multiple transformation layers
• Quality checks, anomaly detection, etc
• Feature engineering
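For the curious, the "what they actually are" steps listed above could be sketched in plain Python roughly like this. Everything here is an assumption for illustration: the records, the field names, and the crude median-based anomaly rule are all made up, and a real pipeline would be far more involved.

```python
from statistics import median

# Hypothetical raw records; the field names are made up for illustration.
RAW = [
    {"user": " Alice ", "spend": "12.50"},
    {"user": "bob", "spend": "9.00"},
    {"user": "carol", "spend": "not-a-number"},
    {"user": "dave", "spend": "11.00"},
    {"user": "erin", "spend": "950.00"},
]


def transform(rows):
    """Transformation layer: tidy up names and parse numbers."""
    out = []
    for row in rows:
        try:
            spend = float(row["spend"])
        except ValueError:
            spend = None  # leave unparseable values for the quality check
        out.append({"user": row["user"].strip().lower(), "spend": spend})
    return out


def quality_check(rows):
    """Quality check: drop rows whose spend could not be parsed."""
    return [r for r in rows if r["spend"] is not None]


def drop_anomalies(rows, factor=10.0):
    """Crude anomaly detection: drop values above `factor` times the median."""
    typical = median(r["spend"] for r in rows)
    return [r for r in rows if r["spend"] <= factor * typical]


def add_features(rows):
    """Feature engineering: flag rows that sit above the mean spend."""
    mean_spend = sum(r["spend"] for r in rows) / len(rows)
    return [{**r, "above_mean": r["spend"] > mean_spend} for r in rows]


features = add_features(drop_anomalies(quality_check(transform(RAW))))
for row in features:
    print(row)
```

Even in this toy version, most of the code runs before any model gets involved, which is rather the point of the tweet.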
Who doesn't enjoy a good data visualisation 📊
My goal this year was to get a little lighter and fitter. This #dataviz illustrates how I went above and beyond that goal, and got addicted to cardio in the process! 🕺🏻
Who's ready for a fit 2023? 🙃
Content to inspire, or at the very least keep you informed.
A recent Talk Python To Me episode covered using the command line for data science 🤯. There's a lot here that I didn't know was possible, so I guarantee you will come away with at least one new trick after watching it.
How could homework work at educational institutions in the age of ChatGPT and other AI language models? Well, here is one suggestion:
Did you know that your favourite Python packages actually get updated regularly and you should update your
New theme for you! The Streamlit theme for Altair, Plotly, and Vega-Lite charts is here 📈
A few other minor updates you should be aware of: