Fit Predict #3: At this point, what's not written in Rust?
In this issue of the Fit Predict newsletter, we look at a Python linter written in Rust, Copilot extensions for VSCode, deep learning search ranking, and more.
This is a light-hearted overview of what's been going on in the world of Data Science this week. Think of it as your 5-minute update, so that you can sound at least slightly knowledgeable at your next coffee chat.
Have you been forwarded this? You can subscribe here!
Hey there,
My past week or so has primarily been spent making improvements to a new customer behaviour prediction model. Whilst it's been fun, it has become a classic case of scope creep. What started out as a simple bug fix ended up being a two-week-long overhaul.
Whoops. It happened again.
So, my advice to you: if you ever find yourself thinking "oh, let me just add this quickly", put the keyboard down, go for a walk, and think really long and hard about your life choices.
I'm happy to say that the changes I made have improved model performance, but at what cost, I ask you... what cost?!
Anyway, on with the show!
🧰 Tools
The tools that will make your life that little bit easier, or at least more interesting... but either way it's fun to play with new toys.
This addition to Copilot's VSCode extension helps you to modify your code in a similar way to working in a tool like Photoshop.
An extremely fast Python linter, written in... you guessed it, Rust! Like everything is these days.
🧑‍🔬 In practice
Stories of those who are genuinely implementing Data Science. Step aside, Titanic dataset, this is the real deal.
Etsy moved from a boosting model for their search rankings over to a deep learning model. This blog post is an insightful and honest look at why they made the switch, how they did it, and what the results were.
LinkedIn open-sourced the feature store that they use to develop their machine learning models. In this post, they explain why they built it and how it works, which is a great reminder of how speeding up the basics of machine learning development can pay dividends as you scale.
🐦 The best of Data Twitter
Data Twitter is the best Twitter.
The model always exists.
It existed before you were born and it exists after your death.
You can only find the model.
"Training" is just your way of looking for the model's location in the infinite hypothesis space and binding its essence to silicon
Every single day.
from @ChristophMolnar
from @marktenenholtz
Thought-provoking
Content to inspire, or at the very least keep you informed.
I've personally never looked at a SQL file and thought it told a compelling story, but after reading this I think I might start to.

What is a realistic goal or target? This is a difficult question to answer as a data professional, but this post offers a solution.

Updates
Did you know that your favourite Python packages actually get updated regularly and you should update your requirements.txt file?
A few other minor releases to be aware of: