Fit Predict #3: At this point, what's not written in Rust?

In this issue of the Fit Predict newsletter, we look at a Python linter written in Rust, Copilot extensions for VSCode, deep learning search ranking, and more.

Stephen Allwright

17 Jan 2023

📥

What is Fit Predict?
This is a light-hearted overview of what's been going on in the world of Data Science this week. Think of it as your 5-minute update, so that you can sound at least slightly knowledgeable at your next coffee chat ☕

Have you been forwarded this? You can subscribe here!

Hey there,

My past week or so has primarily been spent making improvements to a new customer behaviour prediction model. Whilst it's been fun, it has become a classic case of scope creep. What started out as a simple bug fix ended up being a two-week-long overhaul.

Whoops. It happened again.

So, my advice to you: if you ever find yourself thinking "oh, let me just add this quickly", put the keyboard down, go for a walk, and really think long and hard about your life choices.

I'm happy to say that the changes I made have improved model performance, but at what cost I ask you... what cost?!

Anyway, on with the show!

🧰 Tools

The tools that will make your life that little bit easier, or at least more interesting... but either way it's fun to play with new toys.

Code Brushes

This addition to Copilot's VSCode extension lets you modify your code in much the same way as you would touch up an image in a tool like Photoshop.

Ruff linter

An extremely fast Python linter, written in... you guessed it, Rust! Like everything is these days.

Kangas

This Python package will help you explore multimedia datasets.

🧑‍🔬 In practice

Stories of those who are genuinely implementing Data Science. Step aside Titanic dataset, this is the real deal.

Deep learning search ranking at Etsy

Etsy moved from a boosting model for their search rankings over to a deep learning model. This blog post is an insightful and honest look at why they made the switch, how they did it, and what the results were.

LinkedIn's feature store

LinkedIn open-sourced the feature store that they use to develop their machine learning models. In this post, they explain why they built it and how it works, which is a great reminder of how speeding up the basics of machine learning development can pay dividends as you scale.

🐦 The best of Data Twitter

Data Twitter is the best Twitter.

You can't "train" a model.

The model always exists.

It existed before you were born and it exists after your death.

You can only find the model.

"Training" is just your way of looking for the model's location in the infinite hypothesis space and binding its essence to silicon.

from @ChristophMolnar

Hello, I've been using Matplotlib for 7+ years and I still google how to do everything except plt.plot()

from @marktenenholtz

worriedly waiting until they're going to rewrite me in rust

from @vboykis

Every single day.

from @rabaath

💭 Thought-provoking

Content to inspire, or at the very least keep you informed.

I've personally never looked at a SQL file and thought it told a compelling story, but after reading this I think I might start to.

What is a realistic goal or target? This is a difficult question to answer as a data professional, but this post offers a solution.

🔧 Updates

Did you know that your favourite Python packages actually get updated regularly and you should update your requirements.txt file?

Streamlit released version 1.17

A few other minor releases to be aware of:

Polars released a handful of updates
XGBoost released 1.7.3
Poetry released 1.3.2

🔗 stephenallwright.com/newsletter-issue-3

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.