Fit Predict #4: Why do tech companies kick off everything?

In this issue of the Fit Predict newsletter, we look at outlier detection, machine learning feature stores, why you should use Polars, and more.

Stephen Allwright
What is Fit Predict?
This is a light-hearted overview of what's been going on in the world of Data Science this week. See it as your 5-minute update so that you can sound at least slightly knowledgeable at your next coffee chat ☕

Have you been forwarded this? You can subscribe here!

Hey there,

I'm not sure what it's like in tech companies outside of Norway, but here we have a tendency to have "kick-offs" before seemingly every change in season. ☀️️🍂❄️🌱

These kick-offs involve gathering everyone physically in the same venue for a few days to align™️ on company strategy, have socials, eat cold pizza, and afterwards be released to work on our projects for the next few months.

This got me thinking... 💡

Why do we insist on "kicking things off"?

I don't know about you, but I don't tend to kick my software. Why not "switch on" or "compile"? 🤔

I will leave that thought with you as I segue into the news portion of this letter...

🧰 Tools

The tools that will make your life that little bit easier, or at least more interesting... but either way it's fun to play with new toys.

Alibi detect

Dealing with outliers is one of the key steps when developing a machine learning model. This tool helps you with that by providing algorithms for outlier, adversarial and drift detection.
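To make "outlier detection" a bit more concrete, here is a minimal sketch of one classic approach, the modified z-score based on the median absolute deviation. This is plain standard-library Python and not Alibi Detect's API; the function name `mad_outliers`, the sample readings, and the 3.5 cut-off are assumptions chosen for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration only, not part of any library.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        # At least half the values equal the median; this score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data are roughly normal.
    return [v for v, d in zip(values, abs_dev) if 0.6745 * d / mad > threshold]


# Made-up sensor readings with one obvious odd one out.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
print(mad_outliers(readings))
```

Using the median rather than the mean means a single extreme value cannot drag the baseline towards itself, which is why this tends to work better than a plain z-score on small samples.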

Having a feature store is a great time-saver when scaling machine learning teams. This is where Feathr could come in handy. It is in fact the feature store used at LinkedIn, which has recently been open-sourced for all to use.

ipyflow is a Python kernel for Jupyter, and other notebook interfaces, that tracks dataflow relationships between symbols and cells during a given interactive session.

🧑‍🔬 In practice

Stories of those who are genuinely implementing Data Science. Step aside Titanic dataset, this is the real deal.

Match Cutting at Netflix

Seeing machine learning mix with the creative arts is fascinating, and there are not many places better at doing this than Netflix. In this post, they outline how they use machine learning to find smooth visual transitions in their content.

🐦 The best of Data Twitter

Data Twitter is the best Twitter.

ML is dead. Long live AI.

from @chrisalbon

It's *actual data* time...

TensorFlow & Keras usage is at an all-time high, at >2.5M users. It has increased ~30% yoy.

In the largest developer survey in the world last year (60k respondents), 13% of *all devs* said they used TensorFlow. That's 1.5x more devs than PyTorch.

from @fchollet

So I said, what if we train the model on the test set?

from @untitled01ipynb

An interesting chart from @OurWorldInData showing that notable AI creations have increasingly been done by companies over the last 20 years.

from @misraturp

💭 Thought-provoking

Content to inspire, or at the very least keep you informed.

I've been talking about Polars a lot recently, so I'll let someone else do it for me instead this time:

Episode #140: Speeding Up Your DataFrames With Polars – The Real Python Podcast
How can you get more performance from your existing data science infrastructure? What if a DataFrame library could take advantage of your machine’s available cores and provide built-in methods for handling larger-than-RAM datasets? This week on the show, Liam Brannigan is here to discuss Polars.

Could the market beat the best machine learning models when it comes to predicting stock prices? Apparently so:

The Options Market Beat 94% of Participants in the M6 Financial Forecasting Contest
As of today I have permission to make public my mischievous entry in the year-long world-wide stock and ETF forecasting contest … the…

This is an interesting take on looking at how an algorithm solves a problem in order to learn from it:

Reverse Engineering a Neural Network’s Clever Solution to Binary Addition
While training small neural networks to perform binary addition, a surprising solution emerged that allows the network to solve the problem very effectively. This post explores the mechanism behind that solution and how it relates to analog electronics.

🔧 Updates

Did you know that your favourite Python packages actually get updated regularly and you should update your requirements.txt file?


Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.