Fit Predict #5: Pubs are the conference centres of the future

In this issue of the Fit Predict newsletter, we look at Approximate Nearest Neighbors, focussed work, how Netflix chooses personalised artwork, and more.

Hey there,

Have you ever had a work conference in a pub? 🍺


Well, I'm happy to report that I now have.

Last week was the aforementioned company "kick-off", and it was held in a stand-up comedy pub here in Oslo. There was something quite cosy about sitting on a stool at a bar counter, whilst listening to a talk about product strategy for our app.

Scandinavian society at work, I guess. 🇧🇻

I have a feeling this will be one of those stories I tell people in five years when the company is five times the size. I'll sit around, telling all the new joiners about the "good old days", when we had our meetings in a pub surrounded by posters from the 50s advertising "beer for children".

Ah yes, nostalgia.

🧰 Tools

The tools that will make your life that little bit easier, or at least more interesting... but either way it's fun to play with new toys.


Annoy is an open-source library from Spotify for Approximate Nearest Neighbors search. It's written in C++ with Python bindings and is optimized for memory usage.
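To give a feel for what "approximate" means here, this is a minimal pure-Python sketch of the random-projection idea that Annoy's README describes: each split samples two points and cuts the space along the hyperplane equidistant from them, and several such trees form a forest to improve recall. It is a simplified illustration, not Annoy's code or API. The names (`RandomProjectionForest`, `query`) are hypothetical, and unlike Annoy it follows only one path per tree instead of searching several branches.

```python
import math
import random


def _side(normal, offset, v):
    """Signed position of v relative to the hyperplane normal . x = offset."""
    return sum(n * x for n, x in zip(normal, v)) - offset


def _build(ids, points, leaf_size, rng):
    """Recursively split `ids` into a tree; leaves are plain lists of ids."""
    if len(ids) <= max(leaf_size, 1):
        return ids
    # Sample two points and split along the hyperplane equidistant from them:
    # its normal is p - q and it passes through their midpoint.
    p, q = (points[i] for i in rng.sample(ids, 2))
    normal = [a - b for a, b in zip(p, q)]
    offset = sum(n * (a + b) / 2 for n, a, b in zip(normal, p, q))
    left = [i for i in ids if _side(normal, offset, points[i]) > 0]
    right = [i for i in ids if _side(normal, offset, points[i]) <= 0]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return ids
    return (normal, offset,
            _build(left, points, leaf_size, rng),
            _build(right, points, leaf_size, rng))


def _descend(node, v):
    """Follow the splits down to the leaf whose region contains v."""
    while not isinstance(node, list):
        normal, offset, left, right = node
        node = left if _side(normal, offset, v) > 0 else right
    return node


class RandomProjectionForest:
    def __init__(self, points, n_trees=10, leaf_size=8, seed=0):
        rng = random.Random(seed)
        self.points = points
        ids = list(range(len(points)))
        # More trees give more candidates: better recall, slower queries.
        self.trees = [_build(ids, points, leaf_size, rng) for _ in range(n_trees)]

    def query(self, v, k):
        """Approximate k nearest neighbours of v (Euclidean distance)."""
        candidates = set()
        for tree in self.trees:
            candidates.update(_descend(tree, v))
        # Only the candidates are ranked exactly, never the whole dataset.
        return sorted(candidates, key=lambda i: math.dist(self.points[i], v))[:k]


rng = random.Random(42)
points = [[rng.random() for _ in range(5)] for _ in range(200)]
forest = RandomProjectionForest(points, n_trees=10, leaf_size=8, seed=1)
print(forest.query(points[17], k=3))  # 17 itself comes first (distance 0)
```

With Annoy itself, the equivalent steps are `AnnoyIndex(f, "euclidean")`, `add_item`, `build(n_trees)` and `get_nns_by_vector`; the trade-off is the same, with more trees buying better recall at the cost of memory and build time.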

Gradio is a tool to demo your machine learning model with a friendly web interface.

One thing

This is not a data-related tool, but it could help with your data work. It's a simple app that puts a single visible task in your menu bar, so you can stay focused on doing one thing at a time.

πŸ§‘β€πŸ”¬ In practice

Stories from people who are genuinely putting Data Science into practice. Step aside, Titanic dataset: this is the real deal.

Discovering Creative Insights in Promotional Artwork

The artwork and trailers you see on Netflix are personalised for you, which is impressive enough, but here Netflix goes further and explains how data informs the creation of those creative assets.

Improving Support for Deep Learning in Etsy's ML Platform

In this article, Etsy explains how they improved their support for deep learning models. It's a good reminder of the considerations we need to make when increasing the complexity of our model architectures.

🐦 The best of Data Twitter

Data Twitter is the best Twitter.

absolutely devastated when a friend told me "I bet you pay for linkedin premium"

from @Jiminy_Kirket

Even XGBoost is of no use when you have no data.

from @tunguz

Junior DS: Got any fatherly advice for me?

Senior DS: You will inevitably get in quarrels with stakeholders

You can get out of most of them by shouting "drift" and backpedalling away

from @untitled01ipynb

💭 Thought-provoking

Content to inspire, or at the very least keep you informed.

This blog post argues that, to improve their skills, data scientists should work in teams, as software engineers do, rather than going solo.

Data scientists work alone and that’s bad | Ethan Rosenthal
In Need of a Good Editr Growing up, I had always considered myself a decent writer based on my decent grades in English class. My sophomore year English teacher made it very clear that I did not, in fact, know how to properly write. All of my essays were returned riddled with red-inked edits culmina…

One big question that has arisen since ChatGPT exploded onto the scene is how to tell whether a piece of text was written by a human or generated by a machine. This paper outlines a potential solution.

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
The fluency and factual knowledge of large language models (LLMs) heightens the need for corresponding systems to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. I…
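The core idea, as we understand the paper, is that text sampled from a model tends to sit in a region of negative curvature of that model's log-probability, so small rewrites of it lower its log-probability on average. The score compares the original text's log-probability with the mean over perturbed versions. This minimal sketch assumes you already have those log-probabilities (the paper produces the perturbations with a mask-filling model, which is not shown here); the function name is hypothetical, and any decision threshold would need tuning on your own data.

```python
import statistics


def perturbation_discrepancy(log_p_original, log_p_perturbed, normalize=True):
    """DetectGPT-style score from precomputed log-probabilities.

    log_p_original:  log p(x) of the text under the scoring model.
    log_p_perturbed: log p(x~) for each lightly rewritten version of x
                     (at least two values when normalize=True).
    Higher scores suggest machine-generated text.
    """
    mean_perturbed = statistics.fmean(log_p_perturbed)
    score = log_p_original - mean_perturbed
    if normalize:
        # Normalised variant: divide by the spread of the perturbed scores.
        spread = statistics.stdev(log_p_perturbed)
        if spread > 0:
            score /= spread
    return score


# Toy numbers, not real model outputs: rewrites lower log p by 4 on average.
print(perturbation_discrepancy(-10.0, [-12.0, -14.0, -16.0], normalize=False))  # 4.0
print(perturbation_discrepancy(-10.0, [-12.0, -14.0, -16.0]))  # 2.0
```

Human-written text would be expected to score closer to zero, since rewriting it does not systematically push it off a local peak of the model's probability.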

πŸ”§ Updates

Did you know that your favourite Python packages actually get updated regularly, and that you should keep your requirements.txt file up to date?
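As a small starting point, this sketch compares the exact pins in a requirements.txt with what is installed in the current environment, using only the standard library's `importlib.metadata` (Python 3.8+). It assumes plain `name==version` lines; ranges, extras and environment markers are skipped, and versions are compared as plain strings. The helper names (`parse_pins`, `compare_with_installed`, `report`) are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Return {name: version} for plain `name==version` lines."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def compare_with_installed(pins):
    """Yield (name, pinned, installed) wherever the installed version differs."""
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != pinned:  # plain string comparison, no normalisation
            yield name, pinned, installed


def report(path="requirements.txt"):
    """Print every pin that does not match the installed version."""
    with open(path) as f:
        for name, pinned, installed in compare_with_installed(parse_pins(f.read())):
            print(f"{name}: pinned {pinned}, installed {installed}")
```

This only tells you whether your environment matches your pins; to see which newer releases are available, `pip list --outdated` does that directly.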

