Should I use YAML config files for my data science project?
Storing your variables in a single config file improves your quality of life by reducing the time it takes to make changes. It also improves your code quality in two key ways: it makes your code more readable for others, and it reduces the chance of bugs occurring in your code due to mistyped variable names.
How do I create a config file for my data science project in Python?
One of the easiest and most common ways to create a config file is to write a YAML file that stores the values and then read it into your Python file. Here I will demonstrate a simple example of how this can be done.
First we will create our YAML file, config.yaml, to hold the variables:
project_name: "My Project"
variable_list: [1, 2, 3]
boolean: True
Then, in order to use these variables in our Python file, run.py, we will do the following:
import yaml

# Open the file in a with block so it is closed after parsing
with open("config.yaml", "rb") as f:
    config_file = yaml.safe_load(f)

project_name = config_file.get("project_name")
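One thing to keep in mind is that .get() quietly returns None when a key is missing or misspelled. As a minimal sketch, assuming you would rather fail early, the loading step could be wrapped in a small helper. The load_config function and its required_keys parameter are hypothetical names made up for this illustration, not part of PyYAML, and the temporary file only exists so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import yaml


def load_config(path, required_keys=()):
    """Hypothetical helper: load a YAML file and check that expected keys exist."""
    with open(path, "rb") as f:
        # safe_load returns None for an empty file, so fall back to an empty dict
        config = yaml.safe_load(f) or {}
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"Missing config keys: {missing}")
    return config


# Write the example config to a temporary directory so this sketch is self-contained
config_path = Path(tempfile.mkdtemp()) / "config.yaml"
config_path.write_text(
    'project_name: "My Project"\nvariable_list: [1, 2, 3]\nboolean: True\n'
)

config = load_config(
    config_path, required_keys=["project_name", "variable_list", "boolean"]
)
project_name = config["project_name"]
variable_list = config["variable_list"]
boolean = config["boolean"]
```

With this approach a typo in a key name raises a KeyError when the config is loaded, instead of surfacing later as an unexpected None.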
Creating more complex Python config files
This method can, of course, be expanded to use more complex YAML data structures and to share one config across multiple Python files, further improving your code quality and speed of development.
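As a rough illustration of a more complex structure, the sketch below parses a nested config. The section and key names (data, model, input_path and so on) are invented for this example, and the YAML is parsed from a string rather than a file only to keep the snippet self-contained.

```python
import yaml

# Hypothetical nested config; the section and key names are illustrative only
CONFIG_TEXT = """
project_name: "My Project"
data:
  input_path: "data/raw.csv"
  columns: [age, income]
model:
  learning_rate: 0.01
  epochs: 10
"""

config = yaml.safe_load(CONFIG_TEXT)

# Nested mappings become nested dictionaries, and sequences become lists
input_path = config["data"]["input_path"]
columns = config["data"]["columns"]
learning_rate = config["model"]["learning_rate"]
epochs = config["model"]["epochs"]
```

Grouping related values under a section such as data or model lets each Python file read only the part of the config it needs.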