Python offers many features that make it possible to write elegant and powerful code. However, this also means there are many ways of doing things, which can be confusing at times. In this article we'll take a look at Python file organization best practices: how to organize your scripts, modules, packages, and classes.
Whether you are a beginner programmer or someone with years of coding experience, this article is for you. We'll explore the best practices for Python file and directory organization. You will learn why file and directory structures are important and how to organize your files so your code is easy to read.
This guide was written to help you make your applications more robust, more maintainable, and more likely to be used by others. We include recommendations for file organization, Python package organization (i.e., how to structure your Python application), and distribution. All of these recommendations assume that you have chosen a good project name.
Python has become one of the most popular general-purpose programming languages. As a developer, you have likely worked on multiple Python code bases that deal with different domains. The design and structure of these code bases might vary, but the basic elements remain the same across projects, libraries, and applications. This article explains some best practices for Python project architecture and also covers some tips on how to create a folder structure within your source tree.
Structure of the Repository
It’s Important.
Just as code style, API design, and automation are essential for a healthy development cycle, repository structure is a crucial part of your project's architecture.
When a potential user or contributor lands on your repository’s page, they see a few things:
- Project Name
- Project Description
- Bunch O’ Files
Only when they scroll below the fold will the user see your project’s README.
If your repo is a massive dump of files or a nested mess of directories, they might look elsewhere before even reading your beautiful documentation. Dress for the job you want, not the job you have.
Of course, first impressions aren’t everything. You and your colleagues will spend countless hours working with this repository, eventually becoming intimately familiar with every nook and cranny. The layout is important.
Project Structuring
In this part, we will talk about some good practices for structuring a complete Python project. We will look at two different possibilities, and you can choose between them based on how simple or complex your project is going to be.
Type 1: The Classic
- This is the most basic format, yet it already gives the project an organized structure. It can be followed when the project consists of only a few modules/scripts. The directory of a sample project could look something like this:
my_project # Root directory of the project
├── code # Source code
├── input # Input files
├── output # Output files
├── config # Configuration files
├── notebooks # Project related Jupyter notebooks (for experimental code)
├── requirements.txt # List of external packages the project depends on
└── README.md # Project README
- As the names suggest, the `code` folder contains the individual modules (`.py` files), `input` and `output` contain the input and output files respectively, and `notebooks` contains the `.ipynb` notebook files we use for experimentation. Finally, the `config` folder could contain parameters within `yaml`, `json`, or `ini` files. The code modules can read `ini` files with the standard-library `configparser` module and `json` files with the `json` module; `yaml` files need a third-party parser such as PyYAML. `requirements.txt` contains a list of all external Python packages needed by the project. One advantage of maintaining this file is that all of these packages can be installed with a single `pip install -r requirements.txt` command, with no need to install each external package manually. An example `requirements.txt` file is shown below (in `package_name==package_version` format):
BeautifulSoup==3.2.0
Django==1.3
Fabric==1.2.0
Jinja2==2.5.5
PyYAML==3.09
Pygments==1.4
- Finally, `README.md` contains the what, why, and how of the project, with some sample code showing how to run the project and sample use cases.
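As a rough illustration of how code modules might read parameters from the `config` folder, here is a minimal sketch using only the standard library. The file names, section names, and keys (`settings.ini`, `paths`, `batch_size`) are hypothetical, and the demo writes throwaway files to a temporary directory so that it runs on its own. `configparser` handles the `ini` format, while `json` files are read with the `json` module.

```python
import configparser
import json
import os
import tempfile


def load_ini_config(path):
    """Read an ini file into a plain dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return {section: dict(parser[section]) for section in parser.sections()}


def load_json_config(path):
    """Read a json file into a Python object."""
    with open(path) as config_file:
        return json.load(config_file)


# Demo with throwaway files (hypothetical names and values)
with tempfile.TemporaryDirectory() as config_dir:
    ini_path = os.path.join(config_dir, "settings.ini")
    with open(ini_path, "w") as ini_file:
        ini_file.write("[paths]\ninput_dir = input\noutput_dir = output\n")

    json_path = os.path.join(config_dir, "settings.json")
    with open(json_path, "w") as json_file:
        json.dump({"batch_size": 32}, json_file)

    ini_config = load_ini_config(ini_path)
    json_config = load_json_config(json_path)

print(ini_config["paths"]["input_dir"])
print(json_config["batch_size"])
```

Note that `configparser` returns every value as a string, so numeric parameters need an explicit conversion.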
Type 2: Kedro
- Kedro is not a project structuring strategy; it's a Python tool released by QuantumBlack Labs that does project structuring for you. On top of that, it provides many features that simplify project organization and code execution, so that we can focus on what matters most: the experiments and implementations!
- Their project structure is shown below. We can create a blank project by running the `kedro new` command (don't forget to install Kedro first with `pip install kedro`).
get-started # Parent directory of the template
├── conf # Project configuration files
├── data # Local project data (not committed to version control)
├── docs # Project documentation
├── logs # Project output logs (not committed to version control)
├── notebooks # Project related Jupyter notebooks (can be used for experimental code before moving the code to src)
├── README.md # Project README
├── setup.cfg # Configuration options for `pytest` when doing `kedro test` and for the `isort` utility when doing `kedro lint`
└── src # Project source code
- While most of the directories are similar to the classic layout, a few points should be noted. Kedro groups modules by creating different "pipelines". These pipelines live within the `src` folder, and each pipeline folder contains the module files. Furthermore, there is a clear separation of the individual functions that are executed: these are stored in a `nodes.py` file, and the functions are later connected to their inputs and outputs in a `pipeline.py` file (all within the individual pipeline folder). Kedro also separates the code from the parameters by storing the parameters in the `conf` folder.
- Apart from helping with organizing the project, Kedro also provides options for sequential or parallel execution. We can execute individual functions (within `nodes.py`), individual pipelines (which are a combination of functions), or the complete project in one go. We can also generate documentation for the complete project or package the project as a Python `.whl` file with a single command. For more details, and believe me, we have only scratched the surface, refer to the official documentation.
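To make the split between `nodes.py` and `pipeline.py` more concrete, here is a plain-Python stand-in for the idea. It is not Kedro's API and does not import Kedro; the function names, dataset names, and the `run_pipeline` helper are all hypothetical. It only shows small functions being defined in one place and wired together by named inputs and outputs in another.

```python
# Plain-Python stand-in for the nodes/pipeline split; this is NOT Kedro's API.

# --- what would live in nodes.py: small, independent functions ---
def clean_values(raw_values):
    """Drop missing entries from a list of numbers."""
    return [value for value in raw_values if value is not None]


def compute_mean(values):
    """Return the arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# --- what would live in pipeline.py: wiring functions by dataset name ---
# Each step is (function, input name, output name); all names are hypothetical.
PIPELINE = [
    (clean_values, "raw_values", "clean_values"),
    (compute_mean, "clean_values", "mean_value"),
]


def run_pipeline(pipeline, catalog):
    """Run the steps in order, reading and writing a dict of named datasets."""
    data = dict(catalog)
    for func, input_name, output_name in pipeline:
        data[output_name] = func(data[input_name])
    return data


result = run_pipeline(PIPELINE, {"raw_values": [1, None, 2, 3]})
print(result["mean_value"])
```

Because the functions know nothing about where their inputs come from, the same function can be reused in several pipelines, which is the main benefit of this separation.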
Code formatting
- With a top-down approach, let's first look at a neat piece of code. We will discuss individual aspects of the code in more detail later. For now, imagine someone asks you to write a script: what should an ideal code file look like?
- The following code is taken from the `csv_column_operations.py` module file. It was generated for the prompt: "write a function which takes a CSV file as input and returns the sum of a column".
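The original listing did not survive in this copy of the article. What follows is a minimal reconstruction sketch, based only on the features the text mentions (module and function docstrings, `# Step` comments, two input checks in step 1, and a main guard). It assumes the standard-library `csv` module in place of the pandas import the text refers to, so that it runs without external packages; the parameter names and the sample data are hypothetical.

```python
"""Sum a column of a CSV file.

Reads a CSV file and returns the sum of one numeric column.

Author: <Name> <email>
Created: <date>
"""

# imports
import csv  # to handle csv files
import os
import tempfile


# functions
def perform_column_sum(csv_path, column_name):
    """Return the sum of a numeric column in a CSV file.

    Parameters
    ----------
    csv_path: str
        Path to the CSV file.
    column_name: str
        Header of the column to sum.

    Returns
    -------
    total: float or None
        Sum of the column, or None if an input check fails.
    """
    # Step 1: check the inputs
    if not os.path.isfile(csv_path):
        print(f"File not found: {csv_path}")
        return None
    with open(csv_path, newline="") as csv_file:
        reader = csv.DictReader(csv_file)
        if column_name not in (reader.fieldnames or []):
            print(f"Column not found: {column_name}")
            return None

        # Step 2: convert the column values to numbers
        values = [float(row[column_name]) for row in reader]

    # Step 3: return the sum
    return sum(values)


# run when file is directly executed
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = os.path.join(tmp_dir, "sample.csv")
        with open(sample_path, "w", newline="") as sample_file:
            sample_file.write("item,price\npen,1.5\nbook,4.0\n")
        print(perform_column_sum(sample_path, "price"))
```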
Some might argue that this is overkill for such a simple piece of code. Note that it's a dummy example. In real life, you will develop more complex code, so it becomes quite important to understand the gist.
- Now let's take a deeper dive into the individual aspects of the above code.
Module structure
- A module is a Python file with a `.py` extension that contains executable code, functions, classes, etc.
- Usually, we start the module with a module definition, an area where we provide some basic details about the module. We can do so using the following template (which can be compared with the real code shown above):
"""<Short description>

<Long description>

Author: <Name> <email>
Created: <date>
"""
- Next, we should clearly separate the parts of the module, such as imports and the code area, using comment lines.
- Finally, at the bottom, we could include some examples of how to run the code. Placing these examples under `if __name__ == '__main__':` makes sure that they only run when the file is executed directly (like `python csv_column_operations.py`). These pieces of code do not run when you import the module in another script.
Functions structure
- Functions are the basic blocks of code that perform a specific task. A module consists of several functions. To inform the user what a particular block of code does, we start the function with a function definition. A sample template is provided below:
"""Description

Parameters
----------
<parameter_1>: <data_type>
    <parameter_1_description>

Returns
-------
<output_1>: <data_type>
    <output_1_description>
"""
- After this, we can start adding the relevant code lines. Make sure to separate different logical blocks of code within the functions using comments.
- One important thing to handle at the start of the coding section is checking the parameters and input data for basic data type or data content issues. Most code breaks happen due to silly mistakes, such as someone providing the wrong input, in which case we should print or log a warning message and exit gracefully. The sample code above contains two such preliminary but important checks inside the step 1 section.
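As a small illustration of checking inputs and exiting gracefully, here is a sketch that logs a warning instead of raising when the input has the wrong type or is empty. The function name `compute_average` and the choice to return `None` on bad input are assumptions made for this example, not something the article prescribes.

```python
import logging

logger = logging.getLogger(__name__)


def compute_average(values):
    """Return the average of a list or tuple of numbers, or None on bad input."""
    # Step 1: check the input type and content
    if not isinstance(values, (list, tuple)):
        logger.warning("Expected a list or tuple, got %s", type(values).__name__)
        return None
    if len(values) == 0:
        logger.warning("Received an empty sequence; nothing to average")
        return None

    # Step 2: compute the average
    return sum(values) / len(values)


print(compute_average([2, 4, 6]))
print(compute_average("not a list"))
```

Returning `None` keeps the caller running; in a stricter code base, raising a `TypeError` or `ValueError` would be an equally reasonable choice.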
Naming convention
There are several naming conventions that we can follow, like camel case, snake case, etc. The choice is quite subjective and depends on the developer. Below are some examples of naming different entities in Python code (taken from the PEP 8 conventions, with some modifications):
- Module name: Modules should have short, all-lowercase names (ex: `csv_column_operations.py`)
- Function or method name: Function names should be lowercase, with words separated by underscores as necessary to improve readability. Also, don't forget to add your verbs! (ex: `perform_column_sum()`)
- Variable name: Similar to function names but without the verbs! (ex: `list_of_news`)
- Class name: Class names should normally use the CapWords convention. (ex: `FindMax`)
- Constant name: Constants are usually defined on a module level and written in all capital letters with underscores separating words. (ex: `MAX_OVERFLOW` and `TOTAL`).
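The conventions above can be seen side by side in a short sketch. The names reuse the examples from the list where possible; the method `find_max_value`, the variable `total_sum`, and the values are hypothetical.

```python
# Constant: module level, all capital letters with underscores
MAX_OVERFLOW = 100


# Class: CapWords
class FindMax:
    """Hold a list of numbers and find its largest value."""

    def __init__(self, list_of_numbers):
        # Variable: lowercase with underscores, no verb
        self.list_of_numbers = list_of_numbers

    # Method: lowercase with underscores, starts with a verb
    def find_max_value(self):
        return max(self.list_of_numbers)


# Function: lowercase with underscores, starts with a verb
def perform_column_sum(column_values):
    total_sum = sum(column_values)
    return total_sum


print(FindMax([3, 7, 5]).find_max_value())
print(perform_column_sum([1, 2, 3]) <= MAX_OVERFLOW)
```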
Add comments
PEP 8 describes three types of comments:
- Block comments: these are written for a single line or a collection of code lines. Use them either when you want to explain a set of lines or just want to separate sections of code. In the above example, you can see `# Step {1, 2, 3}` used as separating comments and `# run when file is directly executed` used to explain a set of code lines.
- Inline comments: these are added on the same line as the code. For example, see how `# to handle csv files` is used to justify the pandas package import. PEP 8 suggests using inline comments sparingly.
- Documentation strings: these are used to document modules, functions, or classes. PEP 257 suggests using triple double quotes (""") for docstrings. An example of module and function docstrings (short for documentation strings) is provided in the sample code above.
We should be as descriptive in our comments as possible. Try to separate functional sections of your code, provide explanations for complex code lines, provide details about the input/output of functions, etc. How do you know you have enough comments? You have enough if you think someone with half your expertise can understand the code without calling you in the middle of the night!
Indentations — Tabs vs Spaces
- Frankly, I am only going to touch this topic with a long stick. There are already several articles, Reddit threads, and even a TV series (Silicon Valley) where this topic has been discussed a lot!
- Want my 2 cents? Pick any modern editor (like VS Code, Sublime Text, etc.), configure the Tab key to insert spaces, and set 1 tab = 4 spaces, which matches the PEP 8 recommendation of 4 spaces per indentation level. Done.
Python is flexible when it comes to structuring your applications.
On the one hand, this flexibility is great: it allows different use cases to use structures that are necessary for those use cases. On the other hand, though, it can be very confusing to the new developer.
The Internet isn’t a lot of help either—there are as many opinions as there are Python blogs. In this article, I want to give you a dependable Python application layout reference guide that you can refer to for the vast majority of your use cases.
You’ll see examples of common Python application structures, including command-line applications (CLI apps), one-off scripts, installable packages, and web application layouts with popular frameworks like Flask and Django.
How to write a good commit message?
A good commit message should complete the sentence, "if applied, this commit will …". It should be in sentence case but without a trailing period. An optimal length for a commit message title is about 50 characters.
The following is an example commit message.
git commit -am 'Print a hello world message'
You can also write a more detailed message. Running `git commit` without a commit message opens an editor where you can write a multi-line commit message. Still, use the above convention for the title of your commit message. You can use a blank line to separate the title and the body of your message.
Print a hello <user> message

Print a hello world message and a hello <user> message
The main function was hardcoded with a 'hello world' message.
But we need a dynamic message that takes an argument and greets the user.
Amend the main function to take an argument and use string formatting to
print a hello <user> message
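The conventions above are simple enough to check mechanically. Here is a minimal sketch of such a check. The function name `check_commit_message` is hypothetical, and treating 50 characters as a hard limit is an assumption made for this example, since the text only calls it an optimal length.

```python
def check_commit_message(message, max_title_length=50):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    title = lines[0] if lines else ""

    if not title:
        problems.append("title is empty")
        return problems
    if len(title) > max_title_length:
        problems.append(f"title is longer than {max_title_length} characters")
    if not title[0].isupper():
        problems.append("title should start with a capital letter")
    if title.endswith("."):
        problems.append("title should not end with a period")
    # When there is a body, the second line must be blank
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the title and the body")
    return problems


print(check_commit_message("Print a hello world message"))
print(check_commit_message("added stuff."))
```

A check like this could be wired into a Git `commit-msg` hook, although that setup is outside the scope of this sketch.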
Conclusion
File organization is the way related files are grouped and laid out within a project. It is important for managing software development projects and websites. Every programming language has its own file organization rules; this article covered only Python file organization.
When learning how to program in Python, one of the first things you learn is how to organize your files. However, as you start writing larger programs, you might have hundreds of different Python files that sometimes lack a good structure. The goal of this article is to explain my best practices of file organization when writing Python applications. This text was originally composed as a presentation for the journaldev.fr French user group (first video).