Have you contributed to open source code? Did you really get into the code? Did you find that the documentation is written in a style that is difficult to comprehend? Sometimes, code doesn’t have any relevant comments. Sometimes, there are no comments for methods or functions. Sometimes, the writers just write the function’s purpose at the beginning of the function.
When you write python code in your classes and functions, it is advisable to document the code. Python documentation methods are extensively used. This article covers the different types of python documentation style and examples of each of them.
Learning how to document python code will put you in high demand. Python’s open source nature has led to an explosion in the community, more than doubling its user base just over the last five years.
Have you ever found yourself in a situation where you have been writing a code, and you feel like you need to remember some other explicit information that’s related to your coding? Unless you are able to save this as a separate file within the program, it is possible that you could lose access to this information for instance when you want to retrieve it.
A Foolish Consistency is the Hobgoblin of Little Minds
One of Guido’s key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 says, “Readability counts”.
A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.
However, know when to be inconsistent – sometimes style guide recommendations just aren’t applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!
In particular: do not break backwards compatibility just to comply with this PEP!
Some other good reasons to ignore a particular guideline:
- When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
- To be consistent with surrounding code that also breaks it (maybe for historic reasons) – although this is also an opportunity to clean up someone else’s mess (in true XP style).
- Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
- When the code needs to remain compatible with older versions of Python that don’t support the feature recommended by the style guide.
Code Lay-out
Indentation
Use 4 spaces per indentation level.
Continuation lines should align wrapped elements either vertically using Python’s implicit line joining inside parentheses, brackets and braces, or using a hanging indent [1]. When using a hanging indent the following should be considered; there should be no arguments on the first line and further indentation should be used to clearly distinguish itself as a continuation line:
# Correct: # Aligned with opening delimiter. foo = long_function_name(var_one, var_two, var_three, var_four) # Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest. def long_function_name( var_one, var_two, var_three, var_four): print(var_one) # Hanging indents should add a level. foo = long_function_name( var_one, var_two, var_three, var_four)
# Wrong: # Arguments on first line forbidden when not using vertical alignment. foo = long_function_name(var_one, var_two, var_three, var_four) # Further indentation required as indentation is not distinguishable. def long_function_name( var_one, var_two, var_three, var_four): print(var_one)
The 4-space rule is optional for continuation lines.
Optional:
# Hanging indents *may* be indented to other than 4 spaces. foo = long_function_name( var_one, var_two, var_three, var_four)
When the conditional part of an if
-statement is long enough to require that it be written across multiple lines, it’s worth noting that the combination of a two character keyword (i.e. if
), plus a single space, plus an opening parenthesis creates a natural 4-space indent for the subsequent lines of the multiline conditional. This can produce a visual conflict with the indented suite of code nested inside the if
-statement, which would also naturally be indented to 4 spaces. This PEP takes no explicit position on how (or whether) to further visually distinguish such conditional lines from the nested suite inside the if
-statement. Acceptable options in this situation include, but are not limited to:
# No extra indentation. if (this_is_one_thing and that_is_another_thing): do_something() # Add a comment, which will provide some distinction in editors # supporting syntax highlighting. if (this_is_one_thing and that_is_another_thing): # Since both conditions are true, we can frobnicate. do_something() # Add some extra indentation on the conditional continuation line. if (this_is_one_thing and that_is_another_thing): do_something()
(Also see the discussion of whether to break before or after binary operators below.)
The closing brace/bracket/parenthesis on multiline constructs may either line up under the first non-whitespace character of the last line of list, as in:
my_list = [ 1, 2, 3, 4, 5, 6, ] result = some_function_that_takes_arguments( 'a', 'b', 'c', 'd', 'e', 'f', )
or it may be lined up under the first character of the line that starts the multiline construct, as in:
my_list = [ 1, 2, 3, 4, 5, 6, ] result = some_function_that_takes_arguments( 'a', 'b', 'c', 'd', 'e', 'f', )
Tabs or Spaces?
Spaces are the preferred indentation method.
Tabs should be used solely to remain consistent with code that is already indented with tabs.
Python disallows mixing tabs and spaces for indentation.
Maximum Line Length
Limit all lines to a maximum of 79 characters.
For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.
Limiting the required editor window width makes it possible to have several files open side by side, and works well when using code review tools that present the two versions in adjacent columns.
The default wrapping in most tools disrupts the visual structure of the code, making it more difficult to understand. The limits are chosen to avoid wrapping in editors with the window width set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web based tools may not offer dynamic line wrapping at all.
Some teams strongly prefer a longer line length. For code maintained exclusively or primarily by a team that can reach agreement on this issue, it is okay to increase the line length limit up to 99 characters, provided that comments and docstrings are still wrapped at 72 characters.
The Python standard library is conservative and requires limiting lines to 79 characters (and docstrings/comments to 72).
The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
Backslashes may still be appropriate at times. For example, long, multiple with
-statements could not use implicit continuation before Python 3.10, so backslashes were acceptable for that case:
with open('/path/to/some/file/you/want/to/read') as file_1, \ open('/path/to/some/file/being/written', 'w') as file_2: file_2.write(file_1.read())
(See the previous discussion on multiline if-statements for further thoughts on the indentation of such multiline with
-statements.)
Another such case is with assert
statements.
Make sure to indent the continued line appropriately.
Should a Line Break Before or After a Binary Operator?
For decades the recommended style was to break after binary operators. But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator is moved away from its operand and onto the previous line. Here, the eye has to do extra work to tell which items are added and which are subtracted:
# Wrong: # operators sit far away from their operands income = (gross_wages + taxable_interest + (dividends - qualified_dividends) - ira_deduction - student_loan_interest)
To solve this readability problem, mathematicians and their publishers follow the opposite convention. Donald Knuth explains the traditional rule in his Computers and Typesetting series: “Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations” [3].
Following the tradition from mathematics usually results in more readable code:
# Correct: # easy to match operators with operands income = (gross_wages + taxable_interest + (dividends - qualified_dividends) - ira_deduction - student_loan_interest)
In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth’s style is suggested.
Blank Lines
Surround top-level function and class definitions with two blank lines.
Method definitions inside a class are surrounded by a single blank line.
Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).
Use blank lines in functions, sparingly, to indicate logical sections.
Python accepts the control-L (i.e. ^L) form feed character as whitespace; Many tools treat these characters as page separators, so you may use them to separate pages of related sections of your file. Note, some editors and web-based code viewers may not recognize control-L as a form feed and will show another glyph in its place.
Source File Encoding
Code in the core Python distribution should always use UTF-8, and should not have an encoding declaration.
In the standard library, non-UTF-8 encodings should be used only for test purposes. Use non-ASCII characters sparingly, preferably only to denote places and human names. If using non-ASCII characters as data, avoid noisy Unicode characters like z̯̯͡a̧͎̺l̡͓̫g̹̲o̡̼̘ and byte order marks.
All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren’t English).
Open source projects with a global audience are encouraged to adopt a similar policy.
Imports
- Imports should usually be on separate lines:# Correct: import os import sys # Wrong: import sys, os It’s okay to say this though:# Correct: from subprocess import Popen, PIPE
- Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.Imports should be grouped in the following order:
- Standard library imports.
- Related third party imports.
- Local application/library specific imports.
- Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on
sys.path
):import mypkg.sibling from mypkg import sibling from mypkg.sibling import example However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose:from . import sibling from .sibling import example Standard library code should avoid complex package layouts and always use absolute imports. - When importing a class from a class-containing module, it’s usually okay to spell this:from myclass import MyClass from foo.bar.yourclass import YourClass If this spelling causes local name clashes, then spell them explicitly:import myclass import foo.bar.yourclass and use “myclass.MyClass” and “foo.bar.yourclass.YourClass”.
- Wildcard imports (
from <module> import *
) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).When republishing names this way, the guidelines below regarding public and internal interfaces still apply.
Module Level Dunder Names
Module level “dunders” (i.e. names with two leading and two trailing underscores) such as __all__
, __author__
, __version__
, etc. should be placed after the module docstring but before any import statements except from __future__
imports. Python mandates that future-imports must appear in the module before any other code except docstrings:
"""This is the example module. This module does stuff. """ from __future__ import barry_as_FLUFL __all__ = ['a', 'b', 'c'] __version__ = '0.1' __author__ = 'Cardinal Biggles' import os import sys
String Quotes
In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.
For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.
Whitespace in Expressions and Statements
Pet Peeves
Avoid extraneous whitespace in the following situations:
- Immediately inside parentheses, brackets or braces:# Correct: spam(ham[1], {eggs: 2}) # Wrong: spam( ham[ 1 ], { eggs: 2 } )
- Between a trailing comma and a following close parenthesis:# Correct: foo = (0,) # Wrong: bar = (0, )
- Immediately before a comma, semicolon, or colon:# Correct: if x == 4: print(x, y); x, y = y, x # Wrong: if x == 4 : print(x , y) ; x , y = y , x
- However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted:# Correct: ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:] ham[lower:upper], ham[lower:upper:], ham[lower::step] ham[lower+offset : upper+offset] ham[: upper_fn(x) : step_fn(x)], ham[:: step_fn(x)] ham[lower + offset : upper + offset] # Wrong: ham[lower + offset:upper + offset] ham[1: 9], ham[1 :9], ham[1:9 :3] ham[lower : : upper] ham[ : upper]
- Immediately before the open parenthesis that starts the argument list of a function call:# Correct: spam(1) # Wrong: spam (1)
- Immediately before the open parenthesis that starts an indexing or slicing:# Correct: dct[‘key’] = lst[index] # Wrong: dct [‘key’] = lst [index]
- More than one space around an assignment (or other) operator to align it with another:# Correct: x = 1 y = 2 long_variable = 3 # Wrong: x = 1 y = 2 long_variable = 3
Background
Python is the main dynamic language used at Google. This style guide is a list of dos and don’ts for Python programs.
To help you format code correctly, we’ve created a settings file for Vim. For Emacs, the default settings should be fine.
Many teams use the yapf auto-formatter to avoid arguing over formatting.
Project Documentation
A README
file at the root directory should give general information to both users and maintainers of a project. It should be raw text or written in some very easy to read markup, such as reStructuredText or Markdown. It should contain a few lines explaining the purpose of the project or library (without assuming the user knows anything about the project), the URL of the main source for the software, and some basic credit information. This file is the main entry point for readers of the code.
An INSTALL
file is less necessary with Python. The installation instructions are often reduced to one command, such as pip install module
or python setup.py install
, and added to the README
file.
A LICENSE
file should always be present and specify the license under which the software is made available to the public.
A TODO
file or a TODO
section in README
should list the planned development for the code.
A CHANGELOG
file or section in README
should compile a short overview of the changes in the code base for the latest versions.
Why Documenting Your Code Is So Important
Hopefully, if you’re reading this tutorial, you already know the importance of documenting your code. But if not, then let me quote something Guido mentioned to me at a recent PyCon:
“Code is more often read than written.”
— Guido van Rossum
When you write code, you write it for two primary audiences: your users and your developers (including yourself). Both audiences are equally important. If you’re like me, you’ve probably opened up old codebases and wondered to yourself, “What in the world was I thinking?” If you’re having a problem reading your own code, imagine what your users or other developers are experiencing when they’re trying to use or contribute to your code.
Conversely, I’m sure you’ve run into a situation where you wanted to do something in Python and found what looks like a great library that can get the job done. However, when you start using the library, you look for examples, write-ups, or even official documentation on how to do something specific and can’t immediately find the solution.
After searching, you come to realize that the documentation is lacking or even worse, missing entirely. This is a frustrating feeling that deters you from using the library, no matter how great or efficient the code is. Daniele Procida summarized this situation best:
Conclusion
As Python is a dynamically typed language, there are no type definitions in the code. The function name is used as the main documentation. The docstring should be a of single line with a summary of the function and should be followed by a blank line. Detailed documentation goes below that.
If you’re not a developer you probably don’t spend much time reading the documentation for Python code. However, from the perspective of a developer, your documentation is incredibly important and how you present it says a lot about your attitude toward development and that of your team.