= Code Style Guidelines for the Seattle project =
These guidelines provide examples about what to do (or not to do) when writing code for the Seattle project. These are based upon Justin's experiences working with Stork, Guido van Rossum's Python style guidelines, and the experiences and suggestions of team members. Please give Justin feedback if there is anything you'd like to change.
One of Guido's key insights in building Python is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. The primary goal of the code we write for Seattle is readability. The other features that your code must have are security, correctness, and robustness (notice that performance is not listed). The purpose of this document is to help to improve the readability of Seattle code (which I believe strongly impacts the security, correctness, and robustness).
A style guide is about consistency. Consistency within a project is very important and since we're writing basically all of the code ourselves, there should be little reason for inconsistency.
There is only one good reason to break a rule in the style guideline and that's when applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
Indentation
Use 2 spaces per indentation level.
Tabs
Never use tabs to indent project code. There should be no tabs in Seattle code.
Maximum Line Length
Try to limit most lines to a maximum of 79 characters. This certainly should be done for comments, but use common sense when applying this rule to code! I've seen developers who have a highly indented loop wrap a relatively short line across three separate lines to try to avoid going over 80 characters (don't do it!).
The preferred way of wrapping long lines is by using Python's implied line
continuation inside parentheses, brackets and braces. If necessary, you
can add an extra pair of parentheses around an expression, but sometimes
using a backslash looks better. Make sure to indent the continued line
appropriately. The preferred place to break around a binary operator is
*after* the operator, not before it. Some examples:
class Rectangle(Blob):
def __init__(self, width, height,
color='black', emphasis=None, highlight=0):
if width == 0 and height == 0 and
color == 'red' and emphasis == 'strong' or
highlight > 100:
raise ValueError("sorry, you lose")
if width == 0 and height == 0 and (color == 'red' or
emphasis is None):
raise ValueError("I don't think so -- values are %s, %s" %
(width, height))
Blob.__init__(self, width, height,
color, emphasis, highlight)Blank Lines
Separate top-level function and class definitions with at least 5 blank lines.
Method definitions inside a class are separated by at least 2 blank lines.
Use blank lines in functions to indicate logical sections and help to offset comments.
- Imports should usually be on separate lines, e.g.:
Yes:
import os
import sys No:
import sys, os- It is preferable to import an entire module than items from the module
Yes:
import subprocess No:
from subprocess import Popen, PIPE Definitely Not:
from subprocess import * An exception to the last rule is when you must import items in a specific way for repy portability.
- Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
- Always use the absolute package path for all imports.
- **Avoid circular imports**. This is where ```a.py``` imports ```b.py``` and then either ```b.py``` or a path of imports from ```b.py``` imports ```a.py```. This does really odd things to Python, in particular if you perform any actions during import.
- Try to avoid performing actions on module import. This doesn't play well with circular imports and is non-intuitive for most programmers.
Pet Peeves
Avoid extraneous whitespace in the following situations:
- Immediately inside parentheses, brackets or braces.
Yes:
spam(ham[1], {eggs: 2}) No:
spam( ham[ 1 ], { eggs: 2 } )- Immediately before a comma, semicolon, or colon:
Yes:
if x == 4: print x, y; x, y = y, x No:
if x == 4 : print x , y ; x , y = y , x- Immediately before the open parenthesis that starts the argument
list of a function call:
Yes:
spam(1) No:
spam (1)- Immediately before the open bracket that starts an indexing or
slicing:
Yes:
dict['key'] = list[index] No:
dict ['key'] = list [index]- More than one space around an assignment (or other) operator to
align it with another.
Yes:
x = 1
y = 2
long_variable = 3 No:
x = 1
y = 2
long_variable = 3Other Recommendations
- Always surround these binary operators with a single space on
either side: assignment (=), augmented assignment (+=, -= etc.),
comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not),
Booleans (and, or, not).
- Use spaces around arithmetic operators:
Yes:
i = i + 1
submitted += 1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b) Maybe not:
i=i+1
submitted +=1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)- Compound statements (multiple statements on the same line) are
not allowed.
Yes:
if foo == 'blah':
do_blah_thing()
do_one()
do_two()
do_three() No:
if foo == 'blah': do_blah_thing()
do_one(); do_two(); do_three() Definitely not:
if foo == 'blah': do_blah_thing()
else: do_non_blah_thing()
try: something()
finally: cleanup()
do_one(); do_two(); do_three(long, argument,
list, like, this)
if foo == 'blah': one(); two(); three()Comments that contradict the code are worse than no comments. Always make
a priority of keeping the comments up-to-date when the code changes!
Comments should be complete sentences. If a comment is a phrase or
sentence, its first word should be capitalized, unless it is an identifier
that begins with a lower case letter (never alter the case of
identifiers!).
If a comment is short, the period at the end can be omitted. Block
comments generally consist of one or more paragraphs built out of complete
sentences, and each sentence should end in a period.
You should use two spaces after a sentence-ending period.
When writing English, Strunk and White apply. This means you shouldn't have spelling or
grammar errors. It also helps to make the project's code look more professional.
Block Comments
Block comments generally apply to some (or all) code that follows them,
and are indented to the same level as that code. Each line of a block
comment starts with a # and a single space (unless it is indented text
inside the comment).
Paragraphs inside a block comment are separated by a line containing a
single #.
Inline Comments
Use inline comments very sparingly.
An inline comment is a comment on the same line as a statement. Inline
comments should be separated by at least two spaces from the statement.
They should start with a # and a single space.
Inline comments are unnecessary and in fact distracting if they state
the obvious. Don't do this:
x = x + 1 # Increment xBut sometimes, this is useful:
x = x + 1 # Compensate for borderPurpose of Comments
Comments should give the reader the context for why you are performing a specific action and indicate authorship and mindset.
It is very important that when you change code you didn't write you
comment the change. I've spent days looking for bugs that people
introduced because they changed code they thought they understood.
'''Any time you make changes to a module or function you didn't write, use
your initials in a comment describing the scope and purpose of the change.'''
Use the comment to explain what you are thinking when you make the
change.
old code:
for line in file("foo"):
print line.split()[1]new code:
for line in file("foo"):
# JAC: Need to check if there is a second word, if not then we
# can skip the line because it isn't relevant to the output.
# Prior code threw an IndexError in this case.
if len(line.split()) > 2:
print line.split()[1]Now when someone reads the new code they can understand why you
changed what you did. If the original author reads your updated
code, they can tell if you misunderstood their code and easily fix
it (but if you misunderstood the code, this is a hint it needs better comments).
If you spend time struggling to understand something in code, it is
useful to put a comment indicating you find it unclear but you *think*
this is what is intended. This helps us to locate portions of the code
that need to be cleaned up and also indicates "how" to clean the code.
Comments should describe why you are performing an action, not what action you are performing.
No:
i = i + 2 # add 2 to iYes:
i = i + 2 # skipping even numbers since even numbers > 2 are not primeAn exception to this rule is if you need to do some "magic operation" in the code.
# reuse the socket if it's recently been closed but is available
socketobj.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)Hints for writing good comments
It is helpful many times to write comments as questions
# Is the file correctly signed?
if (foo.bar(fn)):The "right time" to write comments is as you write the code
(sometimes I even write them before the code). You will never
have a better understanding of the code than when you write it.
If you are making assumptions then you should check that your
assumptions are valid. If you can't test your assumptions then
at least comment your code
total = 0
for item in list:
# The list should contain only integers
total = total + itemComment quantity
There is no hard and fast rule for the number of comments a file
should have. However, one way to check it is to read only the
comments (not the code). If you could re-construct the code using
only the comments, you are likely at the right level of comments.
Note that "well commented" modules may have more lines of comments
than lines of code!
Conventions for writing good documentation strings (a.k.a. "docstrings")
are immortalized in [PEP 257](http://www.python.org/dev/peps/pep-0257/)
- Write docstrings for all public modules, functions, classes, and
methods. Docstrings are not necessary for non-public methods, but you
should have a comment that describes what the method does. This comment
should appear after the "def" line.
- Note that most
importantly, the """ that ends a multiline docstring should be on a line
by itself, and preferably preceded by a blank line, e.g.:
"""Return a foobang
Optional plotz says to frobnicate the bizbaz first.
"""- For one liner docstrings, it's okay to keep the closing """ on the same
line.
Example documentation strings for files, modules, and classes.
Each file should have a header block that explains the purpose of the
module, when it was started, who wrote it (or made very substantial
revisions), and a list of any caveats or issues with the module.
"""
<Program Name>
storkreport.py
<Started>
February 4, 2005
<Author>
[email protected]
Justin Cappos
Jeffry Johnston
<Purpose>
Provides Error Reporting/Logging Utility Functions.
The functions are named loosely based on the syslog levels. However,
because syslog has 7 priority levels, and stork only has 4, the
send_error and send_out_* series only contain debug, info, warning, and
err (notice, alert, and emergency were omitted). send_syslog supports
all seven levels, however.
Current syslog implementation assumes local syslog (no remote servers),
and a facility of LOG_USER (defaults from the syslog module).
On July 10th, 2007, Justin is doing a pretty substantial rewrite. Blame
him for any resulting cruft.
""" Documentation strings are needed for every function in a module that
will be called from other modules. It is recommended to create comment
blocks for long functions even if they are private to a module. There are 5 main
parts of a function comment block: purpose, arguments, exceptions,
side effects, and return value. The purpose is to specify everything someone
who calls your function needs to know (so they don't need to look at the code).
For example:
def redirect_stdout(stream):
"""
<Purpose>
Redirects the standard output stream (stdout) to a new file stream.
If this is the first time that output has been redirected, the
original stdout stream will be saved for use with the restore_stdout
function.
<Arguments>
stream:
The new file stream for stdout.
<Exceptions>
TypeError on bad parameters.
<Side Effects>
Changes sys.stdout.
<Returns>
None.
""" Here's a copy-and-paste ready version:
"""
<Purpose>
...
<Arguments>
...
<Exceptions>
...
<Side Effects>
...
<Returns>
...
""" In general the use of classes is discouraged (link to below), but in the
cases it is clear classes should be used, use the following format:
class single_conn(Thread):
"""
<Purpose>
Wraps up the client side of arizonacomm into a single class.
<Side Effects>
Defaults to running itself in a thread.
<Example Use>
# open a connection
connection = single_conn()
# write information to the connected party
connection.send("hello")
# and disconnect without checking if they have said anything
connection.disconnect()
""" Classes that are used for exceptions can be written more succinctly (since the purpose, side effects, example use are obvious).
class UserError(Exception):
"""This exception indicates the user provided us with bad input"""Descriptive Naming Styles
Use descriptive variable and function names.
Put at least one adjective in each variable name like 'slicename' or
'inputfileobj' or 'deststring'. Also make sure that the type of the
variable is readable by looking at the name. Also, the word "file" is
ambiguous, use fileobj or filename (fn is an acceptable abbreviation for
filename).
These rules are especially important for function arguments or variables that have long lifespans.
Yes:
inputfileobj = file(inputfn) # fn is an acceptable abbreviation for file name
deststring = "abc" + currentstring
slicename = "uw_seattle"No:
inobj = file(filename)
dest = "abc" + current
name = "uw_seattle"Definitely not:
i = i + 1 The only case where it is okay to use single letter variable names is for arguments
passed into a constructor or using e for an exception. For example:
class foo:
... # doc string omitted
itemcount = None
maxitemcount = None
itemlist = None
def __init__(self, i, m):
self.itemcount = i
self.maxitemcount = m
self.itemlist = []
def popanitem(self):
try:
return self.itemlist.pop()
except KeyError, e:
# Raise an error, the list is empty...
raise KeyError("Can't pop an empty foo")General Variable Names
Should be the same as function names, always use lower case, using
an underscore if a separation is needed, such as: popanitem or pop_an_item
Constants
Variables which are set once and never change value should be all
upper case, if a seperator is needed use an underscore.
Exception Names
Because exceptions should be classes, the class naming convention
applies here. However, you should use the suffix "Error" on your
exception names (if the exception actually is an error).
Global Variable Names
(Let's hope that these variables are meant for use inside one module
only.) The conventions are about the same as those for functions.
If you can do it without globals in an intelligent way, then don't use globals.
When you must use globals then explain why you need to use globals. An example
that uses globals in a good way is a program that keeps a cache.
Function Names
Function names should be lowercase, with words separated by underscores
as necessary to improve readability.
!mixedCase is allowed only in contexts where that's already the
prevailing style (e.g. threading.py), to retain backwards compatibility.
Function and method arguments
Always use 'self' for the first argument to instance methods.
If a function argument's name clashes with a reserved keyword, it is
generally better to append a single trailing underscore rather than use
an abbreviation or spelling corruption. Thus "print_" is better than
"prnt". (Perhaps better is to avoid such clashes by using a synonym.)
- Comparisons to singletons like None should always be done with
'is' or 'is not', never the equality operators.
Also, beware of writing "if x" when you really mean "if x is not None"
-- e.g. when testing whether a variable or argument that defaults to
None was set to some other value. The other value might have a type
(such as a container) that could be false in a boolean context!
- Use class-based exceptions.
String exceptions in new code are forbidden, because this language
feature is being removed in Python 2.6.
Modules or packages should define their own domain-specific base
exception class, which should be subclassed from the built-in Exception
class. Always include a class docstring. E.g.:
class MessageError(Exception):
"""Base class for errors in the email package.""" Class naming conventions apply here, although you should add the suffix
"Error" to your exception classes, if the exception is an error.
Non-error exceptions need no special suffix.
- When raising an exception, use "raise ValueError('message')" instead of
the older form "raise ValueError, 'message'".
The paren-using form is preferred because when the exception arguments
are long or include string formatting, you don't need to use line
continuation characters thanks to the containing parentheses.
- When catching exceptions, mention specific exceptions
whenever possible instead of using a bare 'except:' clause.
For example, use:
try:
import platform_specific_module
except ImportError:
platform_specific_module = None A bare 'except:' clause will catch SystemExit and KeyboardInterrupt
exceptions, making it harder to interrupt a program with Control-C,
and can disguise other problems. If you want to catch all
exceptions that signal program errors, use 'except Exception:'.
A good rule of thumb is to limit use of bare 'except' clauses to two
cases:
1) If the exception handler will be printing out or logging
the traceback; at least the user will be aware that an
error has occurred.
2) If the code needs to do some cleanup work, but then lets
the exception propagate upwards with 'raise'.
'try...finally' is a better way to handle this case.
- Additionally, for all try/except clauses, limit the 'try' clause
to the absolute minimum amount of code necessary. Again, this
avoids masking bugs.
Yes:
try:
value = collection[key]
except KeyError:
return key_not_found(key)
else:
return handle_value(value) No:
try:
# Too broad!
return handle_value(collection[key])
except KeyError:
# Will also catch KeyError raised by handle_value()
return key_not_found(key)- Use ```assert``` sparingly.
Assert has some tricky semantics because it is a statement, not a function. Thus you cannot call assert with a string to raise in the logical way
assert(x = y, 'x and y must be equal') # BAD CODE
The above code will actually always be True because the tuple (bool, 'string') is True.
You should use assertions only when it is truly an internal error (i.e. somewhere it should be impossible to reach). Even so, it is fine to log / exitall instead.
- Use string methods instead of the string module.
String methods are always much faster and share the same API with
unicode strings. Override this rule if backward compatibility with
Pythons older than 2.0 is required.
- Use ''.startswith() and ''.endswith() instead of string slicing to check
for prefixes or suffixes.
startswith() and endswith() are cleaner and less error prone. For
example:
Yes:
if foo.startswith('bar'): No:
if foo[:3] == 'bar':- Object type comparisons should always use isinstance() instead
of comparing types directly.
Yes:
if isinstance(obj, int): No:
if type(obj) is type(1): When checking if an object is a string, keep in mind that it might be a
unicode string too! In Python 2.3, str and unicode have a common base
class, basestring, so you can do:
if isinstance(obj, basestring): The types module has the StringTypes type defined for that purpose, e.g.:
import types
if isinstance(obj, types.StringTypes):- For sequences, (strings, lists, tuples), use the fact that empty
sequences are false.
Yes:
if not seq:
if seq: No:
if len(seq)
if not len(seq)- Don't write string literals that rely on significant trailing
whitespace. Such trailing whitespace is visually indistinguishable and
some editors (or more recently, reindent.py) will trim them.
- Don't compare boolean values to True or False using ==
Yes:
if greeting: No:
if greeting == True: Worse:
if greeting is True: However, there are a few cases where a function may return True, False, or other values. In these cases, checking if a value with an unknown type is True or False, is allowed.
- Use the % string formatting operator only when necessary
The string formatting operator '%' provides a convenient way to do printf like substitution of variables in strings. It's better in many cases to put variables in line when you don't need the more advanced options (like fixed length fields). The issue is that the string formatting operator makes understanding the output difficult in many cases. For example in the following example, it's difficult to read the code and determine what the resulting output will look like:
print "python %s %s %s/vesselinfodir/ %s/ > /tmp/carter.out 2> /tmp/carter.err"%(carter_script, dist_char, prefix,prefix) A better way to write this:
print "python "+carter_script+" "+dist_char+" "+prefix+"/vesselinfodir/ "+prefix+"/ > /tmp/carter.out 2> /tmp/carter.err"- Use else statements for handling errors, not for normal flow in most cases.
For example, if you have a function that is supposed to take a positive integer argument, don't do the following:
if x > 0:
...
else: # x must be 0
...It may be the case that x is negative, a different type, etc. Instead do the following:
if x > 0:
...
elif x == 0: # x must be 0
...
else:
raise InternalError('Expected x to be a positive integer, not "'+str(x)+'"')== Potpourri ==
Unix not Windows style text files.
If you develop on Windows make sure you use dos2unix before checking in.
Avoid objects
Objects are a terrific programming tool when used correctly, but make code
nearly unreadable when used poorly. Unfortunately, even experienced developers
have a hard time knowing how to correctly use objects. About 80% of the code one
could write is equally good with and without objects. 10% is easier with objects
and 10% is easier without them. Seattle code should use objects only in very
rare cases. The 90% of the code where objects can be avoided without significant
impact, should be written without objects.
No lambda functions or lisp-esque code
Don't use lambda functions. Don't use map, flatten, etc. It's best not to use
these functions and to write longer code that produces the same result instead
because using these functions makes your code very dense and difficult to read.
Do not use syntax like:
must_testy = [must_test(line) for line in changed.split("
n")]Write test cases
Since you're going to test your module anyways before integrating it
into the rest of Seattle (right?), it makes sense to do it in a reusable way.
This way when you make changes later, you can immediately test to see if anything
is broken.
Write 90% tests. The test should make sure that it will catch 90% of the
potential problems with the function. Avoid writing 50% tests (that only
check a common case or two) and avoid writing 99.9% tests because it will
consume too much of your time.
All "external" functions **must** be tested. Internal functions (starting with ```__```) are up to your discretion.
Never use short circuit evaluation to produce output
No:
if synched or askokcancel("Change slice without synching?", "You
have changes that are not synched with the repository. These changes
will be lost if you change the slice right now.
n
nClick OK to change
slice without synching.", default=CANCEL):
# then clause
else:
# else clause Yes:
if not synched:
if askokcancel("Change slice without synching?", "You have
changes that are not synched with the repository. These changes will
be lost if you change the slice right now.
n
nClick OK to change
slice without synching.", default=CANCEL):
#then clause
else:
#else clause
else:
#then clause or if the then and else clauses are large write:
if not synched:
isok = askokcancel("Change slice without synching?", "You have
changes that are not synched with the repository. These changes will
be lost if you change the slice right now.
n
nClick OK to change
slice without synching.", default=CANCEL):
else:
isok = True
if isok:
#then clause
else:
#else clause Writing the code the first way is hard for someone to read. The
second two ways make it more clear that you are relying on the short
circuit evaluation to perform or avoid the evaluation of the
askokcancel function.
Don't use os.popen or os.system
These are deprecated and we've found bugs in them. Don't use them! Use subprocess instead.
Don't use subprocess.Popen with a string argument (#508).
Call subprocess.Popen with a first argument that looks like ['command', 'arg1', 'arg2', ...]. Do not use 'command arg1 arg2 ...'!
Avoid changing directory (#487).
Changing the current directory can cause multithreaded programs to break is scary ways and is usually an indication of bad programming style. Don't do it!
Don't use mutable objects as argument defaults (#828).
If they are modified in the function (or returned and modified outside of the function), the changes persist to future function calls.
If you want the default of an empty list or dict, for example, use a default of None and then check for this value in the function and
assign an empty list/dict inside the function.