5 Essential Tips to Improve the Readability of Your Python Code

It differs from software engineering, however, mostly because, as Data Scientists, we are able to work quite independently on the code. Yes, you do work in teams, but, at least in my experience, you seldom work on the same part of a project at the same time and would rarely be editing the same file as a colleague; generally, each section of the project is one person's responsibility. This tends to lead to many projects where the code looks rather messy and is formatted in a variety of different ways, and far less time is spent on ensuring that the code can be easily read, and especially maintained, by other members of your project team. Since the very first day of my working life, most of my experience has been in consulting positions on 'greenfield' projects, which generally involve creating new code from scratch.

1. Auto-Formatters are Recommended for Formatting the Code

The easiest way to increase code readability is to use an automatic code formatter, which ensures that your code conforms to a consistent coding style. There are several ways to format your code in Python. Four of the most popular tools are Black, isort, autopep8, and YAPF.

I prefer Black for code formatting and isort for organizing library imports. Picking one code style and sticking with it throughout a project makes the code easier for other programmers to read.

Put simply, understanding badly formatted code is as difficult as reading an article full of grammatical mistakes. You can read the words and usually piece together the meaning, but it will be vague and, in some cases, rather confusing.

A consistent style, with formatting done according to the PEP8 standard, helps developers when they read through the code. They already know where everything should be and can concentrate on the actual 'meat' of the code rather than its layout.

Standardized, automated code formatting also makes version control easier. Consider the following situation: you have created a new file that does not meet the PEP8 standard. Your coworker then checks the file out of version control and modifies one line of code. When they save the file, their IDE reformats the whole file to comply with PEP8 (for instance, enforcing a maximum line length of 79 characters, fixing the number of blank lines between functions, removing excess spaces within function arguments, and so on). They then commit the file back to the repository. Now there is a problem. The version control 'diff' shows a very large difference between the old and updated code, even though the change made to the underlying code logic was very small. Uniform, automated code formatting reduces the chance of enormous diffs in the version control system that are unrelated to the functionality of the code being modified.
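The effect can be sketched with Python's built-in difflib module. The file contents below are invented for illustration: the same one-line logic change is compared once on its own and once combined with a whole-file reformat on save.

```python
import difflib

# A small file that was committed without consistent formatting (made-up example).
original = [
    "def total( a,b ):",
    "    x=a+b",
    "    return x",
    "def scale( v,k ):",
    "    return v*k",
]

# The same file after a one-line logic change only.
logic_only = list(original)
logic_only[4] = "    return v*(k+1)"

# The same logic change, but the editor also reformatted the whole file on save.
reformatted = [
    "def total(a, b):",
    "    x = a + b",
    "    return x",
    "",
    "",
    "def scale(v, k):",
    "    return v * (k + 1)",
]


def changed_lines(before, after):
    """Count added or removed lines in a unified diff, ignoring the file headers."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


small_diff = changed_lines(original, logic_only)
noisy_diff = changed_lines(original, reformatted)
print(f"logic change only: {small_diff} changed lines")
print(f"logic change plus reformat: {noisy_diff} changed lines")
```

If the file had been auto-formatted before its first commit, the second diff would look like the first one.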

How to Use Code Formatters?

Many auto-formatters can be run from the command line against your code. Alternatively, most current IDEs, such as VS Code, have options or extensions that automatically format the code every time you save the file.

For example, you can apply the Black code style to your code and organize your imports with isort directly from the command line.

Part of my VS Code settings.json file is configured to apply Black formatting to my Python files whenever I save them in VS Code. It also sorts my imports, much as isort does, so that import statements are arranged in a way that humans can easily read when many libraries or functions are being imported.
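The original command listing is not reproduced here. Both tools accept a file or directory path as their argument (black followed by a path, and isort followed by a path). As a minimal sketch, the same two commands can be wrapped in a small Python helper; the function names and the "src" path are hypothetical, and the helper skips a tool that is not installed.

```python
import shutil
import subprocess


def formatter_commands(path):
    """Return the command lines for running Black, then isort, on a path."""
    return [["black", path], ["isort", path]]


def run_formatters(path):
    """Run each formatter that is installed; skip any that is missing."""
    for command in formatter_commands(path):
        if shutil.which(command[0]) is None:
            print(f"{command[0]} is not installed, skipping")
            continue
        # check=True raises an error if the formatter exits with a failure code.
        subprocess.run(command, check=True)


# Example (not executed here): run_formatters("src") runs the same commands
# you would otherwise type in a terminal for a folder named "src".
```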

Note: While following PEP8 is recommended, there are no universally enforced Python code styling rules. What matters is that all members of your team working on the project adhere to the same code-styling rules. Therefore, it is best practice to be explicit about which style guide you are following.

2. Use Code-checking Tools (Linters) to Catch Bugs and Follow Best Practices

Linters, a type of static analysis tool, allow you to identify syntactic and stylistic issues in your Python code.

Linters help you keep your code's style in line with PEP8, check your code for errors before you run it, and warn you about bad practices.

For example, linters can recognize unused variables or variables that are used before they are assigned, identify unused imports, check that all functions and classes have docstrings, and advise against bare except clauses.
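As a rough illustration, the made-up snippet below runs without errors but contains several of these problems. The comments mark what a linter such as flake8 or pylint would typically report; the exact messages depend on the tool and its configuration.

```python
import os  # unused import: nothing in this file uses os


def parse_number(text):
    # Missing docstring: some linters ask every function to have one.
    attempts = 0  # unused variable: assigned but never read
    try:
        return int(text)
    except:  # bare except: also swallows errors you did not intend to catch
        return None


print(parse_number("42"))
print(parse_number("not a number"))
```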

It is important to understand that the violations a linter points out are sometimes not an issue at all. For instance, pylint advises that each Python module should contain a docstring that describes what the module does. You might decide that this is not necessary for your project. A linter's behavior can be managed either through configuration files or through comments in the code, both of which can tell it to disregard certain warnings.

Again, there is a vast range of linting tools available for Python. The two most common are flake8 and pylint. I mostly use flake8 for pre-commit hooks (see the tip below). I also use pylint during development to help find problems and to point out potential improvements, but, as opposed to flake8, it is much more preachy about seemingly harmless problems, which makes it rather unsuitable as a final line of defense for preventing new code from being committed to the repo.

Pylint also gives your code a score out of 10, which is rather nice. That said, it can be painful to the ego: it might give your fully working code a 3 out of 10.
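A minimal sketch of the comment-based approach is shown below. The function and its contents are invented for illustration; the two special comments use the inline suppression syntax of pylint and flake8 respectively.

```python
# pylint: disable=missing-module-docstring
# The comment above asks pylint to skip its module-docstring check for this file.

import json

# The trailing comment on the next line asks flake8 to ignore its
# unused-import warning (F401) for that line only.
import os  # noqa: F401


def to_json(record):
    """Serialize a record to a JSON string with sorted keys."""
    return json.dumps(record, sort_keys=True)


print(to_json({"b": 1, "a": 2}))
```

Project-wide exceptions are usually better kept in the linter's configuration file, so that the rules live in one place rather than being scattered through the code.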

How to Use Linters to Check the Code?

You can enable linting in VS Code from the settings.json file, for example with pylint.

Note: Extend code-checking tools to your Jupyter notebooks using the nbQA library. If you work with Jupyter notebooks, code-checking tools are as important as they are for .py files, if not more so. Due to the nature of a Jupyter notebook, the work in it is usually a work in progress, and cells are often run in different orders while you fix issues. Thus, it is quite possible to inadvertently end up with uninitialized variables or references to variables that you have not yet defined. These issues are also not easy to spot by eye right before committing the notebook to the git repo. nbQA runs the Python linting tools on your notebooks to detect broken notebooks before they are checked into your repository.

3. Static Type Hinting

Adding type hints to my code proved to be a game-changer for me. Few things improve your code's legibility as much as consistent type hinting. If there is one thing you take away from this post, it should be to start practicing type hinting immediately.

Type Hinting:

Python belongs to the group of dynamically typed languages. This means that you do not explicitly declare the data type of a variable in your code. Instead, a variable acquires its type at runtime, when a value is assigned to it.

The absence of type declarations in dynamically typed languages reduces syntactic overhead: these languages are relatively easy to pick up, even for inexperienced programmers, and do not impose many restrictions on the coder. However, this can quickly turn into a downside as well, because it encourages bad habits and results in code that is often muddled and barely readable, especially when a project grows large.

Type hints, available since Python 3.5 (with variable annotations added in 3.6), are annotations in your code that tell the reader what type or structure of data you expect for a variable or function. For example, str, list, dict, int, etc.

The Python interpreter ignores type hints; unlike in a statically typed language such as Java, they are not enforced at runtime. Type hints do not slow your code down, and your code will still run as before if you have not provided type hints at all or if you have provided incorrect type hints. Type hinting is solely for the developer's benefit.
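A short illustration of this behavior, using a made-up variable name: the same name can hold values of different types, and the type is only known once a value has been assigned.

```python
value = 42
first_type = type(value).__name__   # the name currently refers to an int

value = "forty-two"
second_type = type(value).__name__  # the same name now refers to a str

print(first_type, second_type)
```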
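To see that hints are not enforced at runtime, consider this small sketch with deliberately wrong annotations (the function is hypothetical). The hints are stored on the function, but calling it with a string still works.

```python
def repeat(text: int) -> int:
    """Return the argument repeated twice (the hints are wrong on purpose)."""
    return text * 2


# The interpreter does not check the hints, so passing a str is accepted.
result = repeat("ab")
print(result)

# The hints are only recorded as metadata on the function.
print(repeat.__annotations__)
```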

For Example:

The following is a typical function where type hints can be applied to enhance the development workflow:
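The original listing is not included here, so the snippet below is a reconstruction from the description that follows: a function without type hints that takes an argument called items and returns its largest value using max. The function name find_max is an assumption.

```python
def find_max(items):
    return max(items)
```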

The function above should, in fact, take a list of numbers as its argument and should return the highest value in the list. Simple, right?

Now imagine you are a developer who is new to the project, and you encounter this code. How should you use this function?

The author of the function is surely completely aware of what it does when writing it; it is a very simple piece of Python. Nonetheless, it is ambiguous for a new developer, who could use the function in the wrong way or get unwanted results.

The biggest question in this example is: What type of data structure should be passed to this function argument called items?

The built-in max function in Python works with many different data structures, including lists, strings, lists of strings, sets, and tuples. The function above wouldn't raise an error if any of these data structures were passed to it. But only a list of integers (or possibly floats) would give the behavior that the original developer expected.

Although there is no way to prevent future developers from accidentally passing a string or a list of strings to this function, type hints help the developer by indicating how the function was supposed to be used.
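A minimal sketch of the same idea with type hints added is shown below. The name find_max is hypothetical, and the list[int] syntax assumes Python 3.9 or newer.

```python
def find_max(items: list[int]) -> int:
    """Return the largest number in a list of integers."""
    return max(items)


highest = find_max([3, 9, 2])

# This call still runs, because hints are not enforced at runtime, but a type
# checker would be expected to flag the str argument as a mismatch.
surprising = find_max("hello")

print(highest, surprising)
```

The signature now documents the intended input and output without the reader having to open the function body.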

Tools Used for Type Hinting

To check for typing errors, you can use mypy, or Pylance if you are using VS Code.

Once you have added type hints, these tools will scan your code wherever that variable or function is used and check that the right data structure is being passed.

4. Automate Everything with Pre-Commit Hooks

Programmers are fond of automation, which is why automated testing is so widely used. Code formatting and checking should be no exception. Code styling and style-related checks can be automated using pre-commit hooks. The pre-commit configuration lets you select which checks should be run against your code before it is committed to the git repository. These checks are executed whenever you commit code. All of the checks must pass; if any of them fails, the commit will not proceed until the problem is corrected.

This prevents bad code from being added to the repo and its history, and guarantees that all committed code meets the established code styling standards.
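To make the mechanism concrete, here is a sketch of the kind of check a hook might run. It is not one of the ready-made hooks; the function names are invented. Git treats a non-zero exit status from a pre-commit check as a failure, which is what blocks the commit.

```python
def lines_with_trailing_whitespace(text):
    """Return 1-based line numbers whose content ends in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def check_files(paths):
    """Return 0 if every file is clean, 1 otherwise (non-zero blocks the commit)."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            bad_lines = lines_with_trailing_whitespace(handle.read())
        if bad_lines:
            print(f"{path}: trailing whitespace on lines {bad_lines}")
            failed = True
    return 1 if failed else 0
```

A script built around this would pass the staged file names to check_files and exit with the returned status. In practice, existing hooks cover common checks like this one, so you rarely need to write your own.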

Start a Pre-Commit Hook

1. Install pre-commit (for example, with pip install pre-commit).

2. Add the .pre-commit-config.yaml configuration file to the project directory.

3. Install the hooks by running pre-commit install.

After following these instructions, the next time you attempt to commit files to the repo, the pre-commit hooks will check that the code you are about to commit complies with the Black formatter. In the above case, pre-commit will also strip the whitespace at the end of the file.

You can find a more detailed tutorial on pre-commit hooks, along with my full template .pre-commit-config.yaml file for Data Science projects, in my article about it.

5. Write Documentation

We all want the code we produce to remain in use even after we are no longer active on the project. However, it doesn't matter how good your software is; if the documentation is not good, people will not use it. Documentation is very easy to neglect: it often gets left half-finished during development, and it is usually the last aspect of a project to receive attention.

It is pointless to write extensive documentation for a feature that is still under construction, and it is better to write nothing at all than to provide wrong information, but there are several simple rules that will keep you from getting completely lost without overworking yourself. Better documentation makes it more likely that others will use your code, and it will also ease your own work in the future.

In Python, docs generally take three forms: comments within code (often called inline comments), comments at the head of functions/methods (docstrings), and comments in external documentation files (e.g., a README file).

What the code is doing should be fairly obvious from the code itself. On the other hand, why a 'magic' number, an unusual method, or a workaround was chosen by the developer is often not clear. That is where inline comments make sense. Use inline comments when the purpose of the code is not quite clear. An example of this is when you are using hex codes for colors: you are never going to remember the color just by looking at the hex code. Adding simple one-line docstrings to your functions, even if you think the function is self-explanatory, goes a long way toward improving readability and reducing ambiguity for future developers. It is essential to note that docstrings and inline/block comments are not interchangeable.

Unlike inline/block comments, docstrings can be accessed at runtime using either the __doc__ dunder attribute or the built-in help() function. This also means that docstrings can be picked up by IDEs and displayed as hints while you type new code. Programmers who come from other languages often do not make proper use of Python's docstring features.
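For instance, in the sketch below (the constant names and the helper function are made up), the inline comments record what the hex codes mean, and the function carries a one-line docstring.

```python
HEADER_COLOR = "#000080"  # navy blue
ALERT_COLOR = "#FF6347"   # tomato red, used to draw attention


def hex_to_rgb(color):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of integers."""
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


print(hex_to_rgb(HEADER_COLOR))
print(hex_to_rgb(ALERT_COLOR))
```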
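The difference can be seen in a short sketch with a hypothetical function: the docstring is stored on the function object, while the comment inside the body is discarded.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    # This comment is only visible to someone reading the source file.
    return celsius * 9 / 5 + 32


# The docstring is available at runtime; the comment above is not.
doc = celsius_to_fahrenheit.__doc__
print(doc)

# In an interactive session, help(celsius_to_fahrenheit) displays the same
# docstring together with the function signature.
```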

Conclusion:

Writing clean code is a best practice that not only improves the readability of your Python code but also helps make a good impression. Use comments and tools to improve the readability of your Python code.