Testing and Debugging Jupyter Notebooks

Jupyter notebook is a great tool for a data scientist to create and share documents that contain code, visualizations, and text. A combination of the notebook development environment and a reach Python data-science stack allows to start with an idea sketch and develop it to a full featured data-science project. At some point between the sketch and the finished project you may get that unsettling feeling about changing some function or even a single line of code, because you are not sure how this may impact the rest of the code. This is a good moment to invest some time in writing regression tests (if you still have not done that). In this post I will show how to use Python standard testing tools, such as doctest and unittest, to add tests to a Jupyter notebook.

Running Example

As an example, we will use a function that is meant to return the sum of its two parameters and that is stored in a Jupyter notebook cell:

def add(a, b):
    """Return the sum of a and b."""
    sum = a
    return sum

Contrary to the specification, the function returns the first argument and not the sum of the two arguments. Using doctest and unittest modules we will write tests that will help us to reveal the bug in the function. Then we will run this tests in a Jupyter notebook and debug the failing tests using the Python debugger (pdb).

Although, the example is very simple, the techniques that we will use work the same for notebooks of any complexity.

Doctest

The tests of the doctest module look like interactive Python sessions embedded in the python docstrings. In the following code snippet we extend our running example with a test that consists of two lines: a function call (starts with >>>) and the expected output.

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 2)
    4
    """
    sum = a
    return sum

In the last cell of our notebook, we import the doctest module and run all tests in all docstrings:

import doctest
doctest.testmod(verbose=True)

The test for the add() function will fail, because of the bug, and the test output will look something like this:

Trying:
    add(2, 2)
Expecting:
    4
**********************************************************************
File "__main__", line 4, in __main__.add
Failed example:
    add(2, 2)
Expected:
    4
Got:
    2
1 items had no tests:
    __main__
**********************************************************************
1 items had failures:
   1 of   1 in __main__.add
1 tests in 2 items.
0 passed and 1 failed.
***Test Failed*** 1 failures.

If we remove the verbose=True argument the output will be more concise.

Doctest is very simple to use and suits well for writing simple test cases. For more complicated test cases Python provides a full featured unit testing framework unittest.

Unittest

The unittest framework looks and works similar to the unit testing frameworks in other languages. It allows for more complex testing scenarios than doctest, but also requires to write more code.

The following code snippet contains a test case for the add() function. A test case is created by subclassing unittest.TestCase. A test case contains one ore more tests that are implemented with methods whose names start with test. The tests use assert methods to check for an expected result.

import unittest

class TestNotebook(unittest.TestCase):

    def test_add(self):
        self.assertEqual(add(2, 2), 5)

To test a notebook one would write a number of cells with test cases. The very last cell will include the following line of code, which will run all test cases as soon as the cell is executed.

unittest.main(argv=[''], verbosity=2, exit=False)

Running the test case for our example will produce an output similar to this:

test_add (__main__.TestNotebook) ... FAIL

======================================================================
FAIL: test_add (__main__.TestNotebook)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-3-61dc57f4e00b>", line 6, in test_add
    self.assertEqual(add(2, 2), 4)
AssertionError: 2 != 4

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (failures=1)

We need the argv=[''] argument, because we run the tests from a notebook and not form a command line. exit=False argument prevents unittest from shutting down the notebook kernel. verbosity adjust the verbosity of the output (higher values = more verbose output).

Debugging a Failed Test

If a test fails it is often useful to halt the test case execution at some point and run a debugger to inspect the state of the program to find clues about a possible bug. For this, insert the following code just before the line at which you want the execution to halt:

import pdb; pdb.set_trace()

For example:

def add(a, b):
    """Return the sum of a and b."""
    sum = a
    import pdb; pdb.set_trace()
    return sum

For this example, the next time you run the test, the execution will halt just before the return statement and the Python debugger (pdb) will start. You will get a pdb prompt directly in the notebook (as shown in the figure), which will allow you to inspect the values of variables, step over lines, etc.

Python debugger

Summary

The standard Python testing modules can be easily used to write tests for Jupyter notebooks. Doctest is very simple to use and should be used for simple test cases, while unittest should be used for more complex testing scenarios. In combination with the Python debugger these testing modules will help you keep your notebooks bug free.

To start experimenting with the techniques that I have just described you can use this Jupyter notebook.

Written on January 23, 2018