Testing and Debugging Jupyter Notebooks

Jupyter Notebook is a great tool for data scientists to create and share documents that contain code, visualizations, and text. The combination of the notebook development environment and the rich Python data-science stack lets you start with a sketch of an idea and develop it into a full-featured data-science project. At some point between the sketch and the finished project, you may get that unsettling feeling about changing a function or even a single line of code, because you are not sure how the change will affect the rest of the code. This is a good moment to invest some time in writing regression tests (if you have not done so already). In this post I will show how to use Python's standard testing tools, such as doctest and unittest, to add tests to a Jupyter notebook.
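To give a flavor of what this can look like, here is a minimal sketch of a notebook cell that runs both a doctest and a unittest test case. The function `normalize`, the test class, and the two helper runners are hypothetical names made up for this illustration; they are not taken from the full post.

```python
import doctest
import unittest


def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    return [v / total for v in values]


class NormalizeTest(unittest.TestCase):
    """Regression tests for the hypothetical normalize() helper."""

    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([3, 5, 8])), 1.0)

    def test_zero_total_raises(self):
        with self.assertRaises(ZeroDivisionError):
            normalize([0, 0])


def run_doctests(func):
    """Run the docstring examples of a single function and return the summary."""
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


def run_unittests(case):
    """Run one TestCase class without calling sys.exit(), which suits a notebook."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(verbosity=0).run(suite)


doctest_results = run_doctests(normalize)
unittest_results = run_unittests(NormalizeTest)
```

Building the suite explicitly avoids the command-line argument parsing and the interpreter exit that a plain `unittest.main()` call would trigger inside a notebook kernel. A common shortcut for the same purpose is `unittest.main(argv=[''], exit=False)`.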


Datasets in Python

There are many providers of free datasets for data science. Some of them are summarized here and here. These datasets are often provided through an API and are stored in different formats. Getting them into a pandas DataFrame often takes more effort than it is worth if we just want to quickly try out a machine-learning algorithm or a visualization. In this post, I give an overview of the “built-in” datasets provided by popular Python data-science packages, such as statsmodels, scikit-learn, and seaborn. These datasets can be easily accessed in the form of a pandas DataFrame and used for quick experiments.


In 5 Minutes to a Remote MySQL Server

There are many resources on the Internet for learning SQL from scratch or refreshing your knowledge. For example, SQLZOO is a set of interactive lessons in which you learn SQL by writing and running queries against several small databases. But what if you want to try out your skills on your own dataset, which may be larger and more complex than those on SQLZOO? Of course, you can install one of the free and open-source database systems, such as MySQL or PostgreSQL, on your own computer. It is not difficult, but wouldn’t it be nice to have a remote database set up and ready for you to experiment with, anytime and anywhere (provided you have Internet access)? Read on to learn how to get it for free!
