Which Python to use, vanilla Python or Anaconda Python

Python is cross-platform: it works on Windows, Linux and macOS, and there are also apps for iOS and Android that let you run Python on your phone. Limiting our scope to the desktop, there is more than one distribution of Python, e.g. the vanilla, official Python, Anaconda Python and Enthought Python. Which one should you choose?

I have little experience with Enthought, so I will only compare vanilla and Anaconda Python, mostly from a numerical computation perspective.

In vanilla Python, one installs packages using pip, the go-to package installer for Python. For instance, pip install numpy gets the numpy package. To set up isolated working environments, common choices are virtualenv (works on both py2 and py3), venv (built into py3) and Pipenv.
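As a rough illustration of the venv option, the standard-library venv module can also be driven from Python itself. This is only a sketch: the create_env helper and the "demo-env" directory name are made up for this example, and pip is deliberately left out of the new environment to keep it quick.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(target_dir):
    """Create a bare virtual environment in target_dir; return the path of its config file."""
    # with_pip=False keeps the example fast and offline; pass with_pip=True
    # if you want pip bootstrapped into the new environment.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(target_dir)
    return Path(target_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    # "demo-env" is a hypothetical name; a temporary directory is used so
    # nothing is left behind after the script exits.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        print("environment created:", cfg.exists())
        print("created by interpreter:", sys.executable)
```

In everyday use most people run python3 -m venv from the command line instead; the module call above does the same job.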

In Anaconda Python, one gets access to a big collection of the most popular numerical packages (including numpy, scipy, matplotlib, jupyter, hdf5, netcdf4 and scikit-image) with the installation of Anaconda (that’s why the installer has a larger file size). Environment management and package installation are handled using the conda commands in the command line, or the Anaconda Navigator GUI manager. Once you have your own conda environment set up, you can still install packages using pip, and it will install the new packages into the isolated conda environment.

It is safe to say that for numerical computation purposes, Anaconda is a clear winner. Besides the large collection of 7500+ open source packages, Anaconda also makes some optimizations to its distribution that make numerical Python code run faster.

A simple test of speed of numpy:

import time

import numpy as np

# Random 500x500 matrix to invert
data = np.random.rand(500, 500)

t1 = time.time()
for i in range(1000):
    x = np.linalg.inv(data)
t2 = time.time()

# Elapsed wall-clock time in seconds
print('t2-t1 = ', t2 - t1)

The script creates a random 500×500 matrix, inverts it 1000 times, and reports the elapsed time in seconds.
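If you want to repeat the test with different sizes, the same measurement can be wrapped in a function. The following is a minimal sketch, not the script used for the timings in this post: the function name time_inversions, the fixed seed and the smaller default run at the bottom are my own choices, and time.perf_counter is swapped in because it is better suited to measuring intervals.

```python
import time

import numpy as np


def time_inversions(size=500, repeats=1000, seed=0):
    """Return the seconds spent inverting one random size x size matrix `repeats` times."""
    rng = np.random.default_rng(seed)  # fixed seed so runs are comparable
    data = rng.random((size, size))
    start = time.perf_counter()        # monotonic, high-resolution timer
    for _ in range(repeats):
        np.linalg.inv(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Smaller than the original 500x500 / 1000 runs so it finishes quickly.
    elapsed = time_inversions(size=200, repeats=50)
    print(f"200x200 matrix, 50 inversions: {elapsed:.3f} s")
```

Calling time_inversions() with its defaults reproduces the workload described above.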

Setup 1 uses my system Python (version 3.8, which comes with Manjaro Linux) and a pip-installed numpy. Here is the screenshot of the output:

The top panel runs the system Python (/usr/bin/python3 speed_test.py). The lower panel shows top, a command-line task manager. Notice that only the Cpu2 thread is busy, at 99.7%; the other threads are pretty much idle. It took about 80 seconds (not shown in this image) to finish the computations.

Here is the screenshot using Anaconda Python:

The which python3 command tells me that the default python3 executable is the Anaconda one (/home/guangzhi/anaconda3/bin/python3). I ran the script twice, because the first time I didn’t manage to capture a screenshot of top showing the multi-threaded computation. In the second attempt, top shows 4 threads running at >50% load and 4 more at ~6%. Both runs finished in less than 4 seconds, a 20-fold speed-up compared with my vanilla Python.
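The same check can be made from inside Python, which is handy when several interpreters are installed. This is a small sketch: describe_interpreter is a made-up helper name, and looking for a conda-meta directory is only a common heuristic for spotting a conda environment, not an official API.

```python
import os
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # Heuristic, not an official API: conda environments normally
        # contain a "conda-meta" directory under their prefix.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```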

I also repeated the same test on Windows, and the results are consistent: vanilla Python only tortures a single thread of my CPU, even though it has 8 in total, while Anaconda Python can use multiple threads and achieve a much greater speed. This comes for free: not a single line of code needs to be changed, and it boosts the computation by about 20 times.

It should be noted that Anaconda Python still has the GIL; the acceleration we observed is related to numpy. I am not 100 percent sure, but my guess is that Anaconda installs the MKL numerical library for you, allowing greater computing efficiency. If the code is written in pure Python, without calling compiled C code, there is no speed difference between vanilla and Anaconda Python.
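One way to check the guess about MKL on your own machine is to look at the build information that numpy prints with np.show_config(). The sketch below captures that text and searches it for a library name. The helper names numpy_build_info and mentions_backend are invented for this example, and the plain substring search is a crude heuristic, since the layout of the output differs between numpy versions.

```python
import contextlib
import io

import numpy as np


def numpy_build_info():
    """Capture the text that np.show_config() prints and return it as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


def mentions_backend(info, name):
    """Case-insensitive check for a library name (e.g. 'mkl') in the build info."""
    return name.lower() in info.lower()


if __name__ == "__main__":
    info = numpy_build_info()
    for backend in ("mkl", "openblas"):
        print(f"{backend} mentioned in numpy build info: {mentions_backend(info, backend)}")
```

Running this with both interpreters and comparing the output should show whether the two numpy installations are linked against different numerical libraries.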

One comment

  1. Don’t know about vanilla Python, just switch from Enthought to Anaconda.
