[CDAT tutorial] Installation

This is the start of a CDAT tutorial series. We will be installing a "lite" version of CDAT on Linux via conda.

In a previous post I introduced CDAT as a Python package for the manipulation of netCDF data, and made some comparisons with netcdf4. In my opinion, implementations like CDAT, Iris and Xarray stick to the netCDF philosophy better than netcdf4, in that they permit meta-data preserving operations. Compared with Xarray, CDAT's array manipulation API is closer to native numpy.

In this post, and a number of posts to follow, I’d like to share some of my experience using CDAT. I’ll start with installation and some basic file reading and saving, then move on to array manipulation, meta-data maintenance, etc., before covering some more advanced topics. So stay tuned if you are interested in learning this powerful tool.

Platform

CDAT is supported on Linux and macOS. If you can set up WSL ("Windows Subsystem for Linux"), it can be used in Windows 10 as well (although technically it is still running inside Linux). Of course, you can also create a Linux virtual machine inside Windows to achieve a similar effect, but at a greater cost in system resources. The CDAT team provides some very detailed instructions on how to set up WSL for CDAT installation.

In this post I will only show the installation on Linux. The process on macOS is the same. Aside from setting up WSL or a virtual machine, the subsequent installation in Windows is also the same.

Install CDAT in Linux

Installation of CDAT has become way easier than before with the help of conda. When I first learned CDAT about 8 years ago, I had to compile all the dependencies myself, then configure and compile CDAT from source code.

This wiki page gives full instructions on the installation of Anaconda or Miniconda (a lighter-weight version of the former), and the installation of CDAT via conda. I’ll only show the steps to install the "lite" version of CDAT, which contains only the core data array manipulation modules, leaving out the visualization-related modules. For plot creation we will be using the matplotlib+Cartopy combination.

Step 1: Install Anaconda

Go to the download page of Anaconda, download the installer for Linux (labeled as "64-Bit (x86) Installer" in the screenshot below):

Figure 1. Anaconda installer download page.

Once finished, you will get a file with a name like Anaconda3-2020.02-Linux-x86_64.sh. Navigate to the folder storing this file in a terminal, then run:

bash ./Anaconda3-2020.02-Linux-x86_64.sh

This will launch the command-line installer. Just follow the instructions given within. Specifically, when asked about an installation location (see Figure 2 below), choose the default (which is the anaconda3 folder in your HOME folder).

Figure 2. Choose the installation location in the Anaconda installer.

After "installation finished" is printed to the terminal, the installer will ask:

Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]

Type yes and press ENTER.

Remember to close the terminal window and open a new one to finalize the Anaconda installation.

Step 2: Create a new conda environment

A Python environment is an isolated virtual space in the system. Inside the environment one can install Python packages that are isolated from those in other environments, so that different versions of the same packages, or even different versions of Python, can co-exist in the same system. This way, one can create different environments, with different packages, for different workflows or tasks.

It is advised to always build up your Python working environment from an empty one, so that the chance of package conflicts can be minimized, and if anything goes wrong, it is easier to destroy the compromised environment and start afresh.

To create a new environment (or env for short) for CDAT installation:

conda create -n cdat python=3

The -n cdat part specifies the name of the environment; you can choose a different name if you like. The remaining part of the command specifies which packages to install upon env creation. In this case we only specify the Python version to be 3. You can also be more specific about the version by using, for instance, python=3.7.

When finished, activate the environment:

conda activate cdat

[As a side note, in Windows you would use activate cdat to activate an env instead. Since I’m assuming you are doing this in WSL or a Linux virtual machine, the activation command is conda activate.]

After that the shell prompt will be prepended with the env name, like:

(cdat):$
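The prompt is only a visual cue. To double-check from inside Python which environment is in use, here is a minimal sketch. It assumes that conda activate exports the CONDA_DEFAULT_ENV variable with the env name. The helper name describe_environment is made up for illustration and is not part of conda or CDAT.

```python
import os
import sys


def describe_environment(environ=None):
    """Return (conda env name or None, interpreter prefix).

    `describe_environment` is a hypothetical helper, not part of conda or CDAT.
    """
    if environ is None:
        environ = os.environ
    # Assumption: `conda activate` exports CONDA_DEFAULT_ENV with the env name.
    env_name = environ.get("CONDA_DEFAULT_ENV")
    # sys.prefix is the root folder of the running interpreter's installation.
    return env_name, sys.prefix


if __name__ == "__main__":
    name, prefix = describe_environment()
    print("active conda env:", name)
    print("interpreter prefix:", prefix)
```

With the cdat env active and the default installation location chosen in Step 1, one would expect the reported name to be cdat and the prefix to point somewhere under the anaconda3/envs folder.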

Another side note: if you use a specific environment frequently, it may be worth adding the activation command to the .bashrc file, so that the env is activated in every new terminal session.

Step 3: Install packages

Make sure you are already in the newly created cdat environment. Then we’ll install the 2 core modules of CDAT, cdms2 and cdutil, plus a few extra numerical packages (scipy and pandas) and plotting packages (matplotlib and cartopy) that one is almost certainly going to use sooner or later. You can add your own choices to the list, or install them separately when needed.

The installation command is:

conda install -c conda-forge scipy pandas matplotlib cartopy cdms2 cdutil

Then validate the installation:

python -c "import cdms2, MV2, cdtime"

If nothing is printed, the installation is done.
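The one-liner above stops at the first failed import and does not touch cdutil or the extra packages. For a slightly more informative check, something like the following sketch reports each package separately. The helper name check_imports is hypothetical, and the reported version relies on the common (but not universal) convention of a module exposing a __version__ attribute.

```python
import importlib


def check_imports(module_names):
    """Map each module name to its version string, or None if the import fails.

    `check_imports` is a hypothetical helper written for illustration.
    """
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        # Not every module defines __version__; fall back to a placeholder.
        results[name] = str(getattr(module, "__version__", "unknown"))
    return results


if __name__ == "__main__":
    # The modules from the validation command, plus the rest of the install list.
    wanted = ["cdms2", "MV2", "cdtime", "cdutil",
              "scipy", "pandas", "matplotlib", "cartopy"]
    for name, version in check_imports(wanted).items():
        status = "MISSING" if version is None else f"ok ({version})"
        print(f"{name:12s} {status}")
```

Any line marked MISSING points to a package that is worth reinstalling with conda inside the cdat environment.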
