Working remotely on Google Colaboratory

  • Google Colab (short for Colaboratory) is basically a combination of Jupyter notebook and Google Drive.
  • Colab is Google’s flavor of Jupyter notebooks that is particularly suited for machine learning and data analysis.
  • Colab is free, runs entirely in the cloud, and comes preinstalled with many packages (e.g. PyTorch and TensorFlow), so everyone has access to the same dependencies. Even cooler is the fact that Colab comes with free access to hardware accelerators like GPUs (K80, P100) and TPUs, which will be particularly useful for assignments 2 and 3.

Requirements

  • To use Colab, you must have a Google account with an associated Google Drive. Assuming you have both, you can connect Colab to your Drive with the following steps:
  1. Click the gear icon in the top right corner and select Settings.
  2. Click on the Manage Apps tab.
  3. At the top, select Connect more apps, which should bring up a G Suite Marketplace window.
  4. Search for Colab then click Add.

Workflow

  • You can create a new Colab notebook, or upload an existing notebook along with any starter code to Google Drive, then mount your Drive in the notebook (as sketched below) to begin working.
  • Once you’re done with your work, you can save your progress back to Drive.
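  • As a minimal sketch, mounting your Drive from a notebook cell looks like the following (google.colab.drive is Colab’s built-in helper; /content/drive is simply the conventional mount point):
# run this in a Colab cell to mount your Google Drive;
# you will be prompted to authorize access
from google.colab import drive
drive.mount('/content/drive')
# your Drive files are then available under /content/drive/MyDrive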

Best practices

  • There are a few things to be aware of when working with Colab. The first is that resources aren’t guaranteed (this is the price of being free).
  • If you are idle for a certain amount of time or your total connection time exceeds the maximum allowed time (~12 hours), the Colab VM will disconnect. This means any unsaved progress will be lost!
  • Thus, get into the habit of frequently saving your code while working on a project, e.g. by copying files back to your mounted Drive (see the sketch below).
  • To read more about resource limitations in Colab, see their FAQ here.
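  • For example, once your Drive is mounted (see the Workflow section above), a minimal sketch for backing up an output file looks like this (results.pkl is a hypothetical filename; substitute your own files):
# a minimal sketch: copy an output file from the Colab VM to your
# mounted Drive so it survives a disconnect (results.pkl is hypothetical)
import shutil
shutil.copy('results.pkl', '/content/drive/MyDrive/results.pkl')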

Using a GPU

  • Using a GPU is as simple as switching the runtime in Colab.
  • Specifically, click Runtime -> Change runtime type -> Hardware Accelerator -> GPU and your Colab instance will automatically be backed by GPU compute. Similarly, you can also access TPU instances.
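  • To confirm the accelerator is actually available, a quick sanity check from a notebook cell (a minimal sketch using PyTorch, which Colab preinstalls) is:
# check that the Colab VM sees a GPU
import torch
print(torch.cuda.is_available())  # should print True on a GPU runtime
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. 'Tesla K80' or 'Tesla P100-PCIE-16GB'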

Working locally on your machine

  • If you already own GPU-powered hardware and prefer to work locally, you should use a virtual environment.
  • You can create one via Anaconda (recommended) or via Python’s native venv module. Ensure you are using a recent release of Python, preferably the latest (steps below).

Installing Python 3

  • macOS:
    • To get the latest version of Python on your local machine, head over to the downloads page on python.org.
      • Alternatively, on macOS, you can install the latest release using Homebrew with brew install python3.
    • If you’d rather play it safe and use only a Python release that Apple has tested against your macOS, just upgrade to the latest macOS, which ships with its own “officially” supported Python release.
  • Windows:
    • To get the latest version of Python on your local machine, head over to the downloads page on python.org.
  • Ubuntu:
    • You can find instructions here.

Virtual environments

Anaconda

  • We strongly recommend using the free Anaconda Python distribution, which provides an easy way for you to handle package dependencies.
  • Please be sure to download the Python 3 Anaconda version, which currently installs Python 3.7.
  • The neat thing about Anaconda is that it ships with MKL optimizations by default, which means your numpy and scipy code benefits from significant speed-ups without any code changes.
  • Once you have Anaconda installed, it makes sense to create a virtual environment so you can keep Python library versions specific to your project fully contained within a “sandbox”.
  • If you choose not to use a virtual environment (strongly discouraged!), it is up to you to make sure that all dependencies for the code are installed globally on your machine.
  • To set up a virtual environment called myEnv, run the following in your terminal:
# this will create an anaconda environment
# called myEnv in 'path/to/anaconda3/envs/'
conda create -n myEnv python=3.7

To activate and enter the environment, run conda activate myEnv.

# sanity check that the path to the python
# binary matches that of the anaconda env
# after you activate it
which python
# for example, on my machine, this prints
# '/Users/kevin/anaconda3/envs/myEnv/bin/python'
  • To deactivate the environment, either run conda deactivate or simply exit the terminal.
  • Remember to re-run conda activate myEnv every time you wish to return to the environment.

  • You may refer to Conda’s documentation on managing environments for more detailed instructions on managing virtual environments with Anaconda.

Note: If you’ve chosen to go the Anaconda route, you can safely skip the next section and move straight to installing packages/dependencies.

venv

Python 3.3+
  • As of version 3.3, Python natively ships with a lightweight virtual environment module called venv. Each virtual environment holds its own independent set of installed packages, isolated from the system-wide packages, and runs the Python version of the interpreter that was used to create it.
  • To set up a virtual environment called myEnv:
# this will create a virtual environment
# called myEnv in your home directory
python3 -m venv ~/myEnv
  • To activate the virtual environment, run source ~/myEnv/bin/activate.

  • As a sanity check, ensure that the path to the Python binary matches that of the virtualenv after you activate it using:

which python # this should print: '/Users/<yourUser>/myEnv/bin/python'
  • Run deactivate if you want to deactivate the virtual environment or simply exit the terminal.
  • Remember to re-run source ~/myEnv/bin/activate every time you wish to return to the environment.
Older Python releases
  • Older Python releases do not ship with the built-in venv module, so you’ll need to install the third-party virtualenv package first.
  • Install virtualenv using sudo pip install virtualenv (or pip install --user virtualenv if you don’t have sudo) in your terminal.
  • Next, to create a virtual environment named myEnv:
virtualenv -p python3 myEnv
source myEnv/bin/activate

requirements.txt

  • If you’ve browsed Python projects on GitHub or elsewhere, you’ve probably noticed a file called requirements.txt. This file is used to specify which Python packages (and their corresponding versions) are required to run the project. Typically, requirements.txt is located in the root directory of your project.
  • If you open a requirements.txt file, you’ll see something similar to this:
pyOpenSSL==0.13.1
pyparsing==2.0.1
python-dateutil==1.5
pytz==2013.7
scipy==0.13.0b1
six==1.4.1
virtualenv==16.3.0
  • Notice that there is one line per package, each pinned to a specific version number. This is important because as you develop your Python application, you build and test it against specific versions of these packages.
  • However, later on, a package maintainer might release changes that can potentially break your application! Keeping track of every upstream change is virtually impossible, especially for a large project, so you record the version of each package you’re using to prevent unexpected breakage.
  • To generate a requirements.txt file containing every package installed in your project’s virtual environment, run pip freeze and redirect its output to the file (e.g. pip freeze > requirements.txt). Note that you can also run pip freeze outside of your virtual environment to get a list of packages installed in your “broader” system-wide Python setup (a.k.a. your “site packages”).

Installing packages/dependencies

  • Once you’ve set up and activated your virtual environment (via conda or venv), you should install your project’s dependencies from requirements.txt using pip:
# again, ensure your virtual env (either conda or venv)
# has been activated before running the commands below
cd myProject  # cd to the project directory

# install assignment dependencies.
# since the virtual env is activated,
# this pip is associated with the
# python binary of the environment
pip install -r requirements.txt

Jupyter Notebooks

  • A Jupyter notebook lets you write and execute Python code locally in your web browser. Jupyter notebooks make it very easy to tinker with code and execute it in bits and pieces; for this reason they are widely used in scientific computing.
  • To install Jupyter notebook:
pip install notebook
  • If you wish to launch a notebook locally with Jupyter, make sure your virtual environment is set up correctly (per the instructions in the virtual environments section) and activated.
  • Next, from your directory that holds the notebook, run jupyter notebook.
  • This should automatically launch a notebook server at http://localhost:8888.
  • If everything worked correctly, you should see the Jupyter dashboard listing all available notebooks in the current directory.
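  • As an additional sanity check from inside a notebook cell (a minimal sketch), you can confirm that the kernel is running the Python interpreter from your virtual environment:
# run this in a notebook cell; the printed path should point
# inside your virtual environment, e.g. ~/myEnv/bin/python
import sys
print(sys.executable)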
