How I Python

Describing my Python dev setup (virtualenvs, etc.) as someone who has had to work with virtualenv/pip and conda side-by-side

Published at: 2024-01-04

tl;dr

Use conda/mamba as a pyenv replacement to manage Python interpreters
Create isolated Python interpreters (with a mamba env) as needed (per-project, etc.)
Continue to use pip and pypi.org like you normally would
You now have the option to break out into the more powerful conda/mamba dependency management to leapfrog a pip install issue (like a failing Rust PEP 517 compile step or something)

I've never worked as a Python developer. In my personal projects on GitHub, I just copy project structures and build files from what I see in other GitHub projects until I incidentally learn of something better or have a specific need. I first read the "Structuring Your Project" chapter of The Hitchhiker's Guide to Python in 2018; up until then I would generally struggle to get productive with Python, messing up import paths, forgetting __init__.py files, etc.

However, as a devops person, I had to dive deep into the world of Python packaging tooling when I worked on the NVIDIA RAPIDS data science libraries.

RAPIDS is a collection of Python libraries whose core logic is implemented in C++/CUDA for GPU acceleration, with the Python layer serving as the high-level API. This is similar to much of the high-performance math/scientific computing landscape in Python, where libraries are backed by C/C++ code: PyTorch, TensorFlow, NumPy, etc.

Rule 1: generally don't use your OS Python for non-OS things

When you install a standard Linux distribution, it generally comes with Python installed so that it can run default and extra packages (either core parts of the distribution or user applications) written in Python.

In general, using this system interpreter is discouraged for your own personal development tools or for developing a specific project. The main risk is that if you fuck up your system's Python by putting it in a bad state (installing a bad package, etc.), you'll break the parts of your operating system that depend on it.
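A minimal sketch of how you could check, from Python itself, which kind of interpreter you are running. It relies on two things I'm confident about: a virtualenv makes sys.prefix differ from sys.base_prefix, and conda/mamba activation sets the CONDA_PREFIX environment variable. The function name describe_interpreter and its three labels are made up for illustration, and the last label is deliberately vague because this check cannot tell an OS Python apart from, say, a pyenv build.

```python
import os
import sys


def describe_interpreter(prefix, base_prefix, conda_prefix):
    """Classify an interpreter from its prefixes (hypothetical helper).

    The values are passed in as arguments so the logic can be tried
    without actually switching environments.
    """
    # Inside a virtualenv, sys.prefix points at the venv while
    # sys.base_prefix still points at the interpreter it was created from.
    if prefix != base_prefix:
        return "virtualenv"
    # `conda activate` / `mamba activate` export CONDA_PREFIX.
    if conda_prefix and os.path.realpath(prefix) == os.path.realpath(conda_prefix):
        return "conda env"
    # Could be the OS Python, a pyenv build, etc.; we can't tell from here.
    return "system or other"


kind = describe_interpreter(
    sys.prefix, sys.base_prefix, os.environ.get("CONDA_PREFIX")
)
print(f"{sys.executable}: {kind}")
```

If this prints "system or other" for an interpreter living under /usr, that is the one to leave alone.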

My older solution: virtualenvs

The typical solution, and one I used to use, is virtualenvs. I would create a per-project virtualenv in ~/venvs, and I would simply run source ~/venvs/<currentproject>/bin/activate when working on a project.
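For illustration, here is roughly what creating such a per-project environment looks like with the standard library's venv module (an assumption on my part: the virtualenv tool itself has a different, richer interface). The helper name create_project_venv is hypothetical, and the demo writes into a throwaway temp directory rather than a real ~/venvs.

```python
import tempfile
import venv
from pathlib import Path


def create_project_venv(venvs_root, project, with_pip=False):
    """Create <venvs_root>/<project> as a virtual environment; return its path."""
    env_dir = Path(venvs_root).expanduser() / project
    # with_pip=False keeps this sketch fast; pass True to have pip
    # bootstrapped into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Demo in a temporary directory instead of a real ~/venvs.
with tempfile.TemporaryDirectory() as root:
    env_dir = create_project_venv(root, "demo")
    print(sorted(p.name for p in env_dir.iterdir()))
```

On Linux the resulting directory contains the bin/activate script that gets sourced when switching to the project.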

Conda and Mamba

While working on RAPIDS, I had to get better at conda. Conda and mamba (its faster frontend, which I personally use) are "tools for package and environment management" - to me, they are like virtualenvs on steroids, since their most common use is to let you install Python packages and their C/C++ dependency libraries in isolation from your system.

In a conda environment, you start with a Python version that is installed into the environment:

(system) sevagh@pop-os:~$ mamba create \
    --name demo python=3.12
...
Looking for: ['python=3.12']

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
Transaction

  Prefix: /home/sevagh/mambaforge/envs/demo

  Updating specs:

   - python=3.12


  Package                Version  Build               Channel                    Size
───────────────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────────────
  ... 
  + pip                   23.3.2  pyhd8ed1ab_0        conda-forge/noarch       Cached
  + python                3.12.1  hab00c5b_1_cpython  conda-forge/linux-64       32MB
  ...

Like virtualenvs, conda/mamba environments function as a way to shield your OS Python from your per-project Python.

However, the conda environment is more powerful than the virtualenv environment. The virtualenv is at the mercy of the C/C++ libraries installed at the OS level. That's why wheels that need their own versions of C/C++ libraries have to include copies:

$ pip install --upgrade numpy
Requirement already satisfied: numpy in /home/sevagh/mambaforge/envs/system/lib/python3.11/site-packages (1.26.2)
Collecting numpy
  Downloading numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 23.2 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.26.2
    Uninstalling numpy-1.26.2:
      Successfully uninstalled numpy-1.26.2
Successfully installed numpy-1.26.3

Check out all the compiled extension modules and shared libraries the wheel ships with:

$ fd '.*\.so' /home/sevagh/mambaforge/envs/system/lib/python3.11/site-packages/numpy*
core/_multiarray_tests.cpython-311-x86_64-linux-gnu.so
core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
core/_operand_flag_tests.cpython-311-x86_64-linux-gnu.so
core/_rational_tests.cpython-311-x86_64-linux-gnu.so
core/_simd.cpython-311-x86_64-linux-gnu.so
core/_struct_ufunc_tests.cpython-311-x86_64-linux-gnu.so
core/_umath_tests.cpython-311-x86_64-linux-gnu.so
fft/_pocketfft_internal.cpython-311-x86_64-linux-gnu.so
linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
linalg/lapack_lite.cpython-311-x86_64-linux-gnu.so
random/_bounded_integers.cpython-311-x86_64-linux-gnu.so
random/_common.cpython-311-x86_64-linux-gnu.so
random/_generator.cpython-311-x86_64-linux-gnu.so
random/_mt19937.cpython-311-x86_64-linux-gnu.so
random/_pcg64.cpython-311-x86_64-linux-gnu.so
random/_philox.cpython-311-x86_64-linux-gnu.so
random/_sfc64.cpython-311-x86_64-linux-gnu.so
random/bit_generator.cpython-311-x86_64-linux-gnu.so
random/mtrand.cpython-311-x86_64-linux-gnu.so
libs/libgfortran-040039e1.so.5.0.0
libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
libs/libquadmath-96973f99.so.0.0.0
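If you don't have fd installed, a rough equivalent of that search can be sketched in Python. The function name bundled_shared_libs is hypothetical, and the pattern is slightly stricter than the fd one above: it only accepts names ending in .so, optionally followed by numeric version components such as .so.5.0.0.

```python
import re
from pathlib import Path

# Matches "name.so" and versioned names such as "libfoo.so.5.0.0".
_SHARED_LIB = re.compile(r"\.so(\.\d+)*$")


def bundled_shared_libs(package_dir):
    """Return sorted relative paths of shared objects under package_dir."""
    root = Path(package_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and _SHARED_LIB.search(p.name)
    )
```

Pointing it at a package directory inside site-packages gives a listing of the same kind as the one above, which is a quick way to see how much native code a wheel carries around.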

Meanwhile, if I run mamba install numpy, I get:

Package               Version  Build                Channel                    Size
───────────────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────────────

  + libblas               3.9.0  20_linux64_openblas  conda-forge/linux-64     Cached
  + libcblas              3.9.0  20_linux64_openblas  conda-forge/linux-64     Cached
  + libgfortran-ng       13.2.0  h69a702a_0           conda-forge/linux-64       23kB
  + libgfortran5         13.2.0  ha4646dd_0           conda-forge/linux-64        1MB
  + liblapack             3.9.0  20_linux64_openblas  conda-forge/linux-64     Cached
  + libopenblas          0.3.25  pthreads_h413a1c8_0  conda-forge/linux-64     Cached
  + libstdcxx-ng         13.2.0  h7e041cc_3           conda-forge/linux-64     Cached
  + numpy                1.26.3  py311h64a7726_0      conda-forge/linux-64        8MB
  + python_abi             3.11  4_cp311              conda-forge/linux-64     Cached

  Upgrade:
───────────────────────────────────────────────────────────────────────────────────────

  - ca-certificates   2022.12.7  ha878542_0           conda-forge
  + ca-certificates  2023.11.17  hbcca054_0           conda-forge/linux-64     Cached
  - openssl               3.1.0  h0b41bf4_0           conda-forge
  + openssl               3.2.0  hd590300_1           conda-forge/linux-64     Cached

Instead of bundling those in the numpy wheel, the appropriate libraries are installed into the environment, then the (much smaller) numpy Python package is installed that uses those libraries from the mamba environment.

Managing independent Python interpreters with conda/mamba

Instead of the popular pyenv tool, I prefer to use conda.

Remember: mamba is a faster drop-in replacement for conda

When I first install my OS (Pop!_OS 22.04 NVIDIA driver edition is my daily driver), I install mamba and a few settings from my dotfiles:

$ cat bash/.bashrc
...

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/sevagh/mambaforge/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/sevagh/mambaforge/etc/profile.d/conda.sh" ]; then
        . "/home/sevagh/mambaforge/etc/profile.d/conda.sh"
    else
        export PATH="/home/sevagh/mambaforge/bin:$PATH"
    fi
fi
unset __conda_setup

if [ -f "/home/sevagh/mambaforge/etc/profile.d/mamba.sh" ]; then
    . "/home/sevagh/mambaforge/etc/profile.d/mamba.sh"
fi
# <<< conda initialize <<<

mamba activate system

My condarc file:

channels:
  - conda-forge
#auto_activate_base: false
#changeps1: false

I start by creating a named environment called system and have that as the default activated env in my bashrc file. This is for my personal dev tools: things like yt-dlp, pympress, grip. Mamba by default has a base environment that I don't care for, which is why I use my own system environment as a catchall/"daily driver" Python.

When I open a new terminal instance, I always have (system) printed by mamba in my prompt, so I'm aware that:

mamba is active and working

The current version of Python, pip, and everything else Python-related is pointing to the mamba copies:

(system) sevagh@pop-os:~$ which python
/home/sevagh/mambaforge/envs/system/bin/python
(system) sevagh@pop-os:~$ which pip
/home/sevagh/mambaforge/envs/system/bin/pip
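The same sanity check can be scripted. This is a small sketch, assuming a POSIX layout where an environment's executables live in <prefix>/bin; the helper name tools_inside_env is made up for illustration. It uses shutil.which, which resolves a command through PATH the same way the which commands above do.

```python
import os
import shutil
import sys


def tools_inside_env(env_prefix, tools=("python", "pip"), path=None):
    """Map each tool name to True if PATH resolves it inside env_prefix/bin."""
    env_bin = os.path.realpath(os.path.join(env_prefix, "bin"))
    result = {}
    for tool in tools:
        found = shutil.which(tool, path=path)  # None if not on PATH
        # Resolve the directory only, not the file: bin/python is often
        # a symlink to a versioned binary in the same directory.
        result[tool] = (
            found is not None
            and os.path.realpath(os.path.dirname(found)) == env_bin
        )
    return result


# CONDA_PREFIX is set by conda/mamba activation; fall back to sys.prefix.
print(tools_inside_env(os.environ.get("CONDA_PREFIX", sys.prefix)))
```

A False for either entry means something else on PATH is shadowing the environment's copy.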

Working on projects

For projects, I create a new conda/mamba env, but I don't actually use conda or mamba for dependencies. These days I prefer requirements.txt files and pyproject.toml. The best thing, though, is that I don't need a virtualenv, since I'm already in the isolated Python interpreter created for that project.

This gives me a choice of breaking out into the more complicated (but more powerful) conda/mamba package distribution, but starting off with the slightly easier pip tools.
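To make the workflow concrete, here is a sketch that only builds the commands involved, without running anything. The helper name project_setup_commands and the "demo" names are hypothetical. It also assumes mamba accepts the same run subcommand and --name/--yes flags as conda; the create line mirrors the one shown earlier, while the mamba run line is my assumption for installing with pip without activating the env first.

```python
import shlex


def project_setup_commands(env_name, python_version,
                           requirements="requirements.txt", frontend="mamba"):
    """Return argv lists for the per-project setup (nothing is executed)."""
    return [
        # 1. An isolated interpreter for the project.
        [frontend, "create", "--yes", "--name", env_name,
         f"python={python_version}"],
        # 2. Dependencies come from PyPI via pip, not from conda channels.
        #    Assumes `run --name` works in mamba as it does in conda.
        [frontend, "run", "--name", env_name,
         "python", "-m", "pip", "install", "-r", requirements],
    ]


for argv in project_setup_commands("demo", "3.12"):
    print(shlex.join(argv))
```

If a pip install later fails on a native build step, the escape hatch is a third command of the form mamba install <package> into the same env.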