{ "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
venv
and pip
.\n- Run our software from the command line.\n\n**Time Estimation: 30M**\n“Virtual Environments” allow you to easily manage your installed Python packages and prevent conflicts between different projects’ dependencies. In general, most modern projects should use conda
for dependency management, but venv
can be convenient for Python-only projects.
\n\nComment\nThis tutorial is significantly based on the Carpentries lesson “Intermediate Research Software Development”.
\n
If you open a Python project, you will often see something like the\nfollowing two lines somewhere near the top.
\nfrom matplotlib import pyplot as plt\nimport numpy as np\n
This means that our code requires two external libraries (also called third-party packages or dependencies) -\nnumpy
and matplotlib
.\nPython applications often use external libraries that don’t come as part of the standard Python distribution. This means\nthat you will have to use a package manager tool to install them on your system.\nApplications will also sometimes need a\nspecific version of an external library (e.g. because they require that a particular\nbug has been fixed in a newer version of the library), or a specific version of the Python interpreter.\nThis means that each Python application you work with may require a different setup and set of dependencies, so it\nis important to be able to keep these configurations separate to avoid confusion between projects.\nThe solution for this problem is to create a self-contained virtual\nenvironment per project, which contains a particular version of Python plus a number of\nadditional external libraries.
Virtual environments are not just a feature of Python - most modern programming languages offer a comparable way to isolate the code\nof a specific project and make it easier to develop, run, test and share code with others. In this tutorial, we learn how\nto set up a virtual environment to develop our code and manage our external dependencies.
\n\n
So what exactly are virtual environments, and why use them?
\nA Python virtual environment is an isolated working copy of a specific version of the\nPython interpreter together with specific versions of a number of external libraries installed into that\nvirtual environment. A virtual environment is simply a directory with a particular\nstructure, which links back to the Python interpreter it was created from. Because each environment is separate, different Python interpreters and different versions of the same external library can\ncoexist side by side on your machine, with exactly one selected for each of your projects. This allows you to work on a particular\nproject without worrying about affecting other projects on your machine.
\nAs more external libraries are added to your Python project over time, you can add them to\nits specific virtual environment and avoid a great deal of confusion by having separate (smaller) virtual environments\nfor each project rather than one huge global environment with potential package version clashes. Another big motivator\nfor using virtual environments is that they make sharing your code with others much easier (as we will see shortly).\nHere are some typical scenarios where the usage of virtual environments is highly recommended (almost unavoidable):
\nYou do not have to worry too much about specific versions of external libraries that your project depends on most of the time.\nVirtual environments enable you to always use the latest available version without specifying it explicitly.\nThey also enable you to use a specific older version of a package for your project, should you need to.
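\n\n\n\nNote that creating a virtual environment does not install a new copy of Python: by default on Linux and macOS the environment links back to the Python installation it was created from. Packages you install with pip, however, are placed in each environment’s own site-packages directory, so a package used by several projects is installed once per environment.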
\n
There are several commonly used command line tools for managing Python virtual environments:
\n- `venv`, available by default in the standard Python distribution from Python 3.3+
\n- `virtualenv`, needs to be installed separately but supports both Python 2.7+ and Python 3.3+ versions
\n- `pipenv`, created to fix certain shortcomings of `virtualenv`
\n- `conda`, a package and environment management system (also included as part of the Anaconda Python distribution often used by the scientific community)
\n- `poetry`, a modern Python packaging tool which handles virtual environments automatically
\n\nWhile there are pros and cons for using each of the above, all will do the job of managing Python\nvirtual environments for you and it may be a matter of personal preference which one you go for.\nIn this course, we will use `venv` to create and manage our\nvirtual environment (which is the preferred way for Python 3.3+).
`venv` is a good choice to get started with. It remains sufficient until a project needs something beyond what is available\nin the Python ecosystem, e.g. when you depend on external packages like htslib\nor bioinformatics tools that are simply not distributed as part of PyPI.
Part of managing your (virtual) working environment involves installing, updating and removing external packages\non your system. The Python package manager tool pip
is most commonly used for this - it connects to the central repository called the Python Package Index (PyPI) and downloads packages from it.\npip
can now be used with all Python distributions (including Anaconda).
\n\n\nAnaconda is an open source Python\ndistribution commonly used for scientific programming - it conveniently installs Python, the package and environment manager `conda`, and a\nnumber of commonly used scientific computing packages so you do not have to obtain them separately.\n`conda` is an independent command line tool (available separately from the Anaconda distribution too) with dual functionality: (1) it is a package manager that helps you find Python packages from\nremote package repositories and install them on your system, and (2) it is also a virtual environment manager. So, you can use `conda` for both tasks instead of using `venv` and `pip`.
Installing and managing Python distributions, external libraries and virtual environments is, well,\ncomplex. There is an abundance of tools for each task, each with its advantages and disadvantages, and there are different\nways to achieve the same effect (and even different ways to install the same tool!).\nNote that each Python distribution comes with its own version of\npip
- and if you have several Python versions installed you have to be extra careful to use the correct pip
to\nmanage external packages for that Python version.
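One way to check which Python a given pip belongs to is to ask pip itself. The sketch below assumes both `pip3` and `python3` are on your PATH; the second form runs pip through a specific interpreter, which removes the ambiguity:

```bash
# Print pip's version, where it is installed, and the Python it belongs to
pip3 --version

# Run pip as a module of a specific interpreter, so there is no doubt which Python it manages
python3 -m pip --version
```

If the two commands report different Python versions, the `pip3` on your PATH belongs to a different installation than `python3`.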
venv
and pip
are considered the de facto standards for virtual environment and package management for Python 3.\nHowever, the advantages of using Anaconda and conda
are that you get (most of the) packages needed for\nscientific code development included with the distribution. If you are only collaborating with others who are also using\nAnaconda, you may find that conda
satisfies all your needs. It is good, however, to be aware of all these tools,\nand use them accordingly. As you become more familiar with them you will realise that equivalent tools work in a similar\nway even though the command syntax may be different (and that there are equivalent tools for other programming languages\ntoo to which your knowledge can be ported).
Let us have a look at how we can create and manage virtual environments from the command line using venv
and manage packages using pip
.
venv
Environment\n\nCreating a virtual environment with venv
is done by executing the following command:
where /path/to/new/virtual/environment
is a path to a directory where you want to place it - conventionally within\nyour software project so they are co-located.\nThis will create the target directory for the virtual environment (and any parent directories that don’t exist already).
For our project, let’s create a virtual environment called venv
off the project root:
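A minimal sketch of that step, assuming you run it from the project root and that your `python3` provides the `venv` module:

```bash
# Create a virtual environment in a new directory called venv
python3 -m venv venv

# The environment is an ordinary directory; list what was created
ls -l venv

# pyvenv.cfg records which Python installation the environment was created from
cat venv/pyvenv.cfg
```

The exact listing varies between Python versions and operating systems, so no expected output is shown here.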
If you list the contents of the newly created venv
directory, on a Mac or Linux system\n(slightly different on Windows as explained below) you should see something like:
So, running the python3 -m venv venv
command created the target directory called venv
\ncontaining:
\n- `pyvenv.cfg` configuration file with a home key pointing to the Python installation from which the command was run,
\n- `bin` subdirectory (called `Scripts` on Windows) containing a symlink to the Python interpreter binary used to create the\nenvironment, along with the activation scripts,
\n- `lib/pythonX.Y/site-packages` subdirectory (called `Lib\\site-packages` on Windows) to contain its own independent set of installed Python packages isolated from other projects.\n\n\nWhat is a good name to use for a virtual environment? Using “venv” or “.venv” as the\nname for an environment and storing it within the project’s directory seems to be the recommended way -\nthis way when you come across such a subdirectory within a software project,\nby convention you know it contains its virtual environment details.\nA slight downside is that all different virtual environments\non your machine then use the same name and the current one is determined by the context of the path\nyou are currently located in. A (non-conventional) alternative is to\nuse your project name for the name of the virtual environment, with the downside that there is nothing to indicate\nthat such a directory contains a virtual environment. In our case, we have settled on the name “venv” since it is\nnot a hidden directory and we want it to be displayed by the command line when listing directory contents (hence,\nno need for the “.” in its name that would, by convention, make it hidden). In the future,\nyou will decide what naming convention works best for you. Here are some references for each of the naming conventions:
\n\n
\n- The Hitchhiker’s Guide to Python notes that “venv” is the general convention used globally
\n- The Python Documentation indicates that “.venv” is common
\n- “venv” vs “.venv” discussion
\n
Once you’ve created a virtual environment, you will need to activate it:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-7", "source": [ "source venv/bin/activate" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ ">Activating the virtual environment will change your command line’s prompt to show what virtual environment\nyou are currently using (indicated by its name in round brackets at the start of the prompt),\nand modify the environment so that running Python will get you the particular\nversion of Python configured in your virtual environment.
\nYou can verify you are using your virtual environment’s version of Python by checking the path using which
:
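A minimal sketch of that check, assuming the environment lives in `venv` under the current directory (it is created first if missing). The `.` command is the POSIX spelling of `source`, and `VIRTUAL_ENV` is a variable set by the activate script:

```bash
# Create the environment if it does not exist yet
[ -d venv ] || python3 -m venv venv

# Activate it in the current shell (equivalent to: source venv/bin/activate)
. venv/bin/activate

# Both paths should now point inside the venv directory
which python3
echo $VIRTUAL_ENV
```

If `which python3` prints a path outside the `venv` directory, the environment is not active in the current shell.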
When you’re done working on your project, you can exit the environment with:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-11", "source": [ "deactivate" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ ">If you’ve just done the deactivate
, ensure you reactivate the environment ready for the next part:
\n\n\nWithin a virtual environment, the commands `python` and `pip` will refer to the version of Python you created the environment with. If you create a virtual environment with `python3 -m venv venv`, `python` will refer to `python3` and `pip` will refer to `pip3`.\n\nOn some machines with Python 2 installed, the `python` command may refer to the copy of Python 2 installed outside of the virtual environment instead, which can cause confusion. You can always check which version of Python you are using in your virtual environment with the command `which python` to be absolutely sure. We continue using `python3` and `pip3` in this material to avoid confusion for those users, but the commands `python` and `pip` may work for you as expected.
pip
We noticed earlier that our code depends on two external libraries - numpy
and matplotlib
. In order\nfor the code to run on your machine, you need to\ninstall these two dependencies into your virtual environment.
To install the latest version of a package with pip
you use pip’s install
command and specify the package’s name, e.g.:
or like this to install multiple packages at once for short:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "pip3 install numpy matplotlib" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ ">\n\n\nWhy are we not using
`pip` as an argument to the `python3` command, in the same way we did with `venv` (i.e. `python3 -m venv`)?\n\n`python3 -m pip install` should be used according to the\nofficial Pip documentation; other official documentation\nstill seems to have a mixture of usages. Core Python developer Brett Cannon offers a\nmore detailed explanation of edge cases when the two options may produce\ndifferent results and recommends `python3 -m pip install`. We kept the old-style command (`pip3 install`) as it seems more\nprevalent among developers at the moment - but it may be a convention that will soon change and certainly something you should consider.
If you run the pip3 install
command on a package that is already installed, pip
will notice this and do nothing.
To install a specific version of a Python package give the package name followed by ==
and the version number, e.g.\npip3 install numpy==1.21.1
.
To specify a minimum version of a Python package, you can\ndo `pip3 install 'numpy>=1.20'` (the quotes stop the shell from interpreting `>` as output redirection).
To upgrade a package to the latest version, use e.g. `pip3 install --upgrade numpy`.
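One detail worth checking for the minimum-version form: an unquoted `>` is treated by the shell as output redirection, so the version specifier never reaches pip. The sketch below uses `echo` as a stand-in for actually running `pip3`, so nothing is installed:

```bash
# Unquoted: the shell redirects the output into a file named =1.20
echo pip3 install numpy>=1.20
cat ./=1.20    # the file holds: pip3 install numpy
rm ./=1.20

# Quoted: the whole specifier is passed along as a single argument
echo pip3 install 'numpy>=1.20'
```

The same quoting applies to any specifier containing `>` or `<`.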
To display information about a particular installed package do:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "pip3 show numpy" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ ">To list all packages installed with pip
(in your current virtual environment):
To uninstall a package installed in the virtual environment do: pip3 uninstall package-name
.\nYou can also supply a list of packages to uninstall at the same time.
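The inspection commands above can be tried on any package. The sketch below uses pip itself as the example package, on the assumption that it is present wherever `pip3` runs:

```bash
# List every package installed in the active environment, with versions
pip3 list

# Show metadata (version, location, dependencies) for a single package
pip3 show pip
```

Inside a freshly created virtual environment the list is short, which makes it easy to see what your project has added.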
pip
You are collaborating on a project with a team so, naturally, you will want to share your environment with your\ncollaborators so they can easily ‘clone’ your software project with all of its dependencies and everyone\ncan replicate equivalent virtual environments on their machines. pip
has a handy way of exporting,\nsaving and sharing virtual environments.
To export your active environment, use the `pip freeze` command to\nproduce a list of packages installed in the virtual environment.\nA common convention is to put this list in a `requirements.txt` file:
The first of the above commands will create a requirements.txt
file in your current directory.\nThe requirements.txt
file can then be committed to a version control system and\nget shipped as part of your software and shared with collaborators and/or users. They can then replicate your environment and\ninstall all the necessary packages from the project root as follows:
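The export and import steps described above can be sketched as follows, assuming the virtual environment is active and you are in the project root. The install step is left commented out because it downloads packages from PyPI:

```bash
# Snapshot the packages installed in the currently active environment
pip3 freeze > requirements.txt

# Inspect the result; entries are typically pinned as name==version
cat requirements.txt

# A collaborator would then recreate the environment from the project root:
# pip3 install -r requirements.txt
```

In a brand-new environment with nothing installed yet, `requirements.txt` will simply be empty.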
As your project grows, you may need to update your environment for a variety of reasons. For example, one of your project’s dependencies has\njust released a new version (dependency version number update), you need an additional package for data analysis\n(adding a new dependency) or you have found a better package and no longer need the older package (adding a new and\nremoving an old dependency). In each case, after installing the new packages and uninstalling those that are no longer needed in your virtual environment, update the contents of the `requirements.txt` file\naccordingly by re-running the `pip freeze` command, and propagate the updated `requirements.txt` file to your collaborators\nvia your code sharing platform (e.g. GitHub).
\n\n\nFor a full list of options and commands, consult the official `venv` documentation\nand the Installing Python Modules with `pip` guide. Also check out the guide “Installing packages using `pip` and virtual environments”.
Congratulations! Your environment is now activated and set up to run your script\nfrom the command line.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "cell_type": "markdown", "id": "final-ending-cell", "metadata": { "editable": false, "collapsed": false }, "source": [ "# Key Points\n\n", "- Virtual environments keep Python versions and dependencies required by different projects separate.\n", "- A virtual environment is itself a directory structure.\n", "- Use `venv` to create and manage Python virtual environments.\n", "- Use `pip` to install and manage external (third-party) Python libraries.\n", "- `pip` allows you to declare all dependencies for a project in a separate file (by convention called `requirements.txt`) which can be shared with collaborators/users and used to replicate a virtual environment.\n", "- Use `pip3 freeze > requirements.txt` to take a snapshot of your project's dependencies.\n", "- Use `pip3 install -r requirements.txt` to replicate someone else's virtual environment on your machine from the `requirements.txt` file.\n", "\n# Congratulations on successfully completing this tutorial!\n\n", "Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-venv/tutorial.html#feedback) and check there for further resources!\n" ] } ] }