{ "metadata": {}, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
check_call
and check_output
and when to use each of these.\n- Read its output.\n\n**Time Estimation: 45M**\nSometimes you need to run other tools in Python, like maybe you want to\nHere we’ll give a quick tutorial on how to run other programs from within Python.
\n\n\nAgenda\nIn this tutorial, we will cover:
\n\n
\n- Subprocesses
\n
Programs can run other programs, and in Python we do this via the subprocess
module. It lets you run any other command on the system, just like you could at the terminal.
The first step is importing the module
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-1", "source": [ "import subprocess" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-2", "source": "You’ll primarily use two functions:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "help(subprocess.check_call)" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-4", "source": "Which executes a command and checks if it was successful (or it raises an exception), and
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "help(subprocess.check_output)" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-6", "source": "Which executes a command and returns the output of that command. This is really useful if you’re running a subprocess that writes something to stdout, like a report you need to parse. We’ll learn how to use these by running two gene callers, augustus and glimmer. You can install both from Conda if you do not have them already.
\nconda create -n subprocess augustus glimmer3\n
Additionally, you’ll need two files. You generally should not do this, but you can use a subprocess to download them! We’ll use subprocess.check_call
for this, which simply executes the program and then continues. If the command fails (exits with a non-zero status), it raises an exception and stops execution.
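
A minimal sketch of that failure behaviour. It uses the Python interpreter itself (sys.executable) as a stand-in command so it runs without wget installed; the failing_command and exit_status names are made up for illustration.

```python
import subprocess
import sys

# A child process that exits with status 3, standing in for a failed download.
failing_command = [sys.executable, '-c', 'import sys; sys.exit(3)']

try:
    subprocess.check_call(failing_command)
    exit_status = 0
except subprocess.CalledProcessError as error:
    # check_call raises this exception whenever the exit status is non-zero.
    exit_status = error.returncode

print('exit status:', exit_status)
```

If you do not catch the exception, the script stops at the failing call, which is usually what you want for a download that later steps depend on.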
\n\n\n\nwget https://ftp.ncbi.nlm.nih.gov/.... -O \"Escherichia virus T4.fna.gz\"\ngzip -d \"Escherichia virus T4.fna.gz\"\n
The above segment does the following:
\n- Works with the file Escherichia virus T4.fna.gz
\n- Calls check_call with a single argument: a list
\n  - wget, a tool we use to download files
\n  - -O, indicating the next argument will be the ‘output name’
\n- Calls check_call with a single argument: a list
\n  - gzip, a tool to decompress files
\n  - -d, indicating we want to decompress
\n\nThis list is especially important. When you run commands on the command line, you normally type in one long piece of text yourself. It’s one big string, and you’re responsible for making sure quotation marks appear in the right place. For instance, if you have spaces in your filenames, you have to quote the filename. Python asks you to specify a list of arguments and then handles the quoting for you, which, honestly, is easier and safer.
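
To see the list behaviour in isolation, here is a small sketch that again uses the Python interpreter (sys.executable) as the command, so neither wget nor glimmer3 is needed. The child program and the variable names are illustrative assumptions, not part of the tutorial’s pipeline.

```python
import subprocess
import sys

# A tiny child program: print how many arguments it received, then the first one.
child = 'import sys; print(len(sys.argv) - 1); print(sys.argv[1])'

# The filename contains a space, but it is a single list item,
# so the child receives it as exactly one argument.
report = subprocess.check_output(
    [sys.executable, '-c', child, 'bow genome.txt']
).decode('utf-8')

arg_count, first_arg = report.splitlines()
print(arg_count, first_arg)
```

Each list item reaches the program as one argument, whatever characters it contains, so no manual quoting is required.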
\n\n\n\n\nCode In: Terminal\nHere we manually quote the argument
\n\nglimmer3 \"bow genome.txt\"\n
\n\nCode Out: Python\nHere python handles that for us
\n\nsubprocess.check_call(['glimmer3', 'bow genome.txt'])\n
\n\n\nThis is one of the major reasons we don’t use
\nos.system
or older Python interfaces for running commands.\nIf you’re processing files and a user supplies a filename containing a space that your program isn’t expecting, it could do something dangerous,\nlike letting someone exploit your system!\n\nSo, always use
\nsubprocess
if you need to run commands, never any other module, despite what you see on the internet!
There are more functions in the module, but the vast majority of the time, these two are sufficient.
\n\n\n\n\naugustus --species=E_coli_K12 'Escherichia virus T4.fna' --gff3=on\n
If you’re using subprocess.check_output()
Python doesn’t return a plain text str
to you; instead it returns a bytes
object. We can decode that into text with .decode('utf-8')
, a phrase you should memorise as going next to check_output()
, for 99% of use cases.
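
A quick sketch of the bytes-to-text step, using the Python interpreter (sys.executable) as a stand-in for a real tool; the raw and text names are made up for illustration.

```python
import subprocess
import sys

# check_output hands back the child's stdout as a bytes object.
raw = subprocess.check_output([sys.executable, '-c', 'print(1 + 1)'])
print(type(raw))

# Decoding turns those bytes into an ordinary str that you can split and parse.
text = raw.decode('utf-8')
print(type(text))
print(text.strip())
```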
Let’s look at the results!
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-11", "source": [ "print(gff3[0:20])" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-12", "source": "It’s a lot of comment lines, starting with #
. Let’s remove those
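
One way to drop the comment lines is a list comprehension. This sketch assumes the output has already been split into a list of lines; the sample_lines below are made-up placeholders, not real augustus output.

```python
# Placeholder lines standing in for the decoded, line-split augustus output.
sample_lines = [
    '# a comment line',
    '# another comment line',
    'seq1 AUGUSTUS gene 1 300 . + . ID=g1',
    'seq1 AUGUSTUS gene 500 900 . - . ID=g2',
]

# Keep only the lines that do not start with the comment marker.
gene_lines = [line for line in sample_lines if not line.startswith('#')]

for line in gene_lines:
    print(line)
```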
And now you’ve got a set of gff3 formatted gene calls! You can use all of your loop processing skills to slice and dice this data into something great!
\nstdin
, stderr
, stdout
All Unix processes have three default file handles that are available to them:
\n- stdin, where data is passed to the program via a pipe. E.g. in generate-data | my-program, the program would read the output of generate-data from the pipe.
\n- stdout, the default place where things are written. E.g. if you print() in Python, it goes to stdout. People often redirect stdout to a file, like my-program > output.txt, to save the output.
\n- stderr, for messages that should stay out of your output. If your program produces output on stdout, you might still want to log messages (errors, % done, etc.). If you wrote those to stdout, they might get mixed in with the user’s output, so we write them to stderr, which also gets printed to the screen and looks identical to any print statement, but comes from a separate pipe.
\n\nOne of the more complicated cases, however, is when you need pipes.
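
To make the separate streams concrete, the following sketch captures stdout and stderr independently. It relies on subprocess.run, which this tutorial does not otherwise use, and on a throwaway child program run through sys.executable; all variable names are illustrative.

```python
import subprocess
import sys

# The child writes 42 to stdout and 99 to stderr.
child = 'import sys; sys.stdout.write(str(42)); sys.stderr.write(str(99))'

finished = subprocess.run(
    [sys.executable, '-c', child],
    stdout=subprocess.PIPE,  # capture stdout instead of printing it
    stderr=subprocess.PIPE,  # capture stderr separately
    check=True,              # raise if the child exits with a non-zero status
)

out_text = finished.stdout.decode('utf-8')
err_text = finished.stderr.decode('utf-8')
print('stdout:', out_text)
print('stderr:', err_text)
```

On the terminal both streams would appear interleaved on screen; captured this way they stay apart.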
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "url = \"https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/721/125/GCF_001721125.1_ASM172112v1/GCF_001721125.1_ASM172112v1_cds_from_genomic.fna.gz\"\n", "cds = 'E. Coli CDSs.fna.gz'\n", "subprocess.check_call(['wget', url, '-O', cds])\n", "subprocess.check_call(['gzip', '-d', cds])" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-16", "source": "With subprocesses, you can control the stdin, and stdout of the process by using file handles.
\n\n\n\n\nCode In: Terminal\nHere we pipe a file to a process named
\nbuild-icm
which takes one argument, the output name. It reads sequences from stdin.\ncat seq.fa | build-icm test.icm\n# OR\nbuild-icm test.icm < seq.fa\n
\n\nCode Out: Python\nHere we need to do a bit more.
\n\n
\n- Open a file handle
\n- Pass that file handle to
\ncheck_call
or check_output
. This determines where stdin comes from.\n\nwith open('seq.fa', 'r') as handle:\n subprocess.check_call(['build-icm', 'test.icm'], stdin=handle)\n
We’ll do that now:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "with open('E. Coli CDSs.fna', 'r') as handle:\n", " subprocess.check_call(['build-icm', 'test.icm'], stdin=handle)" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-18", "source": "\n\n\n\nbuild-icm test.icm < 'E. Coli CDSs.fna'\n
Here we build a model, based on the sequences of E. coli K-12, that Glimmer3 can use.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "output = subprocess.check_output([\n", " 'glimmer3', # Our program\n", " 'Escherichia virus T4.fna', # The input genome\n", " 'test.icm', # The model we just built\n", " 't4-genes' # The base name for output files. It'll produce t4-genes.detail and t4-genes.predict.\n", "]).decode('utf-8') # And of course we decode as utf-8\n", "\n", "print(output)" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "" ], "id": "" } } }, { "id": "cell-20", "source": "\n\n\n\nglimmer3 'Escherichia virus T4.fna' test.icm t4-genes\n
What happened here? The output of the program was written to stderr
, not stdout
, so Python may print that out to your screen, but output
will be empty. To solve this common problem we can re-run the program and collect both stdout
and stderr
.
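
A sketch of that technique with a stand-in command: passing stderr=subprocess.STDOUT to check_output folds the child’s stderr into the captured stdout. The child program here is a made-up one-liner run through sys.executable rather than glimmer3.

```python
import subprocess
import sys

# The child writes only to stderr, much like a tool that logs its progress there.
child = 'import sys; sys.stderr.write(str(99))'
command = [sys.executable, '-c', child]

# Without redirection, check_output captures stdout only, so this is empty
# and the child's message goes straight to the screen.
stdout_only = subprocess.check_output(command).decode('utf-8')

# With stderr redirected into stdout, the message is captured as well.
combined = subprocess.check_output(command, stderr=subprocess.STDOUT).decode('utf-8')

print(repr(stdout_only))
print(repr(combined))
```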
Here we’ve re-directed the stderr
to stdout
and mixed both of them together. This isn’t always what we want, but here the program writes nothing to stdout, so we can do that safely. Now we can parse the captured text or do any other computations we need with it! Our Glimmer3 gene calls are in t4-genes.detail
and t4-genes.predict
if we want to open and process those as well.