{
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5,
  "cells": [
    {
      "id": "metadata",
      "cell_type": "markdown",
      "source": "<div style=\"border: 2px solid #8A9AD0; margin: 1em 0.2em; padding: 0.5em;\">\n\n# Python - Subprocess\n\nby [Helena Rasche](https://training.galaxyproject.org/hall-of-fame/hexylena/), [Donny Vrins](https://training.galaxyproject.org/hall-of-fame/dirowa/), [Bazante Sanders](https://training.galaxyproject.org/hall-of-fame/bazante1/)\n\nCC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)\n\n**Questions**\n\n- How can I run another program?\n\n**Objectives**\n\n- Run a command in a subprocess.\n- Learn about <code style=\"color: inherit\">check_call</code> and <code style=\"color: inherit\">check_output</code> and when to use each of these.\n- Read its output.\n\n**Time Estimation: 45M**\n</div>\n",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-0",
      "source": "<p>Sometimes you need to run other tools from within Python.\nHere we’ll give a quick tutorial on how to run other programs from Python and read their output.</p>\n<blockquote class=\"agenda\" style=\"border: 2px solid #86D486;display: none; margin: 1em 0.2em\">\n<div class=\"box-title agenda-title\" id=\"agenda\">Agenda</div>\n<p>In this tutorial, we will cover:</p>\n<ol id=\"markdown-toc\">\n<li><a href=\"#subprocesses\" id=\"markdown-toc-subprocesses\">Subprocesses</a></li>\n</ol>\n</blockquote>\n<h1 id=\"subprocesses\">Subprocesses</h1>\n<p>Programs can run other programs, and in Python we do this via the <code style=\"color: inherit\">subprocess</code> module. It lets you run any other command on the system, just like you could at the terminal.</p>\n<p>The first step is importing the module:</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-1",
      "source": [
        "import subprocess"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-2",
      "source": "<p>You’ll primarily use two functions:</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-3",
      "source": [
        "help(subprocess.check_call)"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-4",
      "source": "<p>This one executes a command and checks that it was successful (otherwise it raises an exception), and</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-5",
      "source": [
        "help(subprocess.check_output)"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-6",
      "source": "<p>This one executes a command and returns the output of that command. This is really useful if you’re running a subprocess that writes something to stdout, like a report you need to parse. We’ll learn how to use these by running two gene callers, augustus and glimmer. You can install both from Conda if you do not have them already.</p>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">conda create -n subprocess augustus glimmer3\n</code></pre></div></div>\n<h1 id=\"check-call-downloading-files\">Check Call: Downloading Files</h1>\n<p>Additionally you’ll need two files. You <em>generally should not do this</em>, but you can use a subprocess to download them! We’ll use <code style=\"color: inherit\">subprocess.check_call</code> for this, which simply executes the program and then continues on. If the command fails, it will raise an exception and stop execution.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
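    {
      "id": "cell-6-sketch-text",
      "source": "<p>To make “raise an exception” concrete, here is a minimal sketch that does not need any external tool. It is an illustration under stated assumptions, not part of the gene calling workflow: <code style=\"color: inherit\">sys.executable</code> (the Python interpreter running this notebook) stands in for a tool that fails, and the exit status <code style=\"color: inherit\">3</code> is an arbitrary value chosen for the example.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-6-sketch-code",
      "source": [
        "import subprocess\n",
        "import sys\n",
        "\n",
        "# A made-up stand-in for a failing tool: a child Python process that exits with status 3.\n",
        "failing_command = [sys.executable, '-c', 'import sys; sys.exit(3)']\n",
        "\n",
        "try:\n",
        "    subprocess.check_call(failing_command)\n",
        "except subprocess.CalledProcessError as error:\n",
        "    # check_call raises CalledProcessError when the exit status is not zero;\n",
        "    # the returncode attribute holds that exit status.\n",
        "    print('Command failed with exit status', error.returncode)"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },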
    {
      "id": "cell-7",
      "source": [
        "url = \"https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/836/945/GCF_000836945.1_ViralProj14044/GCF_000836945.1_ViralProj14044_genomic.fna.gz\"\n",
        "genome = 'Escherichia virus T4.fna.gz'\n",
        "subprocess.check_call(['wget', url, '-O', genome])\n",
        "subprocess.check_call(['gzip', '-d', genome])"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-8",
      "source": "<blockquote class=\"tip\" style=\"border: 2px solid #FFE19E; margin: 1em 0.2em\">\n<div class=\"box-title tip-title\" id=\"tip-what-do-these-commands-look-like-on-the-cli\"><button class=\"gtn-boxify-button tip\" type=\"button\" aria-controls=\"tip-what-do-these-commands-look-like-on-the-cli\" aria-expanded=\"true\"><i class=\"far fa-lightbulb\" aria-hidden=\"true\" ></i> <span>Tip: What do these commands look like on the CLI?</span><span class=\"fold-unfold fa fa-minus-square\"></span></button></div>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">wget https://ftp.ncbi.nlm.nih.gov/.... -O \"Escherichia virus T4.fna.gz\"\ngzip -d \"Escherichia virus T4.fna.gz\"\n</code></pre></div>  </div>\n</blockquote>\n<p>The above segment</p>\n<ul>\n<li>sets a url variable</li>\n<li>sets an output filename, <code style=\"color: inherit\">Escherichia virus T4.fna.gz</code></li>\n<li>runs <code style=\"color: inherit\">check_call</code> with a single argument: a list\n<ul>\n<li><code style=\"color: inherit\">wget</code>, a tool we use to download files</li>\n<li>the URL</li>\n<li><code style=\"color: inherit\">-O</code>, indicating the next argument will be the ‘output name’</li>\n<li>what we want the output filename to be called</li>\n</ul>\n</li>\n<li>runs <code style=\"color: inherit\">check_call</code> with a single argument: a list\n<ul>\n<li><code style=\"color: inherit\">gzip</code>, a tool to decompress files</li>\n<li><code style=\"color: inherit\">-d</code>, indicating we want to decompress</li>\n<li>and the filename.</li>\n</ul>\n</li>\n</ul>\n<p>This list is especially important. When you run commands on the command line, normally you just type in one big piece of text yourself. It’s one big string, and you’re responsible for making sure quotation marks appear in the right place. For instance, if you have spaces in your filenames, you have to quote the filename. Python requires you to specify a list of arguments, and then handles the quoting for you, which is both easier and safer.</p>\n<blockquote class=\"code-2col\">\n<blockquote class=\"code-in\" style=\"border: 2px solid #86D486; margin: 1em 0.2em\">\n<div class=\"box-title code-in-title\" id=\"code-in-terminal\"><i class=\"far fa-keyboard\" aria-hidden=\"true\" ></i> Code In: Terminal</div>\n<p>Here we manually quote the argument</p>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">glimmer3 \"bow genome.txt\"\n</code></pre></div>    </div>\n</blockquote>\n<blockquote class=\"code-out\" style=\"border: 2px solid #fb99d0; margin: 1em 0.2em\">\n<div class=\"box-title code-out-title\" id=\"code-out-python\"><i class=\"fas fa-laptop-code\" aria-hidden=\"true\" ></i> Code Out: Python</div>\n<p>Here Python handles that for us</p>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">subprocess.check_call(['glimmer3', 'bow genome.txt'])\n</code></pre></div>    </div>\n</blockquote>\n</blockquote>\n<blockquote class=\"tip\" style=\"border: 2px solid #FFE19E; margin: 1em 0.2em\">\n<div class=\"box-title tip-title\" id=\"tip-exploitation\"><button class=\"gtn-boxify-button tip\" type=\"button\" aria-controls=\"tip-exploitation\" aria-expanded=\"true\"><i class=\"far fa-lightbulb\" aria-hidden=\"true\" ></i> <span>Tip: Exploitation!</span><span class=\"fold-unfold fa fa-minus-square\"></span></button></div>\n<p>This is one of the major reasons we don’t use <code style=\"color: inherit\">os.system</code> or older Python interfaces for running commands.\nIf you’re processing files and a user supplies a filename containing a space that your program isn’t expecting, then it could do something dangerous,\nlike exploit your system!</p>\n<p>So, <strong>always</strong> use <code style=\"color: inherit\">subprocess</code> if you need to run commands, never any other module, <em>despite what you see on the internet!</em></p>\n</blockquote>\n<p>There are more functions in the module, but the vast majority of the time, these two are sufficient.</p>\n<h1 id=\"check-output-gene-calling-with-augustus\">Check Output: Gene Calling with Augustus</h1>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-9",
      "source": [
        "gff3 = subprocess.check_output([\n",
        "    'augustus', # Our command\n",
        "    '--species=E_coli_K12', # The first argument, the species. We're using a phage, so we call genes based on its host organism.\n",
        "    'Escherichia virus T4.fna', # The path to our genome file, without the .gz because we decompressed it.\n",
        "    '--gff3=on' # We would like gff3 formatted output (it's easier to parse!)\n",
        "])\n",
        "\n",
        "gff3 = gff3.decode('utf-8')\n",
        "gff3 = gff3.split('\\n')"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-10",
      "source": "<blockquote class=\"tip\" style=\"border: 2px solid #FFE19E; margin: 1em 0.2em\">\n<div class=\"box-title tip-title\" id=\"tip-what-does-this-commands-look-like-on-the-cli\"><button class=\"gtn-boxify-button tip\" type=\"button\" aria-controls=\"tip-what-does-this-commands-look-like-on-the-cli\" aria-expanded=\"true\"><i class=\"far fa-lightbulb\" aria-hidden=\"true\" ></i> <span>Tip: What does this command look like on the CLI?</span><span class=\"fold-unfold fa fa-minus-square\"></span></button></div>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">augustus --species=E_coli_K12 'Escherichia virus T4.fna' --gff3=on\n</code></pre></div>  </div>\n</blockquote>\n<p>When you use <code style=\"color: inherit\">subprocess.check_output()</code>, Python doesn’t return a plain text <code style=\"color: inherit\">str</code> to you; instead it returns a <code style=\"color: inherit\">bytes</code> object. We can decode that into text with <code style=\"color: inherit\">.decode('utf-8')</code>, a phrase you should memorise as going next to <code style=\"color: inherit\">check_output()</code> for 99% of use cases.</p>\n<p>Let’s look at the results!</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-11",
      "source": [
        "print(gff3[0:20])"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-12",
      "source": "<p>There are a lot of comment lines, starting with <code style=\"color: inherit\">#</code>. Let’s remove those:</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-13",
      "source": [
        "cleaned_gff3 = []\n",
        "for line in gff3:\n",
        "    if line.startswith('#'):\n",
        "        continue\n",
        "    cleaned_gff3.append(line)\n",
        "\n",
        "print(cleaned_gff3[0:20])"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-14",
      "source": "<p>And now you’ve got a set of gff3 formatted gene calls! You can use all of your loop processing skills to slice and dice this data into something great!</p>\n<h1 id=\"aside-stdin-stderr-stdout\">Aside: <code style=\"color: inherit\">stdin</code>, <code style=\"color: inherit\">stderr</code>, <code style=\"color: inherit\">stdout</code></h1>\n<p>All Unix processes have three default file handles that are available to them:</p>\n<ul>\n<li><code style=\"color: inherit\">stdin</code>, where data is passed to the program via a pipe. E.g. in <code style=\"color: inherit\">generate-data | my-program</code>, <code style=\"color: inherit\">my-program</code> would read the output of <code style=\"color: inherit\">generate-data</code> from the pipe.</li>\n<li><code style=\"color: inherit\">stdout</code>, the default place where things are written. E.g. if you <code style=\"color: inherit\">print()</code> in Python, it goes to <code style=\"color: inherit\">stdout</code>. People often redirect <code style=\"color: inherit\">stdout</code> to a file, like <code style=\"color: inherit\">my-program &gt; output.txt</code> to save the output.</li>\n<li><code style=\"color: inherit\">stderr</code>, where log messages go. <em>Generally</em>, if your program produces output on <code style=\"color: inherit\">stdout</code>, you might still want to log messages (errors, % done, etc.). If you wrote those to <code style=\"color: inherit\">stdout</code>, they might get mixed in with the user’s outputs, so we write them to <code style=\"color: inherit\">stderr</code> instead. It also gets printed to the screen and looks identical to any print statement, but it comes from a separate pipe.</li>\n</ul>\n<h1 id=\"pipes\">Pipes</h1>\n<p>One of the more complicated cases, however, is when you need pipes.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-15",
      "source": [
        "url = \"https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/721/125/GCF_001721125.1_ASM172112v1/GCF_001721125.1_ASM172112v1_cds_from_genomic.fna.gz\"\n",
        "cds = 'E. Coli CDSs.fna.gz'\n",
        "subprocess.check_call(['wget', url, '-O', cds])\n",
        "subprocess.check_call(['gzip', '-d', cds])"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-16",
      "source": "<p>With subprocesses, you can control the stdin and stdout of the process by using file handles.</p>\n<blockquote class=\"code-2col\">\n<blockquote class=\"code-in\" style=\"border: 2px solid #86D486; margin: 1em 0.2em\">\n<div class=\"box-title code-in-title\" id=\"code-in-terminal-1\"><i class=\"far fa-keyboard\" aria-hidden=\"true\" ></i> Code In: Terminal</div>\n<p>Here we pipe a file to a process named <code style=\"color: inherit\">build-icm</code> which takes one argument, the output name. It reads sequences from stdin.</p>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">cat seq.fa | build-icm test.icm\n# OR\nbuild-icm test.icm &lt; seq.fa\n</code></pre></div>    </div>\n</blockquote>\n<blockquote class=\"code-out\" style=\"border: 2px solid #fb99d0; margin: 1em 0.2em\">\n<div class=\"box-title code-out-title\" id=\"code-out-python-1\"><i class=\"fas fa-laptop-code\" aria-hidden=\"true\" ></i> Code Out: Python</div>\n<p>Here we need to do a bit more.</p>\n<ol>\n<li>Open a file handle</li>\n<li>Pass that file handle to <code style=\"color: inherit\">check_call</code> or <code style=\"color: inherit\">check_output</code>. This determines where stdin comes from.\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">with open('seq.fa', 'r') as handle:\n    subprocess.check_call(['build-icm', 'test.icm'], stdin=handle)\n</code></pre></div>        </div>\n</li>\n</ol>\n</blockquote>\n</blockquote>\n<p>We’ll do that now:</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-17",
      "source": [
        "with open('E. Coli CDSs.fna', 'r') as handle:\n",
        "    subprocess.check_call(['build-icm', 'test.icm'], stdin=handle)"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-18",
      "source": "<blockquote class=\"tip\" style=\"border: 2px solid #FFE19E; margin: 1em 0.2em\">\n<div class=\"box-title tip-title\" id=\"tip-what-does-this-commands-look-like-on-the-cli-1\"><button class=\"gtn-boxify-button tip\" type=\"button\" aria-controls=\"tip-what-does-this-commands-look-like-on-the-cli-1\" aria-expanded=\"true\"><i class=\"far fa-lightbulb\" aria-hidden=\"true\" ></i> <span>Tip: What does this command look like on the CLI?</span><span class=\"fold-unfold fa fa-minus-square\"></span></button></div>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">build-icm test.icm &lt; 'E. Coli CDSs.fna'\n</code></pre></div>  </div>\n</blockquote>\n<p>Here we build a model, based on the sequences of <em>E. coli</em> K-12, that Glimmer3 can use.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
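    {
      "id": "cell-18-sketch-text",
      "source": "<p>The same <code style=\"color: inherit\">stdin=handle</code> pattern can be tried without <code style=\"color: inherit\">build-icm</code>. In this minimal sketch, the file name <code style=\"color: inherit\">sketch-input.txt</code> and its contents are made up for illustration, and a small Python one-liner (run through <code style=\"color: inherit\">sys.executable</code>) stands in for a program that reads from stdin.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-18-sketch-code",
      "source": [
        "import subprocess\n",
        "import sys\n",
        "\n",
        "# Write a small, made-up input file so the sketch is self-contained.\n",
        "with open('sketch-input.txt', 'w') as handle:\n",
        "    handle.write('ATG\\nCCC\\n')\n",
        "\n",
        "# A made-up stand-in child program that counts the lines arriving on its stdin.\n",
        "count_lines = [sys.executable, '-c', 'import sys; print(len(sys.stdin.readlines()))']\n",
        "\n",
        "# The open file handle becomes the child's stdin, like `program < sketch-input.txt`.\n",
        "with open('sketch-input.txt', 'r') as handle:\n",
        "    counted = subprocess.check_output(count_lines, stdin=handle).decode('utf-8')\n",
        "\n",
        "print(counted.strip())"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },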
    {
      "id": "cell-19",
      "source": [
        "output = subprocess.check_output([\n",
        "    'glimmer3', # Our program\n",
        "    'Escherichia virus T4.fna', # The input genome\n",
        "    'test.icm', # The model we just built\n",
        "    't4-genes'  # The base name for output files. It'll produce t4-genes.detail and t4-genes.predict.\n",
        "]).decode('utf-8') # And of course we decode as utf-8\n",
        "\n",
        "print(output)"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-20",
      "source": "<blockquote class=\"tip\" style=\"border: 2px solid #FFE19E; margin: 1em 0.2em\">\n<div class=\"box-title tip-title\" id=\"tip-what-does-this-commands-look-like-on-the-cli-2\"><button class=\"gtn-boxify-button tip\" type=\"button\" aria-controls=\"tip-what-does-this-commands-look-like-on-the-cli-2\" aria-expanded=\"true\"><i class=\"far fa-lightbulb\" aria-hidden=\"true\" ></i> <span>Tip: What does this command look like on the CLI?</span><span class=\"fold-unfold fa fa-minus-square\"></span></button></div>\n<div class=\"language-plaintext highlighter-rouge\"><div><pre style=\"color: inherit; background: transparent\"><code style=\"color: inherit\">glimmer3 'Escherichia virus T4.fna' test.icm t4-genes\n</code></pre></div>  </div>\n</blockquote>\n<p><em>What happened here?</em> The output of the program was written to <code style=\"color: inherit\">stderr</code>, not <code style=\"color: inherit\">stdout</code>, so Python may print that out to your screen, but <code style=\"color: inherit\">output</code> will be empty. To solve this common problem we can re-run the program and collect both <code style=\"color: inherit\">stdout</code> and <code style=\"color: inherit\">stderr</code>.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-21",
      "source": [
        "output = subprocess.check_output([\n",
        "    'glimmer3', # Our program\n",
        "    'Escherichia virus T4.fna', # The input genome\n",
        "    'test.icm', # The model we just built\n",
        "    't4-genes'  # The base name for output files. It'll produce t4-genes.detail and t4-genes.predict.\n",
        "], stderr=subprocess.STDOUT).decode('utf-8') # And of course we decode as utf-8\n",
        "\n",
        "print(output)"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },
    {
      "id": "cell-22",
      "source": "<p>Here we’ve redirected <code style=\"color: inherit\">stderr</code> to <code style=\"color: inherit\">stdout</code> and mixed both of them together. This isn’t always what we want, but here the program produces nothing on <code style=\"color: inherit\">stdout</code>, so we can do that safely, and now we can parse it or do any other computations we need with it! Our Glimmer3 gene calls are in <code style=\"color: inherit\">t4-genes.detail</code> and <code style=\"color: inherit\">t4-genes.predict</code> if we want to open and process those as well.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
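    {
      "id": "cell-22-sketch-text",
      "source": "<p>The difference between capturing only <code style=\"color: inherit\">stdout</code> and merging <code style=\"color: inherit\">stderr</code> into it can also be seen without Glimmer3. This is a minimal sketch under stated assumptions: the child command below is a made-up stand-in that writes one line to each stream, and it says nothing about what Glimmer3 itself prints. In the first call the <code style=\"color: inherit\">stderr</code> line is not part of the returned value; in the second call both lines are captured, though their relative order is not guaranteed.</p>\n",
      "cell_type": "markdown",
      "metadata": {
        "editable": false,
        "collapsed": false
      }
    },
    {
      "id": "cell-22-sketch-code",
      "source": [
        "import subprocess\n",
        "import sys\n",
        "\n",
        "# A made-up stand-in child program: it writes one line to stdout and one to stderr.\n",
        "child = [sys.executable, '-c', \"import sys; print('result'); print('progress', file=sys.stderr)\"]\n",
        "\n",
        "# Only stdout is captured here; the stderr line is not in the returned bytes.\n",
        "only_stdout = subprocess.check_output(child).decode('utf-8')\n",
        "\n",
        "# Here stderr is merged into stdout, so both lines are captured together.\n",
        "merged = subprocess.check_output(child, stderr=subprocess.STDOUT).decode('utf-8')\n",
        "\n",
        "print(repr(only_stdout))\n",
        "print(repr(merged))"
      ],
      "cell_type": "code",
      "execution_count": null,
      "outputs": [],
      "metadata": {
        "attributes": {
          "classes": [
            ""
          ],
          "id": ""
        }
      }
    },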
    {
      "cell_type": "markdown",
      "id": "final-ending-cell",
      "metadata": {
        "editable": false,
        "collapsed": false
      },
      "source": [
        "# Key Points\n\n",
        "- **DO NOT USE `os.system`**\n",
        "- **DO NOT USE shell=True**\n",
        "- 👍Use `subprocess.check_call()` if you don't care about the output, just that it succeeds.\n",
        "- 👍Use `subprocess.check_output()` if you want the output\n",
        "- Use `.decode('utf-8')` to read the output of `check_output()`\n",
        "\n# Congratulations on successfully completing this tutorial!\n\n",
        "Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-subprocess/tutorial.html#feedback) and check there for further resources!\n"
      ]
    }
  ]
}