{ "metadata": {}, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
In some languages type annotations are a core part of the language and types are checked at compile time, to ensure your code can never use the incorrect type of object. Python, and a few other dynamic languages, instead use “Duck Typing” wherein the type of the object is less important than whether or not the correct methods or attributes are available.
\nHowever, we can provide type hints as we write Python, which allow our editor to type check code as we go, even though they are not typically enforced at any point.
\n\n\nAgenda\nIn this tutorial, we will cover:
\n\n
\n- Types
\n
Types used for annotations can be any of the base types:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-1", "source": [ "str\n", "int\n", "float\n", "bool\n", "None\n", "..." ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-2", "source": "or they can be relabeling of existing types, letting you create new types as needed to represent your internal data structures
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "from typing import NewType\n", "\n", "NameType = NewType(\"NameType\", str)\n", "Point2D = NewType(\"Point2D\", tuple[float, float])" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-4", "source": "\n\n\nYou might be on a python earlier than 3.9. Please update, or rewrite these as Tuple and List which must be imported.
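If you are on an older Python, the NewType definitions above could be rewritten along these lines. This is only a sketch: Point2D comes from the cell above, while Scores is a hypothetical alias added here purely to show List.

```python
from typing import List, NewType, Tuple

# Pre-3.9 spelling: Tuple and List are imported from typing
Point2D = NewType('Point2D', Tuple[float, float])

# Hypothetical alias, not part of the tutorial, to illustrate List
Scores = List[int]

origin = Point2D((0.0, 0.0))
marks: Scores = [7, 9, 10]
```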
\n
Imagine you have a situation like the following. Take a minute to read and understand the code:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "# Fetch the user and history list\n", "(history_id, user_id) = GetUserAndCurrentHistory(\"hexylena\")\n", "\n", "# And make sure all of the permissions are correct\n", "history = History.fetch(history_id)\n", "history.share_with(user_id)\n", "history.save()" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-6", "source": "\n\nQuestion\n\n
\n- Can you be sure the history_id and user_id are in the correct order? It seems like potentially not, given the ordering of “user” and “history” in the function name, but without inspecting the definition of that function we won’t know.
\n- What happens if history_id and user_id are swapped?
\n\nSolution\n\n
\n- This is unanswerable without the code.
\n- Depending on the magnitude of history_id and user_id, those may be within allowable ranges. Take for example:
\n\n| User | History Id |\n|------|------------|\n| 1 | 1 |\n| 1 | 2 |\n| 2 | 3 |\n| 2 | 4 |\n\n
Given user_id=1 and history_id=2 we may intend that the second row in our table, history #2 owned by user #1, is shared with that user, as they’re the owner. But if those are backwards, we’ll get a situation where history #1 is actually associated with user #1, but instead we’re sharing it with user #2. We’ve created a situation where we’ve accidentally shared the wrong history with the wrong user! This could be a GDPR violation for our system and cause a lot of trouble.
However, if we have type definitions for the UserId and HistoryId that declare them as their own types:
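The definitions referred to here are not shown. A minimal sketch, assuming both identifiers are plain integers underneath, could use NewType:

```python
from typing import NewType

# Assumption: both identifiers wrap an int
UserId = NewType('UserId', int)
HistoryId = NewType('HistoryId', int)

# At runtime these behave like the underlying int
example_user = UserId(1)
example_history = HistoryId(2)
```

A type checker treats UserId and HistoryId as distinct types, even though both are ordinary integers when the code runs.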
And then used in the definition of our function, e.g.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-9", "source": [ "def GetUserAndCurrentHistory(username: str) -> tuple[UserId, HistoryId]:\n", " x = UserId(1) # Pretend this is fetching from the database\n", " y = HistoryId(2) # Likewise\n", " return (x, y)" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-10", "source": "we would be able to catch that, even if we call the variable user_id
, it will still be type checked.
If we’re using a code editor with type checking support, e.g. VS Code with Pylance, we’ll see something like:
\n\nHere we see that we’re not allowed to call this function this way, it’s simply impossible.
\n\n\nQuestion\nWhat happens if you execute this code?
\n\nSolution
\n\nIt executes happily. Types are not enforced at runtime. In this case, where both are custom types around an integer, Python sees an int in both versions of the function call, and that works fine for it. That is why we repeatedly call them “type hints”: they’re hints to your editor to show suggestions and help catch bugs, but they’re not enforced.
\n\nIf you modified the line y = HistoryId(2) to be something like y = \"test\", the code would also execute fine. Python doesn’t care that there’s suddenly a string where you promised, and asked for, an int. It simply does not matter.
\n\nHowever, types are checked when you do operations involving them. Trying to get the len() of an integer? That will raise a TypeError, as integers don’t support the len() call.
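The behaviour described in this solution can be sketched as follows. The share function is a hypothetical stand-in, not part of the tutorial, and the NewType definitions assume integer identifiers.

```python
from typing import NewType

UserId = NewType('UserId', int)
HistoryId = NewType('HistoryId', int)

def share(history_id: HistoryId, user_id: UserId) -> str:
    # Hypothetical helper used only for this illustration
    return f'history {history_id} shared with user {user_id}'

# Arguments deliberately swapped: a type checker flags this, Python does not
result = share(UserId(1), HistoryId(2))

# An error only appears once an operation is unsupported by the actual value
try:
    len(UserId(1))
except TypeError as err:
    message = str(err)
```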
Adding types to variables is easy; you’ve seen a few examples already:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "a: str = \"Hello\"\n", "b: int = 3\n", "c: float = 3.14159\n", "d: bool = True" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-14", "source": "But you can go further than this with things like tuple
and list types:
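A short sketch of what such annotations can look like; the variable names are illustrative only.

```python
# Python 3.9+ syntax; on older versions import Tuple and List from typing
point: tuple[float, float] = (1.5, 2.5)
names: list[str] = ['alice', 'bob']

# Containers can be nested: a list of pairs of integers
pairs: list[tuple[int, int]] = [(1, 2), (3, 4)]
```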
Likewise you’ve seen an example of adding type hints to a function:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "def reverse_list_of_ints(a: list[int]) -> list[int]:\n", " return a[::-1]" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-18", "source": "But this is a very specific function, right? We can reverse lists with more than just integers. For this, you can use Any
:
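A minimal sketch of such a function, assuming it is named reverse_list as in the call shown later in this tutorial:

```python
from typing import Any

def reverse_list(a: list[Any]) -> list[Any]:
    # Any switches off checking for the list elements
    return a[::-1]

mixed = reverse_list([1, 'a', 2.0])
```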
But this loses the type information between the input and the output of the function. You said it was a list[Any], so your editor might not provide any type hints for the result, even though you know that calling it with a list[int] would always return a list[int]. Instead you can do:
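One way to keep that information, sketched here under the assumption that the function is still named reverse_list, is a TypeVar, which ties the element type of the result to the element type of the argument:

```python
from typing import TypeVar

T = TypeVar('T')

def reverse_list(a: list[T]) -> list[T]:
    # Whatever element type goes in is the element type that comes out
    return a[::-1]

ints = reverse_list([1, 2, 3])
words = reverse_list(['a', 'b'])
```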
Now this will allow the function to accept a list of any type of value, int, float, etc. But it will also accept types you might not have intended:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-23", "source": [ "w: list[tuple[int, int]] = [(1, 2), (3, 4), (5, 8)]\n", "reverse_list(w)" ], "cell_type": "code", "execution_count": null, "outputs": [], "metadata": { "attributes": { "classes": [ "However, we can provide type hints as we write python which will allow our editor to type check code as we go, even if it is not typically enforced at any point." ], "id": "" } } }, { "id": "cell-24", "source": "We can lock down what types we’ll accept by using a Union
instead of Any. With a Union, we can define that a type in that position might be any one of a few more specific types. Say your function can only accept strings, integers, or floats:
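A sketch of such a definition, again assuming the reverse_list name used elsewhere in this tutorial:

```python
from typing import Union

def reverse_list(a: list[Union[int, float, str]]) -> list[Union[int, float, str]]:
    # Elements may be int, float, or str; a checker rejects anything else
    return a[::-1]

values = reverse_list([1, 2.5, 'x'])
```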
Here we have used a Union[A, B, ...] to declare that it can only be one of these three types.
\n\nQuestion\n\n
\n- Are both of these valid definitions?
\n\nq1: list[Union[int, float, str]] = [1, 2, 3]\nq2: list[Union[int, float, str]] = [1, 2.3214, \"asdf\"]\n
\n- If that wasn’t what you expected, how would you define it so that it would be?
\n\n\n\nYes, both are valid, but maybe you expected a homogeneous list. If you wanted that, you could instead do:
\n\nq3: Union[list[int], list[float], list[str]] = [1, 2, 3]\nq4: Union[list[int], list[float], list[str]] = [1, 2.3243, \"asdf\"]  # Fails type checking\n
Sometimes you have an argument to a function that is truly optional: maybe you have a different code path if it isn’t there, or you simply process things differently but still correctly. You can explicitly declare this by defining it as Optional:
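A minimal sketch of an Optional argument; the greet function and its messages are hypothetical and only serve as an illustration.

```python
from typing import Optional

def greet(name: Optional[str] = None) -> str:
    # None is an explicitly permitted value, with its own code path
    if name is None:
        return 'Hello, stranger'
    return f'Hello, {name}'

default_greeting = greet()
explicit_none = greet(None)
named_greeting = greet('Ada')
```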
This superficially looks like a keyword argument with a default value, but it’s subtly different. Here an explicit value of None is allowed, and we still know that it will either be a string or None. That is not something we could express with just a keyword argument.
\nYou can use mypy to ensure that these type annotations are working in a project. This is a step you could add to your automated testing, if you have that. Using the HistoryId/UserId example from above, we can write that out into a script and test it by running mypy on that file:
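The script itself is not reproduced here. A sketch of what tmp.py might contain, assuming integer-backed NewType definitions and a deliberately swapped assignment, is below. Line numbers in any reported errors will differ from this sketch.

```python
from typing import NewType

# Assumption: both identifiers wrap an int
UserId = NewType('UserId', int)
HistoryId = NewType('HistoryId', int)

def GetUserAndCurrentHistory(username: str) -> tuple[UserId, HistoryId]:
    # Pretend this is fetching from the database
    return (UserId(1), HistoryId(2))

# Declare the intended types up front
history_id: HistoryId
user_id: UserId

# Swapped on purpose: this runs without error, but a type checker should object
(history_id, user_id) = GetUserAndCurrentHistory('hexylena')
```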
$ mypy tmp.py\ntmp.py:15: error: Incompatible types in assignment (expression has type \"UserId\", variable has type \"HistoryId\")\ntmp.py:15: error: Incompatible types in assignment (expression has type \"HistoryId\", variable has type \"UserId\")\n
Here it reports the errors in the console, and you can use this to prevent bad code from being committed.
\nHere is an example module that would be stored in corp/__init__.py
And here are some example invocations of that module, as found in test.py
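Neither listing is reproduced here. The following is a hypothetical sketch, consistent only with the function names and annotated types given in the solution below; the function bodies and the example arguments are assumptions. Both files are shown in one block so the sketch can run on its own.

```python
# Hypothetical contents of corp/__init__.py (unannotated on purpose)
def repeat(x, n):
    return [x] * n

def print_capitalized(x):
    y = x.capitalize()
    print(y)
    return y

def concatenate(x, y):
    return x + y

# Hypothetical contents of test.py. With the two-file layout this part
# would begin with: from corp import repeat, print_capitalized, concatenate
x = repeat('A', 3)
y = print_capitalized('hello')
z = concatenate('Hi', 'Bob')
```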
\n\nHands On: Add type annotations\n\n
\n- Add type annotations to each of those functions AND the variables x, y, z
\n- How did you know which types were appropriate?
\n- Does mypy approve of your annotations? (Run mypy test.py, once you’ve written the above files out to their appropriate locations.)
The proper annotations:
\ndef repeat(x: str, n: int) -> list[str]:\n# Or\nfrom typing import TypeVar\nT = TypeVar(\"T\")\ndef repeat(x: T, n: int) -> list[T]:\n\ndef print_capitalized(x: str) -> str:\n\ndef concatenate(x: str, y: str) -> str:\n
and
\nx: list[str] = ...\ny: str = ...\nz: str = ...\n
You can use MonkeyType to automatically apply type annotations to your code. Based on the execution of the code, it will make a best guess about what types are supported.
\n\n\nHands On: Using MonkeyType to generate automatic annotations\n\n
\n- Create a folder for a module named some
\n- Touch some/__init__.py to ensure it’s importable as a Python module
\n- Create some/module.py and add the following contents:\n\ndef add(a, b):\n    return a + b\n
\n- Create a script, myscript.py, that uses that module:\n\nfrom some.module import add\n\nadd(1, 2)\n
\n- Install MonkeyType:\n\npip install monkeytype\n
\n- Run MonkeyType to generate the annotations:\n\nmonkeytype run myscript.py\n
\n- View the generated annotations:\n\nmonkeytype stub some.module\n
\n\n\nQuestion\n\n
\n- What was the output of that command?
\n- This function will accept strings as well. Add a statement to exercise that in myscript.py and re-run monkeytype run and monkeytype stub. What is the new output?
\n\nSolution\n\n
\n- The expected output is:\n\ndef add(a: int, b: int) -> int: ...\n
\n- You can add a statement like add(\"a\", \"b\") below add(1, 2) to see:\n\ndef add(a: Union[int, str], b: Union[int, str]) -> Union[int, str]: ...\n
\n\n\nQuestion\nWhy is it different?
\n\nSolution
\n\nBecause MonkeyType works by running the code provided (myscript.py) and annotating based on the executions it saw. In the first invocation it had not seen any calls to add() with strings, so it only reported int as an acceptable type. However, the second time it saw strs as well. Can you think of another type that would be supported by this operation, that was not caught? (list!)
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "cell_type": "markdown", "id": "final-ending-cell", "metadata": { "editable": false, "collapsed": false }, "source": [ "# Key Points\n\n", "- Typing improves the correctness and quality of your code\n", "- It can ensure that editor provided hints are better and more accurate.\n", "\n# Congratulations on successfully completing this tutorial!\n\n", "Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-typing/tutorial.html#feedback) and check there for further resources!\n" ] } ] }Question\n\n
\n- Does that type annotation make sense based on what you’ve learned today?
\n- Can you write a better type annotation based on what you know?
\n\nSolution\n\n
\n- It works, but it’s not a great type annotation. Here the description looks like it can accept two ints and return a str, which isn’t correct.
\n- Here is a better type annotation:
\n\nfrom typing import TypeVar\nT = TypeVar(\"T\", int, str, list)\n\ndef add(a: T, b: T) -> T:\n return a + b\n
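To see what the constrained TypeVar buys us, here is a short usage sketch that repeats the definition so it runs on its own. The variable names are illustrative only.

```python
from typing import TypeVar

# T may be int, str, or list, but must be the same type in every position
T = TypeVar('T', int, str, list)

def add(a: T, b: T) -> T:
    return a + b

total = add(1, 2)
joined = add('a', 'b')
merged = add([1], [2])

# add(1, 'b') is the kind of call a type checker should reject, since T
# cannot be int and str at once; at runtime it would raise a TypeError.
```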