Architecture of mypy - DESOSA 2020

Welcome back all! Today we will dive deeper into the architecture of mypy. As this is our second blogpost, we assume that you have some basic knowledge about what mypy is and what you can do with it. If you have no clue what we are talking about, please check our previous blogpost.

What are your thoughts?

All projects have their respective stakeholders, and the responsibility of the architect is to make sure that all thoughts about what a system should do are conveyed in the project. By subdividing these thoughts into different topics, one can find the different viewpoints for the project.

Rozanski and Woods ¹ have described a nice method for subdividing these thoughts into architectural viewpoints, to meet the requirements of the main stakeholders.

Viewpoint	Importance	Explanation
Functional	++	Because functionality is the main reason for starting projects, the functional viewpoint is important
Information	+	How to give the result to the end-user. Readability of the result is important. Less important than the function.
Concurrency	-	Important to notice if programs run concurrently. In the case of mypy, the user should be able to check a file without closing other programs.
Development	+	Mypy is open source, so to contribute, the architecture has to be known.
Deployment	--	Low demands on hardware. As long as Python can run, mypy can too.
Operational	+	Mypy should be easy to install, easy to maintain and it has to work on different systems.

Since each architecture is different, each viewpoint carries a different weight for the system. The table shows this weight for mypy, together with a short explanation per viewpoint. For a description of the functional and information viewpoints, please read the previous blogpost. The development, deployment and operational views will be discussed in the next sections.

The coming sections are a lot more technical, so buckle up, because this is going to be one hell of a ride!

You got style!

Really bad software is mostly made without thinking about the architecture. When you start a project as a developer, you should think about the functional viewpoint and adjust your design accordingly.

While the functional viewpoint of mypy is mostly described in the previous blogpost, the overall design has not been discussed yet. The mypy code base consists largely of many files in a single folder. After scrolling through thousands of lines of code, we know one thing for certain: it all starts with main.py.

All jokes aside, the mypy architecture can be described as a pipe and filter architecture. In such an architecture, the input passes through a sequence of filters, where the output of one filter becomes the input of the next. One can see this pattern in, for instance, compilers (which, coincidentally, often type check as well).
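
To make the pipe and filter idea concrete, here is a minimal sketch in Python. It is not taken from mypy: the filter functions (`strip_comments`, `drop_blank`, `normalize`) and the `run_pipeline` helper are hypothetical names that we made up for illustration. Each filter takes a list of lines and returns a new list, and the pipeline simply feeds the output of one filter into the next.

```python
from functools import reduce
from typing import Callable, List

# A filter takes the current data and returns the transformed data.
Filter = Callable[[List[str]], List[str]]


def strip_comments(lines: List[str]) -> List[str]:
    """Filter 1: drop everything after the first '#' on each line."""
    return [line.split("#", 1)[0] for line in lines]


def drop_blank(lines: List[str]) -> List[str]:
    """Filter 2: remove lines that contain only whitespace."""
    return [line for line in lines if line.strip()]


def normalize(lines: List[str]) -> List[str]:
    """Filter 3: trim surrounding whitespace and lowercase each line."""
    return [line.strip().lower() for line in lines]


def run_pipeline(data: List[str], filters: List[Filter]) -> List[str]:
    """Apply the filters one after another, piping each result onward."""
    return reduce(lambda current, apply: apply(current), filters, data)


source = ["X = 1  # set x", "", "   Y = 2"]
result = run_pipeline(source, [strip_comments, drop_blank, normalize])
print(result)
```

The nice property of this style is that every filter can be tested and replaced on its own, as long as it keeps the same input and output shape.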

Analyze this

Are you still with us? Good! Let's start our descent into the complex codebase of mypy and shine some light on it. So what if you want to contribute to mypy: do you know where to start? We would suggest taking a look at the following image.

In this image we can distinguish eight different components. This is our view of mypy and its dependencies. The main component is the semantic analyzer. This component includes the main.py file (called when mypy is run) and the build.py file. The main.py file calls the build file, which then takes over all responsibility for the type checking. As you can see in the image, the semantic analyzer uses many packages to do the type checking. An observant reader may notice that the daemon is surrounded by a dotted line. This means that using the daemon is optional.

We will briefly go over the different components to give you a feel for the system. We start with the utilities package, which contains helper functions for the other packages. The second component is the filesystem, which maintains the analysis metadata cache. This metadata speeds up the analysis by avoiding redundant system calls. When the semantic analyzer has constructed the project metadata, a typechecker component is instantiated and called. This is where the actual types are compared and where an error report is generated. Next, the common component contains a library of definitions for the different errors that can be found. To provide a simple way of testing mypy itself, a set of integration and unit tests is available to run mypy against specific scenarios. Finally, mypy can be extended with (third-party) plugins; the plugins component handles these extensions. ²
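
To illustrate how a metadata cache can avoid redundant system calls, here is a small sketch. It is our own simplification and not mypy's actual implementation: the `StatCache` class and its `system_calls` counter are hypothetical. The idea is simply to remember the result of `os.stat()` per path, so that asking about the same file twice only hits the operating system once.

```python
import os
import tempfile
from typing import Dict


class StatCache:
    """Hypothetical cache that remembers os.stat() results per path."""

    def __init__(self) -> None:
        self._stats: Dict[str, os.stat_result] = {}
        self.system_calls = 0  # counts how often we really asked the OS

    def stat(self, path: str) -> os.stat_result:
        """Return cached metadata, calling os.stat() only on a cache miss."""
        if path not in self._stats:
            self.system_calls += 1
            self._stats[path] = os.stat(path)
        return self._stats[path]

    def flush(self) -> None:
        """Forget everything, for example before a new type checking run."""
        self._stats.clear()


with tempfile.NamedTemporaryFile(suffix=".py") as handle:
    cache = StatCache()
    first = cache.stat(handle.name)   # cache miss: one real system call
    second = cache.stat(handle.name)  # cache hit: served from memory
    calls_made = cache.system_calls

print(calls_made)
```

A cache like this has to be flushed at the right moments, because a file may change on disk between two runs.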

Mypy as a command-line pipe

Two main packages determine how the pipe and filter architecture is composed in mypy. Looking at the image below, we can see the sequential workflow. The numbers represent the order of the calls. Don’t be confused by the fact that a package and one of its submodules share the same name. We will break it down in a runtime view for you: ³

The main.py file calls the build.py file. Since this file consists of many classes, we decided to show it as a bigger submodule (step 1).
The build.py file calls many different methods to initialize the right configuration for the semantic analyzer (as a package). One of these settings determines which error messages can be sent (steps 3 and 4).
With all these settings, a BuildManager is created. The BuildManager calls the right functions of the semantic analyzer (as a submodule) and handles dependencies between files (step 5).
The semantic analyzer checks whether the program is “valid”, that is, whether there are any problems with the code itself. It returns any errors found within the program (step 6).
The type checker package does the actual type checking. With the information gathered by the BuildManager (from, among others, the semantic analyzer), it type checks a given file and then returns the errors it found (steps 7 and 8).
Lastly the errors are transferred to the end-user (step 9). Depending on the chosen options a report might be created (also see for a more in depth description).
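
The runtime steps above can be mimicked with a toy program. This is a heavily simplified sketch under our own assumptions and has nothing to do with mypy's real code: `ToyBuildManager`, `build`, `main` and `KNOWN_TYPES` are hypothetical, and the "checker" only understands top-level annotated assignments with literal values. It does show the order of events: a manager is created, a semantic pass gathers information and reports invalid code, a type checking pass uses that information, and the errors are handed to the user at the end.

```python
import ast
from typing import Dict, Iterator, List

KNOWN_TYPES = {"int", "str", "float", "bool"}  # toy assumption


class ToyBuildManager:
    """Hypothetical stand-in for the manager that drives both passes."""

    def __init__(self, source: str) -> None:
        self.tree = ast.parse(source)
        self.declared: Dict[str, str] = {}  # variable name -> declared type
        self.errors: List[str] = []

    def _annotated_assignments(self) -> Iterator[ast.AnnAssign]:
        """Yield top-level statements of the form `name: type = value`."""
        for node in self.tree.body:
            if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
                yield node

    def semantic_analysis(self) -> None:
        """Pass 1: record declared types and reject unknown annotations."""
        for node in self._annotated_assignments():
            annotation = node.annotation
            if isinstance(annotation, ast.Name) and annotation.id in KNOWN_TYPES:
                self.declared[node.target.id] = annotation.id
            else:
                self.errors.append(f"line {node.lineno}: unknown type annotation")

    def type_check(self) -> None:
        """Pass 2: compare literal values with the types found in pass 1."""
        for node in self._annotated_assignments():
            expected = self.declared.get(node.target.id)
            if expected is None or not isinstance(node.value, ast.Constant):
                continue  # nothing reliable to compare in this toy
            actual = type(node.value.value).__name__
            if actual != expected:
                self.errors.append(
                    f"line {node.lineno}: '{node.target.id}' declared as {expected}, got {actual}"
                )


def build(source: str) -> List[str]:
    """Create the manager and run the passes in a fixed order."""
    manager = ToyBuildManager(source)
    manager.semantic_analysis()
    manager.type_check()
    return manager.errors


def main(source: str) -> List[str]:
    """Entry point: run the build and report the errors to the user."""
    errors = build(source)
    for message in errors:
        print(message)
    return errors


program = 'count: int = "three"\nname: str = "mypy"\nratio: number = 0.5\n'
errors = main(program)
```

Note that the type checking pass never looks at annotations itself; it only relies on what the semantic pass stored in the manager, which is the same division of labour as described in the steps above.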

Can you run it?

So, now that you know what the mypy architecture looks like, let’s talk about deployment. Arc42 ⁴ describes the deployment view as the technical infrastructure used to execute the system. The most important aspect is the hardware that mypy can run on. From that point of view, mypy is a very simple program that does not require much: just a computer with Python installed. Mypy is run from the command line, and it can be installed via pip to make it easier to use.

Finally, a daemon is also available, to allow continuous type checking of Python code.
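
As a small illustration of the two ways of running mypy, the sketch below builds the command line for a one-off check and for a check through the daemon. The helper `mypy_command` and the file name `program.py` are hypothetical; the sketch only constructs the argument lists and does not run anything, so it works even when mypy is not installed yet (installing is typically done with `python -m pip install mypy`).

```python
import sys
from typing import List


def mypy_command(target: str, use_daemon: bool = False) -> List[str]:
    """Build the argument list for a one-off or a daemon-based check."""
    if use_daemon:
        # `dmypy run` starts the daemon when needed and then checks the target.
        return ["dmypy", "run", "--", target]
    # Run mypy as a module of the interpreter that executes this script.
    return [sys.executable, "-m", "mypy", target]


one_off = mypy_command("program.py")
daemon = mypy_command("program.py", use_daemon=True)
print(" ".join(daemon))
```

Once mypy is installed, either list can be passed to `subprocess.run` to perform the actual check.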

Non-functional is the new functional

Lastly, let’s talk a bit about some lighter stuff. What if a system's architecture works perfectly, but type checking one file takes a day? Developers would not use mypy, and thus the architecture would not be fit for purpose. To overcome this problem, an architect should always know the NFPs (non-functional properties) of their project. For mypy, the following non-functional properties are important: ⁵

Strictness. To give developers freedom in how their project is type checked, they can add a mypy.ini configuration file. In this file they can specify how mypy should check the whole project, or certain parts of it. For example: are files without types allowed? Or, as another example, some developers might want to use a so-called Any for some of their functions, which can be allowed or not. Mypy organizes the configuration in sections, can read it from files such as mypy.ini or setup.cfg, and finally merges all sources with any given command line arguments.
Speed. As described before, a type checker needs to be fast. Mypy deals with this in two ways. First of all, the config file can be used to make type checking faster by changing its behaviour (for instance, by only type checking a module instead of the full project). Secondly, mypy is working on a compiled C version of the type checker to increase speed.
Documenting errors. As a developer, documenting errors might be important. However, terminal output can be hard to work with. Therefore mypy has an option to create reports of the errors in multiple formats (for instance HTML and XML).
Documentation. Mypy has many options, so getting started is easier said than done. To make it a bit easier, extensive documentation can be found online (or can be built locally).
Testing. A developer might want to verify the correct functioning of mypy, to make sure that using it is helpful. As described before, mypy has many different tests to verify the correct functioning, which can be run with the pytest library.
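
To make the strictness item above a bit more tangible, here is a sketch of how configuration sources could be merged. It is our own toy model, not mypy's implementation: the `load_options` function, the example module names and the simple `fnmatch` matching are assumptions. The option names `disallow_untyped_defs` and `disallow_any_explicit` and the `[mypy]` / `[mypy-<pattern>]` section layout are real mypy settings. As far as we understand mypy's documentation, per-module sections take precedence over command line options, which in turn override the top-level section, so the sketch applies the sources in that order.

```python
import configparser
import fnmatch
from typing import Dict

EXAMPLE_INI = """
[mypy]
disallow_untyped_defs = True
disallow_any_explicit = False

[mypy-legacy.*]
disallow_untyped_defs = False
"""


def load_options(ini_text: str, module: str, cli: Dict[str, str]) -> Dict[str, str]:
    """Merge top-level, command line and per-module settings for one module."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    # 1. Start from the top-level [mypy] section.
    options = dict(parser["mypy"])
    # 2. Command line values override the top-level section.
    options.update(cli)
    # 3. Matching per-module sections win over both (toy pattern matching).
    for section in parser.sections():
        if section.startswith("mypy-"):
            pattern = section[len("mypy-"):]
            if fnmatch.fnmatch(module, pattern):
                options.update(dict(parser[section]))
    return options


legacy = load_options(EXAMPLE_INI, "legacy.utils", {"disallow_untyped_defs": "True"})
app = load_options(EXAMPLE_INI, "app.core", {"disallow_any_explicit": "True"})
print(legacy)
print(app)
```

With this setup, old code under `legacy` can stay untyped while the rest of the project is checked strictly, which is exactly the kind of freedom the configuration file is meant to give.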

We hope you all enjoyed this new blogpost about mypy, and hope you have learned something. The next blogpost will contain an exam! Just kidding, we hope to see you again at our next post!

¹ Nick Rozanski and Eoin Woods. Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives. Addison-Wesley Professional, 2nd edition, 2011.
² Mypy GitHub repository with the code base: https://github.com/python/mypy
³ Mypy documentation: https://mypy.readthedocs.io/en/stable/index.html
⁴ The views as described in the arc42 documentation: https://docs.arc42.org/home/
⁵ I-Hsin Chou and Chin-Feng Fan. Regulatory software configuration management system design. Volume 4166, pages 99–112, September 2006.