Solidity and its variability

This is the fourth and last essay on the Solidity project. It analyses the variability of the Solidity compiler. Like most compilers, Solidity accepts many command line arguments, which change the expected input and output files and, in some cases, which other arguments are expected. Because Solidity is a compiler, and because security plays a crucial role in the project, it is important that the process and the output are trustworthy under every possible variation. This is why we decided that a variability analysis would make an interesting final view on the Solidity project.

Variability modeling

The first step in determining how variability is handled in a system is determining what variability actually exists. This part is dedicated to that. We will first look at what types of variability are present in Solidity. Then we will show how these influence each other, and finally we will show a feature model of the system.

Main features

We can distinguish different kinds of variability. The first and most obvious kind is command line arguments. Running solc.exe --help lists 46 different command line arguments, 10 of which take one or more additional arguments. This is already a massive pool of variability, and it is only the first kind.

Another source of variability is the operating system Solidity is compiled on. Solidity is designed to function on Windows, Linux and macOS. The program is intended to behave the same on all of these platforms; however, since Solidity is written in C++, that is not automatically guaranteed. We will analyse how Solidity handles the differences between operating systems, and possible differences in the libraries used depending on the operating system.

The last source of variability is the choice of C++ compiler. GCC, Clang and MSVC can all be used to build the Solidity code base; MSVC is only available on Windows.

Incompatibilities

There are few real incompatibilities, as we are dealing with a compiler. However, we found some interesting things when searching through the codebase. One thing that stands out is the number of design choices related to Microsoft software. The code is full of comments that explain how certain choices relate to Windows or MSVC.

Take for example the batch file that installs the prerequisite packages for Solidity on Windows. There we found the following: “The lack of a standard C++ packaging system for Windows is problematic for us”. The developers have considered various options for improving this situation. One was switching to NuGet C++ packages. Another was adding dependencies as git submodules, so that Solidity does not depend on platform-specific packaging systems. This would add control and robustness but lengthen the build process. The issue is currently still open on the Ethereum Aleth repository, a collection of C++ libraries and tools for Ethereum that is also used in Solidity.

Another interesting finding regarding configurations is the existence of a special header file UndefMacros.h which “should be used to #undef some really evil macros defined by windows.h which result in conflict with our Token.h”.
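To make the problem concrete, here is a minimal sketch of the pattern such a header follows. It does not reproduce UndefMacros.h or Token.h: the macro, the enum and the helper function are hypothetical stand-ins chosen for illustration, and the platform header is simulated with a plain #define so the example compiles on any platform.

```cpp
// Sketch only: simulates a platform header that defines a macro with a very
// common name, then shows the #undef pattern. All names here are hypothetical.
#include <string>

// Stand-in for what a platform header might do.
#define DELETE 0x00010000L

// Without this #undef, the enumerator DELETE below would be replaced by the
// preprocessor with 0x00010000L and the enum would no longer compile.
#undef DELETE

enum class Token { ADD, DELETE, RETURN };

// Small helper so the effect can be observed.
inline std::string tokenName(Token t) {
    switch (t) {
        case Token::ADD:    return "ADD";
        case Token::DELETE: return "DELETE";
        case Token::RETURN: return "RETURN";
    }
    return "UNKNOWN";
}
```

The design choice is to undo the macro in one dedicated place, included right before the conflicting declarations, rather than renaming the project's own identifiers around a platform quirk.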

Not only Windows but also the MSVC compiler demands certain configurations. For example, it requires MSVC-specific Ethereum dependencies. All of the compilers require compiler-specific compile options, but the focus seems to be mainly on GCC and Clang, as the compiler options used for MSVC mostly disable warnings.

We also see a special flag for the MSVC compiler, as it has difficulties compiling large objects. This flag will be removed as soon as the files concerned are reduced in size, which is being worked on.

Feature model

An easy and clear way to visualize features is to construct a feature model. We created this model in FeatureIDE for Eclipse.

This model shows the different options for compiling Solidity itself, including the possible compilers and platforms.

Solidity compilation feature model
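The constraints of such a model can also be written down as a small check. The sketch below is our own illustration and not part of Solidity or FeatureIDE: the type and function names are hypothetical, and it encodes only the one cross-tree constraint stated in this essay (MSVC requires Windows), assuming GCC and Clang are usable on every listed platform.

```cpp
// Sketch only: a feature model with two alternative groups (platform and
// compiler) and one cross-tree constraint. All names are hypothetical.
enum class Os { Windows, Linux, MacOs };
enum class Compiler { Gcc, Clang, Msvc };

struct BuildConfig {
    Os os;
    Compiler compiler;
};

// Cross-tree constraint: choosing MSVC implies choosing Windows.
constexpr bool isValid(BuildConfig c) {
    return c.compiler != Compiler::Msvc || c.os == Os::Windows;
}

// Enumerates every combination and counts those satisfying the constraint.
inline int countValidConfigs() {
    const Os oses[] = {Os::Windows, Os::Linux, Os::MacOs};
    const Compiler compilers[] = {Compiler::Gcc, Compiler::Clang, Compiler::Msvc};
    int valid = 0;
    for (Os os : oses) {
        for (Compiler compiler : compilers) {
            if (isValid(BuildConfig{os, compiler})) {
                ++valid;
            }
        }
    }
    return valid;
}
```

Under this single constraint, two of the nine combinations are ruled out (MSVC on Linux and MSVC on macOS). Any further restriction would be added as another condition in isValid.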

We did not make a feature model of the command line options Solidity offers, because these options are all completely distinct. The only way some flags influence others is by causing them to be ignored completely when present (thus allowing them to take any value instead of constraining them).

Variability management

Now that we know what types of variability exist in the system, we want to know how they are managed. This covers both how different stakeholders can find out which settings and toggles exist, and how these are managed in the system itself. It also covers how the variability is managed in the long term: are, for example, performance and functionality tested on the different platforms?

Information sources

The groups of users and developers overlap in the case of Solidity, as became clear from the stakeholder analysis in our first essay. Users and developers therefore share the same sources of information.

The main source of information regarding variability in Solidity is the documentation. The Solidity documentation contains a lot of information on the platforms that Solidity supports, including specific prerequisites for the different operating systems and compilers.

Management mechanisms

As mentioned in our previous essay, continuous integration includes a category of tests that runs all defined unit tests on the different platforms, i.e. Windows, Linux and macOS.

There is also a Windows-specific upgrade tool, and the Visual C++ compiler is only available on Windows.

Variability implementation mechanism and binding time

Finally, we want to know how the variability is implemented, and how (and when) the different versions are bound into the system.

Mechanisms and binding time

There is a variation point at compile time, for instance where you choose which C++ compiler to use. All choices made here are made by modifying the files in the cmake folder and the CMakeLists.txt file, although doing so is discouraged. This is where constraints for specific compilers, and flags that are only needed on some compilers, are maintained.

At compile time (of the compiler itself, not to be confused with the compile time of actual Solidity files!), you can for instance choose the target platform and, in the EthOptions file, whether to also build supporting tools for the compiler.
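Seen from the C++ side, a choice bound at compile time is visible through the macros that each compiler predefines. The sketch below is our own illustration of that idea, not code from Solidity, whose build decisions live in the CMake files; the two function names are hypothetical, while the macros themselves are the standard predefined ones.

```cpp
// Sketch only: which branch survives is decided when this file is compiled,
// so the answer is bound into the binary and cannot change at run time.
#include <string>

// Order matters: Clang also defines __GNUC__, so it is checked first.
inline std::string compilerName() {
#if defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "MSVC";
#else
    return "unknown";
#endif
}

inline std::string platformName() {
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "macOS";
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}
```

Calling compilerName() and platformName() in a binary built with GCC on Linux would report that combination, no matter where the binary is later copied.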

As mentioned before, apart from the compiler choice, Solidity offers a slew of command line parameters. These parameters are all read in the solc module, which handles all command-line related code. They are then managed by the boost::program_options library, which ensures that all parameters are handled the same way and can be reached from anywhere in the program.

As these program_options are only set when the command line arguments are read, the variables are fixed once command line interpretation is done. From that point on they are bound and cannot be changed: the rest of the program runs with the same command line arguments.
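This parse-once, read-only-afterwards binding can be sketched in a few lines. To keep the example self-contained it uses only the standard library rather than boost::program_options, and the Options struct, the flag names and both functions are hypothetical; they are not meant to mirror solc's real flags or its internal code.

```cpp
// Sketch only: options are bound once while parsing, then passed around as
// const so later stages cannot rebind them. All names are hypothetical.
#include <cstddef>
#include <string>
#include <vector>

struct Options {
    bool optimize = false;
    std::string target = "default";
};

// The only place where option values are written.
inline Options parseArgs(const std::vector<std::string>& args) {
    Options options;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "--optimize") {
            options.optimize = true;
        } else if (args[i] == "--target" && i + 1 < args.size()) {
            options.target = args[++i];  // consume the flag's value
        }
    }
    return options;
}

// Later stages take a const reference: they can read but not modify.
inline std::string describe(const Options& options) {
    return options.target + (options.optimize ? " (optimized)" : " (unoptimized)");
}
```

A caller would store the result as const Options and hand it to every later stage, which makes the binding time explicit in the types.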

Although you can build the project locally as mentioned previously, the project is also built automatically on CircleCI. This makes use of the CircleCI config file, which allows you to choose for which platforms to build (e.g. ArchLinux, OSX, Ubuntu, etc.).

At run time, the compiler presents a choice of flags to the user for compiling their program; this is one of the main variation points. Information that can be specified at this point includes which target to compile to (e.g. the version of the EVM if compiling to EVM bytecode), whether to optimize, whether to produce the AST in JSON, etc.

It is important not to mistake the definition of the target at two variation points for binding time variability, as these are two different targets: at compile time, you define the target for which you compile the compiler so it can run on a designated machine, whereas at run time, you specify the target on which you want to run your smart contract. Pretty meta, right?

Design choices

Looking at the code base, you can see that the design choices with respect to operating system and compiler are focused on making Solidity work on Windows, Linux and macOS with GCC, Clang and MSVC. However, it looks as if most of it is designed to work on Linux and macOS in combination with GCC and Clang; support for Windows and MSVC is added with flags, and some parts of the code are changed to be compatible with them.
