Rollercoaster (tycoon) should not crash

If you’re reading this, you most likely thought at some point as a kid that being a game tester would be an awesome job. Playing games and getting paid for it sounds great. However, why do video game developers rely so heavily on players to test their games?

Writing automated tests is generally a tough job. For the time investment to be reasonable, the software system needs to be well defined, stable and loosely coupled. Video games, unfortunately, tend to be tightly knit systems that can change drastically over the course of their development [1].

This is one of the reasons why OpenRCT2 does not have a lot of automated tests. It depends on players to find bugs and other issues. When automated testing is not really an option, you have to rely on other ways to maintain the quality of your code and prevent the build-up of technical debt, which is what we’ll be covering in this essay.

Software quality process overview

OpenRCT2 uses continuous integration (CI) checks, mandatory code review and bug reports to ensure correct functionality of code changes. The CI builds for Windows, macOS, Linux and Android. The Windows and Linux builds also run the tests. Afterwards, the build artifacts are uploaded to the CI and the website.

CI checks

Besides code functionality, there are also rules for commit messages [2] and code formatting [3]. The code formatting also has its own CI check. Bugs are tracked through GitHub issues, where players can submit the bugs they find. Windows crash logs are also automatically uploaded here with [4]

Finally, the in-game replay system can be used for creating tests. It allows for commands to be recorded and saved in a replay file. Then later, a replay can be played to execute the same set of commands.
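The record-and-replay idea can be sketched as follows. This is a minimal illustration in Python, not OpenRCT2's actual replay format or API: the `Command`, `Recorder`, `load_replay` and `replay` names, the tick-based ordering and the JSON file layout are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Command:
    tick: int   # game tick at which the command was issued (assumed field)
    name: str   # hypothetical command name, e.g. "add_cash"
    args: dict  # keyword arguments passed to the command handler


class Recorder:
    """Collects commands while the game runs and saves them to a file."""

    def __init__(self):
        self.commands = []

    def record(self, tick, name, **args):
        self.commands.append(Command(tick, name, args))

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([asdict(c) for c in self.commands], f)


def load_replay(path):
    """Read a replay file back into a list of Command objects."""
    with open(path, encoding="utf-8") as f:
        return [Command(**raw) for raw in json.load(f)]


def replay(commands, state, handlers):
    """Apply the recorded commands in tick order to a given game state."""
    for cmd in sorted(commands, key=lambda c: c.tick):
        handlers[cmd.name](state, **cmd.args)
    return state
```

A test built on such a replay would start from a known state, play the file back, and assert on the resulting state, so that the same recorded session keeps guarding against regressions.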

Importance of code quality, testing, and technical debt

In this section we will look at how OpenRCT2’s issues, pull requests, and documentation indicate the importance of code quality, testing, and technical debt in the project.

OpenRCT2 has a page listing planned projects, many of which aim to decrease technical debt or to improve code quality and flexibility. Prime examples are the new save format and the plug-in API.

Another indicator of the importance of code quality in the project is the priority given to refactoring. Currently there are over 30 open issues regarding refactoring, indicating that refactoring is a major concern for the project.

Furthermore, the comment sections of the project’s issues and pull requests give additional indication of the importance of code quality and technical debt. For example, let’s look at the pull request our team offered to OpenRCT2 earlier this month:

This contribution concerns the addition of a command to the in-game console. In the comments of this pull request, a core developer mentions a potential problem in multiplayer games:

Another core developer comments that this command could perhaps be offered as a plug-in for the new upcoming plug-in system:

Finally, IntelOrca (the main developer) responds:

Although this example is no hard evidence of the importance of code quality and technical debt of OpenRCT2, it does offer some insight as to how the core developers value these aspects and discuss them. To illustrate this point even further: as of March 21st 2020, there are 5180 closed pull requests, of which 3066 have at least one comment, and 1238 have more than 5 comments.
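To put these counts in proportion, they can be expressed as shares of all closed pull requests. The snippet below is plain arithmetic on the numbers quoted above; the `share` helper is just a name chosen for this example.

```python
closed = 5180          # closed pull requests as of March 21st 2020
with_comment = 3066    # at least one comment
more_than_five = 1238  # more than 5 comments


def share(part, total):
    """Return part/total as a percentage, rounded to one decimal."""
    return round(100 * part / total, 1)


print(share(with_comment, closed))    # 59.2
print(share(more_than_five, closed))  # 23.9
```

So roughly six in ten closed pull requests were discussed at least once, and almost a quarter drew a longer discussion.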

Assessment of test process quality

The OpenRCT2 test processes consist of CI checks, unit tests, and manual testing by developers and players. Numerous questions about these test processes arise. Is the quality sufficient? Are the tests adequate?

To start, we will look at the test to code ratio to get a general feel for the adequacy of the unit tests. The number of lines in .cpp files in the project’s /src directory equals 443938, while this number in the /test directory equals 28349. This yields a test to code ratio of 6.39%. However, the ride component contains many lines of code which should be considered data, and not functional code. This is due to many files in the component containing data on ride properties.

Excluding ride results in a test to code ratio of 13.87%. As ride consists partly of data and partly of functional code, this suggests that the real test to code ratio is between these two boundaries.
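A measurement of this kind can be reproduced with a short script. The sketch below is one possible way to do it, not necessarily how the figures above were obtained: the `count_lines` and `ratio_percent` helpers are hypothetical, and it assumes that every line of a .cpp file counts (including blank lines and comments) and that a component such as ride lives in a directory of that name.

```python
from pathlib import Path


def count_lines(root, suffix=".cpp", exclude=()):
    """Count lines in all files under root that end in suffix.

    Files inside a directory whose name appears in `exclude` are skipped.
    """
    total = 0
    for path in Path(root).rglob(f"*{suffix}"):
        directories = path.relative_to(root).parts[:-1]
        if any(name in exclude for name in directories):
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            total += sum(1 for _ in f)
    return total


def ratio_percent(test_lines, code_lines):
    """Return the test to code ratio as a percentage."""
    return 100 * test_lines / code_lines


# Figures quoted in the text above.
print(f"{ratio_percent(28349, 443938):.2f}%")  # 6.39%
```

Under these assumptions, the ratio without the ride component would be computed from `count_lines("src", exclude=("ride",))` instead of the full count.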

Should OpenRCT2 spend more resources on testing? It might be wiser to first implement the plug-in system. This would allow continued development of new features and modifications that do not have to be included in the base game. The benefit would be that these functions can be tested separately, which limits the amount of base-game functionality that needs to be tested.

Furthermore, one could argue that small bugs are not a huge problem in a game like OpenRCT2. Large bugs that ruin the end-user’s experience, such as the game crashing every hour, should get very high priority. However, once the game is playable and enjoyable by the vast majority of players, occasional small bugs might not have a large negative effect on their experience.

Our recommendation is that the developers of OpenRCT2 keep listening to the community’s wishes, and actively request their thoughts and opinions. If many complaints are heard about an aspect of the game, efforts can be concentrated on fixing bugs and improving tests of those components. Additionally, we applaud the goal of developing features such as the plug-in system.

Assessment of code quality and maintainability

Although no single metric can definitively measure the quality and maintainability of code, several metrics together can give an overall indication.

For clarification, unit interfacing refers to the number of unit (method) parameters. Also, a module refers to a file and a component is a collection of modules (files).

The OpenRCT2 project is fairly large, with many architectural components that were discussed in our previous essay. Therefore we have chosen to assess the overall system and the five components that are most likely going to be affected by future changes based on their volume (lines of code) and the OpenRCT2 roadmap. The corresponding metrics were obtained using Sigrid and visualized in the following diagram.

Quality Metrics for OpenRCT2 and its components

Based on the results of the measurements, we suggest some refactoring candidates. Using Sigrid, we can see dozens of potential refactoring candidates, but we will only list a couple to support the findings.

It can be seen that the project and its components score low on duplication, meaning that a lot of duplicated code is present. A good refactoring candidate is the following:

  • Remove duplicated code on lines 360-637 in ImageImporter.cpp of the drawing component and lines 710-987 in CmdlineSprite.cpp of the engine component.

Even though the score for duplication is low, it is important to note that the metric is influenced by the presence of data files, such as the TrackData.cpp in the ride component. It contains information about which track elements are available for various rides. It contains arrays that are filled with mostly -1’s, which are then detected as duplicated code by Sigrid, even though this is not the case.

Furthermore, unit size seems to be a problem throughout the entire project. For instance, several methods with hundreds of lines of code can be found in the ride component. The vehicle_update_track_motion_mini_golf method in Vehicle.cpp contains 661 lines of code, which makes it difficult to understand. It is important to note that OpenRCT2 is a re-implementation of the original RCT2 game: some methods like this one were added during that re-implementation but have not yet been refactored.

As can be seen from the diagram, OpenRCT2 and its components score very low on unit complexity and unit interfacing. This probably explains the lack of tests throughout the project, since it is difficult to write tests for complex methods. We suggest the refactoring of the following two methods:

  • The vehicle_update_track_motion_mini_golf method in Vehicle.cpp of the ride component, since it has a McCabe complexity [5] of 111. In contrast, the recommended maximum McCabe complexity is 15 [6].
  • The peep_pathfind_heuristic_search method in GuestPathfinding.cpp of the peep component, since it has 10 parameters and a McCabe complexity of 95.
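To make the complexity figures more tangible: McCabe's metric essentially counts the number of decision points in a unit, plus one. The sketch below approximates this for Python source using the standard `ast` module. It is only an illustration of the idea, not the measurement Sigrid performs on the C++ code, and both the choice of node types and the `classify` example function are assumptions made for this example.

```python
import ast

# Node types that each add one decision point (an approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two short-circuit decisions.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical function, used only to demonstrate the count.
example = """
def classify(speed, on_track):
    if speed < 0:
        return "reversing"
    if speed == 0 or not on_track:
        return "stopped"
    for limit in (10, 20):
        if speed < limit:
            return "slow"
    return "fast"
"""

print(cyclomatic_complexity(example))  # 6
```

Even this small function already reaches a complexity of 6, which gives a sense of how much branching a value of 95 or 111 represents and why such methods are hard to cover with tests.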

On module coupling, OpenRCT2 and its components score a little higher than on other metrics, but the score is still low. Several modules have hundreds of dependencies. We suggest refactoring those modules, including:

  • The BolligerMabillardTrack.cpp file in the ride component, since it has 359 incoming dependencies from other modules within ride.
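The notion of incoming dependencies can be illustrated with a small fan-in count over a dependency map. This is a simplified sketch: the `incoming_dependencies` helper and the file names in the map are hypothetical, and it counts each depending module once, which may differ from how Sigrid counts dependencies.

```python
from collections import Counter


def incoming_dependencies(depends_on):
    """Count, for each module, how many other modules depend on it."""
    fan_in = Counter()
    for module, targets in depends_on.items():
        for target in set(targets):
            if target != module:  # ignore self-references
                fan_in[target] += 1
    return fan_in


# Hypothetical dependency map, not measured from OpenRCT2.
depends_on = {
    "A.cpp": ["Track.cpp", "Vehicle.cpp"],
    "B.cpp": ["Track.cpp"],
    "C.cpp": ["Track.cpp", "A.cpp"],
}

print(incoming_dependencies(depends_on).most_common(1))  # [('Track.cpp', 3)]
```

A module at the top of such a ranking is one where any change can ripple out to many dependents, which is why it is a natural refactoring candidate.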

Another suggestion is to increase the use of documentation within the code. Very few units and modules contain a description of their concern. This makes it harder to contribute, especially for new developers.

A more general suggestion is refactoring components to reduce interdependencies of components. The components of the OpenRCT2 project have a lot of interdependencies. This interdependency has probably been inherited from the original implementation of RCT2, which will briefly be discussed in the last section.

Coding activity

The most frequently changed components in the past two weeks are ride, windows, ui, actions, object, network and engine.

Recent coding activity is seen in the following components: ride, windows, world, actions, scenario, rct1, rct2, peep, interface, ui, platform, paint, object, audio, drawing, management, network, title, localisation.

Due to active work on a new standard for references to screen space, there have been frequent changes to windows, rct1, rct2, action and ride. More activity is expected in these components as work on this new standard progresses. Additionally, we expect changes will need to be made to interface.

The following is a subset of the system’s roadmap and its mapping onto architectural components. It explains the biggest features and their dependency on components.

  • New save format: currently, OpenRCT2 uses the save format SV6, the native save format used by RCT2. SV6 imposes many limits such as limits on the number of rides, sprites, or map size. Hence, a new save format is being developed to allow limits to be removed or increased. The components involved are rct1, rct2, ride, action, world, scenario, config, windows and the cheats file.

  • Scripting/plug-in API: currently, features have to be built directly into the game, which can clutter the source code. API scripting decouples extra features from the core architecture, so it is not influenced by additional features other people implement. Implementation of this feature will require changes to many components, as the feature is so fundamental.

Technical debt

A function in the ride ratings module

OpenRCT2 is a reverse-engineered version of a game from 2002, a game from an era where developers did not send out day-one patches or monthly updates over the internet. Therefore, OpenRCT2 inherited some technical debt from the original game, in the form of bugs and issues but also code architecture. There still exist some functions that are decompiled versions of assembly code, the functionality of which can also be unclear.

There is also debt in the form of a somewhat limited amount of automated testing. The continuous integration catches code that does not build, but other errors can still slip through. For those, it takes a player running into the bug and reporting it before the developers even know it exists.

The issues page on GitHub currently contains 1519 open issues, the oldest of which was opened on the first of September 2014. This can also be considered a form of debt, as the issues are filled with topics that are no longer relevant, or that are buried so deep they will likely never get a resolution. It is important to mention that a large fraction of these issues consists of feature requests and (duplicate) backtrace crash reports. Hence, the number of open issues regarding bugs is much smaller.

Despite these negative points, the team is well aware of this debt and is working on removing it. Evidence of this can be found in the repository indicators, as mentioned in one of the preceding sections.

In conclusion, there is work to be done in terms of relieving technical debt, but the current efforts going into the new save format and refactoring the references to screen space show that the team is dedicated to making and maintaining a quality architecture.

  1. Murphy-Hill, E., Zimmermann, T., & Nagappan, N. (2014). Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development? Proceedings of the 36th International Conference on Software Engineering - ICSE 2014. doi: 10.1145/2568225.2568226 

  2. Commit message rules 

  3. Clang coding style rules 

  4. Pull request introducing 

  5. T. J. McCabe, “A Complexity Measure,” in IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308-320, Dec. 1976. 

  6. Arthur H. Watson; Thomas J. McCabe (1996). “Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric”. NIST Special Publication 500-235