Welcome to the third part of our four-part series on Gatsby. The previous essays can be found here: first, second. For a complete understanding, we highly recommend reading those first. In this part, we evaluate Gatsby’s technical debt and the architectural decisions that were made. To cover all of this, we first look at the quality assurance systems Gatsby has in place. We explain what a contributor needs to do to get a pull request merged and what Gatsby does to protect its code quality. Once it is clear how pull requests are merged, we analyze which pull requests have been merged recently, what changes are coming up, and how these changes impact the technical debt and maintainability of Gatsby. This should give a clear picture of what is going on inside Gatsby. After that, we get a bit more formal and dive into absolute quality assessment, analyzing results from SIG and Better Code Hub. In the conclusion, we bring all evaluated aspects together. If you want to learn more about these topics, this essay is just for you!
Let’s start with one of the most important parts of software quality: bugs per square meter… No, testing, of course. To assess the test quality of Gatsby, we first looked at the analysis by SIG, but this didn’t yield accurate results. Luckily, Gatsby uses Jest[1] for testing, which allowed us to generate coverage reports ourselves.
With Jest, you can use a function to specify which directories should be included in the coverage report[2]. Such a function can follow the project structure and automatically pick up newly added plugins or packages, which makes it much easier to judge whether the coverage report covers the right code. Gatsby’s function hasn’t changed for over a year, which is remarkable. This might be a good sign, but there is currently no way to validate its correctness.
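To make this concrete, here is a minimal sketch of what such a function could look like. It is not Gatsby’s actual configuration: the helper name `coverageGlobs`, the `packages/<name>/src` layout and the excluded patterns are assumptions for illustration. The only Jest feature relied on is the `collectCoverageFrom` option, which takes an array of glob patterns.

```typescript
// Hypothetical helper: turns package directory names into coverage globs.
// In a real setup the names would be read from disk; here they are passed in
// so that the sketch stays self-contained.
function coverageGlobs(packageDirs: string[]): string[] {
  // Assumed layout: every package keeps its sources in packages/<name>/src.
  const included = packageDirs.map(
    (dir) => "packages/" + dir + "/src/**/*.js"
  );
  // Negated patterns keep test files and dependencies out of the report.
  const excluded = ["!**/__tests__/**", "!**/node_modules/**"];
  return included.concat(excluded);
}

// Jest-style config fragment using the generated patterns.
const jestConfig = {
  collectCoverageFrom: coverageGlobs(["gatsby", "gatsby-cli"]),
};
```

Because the patterns are derived from the list of packages, a newly added package would be picked up without anyone touching the configuration.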
Diving into the tests themselves, Gatsby appears to have a well-structured and overall high-quality testing policy, with simple unit tests verifying the behavior of single functions, integration tests verifying component interaction, and fully fledged end-to-end tests. The test structure is very clear thanks to elaborate documentation, which allows new developers to maintain the same standard in new code. The testing policy for newly added or changed code is also clearly communicated through the docs[3].
Overall, the codebase has 60% statement, 55% branch, 57% function and 61% line coverage. While these numbers are low compared to the commonly used 80% target, we have to take into account the low coverage of some plugins. On the other hand, some core components also seem to have low coverage, for example gatsby-cli and some parts of the gatsby package. From a coverage point of view, it would be nice to see all core packages reach a coverage greater than 80%. Currently, coverage is already quite satisfactory in our opinion.
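As a small illustration of the comparison against the 80% mark, the sketch below checks the four reported percentages against a threshold. The type and function names are our own and purely illustrative.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";
type CoverageSummary = Record<Metric, number>;

// Percentages reported above for the whole codebase.
const reported: CoverageSummary = {
  statements: 60,
  branches: 55,
  functions: 57,
  lines: 61,
};

// Returns the metrics whose coverage lies below the given threshold.
function metricsBelow(summary: CoverageSummary, threshold: number): Metric[] {
  return (Object.keys(summary) as Metric[]).filter(
    (metric) => summary[metric] < threshold
  );
}

// All four metrics fall short of the 80% target.
const shortfalls = metricsBelow(reported, 80);
```

Jest can enforce such a limit during the test run through its `coverageThreshold` option; whether Gatsby uses that option is not something we verified.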
Because the entire codebase is shared with the community, anyone is free to contribute and to review new changes to the code. There are some caveats. Firstly, changes regarding the internal organization structure are reserved for Gatsby employees. Secondly, every PR needs to be approved by the respective code owner in the CODEOWNERS file[4], so that approval comes from someone relevant to that part of the code. Finally, merging a PR is reserved for the core team and Gatsbot. Gatsbot is an automated bot that merges PRs with the label “bot: merge on green” once all pipeline checks have succeeded[5].
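The merge rules described above can be summarized in a few lines of code. This is a simplified sketch under our own assumptions, not Gatsbot’s implementation: the `PullRequest` shape and the `canAutoMerge` function are hypothetical, and combining the code owner approval with the label and the green pipeline in one check is our reading of the rules.

```typescript
// Hypothetical, simplified view of a pull request.
interface PullRequest {
  labels: string[];
  checks: { name: string; passed: boolean }[];
  approvedBy: string[]; // reviewers who approved the PR
  codeOwners: string[]; // owners listed in CODEOWNERS for the touched paths
}

const MERGE_LABEL = "bot: merge on green";

function canAutoMerge(pr: PullRequest): boolean {
  const hasLabel = pr.labels.indexOf(MERGE_LABEL) !== -1;
  const allGreen = pr.checks.every((check) => check.passed);
  // Assumption: approval by at least one relevant code owner is enough.
  const ownerApproved = pr.codeOwners.some(
    (owner) => pr.approvedBy.indexOf(owner) !== -1
  );
  return hasLabel && allGreen && ownerApproved;
}
```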
Figure 1: Pipeline
Gatsby merges over 100 PRs every week. For documentation changes, it is important to follow Gatsby’s style and formatting guidelines. For code changes, some other rules apply: they should include tests that assert the implemented behavior and ensure that fixed bugs can’t reoccur. The pipeline (figure 1) runs checks that depend on each other: some checks require others to have passed first. Type checks are part of the pipeline, and the code is automatically reviewed by danger.js for simple mistakes.
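The idea of checks depending on each other can be sketched as follows. The check names and the `runPipeline` function are hypothetical and do not mirror Gatsby’s CI configuration; the sketch only shows that a check is skipped when one of its prerequisites did not pass.

```typescript
interface Check {
  name: string;
  dependsOn: string[]; // names of checks that must pass first
  run: () => boolean; // returns true when the check succeeds
}

type Status = "passed" | "failed" | "skipped";

// Runs the checks in the given order. The list is assumed to name every
// dependency before the checks that depend on it.
function runPipeline(checks: Check[]): Record<string, Status> {
  const status: Record<string, Status> = {};
  for (const check of checks) {
    const ready = check.dependsOn.every((dep) => status[dep] === "passed");
    if (!ready) {
      status[check.name] = "skipped";
    } else if (check.run()) {
      status[check.name] = "passed";
    } else {
      status[check.name] = "failed";
    }
  }
  return status;
}
```

With a failing type check, a test job that depends on it would be reported as skipped rather than run, which saves pipeline time on PRs that cannot be merged anyway.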
One important aspect of contributing to Gatsby is the set of guidelines for reviewing a PR in the docs[6]. The main points of these guidelines are:
- Be kind
- Use GitHub suggestions
- Link examples
- Try to avoid bikeshedding
The point on bikeshedding[7] especially stands out. Bikeshedding is defined as follows: “Futile investment of time and energy in discussion of marginal technical issues”. It has led to many lengthy discussions, such as those on Go error handling[8][9][10][11]. In Python, it even led to the resignation of Guido van Rossum (Python’s creator)[12]. Since so many people are contributing to Gatsby, this rule should really be taken to heart. If you have the time, read the link on bikeshedding; it provides some interesting insights.
To see how Gatsby is evolving, we first look at its recent history. For this, we analyzed all pull requests merged in the last month, i.e. from 19 February to 19 March. This yielded 430 pull requests spread over many areas of the code. Three hotspots stood out. As expected from the Gatsby community, documentation updates top the list with 75 PRs. Second place, with 65 pull requests, goes to the TypeScript migration[13] of the core codebase, which only started on 5 March! The community was very active as well, with 45 websites added to the showcase. Besides these, support for MDX[14] received significant updates with 21 related pull requests, and there were 17 dependency updates by renovate[bot]. Together, this covers roughly half of the pull requests. The other half focused either on one of the many plugins or, sometimes, on core features such as GraphQL, yarn 2[15] compatibility and the move from hot-reload to FastRefresh.
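The “roughly half” can be verified with a quick calculation over the counts mentioned above. The category keys are just labels we chose for this sketch.

```typescript
const totalMerged = 430;

// Pull request counts per category, as listed in the text.
const categories: Record<string, number> = {
  documentation: 75,
  typescriptMigration: 65,
  showcaseSites: 45,
  mdx: 21,
  dependencyUpdates: 17,
};

// Sum of all listed categories: 75 + 65 + 45 + 21 + 17 = 223.
const covered = Object.keys(categories).reduce(
  (sum, key) => sum + categories[key],
  0
);

// Share of all merged pull requests, as a percentage with one decimal.
const sharePercent = ((covered / totalMerged) * 100).toFixed(1); // "51.9"
```

So the five categories account for 223 of the 430 pull requests, or about 52%.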
Variables in StaticQuery
In Gatsby, you can get a correctly sized image on the page by querying it from the GraphQL API, with either a page query or a StaticQuery. The StaticQuery component is a React component that lets a developer specify a GraphQL query and use the result of this query in its child components. The GraphQL query is executed at compile time, so it no longer has to run when the page loads. Currently, it is not possible to add variables to these queries. This is a problem when multiple components rely on very similar data: either the components need to be duplicated with slight changes, or the data flow needs to be changed. This leads to a suboptimal development experience with Gatsby, which would be solved by allowing variables in a StaticQuery.
Since Gatsby runs the StaticQuery at compile time, it is hard to know upfront which values the variables in StaticQuery instances will have. While struggling with this issue, Wes Bos (a well-known web developer) posted a tweet that resulted in a proposed solution[22]. The proposed solution would insert an additional step in Gatsby’s build pipeline that compiles files using StaticQuery components into multiple specialized instances without those variables. This additional step in the build process might add some technical debt to Gatsby itself, but it would considerably reduce the technical debt in projects created with Gatsby!
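To give a feel for what “specialized instances without those variables” could mean, here is a deliberately naive sketch. It is not the proposed implementation and not part of Gatsby: the `specializeQuery` function, the example query and the call-site values are all hypothetical, and a real build step would work on the parsed query rather than on plain strings.

```typescript
// Variable values found at one (hypothetical) call site of a query.
type Variables = Record<string, string | number>;

// Naive specialization: inline every variable value into the query text,
// producing a query that no longer needs variables at run time.
// Assumes no variable name is a prefix of another one.
function specializeQuery(template: string, variables: Variables): string {
  return Object.keys(variables).reduce(
    (query, name) =>
      query.split("$" + name).join(JSON.stringify(variables[name])),
    template
  );
}

const template = "query { file(name: { eq: $name }) { publicURL } }";

// One specialized query per call site, generated ahead of time.
const callSites: Variables[] = [{ name: "logo" }, { name: "banner" }];
const specialized = callSites.map((vars) => specializeQuery(template, vars));
```

Each call site ends up with its own variable-free query, which is why the values must be known at build time for this approach to work.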
To assess the maintainability of Gatsby, we analyzed the project using tools from SIG. SIG rates code on many different aspects, including code duplication and module coupling, on a scale from 0.5 to 5.5 stars. These scores indicate the project’s code quality compared to other codebases: to get 5 stars on a metric you need to be in the top 5%, for 4 stars in the top 5-35%, for 3 stars in the top 35-65%, for 2 stars in the top 65-95% and if you score 1 star you are in the bottom 5%.
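The percentile bands above translate into a simple lookup. This sketch only covers the whole-star bands as we described them; how SIG treats values exactly on a boundary and how it derives the fractional scores between 0.5 and 5.5 are not covered, and the function name is our own.

```typescript
// `topPercent` is the position of a codebase in the benchmark:
// 0 means the very best, 100 the very worst.
function starsForPercentile(topPercent: number): number {
  if (topPercent < 0 || topPercent > 100) {
    throw new RangeError("topPercent must be between 0 and 100");
  }
  if (topPercent <= 5) return 5; // top 5%
  if (topPercent <= 35) return 4; // top 5-35%
  if (topPercent <= 65) return 3; // top 35-65%
  if (topPercent <= 95) return 2; // top 65-95%
  return 1; // bottom 5%
}
```

A codebase in the top 20% of the benchmark would thus receive 4 stars, while an average codebase lands on 3 stars.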
These are the results for the full Gatsby repository:
Figure 2: Scoring by SIG
Since a big part of the codebase consists of plugins, these numbers don’t say too much, so we performed an additional analysis of the gatsby package using Better Code Hub[23] (also created by SIG). Better Code Hub rates projects using ten simple pass/fail guidelines. The great thing about its method is that it allows for some violations in a small percentage of the code. For example, for “Write Short Units of Code”, at most 6.9% of units may contain more than 60 lines, at most 22.3% may contain more than 30 lines of code, etc.[24]. A pass on each guideline corresponds to receiving at least 4 stars on the SIG/TÜViT Evaluation Criteria[25]. Below, some of the results for the Gatsby core package are shown. The vertical bars represent the minimal quality that the codebase needs to reach to pass a guideline.
Figure 3: SIG guidelines evaluation
As can be seen above, two guidelines are not met: writing short units of code and writing code once (no duplication). In fact, Gatsby includes a function consisting of 867 lines[26]! As for code duplication, one of the refactoring candidates is the comparators file for the loki database[27], which contains a duplicate block of 56 lines of code. Even though it concerns test code, duplication should be avoided.
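The tolerance built into the “Write Short Units of Code” guideline can be sketched as a check over unit lengths. Only the two limits quoted earlier (6.9% above 60 lines, 22.3% above 30 lines) are included; the guideline has further limits that we leave out, and the function below follows our wording of “percentage of units”, so it is an approximation rather than Better Code Hub’s actual algorithm.

```typescript
// The two limits quoted in the text; the real guideline defines more.
const limits = [
  { moreThanLines: 60, maxPercent: 6.9 },
  { moreThanLines: 30, maxPercent: 22.3 },
];

// `unitLengths` holds the number of lines of every unit (function or method).
function passesShortUnits(unitLengths: number[]): boolean {
  if (unitLengths.length === 0) return true;
  return limits.every(({ moreThanLines, maxPercent }) => {
    const over = unitLengths.filter((len) => len > moreThanLines).length;
    return (over / unitLengths.length) * 100 <= maxPercent;
  });
}
```

A single 867-line function does not fail the guideline by itself; it fails once too large a share of the code sits in long units.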
We analyzed the software quality of Gatsby from various angles. Gatsby tries to be a very open and inclusive community, allowing many people to add their plugins to the system. This does lower the test coverage. However, the tests that exist are actually good. A huge community also brings a lot of communication, which Gatsby handles well by applying many guidelines and policies.
Altogether, Gatsby, like all large projects, has some technical debt and has made some tradeoffs. The team is working hard on reducing this technical debt and has some truly exciting changes ahead. We hope to have guided you well through the technical depths of Gatsby and we hope to see you in our next and final essay!