Gatsby in Debt

Welcome to the third part of our four-part series on Gatsby. The previous essays can be found here: first, second. For a complete understanding we highly recommend reading those first. In this part we evaluate Gatsby’s technical debt and the architectural decisions behind it. We start with the quality assurance systems Gatsby has in place: what a contributor needs to do to get a pull request merged, and what Gatsby does to protect its code quality. Once it is clear how pull requests are merged, we analyze which pull requests have been merged recently, which changes are coming up, and how these changes affect the technical debt and maintainability of Gatsby. With that understanding in place, we get a bit more formal and dive into a quality assessment based on results from SIG and Better Code Hub. In the conclusion we bring all evaluated aspects together. If you want to learn more about these topics, this essay is just for you!

Quality Assessment

Let’s start with one of the most important parts of software quality: bugs per square meter… No, testing, of course. To assess the test quality of Gatsby, we first looked at the analysis by SIG, but this didn’t yield accurate results. Luckily, Gatsby uses Jest [1] for testing, which allowed us to generate coverage reports ourselves.

With Jest, you can use a function to specify which directories should be included in the coverage report [2]. Such a function can follow the project structure and automatically pick up newly added plugins or packages, which makes it much easier to trust that the coverage report spans the whole codebase. Gatsby’s function hasn’t changed for over a year, which is remarkable. This might be a good sign, but we currently have no way to validate that it is correct.
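As an illustration, the sketch below shows how such a function could derive the coverage targets from the package directories and pass them to Jest’s `collectCoverageFrom` option. The `packages/<name>/src` layout, the helper name `coverageGlobs` and the example package list are assumptions made for this example, not a copy of Gatsby’s actual configuration.

```typescript
// Hypothetical helper: build one coverage glob per package directory.
// Assumes every package keeps its sources in packages/<name>/src.
function coverageGlobs(packageNames: string[]): string[] {
  return packageNames.map((name) => `<rootDir>/packages/${name}/src/**/*.js`);
}

// In a real config the names would come from listing the packages directory;
// they are hard-coded here to keep the sketch self-contained.
const packageNames = ["gatsby", "gatsby-cli", "gatsby-plugin-sharp"];

// The relevant part of a Jest configuration object.
const jestConfig = {
  collectCoverage: true,
  collectCoverageFrom: coverageGlobs(packageNames),
};

console.log(jestConfig.collectCoverageFrom);
```

Because the list is computed rather than written out by hand, a new package is covered as soon as its directory exists.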

Diving into the tests themselves, Gatsby appears to have a well-structured and overall high-quality testing policy, with simple unit tests verifying the behavior of single functions, integration tests verifying component interaction, and fully fledged end-to-end tests. The test structure is very clear thanks to elaborate documentation, which allows new developers to maintain the same standard in new code. The testing policy for newly added or changed code is also clearly communicated through the docs [3].

Overall, the codebase has 60% statement, 55% branch, 57% function and 61% line coverage. While these numbers are low compared to the commonly used 80% target, we have to take into account that some plugins have low coverage, which drags down the total. On the other hand, some core components also have low coverage, for instance gatsby-cli and some parts of the gatsby package. It would be nice to see all core packages reach a coverage greater than 80%. Still, for a project with this many community plugins, we find the current coverage quite satisfactory.
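To make the comparison with the 80% target concrete, the sketch below checks the reported numbers against a threshold. The `Coverage` type and the `belowThreshold` helper are hypothetical names introduced for this example; Jest offers a comparable built-in check through its `coverageThreshold` option.

```typescript
// Coverage percentages per metric, as reported in the text.
type Coverage = { statements: number; branches: number; functions: number; lines: number };

const gatsbyCoverage: Coverage = { statements: 60, branches: 55, functions: 57, lines: 61 };

// The four metrics Jest reports, in a fixed order.
const coverageMetrics = ["statements", "branches", "functions", "lines"] as const;

// Return the names of the metrics that fall below the threshold (in percent).
function belowThreshold(coverage: Coverage, threshold: number): string[] {
  return coverageMetrics.filter((metric) => coverage[metric] < threshold);
}

// All four metrics are below the 80% target.
console.log(belowThreshold(gatsbyCoverage, 80));
```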

As part of sharing the entire codebase with the community, anyone is free to contribute and review new changes to the code. There are some caveats. Firstly, changes regarding the internal organization structure are reserved for Gatsby employees. Secondly, every PR needs to be approved by the respective code owner listed in the CODEOWNERS file [4], so that approval always comes from someone relevant to that part of the code. Finally, merging a PR is reserved for the core team and Gatsbot. Gatsbot is an automated bot that merges PRs carrying the label “bot: merge on green” once all pipeline checks have succeeded [5].
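The merge rules above can be summarized as a small predicate. This is only a sketch of the rules as we read them; the `PullRequest` shape and the function name are hypothetical, and combining the code owner approval with the label check is our assumption, not a description of how Gatsbot is implemented.

```typescript
// Hypothetical, simplified view of a pull request.
interface PullRequest {
  labels: string[];
  approvedByCodeOwner: boolean;
  checks: { name: string; passed: boolean }[];
}

const MERGE_LABEL = "bot: merge on green";

// A PR qualifies for automatic merging when it carries the merge label,
// a code owner approved it, and every pipeline check succeeded.
function canAutoMerge(pr: PullRequest): boolean {
  return (
    pr.labels.some((label) => label === MERGE_LABEL) &&
    pr.approvedByCodeOwner &&
    pr.checks.every((check) => check.passed)
  );
}

const examplePr: PullRequest = {
  labels: ["type: documentation", MERGE_LABEL],
  approvedByCodeOwner: true,
  checks: [
    { name: "lint", passed: true },
    { name: "unit tests", passed: true },
  ],
};

console.log(canAutoMerge(examplePr)); // prints true
```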

Figure 1: Pipeline

Gatsby merges over 100 PRs every week. For documentation changes it is important to follow the style and formatting guidelines. For code changes some additional rules apply: they should include tests asserting the implemented behavior and ensuring that fixed bugs can’t reoccur. The pipeline (figure 1) runs checks that depend on each other: some checks require others to have passed. Type checks are part of the pipeline, and the code is automatically reviewed by danger.js for simple mistakes.

One important aspect for Gatsby is the set of guidelines for reviewing a PR in the docs [6]. The main points of these guidelines are:

  • Be kind
  • Use GitHub suggestions
  • Link examples
  • Try to avoid bikeshedding

The point on bikeshedding [7] especially stands out. Bikeshedding is defined as follows: “Futile investment of time and energy in discussion of marginal technical issues”. It has led to many lengthy discussions in other projects, such as those on Go error handling [8][9][10][11]. In Python it even led to the resignation of Guido van Rossum (Python’s creator) [12]. Since so many people are contributing to Gatsby, this rule should really be taken to heart. If you have the time, read the link on bikeshedding; it provides some interesting insights.

System evolution

To see how Gatsby is evolving, we’ll first look at its recent history. For this we analyzed all pull requests merged in the last month, i.e. from 19 February to 19 March. This yielded 430 pull requests spread over many areas of the code. Three hotspots stood out. As expected from the Gatsby community, documentation updates top the list with 75 PRs. Second place, with 65 pull requests, goes to the TypeScript migration [13] of the core codebase, which only started on March 5th! The community was very active as well, with 45 websites added to the showcase. Besides these, support for MDX [14] received significant updates with 21 related pull requests, and there were 17 dependency updates by renovate [bot]. Together this covers roughly half of the pull requests. The other half focused either on one of the many plugins or on core features such as GraphQL, yarn 2 [15] compatibility and the move from hot-reload to FastRefresh.

Gatsby has exciting features ahead. Some of them will have a positive impact on the project’s technical debt, most notably the support for yarn 2 [16] and the migration of the core package from JavaScript to TypeScript [17]. Other changes, like allowing variables in the StaticQuery component, the implementation of the schema customization API [18] and updating gatsby-plugin-sharp to allow on-demand image processing in development, will greatly improve the developer experience. The Gatsby team is also planning to create a desktop app: a GUI on top of the CLI. This will not impact the current codebase, but as it is an additional product, the team has to watch out for introducing a lot of technical debt into the system.

Of the changes mentioned above, we expect two to have the largest impact on Gatsby’s architecture and technical debt: the migration from JavaScript to TypeScript and the introduction of variables to the StaticQuery component.

JavaScript -> TypeScript

For the last four and a half years, the language of choice for Gatsby has been JavaScript, after having been written in CoffeeScript [19] initially [20]. This has been a solid choice so far. However, with the increased size of the project, having untyped code in the core becomes a burden and a source of bugs rather than an opportunity. TypeScript is more commonly used in web development projects nowadays. Because TypeScript is a superset of JavaScript, it works as a drop-in replacement and allows a gradual migration of the codebase [21]. TypeScript also provides static type checking, which allows development environments to better understand the code and offer auto-completion suggestions. This makes it easier for any of the 3000 contributors to understand and thus contribute to the code. Fundamentally, this should result in fewer unexpected test failures, easier refactorings and fewer bugs!
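As a small illustration of what static typing buys, consider the sketch below. The `PageNode` type and the `titleFor` function are made up for this example and are not taken from the Gatsby codebase.

```typescript
// Hypothetical shape of a page node; not taken from the Gatsby source.
interface PageNode {
  path: string;
  title?: string; // optional: not every page has a title
}

// With strict null checks enabled, returning node.title directly would be a
// compile error, so the missing-title case has to be handled explicitly.
function titleFor(node: PageNode): string {
  return node.title !== undefined ? node.title : node.path;
}

console.log(titleFor({ path: "/about", title: "About" })); // prints "About"
console.log(titleFor({ path: "/blog" })); // prints "/blog"

// A misspelled property would go unnoticed in plain JavaScript until runtime;
// the TypeScript compiler rejects the call below before the code ever runs:
// titleFor({ pathname: "/about" });
```

Editors use the same type information to offer auto-completion for `path` and `title`.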

Variables in StaticQuery

In Gatsby, you can get a correctly sized image on the page by querying it from the GraphQL API, with either a page query or a StaticQuery. The StaticQuery component is a React component that allows a developer to specify a GraphQL query and use the result of this query in its child components. The GraphQL query is executed at compile time, so it no longer has to be run when the page loads. Currently, it is not possible to add variables to these queries. This is a problem when multiple components rely on very similar data: the components either need to be duplicated with slight changes, or the data flow needs to be changed. This leads to a suboptimal development experience with Gatsby, which would be solved by allowing variables in a StaticQuery.

Since Gatsby runs the StaticQuery at compile time, it is hard to know upfront which values the variables in the StaticQuery instances will have. While struggling with this issue, Wes Bos (a well-known web developer) posted a tweet that resulted in a solution [22]. The proposed solution would insert an additional step in Gatsby’s build pipeline that compiles files using StaticQuery components into multiple specialized instances without those variables. This additional build step might add some technical debt to Gatsby itself, but it will surely reduce the technical debt in projects created with Gatsby by a lot!
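The idea of compiling one parameterized query into specialized instances can be sketched as follows. This is a deliberately naive illustration based on string substitution; the function name, the handling of `$name` placeholders and the example query are our assumptions, and the actual proposal would work inside Gatsby’s build pipeline rather than on raw strings.

```typescript
// Values that could be substituted into a query at build time.
type Variables = { [name: string]: string | number };

// Naive specialization: replace every $variable with its literal value,
// yielding a query without variables that can run at compile time.
function specializeQuery(query: string, variables: Variables): string {
  return query.replace(/\$(\w+)/g, (match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(variables, name)) {
      throw new Error("No value provided for variable $" + name);
    }
    // JSON.stringify quotes strings and leaves numbers bare.
    return JSON.stringify(variables[name]);
  });
}

// Hypothetical query with two variables.
const imageQuery =
  "{ file(relativePath: { eq: $path }) { childImageSharp { fixed(width: $width) { src } } } }";

// Two components that differ only in image width get one instance each.
const smallLogo = specializeQuery(imageQuery, { path: "logo.png", width: 100 });
const largeLogo = specializeQuery(imageQuery, { path: "logo.png", width: 400 });

console.log(smallLogo);
console.log(largeLogo);
```

Each resulting string is an ordinary query without variables, which is exactly what the current StaticQuery can already execute.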


To assess the maintainability of Gatsby, we analyzed the project using tools from SIG. SIG rates code on many different aspects, including code duplication and module coupling, on a scale from 0.5 to 5.5 stars. These scores indicate the project’s code quality compared to other codebases: to get 5 stars on a metric you need to be in the top 5%, for 4 stars in the top 5-35%, for 3 stars in the top 35-65%, for 2 stars in the top 65-95% and if you score 1 star you are in the bottom 5%.
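The bands above translate directly into a lookup. The sketch below returns whole stars only and treats each upper bound as inclusive; both are simplifications on our side, since the actual scale is continuous from 0.5 to 5.5. The function name `starsFor` is hypothetical.

```typescript
// Map a benchmark position to a whole-star rating, following the bands in the text.
// `topPercent` is the share of benchmark codebases scoring at least as well:
// 3 means "in the top 3%", 97 means "in the bottom 3%".
function starsFor(topPercent: number): number {
  if (topPercent < 0 || topPercent > 100) {
    throw new RangeError("topPercent must be between 0 and 100");
  }
  if (topPercent <= 5) return 5;
  if (topPercent <= 35) return 4;
  if (topPercent <= 65) return 3;
  if (topPercent <= 95) return 2;
  return 1;
}

console.log(starsFor(3)); // prints 5
console.log(starsFor(50)); // prints 3
console.log(starsFor(99)); // prints 1
```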

These are the results for the full Gatsby repository:

Figure 2: Scoring by SIG

Since a big part of the codebase consists of plugins, these numbers say little about the core, so we performed an additional analysis of the gatsby package using Better Code Hub [23] (also created by SIG). Better Code Hub rates projects using ten simple pass/fail guidelines. The great thing about this method is that it allows for some violations in a small percentage of the code. For example, for “Write Short Units of Code”, at most 6.9% of units may contain more than 60 lines, at most 22.3% may contain more than 30 lines of code, etc. [24]. A pass on each guideline corresponds to receiving at least 4 stars on the SIG/TÜViT Evaluation Criteria [25]. Below, some of the results for the Gatsby core package are shown. The vertical bars represent the minimal quality that the codebase should deliver.
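To show how such a tolerance-based guideline works, here is a minimal sketch of the “Write Short Units of Code” check. It only covers the two thresholds quoted above and follows our wording (share of units); the real tool may apply further thresholds and may measure the share differently. All names in the code are hypothetical.

```typescript
// Thresholds quoted in the text: at most 6.9% of units longer than 60 lines,
// at most 22.3% of units longer than 30 lines.
const SHORT_UNIT_LIMITS = [
  { maxLines: 60, maxSharePercent: 6.9 },
  { maxLines: 30, maxSharePercent: 22.3 },
];

// Percentage of units that are longer than `maxLines`.
function shareLongerThan(unitLengths: number[], maxLines: number): number {
  if (unitLengths.length === 0) return 0;
  const tooLong = unitLengths.filter((length) => length > maxLines).length;
  return (tooLong / unitLengths.length) * 100;
}

// The guideline passes only when every threshold is respected.
function passesShortUnits(unitLengths: number[]): boolean {
  return SHORT_UNIT_LIMITS.every(
    (limit) => shareLongerThan(unitLengths, limit.maxLines) <= limit.maxSharePercent
  );
}

// One very long unit among ten: 10% of the units exceed 60 lines.
console.log(passesShortUnits([120, 10, 12, 8, 20, 15, 9, 11, 14, 7])); // prints false

// Ten short units: no threshold is exceeded.
console.log(passesShortUnits([10, 12, 8, 20, 15, 9, 11, 14, 7, 25])); // prints true
```

A single long function in a small module fails the check, while the same function in a large codebase of short units would stay within the tolerance.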

Components most likely affected by future change?

Figure 3: SIG guidelines evaluation

As can be seen above, two guidelines are not met: writing short units of code and writing code once (no duplication). In fact, Gatsby includes a function consisting of 867 lines [26]! As for code duplication, one of the refactoring candidates is the comparators file for the loki database [27], which contains a duplicate block of 56 lines of code. Even though it concerns test code, duplication should be avoided.


We analyzed the software quality of Gatsby from various angles. Gatsby tries to be a very open and inclusive community, allowing many people to add their plugins to the system. This does lead to lower test coverage. However, the tests that exist are actually good. With this huge community comes a lot of communication. Gatsby handles this well by applying many guidelines and policies.

Mainly in its plugins, Gatsby contains too much duplicate code, which makes it prone to unnecessary errors. This could be partly solved by rewriting some plugins to minimize duplicate code. However, due to the number of packages that are not connected to one another, this will be quite a challenge. Gatsby acknowledges some of its technical debt and is working hard on solving problems. This can, for example, be seen in the effort to transition from JavaScript to TypeScript.

Altogether, Gatsby, like all large projects, has some technical debt and has made some tradeoffs. The team is working hard on reducing this technical debt and has some truly exciting changes ahead. We hope to have guided you well through the technical depths of Gatsby and we hope to see you in our next and final essay!