Research shows that work dependencies – i.e., engineering decisions constraining other engineering decisions – are a fundamental challenge in software development organizations, especially those that are geographically distributed.1
Modular design, the traditional technique intended to reduce interdependencies among components of a system (and between the teams developing them), has certain limitations in the context of software development. The theory of modular software design rests on the assumption that reducing the technical interdependencies among modules also reduces task interdependencies, thereby allowing teams to work in parallel on different parts of the system without needing to communicate among themselves.2
These assumptions are problematic, though. Research by Garcia et al. suggests that existing modularization approaches consider only a subset of all technical dependencies.3 Additionally, minimal communication between teams causes variability in how projects evolve and are subsequently integrated. It is also common for software systems to develop or reveal their requirements over time, which challenges the deterministic assumptions of the modularity approach.4 5
In the paper by Cataldo et al.,6 the authors argue that traditional software modularization is broken and that past work has not taken into consideration both the technical and the work components of dependencies. They propose a new framework for assessing the impact of dependencies on software development productivity, called “Socio-Technical Congruence”, and show that development time is reduced significantly when developers’ coordination patterns are congruent with their coordination needs.
They do this by conceptualizing a product development project as a socio-technical system, where the technical and the social components need to be aligned in order to have a successful project. The concept of congruence has two components: first, the coordination needs, which are determined by the technical dimension of the socio-technical system, and second, the coordination activities carried out by the organization, which represent the social dimension. If the needs match the activities being carried out, we have congruence!
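To make the idea of "needs matching activities" concrete, here is a minimal sketch in Python. It is our own simplification, not the authors' implementation: we assume two developers need to coordinate when they touched the same file or two files that depend on each other, and we define congruence as the fraction of those needed pairs that actually coordinated. All developer and file names are hypothetical.

```python
from itertools import combinations


def coordination_needs(files_by_dev, file_deps):
    """Return the set of developer pairs that need to coordinate.

    files_by_dev: dict mapping developer -> set of files they modified
    file_deps: iterable of (file_a, file_b) pairs that depend on each other
    Assumption: dependencies are symmetric, and touching the same file
    also creates a coordination need.
    """
    deps = {frozenset(pair) for pair in file_deps}
    needs = set()
    for dev_a, dev_b in combinations(sorted(files_by_dev), 2):
        linked = any(
            fa == fb or frozenset((fa, fb)) in deps
            for fa in files_by_dev[dev_a]
            for fb in files_by_dev[dev_b]
        )
        if linked:
            needs.add(frozenset((dev_a, dev_b)))
    return needs


def congruence(needs, actual):
    """Fraction of needed developer pairs that actually coordinated."""
    actual = {frozenset(pair) for pair in actual}
    if not needs:
        return 1.0  # assumption: with no needs, nothing is left unmet
    return len(needs & actual) / len(needs)


# Hypothetical data: three developers, three files, two dependencies.
files_by_dev = {"alice": {"a.py"}, "bob": {"b.py"}, "carol": {"c.py"}}
file_deps = [("a.py", "b.py"), ("b.py", "c.py")]
needs = coordination_needs(files_by_dev, file_deps)

# Only alice and bob talked to each other, so one of two needs is met.
print(congruence(needs, [("alice", "bob")]))  # 0.5
```

A value of 1.0 means every coordination need was met by an actual interaction; lower values point to developers who should have been talking but were not.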
In order to compute socio-technical congruence, we need deep insight into the repository: the syntactic dependencies among files, the files that each developer modified, and the coordination instances between developers. In the interest of time, we substitute each of these with our own alternatives, which we discuss in more detail below. The substitutes are:
- spaCy’s architecture with modules INSTEAD OF syntactic dependencies among files
- Issues/PRs related to a module INSTEAD OF files that each developer modified
- interactions on comment threads of issues related to a module INSTEAD OF coordination instances between developers
We can get all this information quite easily from the spaCy website (architecture) and by using the GitHub API.
We do not intend to be as exact as the authors in computing the socio-technical congruence since that requires a rigorous statistical analysis. This seems to be beyond the scope of this blog post. Instead, we will do a more qualitative analysis of the dependencies.
We first identify the tightly and loosely coupled components of spaCy’s architecture, after which we present the communication network of the developers in raising issues / pull requests and interacting on their threads via comments. Finally, we contrast the two and comment on the result in relation to Conway’s Law2. One would expect the more loosely coupled components of the architecture to have fewer developers in common, whereas a strong network of interactions is expected between tightly coupled components.
Components of spaCy’s architecture
The relationships between modules as seen in the figure7 act as a fair proxy for syntactic dependencies between files, preserving the same folder/file structure as the actual code while also abstracting each file behind modules. This decreases resolution, but allows for a visualization that is easier to understand.
Loosely Coupled Components:
- Text Categorizer
- Documentation (which is not shown here as part of the architecture)
Tightly Coupled Components:
Communication in spaCy
To analyze the communication network in spaCy, we wrote a few Python scripts8 that use the GitHub API to fetch users who interacted with each other via comments on issue and pull request threads in the last two years. We filtered issues for labels matching:
feat / doc,
feat / ner,
feat / tagger,
feat / parser,
feat / textcat,
feat / tokenizer, and
docs.
This was done in order to compare the collaboration of users in different parts of the architecture. The issue labels and their respective colors are shown in the legend of the network. A longer time span might lead to discrepancies because labels and their usage change over time, which is why we chose to analyze two years.
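The fetching step can be sketched roughly as follows. This is not the script we used, only an illustration built on Python's standard library and the public GitHub REST API. The helper names are hypothetical, counting the thread author as a participant is an assumption, and only the first 100 comments of each thread are read. Note that the `since` parameter filters on when a thread was last updated, and that the issues endpoint returns pull requests as well.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"
REPO = "explosion/spaCy"
LABELS = ["feat / doc", "feat / ner", "feat / tagger", "feat / parser",
          "feat / textcat", "feat / tokenizer", "docs"]


def issues_url(label, since, page=1):
    """Build the URL listing issues and PRs with `label` updated after `since`."""
    query = urllib.parse.urlencode({
        "labels": label, "state": "all", "since": since,
        "per_page": 100, "page": page,
    })
    return f"{API}/repos/{REPO}/issues?{query}"


def get_json(url, token=None):
    """Fetch a URL and decode the JSON body; `token` is an optional API token."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def participants(issue, comments):
    """Logins of the thread author (our assumption) and of every commenter."""
    users = set()
    if issue.get("user"):
        users.add(issue["user"]["login"])
    users.update(c["user"]["login"] for c in comments if c.get("user"))
    return users


def fetch_participants(label, since, token=None):
    """Yield (thread number, participants) for every thread with `label`."""
    page = 1
    while True:
        issues = get_json(issues_url(label, since, page), token)
        if not issues:
            break
        for issue in issues:
            # Simplification: only the first page of comments is read.
            comments = get_json(issue["comments_url"] + "?per_page=100", token)
            yield issue["number"], participants(issue, comments)
        page += 1
```

Calling `fetch_participants("feat / ner", "2018-03-01T00:00:00Z")` (the date is only an example) would then iterate over the matching threads. Unauthenticated requests are rate limited, so passing a token is advisable for anything beyond a quick test.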
Getting to the visualization itself, each yellow node represents a user and every other node represents a module from the architecture, as shown in the legend. Connections are made only between users and modules. A connection signifies that the user commented on an issue or a PR thread related to that particular module. The thickness of the connection indicates how many comment threads the user was involved in. The visualization was developed using Flourish. Click here for an interactive version.
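The edge weights described above can be derived with a few lines of Python. The sketch below is illustrative only: the input format, a list of (module, thread id, participants) tuples, is our assumption, and the user names are made up.

```python
from collections import Counter


def user_module_edges(threads):
    """Count, per (user, module) pair, the distinct threads the user took part in.

    threads: iterable of (module, thread_id, users) tuples
    """
    seen = set()
    weights = Counter()
    for module, thread_id, users in threads:
        for user in users:
            key = (user, module, thread_id)
            if key not in seen:  # count each thread once per user and module
                seen.add(key)
                weights[(user, module)] += 1
    return weights


# Hypothetical threads: u1 joins two tokenizer threads and one docs thread.
threads = [
    ("tokenizer", 1, {"u1", "u2"}),
    ("tokenizer", 2, {"u1"}),
    ("docs", 3, {"u1"}),
]
edges = user_module_edges(threads)
print(edges[("u1", "tokenizer")])  # 2
```

Each key of the resulting counter is one connection in the network, and its value is what the line thickness encodes.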
The network plot shows both independent and interdependent groups working together in spaCy. The loosely coupled components, such as Text Categorizer, Tagger, Documentation, and Parser, are mostly independent, with a few shared collaborators. The tightly coupled components, such as Doc, Tokenizer, and Language, have many shared collaborators, but there is no strong pattern. A more interesting pattern is that the maintainers of spaCy (the core members, shown as the user nodes in the center of the image) contribute across modules. This leads us to conclude that there are no obvious roles that each maintainer is playing (or none that are apparent through our analysis, at least).
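Although we only compared the groups visually, the overlap between modules could also be quantified. The sketch below counts shared collaborators for every pair of modules and adds the Jaccard index (shared users divided by all users of both modules) as one possible normalization. This measure is our own suggestion rather than part of the analysis above, and the data is hypothetical.

```python
from itertools import combinations


def shared_collaborators(users_by_module):
    """Map each module pair to (number of shared users, Jaccard index)."""
    overlap = {}
    for mod_a, mod_b in combinations(sorted(users_by_module), 2):
        common = users_by_module[mod_a] & users_by_module[mod_b]
        union = users_by_module[mod_a] | users_by_module[mod_b]
        jaccard = len(common) / len(union) if union else 0.0
        overlap[(mod_a, mod_b)] = (len(common), jaccard)
    return overlap


# Hypothetical participants per module.
users_by_module = {
    "doc": {"u1", "u2", "u3"},
    "tokenizer": {"u1", "u2"},
    "textcat": {"u4"},
}
for pair, (count, jaccard) in shared_collaborators(users_by_module).items():
    print(pair, count, round(jaccard, 2))
```

Under the expectation stated earlier, tightly coupled module pairs should score high on this measure and loosely coupled pairs close to zero.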
Conway’s Law… Is it still relevant?
Conway’s Law states that the component structure and the organizational structure are in a homomorphic relationship. More than one component can be assigned to a team, but each component must be assigned to only a single team. This means that the organizational structure ends up mimicking the component structure of the software2.
Based on our analysis, even though spaCy’s architecture comes close to obeying Conway’s Law, the match is NOT convincing enough in our opinion. This is mostly owing to the absence of a strict distinction between the groups working on different parts of spaCy’s architecture. In hindsight, the scenario is better described as a small group of highly enthusiastic developers working on almost all parts of the architecture, with additional (even bigger) clusters of developers (i.e. contributors) making a few contributions to one or two modules. We believe that Conway’s Law still applies to organizations with a strict distinction between departments, but not so much to an open-source project, which favours flexibility in terms of what one can work on.
Cataldo, M. et al. 2007. On Coordination Mechanisms in Global Software Development. In Proceedings of the International Conference on Global Software Engineering (ICGSE’07), Munich, Germany, http://casos.cs.cmu.edu/publications/papers/cataldo_p1.pdf
Garcia, A., et al. 2007. Assessment of Contemporary Modularization Techniques, ACOM’07 Workshop Report. ACM SIGSOFT Software Engineering Notes, 35, 5, 31-37, https://dl.acm.org/doi/10.1145/1290993.1291005
Grinter, R.E., Herbsleb, J.D. and Perry, D.E. 1999. The Geography of Coordination: Dealing with Distance in R&D Work. In Proceedings of the Conference on Supporting Group Work (GROUP’99), Phoenix, Arizona, https://dl.acm.org/doi/10.1145/320297.320333
Socio-Technical Congruence: A Framework for Assessing the Impact of Technical and Work Dependencies on Software Development Productivity, https://herbsleb.org/web-pubs/pdfs/cataldo-socio-2008.pdf
Python scripts to analyze the communication network, https://gitlab.ewi.tudelft.nl/in4315/2019-2020/desosa2020/-/tree/group-7-essays/projects%2Fspacy%2Fscripts