spaCy


spaCy is a free, open-source library for industrial-strength Natural Language Processing (NLP) in Python. It is commercial open-source software, released under the MIT license and developed by the company Explosion. It is designed to do real work, build real products and gather real insights, and it tries to avoid wasting your time. Its creators like to think of spaCy as the “Ruby on Rails” of Natural Language Processing. spaCy is a strong choice for processing large volumes of text, whether for deep learning, information extraction or dealing with words in context.

spaCy is compatible with 64-bit CPython 2.7 / 3.5+ and runs on Unix/Linux, macOS/OS X and Windows. It is already trusted by Airbnb, Uber, Quora, Retriever, Stitch Fix, Chartbeat, the Allen Institute for Artificial Intelligence and many more. In 2015, independent researchers from Emory University and Yahoo! Labs showed that spaCy offered the fastest syntactic parser in the world and that its accuracy was within 1% of the best available (Choi et al., 2015). spaCy is maintained by two people, who welcome help. They ask that questions, issues and bugs be shared publicly so that more people can benefit from the answers, and they provide a very detailed contributing file.

The coming sections present an in-depth analysis of spaCy’s architecture. First, we lay out the stakeholder analysis and the merging pipeline, based on existing pull requests. Next, we examine the architecture from three perspectives: Context, Deployment and Development. We then analyze the project’s technical and testing debt and trace the evolution of the project since its first release. Lastly, we summarize our findings and conclusions regarding the architecture of spaCy.

The Team

Meet the team:

  • Anwesh Marwade
  • Michael Leichtfried
  • Nikhil Saldanha
  • Thijs Timmer

spaCy: For All Things NLP

In this blog post, we take a sneak peek at a leading-edge Natural Language Processing (NLP) library for Python, its stakeholders and its journey so far. We decided to get some words together (pun intended) to describe the inspiration behind ExplosionAI’s NLP project: spaCy.

From Vision to Architecture

spaCy is the brainchild of Matthew Honnibal, who has a background in both Linguistics and Computer Science. After finishing his PhD and a further 5 years of research on state-of-the-art NLP systems, he decided to leave academia, created spaCy and started interacting with a wider development community.

Quality and Technical Debt

In the previous blog post, we zoomed in on spaCy through the lens of the architectural views described by Kruchten. Having discussed the product vision and the architecture, it is time to take a look at the quality safeguards and the architectural integrity of the underlying system.

Dependencies and Modular Software Design

Research shows that work dependencies (i.e., engineering decisions that constrain other engineering decisions) are a fundamental challenge in software development organizations, especially those that are geographically distributed.