Nothing is more important than truly learning to read code, but it is a skill nobody will teach you in college. Here is how to learn it fast.
When I was in college, I participated in lots of hackathons. I figured that these were a great way to up-skill myself and learn new things, not to mention push myself out of my comfort zone. In general, I would say that I was right - I ended up with three internships through college, and received a full-time offer from a large company during my senior year. If I had to give a college student advice, I would tell them to go out of their way to do two things: first, focus on making coding something that they enjoy; second, make it a habit.
However, that’s only half the story. Much like with health, there are a lot of things that can holistically affect your development skills, but they vary in importance. Maximizing any one thing doesn’t necessarily mean that you will be proficient. For example, if you were to manage to get all of the processed foods, sugar, and high-calorie items out of your diet, you might be a decent bit healthier - but if you still lived a sedentary lifestyle you would likely still struggle with common health issues. Similarly, coding is a difficult and complex skill to learn, with even the shortcuts requiring lots of time and effort.
For me, creating lots of personal projects and winning hackathons was great, but as soon as I sat down with a massive codebase and a ticket I had to solve, I quickly found that I was struggling to make any progress on my stories. The reason, I realized, came down to one thing: I had never really sat down and attempted to work with a large codebase other than my own. If I had a bug to fix in a 5,000+ line hackathon project that I had written that very same night, I could quickly tell you exactly what was going wrong, and fix it just as fast. But when I was told that something was going wrong in a 100,000+ line project that interacts with four other 100,000+ line projects, I had no idea where to start.
I was sorely lacking in a fundamental skill: reading code effectively.
But how does one go about learning to read code effectively? Learning to write better code is easy - you just sit down and meticulously learn to create something that you are envisioning. While it may take a while, as you struggle through problems you will learn both how to solve issues that you encounter effectively, and also learn to have a deeper understanding and appreciation for the tools that you are using.
On the other hand, if you sit down and try to read through the React codebase you will quickly realize that it is both overwhelming and frustrating, and without a goal in mind you will likely learn nothing. Try opening up that link. Where would you even begin? Maybe let’s start with the compiler folder as that’s pretty important. Now you’re looking at a whole other subproject with its own changelog. Now let’s imagine that you were just hired to work on this codebase. How can you realistically gain familiarity with a project this large, and start to become a valuable contributor, when there’s truly no way you will ever have time to read the entire codebase?
Before we begin talking about how to become familiar with a codebase, I want to make an important distinction between learning while you are already on a team and learning while joining a team, because these are two very different things. If you are just joining a team, while you may feel pressured to immediately start working, the truth is that your immediate manager will not expect you to be pushing out story points at the same rate as your teammates of 5 years. This gives you a valuable chance to really dig into a codebase without too much oversight or tunnel vision. On the other hand, if you are already on a project and expected to be pushing out code, but you don’t feel like you have a good grasp on the codebase, your approach will likely need to adapt. Be forewarned that this varies case by case, and you should tailor your approach with that in mind.
Let’s begin with some strategies.
One of the most important things to understand when looking at a new codebase is the entry point(s). In a JavaScript project, this can generally be found quickly by looking at the package.json file. Not only can this tell you how the developers start the project locally, but it can also give you a lot more information on the broader usage of CI/CD in the project. In a Java project built with Maven, this would instead be the pom.xml file. Either way, it’s important to be familiar enough with the framework and languages being used to know where to look for this.
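As a rough illustration of that first step, here is a minimal Python sketch that pulls the entry-point hints out of a package.json. The function names (summarize_package_json, format_scripts) are my own for this example; the fields it reads (main, bin, scripts) are the standard npm ones, and a given project may not use all of them.

```python
import json
from pathlib import Path


def summarize_package_json(path):
    """Return the entry-point hints found in a package.json file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        "name": data.get("name"),
        # "main" and "bin" are the standard npm fields for entry points;
        # either may be missing, in which case we report None.
        "main": data.get("main"),
        "bin": data.get("bin"),
        # "scripts" maps a script name to the shell command it runs.
        "scripts": data.get("scripts", {}),
    }


def format_scripts(summary):
    """Render each script as the npm command a developer would type."""
    return [
        f"npm run {name} -> {command}"
        for name, command in sorted(summary["scripts"].items())
    ]
```

Calling format_scripts(summarize_package_json("package.json")) from the root of a project gives you a quick list of what you can actually run, which is often a better starting point than opening folders at random.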
To apply this to a new codebase that you probably aren’t familiar with, let’s use the following example: the Bitwarden Desktop client.
When you first arrive at the repository, you will likely see something like this:
At first glance, it’s pretty overwhelming. There are a lot of files, and a lot of folders. This is also the mono-repo for several of Bitwarden’s clients, not just the desktop app. Don’t get too stressed though! Let’s focus on the entry points. We will start by looking at the package.json:
There’s a lot of information here, but the most important things to note are the scripts and workspaces sections. First, the scripts section tells us a lot about the different tools that the project uses - for instance, we can see that this project uses Storybook for developing and testing components, meaning that you can use npm run storybook to start a dev server that will let you see the components in isolation. The workspaces section, on the other hand, tells us even more valuable information - it appears that the various separate client projects are in the apps subdirectory as subprojects.
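If you would rather not click through every workspace by hand, the same idea can be sketched in a few lines of Python. This is only a sketch under some assumptions: it treats each workspaces entry as a plain glob pattern such as apps/*, it does not handle negated patterns, and list_workspace_packages is a name I made up for the example.

```python
import json
from pathlib import Path


def list_workspace_packages(repo_root):
    """Return (directory, package name, script names) for each workspace."""
    root = Path(repo_root)
    manifest = json.loads((root / "package.json").read_text(encoding="utf-8"))
    workspaces = manifest.get("workspaces", [])
    # Yarn classic also allows an object form: {"packages": [...]}.
    if isinstance(workspaces, dict):
        workspaces = workspaces.get("packages", [])

    results = []
    for pattern in workspaces:
        # Assumption: every entry is a simple glob relative to the repo root.
        for candidate in sorted(root.glob(pattern)):
            package_file = candidate / "package.json"
            if not package_file.is_file():
                continue  # a matching folder without a manifest is not a package
            pkg = json.loads(package_file.read_text(encoding="utf-8"))
            results.append((
                candidate.relative_to(root).as_posix(),
                pkg.get("name"),
                sorted(pkg.get("scripts", {})),
            ))
    return results
```

Running list_workspace_packages(".") at the root of a mono-repo gives you a one-screen map of the subprojects and the scripts each one exposes.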
Now let’s take a look at the Desktop client itself, which we now know we can find by looking at the apps/desktop directory. Inside of here we can find another package.json with the following content:
This has a ton of scripts, but we can quickly narrow it down to find out what we want to know. First, we can see that one way for us to run the desktop client is to use Electron. As is the common convention, we can run this command by using npm run start. If we want to know more about any specific platform, we can also inspect the various build scripts for each of those platforms and dig into how they work too.
If you are working on a project that has CI/CD set up, this is another great way to get familiar with the project. By looking at the CI/CD configuration, you can get a better understanding of how the project is built and deployed, and how it interacts with other parts of the system. I can’t even begin to express how many times a CI/CD file has held the answer to my many questions of “how in the world do I run this project”.
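To make that concrete, here is a small Python sketch that lists the shell commands a CI pipeline runs. It assumes the project uses GitHub Actions with workflow files under .github/workflows, which may not match your project, and it uses a simple line scan rather than a real YAML parser, so quoted or unusually formatted steps may come out imperfectly. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches "run: ..." and "- run: ..."; group 1 is everything before the key,
# so its length is the column where the "run" key starts.
RUN_LINE = re.compile(r"^(\s*(?:-\s+)?)run:\s*(.*)$")


def extract_run_commands(workflow_text):
    """Collect the shell commands from `run:` steps in one workflow file."""
    commands = []
    lines = workflow_text.splitlines()
    i = 0
    while i < len(lines):
        match = RUN_LINE.match(lines[i])
        i += 1
        if not match:
            continue
        key_column = len(match.group(1))
        value = match.group(2).strip()
        if value and value[0] not in "|>":
            commands.append(value)  # single-line form: `run: npm ci`
            continue
        # Block form (`run: |`): take following lines indented past the key.
        while i < len(lines):
            line = lines[i]
            indent = len(line) - len(line.lstrip())
            if line.strip() and indent <= key_column:
                break  # back at a sibling key or the next step
            if line.strip():
                commands.append(line.strip())
            i += 1
    return commands


def ci_commands(repo_root):
    """Map each workflow file name to the commands it runs."""
    workflows = Path(repo_root) / ".github" / "workflows"
    return {
        path.name: extract_run_commands(path.read_text(encoding="utf-8"))
        for path in sorted(workflows.glob("*.y*ml"))
    }
```

The output is essentially a checklist of "this is how the maintainers install, build, and test the project", which you can then try locally.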
This is just the tip of the iceberg though - there’s a lot more we need to learn before we can start contributing to a project like this.
One of the most important - and often overlooked - things to review when you’re stepping into a new project is the PRs. Especially if you’re planning on contributing, knowing what the core maintainers prefer, look for, and even how they write their own code is beyond valuable. Let’s take a look at a whole new example: hrms (an open source payroll software):
Let’s start by going to their closed PRs page (I put in “feat” to look for features):
Now let’s look at the first PR:
Here we can quickly see that the PR is fairly simple, shows us where exactly the check-in code is, and how it handles errors. If we were to change or upgrade the overall check-in system, we would likely have a solid understanding of where we need to look for more information just by looking at the files changed in this PR. On the other hand, if we needed to resolve an error with the check-in system, this would be the perfect place to start. On top of this, since we can see from the PR reviewer that one of the core maintainers of this project was involved, this is a great example of the style of code we can follow to make meaningful contributions to this project. All of this is just scratching the surface of what you can learn from reading a PR.
I highly recommend taking some time to read through the PRs of other projects you are interested in or need to work on. This really is a brilliant way to get a “finger on the pulse” of the codebase.
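If you want to do the same kind of search from a script, something like the following Python sketch could work. It assumes the project is hosted on GitHub and uses the public search endpoint of the GitHub REST API; the repository name is a placeholder you would fill in as "OWNER/REPO", and the function names are my own for illustration. Unauthenticated requests are rate limited, so treat this as a starting point rather than a finished tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_pr_search_url(repo, keyword, state="closed"):
    """Build a GitHub search URL for PRs in `repo` that mention `keyword`."""
    # Same qualifiers you would type into the PR search box on the website.
    query = f"repo:{repo} is:pr is:{state} {keyword}"
    return "https://api.github.com/search/issues?" + urlencode(
        {"q": query, "per_page": 20}
    )


def summarize_prs(payload):
    """Turn a search response into one readable line per PR."""
    return [
        f"#{item['number']} {item['title']} (by {item['user']['login']})"
        for item in payload.get("items", [])
    ]


def fetch_prs(repo, keyword):
    """Run the search over the network and return the summary lines."""
    request = Request(
        build_pr_search_url(repo, keyword),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request) as response:
        return summarize_prs(json.load(response))
```

For example, fetch_prs("OWNER/REPO", "feat") mirrors the “feat” search above. From there, the authors who keep showing up are usually the people whose style is worth studying first.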
Overall, I find it a bit strange that nobody tends to talk about reading code effectively. By my estimation, around 80% of the code that I have ever worked on professionally has been written by someone else first, and then maintained by many more people later. This makes the ability to effectively read and integrate into existing code one of the most valuable skills you can learn as a software engineer, and yet it is rarely talked about.
I hope you have enjoyed this article and found it useful. If you have any concerns or comments, please feel free to reach out to me - I would love the feedback, and I’m always happy to integrate feedback into my articles and cite my readers.
Happy Coding!
-Nicholas