Art of debugging

Building software is a conundrum of sorts. A computer is a machine that does exactly what we tell it to do, and yet we face so many problems. So the problem here is not the computer but us.

The act of building software is one of realising both the full potential of our intelligence and its limitations.

There is an impedance mismatch between how we usually solve problems and how computers do it.

This mismatch is also the reason why we need constraints in how we build software.

Meta much? 😃

For more about constraints in software:

https://www.varenya.dev/posts/api-design

https://www.varenya.dev/posts/api-design-2

Whenever our mental model of how the computer operates deviates from how the computer actually operates, we end up with a bug.

Now that we know the source of the bugs let's see how to debug software.

As you might guess, there is no hard and fast rule for debugging either. But over time some patterns have emerged in my approach, and I wanted to codify them in this post.

Challenge your assumptions

More often than not we struggle to find the issue because we assume that this is how it’s supposed to work. But of course, if it did, we wouldn’t be facing this bug.

Now in practice, this takes different forms.

Example:

If you have ever faced problems with modals showing up in unexpected order even with a style like this:

.modal {
  z-index: 1000000;
}

Here the assumption is that a higher z-index will result in the DOM element being on top.

Well, now you know it's not working as expected, so our assumption is wrong. What is it that we are missing in the above case? Stacking contexts!

I won’t go too deep into it but this is a problem that a lot of folks run into when they start off doing CSS. There is more nuance here and I would urge the readers to look for material on this.

And FYI, I too learned about stacking contexts and other nuances involved after the fact.
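To make the broken assumption concrete, here is a rough sketch of why a huge z-index can still lose. It is a deliberately simplified model, not how a browser is implemented: each element is described by the z-index values of the stacking contexts on the path from the root down to it, and elements are compared at the first level where those paths differ. The names `StackingPath` and `paintsAbove` are made up for this illustration, and the model assumes sibling contexts have distinct, non-negative z-index values.

```typescript
// z-index of each stacking context on the path from the root to the element.
// Simplified model: real browsers also consider DOM order, negative values, etc.
type StackingPath = readonly number[];

// Returns true when `a` paints above `b` in this simplified model.
function paintsAbove(a: StackingPath, b: StackingPath): boolean {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    // The first differing ancestor context decides; deeper values never get a say.
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  // One path is a prefix of the other: the descendant paints above its ancestor.
  return a.length > b.length;
}

// Hypothetical page: the modal lives inside a container that forms a
// stacking context with z-index 1, while a sibling header has z-index 2.
const modalPath: StackingPath = [1, 1000000];
const headerPath: StackingPath = [2];

const modalOnTop = paintsAbove(modalPath, headerPath);
const headerOnTop = paintsAbove(headerPath, modalPath);
```

In this model `modalOnTop` is false: the comparison is settled at the container level (1 versus 2), so the modal's own 1000000 is never even looked at.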

Another side effect of debugging is that you gain a deeper knowledge of the thing you are working on.

If the bug you found got fixed by some random changes, try to dig deeper into the “why”. You will grow in more ways than one.

Read the error messages

This one’s straightforward right?

After a while, we take loads of things for granted. We jump to conclusions about what caused the issue, only to waste hours before realizing the answer was staring us right in the face.

Debugging is, a lot of the time, an antidote to hubris 😃

Example:

While working on a React app, nothing that I was expecting showed up on the UI.

I went through these assumptions:

I didn't return the JSX from the component.
Didn't pass the props.
Applied wrong CSS - white background on white text?
...

Only then did I read the error message and see that I had misspelled the filename.

Read the docs

I know right? Well, trust me reading docs for a few mins can spare you hours of debugging.

If you open up a popular repo on GitHub, most of the issues reported have answers in the documentation. Folks jump to report an issue instead of doing some due diligence.

Some frameworks and tools have very specific semantics for how they should be used. If those semantics are not followed, it can lead to subtle issues that will escape us.

Even after reading the docs, issues will appear, but we will likely get a signal about what went wrong.

Sometimes the documentation is poor. But it’s still worth giving it a cursory look, paying attention to the possible “gotchas” listed, before digging in.

Example:

I tried to use a library for async actions only to realize that the framework I was using was not compatible.

A more nuanced one:

When I used useEffect in React for the first time to fetch some data, I ended up in an infinite loop. Turns out the mental model behind useEffect isn't as simple as it looks.
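A toy model can show where such a loop comes from. The sketch below does not use React at all; `countRenders` and `depsChanged` are hypothetical helpers that only imitate two documented behaviours: an effect without a dependency array runs after every render, and setting state to a new value schedules another render. The render cap exists only so the simulation terminates.

```typescript
// Mirrors useEffect's second argument: undefined means "no dependency array".
type Deps = readonly unknown[] | undefined;

function depsChanged(prev: Deps, next: Deps): boolean {
  // Without a dependency array the effect re-runs after every render.
  if (prev === undefined || next === undefined) return true;
  if (prev.length !== next.length) return true;
  return next.some((dep, i) => !Object.is(dep, prev[i]));
}

// Simulates a component whose effect "fetches" data and stores it in state.
function countRenders(deps: Deps, maxRenders = 50): { renders: number; items: number[] } {
  let items: number[] = [];
  let prevDeps: Deps = undefined;
  let renders = 0;
  let renderScheduled = true; // the initial mount

  while (renderScheduled && renders < maxRenders) {
    renders += 1;
    renderScheduled = false;
    if (renders === 1 || depsChanged(prevDeps, deps)) {
      // The effect: a fresh array is a new state value, so it triggers a re-render.
      items = [1, 2, 3];
      renderScheduled = true;
    }
    prevDeps = deps;
  }
  return { renders, items };
}

const withEmptyDeps = countRenders([]);      // effect runs once, after mount
const withoutDeps = countRenders(undefined); // effect runs after every render
```

With an empty dependency array the simulation settles after two renders; without one it keeps going until it hits the cap, which is the infinite loop in miniature.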


Context switching

I found this to be one of the more sinister ways bugs crept into my code. It also affected my debugging process quite a bit.

The idea here is that while a task is in progress, one should not switch over to something else. I found the cost to be massive even for a short switch.

For deep work, this can hurt your output.


Example:

I was in flow while debugging a hard to reproduce issue.

I got called into a meeting. After the meeting, I picked up from where I had left off, only to find myself in a mess.

This applies to most tasks in general.

Debugging is where I am most knee-deep in the guts of the complexity and in a deep flow state. So if something else demands your attention, make sure to take a breather and start from scratch afterwards, rather than assuming you can pick up where you left off.

Peel the layers of abstractions

If the above approaches didn’t solve the bug, it’s most likely something that will need you to dig deeper.

Depending on the issue the "layer" will differ, but the advice is the same.

Example:

A place in the UI where the total number of items should show up displays NaN instead.

The layers here could be:

State Management
Parsing
Caching
Query
......

And the above can happen at frontend and backend (web dev perspective).

To isolate where the issue occurred the approach could be:

Bottom-up - starting from where the issue happened and going up the layers.
Top to bottom - starting from where data entered the system to where the issue happened.
A combination of both - starting somewhere in the middle.

Tools help a lot here. Setting a breakpoint and stepping through the code lets you peel the layers one at a time 😃.
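When a debugger is not handy, the top-to-bottom approach can also be sketched in code: run the data through each layer in order and report the first layer whose output is NaN. Everything here is made up for illustration, including the `Layer` type, the `findFirstNaNLayer` helper, and the three fake layers standing in for a real query, parser, and state store.

```typescript
// One step of the pipeline the value travels through on its way to the UI.
type Layer = { label: string; run: (input: unknown) => unknown };

// Walks the layers top to bottom and returns the label of the first one
// that produces NaN, or null when no layer does.
function findFirstNaNLayer(pipeline: Layer[], input: unknown): string | null {
  let value = input;
  for (const layer of pipeline) {
    value = layer.run(value);
    if (typeof value === "number" && Number.isNaN(value)) return layer.label;
  }
  return null;
}

// Hypothetical layers: the query returns the total as text with a unit,
// and the parsing layer naively converts it with Number().
const pipeline: Layer[] = [
  { label: "query", run: () => ({ total: "12 items" }) },
  { label: "parsing", run: (row) => Number((row as { total: string }).total) },
  { label: "state management", run: (total) => total },
];

const culprit = findFirstNaNLayer(pipeline, null);
```

Here `culprit` comes out as "parsing", because `Number("12 items")` is NaN. The state management layer only passes the bad value along, which is why starting at the UI and assuming the bug lives there can mislead.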

Pair Programming

A lot of times when we are in a rut of debugging something it’s likely that a second set of eyes will reveal the answer. This is where pair programming has helped me a lot.

If you have seen Naruto and how to get out of a genjutsu, you know what I mean 😃

Reproduce the bug consistently

Well, this was the unsaid assumption in all the advice I shared above: that the bug can be reproduced consistently. A lot of times that is the case, but some bugs are hard to reproduce.

These sorts of bugs happen less at the app level than at lower layers. The reason is lower-level primitives tend to combine/compose in a very complex mesh.

A good metaphor for this is Chess - each piece has easy to define moves but the combination of them is complex.

Some language primitives which are easy to define but difficult to master:

Pointers - I mean phew right!
Closures - stale closures anyone? 😄
Async - Well, this is the trickiest of all. These issues are hard to reproduce and result in erratic behavior, to say the least.
CSS cascade rules - I mean after a point the complexity is so high it becomes difficult to predict the results.
....
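As a small illustration of one item from the list above, here is a sketch of a stale closure. It does not use any framework; `renderCounter` is a hypothetical stand-in for a component render that hands out a callback closing over that render's value.

```typescript
// Each call plays the role of one render and returns a callback that
// closes over the `count` that render was given.
function renderCounter(count: number): { label: string; onLater: () => string } {
  return {
    label: `count is ${count}`,
    onLater: () => `count is ${count}`,
  };
}

const firstRender = renderCounter(0);
const secondRender = renderCounter(1);

// Something (a timer, a subscription) held on to the callback from the
// first render, so it still sees the old value after the second render.
const staleMessage = firstRender.onLater();
const currentLabel = secondRender.label;
```

`staleMessage` is "count is 0" even though the latest render shows "count is 1". Each piece is easy to define on its own; the surprise comes from how they combine.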

To reproduce issues of this nature, we will likely need to set up some instrumentation.

Sometimes it’s as simple as putting in loads of logs to see what the heck is happening. From that, you can reproduce the issue in your environment by recreating those conditions.

Example:

If it's some CSS rule not getting applied as you expect, the best way to isolate it is:

Create a sample html/css with similarities to the original.
Add the rules one by one to see which one is causing the issue.

If the issue is intermittent, like some piece of text not showing up every now and then:

Clear the cache.
Isolate the logic where the text is being loaded - run it in an isolated test environment.
If possible, get the whole state of the system at that point.
If there is any async logic, separate it out and run it a couple of times to see the output.
........
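One way to sketch the last step in code: instead of waiting for the timing to go wrong by chance, feed the isolated logic every possible arrival order and see which ones fail. This is only an illustration under assumed conditions. The `Arrival` type, the `textShown` function, and the idea that a stale cache entry races with a network response are all hypothetical, and the "last write wins" rule is the suspected bug being tested, not a claim about any real app.

```typescript
// A response arriving from one of two hypothetical sources.
type Arrival = { source: "cache" | "network"; text: string };

// The isolated logic under suspicion: whichever response arrives last wins.
function textShown(arrivals: Arrival[]): string {
  let shown = "";
  for (const arrival of arrivals) shown = arrival.text;
  return shown;
}

// All possible arrival orders, so no interleaving is left to chance.
function allOrders<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items];
  const orders: T[][] = [];
  items.forEach((item, i) => {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const tail of allOrders(rest)) orders.push([item, ...tail]);
  });
  return orders;
}

const arrivals: Arrival[] = [
  { source: "cache", text: "" }, // assumed stale entry with empty text
  { source: "network", text: "Hello" },
];

// Orders in which the text goes missing.
const failingOrders = allOrders(arrivals).filter((order) => textShown(order) === "");
const failingSequences = failingOrders.map((order) => order.map((a) => a.source).join(" then "));
```

Only the order "network then cache" leaves the text empty, which turns "it fails every now and then" into a condition you can recreate on demand.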

Get some sleep/break

If you keep finding yourself hitting a wall, that is a great sign to step away from the problem.

Loads of times the bug that took up the better part of my day got solved first thing in the morning. So get out of your own way by taking a break.

Well, that's it. I hope this helps!