Lesson Learned: Code Regression


Giving advice is not very effective most of the time. Sharing real experiences and trouble and how it is handled is more effective. The best is whoever learns from others mistakes. So I am sharing here a story where I and my team introduced a bug and it was leaked to the production environment. I know most of the advice here when I was consulting people in Agile but when the situation changed, I did that same mistake. The shame was painful and hence the learning was intense. Experience is where you really learn, not advice from ivory towers.

I am sharing here hoping that it will be useful for myself and for others.

Look at this code fragment that is a part of timezone based message broadcast.
    private String timezoneMetadataId = “23467fe6523310”;
       
       …

     //Check Metadata for Timezone
    List subTimezoneValue = sub.getSubscriberMetaData().get(timezoneMetadataId);
    for (String timezone : timezones.split(“[;,]”)) {
        for (MetadataValue mv : subTimezoneValue) {
            //Do some processing here

        }
    }

The question here: Is timezoneMetadataId acceptable to be null? In our case, it must not be null. The initialization above ensures it is not null.
Until now, everything is ok, but the hardcoded timezoneMetadataId is not recommended especially as it is repeated in many places in the source code.
We wanted to change this hardcoded id and place it inside the configuration file. As we use Guice dependency injection, we changed the above code to be:

    @Named(value = “mongo.wmprod.subscriber.metadata.timezoneId”)
    private String timezoneMetadataId;

And here we made the mistake of forgetting the @Inject annotations which meant the @Named annotation has no effect and the variable, timezoneMetadataId is initiated to null. Sadly the above for loop code will not raise any errors as getting the value of key null is an empty list and looping over the empty list is ok.

This is a code regression. The changes that intend to make the code better ended injected a critical bug.

We have a Code Review process, the reviewer did not notice the mistake. Maybe because it was buried into a broad set of changed lines. 
QA also did their own testing, they did not run the timezone based broadcast test case. 
Here I am sharing lessons learned to avoid such regressions in the future:
Lessons Learned for Programming:
You are changing the code. Do not be fooled by the simplicity of your changes or assume safety. Follow proven programming practices all the time.
1) Verify value is not null

All configuration items must be verified. See below:

import com.google.common.base.Preconditions;
Preconditions.checkNotNull(timezoneMetadataId);

You can also verify the size of the key or the validity of the key. Fail Fast is your best friend.
2) Add log statement that shows the id.

logger.info(“Sending broadcast with timezone metadata id: {}”, timezoneMetadataId);
If the value is invalid in some way, you can check it in the log files.


3) Write a unit test that ensures failure in case of null or invalid timezone metadata.
Actually, this will ensure that your preconditions are really working. It is indeed easy to make mistakes. Ensuring your preconditions and verifications are working is essential. If the project has no unit tests at all, create a task to create your first unit test immediately. 

4) Write integration tests. 

In this project codebase, there was no integration testing framework in effect. If you join such team, create a task to create your first integration test in your first employment week.


5) Use Constructor Injection
See https://github.com/google/guice/wiki/Injections
Lessons Learned for Code Review:

We did not have a written checklist for code review. This is a mistake. Even if you join a team that does not have a checklist, do not wait until you have team experiences to create your own list. Conduct a meeting and start with a simple checklist immediately. With each Story or Bug Code Review, the checklist should be filled and attached to the Story or Bug. The checklist template should be updated frequently.

We took Code Review seriously but with no structure. The above ideas are to make it serious process with explicit checks and responsibility. 

Lessons Learned for QA:

We kept assuring the safety of changes to Project Leader and QA. This is really a mistake. Fears should be reduced by adding more test cases not by getting developers assurance about the safety of the changes.

I find it necessary to list all changed functionality to QA. It is even better for QA to have access to changes themselves. Once QA read source code nouns such as variables names, functions names, and class names, they can associate it to UI functionality and create the corresponding test cases.
In the end, I hope my mistakes and lessons learned will be useful.
From ahm507.blogspot.com