Microbenchmark as a SDLC practice

Microbenchmark as a SDLC practice
2021-07
While developing software, many times we develop a correct solution but we always overlook the non functional requirements (specially performance and security).

This is the first case:
We were tracking down an issue on a chain of calls in a Java app that was already optimised in the past but it turns out the volumetry of data changed, so, parameters were slightly outdated.
The fix was as simple as increasing the buffer to 64KB in a call in order to fit the whole json file on a Java BufferedReader object.
While we didn't had much time to further analyse it, my personal hypothesis is that the Java app was calling the Linux read syscall quite a few times instead of a single read that gets the whole file content.
The performance improvement was significant, a call that was taking quite a few milliseconds, sometimes, going up to seconds and blocking the Quarkus/Vert.x workers, now, it does the whole chain of calls in less than 60 milliseconds.

Following that case, we tracked down another chain of calls in C++ related to this first case.
Again, this C++ app was properly tuned in the past and always worked like a charm but (there is always a but), we noticed that strings were concatenated and that a call had 2 loops that could be unified in a single loop.

So, I wrote this fairly simple C++ code to replicate the situation and microbenchmark it:
```
#include <chrono>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

//     g++ -std=c++17 -O2 -o loop__gcctest loop.cpp
// clang++ -std=c++17 -O2 -o loopclangtest loop.cpp

// https://gist.github.com/polaris/adee936198995a6f8c697c419d21f734
static std::string timePointAsString()
{
    auto now = std::chrono::system_clock::now();
    auto in_time_t = std::chrono::system_clock::to_time_t(now);
    std::stringstream ss;
    ss << std::put_time(std::localtime(&in_time_t), "%Y-%m-%d %X");
    return ss.str();
}

int main()
{
    auto w = 50000;
    std::chrono::duration<double> case1, case3;
    std::string str1, str3;

    //std::cout << "starting double loop with concatenation\n";
    auto start = std::chrono::system_clock::now();
    str1 = str1 + "part1\n";
    for (auto x = 0; x < w; x++)
    {
        str1 = str1 + std::to_string(x) + "," + timePointAsString() + "," + "abcdef\n";
    }

    str1 = str1 + "part2\n";
    for (auto x = 0; x < w; x++)
    {
        str1 = str1 + std::to_string(x) + "," + timePointAsString() + "," + "abcdef\n";
    }
    case1 = std::chrono::system_clock::now() - start;

    std::string temp1, temp2;
    //std::cout << "starting single loop with append\n";
    start = std::chrono::system_clock::now();
    temp1.append("part1\n");
    temp2.append("part2\n");
    for (auto x = 0; x < w; x++)
    {
        auto curr_time = timePointAsString();

        temp1.append(std::to_string(x));
        temp1.append(",");
        temp1.append(curr_time);
        temp1.append(",");
        temp1.append("abcdef\n");

        temp2.append(std::to_string(x));
        temp2.append(",");
        temp2.append(curr_time);
        temp2.append(",");
        temp2.append("abcdef\n");
    }
    str3.append(temp1);
    str3.append(temp2);
    case3 = std::chrono::system_clock::now() - start;

    std::cout << case1.count() << ";" << case3.count() << "\n";

    return 0;
}
```
The time difference stored on case1 variable is the current scenario AS-IS, the case3 variable is the TO-BE (the variable case2 was another variation of the concatenation, omitted to simplify the reading of this article).
Microbenchmarking this sample code, I extracted the following graph which reads as "under 95% of the cases, it takes n seconds to concatenate/append data using gcc/clang":

As we can see, the changes from double loop with concatenation to single loop with append brings a significant performance gain and it will be applied in the coming sprint! :)
Ah, of course, the explanation for this performance issue is simple, replacing 2 loops by one reduces the number of iterations (O(2n) -> O(n)), replacing the concatenation is related to another thing: string append in order to be efficient, is usually implemented as a linked list, while string concatenations is usually implemented by copying the whole array every time a new piece of string is added.

To sum up, these are some of the takeaways:
- Always make resources (CPU, memory, disk, threads, file descriptors, etc) 'tunable' through an external configuration
- Every time a new feature is introduced, benchmark it to establish a baseline
- If you're changing the current code, pay attention:
- Don't forget to check the best practices for your programming language
- Most important: Please, add microbenchmark as a practice on your SDLC

Posts By Date

{
"name": "Geraldo Netto"
"contacts": [


]
"sponsor me": [

]
"food": [

]
"folk": [

]
"mindset": [
   "Control Theory"
   "Seneca"
   "utilitarianism"
   "..."
]
"praying": [
   "Hermetism"
   "Kabbalah"
   "Spiritism"
   "Tantra"
   "Umbanda Branca"
   "..."
]
"music": [
   "Cafe Del Mar"
   "Fito Paez"
   "Jennifer Rostock"
   "Juli"
   "Leningrad"
   "Mafalda Veiga"
   "Oasis"
   "Raimundos"
   "Rammstein"
   "Sabaton"
   "Skank"
   "The Cure"
   "Tom Jobim"
   "Zucchero"
   "..."
]
"arts": [
   "Kandinsky"
   "Pollock"
   "..."
]
"tech": [
   "algorithms"
   "distributed systems"
   "linux"
   "statistics"
   "..."
]
}

eXtreme Development

Microbenchmark as a SDLC practice

2021-07