[c++] In-depth analysis of your favorite stringstream and snprintf performance

I recently wrote two similar modules in a program. One uses snprintf to output intermediate data, and the other lazily uses stringstream. And guess what? The frame rate actually dropped! Which of the two is holding back performance?

Performance analysis experiment from Alibaba Cloud

I searched online and found that someone had already run a performance analysis experiment. His demo compares roughly four variants:

  1. The stringstream object is constructed inside the loop body and filled with data
  2. The stringstream object is constructed outside the loop body, and the object is cleared and reused each time inside the loop body.
  3. Create a buffer inside the loop body and use snprintf to fill the data
  4. Create a buffer outside the loop body. Clear the buffer inside the loop body and then use snprintf to fill in the data.

After 100,000 calls are completed, the time consumption of the above four methods is:

Method 2 > Method 3 > Method 4 > Method 1

The takeaway: don't do unnecessary repeated construction and destruction inside the loop body.
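The four variants described above can be sketched roughly as follows. This is not the original experiment's code: the function names, the "id=%d" payload, and the 64-byte buffer are assumptions made for illustration, and the timings it prints will differ from machine to machine.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <sstream>
#include <string>

// All names below are hypothetical; "id=%d" is an assumed payload.

// Method 1: construct the stringstream inside the loop body.
std::string format_fresh_stream(int i) {
    std::stringstream ss;
    ss << "id=" << i;
    return ss.str();
}

// Method 2: reuse one stringstream, clearing it on every iteration.
std::string format_reused_stream(std::stringstream& ss, int i) {
    ss.str("");   // drop the previous contents
    ss.clear();   // reset any error/eof flags
    ss << "id=" << i;
    return ss.str();
}

// Method 3: create the buffer inside the loop body.
std::string format_fresh_buffer(int i) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "id=%d", i);
    return std::string(buf);
}

// Method 4: reuse one buffer, clearing it on every iteration.
std::string format_reused_buffer(char* buf, std::size_t size, int i) {
    std::memset(buf, 0, size);
    std::snprintf(buf, size, "id=%d", i);
    return std::string(buf);
}

// Calls body(i) `iterations` times and returns the elapsed microseconds.
template <typename Body>
long long time_us(int iterations, Body body) {
    std::size_t sink = 0;  // consume results so the work is not optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) sink += body(i).size();
    auto stop = std::chrono::steady_clock::now();
    if (sink == 0) std::puts("unexpected empty output");
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Times all four methods and prints one line per method.
void run_benchmark(int iterations = 100000) {
    std::stringstream shared_stream;
    char shared_buf[64];
    std::printf("method 1: %lld us\n",
                time_us(iterations, [](int i) { return format_fresh_stream(i); }));
    std::printf("method 2: %lld us\n",
                time_us(iterations, [&](int i) { return format_reused_stream(shared_stream, i); }));
    std::printf("method 3: %lld us\n",
                time_us(iterations, [](int i) { return format_fresh_buffer(i); }));
    std::printf("method 4: %lld us\n",
                time_us(iterations, [&](int i) { return format_reused_buffer(shared_buf, sizeof(shared_buf), i); }));
}
```

Calling run_benchmark() from main prints one timing per method. Every variant returns a std::string so that the results can be compared; that adds a small, equal cost to the snprintf variants.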

Reason

So, why? What do these two actually do under the hood?

C99 snprintf

Looking at its declaration, you will find that, like the other printf functions, it is a variadic function, which means its arguments are eventually collected into a va_list and passed down through a chain of lower-level functions:

/* Maximum chars of output to write in MAXLEN. */
extern int snprintf (char *__restrict __s, size_t __maxlen,
const char *__restrict __format, ...)
     __THROWNL __attribute__ ((__format__ (__printf__, 3, 4)));

Note that snprintf does not allocate anything itself: the caller supplies the destination buffer and its size. When the required length is not known in advance, the caller typically allocates the buffer first, roughly like this:

char* buf = (char*)malloc(buf_size);

Then, perform a formatting operation on the allocated memory:

int result = vsnprintf(buf, buf_size, format, args);

What is interesting is that snprintf does essentially no formatting of its own: it packs its variadic arguments into a va_list and hands them to vsnprintf:

extern int vsnprintf (char *__restrict __s, size_t __maxlen,
const char *__restrict __format, _G_va_list __arg)
     __THROWNL __attribute__ ((__format__ (__printf__, 3, 0)));

To keep my head from exploding, here is a very short, simplified sketch of the essence of vsnprintf:

int vsnprintf(char *__restrict __s, size_t __maxlen,
              const char *__restrict __format, _G_va_list __arg) {
    int result;
    va_list copy;
    va_copy(copy, __arg);  // work on a copy so the caller's list stays untouched
    result = vsnprintf_l(__s, __maxlen, __format, copy);
    va_end(copy);          // every va_copy must be paired with a va_end
    return result;
}

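In fact, the implementation varies from one C library to another. I used vsnprintf_l here and treat it as the thread-safe layer in this post. Note that on platforms that actually ship a function with this name (BSD, macOS), it is the locale-aware variant and takes an explicit locale_t argument, so the simplified signature used here is only illustrative.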

Before going on, stop! Make sure you understand the basics of va_list and the related macros (va_start, va_copy, va_end). If you don't know much about them, I will write another post in a couple of days (keeping-the-good-stuff-in-the-family.jpg)

Next, let’s take a look at what this function does:

  1. va_copy(copy, __arg); creates a copy of the variadic argument list. Why create a copy? Because a va_list is consumed as it is read: working on a copy keeps the caller's original list intact and avoids side effects that are painful to debug.
  2. vsnprintf_l(__s, __maxlen, __format, copy); receives the following parameters and formats into buf:
    1. A pointer to the buf we want to write to
    2. The size of that buf
    3. A format string describing the format of the resulting string (a bit of a tongue twister, yay!)
    4. The argument list, that is, the actual values to be written into the string
  3. vsnprintf_l returns an integer, which vsnprintf passes back to the caller. For the standard functions this is the number of characters the formatted string contains, excluding the trailing null character; if the buffer was too small, it is the length that would have been written, not the truncated length.
  4. Finally, va_end(copy) releases the copied argument list. Every va_copy must be matched by a va_end.
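To make the role of va_copy concrete, here is a minimal sketch of a variadic wrapper that has to walk its argument list twice: once to measure the output and once to write it. The helper name format_string is hypothetical and not part of any library; it relies only on the standard behavior that vsnprintf with a null buffer and size 0 returns the required length.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a library function): printf-style formatting
// that returns the result as a std::string.
std::string format_string(const char* fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);

    // First pass walks a copy, because a va_list must not be reused after
    // a v*printf call has consumed it.
    va_list copy;
    va_copy(copy, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);  // measure only
    va_end(copy);

    std::string out;
    if (needed > 0) {
        out.resize(static_cast<std::size_t>(needed));
        // Second pass uses the original list; +1 leaves room for the '\0'.
        std::vsnprintf(&out[0], out.size() + 1, fmt, args);
    }
    va_end(args);
    return out;
}
```

Without the copy, the second vsnprintf call would read from an already-consumed list, which is undefined behavior.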

Obviously, at this point we have fallen into a matryoshka doll: what snprintf wants done is delegated to vsnprintf, and what vsnprintf wants done is delegated to vsnprintf_l!

Why? Because writing data is involved, we must consider whether the write is safe under multi-threading.

Let’s take a look at this thread-safe vsnprintf_l:

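#include <locale.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

#define MAX_BUFFER_SIZE 1024

typedef struct {
    locale_t locale;
    char buffer[MAX_BUFFER_SIZE];
    size_t size;
} vsnprintf_data;

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

// The real parameter names are very long, so short nicknames are used here =.=
// Illustrative sketch only, not the code of any real C library.
int vsnprintf_l(char *str, size_t size, const char *format, va_list args) {
    vsnprintf_data data;
    data.size = size;
    data.locale = locale_t();
    if (format) {
        // Simplification: a real implementation receives the locale from the
        // caller; newlocale returns a null locale for an unknown name.
        data.locale = newlocale(LC_ALL_MASK, format, data.locale);
    }
    int result = vsnprintf(data.buffer, MAX_BUFFER_SIZE, format, args);
    if (result >= 0 && str && size > 0) {
        // Copy the formatted text to the caller's buffer, truncating to size.
        snprintf(str, size, "%s", data.buffer);
    }
    if (data.locale) {
        freelocale(data.locale);
    }
    return result;
}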

We see a structure, vsnprintf_data, that holds the working data for the call. Its contents are:

  1. locale_t locale: stores the locale information of the current thread
  2. buffer: stores the formatted string
  3. size: our old friend size, indicating the size of the buffer

We first defined a mutex intended to protect correctness in a multi-threaded environment (this simplified sketch declares it but never locks it). Then:

  1. An instance data of vsnprintf_data is created and its size is set to the size parameter passed in. If a format string (format) is passed, the newlocale function is called to create a new locale object, which is stored in data.locale. In this sketch the locale object is created from the passed string to support a specific locale.
  2. Next, the function calls vsnprintf to write the data into data.buffer. Here the nesting doll begins: vsnprintf formats the data according to the format string and argument list and writes the result to the buffer. If the formatted string does not fit, vsnprintf does not grow the buffer. It writes at most size - 1 characters plus the terminating null and returns the length the complete string would have had. A caller that needs the full string can check for a return value >= size, enlarge the buffer with realloc, and format again. If the buffer was large enough the first time, no reallocation is needed. When the string is no longer needed, the caller releases the memory with free.
  3. If a new locale object was created, the function releases it with freelocale. It then returns the return value of vsnprintf: the number of characters in the formatted string, excluding the trailing null character (when the output was truncated, this is larger than the number actually written).

It should be noted that snprintf never reallocates memory on its own. Growing the buffer after a truncated write is the caller's responsibility, and the retry formats the whole string again from the start rather than appending to the earlier output.
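A minimal sketch of that caller-side pattern, assuming a hypothetical helper format_point and an "(x, y)" payload chosen only for illustration: format once, compare the return value with the buffer size, and grow and retry only if the output was truncated.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats "(x, y)" starting from a buffer of
// `initial_capacity` bytes and growing it only when snprintf reports
// that the output was truncated.
std::string format_point(int x, int y, std::size_t initial_capacity) {
    std::vector<char> buf(initial_capacity > 0 ? initial_capacity : 1);

    // snprintf returns the length the full string needs, excluding '\0'.
    int n = std::snprintf(buf.data(), buf.size(), "(%d, %d)", x, y);
    if (n < 0) return std::string();  // encoding error

    if (static_cast<std::size_t>(n) >= buf.size()) {
        // Truncated: grow to the exact size and format again from scratch.
        buf.resize(static_cast<std::size_t>(n) + 1);
        std::snprintf(buf.data(), buf.size(), "(%d, %d)", x, y);
    }
    return std::string(buf.data(), static_cast<std::size_t>(n));
}
```

When the initial capacity is large enough, the second snprintf call and the reallocation are skipped entirely, which is the cheap path a reused fixed-size buffer always takes.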

It can be seen that the cost of this function depends on the complexity of parsing the format string, the number and types of the arguments, and the size and content of the resulting string.

stringstream

Oh no, I'm about to miss my chance of getting a taxi. I'll head home first and write the rest tomorrow (sob)

Update: I'm back.

Let’s look at std::stringstream again.

std::stringstream is essentially a class: it is a typedef for the class template basic_stringstream instantiated with char. Here is an excerpt of the definition:

  template <typename _CharT, typename _Traits, typename _Alloc>
    class basic_stringstream : public basic_iostream<_CharT, _Traits>
    {
    public:
      // Types:
      typedef _CharT char_type;
      typedef _Traits traits_type;
      // _GLIBCXX_RESOLVE_LIB_DEFECTS
      // 251. basic_stringbuf missing allocator_type
      typedef _Alloc allocator_type;
      typedef typename traits_type::int_type int_type;
      typedef typename traits_type::pos_type pos_type;
      typedef typename traits_type::off_type off_type;

      // Non-standard Types:
      typedef basic_string<_CharT, _Traits, _Alloc> __string_type;
      typedef basic_stringbuf<_CharT, _Traits, _Alloc> __stringbuf_type;
      typedef basic_iostream<char_type, traits_type> __iostream_type;

    private:
      __stringbuf_type _M_stringbuf;

    public:
      // Constructors/destructors
      /**
       * @brief Default constructor starts with an empty string buffer.
       * @param __m Whether the buffer can read, or write, or both.
       *
       * Initializes @c sb using the mode from @c __m, and passes @c
       * &sb to the base class initializer. Does not allocate any
       * buffer.
       *
       * That's a lie. We initialize the base class with NULL, because the
       * string class does its own memory management.
      */
      explicit
      basic_stringstream(ios_base::openmode __m = ios_base::out | ios_base::in)
      : __iostream_type(), _M_stringbuf(__m)
      { this->init(&_M_stringbuf); }

As you can see, it also owns a buffer for storing the string (_M_stringbuf, a basic_stringbuf), and it is opened with ios_base::in and ios_base::out by default so that it can be both read and written.

Since there is a buffer, we should expect this class to check whether the buffer is full and whether it needs to be cleared or reallocated. This is probably the factor that affects its performance the most.
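One detail matters when a stream is reused: clear() resets only the stream's state flags, while str("") is what actually empties the buffer. A minimal sketch, using a hypothetical write_twice function to show the difference:

```cpp
#include <sstream>
#include <string>

// Hypothetical demo: writes "abc", attempts to reset the stream, then
// writes "def" and returns whatever the buffer holds.
std::string write_twice(bool reset_contents) {
    std::stringstream ss;
    ss << "abc";
    ss.clear();                      // resets state flags only, keeps "abc"
    if (reset_contents) ss.str("");  // actually empties the buffer
    ss << "def";
    return ss.str();
}
```

Here write_twice(false) yields "abcdef" and write_twice(true) yields "def", so a reused stringstream needs both calls to start from a clean state.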

Consider three ways to write data to a stringstream:

  1. put(): writes a single character into the stringstream buffer we just saw. It checks whether the buffer still has room. If so, it writes directly; otherwise, more space is allocated first.
  2. write(): writes the specified number of characters from the given character array to the buffer.
  3. The overloaded << operator, which performs formatted output for built-in and user-defined types.

The last one is clearly the most convenient. Sigh.
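The three write paths listed above can be seen side by side in a short sketch. The function name and the values written are assumptions chosen for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical demo exercising put(), write() and operator<< on one stream.
std::string build_with_three_writes() {
    std::stringstream ss;
    ss.put('A');                    // unformatted: a single character
    ss.write("BCD", 3);             // unformatted: exactly 3 characters
    ss << '-' << 42 << '-' << 1.5;  // formatted: converts numbers to text
    return ss.str();                // "ABCD-42-1.5"
}
```

put() and write() copy characters as they are, while << also has to convert the int and the double to text, which is where most of the convenience and most of the extra work come from.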