C++ Riddle - April-05, 2022

Hi all, a new riddle!

Two questions following a discussion I had at the ACCU conference yesterday, about std::string.

[1]
Does C++ allow std::string to use Copy-on-Write (COW)?
If not: since when has it been forbidden, and what exactly forbids it?
If yes: how can you tell if the string class in your std implementation has implemented COW?

[2]
Does C++ allow std::string to use Small-String-Optimization (SSO)?
If not: since when has it been forbidden, and what exactly forbids it?
If yes: how can you tell if the string class in your std implementation has implemented SSO?

A bonus:
Same questions as above, for std::vector!

Regarding [1], I'm not sure it is the earliest rule, but: since C++17 we have a non-const data() method that gives direct access to the string's internal content, including modifications outside of our control. So once that method is called, our string instance must have its own internal representation. But since data() is declared noexcept, we can't allocate and copy the data at that point. So every instance must have its own internal representation at all times, hence no COW.
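A small sketch of the two facts this argument relies on, assuming a C++17 compiler: the non-const overload of data() returns a writable char* and is noexcept, and a write through it must not show up in other copies. The helper name write_through_data_is_private is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// C++17 added a non-const overload of std::string::data() returning char*...
static_assert(std::is_same_v<decltype(std::declval<std::string&>().data()), char*>,
              "non-const data() returns a writable pointer");
// ...and it is noexcept, so it has no way to report a failed allocation.
static_assert(noexcept(std::declval<std::string&>().data()),
              "non-const data() is noexcept");

// Hypothetical helper: writes through data() of a copy; the original must be unaffected.
bool write_through_data_is_private() {
    std::string original = "shared?";
    std::string copy = original;  // a COW string would share the buffer here
    char* p = copy.data();        // noexcept: no chance to allocate a private buffer now
    p[0] = 'S';
    return original == "shared?" && copy == "Shared?";
}
```

On a COW string, the copy would still share the original's buffer when data() is called, and a noexcept function cannot safely allocate a private copy, so the write would leak into original.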

Regarding [2], I have encountered SSO implementations, so I very much believe it is possible (unless a later version of the standard somehow changed that, which I doubt). As to how you can tell… Well, you can always look at the implementation, I guess. But for a more “generalized” solution, you can provide your own allocator to the string and see whether it is called for an empty string (or a string with one character). This way you can also detect how small the string needs to be for SSO to kick in.
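A minimal sketch of the allocator approach, assuming C++17. CountingAllocator, g_allocation_count, CountingString and longest_non_allocating_length are hypothetical names, not part of any library. The loop builds strings of increasing length and reports the longest one that never reached the allocator; on an SSO implementation that is the in-object capacity (typically 15 with libstdc++ on x86-64).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical counter: incremented every time the string asks its allocator for memory.
inline std::size_t g_allocation_count = 0;

// Minimal allocator that forwards to std::allocator but counts allocate() calls.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocation_count;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

// Stateless allocators: all instances are interchangeable.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using CountingString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Largest length (up to max_len) that was constructed without any allocation,
// or -1 if even the empty string allocated.
long longest_non_allocating_length(long max_len = 1024) {
    long longest = -1;
    for (long len = 0; len <= max_len; ++len) {
        g_allocation_count = 0;
        CountingString s(static_cast<std::size_t>(len), 'x');
        if (g_allocation_count != 0) break;
        longest = len;
    }
    return longest;
}
```

A result of 0 is not conclusive on its own: an implementation without SSO may still avoid allocating for the empty string (for example by sharing a static empty representation), which is why the one-character case is the more telling test.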


@noamw great answer and great use of the [spoiler] tag!
We are still looking for additional answers.

Regarding 1: it’s not allowed by the standard.

The standard limits which operations may invalidate iterators and references into a string. With a COW string, calling non-const operator[] would require making a copy (and thus invalidating references), which is disallowed. For example:

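```cpp
#include <string>

int main() {
    std::string obj1("Copy On Write");
    char &r1 = obj1[0];      // non-const reference into obj1's buffer
    std::string obj2(obj1);  // a COW string would share obj1's buffer here
    char &r2 = obj1[1];      // ...so this would have to unshare, invalidating r1
}
```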

r1 is a reference into obj1. You then “copy” obj1. Then, when you take the second reference, a COW implementation has to make a copy to hand out a non-const reference, since two strings now point to the same buffer. That would invalidate the first reference, which is against the standard's rules on which basic_string operations may invalidate references ([string.require]).
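To see the problem concretely, here is a hypothetical toy CowString. It is not how any real library implements COW; it wraps a std::string in a std::shared_ptr purely for brevity. Its non-const operator[] detaches from a shared buffer before handing out a reference, which is exactly the step that leaves an earlier reference pointing somewhere else.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy copy-on-write string (illustration only).
class CowString {
public:
    explicit CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // Copying shares the buffer: that is the "copy on write" part.
    CowString(const CowString&) = default;

    // Read-only access never needs to copy.
    char operator[](std::size_t i) const { return (*buf_)[i]; }

    // Non-const access must unshare first, otherwise writes through the
    // returned reference would be visible in every copy.
    char& operator[](std::size_t i) {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // detach to a fresh buffer
        }
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::string> buf_;
};

// Replays the obj1/obj2 example; returns true if r1 ended up aliasing obj2.
bool first_reference_now_aliases_copy() {
    CowString obj1("Copy On Write");
    char& r1 = obj1[0];    // obj1 is not shared yet, so no detach
    CowString obj2(obj1);  // obj1 and obj2 now share one buffer
    char& r2 = obj1[1];    // shared, so obj1 detaches to a fresh buffer
    (void)r2;
    r1 = 'X';              // writes into the old buffer, now owned only by obj2
    const CowString& c1 = obj1;
    const CowString& c2 = obj2;
    return c1[0] == 'C' && c2[0] == 'X';
}
```

With this class, a write through r1 changes obj2 instead of obj1. That is the silent invalidation the standard's rules prevent for std::string.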

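Regarding 2: SSO is allowed, and the major std::string implementations use it. The exact layout is implementation- and architecture-dependent. For example, with libstdc++ on x86-64, a std::string occupies 4 words (32 bytes), and short strings are stored in a 16-byte in-object buffer. One byte is needed for the null terminator (and the implementation also needs some way to tell whether the short buffer is in use), so up to 15 characters fit without a heap allocation.
So, for example on an x86 machine, SSO is easy to detect.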
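One way to check, sketched under the assumption of a mainstream implementation: with SSO, a short string's characters live inside the std::string object itself, so data() points into the object's own bytes. stored_in_place and empty_string_capacity are hypothetical helper names; converting pointers to std::uintptr_t and comparing them is implementation-specific, so treat this as a diagnostic heuristic rather than portable code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Heuristic: true if the characters of `s` are stored inside the std::string
// object itself, which is what an SSO implementation does for short strings.
bool stored_in_place(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(s);
    const auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= object_begin && chars < object_end;
}

// Capacity of a default-constructed string: on SSO implementations this is the
// in-object buffer size minus the null terminator (e.g. 15 with libstdc++ on x86-64).
std::size_t empty_string_capacity() {
    return std::string().capacity();
}
```

With SSO, stored_in_place on a two-character string is expected to be true, while a string of a thousand characters cannot fit in the object and gives false.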

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

A proof-of-concept string that uses all available bytes for SSO
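One known trick for using every byte of the object is to store the remaining capacity in the last byte of the buffer: when the string is exactly full, that count is 0 and doubles as the null terminator, so a 24-byte buffer holds 23 characters. The hypothetical TinyString below shows only this inline part of the idea; a real string would also need a heap representation and a way to tell the two apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical fixed-capacity string illustrating the "remaining capacity in
// the last byte" trick. No heap fallback: assign() fails for long inputs.
class TinyString {
public:
    static constexpr std::size_t kBytes = 24;           // e.g. 3 words on a 64-bit target
    static constexpr std::size_t kMaxLen = kBytes - 1;  // 23 characters

    TinyString() { buf_[kMaxLen] = static_cast<char>(kMaxLen); }  // empty: 23 bytes free

    // Copies `s` into the buffer; returns false and leaves the string unchanged if it doesn't fit.
    bool assign(std::string_view s) {
        if (s.size() > kMaxLen) return false;
        s.copy(buf_, s.size());
        buf_[s.size()] = '\0';
        // Remaining capacity; when s.size() == kMaxLen this writes 0 into the
        // same byte as the terminator written above.
        buf_[kMaxLen] = static_cast<char>(kMaxLen - s.size());
        return true;
    }

    std::size_t size() const {
        return kMaxLen - static_cast<unsigned char>(buf_[kMaxLen]);
    }
    const char* c_str() const { return buf_; }

private:
    char buf_[kBytes] = {};  // starts zeroed, so c_str() of an empty string is ""
};
```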