On Sat, 10 Feb 2007 19:06:14 -0800, websnarf wrote:
<snip>
> That's actually a follow up involving Michael B. Allen. (It all
> started when he made the ridiculous claim that the C std library could
> not be beat in terms of performance on strings.)
>
> This is the original thread that I was thinking of:
>
> http://groups.google.com/group/comp....3b05cfb3818d7/
>
Ridiculous? Michael's comments echo exactly my experience. I write network
servers day-in and day-out. The standard library functions work great
for strings, memcpy() in particular. snprintf() is also very useful, but
like Michael pointed out I often deal with string parsing in situ, using
FSMs. I've written stateful base64, base16, base32, URL
percent decoders/encoders. Instead of allocating memory willy-nilly, as
much as possible I write all of my code to deal with "strings" in terms of
streams, not discrete objects. I've even written a streaming MIME parser.
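
To make the "streams, not discrete objects" point concrete, here is a minimal sketch of a stateful base16 decoder. It is not my production code; the names hexdec and hexdec_update are made up for illustration, and it assumes an ASCII-compatible character set. The state struct carries a dangling nibble across calls, so input can arrive in arbitrary chunks and nothing is ever allocated.

```c
#include <stddef.h>

/* Hypothetical decoder state: remembers a pending high nibble between calls. */
struct hexdec {
    int have_hi;       /* 1 if a high nibble is waiting for its partner */
    unsigned char hi;  /* the pending high nibble (0..15) */
};

/* Map one hex digit to its value, or -1 if it is not a hex digit. */
static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode src[0..srclen) into dst and return the number of bytes written,
 * or (size_t)-1 on an invalid digit. dst needs room for (srclen + 1) / 2
 * bytes. An odd trailing digit stays in *st until the next call.
 */
size_t hexdec_update(struct hexdec *st, const unsigned char *src,
                     size_t srclen, unsigned char *dst)
{
    size_t n = 0;

    for (size_t i = 0; i < srclen; i++) {
        int v = hexval(src[i]);

        if (v < 0)
            return (size_t)-1;
        if (st->have_hi) {
            dst[n++] = (unsigned char)((st->hi << 4) | v);
            st->have_hi = 0;
        } else {
            st->hi = (unsigned char)v;
            st->have_hi = 1;
        }
    }
    return n;
}
```

Feed it whatever read() hands you, chunk by chunk, with a zero-initialized struct hexdec. The same shape works for base64 and percent-decoding; only the per-character state machine changes.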

Actually, if a committee wanted to standardize something they'd be better
off standardizing a vector structure (not unlike the bstring package, just
less distasteful). As was discussed in the above thread, the problem
with C strings is that everybody and their grandmother has invented some
kind of wrapper which does the same thing using different names.

Personally, of late I often make use of the iovec structure from POSIX's
sys/uio.h. If I were to revamp string handling in C I'd make two modifications.

1) Add an svec structure: struct svec { unsigned char *sv_base;
   size_t sv_len; };
2a) Mandate that (char) be unsigned (very radical), or
2b) Mandate that (char) and (unsigned char) be interchangeable without
    maddening signedness compiler warnings in GCC-4.1, and add a new type
    uchar (typedef unsigned char uchar), 'cause I'm lazy.
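
A rough sketch of what working with such an svec might look like. The struct is the one proposed above (it mirrors POSIX's struct iovec, which pairs iov_base with iov_len); the helpers svec_eq and svec_token are hypothetical names invented for this example, not part of any existing library. The point is that tokenizing happens in situ: nothing is copied and no NUL terminators are written into the buffer.

```c
#include <stddef.h>
#include <string.h>

/* The proposed vector: a pointer plus a length, no terminator needed. */
struct svec {
    unsigned char *sv_base;
    size_t sv_len;
};

/* Hypothetical helper: nonzero if both vectors hold the same bytes. */
int svec_eq(struct svec a, struct svec b)
{
    return a.sv_len == b.sv_len &&
           (a.sv_len == 0 || memcmp(a.sv_base, b.sv_base, a.sv_len) == 0);
}

/*
 * Hypothetical helper: return the part of *rest before the first delim
 * and advance *rest past the delimiter. If delim is absent, the whole
 * remainder is returned and *rest becomes empty.
 */
struct svec svec_token(struct svec *rest, unsigned char delim)
{
    struct svec tok = *rest;
    unsigned char *p = NULL;

    if (rest->sv_len > 0)
        p = memchr(rest->sv_base, delim, rest->sv_len);

    if (p != NULL) {
        tok.sv_len = (size_t)(p - rest->sv_base);
        rest->sv_base = p + 1;
        rest->sv_len -= tok.sv_len + 1;
    } else {
        if (rest->sv_len > 0)
            rest->sv_base += rest->sv_len;
        rest->sv_len = 0;
    }
    return tok;
}
```

Splitting "GET /index.html" on a space this way yields two vectors pointing into the original buffer, which is exactly what you want when parsing a request line off the wire.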

Then, if I wanted to push my luck, I'd deprecate all the wide-character
interfaces, and include a small suite of functions for manipulating UTF-8
encoded Unicode strings. I might start by adding a struct uvec, something
like: struct uvec { uchar *uv_base, [a bunch of opaque members,
noticeably and intentionally missing uv_len] }.
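
For a flavor of what such a suite might contain, here is a minimal sketch of two primitives. The names utf8_decode and utf8_count are hypothetical, and I'm deliberately not guessing at the opaque uvec members; these just take a base pointer and a byte count. One reading of why a plain uv_len is awkward: with UTF-8 the byte count and the code point count differ, as the second function shows.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one code point from s[0..n). Returns the number of bytes
 * consumed (1 to 4), or 0 if the sequence is truncated, malformed,
 * overlong, a surrogate, or beyond U+10FFFF.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                    /* single-byte (ASCII) */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 2-byte lead */
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 3-byte lead */
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 4-byte lead */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                         /* stray continuation or bad lead */
    }
    if (n < len)
        return 0;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Count code points in s[0..n); returns (size_t)-1 on malformed input. */
size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t count = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);

        if (used == 0)
            return (size_t)-1;
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```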