Aliasing in C++11
Based on my reading of the standard, the compiler is free to assume that a pointer to a strongly typed enumeration aliases only other pointers to the same type and raw character pointers. For example, I would like to define a more restrictive byte array for interacting with binary data as follows:

    enum byte : uint8_t {};
    std::vector<byte> buffer;

Specifically, I believe string-like classes could benefit greatly from this sort of implementation by defining their internal state using definitions similar to the above:

    class string
    {
        ...
    private:
        enum byte : char {};
        std::unique_ptr<byte[]> buffer;
    };

The tests I've performed with GCC support the above interpretation. Does a strict reading of the standard support the above, and/or is there some other well-known alternative?

Thanks in advance,
-molw5
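The aliasing assumption described in the post above can be illustrated with a minimal sketch. It assumes the poster's reading of the strict aliasing rules is correct; the names `stats`, `fill_chars`, `fill_bytes` and `demo` are hypothetical and do not come from the thread. Whether a given compiler actually exploits the difference is not something this sketch demonstrates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// The byte type from the post: a distinct enumeration type, not a character type.
enum byte : std::uint8_t {};

static_assert(sizeof(byte) == 1, "byte has the size of its underlying type");
static_assert(!std::is_same<byte, unsigned char>::value,
              "byte is a distinct type from unsigned char");

// Hypothetical bookkeeping state that lives next to a buffer.
struct stats {
    std::size_t written;
};

// Stores through unsigned char* may legally modify s.written, so the compiler
// generally has to reload and re-store s.written on every iteration.
void fill_chars(unsigned char* out, std::size_t len, stats& s) {
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = 0;
        ++s.written;
    }
}

// Under the reading discussed in the post, stores through byte* cannot legally
// modify a std::size_t object, so s.written may be kept in a register.
void fill_bytes(byte* out, std::size_t len, stats& s) {
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = byte(0);
        ++s.written;
    }
}

// Usage: fill a std::vector<byte> as in the post and return the store count.
std::size_t demo() {
    std::vector<byte> buffer(16, byte(0xFF));
    stats s{0};
    fill_bytes(buffer.data(), buffer.size(), s);
    assert(s.written == buffer.size());
    return s.written;
}
```

Both functions have the same observable behaviour; the only expected difference is in how much state the optimizer may keep live across the stores.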
Re: Aliasing in C++11
On Thursday, 21 February 2013 23:27:25 UTC+2, molw...@gmail.com wrote:
> Based on my reading of the standard, the compiler is free to assume that a
> pointer to a strongly typed enumeration aliases only other pointers to the
> same type and raw character pointers. For example, I would like to define a
> more restrictive byte array for interacting with binary data as follows:
>
>     enum byte : uint8_t {};

No, you are wrong. Yours is a traditional enum with an underlying type. This is a strongly typed enum:

    enum class byte : uint8_t {};

>     std::vector<byte> buffer;

Even with your 'byte' it is definitely more restrictive.

> Specifically, I believe string-like classes could benefit greatly from this
> sort of implementation by defining their internal state using definitions
> similar to the above:
>
>     class string
>     {
>         ...
>     private:
>         enum byte : char {};
>         std::unique_ptr<byte[]> buffer;
>     };

Perhaps do not write yet another text-containing class. The market is full; there are far too many of them.

> The tests I've performed with GCC support the above interpretation. Does a
> strict reading of the standard support the above, and/or is there some other
> well-known alternative? Thanks in advance,

What is the "this"? It should work. Currently most people use std::string (that actually contains UTF-8 encoded text) for storing texts. I fully agree with you that it is a loose and unsafe thing. However, it is unlikely that some revolution is coming. Billions of lines of code and millions of interfaces all over the world use std::string, and the problems are consistently elsewhere.
Re: Aliasing in C++11
On Thursday, February 21, 2013 2:48:20 PM UTC-7, Öö Tiib wrote:
> No, you are wrong. Yours is a traditional enum with an underlying type.
> This is a strongly typed enum:
>
>     enum class byte : uint8_t {};

Apologies - nevertheless, in this context the distinction is irrelevant (the enumeration has no members).

> What is the "this"? It should work. Currently most people use std::string
> (that actually contains UTF-8 encoded text) for storing texts. I fully
> agree with you that it is a loose and unsafe thing. However, it is unlikely
> that some revolution is coming. Billions of lines of code and millions of
> interfaces all over the world use std::string, and the problems are
> consistently elsewhere.

It does work. In the context of serialization, however, writes to the buffer almost always invalidate every other state, as the underlying char* pointer may alias everything (including the pointer itself). The compiler is almost never able to inline to the point where it can resolve these sorts of aliasing problems. I was asking whether or not another solution was commonly used to define a raw character array (string, buffer, vector, what have you) with stronger aliasing properties similar to the above. Clearly this solution is C++11 specific, and I'd imagine others have attempted to address this problem in the past.

Clearly the above could be used to define other primitive-equivalent types with stronger aliasing properties; string is merely the most interesting, as the external interface need not change (as char* may still alias byte*). I believe it would be possible to write a standard-conforming string library that uses such a byte definition, freeing the compiler to maintain state across writes to the string; I'm not, at present, planning to write one myself.
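The remark that a char* write "may alias everything (including the pointer itself)" can be made concrete with a small sketch. The `writer` type and its member names are hypothetical, and the local-copy variant is a commonly seen workaround rather than anything proposed in this thread; it is offered only as one possible answer to the "well-known alternative" question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical little-endian serializer; names are illustrative only.
struct writer {
    unsigned char* cur;  // write cursor into a caller-owned buffer

    // Every store goes through an unsigned char lvalue, which may alias any
    // object, including this->cur. Unless the compiler can prove otherwise,
    // it has to treat cur as possibly modified after each store.
    void put_u32_member(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            *cur++ = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
    }

    // Workaround: p is a local whose address is never taken, so no store
    // through it can modify p itself; the member is written back once.
    void put_u32_local(std::uint32_t v) {
        unsigned char* p = cur;
        for (int i = 0; i < 4; ++i)
            *p++ = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
        cur = p;
    }
};

// Both variants produce the same bytes; only the generated code may differ.
bool same_output(std::uint32_t v) {
    unsigned char a[4] = {}, b[4] = {};
    writer wa{a}, wb{b};
    wa.put_u32_member(v);
    wb.put_u32_local(v);
    assert(wa.cur == a + 4 && wb.cur == b + 4);
    for (int i = 0; i < 4; ++i)
        if (a[i] != b[i]) return false;
    return true;
}
```

The local copy only helps inside a single function body; it does not give the buffer type itself stronger aliasing properties, which is what the enum-based byte type is meant to achieve.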
Re: Aliasing in C++11
On Friday, 22 February 2013 01:04:15 UTC+2, molw...@gmail.com wrote:
> On Thursday, February 21, 2013 2:48:20 PM UTC-7, Öö Tiib wrote:
> > No, you are wrong. Yours is a traditional enum with an underlying type.
> > This is a strongly typed enum:
> >
> >     enum class byte : uint8_t {};
>
> Apologies - nevertheless, in this context the distinction is irrelevant
> (the enumeration has no members).

It is somewhat relevant. By the language rules, a value of an enum class type does not implicitly convert to values of integral types. A traditional enum does. The lack of named enumerators actually does not matter, since an enum may hold all the values of its underlying type regardless of whether an enumerator for a particular value exists.

> > What is the "this"? It should work. Currently most people use std::string
> > (that actually contains UTF-8 encoded text) for storing texts. I fully
> > agree with you that it is a loose and unsafe thing. However, it is unlikely
> > that some revolution is coming. Billions of lines of code and millions of
> > interfaces all over the world use std::string, and the problems are
> > consistently elsewhere.
>
> It does work. In the context of serialization, however, writes to the buffer
> almost always invalidate every other state, as the underlying char* pointer
> may alias everything (including the pointer itself). The compiler is almost
> never able to inline to the point where it can resolve these sorts of
> aliasing problems. I was asking whether or not another solution was commonly
> used to define a raw character array (string, buffer, vector, what have you)
> with stronger aliasing properties similar to the above. Clearly this solution
> is C++11 specific, and I'd imagine others have attempted to address this
> problem in the past.

A lot of people certainly have. It is very likely that you can find something already implemented. I in fact haven't. I use std::string for text and std::vector<char> for byte buffers. I know it is unsafe, so I am more careful. The benefit is that the majority of libraries and tools support types like that. I would have to waste performance on conversions when using something else.

> Clearly the above could be used to define other primitive-equivalent types
> with stronger aliasing properties; string is merely the most interesting,
> as the external interface need not change (as char* may still alias byte*).
> I believe it would be possible to write a standard-conforming string
> library that uses such a byte definition, freeing the compiler to maintain
> state across writes to the string; I'm not, at present, planning to write
> one myself.

It feels that you are correct that it is possible. However ... writing a standard-conforming string library does not seem to have any point whatsoever. The standard currently requires std::string to be externally as loose and unsafe as it is. So the only thing possible is to make it internally more efficient for a particular purpose, not safer. It is unlikely to make a major difference in efficiency either, since there are a lot of different implementations of std::string already floating around, as there are a lot of other text-containing and text-managing libraries and classes for any purpose imaginable.
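The conversion difference between the two kinds of enumeration mentioned above can be checked at compile time. This is a small sketch; the type names `plain_byte` and `scoped_byte` and the helper functions are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum plain_byte : std::uint8_t {};         // unscoped enum with a fixed underlying type
enum class scoped_byte : std::uint8_t {};  // scoped ("strongly typed") enum

// An unscoped enum converts implicitly to an integral type; a scoped one does not.
static_assert(std::is_convertible<plain_byte, int>::value,
              "unscoped enum converts implicitly to int");
static_assert(!std::is_convertible<scoped_byte, int>::value,
              "scoped enum requires an explicit cast");
static_assert(std::is_same<std::underlying_type<scoped_byte>::type,
                           std::uint8_t>::value,
              "underlying type is the one that was specified");

// With a fixed underlying type, every value of that type is a valid enum
// value even though no enumerators are declared.
scoped_byte from_raw(std::uint8_t v) { return static_cast<scoped_byte>(v); }
std::uint8_t to_raw(scoped_byte b) { return static_cast<std::uint8_t>(b); }

// Round-trips all 256 values through the enumeration type.
void check_round_trip() {
    for (int v = 0; v <= 255; ++v)
        assert(to_raw(from_raw(static_cast<std::uint8_t>(v))) == v);
}
```

The explicit casts are the price of the scoped form; they also make every crossing between raw integers and the byte type visible in the source.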
Re: Aliasing in C++11
On Thursday, February 21, 2013 4:36:09 PM UTC-7, Öö Tiib wrote:
> It is somewhat relevant. By the language rules, a value of an enum class
> type does not implicitly convert to values of integral types. A traditional
> enum does. The lack of named enumerators actually does not matter, since an
> enum may hold all the values of its underlying type regardless of whether an
> enumerator for a particular value exists.

Agreed - I'm still not seeing the relevance to this topic.

> A lot of people certainly have. It is very likely that you can find
> something already implemented. I in fact haven't. I use std::string for text
> and std::vector<char> for byte buffers. I know it is unsafe, so I am more
> careful. The benefit is that the majority of libraries and tools support
> types like that. I would have to waste performance on conversions when using
> something else.

Like I said - still looking for additional information. Thank you for the response.

> It feels that you are correct that it is possible. However ... writing a
> standard-conforming string library does not seem to have any point
> whatsoever. The standard currently requires std::string to be externally as
> loose and unsafe as it is. So the only thing possible is to make it
> internally more efficient for a particular purpose, not safer. It is unlikely
> to make a major difference in efficiency either, since there are a lot of
> different implementations of std::string already floating around, as there
> are a lot of other text-containing and text-managing libraries and classes
> for any purpose imaginable.

The advantage is that the compiler is able to maintain state across string writes, as I mentioned above; that alters the performance of user code. Obviously the impact is domain specific - what isn't?
Re: Aliasing in C++11
On Friday, 22 February 2013 04:13:07 UTC+2, molw...@gmail.com wrote:
> The advantage is that the compiler is able to maintain state across string
> writes, as I mentioned above; that alters the performance of user code.
> Obviously the impact is domain specific - what isn't?

I am still unsure why the compiler cannot already optimize away any aliasing checks by simply assuming that you do not somehow use the underlying buffer of the std::string or std::vector<char> in question as storage for some other objects possibly involved in your domain-specific solution.
Re: Aliasing in C++11
On Thursday, February 21, 2013 8:57:57 PM UTC-7, Öö Tiib wrote:
> I am still unsure why the compiler cannot already optimize away any aliasing
> checks by simply assuming that you do not somehow use the underlying buffer
> of the std::string or std::vector<char> in question as storage for some
> other objects possibly involved in your domain-specific solution.

I honestly don't know how to respond to that. Review the strict aliasing rules?
Re: String is not UTF (was Re: Aliasing in C++11)
On Friday, 22 February 2013 17:13:52 UTC+2, Andy Champ wrote:
> On 21/02/2013 21:48, Öö Tiib wrote:
> > What is the "this"? It should work. Currently most people use std::string
> > (that actually contains UTF-8 encoded text) for storing texts. I fully
> > agree with you that it is a loose and unsafe thing. However, it is unlikely
> > that some revolution is coming. Billions of lines of code and millions of
> > interfaces all over the world use std::string, and the problems are
> > consistently elsewhere.
>
> std::string does not contain UTF-8 encoded text. It contains chars. If
> your implementation treats those chars as UTF-8 encoded characters, then
> fine - but that is NOT part of the standard, it's just something that
> *nix operating systems tend to do.

I did in fact describe the most widespread practice. By the C++ standard a char is a byte; keep whatever encoding you like there, the standard is silent. The other possibility is to use std::wstring for texts if wchar_t can contain UTF-16LE. It might help on Windows or with QT as the GUI. That is a minority anyway, maybe 20% of the C++ code written.

> You might like to consider what happens when you resize a string to
> remove part of a multibyte character. There's nothing there to make it
> UTF safe...

There are no alternatives. Such difficulties, and all others, are normal work. That is what developers are for.

> I suspect this is why fstream::open takes a char* - someone assumed that
> a char* was utf-8, and for those operating systems where a filename is
> unicode it's broken.

I repeat ... there is no serious support for Unicode in C++. fstream was likely designed when no one thought that file names could be anything but ASCII. UTF-8 is the most popular encoding. The majority of the HTML and other XML you see on the internet is in it. So it makes sense to use something that you do not have to convert.
Re: Aliasing in C++11
On Friday, 22 February 2013 06:12:59 UTC+2, molw...@gmail.com wrote:
> On Thursday, February 21, 2013 8:57:57 PM UTC-7, Öö Tiib wrote:
> > I am still unsure why the compiler cannot already optimize away any
> > aliasing checks by simply assuming that you do not somehow use the
> > underlying buffer of the std::string or std::vector<char> in question as
> > storage for some other objects possibly involved in your domain-specific
> > solution.
>
> I honestly don't know how to respond to that. Review the strict aliasing
> rules?

It all seems to be about storage obtained with malloc(). It feels that if you use the underlying buffer of a std::string or std::vector<char> for odd purposes, then you are on your own anyway. I can't find that a standard-compliant compiler is required to expect that a std::string::iterator and a double* may point to the same thing.

So ... what you do seems more and more domain-specific.
Re: Aliasing in C++11
On Friday, February 22, 2013 9:19:28 AM UTC-7, Öö Tiib wrote:
> It all seems to be about storage obtained with malloc(). It feels that if
> you use the underlying buffer of a std::string or std::vector<char> for odd
> purposes, then you are on your own anyway. I can't find that a
> standard-compliant compiler is required to expect that a
> std::string::iterator and a double* may point to the same thing.
>
> So ... what you do seems more and more domain-specific.

I don't know why I'm still replying to this - std::string::iterator contains a raw character pointer or an offset into its buffer. The compiler is forced to assume that the write itself may alias a double*.