Velocity Reviews

molw5.iwg@gmail.com 02-21-2013 09:27 PM

Aliasing in C++11
 
Based on my reading of the standard the compiler is free to assume a pointer to
a strongly typed enumeration aliases only other pointers to the same type and
raw character pointers. For example, I would like to define a more restrictive
byte array for interacting with binary data as follows:

enum byte : uint8_t {};
std::vector <byte> buffer;

Specifically, I believe string-like classes could benefit greatly from this
sort of implementation by defining their internal state using definitions
similar to the above:

class string
{
...
private:
enum byte : char {};
std::unique_ptr <byte[]> buffer;
};

The tests I've performed with GCC support the above interpretation. Does a
strict reading of the standard support the above, and/or is there some other
well-known alternative? Thanks in advance,

-molw5

Öö Tiib 02-21-2013 09:48 PM

Re: Aliasing in C++11
 
On Thursday, 21 February 2013 23:27:25 UTC+2, molw...@gmail.com wrote:
> Based on my reading of the standard the compiler is free to assume a pointer
> to a strongly typed enumeration aliases only other pointers to the same type
> and raw character pointers. For example, I would like to define a more
> restrictive byte array for interacting with binary data as follows:
>
> enum byte : uint8_t {};


No, you are wrong. Yours is a traditional enum with an underlying type. This is
a strongly typed enum:

enum class byte : uint8_t {};

> std::vector <byte> buffer;


Even with your 'byte' it is definitely more restrictive.

> Specifically, I believe string-like classes could benefit greatly from this
> sort of implementation by defining their internal state using definitions
> similar to the above:
>
> class string
> {
> ...
> private:
> enum byte : char {};
> std::unique_ptr <byte[]> buffer;
> };


Perhaps do not write yet another text-containing class. The market is full;
there are far too many of them already.

> The tests I've performed with GCC support the above interpretation – does a
> strict reading of the standard support the above, and/or is there some other
> well-known alternative? Thanks in advance,


What is the "this"? It should work. Currently most people use std::string
(that actually contains UTF-8 encoded text) for storing text. I fully
agree with you that it is a loose and unsafe thing. However, it is unlikely
that some revolution is coming. Billions of lines of code and millions of
interfaces all over the world use std::string, and the problems are
consistently elsewhere.


molw5.iwg@gmail.com 02-21-2013 11:04 PM

Re: Aliasing in C++11
 
On Thursday, February 21, 2013 2:48:20 PM UTC-7, Öö Tiib wrote:
> No, you are wrong. Yours is a traditional enum with an underlying type. This is
> a strongly typed enum:
>
> enum class byte : uint8_t {};


Apologies; nevertheless, in this context the distinction is irrelevant (the
enumeration has no members).

> What is the "this"? It should work. Currently most people use std::string
> (that actually contains UTF-8 encoded text) for storing text. I fully
> agree with you that it is a loose and unsafe thing. However, it is unlikely
> that some revolution is coming. Billions of lines of code and millions of
> interfaces all over the world use std::string, and the problems are
> consistently elsewhere.


It does work. In the context of serialization, however, writes to the buffer
almost always invalidate every other piece of cached state, as the underlying
char* pointer may alias everything (including the pointer itself). The compiler
is almost never able to inline to the point where it can resolve this sort of
aliasing problem. I was asking whether another solution is commonly used to
define a raw character array (string, buffer, vector, what have you) with
stronger aliasing properties similar to the above. Clearly this solution is
C++11 specific, and I'd imagine others have attempted to address this problem in
the past.
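
To make the concern concrete, here is a minimal sketch that is not from the
thread; char_writer, byte_writer and fill are hypothetical names. It assumes
the reading of the aliasing rules proposed in the first post: a store through
char* may modify any object, so w.data and w.size have to be re-read on every
iteration, whereas a store through a pointer to a distinct enumeration type
may only modify objects of that type. Whether a given compiler actually takes
advantage of this is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The enumeration proposed in the first post: a distinct type, one byte wide.
enum byte : std::uint8_t {};

// Hypothetical buffer views used only for this illustration.
struct char_writer { char* data; std::size_t size; };
struct byte_writer { byte* data; std::size_t size; };

// Each store goes through a char lvalue, which is allowed to alias any
// object, including w.data and w.size themselves. The compiler therefore
// has to assume both members may change inside the loop.
void fill(char_writer& w, char value) {
    assert(w.data != nullptr || w.size == 0);
    for (std::size_t i = 0; i < w.size; ++i)
        w.data[i] = value;
}

// Each store goes through a byte lvalue. Under the strict aliasing rules it
// cannot legally modify a std::size_t or a byte* object, so the compiler is
// free (but not obliged) to keep w.data and w.size in registers.
void fill(byte_writer& w, byte value) {
    assert(w.data != nullptr || w.size == 0);
    for (std::size_t i = 0; i < w.size; ++i)
        w.data[i] = value;
}
```

Comparing the optimized assembly of the two overloads is one way to see what
a particular compiler does with this.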

Clearly the above could be used to define other primitive-equivalent types with
stronger aliasing properties, string is merely the most interesting as the
external interface need not change (as char* may still alias byte*). I believe
it would be possible to write a standard conforming string library that uses
such a byte definition, freeing the compiler to maintain state across writes to
the string; I'm not, at present, planning to write one myself.

Öö Tiib 02-21-2013 11:36 PM

Re: Aliasing in C++11
 
On Friday, 22 February 2013 01:04:15 UTC+2, molw...@gmail.com wrote:
> On Thursday, February 21, 2013 2:48:20 PM UTC-7, Öö Tiib wrote:
> > No, you are wrong. Yours is a traditional enum with an underlying type. This is
> > a strongly typed enum:
> >
> > enum class byte : uint8_t {};
>
> Apologies; nevertheless, in this context the distinction is irrelevant
> (the enumeration has no members).


It is somewhat relevant. By the language rules, a value of an enum class type
does not implicitly convert to integral types; a value of a traditional enum
does. The lack of named enumerators does not actually matter, since an enum
with a fixed underlying type can hold every value of that type regardless of
whether an enumerator exists for a particular value.
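
The difference can be checked at compile time. A small sketch follows;
plain_byte, scoped_byte, to_unsigned and check_full_range are illustrative
names that do not appear in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Same underlying type, no enumerators; only the "class" keyword differs.
enum       plain_byte  : std::uint8_t {};
enum class scoped_byte : std::uint8_t {};

// A traditional (unscoped) enum converts to int implicitly ...
static_assert(std::is_convertible<plain_byte, int>::value,
              "unscoped enum converts implicitly");
// ... a scoped enum does not.
static_assert(!std::is_convertible<scoped_byte, int>::value,
              "scoped enum needs an explicit cast");

inline unsigned to_unsigned(plain_byte b)  { return b; }                        // implicit
inline unsigned to_unsigned(scoped_byte b) { return static_cast<unsigned>(b); } // explicit

// Neither type names an enumerator, yet both can hold any uint8_t value.
inline void check_full_range() {
    assert(to_unsigned(static_cast<plain_byte>(200)) == 200u);
    assert(to_unsigned(static_cast<scoped_byte>(200)) == 200u);
}
```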

> > What is the "this"? It should work. Currently most people use std::string
> > (that actually contains UTF-8 encoded text) for storing text. I fully
> > agree with you that it is a loose and unsafe thing. However, it is unlikely
> > that some revolution is coming. Billions of lines of code and millions of
> > interfaces all over the world use std::string, and the problems are
> > consistently elsewhere.
>
> It does work. In the context of serialization, however, writes to the buffer
> almost always invalidate every other piece of cached state, as the underlying
> char* pointer may alias everything (including the pointer itself). The
> compiler is almost never able to inline to the point where it can resolve
> this sort of aliasing problem. I was asking whether another solution is
> commonly used to define a raw character array (string, buffer, vector, what
> have you) with stronger aliasing properties similar to the above. Clearly
> this solution is C++11 specific, and I'd imagine others have attempted to
> address this problem in the past.


A lot of people certainly have. It is very likely that you can find something
already implemented. I in fact haven't. I use std::string for text and
std::vector<char> for byte buffers. I know it is unsafe, so I am more
careful. The reason I do it is that the majority of libraries and tools
support types like that. I would have to waste performance on conversions
when using something else.

> Clearly the above could be used to define other primitive-equivalent types
> with stronger aliasing properties, string is merely the most interesting
> as the external interface need not change (as char* may still alias byte*).
> I believe it would be possible to write a standard conforming string
> library that uses such a byte definition, freeing the compiler to maintain
> state across writes to the string; I'm not, at present, planning to write
> one myself.


It feels that you are correct that it is possible. However ... writing a
standard-conforming string library does not seem to have any point whatsoever.
The standard currently requires std::string to be externally as loose and
unsafe as it is. So the only thing possible is to make it internally more
efficient for a particular purpose, not safer. It is unlikely to make a
major difference in efficiency either, since there are already a lot of
different implementations of std::string floating around, as well as a lot
of other text-containing and text-managing libraries and classes for any
purpose imaginable.

molw5.iwg@gmail.com 02-22-2013 02:13 AM

Re: Aliasing in C++11
 
On Thursday, February 21, 2013 4:36:09 PM UTC-7, Öö Tiib wrote:
> It is somewhat relevant. By the language rules, a value of an enum class type
> does not implicitly convert to integral types; a value of a traditional enum
> does. The lack of named enumerators does not actually matter, since an enum
> with a fixed underlying type can hold every value of that type regardless of
> whether an enumerator exists for a particular value.


Agreed – I'm still not seeing the relevance to this topic.

> A lot of people certainly have. It is very likely that you can find something
> already implemented. I in fact haven't. I use std::string for text and
> std::vector<char> for byte buffers. I know it is unsafe, so I am more
> careful. The reason I do it is that the majority of libraries and tools
> support types like that. I would have to waste performance on conversions
> when using something else.


Like I said – still looking for additional information. Thank you for the
response.

> It feels that you are correct that it is possible. However ... writing a
> standard-conforming string library does not seem to have any point whatsoever.
> The standard currently requires std::string to be externally as loose and
> unsafe as it is. So the only thing possible is to make it internally more
> efficient for a particular purpose, not safer. It is unlikely to make a
> major difference in efficiency either, since there are already a lot of
> different implementations of std::string floating around, as well as a lot
> of other text-containing and text-managing libraries and classes for any
> purpose imaginable.


The advantage is the compiler is able to maintain state across string writes,
as I mentioned above; that alters the performance of user code. Obviously the
impact is domain specific – what isn't?

Öö Tiib 02-22-2013 03:57 AM

Re: Aliasing in C++11
 
On Friday, 22 February 2013 04:13:07 UTC+2, molw...@gmail.com wrote:
> The advantage is the compiler is able to maintain state across string writes,
> as I mentioned above; that alters the performance of user code. Obviously the
> impact is domain specific – what isn't?


I am still unsure why the compiler cannot already optimize away any aliasing
checks by simply assuming that you do not use the underlying buffer of the
std::string or std::vector<char> in question as storage for other objects
involved in your domain-specific solution.


molw5.iwg@gmail.com 02-22-2013 04:12 AM

Re: Aliasing in C++11
 
On Thursday, February 21, 2013 8:57:57 PM UTC-7, Öö Tiib wrote:
> I am still unsure why the compiler cannot already optimize away any aliasing
> checks by simply assuming that you do not use the underlying buffer of the
> std::string or std::vector<char> in question as storage for other objects
> involved in your domain-specific solution.


I honestly don't know how to respond to that. Review the strict aliasing rules?

Öö Tiib 02-22-2013 04:12 PM

Re: String is not UTF (was Re: Aliasing in C++11)
 
On Friday, 22 February 2013 17:13:52 UTC+2, Andy Champ wrote:
> On 21/02/2013 21:48, Öö Tiib wrote:
> > What is the "this"? It should work. Currently most people use std::string
> > (that actually contains UTF-8 encoded text) for storing text. I fully
> > agree with you that it is a loose and unsafe thing. However, it is unlikely
> > that some revolution is coming. Billions of lines of code and millions of
> > interfaces all over the world use std::string, and the problems are
> > consistently elsewhere.
>
> std::string does not contain UTF-8 encoded text. It contains chars. If
> your implementation treats those chars as UTF-8 encoded characters, then
> fine - but that is NOT part of the standard, it's just something that
> *nix operating systems tend to do.


I did in fact describe the most widespread practice. By the C++ standard a
char is a byte; the standard is silent about which encoding you keep in it.
Another possibility is to use std::wstring for text if wchar_t can hold
UTF-16LE. It might help on Windows or with Qt as the GUI. That is a minority
anyway, maybe 20% of the C++ code written.

> You might like to consider what happens when you resize a string to
> remove part of a multibyte character. There's nothing there to make it
> UTF safe...


There are no alternatives. Such difficulties, like all the others, are normal
work. That is what developers are for.
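
As an illustration of the truncation problem raised in the quote, here is a
hedged sketch of the kind of care that is left to the developer.
utf8_safe_length and truncate_utf8 are hypothetical helpers, not part of the
standard library, and the sketch assumes the string already holds well-formed
UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the largest length <= n at which s can be cut without splitting a
// multibyte UTF-8 sequence. Assumes s contains well-formed UTF-8.
std::size_t utf8_safe_length(const std::string& s, std::size_t n) {
    if (n >= s.size())
        return s.size();
    // If the first dropped byte is a continuation byte (10xxxxxx), the cut
    // would land inside a sequence, so back up to the start of that sequence.
    while (n > 0 && (static_cast<unsigned char>(s[n]) & 0xC0) == 0x80)
        --n;
    return n;
}

// Shrinks s to at most n bytes while keeping whole code points only.
void truncate_utf8(std::string& s, std::size_t n) {
    const std::size_t safe = utf8_safe_length(s, n);
    assert(safe <= s.size());
    s.resize(safe);
}
```

A plain s.resize(n) performs no such check, which is the point being made in
the quoted text.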

> I suspect this is why fstream::open takes a char* - someone assumed that
> a char* was utf-8, and for those operating systems where a filename is
> unicode it's broken.


I repeat ... there is no serious support for Unicode in C++. fstream was
likely designed when no one thought that file names could be anything but
ASCII. UTF-8 is the most popular encoding. The majority of the HTML or other
XML you see on the internet is in it. So it makes sense to use something that
you do not have to convert.

Öö Tiib 02-22-2013 04:19 PM

Re: Aliasing in C++11
 
On Friday, 22 February 2013 06:12:59 UTC+2, molw...@gmail.com wrote:
> On Thursday, February 21, 2013 8:57:57 PM UTC-7, Öö Tiib wrote:
> > I am still unsure why the compiler cannot already optimize away any aliasing
> > checks by simply assuming that you do not use the underlying buffer of the
> > std::string or std::vector<char> in question as storage for other objects
> > involved in your domain-specific solution.
>
> I honestly don't know how to respond to that. Review the strict aliasing rules?


It all seems to be about storage obtained with malloc(). It feels that if you
use the underlying buffer of a std::string or std::vector<char> for odd
purposes then you are on your own anyway. I can't find that a
standard-compliant compiler is required to expect that std::string::iterator
and double* may point to the same thing.

So ... what you do seems more and more domain-specific.

molw5.iwg@gmail.com 02-22-2013 04:28 PM

Re: Aliasing in C++11
 
On Friday, February 22, 2013 9:19:28 AM UTC-7, Öö Tiib wrote:
> It all seems to be about storage obtained with malloc(). It feels that if you
> use the underlying buffer of a std::string or std::vector<char> for odd
> purposes then you are on your own anyway. I can't find that a
> standard-compliant compiler is required to expect that std::string::iterator
> and double* may point to the same thing.
>
> So ... what you do seems more and more domain-specific.


I don't know why I'm still replying to this. std::string::iterator contains a
raw character pointer or an offset into its buffer. The compiler is forced to
assume the write itself may alias double*.
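
A short sketch of why that assumption is forced on the compiler; the function
names are hypothetical and chosen only for this illustration. Accessing the
bytes of any object through a character pointer is permitted, so a function
that receives both a char* and a double* cannot rule out that they overlap.

```cpp
#include <cassert>
#include <cstddef>

// Permitted by the aliasing rules: the storage of any object, a double
// included, may be read and written through unsigned char*.
void overwrite_bytes(unsigned char* dst, const unsigned char* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Because the store through `out` could legally be one of the bytes of
// *total, the compiler has to read *total again after the write instead of
// reusing the value it loaded before it.
double sum_after_write(const double* total, char* out, char c) {
    const double before = *total;
    *out = c;
    return before + *total;
}
```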

