Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > sorting 5million minute data: choose std::list or std::vector

Reply
Thread Tools

sorting 5million minute data: choose std::list or std::vector

 
 
James Kanze
Guest
Posts: n/a
 
      02-08-2011
On Feb 4, 5:19 pm, Leigh Johnston <(E-Mail Removed)> wrote:
> On 04/02/2011 17:08, James Kanze wrote:
> > On Feb 4, 2:39 pm, (E-Mail Removed) (Yannick Tremblay) wrote:
> >> In article<(E-Mail Removed) id>,
> >> Jorgen Grahn<(E-Mail Removed)> wrote:


> >>> On Wed, 2011-02-02, MM wrote:
> >>>> I have about 5million rows like these (day, minute, and 4 doubles)


> >>>> YYYYMMDD, HHMM, double, double, double, double


> >>>> which I parse from a file and store into


> >>>> struct one_line {
> >>>> boost::uint32_t day;
> >>>> boost::uint32_t minute;
> >>>> double d1, d2, d3, d4;
> >>>> };


> >>> Nitpick: don't call HHMM 'minute', or you'll be surprised one day when
> >>> you forget that 102 doesn't mean 102 minutes but 62 minutes.


> >> Additional nitpick: don't use bit specified integer unless you *need*
> >> to know the size of the integers in bits. Normal "unsigned integer"
> >> would have been perfectly fine here.


> > Normal int would have been perfectly fine here. On the other
> > hand, if he's got 5 million of them, short or even signed char
> > might be more appropriate (although in practice, alignment
> > requirements will probably mean that he won't gain anything by
> > using a type smaller than int).


> Eschewing unsigned integral types is sometimes irrational. *If* it
> makes no sense to have negative days or minutes then why use a signed
> integral type?


Two reasons, really. The first is simple: the *normal* integral
type in C++ is plain int. You don't use anything else unless
plain int won't do the job. You don't use unsigned for the same
reason you don't use long, or short, or whatever. The second is
more fundamental: C++ doesn't have an unsigned type with
appropriate semantics which would be appropriate for days or
minutes, at least in the general case (where you might have to
take the difference between days or minutes). That's not to say
that you should never use unsigned---there are cases where you
want its special semantics. And there are doubtlessly cases
where it really doesn't matter---things like serial numbers, for
example. (But there again, unless there is some reason why int
isn't appropriate, then it should be int. Anything other than
int tells the reader that you need its special
features---smaller size, bigger range, etc.)

--
James Kanze
 
Reply With Quote
 
 
 
 
James Kanze
Guest
Posts: n/a
 
      02-08-2011
On Feb 7, 1:18 pm, gwowen <(E-Mail Removed)> wrote:
> On Feb 7, 1:06 pm, Leigh Johnston <(E-Mail Removed)> wrote:


[...]
> Stick to arguing with Paul, at least you *are* probably smarter than
> him.


Ouch. Now that is hitting low.

(Seriously, there is one machine currently being sold today
where the compiler has an option to turn off standard compliant
behavior of unsigned integers, because of the runtime overhead.
But it's not common enough for it to be an issue for most
people. And from other postings in the past, I think Leigh's in
the camp that only Wintel matters.)

--
James Kanze

 
Reply With Quote
 
 
 
 
Juha Nieminen
Guest
Posts: n/a
 
      02-08-2011
Leigh Johnston <(E-Mail Removed)> wrote:
> Wrong; it is not a sign of bad design at all. If a variable is used as
> an unsigned value more often than as a signed value and it logically
> makes sense for it to be unsigned (e.g. it is never negative) then
> making the variable unsigned makes perfect sense.


What is "more often"? 50%? Do you count the individual instances where
it's being used in the program and change the type accordingly? What if
the percentage changes later?

The major problem with using unsigned types in variables which are being
used in signed expressions is that it's easy to forget the cast, which
easily introduces bugs which might not be immediately obvious.

The most typical case where this introduces bugs is where the unsigned
type is mixed with signed types on an expression that ends up being of a
floating point type. If you forget the cast, a result which should be in
the order of tens might end up to be wrongly in the order of billions.
The problem is that this might not be apparent immediately, only with
certain values (eg. if you are calculating coordinates based on some
input data or whatever).

You might think, for example, "the width and height of an image are
never negative, hence it makes sense to make then unsigned". However,
if the width and height are used to calculate screen coordinates of
objects (which is rather typical in many situations), you will end up
with the useless casts and a high risk of forgetting a cast and potentially
have your coordinates 2 billion pixels to the right and down.

In these cases it actually makes more sense to have the width and height
as signed integers.

And yes, I have experience on this precise problem.

> Perhaps
> Java is a more suitable language for you rather than C++.


Where did that come from?
 
Reply With Quote
 
Paul
Guest
Posts: n/a
 
      02-08-2011

"James Kanze" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On Feb 7, 1:18 pm, gwowen <(E-Mail Removed)> wrote:
>> On Feb 7, 1:06 pm, Leigh Johnston <(E-Mail Removed)> wrote:

>
> [...]
>> Stick to arguing with Paul, at least you *are* probably smarter than
>> him.

>
> Ouch. Now that is hitting low.
>


Look 3 idiots together! ^_^

 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      02-08-2011
Paul <(E-Mail Removed)> wrote:
> Look 3 idiots together! ^_^


Honestly, what do you think you are going to achieve with that kind of
attitude?
 
Reply With Quote
 
Paul
Guest
Posts: n/a
 
      02-08-2011

"Juha Nieminen" <(E-Mail Removed)> wrote in message
news:4d50ea19$0$2852$(E-Mail Removed)...
> Paul <(E-Mail Removed)> wrote:
>> Look 3 idiots together! ^_^

>
> Honestly, what do you think you are going to achieve with that kind of
> attitude?
>


Or is it 4? _

 
Reply With Quote
 
Richard Kettlewell
Guest
Posts: n/a
 
      02-08-2011
Kaba <(E-Mail Removed)> writes:

> ### Looping with unsigned integers
>
> Looping over values from n to zero backwards demonstrates the zero
> boundary problem with unsigned integers:
>
> for (unsigned int i = n;i >= 0;--i)
> {
> // Do something.
> }
>
> Since there are no negative values to fail the loop test,
> this loop never finishes.


While agreeing that unsigned can booby-trapped in some cases, I don't
think this is one of them: a decent compiler will spot the trivial
comparison and generate a warning.

--
http://www.greenend.org.uk/rjk/
 
Reply With Quote
 
Kaba
Guest
Posts: n/a
 
      02-08-2011
Richard Kettlewell wrote:
> Kaba <(E-Mail Removed)> writes:
>
> > ### Looping with unsigned integers
> >
> > Looping over values from n to zero backwards demonstrates the zero
> > boundary problem with unsigned integers:
> >
> > for (unsigned int i = n;i >= 0;--i)
> > {
> > // Do something.
> > }
> >
> > Since there are no negative values to fail the loop test,
> > this loop never finishes.

>
> While agreeing that unsigned can booby-trapped in some cases, I don't
> think this is one of them: a decent compiler will spot the trivial
> comparison and generate a warning.


Yep. So instead:

unsigned int a = f();

for (unsigned int i = n;i >= a;--i)
{
// Do something.
}

The loop works except when a = 0.

--
http://kaba.hilvi.org
 
Reply With Quote
 
Richard Kettlewell
Guest
Posts: n/a
 
      02-08-2011
Kaba <(E-Mail Removed)> writes:
> Richard Kettlewell wrote:


>> While agreeing that unsigned can booby-trapped in some cases, I don't
>> think this is one of them: a decent compiler will spot the trivial
>> comparison and generate a warning.

>
> Yep. So instead:
>
> unsigned int a = f();
>
> for (unsigned int i = n;i >= a;--i)
> {
> // Do something.
> }
>
> The loop works except when a = 0.


Much better l-)

--
http://www.greenend.org.uk/rjk/
 
Reply With Quote
 
Martin
Guest
Posts: n/a
 
      02-10-2011
On Feb 7, 3:58*pm, Kaba <(E-Mail Removed)> wrote:
> Leigh Johnston wrote:

[snip]
>
> Zero boundary problem
> ---------------------
>
> Most of the time we are assuming that, away from boundaries caused
> by the finite representation, an integer object works like an element
> of the integer ring ZZ. In addition, it seems plausible that most
> integer calculations are concentrated around a neighborhood of zero.


Actually, I assume an unsigned integer object behaves more like N away
from the boundaries caused by the finite representation. And it's not
an assumption that it behaves like Z<sub>UINT_MAX</sub>.

>
> The _zero-boundary problem_ (of unsigned integers) is that the zero
> is on the boundary beyond which the assumption of working with integers
> falls apart. Thus, the probability of introducing errors in computations
> increases greatly. Furthermore, those errors can not be catched, since
> every value is legal.


If you don't start with that assumption, it's failure won't cause any
errors at all. And the method you advocate for catching errors with
signed integers can also be transformed to work with unsigned
integers, unless the the unsigned integer's maximum is equal to the
signed integer's maximum. (.e.g. UINT_MAX == INT_MAX would prevent
the transformation from working. Otherwise, reserve the range
(INT_MAX, UINT_MAX] as invalid. If those need to be valid values,
then signed int isn't an option in the first place.)

>

[example elided]
>
> Extended positive range problem
> -------------------------------
>
> Conversion between an unsigned integer and a signed integer
> is an information destroying process in either direction.
> The only way to avoid this is to use one or the other
> consistently.


Not necessarily. There's at least one common implementation where
that conversion is lossless.

And one can, of course, only convert in situations where the
conversion is lossless (e.g. [0, INT_MAX]). Using only one or the
other is hardly the *only* way.

>
> If the normal arithmetic is the intention, then a signed integer
> represents a more general concept than an unsigned integer: the former
> covers both the negative and positive integers, whereas the latter
> only covers non-negative integers.


Which normal arithmetic? On reals? On rationals? On Integers? On
Naturals? On Complex? On Quaternions? The most obvious of those
options is the one on the reals. Why aren't you advocating the use of
double everywhere unless otherwise indicated?

> In programming terms, it is
> possible to create a program using signed integers alone, however,
> the same can't be said about the unsigned integers. Therefore, if
> only one type is to be used consistently, the choice should be a
> signed integer. However, let us do some more analysis.


This is not the previous argument in programming terms. That said,
the *only* reason it's not possible to create a program using only
unsigned integers is that main requires int. In fact, any
functionality that can be coded using only signed integers can be
coded using only unsigned integers.

>
> Since any non-trivial program must use signed integers, the use of
> unsigned integers eventually leads to an unsigned-signed
> conversion. In particular, because `std::size_t` in the Standard
> Library is an unsigned integer, there are few programs that can
> escape the unsigned-signed conversions.


Yep, just about every non-trivial program that is sanely written will
have both signed and unsigned integers.

>
> Despite these conversions, programs still seem to work. The reason
> for this, I reflect, is that the unsigned integers are normally
> not taken into the extended positive range they allow.
> The _extended positive range problem_ is that if the unsigned integers
> are taken to their extended positive range, then the signed-unsigned
> conversions become a reality. Ensuring correctness under such a threat
> is hard, if not practically impossible.


What problem? Let's assume that you have an unsigned integer with a
value in (INT_MAX, UINT_MAX]. Either this value is valid, in which
case the correctness is ensured by not converting. Or it's an invalid
value, in which case you've already failed to ensure correctness.
Your distinction makes no difference.

>
> Conclusion
> ----------
>
> Using unsigned integers to model integers decreases the probability
> that a program is correct.


True enough, but I don't see anyone advocating such a use.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Sorting list vs sorting vector boltar2003@boltar.world C++ 2 07-06-2010 09:40 AM
fired event Sorting which wasn't handled - sorting and SelectedIndexChanged Jason ASP .Net Web Controls 0 10-04-2006 02:19 PM
Refresh every minute, on the minute. Denebola Java 3 02-27-2006 07:39 AM
HELP!, EXACTLY every minute...... =?Utf-8?B?U3dhbXBkb25reQ==?= Wireless Networking 6 08-26-2004 08:08 PM
sorting by multiple criterias (sub-sorting) Tom Kirchner Perl Misc 3 10-11-2003 05:16 PM



Advertisments