Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > statistic of integer values

Reply
Thread Tools

statistic of integer values

 
 
Stefan Wallentowitz
Guest
Posts: n/a
 
      12-11-2005
Am Sun, 11 Dec 2005 08:21:25 +0000 schrieb Walter Roberson:

> Also, as I indicated earlier, the point might not be "can it
> safely be done", but rather, "Is it worth putting in the programming
> effort to do this potential optimization? Do small literals occur
> often enough that the savings would be appreciable? Is it worth
> the time to find good algorithms for the boundary cases such as
> the short load taking longer to execute but the extra cost being worth
> it if it allows a tight loop to fit in cache when it would not otherwise
> fit?" And that's where the statistics come in.
>
> You know the usual refrain about "premature optimization" -- well,
> it looks to me as if the OP is trying to get a feel for whether
> this kind of optimization is premature, or is worth while.


Exactly.
I want to illustrate the possible impact of a simple optimization I made.
In fact this optimization most oftens gains execution, but a quite simple
value of optimization ratio reachid in such an easy way would be nice.
It isn't something very scientific , just for getting a feeling of gain.

Thanks for your elaborated statements,
Stefan
 
Reply With Quote
 
 
 
 
slebetman@yahoo.com
Guest
Posts: n/a
 
      12-11-2005
Stefan Wallentowitz wrote:
> Am Sat, 10 Dec 2005 23:35:03 -0800 schrieb http://www.velocityreviews.com/forums/(E-Mail Removed):
>
> > In this case there would be no need to ask about statistics. It is
> > always safe to do this because the compiler know there is ONLY one
> > value for each literal - not a range. This is implemented by almost all
> > modern compilers. Not only that, if for example on a RISC machine you
> > use the literal 1 or 0 the compiler would simply generate code to read
> > the one and zero registers.

>
> Yes, but creating an own compiler leads me to this question regarding the
> impact of a quite simple optimization.
> I am searching for a statistic of how often a loaded integer (either in
> register or as part of an immediate instruction etc.) in whatever general
> is:
> - in 16-bit domain
> - above
> (might be extended to is zero, 8-bit and so on)
>


For what it's worth, I find the following values very common in code:

0, 1, 0xFF

Also, from experience, I find the majority of string literals to be
below 100, usually in the range of 0 to 10. The value 1 is common
enough that most archictectures have either increment/decrement
instructions or a hardwired 1 register. The value 0 is common in a lot
of data initialisations. The value 255 (0xFF) is very common in
protocol handling code usually used for byte masking.

 
Reply With Quote
 
 
 
 
Stefan Wallentowitz
Guest
Posts: n/a
 
      12-11-2005
Am Sun, 11 Dec 2005 08:20:37 -0800 schrieb (E-Mail Removed):

>
> For what it's worth, I find the following values very common in code:
>
> 0, 1, 0xFF
>
> Also, from experience, I find the majority of string literals to be
> below 100, usually in the range of 0 to 10. The value 1 is common
> enough that most archictectures have either increment/decrement
> instructions or a hardwired 1 register. The value 0 is common in a lot
> of data initialisations. The value 255 (0xFF) is very common in
> protocol handling code usually used for byte masking.


FOr sure, but a number sais more than thousand words..

 
Reply With Quote
 
slebetman@yahoo.com
Guest
Posts: n/a
 
      12-12-2005
Stefan Wallentowitz wrote:
> Am Sun, 11 Dec 2005 08:20:37 -0800 schrieb (E-Mail Removed):
>
> >
> > For what it's worth, I find the following values very common in code:
> >
> > 0, 1, 0xFF
> >
> > Also, from experience, I find the majority of string literals to be
> > below 100, usually in the range of 0 to 10. The value 1 is common
> > enough that most archictectures have either increment/decrement
> > instructions or a hardwired 1 register. The value 0 is common in a lot
> > of data initialisations. The value 255 (0xFF) is very common in
> > protocol handling code usually used for byte masking.

>
> FOr sure, but a number sais more than thousand words..


Download lots of source code, preferably large ones like the Linux
kernel, Apache, Xfree86 and Mozilla. Then find all literals. I did the
following on my company's source codes:

grep -r -e "= *[0-9]" SONaccessGateway/ > stat.txt

This gets all lines containing an equal sign followed by a number.
Then, if you have Tcl installed, run the following script on the file
above:

================== begin script
#! /usr/bin/env tclsh
if {$argc < 1} {
set input stdin
} else {
set input [open $argv "r"]
}
array set numbers {}
while {[eof $input] == 0} {
set line [gets $input]
# extract right hand side:
set line [lindex [split $line "="] 1]
if {$line != ""} {
# extract first word:
set line [split [string trim $line] " "]
set n [lindex $line 0]
# get and count the number:
if {[string is integer -strict $n]} {
set n [expr $n] ;# normalise the number
if {[info exists numbers($n)]} {
incr numbers($n)
} else {
set numbers($n) 1
}
}
}
}
if {$input != "stdin"} {
close $input
}
# now we are done processing, print out stats:
puts "Literals found:"
foreach n [lsort -integer [array names numbers]] {
puts " $n - $numbers($n) times"
}
===================== end script

Save the script as sumstat.tcl. To run it, you can type the following:

tclsh sumstat.tcl stat.txt

where stat.txt is the file we generated earlier. The following is the
statistics of my employer's source codes:

Literals found:
-774712364 - 7 times
-505224220 - 7 times
-235736076 - 7 times
-60 - 1 times
-1 - 3 times
0 - 1039 times
1 - 723 times
2 - 119 times
3 - 159 times
4 - 159 times
5 - 87 times
6 - 10 times
7 - 16 times
8 - 15 times
9 - 1 times
10 - 17 times
11 - 2 times
12 - 41 times
13 - 1 times
14 - 4 times
15 - 6 times
16 - 20 times
17 - 7 times
20 - 172 times
23 - 6 times
24 - 2 times
25 - 2 times
27 - 1 times
28 - 2 times
30 - 10 times
31 - 3 times
32 - 29 times
36 - 2 times
38 - 2 times
39 - 3 times
40 - 5 times
42 - 10 times
44 - 2 times
48 - 19 times
50 - 6 times
54 - 1 times
56 - 5 times
57 - 2 times
60 - 21 times
64 - 46 times
67 - 2 times
71 - 2 times
78 - 2 times
80 - 3 times
85 - 1 times
90 - 35 times
96 - 3 times
100 - 7 times
112 - 3 times
120 - 6 times
127 - 4 times
128 - 12 times
144 - 3 times
160 - 3 times
164 - 1 times
167 - 3 times
168 - 4 times
169 - 2 times
176 - 3 times
192 - 4 times
200 - 9 times
208 - 3 times
210 - 6 times
211 - 8 times
212 - 8 times
224 - 3 times
226 - 1 times
253 - 1 times
255 - 7 times
256 - 14 times
271 - 1 times
299 - 1 times
300 - 1 times
384 - 23 times
400 - 3 times
420 - 4 times
493 - 10 times
500 - 35 times
513 - 1 times
600 - 10 times
636 - 1 times
647 - 1 times
648 - 4 times
691 - 16 times
730 - 1 times
800 - 4 times
802 - 1 times
995 - 1 times
1000 - 8 times
1024 - 15 times
1200 - 1 times
1234 - 3 times
1257 - 1 times
1500 - 3 times
1800 - 1 times
2000 - 1 times
2020 - 1 times
2048 - 3 times
2054 - 2 times
2345 - 2 times
3600 - 1 times
4096 - 3 times
5000 - 3 times
10000 - 3 times
16384 - 9 times
28800 - 3 times
32224 - 1 times
32768 - 6 times
36000 - 4 times
65535 - 3 times
86400 - 2 times
90000 - 1 times
123456 - 3 times
1000000 - 2 times
50331652 - 2 times
67452301 - 1 times
305441741 - 17 times
592100561 - 17 times
806429235 - 10 times
823206451 - 10 times
839983667 - 10 times
883674386 - 17 times
1073741824 - 2 times
2147483647 - 2 times


Hope this helps. Of course, I'm assuming you're on a platform that has
grep and tclsh installed. If not, just use my results. But note that
the statistics above covers C, C++ and Perl code. I was too lazy to
separate them.

 
Reply With Quote
 
slebetman@yahoo.com
Guest
Posts: n/a
 
      12-12-2005
(E-Mail Removed) wrote:
> Stefan Wallentowitz wrote:
> > FOr sure, but a number sais more than thousand words..

> <snip>
> Save the script as sumstat.tcl. To run it, you can type the following:
>
> tclsh sumstat.tcl stat.txt
> <snip>
> Hope this helps. Of course, I'm assuming you're on a platform that has
> grep and tclsh installed. If not, just use my results. But note that
> the statistics above covers C, C++ and Perl code. I was too lazy to
> separate them.


I've modified the sumstat.tcl script to further analyse results:

============================ begin sumstat.tcl script
#! /usr/bin/env tclsh
if {$argc < 1} {
set input stdin
} else {
set input [open $argv "r"]
}
array set numbers {}
while {[eof $input] == 0} {
set line [gets $input]
# extract words:
set line [split $line " "]
foreach n $line {
# get and count the number:
if {[string is integer -strict $n]} {
set n [expr $n] ;# normalise the number
if {[info exists numbers($n)]} {
incr numbers($n)
} else {
set numbers($n) 1
}
}
}
}
if {$input != "stdin"} {
close $input
}

# now we are done processing, print out stats:
set total 0
set bits8 0
set bits16 0
set bits32 0
puts "Literals found:"
foreach n [lsort -integer [array names numbers]] {
puts " $n - $numbers($n) times"
if {$n < 256 && $n > -128} {
incr bits8 $numbers($n)
} elseif {$n < 65536 && $n > -32768} {
incr bits16 $numbers($n)
} else {
incr bits32 $numbers($n)
}
incr total $numbers($n)
}
set percent8 [format "%0.2f" [expr $bits8*100.0/$total]]
set percent16 [format "%0.2f" [expr $bits16*100.0/$total]]
set percent32 [format "%0.2f" [expr $bits32*100.0/$total]]
puts "\nStatistics:"
puts " 8bit numbers - $bits8 ($percent8%)"
puts " 16bit numbers - $bits16 ($percent16%)"
puts " 32bit numbers - $bits32 ($percent32%)"

============================ end script

also, befure I used: grep -r -e "= [0-9]". I now realise that this only
captures assignments and not integers used for other purposes such as a
= b + 12. I also modified the script to get all numbers in the source
code instead of just numbers after the = sign.

I ran this new script with:

grep -r -E "[0-9]+" code_dir/ | tclsh sumstat.tcl

piping the result of grep directly into the script. This simply gets
all lines containing numbers. The new result is obviously much larger
than before and is probably too long to post. The following is the a
snipped version of the result:

Literals found:
-2147483648 - 5 times
-976170042 - 2 times
-823302554 - 1 times
<snip>
-1 - 285 times
0 - 6704 times
1 - 5408 times
2 - 3600 times
3 - 2194 times
4 - 1735 times
<snip>
1936484395 - 1 times
1936484398 - 1 times
2147483647 - 2 times

Statistics:
8bit numbers - 34984 (83.66%)
16bit numbers - 6346 (15.18%)
32bit numbers - 486 (1.16%)

 
Reply With Quote
 
Stefan Wallentowitz
Guest
Posts: n/a
 
      12-12-2005
Am Sun, 11 Dec 2005 20:22:47 -0800 schrieb (E-Mail Removed):

> (E-Mail Removed) wrote:
>> Stefan Wallentowitz wrote:
>> > FOr sure, but a number sais more than thousand words..

>> <snip>
>> Save the script as sumstat.tcl. To run it, you can type the following:
>>
>> tclsh sumstat.tcl stat.txt
>> <snip>
>> Hope this helps. Of course, I'm assuming you're on a platform that has
>> grep and tclsh installed. If not, just use my results. But note that
>> the statistics above covers C, C++ and Perl code. I was too lazy to
>> separate them.

>


Thank you very, very much. I will use this approach and apply it on some
code specific to embedded systems. Although it doesn't in effect have
profiling character it gives a smart feeling of the subject-matter.

Bye and thanks,
Stefan

 
Reply With Quote
 
Michael Wojcik
Guest
Posts: n/a
 
      12-13-2005

In article <(E-Mail Removed) .com>, "(E-Mail Removed)" <(E-Mail Removed)> writes:
>
> Download lots of source code, preferably large ones like the Linux
> kernel, Apache, Xfree86 and Mozilla. Then find all literals. I did the
> following on my company's source codes:
>
> grep -r -e "= *[0-9]" SONaccessGateway/ > stat.txt


As Walter Roberson suggested, it would probably be better to do this
on the result of preprocessing the source (ie, the output of trans-
lation phase 4, 5, or 6). That at least will catch literals that are
expressed using object-like macros, though not necessarily folded
constants (which could be computed during phase 7, or even not until
runtime).

Most implementations have an option to write the result of preprocess-
ing to a file.

It might be possible to get even more accurate results by analyzing
an intermediate result from a later translation phase, such as the
assembly-language output available from many implementations. I
recently used such analysis to check for large auto-class objects in
a large code base. (Details are implementation-dependent and OT,
obviously.)

--
Michael Wojcik (E-Mail Removed)

Cooperation is just like two pagodas, one hardware and one software.
-- Wen Jiabao
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
template template par used for a container statistic er C++ 1 09-05-2007 07:07 AM
Are You a Gaming Statistic? Silverstrand Front Page News 0 10-05-2006 11:19 PM
STATISTIC:how many megapixels enough for you? tiresia2@hotmail.it Digital Photography 35 08-06-2006 12:17 AM
help on access-list statistic command Rob Cisco 4 05-06-2004 06:10 AM
Where are the codes to do statistic calculation? tony lincoln Java 0 10-13-2003 01:34 PM



Advertisments