Velocity Reviews - Computer Hardware Reviews


Help on String to array !

 
 
jis
Guest
Posts: n/a
 
      03-09-2010
Guys,

I have a string $hex which holds, let's say, "0012345689abcd".

How can I split it into an array so that
$arr[0] = 00, $arr[1] = 12, etc.?

It works to some extent with split, like this:
foreach (split(//, $hex)) {
    $arr[$i] = $_;
    $i++;
}

Unfortunately, when I read big files of 4MB it takes
about 10 minutes to complete. No good.
(Also, I couldn't split it as 00, 12, only as 0, 0, 1, 2.)

Then I thought unpack would be a better idea:
@arr = unpack("H2", $data); or
@arr = unpack("H2*", $data);

But only the first element got transferred, i.e. 00:
$arr[0] = 00 and $arr[1] is undefined.
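A minimal sketch of why only one element comes back, assuming $data holds the raw bytes read from the file (the three-byte value below is made up): a bare H2 template converts a single byte, while the group form (H2)* repeats until the input runs out. Applied to the binary data, it yields the two-digit hex pairs directly, with no intermediate hex string.

```perl
use strict;
use warnings;

# Hypothetical input: three raw bytes, as they might be read from a binary file.
my $data = "\x00\x12\x34";

# 'H2' has no repeat on a group, so only the first byte is converted.
my @first = unpack 'H2', $data;       # ('00')

# '(H2)*' repeats the group until the input is exhausted:
# one two-digit hex string per byte.
my @pairs = unpack '(H2)*', $data;    # ('00', '12', '34')

print "@pairs\n";                     # 00 12 34
```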

Can anyone help me with this?

thanks,
jis





 
 
 
 
 
John W. Krahn
Guest
Posts: n/a
 
      03-09-2010
Don Piven wrote:
> jis wrote:
>> Guys,
>>
>> I have a string $hex which has lets assume "0012345689abcd"
>>
>> How can I split them into to an array so that
>> arr[0]=00 ,arr[1] =12..etc

>
> while ( $hex =~ /[[:xdigit:]]{2}/g ) { push @arr, $1 }


No need for a loop:

my @arr = $hex =~ /[[:xdigit:]]{2}/g;

Also, you don't use capturing parentheses in your regular expression so
$1 will always be empty.
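A small sketch on the sample string from the question, contrasting the two forms. It assumes the POSIX class [[:xdigit:]], which matches 0-9 and a-f (plain [[:digit:]] would stop at the letters). The loop form only works once the pattern has a capture group for $1 to refer to.

```perl
use strict;
use warnings;

my $hex = '0012345689abcd';

# List context: with no capture groups, //g returns every full match at once.
my @list = $hex =~ /[[:xdigit:]]{2}/g;

# Loop form: needs capturing parentheses so that $1 holds each pair.
my @loop;
while ( $hex =~ /([[:xdigit:]]{2})/g ) {
    push @loop, $1;
}

print "@list\n";    # 00 12 34 56 89 ab cd
```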


> The "g" flag on the regex tells Perl to do its search from where the
> previous search left off, so this will just walk through your string two
> characters at a time and relieve you from having to keep track of where
> you are in the string and in your array.
>
> The "o" flag may also be useful; check "Regexp Quote-Like Operators" in
> perlop for more info.


The /o option would not be useful in this case as there are no variables
in the regular expression to interpolate and in any case modern versions
of perl would not re-interpolate a variable that doesn't change.

perldoc -q /o




John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway
 
 
 
 
 
John W. Krahn
Guest
Posts: n/a
 
      03-09-2010
jis wrote:
> Guys,
>
> I have a string $hex which has lets assume "0012345689abcd"
>
> How can I split them into to an array so that
> arr[0]=00 ,arr[1] =12..etc


my @arr = unpack '(a2)*', $hex;
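A short illustration of that one-liner on the sample string, plus a made-up odd-length input: a2 copies two characters at a time unchanged, and when the input runs short the last element simply holds whatever is left.

```perl
use strict;
use warnings;

my $hex = '0012345689abcd';

# 'a2' takes two characters as-is; the '(...)*' group repeats to the end.
my @arr = unpack '(a2)*', $hex;
print "@arr\n";                         # 00 12 34 56 89 ab cd

# Odd length: the final element holds the single leftover character.
my @odd = unpack '(a2)*', '12345';
print scalar(@odd), " $odd[-1]\n";      # 3 5
```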



John
 
sln@netherlands.com
Guest
Posts: n/a
 
      03-09-2010
On Tue, 9 Mar 2010 03:34:48 -0800 (PST), jis <(E-Mail Removed)> wrote:

>Guys,
>
>I have a string $hex which has lets assume "0012345689abcd"


>[snip]


>Unfortunately when i read big files of 4MB size it takes
>like 10mins before it completes execution. No good.
>(i couldnt split it like 00,12 but only like 0,0,1,2)
>
>Then I thought unpack wud be a better idea.
> @arr = unpack("H2",$data); or
>@arr = unpack("H2*",$data);
>

Perl distributions for win32 have a problem with
the native realloc(). On these, the larger the dynamic list
generated by the function, the longer it takes.
Linux doesn't have this problem.

In general, if you expect to be splitting up very
large data segments, it's better to build the list
yourself, outside the function, where push() does better.

Of the 3 basic methods (substr/unpack/regexp),
the fastest seems to be substr().
Additionally, on win32 platforms, any method using
push is far better.
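A minimal sketch of a substr-based loop along these lines, assuming the hex string is already in memory (the repeated sample string stands in for real data). It bounds the loop by the string length rather than testing the substring for truth, and pre-extends the array once through $#pairs; whether pre-extension measurably helps with a given perl build and platform is an assumption worth benchmarking, not a given.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the hex string built from the file.
my $hex = '0012345689abcd' x 4;

my $len = length $hex;
my @pairs;
$#pairs = int( ( $len + 1 ) / 2 ) - 1;    # pre-extend the array once (assumed helpful)

# Loop on the offset instead of the substring's truth value, so a final
# one-character piece such as "0" is not mistaken for the end of the data.
my $i = 0;
for ( my $offs = 0; $offs < $len; $offs += 2 ) {
    $pairs[ $i++ ] = substr( $hex, $offs, 2 );
}

print scalar(@pairs), "\n";    # 28
```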

I generated the data below on Windows.
If you have Linux, your results will be different.
Post your numbers if you can.

-sln

Output:
--------------------
Size of bigstring = 560

Substr/push took: 0.00030303 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/list took: 0.000344038 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/push took: 0.000586033 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/list took: 0.000608206 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/push took: 0.000404835 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)

--------------------
Size of bigstring = 5600

Substr/push took: 0.002841 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/list took: 0.00334311 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/push took: 0.00657105 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/list took: 0.00673795 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/push took: 0.004076 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)

--------------------
Size of bigstring = 56000

Substr/push took: 0.0301139 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
Unpack/list took: 0.0458951 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)
Unpack/push took: 0.0644789 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Regexp/list took: 0.07149 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Regexp/push took: 0.03965 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)

--------------------
Size of bigstring = 560000

Substr/push took: 0.309315 wallclock secs ( 0.30 usr + 0.02 sys = 0.31 CPU)
Unpack/list took: 0.723145 wallclock secs ( 0.61 usr + 0.11 sys = 0.72 CPU)
Unpack/push took: 0.640141 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU)
Regexp/list took: 0.927701 wallclock secs ( 0.92 usr + 0.00 sys = 0.92 CPU)
Regexp/push took: 0.516143 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU)

--------------------
Size of bigstring = 5600000

Substr/push took: 3.79988 wallclock secs ( 3.75 usr + 0.06 sys = 3.81 CPU)
Unpack/list took: 40.0264 wallclock secs (34.97 usr + 5.06 sys = 40.03 CPU)
Unpack/push took: 6.71793 wallclock secs ( 6.70 usr + 0.01 sys = 6.72 CPU)
Regexp/list took: 34.6208 wallclock secs (34.56 usr + 0.06 sys = 34.63 CPU)
Regexp/push took: 7.93654 wallclock secs ( 7.89 usr + 0.05 sys = 7.94 CPU)

=======
for my $multiplier (40, 400, 4_000, 40_000, 400_000)
{
    my $bigstring = '0012345689abcd' x $multiplier;
    print "\n",'-'x20,"\nSize of bigstring = ",length($bigstring),"\n\n";

    ##
    {
        my ($val, $offs, @pairs) = ('',0);
        my $t0 = new Benchmark;
        while ($val=substr( $bigstring, $offs, 2))
        {
            push @pairs, $val;
            $offs+=2;
        }
        my $t1 = new Benchmark;
        print "Substr/push took: ",timestr(timediff($t1, $t0)),"\n";
    }
    ##
    {
        my $t0 = new Benchmark;
        my @pairs = unpack '(a2)*', $bigstring;
        my $t1 = new Benchmark;
        print "Unpack/list took: ",timestr(timediff($t1, $t0)),"\n";
    }
    ##
    {
        my ($val, $offs, @pairs) = ('',0);
        my $t0 = new Benchmark;
        while ($val=unpack("x$offs a2", $bigstring) )
        {
            push @pairs, $val;
            $offs+=2;
        }
        my $t1 = new Benchmark;
        print "Unpack/push took: ",timestr(timediff($t1, $t0)),"\n";
    }
    ##
    {
        my $t0 = new Benchmark;
        my @pairs = $bigstring =~ /[0-9a-f]{2}/g;
        my $t1 = new Benchmark;
        print "Regexp/list took: ",timestr(timediff($t1, $t0)),"\n";
    }
    ##
    {
        my @pairs;
        my $t0 = new Benchmark;
        while ( $bigstring =~ /([0-9a-f]{2})/g ) {
            push @pairs, $1;
        }
        my $t1 = new Benchmark;
        print "Regexp/push took: ",timestr(timediff($t1, $t0)),"\n";
    }
}

__END__

 
 
sln@netherlands.com
Guest
Posts: n/a
 
      03-09-2010
On Tue, 09 Mar 2010 09:57:23 -0800, (E-Mail Removed) wrote:
>=======

use strict;
use warnings;
use Benchmark ':hireswallclock';

>for my $multiplier (40, 400, 4_000, 40_000, 400_000)


 
 
jis
Guest
Posts: n/a
 
      03-10-2010
On Mar 9, 10:59 pm, (E-Mail Removed) wrote:
> On Tue, 09 Mar 2010 09:57:23 -0800, (E-Mail Removed) wrote:
> >=======
>
> use strict;
> use warnings;
> use Benchmark ':hireswallclock';
>
> >for my $multiplier (40, 400, 4_000, 40_000, 400_000)


Thanks for the replies.

As mentioned above, regex and unpack took longer than substr.
I use Windows. Here are the times it took to read a 4MB file
into an array:

1. Regex: @arr = $hex =~ /[[:xdigit:]]{2}/g; - 1 min 7 s
2. Unpack: @arr = unpack("(C2)*",$hex); - 3 min 26 s
3. Substr: - 11 s
   while ($val=substr( $hex, $offs, 2))
   {
       push @arr, $val;
       $offs+=2;
   }


thanks,
jis

 
 
Uri Guttman
Guest
Posts: n/a
 
      03-10-2010
>>>>> "j" == jis <(E-Mail Removed)> writes:

j> As said regex and unpack took longer time than substr.
j> I use Windows. The following are the time taken.

j> 1. Regex : @arr = $hex =~ /[[:xdigit:]]{2}/g; - To read 4Mb file
j> into an array it took 1min 7 seconds.
j> 2. Unpack : @arr = unpack("(C2)*",$hex); - To read 4Mb file into
j> an array it took 3min 26seconds.
j> 3. Substr: while ($val=substr( $hex, $offs, 2))
j> {
j> push @arr, $val;
j> $offs+=2;
j> } - To read 4Mb file into an array it took 11 seconds.


i am sorry, i can't believe it took on the order of minutes to read in a
file and convert it from binary to hex. this is not possible on anything
but an abacus. given you haven't shown the complete script for each
version i have to assume your code is broken in some way. also there is
no way a substr loop would be faster than unpack or a regex. both of
those would spend all their time in perl's guts while the substr version
spends most of its time doing slow perl ops in a loop. i say this from
plenty of experience benchmarking perl code. you can easily write an
incorrect test of this so i must ask you to post complete working
programs that exhibit the slowness you claim. i will wager large amounts
of quatloos i can fix them so the substr will be outed as the slowest
one.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
 
jis
Guest
Posts: n/a
 
      03-11-2010


I'd also like to believe it should take much less time.
Here are the scripts I used for testing.

1.

#!/usr/bin/perl
use strict;
use warnings;
my $binary_file="28247101.bin";
open FILE, $binary_file or die "Can't open $binary_file $!\n";
# binmode FILE to suppress conversion of line endings
binmode FILE;
undef $/;
my $data = <FILE>;
close FILE;
# convert data to hex form
my $hex = unpack 'H*', $data;
my ($val, $offs, @arr) = ('',0);
#@arr = $hex =~ /[[:xdigit:]]{2}/g;
@arr = unpack("(C2)*",$hex);
print "bye";
print $arr[2];

This took 3 minutes 25 seconds.

If I uncomment the regex portion and comment out the unpack, it takes
1 minute 25 seconds.

#!/usr/bin/perl
use strict;
use warnings;
my $binary_file="28247101.bin";
open FILE, $binary_file or die "Can't open $binary_file $!\n";
# binmode FILE to suppress conversion of line endings
binmode FILE;
undef $/;
my $data = <FILE>;
close FILE;
# convert data to hex form
my $hex = unpack 'H*', $data;
my $i=0;

my ($val, $offs, @arr) = ('',0);
while ($val=substr( $hex, $offs, 2)){
    push @arr, $val;
    $offs+=2;
}
print "bye";
print $arr[2];

This one takes only 9 seconds.

I used a stopwatch to measure the time.

I'd appreciate your help in finding out how it can be improved.

thanks,
jis









 
Uri Guttman
Guest
Posts: n/a
 
      03-11-2010
>>>>> "j" == jis <(E-Mail Removed)> writes:

j> Even I want to beleive it should take very less time.
j> I post the scripts I used for testing.

j> 1. #!/usr/bin/perl

j> # convert data to hex form
j> my $hex = unpack 'H*', $data;
j> my ($val, $offs, @arr) = ('',0);
j> #@arr = $hex =~ /[[:xdigit:]]{2}/g;
j> @arr = unpack("(C2)*",$hex);

j> my $data = <FILE>;
j> close FILE;
j> # convert data to hex form
j> my $hex = unpack 'H*', $data;
j> my $i=0;

j> my ($val, $offs, @arr) = ('',0);
j> while ($val=substr( $hex, $offs, 2)){
j> push @arr, $val;
j> $offs+=2;
j> }
j> print "bye";
j> print $arr[2]; This would take only 9 seconds.

j> I have used a stopwatch to calculate time.

a stopwatch? you need to learn how to use the Benchmark.pm module.
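A stopwatch also counts perl startup and file reading. For a single run, a minimal sketch using the core Time::HiRes module times only the splitting step; the input string is made up for illustration. For comparing several variants, Benchmark's cmpthese, as in the script further down, is the more robust tool.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # replaces time() with a floating-point version

# Hypothetical stand-in for the hex string built from the file.
my $hex = '0012345689abcd' x 100_000;

my $t0      = time;
my @pairs   = unpack '(A2)*', $hex;
my $elapsed = time - $t0;    # seconds, with sub-second resolution

printf "unpack: %d pairs in %.3f s\n", scalar @pairs, $elapsed;
```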

j> Appreciate your help in finding how it can be improved.

easy. let me do a proper benchmark.

and you should learn how to properly bottom post and not leave my entire
post in the message.

uri


 
Uri Guttman
Guest
Posts: n/a
 
      03-11-2010
>>>>> "j" == jis <(E-Mail Removed)> writes:

j> if i uncommment regex protion and comment unpack it would take
j> 1minute 25 sec

j> print "bye";
j> print $arr[2]; This would take only 9 seconds.

j> I have used a stopwatch to calculate time.

as i said, that is a silly way to time programs. and there is no way it
would take minutes to do this unless you are on a severely slow cpu or
you are low on ram and are disk thrashing. here is my benchmarked
version which shows that unpacking (fixed to use A and not C) is the
fastest and regex (also fixed to do the simplest but correct thing which
is grab 2 chars) ties your code.
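To see why the C template needed fixing, a small sketch on a made-up four-character string: C decodes each character as a number (its code point), so (C2)* returns twice as many elements as intended, none of them hex pairs, whereas A2 keeps the characters as two-character strings.

```perl
use strict;
use warnings;

my $hex = '0012';

# 'C' yields one unsigned number per character, so '(C2)*'
# returns ord values rather than two-character pairs.
my @nums = unpack '(C2)*', $hex;     # (48, 48, 49, 50)

# 'A2' keeps the characters as strings, two at a time.
my @pairs = unpack '(A2)*', $hex;    # ('00', '12')

print "@nums | @pairs\n";            # 48 48 49 50 | 00 12
```

For hex digits, A and a behave the same here; in unpack, A only differs by stripping trailing spaces and nulls.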

uncomment those commented-out lines to see that this does the same and
correct thing in all cases.

here is the timing result run for 10 seconds each:

            s/iter     regex substring unpacking
regex         2.11        --       -0%      -25%
substring     2.11        0%        --      -25%
unpacking     1.58       33%       33%        --

uri


use strict;
use warnings;

use File::Slurp ;
use Benchmark qw(:all) ;

my $duration = shift || -2 ;

my $file_name = '/boot/vmlinuz-2.6.28-15-generic' ;

my $data = read_file( $file_name, binary => 1 ) ;

#$data = "\x00\x10" ;

my $hex = unpack 'H*', $data;

# unpacking() ;
# regex() ;
# substring() ;
# exit ;

cmpthese( $duration, {

    unpacking => \&unpacking,
    regex     => \&regex,
    substring => \&substring,
} ) ;

sub unpacking {
    my @arr = unpack( '(A2)*' , $hex) ;
    # print "@arr\n"
}

sub regex {
    my @arr = $hex =~ /(.{2})/g ;
    # print "@arr\n"
}

sub substring {

    my ($val, $offs, @arr) = ('',0);
    while ($val=substr( $hex, $offs, 2)){
        push @arr, $val;
        $offs+=2;
    }

    # print "@arr\n"
}
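Because a timing comparison only means something if every variant returns the same list, a quick correctness check is worth running first. A minimal sketch with a tiny made-up input; its regex uses /(..)/g so that each match grabs exactly two characters.

```perl
use strict;
use warnings;

# Hypothetical input: seven raw bytes converted to a 14-character hex string.
my $hex = unpack 'H*', "\x00\x12\x34\x56\x89\xab\xcd";

# Three ways of splitting into two-character pairs.
my @by_unpack = unpack '(A2)*', $hex;
my @by_regex  = $hex =~ /(..)/g;
my @by_substr;
for ( my $offs = 0; $offs < length($hex); $offs += 2 ) {
    push @by_substr, substr( $hex, $offs, 2 );
}

# Compare each variant against unpack; any mismatch means one of them is wrong.
for my $pair ( [ regex => \@by_regex ], [ substr => \@by_substr ] ) {
    my ( $name, $got ) = @$pair;
    die "$name differs from unpack\n" unless "@$got" eq "@by_unpack";
}
print "all agree: @by_unpack\n";    # all agree: 00 12 34 56 89 ab cd
```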

