String Processing Basic Stuff

 
 
Vishal G
      10-21-2008
Hi guys,

Very basic question...

Please don't suggest using another programming language or another data
structure, because I can't.

I read the data from a file, and yes, I have to slurp the whole thing into
memory, because I can use up to 4 GB.

The data in the file is in this format:

30 56 78 34 2 39 87 (50 values per line, 120 million entries in total)

I am reading the file in paragraph mode.

Now I have to remove multiple spaces without using much memory.

This is what I have written (it might be very low-standard code for the
gurus out there).

It works, but it takes 5 minutes and consumes 600-700 MB. If I use
substitution to achieve this, it takes 4-5 GB and around 2-3 minutes.

Could you please suggest a way to process it faster, using as little
memory as possible?

# Process the string $_ to remove leading whitespace, multiple whitespace,
# and to pad each value to the same size
my $chr = '';
my $str = '';
my $value = '';
my $unitlength = $Alignment::BASEQUALITY_BYTES;
while (length($_) > 0) {
    if (($chr = substr($_, 0, 1, "")) ne " ") {
        $value = $value . $chr;
    } else {
        $str = $str . sprintf("%${unitlength}d", $value) if ($value);
        undef $value;
    }
}

# BQ field
$ace->{'BQ'}->{$name} = $str;

undef $str;
undef $chr;

Thanks in advance

Vishal
 
 
 
 
 
Vishal G
      10-21-2008

The unit length ($unitlength) is 3 in this case.


 
 
 
 
 
sln@netherlands.com
      10-21-2008
On Mon, 20 Oct 2008 23:27:24 -0700 (PDT), Vishal G <(E-Mail Removed)> wrote:

> Hi Guys,
>
>Very basic question....
>
>Please dont suggest to use other programing language or other data
>structure cause I can't...
>
>I read data from file and yes I have to slurp the whole thing to
>memory cause I can use upto 4GB...
>
>data in file is in this format
>
>30 56 78 34 2 39 87 (50 values per line, total of 120 million
>entries)
>
>reading file in paragraph mode
>
>Now I have to remove multiple spaces without using much memory
>
>This is what I have wrote (might be very low standard code for Gurus
>out there)
>
>It works but takes 5 mins consuming 600-700 MB, if I use substitution
>to achieve this it takes 4-5 GB and around 2-3 mins...
>
>Could you pls suggest way to process it faster using less memory
>possible...
>
> # Process the string $_ to remove leading whitespaces,
>multiple whitespaces
> # and to padd each value to same size
> my $chr = '';
> my $str = '';
> my $value = '';
> my $unitlength = $Alignment::BASEQUALITY_BYTES;
> while (length($_) > 0) {
> if (($chr = substr($_, 0, 1, "")) ne " ") {
> $value = $value . $chr;
> } else {
> $str = $str . sprintf("%${unitlength}d",
>$value) if ($value);
> undef $value;
> }
> }
>
> # BQ field
> $ace->{'BQ'}->{$name} = $str;
>
> undef $str;
> undef $chr;
>
>Thanks in advance
>
>Vishal


It's not really clear what you mean by 50 values per
line, or whether you have slurped an 800 MB string into $_.
It looks like you're trying to shrink one string and grow
another.
The way you are doing it seems very granular.

Here are a couple of approaches you could try, if you
haven't tried them already.

sln

##############
# ???.pl
##############

use strict;
use warnings;

my $unitlength = 5; #$Alignment::BASEQUALITY_BYTES;
my $line = '30 56 78 34 2 39 87 ';
my $str = $line;


# If it's 50 values per line,
# do substitution
# ------------------------------
$str =~ s/\s*(\d+)/sprintf "%${unitlength}d", $1/ge;
$str =~ s/\s+$//;
print "'$str'\n";


# If it's all on one huge line,
# shrink one string, grow another
# (not sure this will save memory)
# ------------------------------------
my $newstr = '';
my $RxNumber = qr/\s*(\d+)/;

while ($str =~ s/$RxNumber//)
{
    $newstr .= (sprintf "%${unitlength}d", $1);
}
print "'$newstr'\n";

__END__

output:

'   30   56   78   34    2   39   87'
'   30   56   78   34    2   39   87'

 
 
xhoster@gmail.com
      10-21-2008
Vishal G <(E-Mail Removed)> wrote:
> Hi Guys,
>
> Very basic question....
>
> Please dont suggest to use other programing language or other data
> structure cause I can't...


If you can't use a different structure, at least for intermediates,
then you can't program.


> I read data from file and yes I have to slurp the whole thing to
> memory cause I can use upto 4GB...


Because you can do it, that means you have to? Why can't you read line by
line, processing each line and appending the result to $str before moving
to the next?
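
The line-by-line approach suggested above could look roughly like the sketch below. It is not code from this thread: the helper names pad_line and pad_handle are hypothetical, the width of 3 comes from the follow-up post, and the demo reads from an in-memory string rather than the real data file. Only one input line is held in memory at a time, next to the growing result string.

```perl
use strict;
use warnings;

# Hypothetical helper: pad every number on one line to a fixed width.
sub pad_line {
    my ($line, $width) = @_;
    # split ' ' skips leading whitespace and splits on runs of whitespace,
    # so repeated spaces and the trailing newline are dropped here.
    return join '', map { sprintf "%${width}d", $_ } split ' ', $line;
}

# Hypothetical helper: read a filehandle line by line and build the result.
sub pad_handle {
    my ($fh, $width) = @_;
    my $str = '';
    while (defined(my $line = <$fh>)) {
        $str .= pad_line($line, $width);
    }
    return $str;
}

# Demo on an in-memory file; a real run would open the data file instead.
my $data = "30 56 78\n34  2 39\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "'", pad_handle($fh, 3), "'\n";
close $fh;
```

Whether this is faster than a single substitution over the slurped string would have to be measured on the real data.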

>
> data in file is in this format
>
> 30 56 78 34 2 39 87 (50 values per line, total of 120 million
> entries)


So then, would this work to make an example file?
perl -le 'foreach (1..2.4e6) {print join " ", map int(rand()*99), 1..50}'


>
> reading file in paragraph mode


Why reading in paragraph mode? From your format description, the data
is not formatted in paragraphs.
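
To illustrate the point about paragraph mode: setting $/ to the empty string makes Perl split records on blank lines, so a file with no blank lines comes back as one record. The sketch below is an illustration only; count_records is a hypothetical helper and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records that a given input record
# separator produces for the same data.
sub count_records {
    my ($data, $separator) = @_;
    local $/ = $separator;    # "" selects paragraph mode, "\n" line mode
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

my $data = "30 56 78\n34 2 39\n87 1 5\n";    # three lines, no blank lines
print "line mode:      ", count_records($data, "\n"), "\n";
print "paragraph mode: ", count_records($data, ""),   "\n";
```

With data like this, paragraph mode amounts to slurping the whole file into a single string.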

>
> Now I have to remove multiple spaces without using much memory
>
> This is what I have wrote (might be very low standard code for Gurus
> out there)
>
> It works but takes 5 mins consuming 600-700 MB,


When I try it, I get many, many warnings, which suggests that it is not
actually working correctly.


> if I use substitution
> to achieve this it takes 4-5 GB and around 2-3 mins...


How did you use substitution?


Starting your code indented half way across the screen isn't very helpful.
It just leads to messy line wrap problems. I fixed that.

> my $chr = '';
> my $str = '';
> my $value = '';
> my $unitlength = $Alignment::BASEQUALITY_BYTES;
> while (length($_) > 0) {
> if (($chr = substr($_, 0, 1, "")) ne " ") {
> $value = $value . $chr;
> } else {
> $str = $str . sprintf("%${unitlength}d", $value) if ($value);


I get:
Argument "67\n33" isn't numeric in sprintf....

> undef $value;
> }
> }
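
The warning quoted above is consistent with the loop treating only a literal space as a separator, so the newline between two lines glues the last value of one line to the first value of the next. The `if ($value)` test also skips a value of 0, because the string "0" is false in Perl. A minimal sketch of one way around both problems follows; it is not the poster's method. The helper name pad_values is hypothetical, and it takes a reference so that a large string is not copied.

```perl
use strict;
use warnings;

# Hypothetical helper: scan for digit runs with a global match instead of
# removing one character at a time from the front of the string.
sub pad_values {
    my ($text_ref, $width) = @_;
    my $str = '';
    # \d+ stops at any non-digit, so spaces, tabs, and newlines all act as
    # separators, and a value of 0 is kept.
    while ($$text_ref =~ /(\d+)/g) {
        $str .= sprintf "%${width}d", $1;
    }
    return $str;
}

my $text = "67\n33 0 5";
print "'", pad_values(\$text, 3), "'\n";
```

The source string is left unmodified here, so peak memory is the input plus the padded output; the thread does not say whether that fits the poster's limits.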


Xho

 