nice parallel file reading

 
 
George Mpouras
Guest
Posts: n/a
 
      04-26-2013
# Read files in parallel. Filehandles are closed automatically.
# Files are read circularly, one per iteration. Hope you like it!

use strict;
use warnings;

my $Read_line = Read_files_round_robin('file1.txt', 'file2.txt', 'file3.txt');

while ( my $line = $Read_line->() ) {
    last if $line eq '__ALL_FILES_HAVE_BEEN_READ__';
    chomp $line;
    print "$line\n";
}


sub Read_files_round_robin
{
    my $fc = $#_;    # index of the handle to start scanning from
    my @FH;
    # Open the files in reverse order, so the first file given ends up
    # at the highest index.
    for (my $i = 0; $i < @_; $i++) {
        open $FH[$#_ - $i], $_[$i]
            or die "Could not read file \"$_[$i]\" because \"$^E\"\n";
    }

    sub
    {
        local $_ = '__ALL_FILES_HAVE_BEEN_READ__';

        # Scan downwards from $fc; close and drop exhausted files, then
        # read one line from the first handle that still has data.
        for (my $i = $fc; $i >= 0; $i--)
        {
            if ( eof $FH[$i] )
            {
                close $FH[$i];
                splice @FH, $i, 1;
                next
            }

            $_ = readline $FH[$i];
            last
        }

        $fc = $fc == 0 ? $#FH : $fc - 1;
        $_
    }
}

 
 
 
 
 
George Mpouras
Guest
Posts: n/a
 
      04-27-2013
# There was a problem with the code in my initial post.
# Here is the corrected version: how to read files round-robin
# using an iterator.


#!/usr/bin/perl
use strict;
use warnings;

my $Reader = Read_files_round_robin('file1.txt', 'file2.txt', 'file3.txt');

while ( my $line = $Reader->() ) {
    last if $line eq '__ALL_FILES_HAVE_BEEN_READ__';
    chomp $line;
    print "*$line*\n";
}




sub Read_files_round_robin
{
    my @FH;
    # Open the files in reverse order; files that cannot be opened are skipped.
    for (my $i = $#_; $i >= 0; $i--) {
        if (open my $fh, $_[$i]) { push @FH, $fh }
    }
    my $k = $#FH;    # index of the handle to try next

    sub
    {
        until (0 == @FH)
        {
            for (my $i = $k--; $i >= 0; $i--)
            {
                $k = $#FH if $k == -1;    # wrap around to the last handle

                if ( eof $FH[$i] )
                {
                    # This file is exhausted; drop its handle and keep scanning.
                    close $FH[$i];
                    splice @FH, $i, 1;
                    $k--
                }
                else
                {
                    return readline $FH[$i]
                }
            }
        }

        '__ALL_FILES_HAVE_BEEN_READ__'
    }
}

 
 
 
 
 
Jürgen Exner
Guest
Posts: n/a
 
      04-27-2013
"George Mpouras"
<(E-Mail Removed) m.com.nospam> wrote:
># there was a problem with the code at my initial post
># Here is corrected, of how to read files like round-robin
># using an iterator


While this might be mildly interesting as an academic exercise I wonder
if there is any actual non-contrived application where you would have to
read multiple files synchronously line-by-line and at the same time the
files are too large to just load them into a variable and then process
their content.

jue
 
 
Peter J. Holzer
Guest
Posts: n/a
 
      04-27-2013
On 2013-04-27 14:49, Jürgen Exner <(E-Mail Removed)> wrote:
> "George Mpouras"
><(E-Mail Removed) am.com.nospam> wrote:
>># there was a problem with the code at my initial post
>># Here is corrected, of how to read files like round-robin
>># using an iterator

>
> While this might be mildly interesting as an academic exercise I wonder
> if there is any actual non-contrived application where you would have to
> read multiple files synchronously line-by-line and at the same time the
> files are too large to just load them into a variable and then process
> their content.


Not exactly like George's code, but very similar: Merge sorted files.

A similar technique could be used to implement comm(1).
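
For illustration, a minimal sketch of that merge step, limited to two
already-sorted files (the file names are placeholders, not something
from this thread):

-----------------
#!/usr/bin/perl
# Merge two files whose lines are already sorted.
use strict;
use warnings;

open my $fh1, '<', 'sorted1.txt' or die "sorted1.txt: $!";
open my $fh2, '<', 'sorted2.txt' or die "sorted2.txt: $!";

my $l1 = <$fh1>;
my $l2 = <$fh2>;

# Emit the smaller of the two current lines and advance only the file
# it came from; the reads are data-driven rather than strictly round-robin.
while (defined $l1 and defined $l2) {
    if ($l1 le $l2) { print $l1; $l1 = <$fh1> }
    else            { print $l2; $l2 = <$fh2> }
}

# Drain whichever file still has lines left.
while (defined $l1) { print $l1; $l1 = <$fh1> }
while (defined $l2) { print $l2; $l2 = <$fh2> }
-----------------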

hp


--
_     | Peter J. Holzer    | The curse of electronic word processing:
|_|_) | Sysadmin WSR       | you keep filing away at your text until
| |   | (E-Mail Removed)   | the parts of the sentence no longer
__/   | http://www.hjp.at/ | fit together. -- Ralph Babel
 
 
Jürgen Exner
Guest
Posts: n/a
 
      04-27-2013
"Peter J. Holzer" <(E-Mail Removed)> wrote:
>On 2013-04-27 14:49, Jürgen Exner <(E-Mail Removed)> wrote:
>> "George Mpouras"
>><(E-Mail Removed) pam.com.nospam> wrote:
>>># there was a problem with the code at my initial post
>>># Here is corrected, of how to read files like round-robin
>>># using an iterator

>>
>> While this might be mildly interesting as an academic exercise I wonder
>> if there is any actual non-contrived application where you would have to
>> read multiple files synchronously line-by-line and at the same time the
>> files are too large to just load them into a variable and then process
>> their content.

>
>Not exactly like George's code, but very similar: Merge sorted files.


Fair enough, but for a merge sort you explicitly do _NOT_ read the files
synchronously.
The only application I could think of is testing n files for equality.
Or implementing a poor man's database in multiple files, with each column
of a table in a separate file. Which of course would be a synchronization
nightmare.
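
A rough sketch of that equality test (not Jürgen's code; the file names
are placeholders):

-----------------
#!/usr/bin/perl
# Compare n files line by line and stop at the first difference.
use strict;
use warnings;

my @files = ('a.txt', 'b.txt', 'c.txt');
my @fh = map { open my $fh, '<', $_ or die "$_: $!"; $fh } @files;

my $lineno = 0;
while (1) {
    $lineno++;
    my @lines   = map { scalar readline $_ } @fh;
    my $defined = grep { defined } @lines;

    last if $defined == 0;                     # all files ended together
    die "files differ in length near line $lineno\n"
        if $defined != @lines;

    my %distinct = map { $_ => 1 } @lines;
    die "files differ at line $lineno\n" if keys %distinct > 1;
}
print "all files are identical\n";
-----------------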

jue
 
 
Rainer Weikusat
Guest
Posts: n/a
 
      04-27-2013
"George Mpouras"
<(E-Mail Removed) m.com.nospam>
writes:
> # there was a problem with the code at my initial post
> # Here is corrected, of how to read files like round-robin
> # using an iterator


[...]

> sub Read_files_round_robin
> {
> my @FH;
> for (my $i=$#_; $i>=0; $i--) { if (open my $fh, $_[$i]) {push @FH, $fh} }
> my $k = $#FH;
>
> sub
> {
> until (0 == @FH)
> {
> for (my $i=$k--; $i>=0; $i--)
> {
> $k = $#FH if $k == -1;
>
> if ( eof $FH[$i] )
> {
> close $FH[$i];
> splice @FH, $i, 1;
> $k--
> }
> else
> {
> return readline $FH[$i]
> }
> }
> }
>
> '__ALL_FILES_HAVE_BEEN_READ__'
> }
> }


Fun ways to waste your time:

----------------------
#!/usr/bin/perl
use strict;

my $Reader = Read_files_round_robin('file1.txt', 'wuzz', 'file2.txt', 'file3.txt');

while ( my $line = $Reader->() ) {
    chomp $line;
    print "*$line*\n";
}

sub Read_files_round_robin
{
    # @F holds two queues of handles: $F[$cur] is the current round,
    # $F[$cur ^ 1] collects handles for the next round.
    my (@F, $cur);

    # Open each file into the first queue; files that fail to open are
    # silently dropped again.
    open($F[0][@{$F[0]}], '<', $_) // --$#{$F[0]}
        for @_;

    return sub {
        my ($fh, $l);

        # Shift handles off the current queue until one yields a line.
        do {
            $fh = shift(@{$F[$cur]}) or return
        } until defined($l = <$fh>);

        # Requeue the handle for the next round; switch queues once the
        # current one is empty.
        push(@{$F[$cur ^ 1]}, $fh);
        $cur ^= 1 unless @{$F[$cur]};

        return $l;
    };
}
 
 
Rainer Weikusat
Guest
Posts: n/a
 
      04-27-2013
Rainer Weikusat <(E-Mail Removed)> writes:

[...]

> sub Read_files_round_robin
> {
> my (@F, $cur);
>
> open($F[0][@{$F[0]}], '<', $_) // --$#{$F[0]}
> for @_;
>
> return sub {
> my ($fh, $l);
>
> do {
> $fh = shift(@{$F[$cur]}) or return
> } until defined($l = <$fh>);
>
> push(@{$F[$cur ^ 1]}, $fh);
> $cur ^= 1 unless @{$F[$cur]};
>
> return $l;
> };
> }


While this is fairly neat, it is unfortunately broken: it is possible
that the 'current' array runs out of usable file handles while a
usable file handle still exists in the 'next' array (e.g., when the
first file is the one containing the most lines of text). This means
the 'current' array needs to be switched exactly once in this case,
which, in turn, ends up making the control flow rather ugly (I
tried a few variants but didn't find one I would want to post).

 
 
George Mpouras
Guest
Posts: n/a
 
      04-27-2013
> push(@{$F[$cur ^ 1]}, $fh);

Impressive, I have to study this!
 
 
Rainer Weikusat
Guest
Posts: n/a
 
      04-28-2013
"George Mpouras"
<(E-Mail Removed) m.com.nospam>
writes:
> push(@{$F[$cur ^ 1]}, $fh);
>
> impressive ,


Not really. The idea of using two arrays cannot work this way, as I
already wrote in another posting. But it is still possible to do away
with the counting loops (which are IMHO 'rather ugly'; IOW, I never
use for (;;) for anything):

-----------------
sub Read_files_round_robin
{
    my (@FH, $cur);

    # Open each file into @FH; shrink the array back if an open fails.
    open($FH[@FH], '<', $_) // --$#FH
        for @_;

    $cur = -1;    # so the first call starts at index 0

    return sub {
        my $l;

        return unless @FH;

        # Move to the next handle; on EOF, splice it out and retry,
        # pulling $cur back into range if it fell off the end of @FH.
        $cur = ($cur + 1) % @FH;
        $cur == @FH and --$cur
            until ($l = readline($FH[$cur])) // (splice(@FH, $cur, 1), !@FH);

        return $l;
    };
}
------------------

It is possible to replace the

$cur == @FH and --$cur

with

$cur -= $cur == @FH

This would be a good idea in C because it would avoid a branch in favor
of an arithmetic no-op. I don't really know whether that holds for Perl,
and I'm unsure whether one or the other should be preferred for clarity.
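
A quick standalone check (not from Rainer's post) that the two forms
leave $cur identical for the values that can occur here:

-----------------
#!/usr/bin/perl
# Verify that "$cur == @FH and --$cur" and "$cur -= $cur == @FH"
# produce the same $cur for every relevant combination.
use strict;
use warnings;

for my $n (1 .. 4) {                 # pretend @FH currently holds $n handles
    for my $cur (0 .. $n) {          # after a splice, $cur may briefly equal $n
        my ($branch, $arith) = ($cur, $cur);
        $branch == $n and --$branch;     # branching form
        $arith  -= $arith == $n;         # arithmetic form: '==' yields 1 or 0
        die "mismatch for cur=$cur, n=$n\n" unless $branch == $arith;
    }
}
print "both forms agree\n";
-----------------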



 
 
Rainer Weikusat
Guest
Posts: n/a
 
      04-28-2013
"Peter J. Holzer" <(E-Mail Removed)> writes:
> On 2013-04-27 14:49, Jürgen Exner <(E-Mail Removed)> wrote:
>> "George Mpouras"
>><(E-Mail Removed) pam.com.nospam> wrote:
>>># there was a problem with the code at my initial post
>>># Here is corrected, of how to read files like round-robin
>>># using an iterator

>>
>> While this might be mildly interesting as an academic exercise I wonder
>> if there is any actual non-contrived application where you would have to
>> read multiple files synchronously line-by-line and at the same time the
>> files are too large to just load them into a variable and then process
>> their content.

>
> Not exactly like George's code, but very similar: Merge sorted files.
>
> A similar technique could be used to implement comm(1).


There's also the paste utility, which does round-robin merging of lines
from several input files. That would need different EOF handling,
though: it would have to return an empty line every time a file that
has run out of data is supposed to be read from.
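
A rough sketch of that variant (not Rainer's code; the names and
interface are made up): the iterator returns one tab-joined output line
per round and substitutes an empty field for any file that has already
ended.

-----------------
#!/usr/bin/perl
# paste(1)-like round robin: one tab-joined output line per round,
# with empty fields for files that have run out of data.
use strict;
use warnings;

sub Paste_files {
    my @fh = map { open my $h, '<', $_ or die "$_: $!"; $h } @_;

    return sub {
        my ($any, @fields) = (0);
        for my $h (@fh) {
            my $line = readline $h;     # undef once this file is exhausted
            if (defined $line) {
                chomp $line;
                $any++;
            }
            else {
                $line = '';             # exhausted file contributes an empty field
            }
            push @fields, $line;
        }
        return unless $any;             # every file is done
        return join("\t", @fields) . "\n";
    };
}

my $next = Paste_files('file1.txt', 'file2.txt', 'file3.txt');
while (defined(my $row = $next->())) { print $row }
-----------------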
 