Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Perl Misc (http://www.velocityreviews.com/forums/f67-perl-misc.html)
-   -   Break large file down into smaller parts (http://www.velocityreviews.com/forums/t889140-break-large-file-down-into-smaller-parts.html)

Brian F. 11-16-2004 04:10 PM

Break large file down into smaller parts
 
Greets,

I have a 2-million+ line file that gets generated twice a day, and was
wondering if there is a way to read in the number of lines and
split the file into several (say 5) parts with different file names?

So instead of having list.txt with 2 million lines, I'd end up with
file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
equal) amount of data from the original.

Brian F.

Toni Erdmann 11-16-2004 04:32 PM

Re: Break large file down into smaller parts
 
Brian F. wrote:
> Greets,
>
> I have a 2million+ line file that gets generated twice a day, and was
> wondering if there would be a way to read in the amount of lines and
> split the file into several (say 5) parts with different file names?
>
> so instead of having list.txt with 2 million lines, i'd end up with
> file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
> equal) amount of data from the original.


man split

split --lines=NUMBER
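For the five-way split Brian asked about, that might look like the sketch below (the `list.txt` name and the count of 5 come from the original post; the 23-line sample file is just a small stand-in for the real data):

```shell
# Small stand-in for the real 2-million-line list.txt
seq 1 23 > list.txt

# Lines per chunk, rounded up so exactly 5 files come out
lines=$(wc -l < list.txt)
per_chunk=$(( (lines + 4) / 5 ))

# POSIX split: writes part.aa, part.ab, ... with $per_chunk lines each
split -l "$per_chunk" list.txt part.
wc -l part.*   # four files of 5 lines, then part.ae with the last 3
```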

Toni

Peter Hickman 11-16-2004 04:32 PM

Re: Break large file down into smaller parts
 
If you are using Unix or the like, there is a command called split that will
do it for you.

Tore Aursand 11-16-2004 04:35 PM

Re: Break large file down into smaller parts
 
On Tue, 16 Nov 2004 08:10:45 -0800, Brian F. wrote:
> I have a 2million+ line file that gets generated twice a day, and was
> wondering if there would be a way to read in the amount of lines and
> split the file into several (say 5) parts with different file names?
>
> so instead of having list.txt with 2 million lines, i'd end up with
> file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
> equal) amount of data from the original.


1. Count the number of lines in the file; 'perldoc -q lines'
2. Decide on how many parts you want.
3. Iterate through the file, opening, writing to and closing
each file as appropriate.
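Those three steps can be sketched roughly as follows (the `list.txt` name and the five-way split are assumptions taken from the original post; the 23-line stand-in file is generated just so the sketch runs on its own):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Small stand-in for the real 2-million-line list.txt (name assumed
# from the original post).
open my $gen, '>', 'list.txt' or die "Can't create list.txt: $!";
print {$gen} "line $_\n" for 1 .. 23;
close $gen;

my $parts = 5;

# Step 1: count the lines (see 'perldoc -q lines').
open my $in, '<', 'list.txt' or die "Can't open list.txt: $!";
my $lines = 0;
$lines++ while <$in>;
close $in;

# Step 2: lines per part, rounded up so at most $parts files come out.
my $per_part = int( ($lines + $parts - 1) / $parts );

# Step 3: re-read, writing $per_part lines each to file1.txt, file2.txt, ...
open $in, '<', 'list.txt' or die "Can't reopen list.txt: $!";
my ( $n, $out ) = ( 0, undef );
while (<$in>) {
    if ( $n++ % $per_part == 0 ) {
        close $out if $out;
        my $name = sprintf 'file%d.txt', 1 + int( ($n - 1) / $per_part );
        open $out, '>', $name or die "Can't open $name: $!";
    }
    print {$out} $_;
}
close $out if $out;
close $in;
```

With the 23-line stand-in, this writes file1.txt through file4.txt with 5 lines each and file5.txt with the remaining 3.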


--
Tore Aursand <toreau@gmail.com>
"A car is not the only thing that can be recalled by its maker."
(Unknown)

James Willmore 11-16-2004 04:58 PM

Re: Break large file down into smaller parts
 
On Tue, 16 Nov 2004 08:10:45 -0800, Brian F. wrote:

> Greets,
>
> I have a 2million+ line file that gets generated twice a day, and was
> wondering if there would be a way to read in the amount of lines and
> split the file into several (say 5) parts with different file names?
>
> so instead of having list.txt with 2 million lines, i'd end up with
> file1.txt, file2.txt, file3.txt..... each with an equal (or nearly
> equal) amount of data from the original.


If the file you're reading isn't being written to as this script runs,
then the example should do what you want. If the file you want to read
*is* being written to while you're reading it, that opens up a whole host
of other issues (like losing information while reading).

(example - may need work)

#!/usr/bin/perl

use strict;
use warnings;

my $prefix_for_chunks = '/tmp/testing';
my $chunk_count = 1;
my $chunk_size = 100000;
my $file_to_read = '/var/log/messages';

open my $in, '<', $file_to_read
    or die "Can't open $file_to_read: $!\n";

my $current_output_file = sprintf "%s%04d.txt", $prefix_for_chunks,
    $chunk_count;

open my $out, '>', $current_output_file
    or die "Can't open $current_output_file for writing: $!\n";

while (<$in>) {
    print {$out} $_;
    # Start a new chunk every $chunk_size lines; skip the rotation on
    # the very last line so no empty file is left behind.
    if ( $. % $chunk_size == 0 and not eof $in ) {
        close $out;
        $current_output_file = sprintf "%s%04d.txt", $prefix_for_chunks,
            ++$chunk_count;
        open $out, '>', $current_output_file
            or die "Can't open $current_output_file for writing: $!\n";
    }
}

close $in;
close $out;

HTH

Jim


