Break large file down into smaller parts

 
 
Brian F.
      11-16-2004
Greets,

I have a 2 million+ line file that gets generated twice a day, and was
wondering if there is a way to count the number of lines and
split the file into several (say 5) parts with different file names?

So instead of having list.txt with 2 million lines, I'd end up with
file1.txt, file2.txt, file3.txt, ... each with an equal (or nearly
equal) amount of data from the original.

Brian F.
 
 
Toni Erdmann
      11-16-2004
Brian F. wrote:
> Greets,
>
> I have a 2 million+ line file that gets generated twice a day, and was
> wondering if there is a way to count the number of lines and
> split the file into several (say 5) parts with different file names?
>
> So instead of having list.txt with 2 million lines, I'd end up with
> file1.txt, file2.txt, file3.txt, ... each with an equal (or nearly
> equal) amount of data from the original.


man split

split --lines=NUMBER

Toni
 
 
Peter Hickman
      11-16-2004
If you are using Unix or the like, there is a command called split that will
do it for you.
 
 
Tore Aursand
      11-16-2004
On Tue, 16 Nov 2004 08:10:45 -0800, Brian F. wrote:
> I have a 2 million+ line file that gets generated twice a day, and was
> wondering if there is a way to count the number of lines and
> split the file into several (say 5) parts with different file names?
>
> So instead of having list.txt with 2 million lines, I'd end up with
> file1.txt, file2.txt, file3.txt, ... each with an equal (or nearly
> equal) amount of data from the original.


1. Count the number of lines in the file; 'perldoc -q lines'
2. Decide on how many parts you want.
3. Iterate through the file, opening, writing to and closing
each file as appropriate.
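
The three steps above might be sketched roughly as follows. This is only a minimal sketch under some assumptions: the sub name split_into_parts is hypothetical, the "file1.txt, file2.txt, ..." naming is taken from the question, and the part size is rounded up, so the last file can come out a little shorter than the others (and a very small input can yield fewer files than requested).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $source into roughly $parts files named
# "<prefix>1.txt", "<prefix>2.txt", ... and return the names written.
sub split_into_parts {
    my ( $source, $parts, $prefix ) = @_;
    die "Need at least one part\n" if $parts < 1;

    # Step 1: count the lines with one pass over the file.
    open my $in, '<', $source or die "Can't open $source: $!\n";
    my $total = 0;
    while ( defined( my $skip = <$in> ) ) { $total++ }
    close $in;

    # Step 2: lines per part, rounded up so no line is left over.
    my $per_part = int( ( $total + $parts - 1 ) / $parts ) || 1;

    # Step 3: second pass, switching output file every $per_part lines.
    open $in, '<', $source or die "Can't open $source: $!\n";
    my ( $out, @written );
    my $line_no = 0;
    while ( my $line = <$in> ) {
        if ( $line_no++ % $per_part == 0 ) {
            if ($out) { close $out or die "Can't close output: $!\n" }
            my $name = sprintf '%s%d.txt', $prefix, scalar(@written) + 1;
            open $out, '>', $name or die "Can't open $name: $!\n";
            push @written, $name;
        }
        print {$out} $line;
    }
    if ($out) { close $out or die "Can't close output: $!\n" }
    close $in;

    return @written;
}

# Example call, using the file names from the question:
# my @files = split_into_parts( 'list.txt', 5, 'file' );
```

Reading the file twice keeps memory use flat, which seems the safer choice for a 2 million line input; only one line is held at a time.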


--
Tore Aursand <(E-Mail Removed)>
"A car is not the only thing that can be recalled by its maker."
(Unknown)
 
 
James Willmore
      11-16-2004
On Tue, 16 Nov 2004 08:10:45 -0800, Brian F. wrote:

> Greets,
>
> I have a 2 million+ line file that gets generated twice a day, and was
> wondering if there is a way to count the number of lines and
> split the file into several (say 5) parts with different file names?
>
> So instead of having list.txt with 2 million lines, I'd end up with
> file1.txt, file2.txt, file3.txt, ... each with an equal (or nearly
> equal) amount of data from the original.


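If the file you're reading isn't being written to as this script runs,
then the example below should do what you want. It starts a new output
file every $chunk_size lines, so set that to roughly the total line count
divided by the number of parts you want. If the file you want to read
*is* being written to while you're reading it, that opens up a whole host
of other issues (like losing information while reading).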

(example - may need work)
#!/usr/bin/perl

use strict;
use warnings;

my $prefix_for_chunks = '/tmp/testing';
my $chunk_count       = 0;
my $chunk_size        = 100000;    # lines per output file
my $file_to_read      = '/var/log/messages';

open my $in, '<', $file_to_read
    or die "Can't open $file_to_read: $!\n";

my $out;
while ( my $line = <$in> ) {
    # Start a new chunk on lines 1, $chunk_size + 1, 2 * $chunk_size + 1, ...
    if ( ( $. - 1 ) % $chunk_size == 0 ) {
        close $out if $out;
        my $current_output_file = sprintf "%s%04d.txt",
            $prefix_for_chunks, ++$chunk_count;
        open $out, '>', $current_output_file
            or die "Can't open $current_output_file for writing: $!\n";
    }
    print {$out} $line;
}

close $in;
close $out if $out;

HTH

Jim

 