Velocity Reviews - Computer Hardware Reviews


data file

 
 
friend.05@gmail.com
10-09-2008
I have a large file in the following format:

ID | Time | IP | Code


I want only the data lines that have a unique IP+Code combination.

If an IP+Code combination is repeated, I don't want that line.

 
Ben Morrow
10-09-2008

Quoth "" <>:
> I have a large file in following format:
>
> ID | Time | IP | Code
>
>
> I want only data lines which has unique IP+Code.
>
> If IP+Code is repeated then I don't want line.


perldoc -q unique
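For reference, the FAQ entry that perldoc -q unique turns up (in perlfaq4) uses a hash to weed out duplicates. Below is a minimal sketch applying that idiom to IP+Code keys; the sample lines and the key_of and first_occurrences helpers are hypothetical. Note that this idiom keeps the first line for each key rather than dropping every key that repeats.

```perl
use strict;
use warnings;

# Hypothetical helper: build an "IP|Code" key from an "ID | Time | IP | Code" line.
sub key_of {
    my ($line) = @_;
    chomp( my $copy = $line );
    my @f = split /\s*\|\s*/, $copy;    # split on "|" and trim the spaces around it
    return "$f[2]|$f[3]";
}

# perlfaq4 idiom: $seen{...}++ returns 0 (false) the first time a key is seen,
# so grep keeps the first line for each key and drops later repeats.
sub first_occurrences {
    my %seen;
    return grep { !$seen{ key_of($_) }++ } @_;
}

# Hypothetical sample data.
my @lines = (
    "1 | 10:00 | 10.0.0.1 | 200\n",
    "2 | 10:01 | 10.0.0.2 | 404\n",
    "3 | 10:02 | 10.0.0.1 | 200\n",
);
print first_occurrences(@lines);    # prints the lines with ID 1 and 2
```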

Ben

--
Musica Dei donum optimi, trahit homines, trahit deos. |
Musica truces mollit animos, tristesque mentes erigit. |
Musica vel ipsas arbores et horridas movet feras. |
 
friend.05@gmail.com
10-10-2008
On Oct 9, 6:08 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth "friend...@gmail.com" <hirenshah...@gmail.com>:
>
> > I have a large file in following format:
>
> > ID | Time | IP | Code
>
> > I want only data lines which has unique IP+Code.
>
> > If IP+Code is repeated then I don't want line.
>
> perldoc -q unique
>
> Ben
>
> --
> Musica Dei donum optimi, trahit homines, trahit deos.    |
> Musica truces mollit animos, tristesque mentes erigit.   |   b...@morrow.me.uk
> Musica vel ipsas arbores et horridas movet feras.        |



Below is the code I have written to extract the lines with a unique
IP+Code from the large file (file format is ID | Time | IP | Code).

I am not sure what the best way to do this is.

#!/usr/local/bin/perl

print "Welcome\n";

$pri_file = "out_pri.txt";

$cnt = 0;
$flag = 0;

open(INFO_PRI,$pri_file)or die $!;
open(INFO,$pri_file)or die $!;

@pri_lines_ = <INFO>;

while($pri_line = <INFO_PRI>)
{
    @primary = split('\|',$pri_line);
    $pri_cli_ip = $primary[4];
    $pri_id = $primary[7];
    print "$pri_id\n";


    foreach $p_line (@pri_lines_)
    {
        @pri = split('\|',$p_line);
        $cli_ip = $pri[4];
        $id = $pri[7];

        if(($pri_cli_ip == $cli_ip) && ($pri__id == $id))
        {
            $cnt++;
            if($cnt == 2){
                $cnt = 0;
                $flag = 1;
                last;
            }
        }
    }
    if($flag == 0){
        open(FILE,'>>pri_unique.txt');
        print FILE "$pri_line\n";
        close(FILE);
    }else{
        $flag = 0;
    }
}

close(INFO_PRI);
close(INFO);
 
Jürgen Exner
10-10-2008
"" <> wrote:
>On Oct 9, 6:08 pm, Ben Morrow <b...@morrow.me.uk> wrote:
>> Quoth "friend...@gmail.com" <hirenshah...@gmail.com>:
>>
>> > I have a large file in following format:
>>
>> > ID | Time | IP | Code
>>
>> > I want only data lines which has unique IP+Code.
>>
>> > If IP+Code is repeated then I don't want line.
>
>Below is code which I have written to extract unique IP+Code from
>large file. (File format is ID | Time | IP | code).
>
>I am not sure which will be best way to do this.
>
>#!/usr/local/bin/perl
>$pri_file = "out_pri.txt";
>
>$cnt = 0;
>$flag = 0;
>
>open(INFO_PRI,$pri_file)or die $!;
>open(INFO,$pri_file)or die $!;
>
>@pri_lines_ = <INFO>;
>
>while($pri_line = <INFO_PRI>)

[rest of code snipped]

There are many things I don't understand in this code, among them why
you are using two filehandles to the same file, why you slurp the whole
file in on one filehandle and then process it line by line on the other
filehandle, why you have a nested loop, etc., etc.

Your requirements seem to be straightforward and easy to translate into
a simple algorithm (warning: sketch only, not tested):

my %idtable;
open (my $F, '<', $myfile) or die "Cannot read $myfile because $!\n";
while (<$F>) {    #loop through file and gather all IP | Code combinations
    my (undef, undef, $ip, $code) = split '\|';
    $idtable{"$ip|$code"}++;    #record this ip-code combination
}
seek $F, 0, 0;    #reset file to start
while (<$F>) {    #loop through file again and ....
    my (undef, undef, $ip, $code) = split '\|';
    print if $idtable{"$ip|$code"} == 1;
    #... print that line if the ip-code combination
    #exists exactly once in the file
}
close $F;
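Re-reading the file with seek only works when the input is a seekable file, not a pipe or STDIN. A single-pass alternative, sketched here on the assumption that keeping one stored line per distinct IP+Code key in memory is acceptable, counts the keys and remembers the first line for each; lines_with_unique_key and the sample data are hypothetical, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Single pass: count every IP|Code key, remember the first line seen for it,
# and record the order in which keys first appeared.
sub lines_with_unique_key {
    my ($fh) = @_;
    my ( %count, %first_line, @order );
    while ( my $line = <$fh> ) {
        chomp( my $rec = $line );
        my ( $ip, $code ) = ( split /\s*\|\s*/, $rec )[ 2, 3 ];
        next unless defined $code;    # skip malformed lines
        my $key = "$ip|$code";
        unless ( $count{$key}++ ) {   # first time this key is seen
            push @order, $key;
            $first_line{$key} = $line;
        }
    }
    # Keep only keys that occurred exactly once, in original file order.
    return map { $first_line{$_} } grep { $count{$_} == 1 } @order;
}

# Hypothetical sample data read through an in-memory filehandle.
my $data = "1 | t1 | 10.0.0.1 | 200\n"
         . "2 | t2 | 10.0.0.2 | 404\n"
         . "3 | t3 | 10.0.0.1 | 200\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
print lines_with_unique_key($fh);    # prints only the line with ID 2
close $fh;
```

The two-pass version stores only a count per key, so it uses less memory on a huge file; this version trades that for working on any input stream.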

jue
 
Tad J McClellan
10-10-2008
<> wrote:

> $flag = 0;



You should choose meaningful variable names.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
Ben Morrow
10-10-2008
[don't quote .signatures]

Quoth "" <>:
> On Oct 9, 6:08 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> > Quoth "friend...@gmail.com" <hirenshah...@gmail.com>:
> >
> > > I have a large file in following format:
> >
> > > ID | Time | IP | Code
> >
> > > I want only data lines which has unique IP+Code.
> >
> > > If IP+Code is repeated then I don't want line.
> >
> > perldoc -q unique

>
> Below is code which I have written to extract unique IP+Code from
> large file. (File format is ID | Time | IP | code).
>
> I am not sure which will be best way to do this.
>
> #!/usr/local/bin/perl


Where is

use warnings;
use strict;

? You have already been told to include this.

> print "Welcome\n";
>
> $pri_file = "out_pri.txt";
>
> $cnt = 0;
> $flag = 0;
>
> open(INFO_PRI,$pri_file)or die $!;
> open(INFO,$pri_file)or die $!;


You have already been told to use lexical filehandles and 3-arg open.
You should make the error message actually useful:

open (my $INFO_PRI, "<", $pri_file)
or die "can't open '$pri_file': $!";

Why are you opening the same file twice? Just iterate over @pri_lines_
instead.

> @pri_lines_ = <INFO>;


Why on earth are you using a variable name ending in _?

> while($pri_line = <INFO_PRI>)
> {
> @primary = split('\|',$pri_line);
> $pri_cli_ip = $primary[4];
> $pri_id = $primary[7];
> print "$pri_id\n";
>
>
> foreach $p_line (@pri_lines_)
> {
> @pri = split('\|',$p_line);


You keep doing the same split over and over. Split each line once, and
keep the results in a data structure until you need them.
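A minimal sketch of that advice, assuming the lines are already in memory as they were in @pri_lines_: each line is split exactly once into a record, and a single counting pass over a hash replaces the inner foreach loop, so the work grows linearly with the file size instead of quadratically. The record field names and the sample data are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical input: lines already slurped into memory.
my @pri_lines = (
    "1 | t1 | 10.0.0.1 | 200\n",
    "2 | t2 | 10.0.0.2 | 404\n",
    "3 | t3 | 10.0.0.1 | 200\n",
);

# Split every line exactly once and keep the pieces in a hash-based record.
my @records;
for my $line (@pri_lines) {
    chomp( my $copy = $line );
    my ( $id, $time, $ip, $code ) = split /\s*\|\s*/, $copy;
    push @records,
        { id => $id, time => $time, ip => $ip, code => $code, line => $line };
}

# One pass to count each IP+Code pair, replacing the inner foreach loop.
my %count;
$count{"$_->{ip}|$_->{code}"}++ for @records;

# Keep the records whose IP+Code pair occurs exactly once.
my @unique = grep { $count{"$_->{ip}|$_->{code}"} == 1 } @records;
print $_->{line} for @unique;    # prints the line with ID 2
```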

> $cli_ip = $pri[4];
> $id = $pri[7];
>
> if(($pri_cli_ip == $cli_ip) && ($pri__id == $id))


Did you read perldoc -q unique? It says to use a hash for finding
uniqueness.

> {
> $cnt++;


You are not resetting $cnt between iterations of the outer loop, so
every other line will be considered a duplicate.

> if($cnt == 2){
> $cnt = 0;
> $flag = 1;
> last;


If you give the outer loop a label, you can use next LABEL and avoid
$flag.
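A sketch of the labeled-loop idea, using hypothetical pre-split [IP, Code] pairs. "next LINE" jumps straight to the next iteration of the outer loop, so neither $flag nor $cnt is needed. It also compares with eq: IP addresses are strings, and == would compare only their leading numeric parts (10.0.0.1 and 10.0.0.2 both numify to 10). The hash approach from the FAQ is still the better fix; this only illustrates the loop-control point.

```perl
use strict;
use warnings;

# Hypothetical data: [IP, Code] pairs already split out of each line.
my @pairs = (
    [ '10.0.0.1', '200' ],
    [ '10.0.0.2', '404' ],
    [ '10.0.0.1', '200' ],
);

my @unique_idx;
LINE: for my $i ( 0 .. $#pairs ) {
    for my $j ( 0 .. $#pairs ) {
        next if $i == $j;    # don't compare a line with itself
        # Another line has the same IP and Code: skip to the next outer
        # iteration directly, with no flag variable to set and reset.
        next LINE if $pairs[$i][0] eq $pairs[$j][0]
                  && $pairs[$i][1] eq $pairs[$j][1];
    }
    push @unique_idx, $i;    # reached only when no duplicate was found
}
print "@unique_idx\n";    # prints "1"
```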

> }
> }
> }
> if($flag == 0){
> open(FILE,'>>pri_unique.txt');
> print FILE "$pri_line\n";
> close(FILE);


Why do you keep opening and closing this file?
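A sketch of opening the output file once, before the loop, with a hypothetical @unique_lines standing in for the lines that passed the uniqueness test. It opens with '>' so each run starts with a fresh file, whereas the original '>>' keeps appending to output from earlier runs; that choice is an assumption about what is wanted. It also prints each line unchanged, because lines read from <INFO_PRI> still end in a newline and "$pri_line\n" doubles it.

```perl
use strict;
use warnings;

# Hypothetical lines that have already passed the uniqueness test.
my @unique_lines = ("2 | t2 | 10.0.0.2 | 404\n");
my $out_file     = 'pri_unique.txt';

# Open once, before the loop, instead of once per line.
open my $OUT, '>', $out_file or die "can't open '$out_file': $!";
for my $line (@unique_lines) {
    # The line already carries its own newline, so none is added here.
    print {$OUT} $line;
}
close $OUT or die "can't close '$out_file': $!";
```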

Ben

--
Outside of a dog, a book is a man's best friend.
Inside of a dog, it's too dark to read.
Groucho Marx
 
J. Gleixner
10-10-2008
wrote:
> On Oct 9, 6:08 pm, Ben Morrow <b...@morrow.me.uk> wrote:
>> Quoth "friend...@gmail.com" <hirenshah...@gmail.com>:
>>
>>> I have a large file in following format:
>>> ID | Time | IP | Code
>>> I want only data lines which has unique IP+Code.
>>> If IP+Code is repeated then I don't want line.

>> perldoc -q unique
>>
>> Ben


> Below is code which I have written to extract unique IP+Code from
> large file. (File format is ID | Time | IP | code).
>
> I am not sure which will be best way to do this.


Well, it's not the way you posted.

Did you actually read the perldoc Ben mentioned above? You don't use a
hash at all, so I'm guessing not.

>
> #!/usr/local/bin/perl

use strict;

my $pri_file = 'out_pri.txt';
open( my $INFO, '<', $pri_file ) or die "Can't open $pri_file: $!";
open( my $OUT, '>', 'unique.out' ) or die "Can't open unique.out: $!";

my %info;
while ( my $line = <$INFO> )
{
    chomp( $line );
    # split the data.. you can split directly into the variables..
    # my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
    # print $line to $OUT if the hash key of $cli_ip and $id doesn't already exist.

}
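One way the loop body could be filled in, written as a hypothetical copy_first_occurrences sub and fed from in-memory filehandles so it runs without out_pri.txt. It uses the list slice [2,3] to pick out IP and Code, and relies on $info{...}++ returning the old count, which is false only the first time a key is seen. This prints each IP+Code combination once (the first line for it).

```perl
use strict;
use warnings;

# Copy to $out only the first line seen for each IP+Code pair.
sub copy_first_occurrences {
    my ( $in, $out ) = @_;
    my %info;
    while ( my $line = <$in> ) {
        chomp( $line );
        # List slice: keep only fields 2 and 3 (IP and Code) of the split.
        my ( $cli_ip, $id ) = ( split( /\s*\|\s*/, $line ) )[2,3];
        next unless defined $id;    # skip malformed lines
        print {$out} "$line\n" unless $info{"$cli_ip|$id"}++;
    }
}

# Hypothetical sample data; in-memory filehandles stand in for the files.
my $input = "1 | t1 | 10.0.0.1 | 200\n"
          . "2 | t2 | 10.0.0.2 | 404\n"
          . "3 | t3 | 10.0.0.1 | 200\n";
my $output = '';
open my $INFO, '<', \$input  or die "Can't open input: $!";
open my $OUT,  '>', \$output or die "Can't open output: $!";
copy_first_occurrences( $INFO, $OUT );
close $INFO;
close $OUT;
print $output;    # the lines with ID 1 and 2
```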


 
Jürgen Exner
10-10-2008
"J. Gleixner" <glex_no-> wrote:
> wrote:
>> On Oct 9, 6:08 pm, Ben Morrow <b...@morrow.me.uk> wrote:
>>> Quoth "friend...@gmail.com" <hirenshah...@gmail.com>:
>>>
>>>> I have a large file in following format:
>>>> ID | Time | IP | Code
>>>> I want only data lines which has unique IP+Code.
>>>> If IP+Code is repeated then I don't want line.
>>> perldoc -q unique
>>>
>>> Ben
>
>> Below is code which I have written to extract unique IP+Code from
>> large file. (File format is ID | Time | IP | code).
>>
>> I am not sure which will be best way to do this.
>
>Well, it's not the way you posted.
>
>Did you actually read the perldoc Ben mentioned above? You don't use a
>hash at all, so I'm guessing not.


ACK!

>while ( my $line = <$INFO> )
>{
> chomp( $line );
># split the data.. you can split directly into the variables..
># my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
># print $line to $OUT if the hash key of $cli_ip and $id doesn't already
>exist.


That will print each IP+code exactly once. I think (but I may be
mistaken; the OP isn't clear on that) he wants only those lines that
_are_ unique with respect to IP+code, i.e. where there is no second line
with the same IP+code.

jue
 
friend.05@gmail.com
10-10-2008
On Oct 10, 12:57 pm, Jürgen Exner <jurge...@hotmail.com> wrote:
> "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid> wrote:
> >friend...@gmail.com wrote:
> >> On Oct 9, 6:08 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> >>> Quoth "friend...@gmail.com" <hirenshah...@gmail.com>:
> >>>
> >>>> I have a large file in following format:
> >>>> ID | Time | IP | Code
> >>>> I want only data lines which has unique IP+Code.
> >>>> If IP+Code is repeated then I don't want line.
> >>> perldoc -q unique
> >>>
> >>> Ben
> >
> >> Below is code which I have written to extract unique IP+Code from
> >> large file. (File format is ID | Time | IP | code).
> >>
> >> I am not sure which will be best way to do this.
> >
> >Well, it's not the way you posted.
> >
> >Did you actually read the perldoc Ben mentioned above?  You don't use a
> >hash at all, so I'm guessing not.
>
> ACK!
>
> >while ( my $line = <$INFO> )
> >{
> >    chomp( $line );
> ># split the data.. you can split directly into the variables..
> ># my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
> ># print $line to $OUT if the hash key of $cli_ip and $id doesn't already
> >exist.
>
> That will print each IP+code exactly once. I think (but I may be
> mistaken, the OPs isn't clear on that) he wants only those lines, that
> _are_ unique wrt. the IP+code, i.e. where there is no second line with
> the same IP+code.
>
> jue


Thanks to all for the help. That was helpful.

But I have now created the hash of IP+Code combinations.

How do I check whether each combination occurs exactly once in the
file?
 
Jürgen Exner
10-10-2008
"" <> wrote:
>I created the hash (IP+Code) combination.
>
>But How to chk if this hash(each combination) is exactly one time in
>file ?


You could count the number of occurrences and then compare the count
against 1?

$IDTable{"$IP+$Code"}++;
[......]

if ($IDTable{"$IP+$Code"} == 1) {
    print "Look ma, $IP+$Code occurs exactly once in the file\n";
}

 