Velocity Reviews - Computer Hardware Reviews

Parsing two files and comparing the first fields..

clearguy02@yahoo.com
      11-28-2007
I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
file has 4 fields and the second one has two fields, but both files
have the "user_id" as the first field.

Example:

c:\test1.txt
=================
jcarter john (E-Mail Removed) mstella
mstella mary (E-Mail Removed) bborders
msmith martin (E-Mail Removed) mstella
bborders bob (E-Mail Removed) rcasey
swatson sush (E-Mail Removed) mstella
rcasey rick (E-Mail Removed) rcasey


c:\test2.txt
======================
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active

===============================================

Now I want to check whether each ID from the second file exists in the
first one, and I want output for both the matching and the non-matching
IDs.

Below is the script I am using. Can you kindly let me know where I
am going wrong?

================================

use strict;
use warnings;

open (IN1, "c:\test1.txt") || die "Can not open the file: $!";
open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the file: $!";
open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the file: $!";

@array1 = <IN1>;
@array2 = <IN2>;

foreach $record1 (@array1)
{
chomp $record1;
@fields1= split /\t/, $record1;
$fist_id = $fields1[0];
}

foreach $record2 (@array2)
{
chomp $record2;
@fields2= split /\t/, $record2;
$second_id = $fields2[0];

foreach (@fields1)
{
if ($second_id eq $fist_id)
{
print OUT1 "$record2\n" ; # matching
}
else
{
print OUT1 "$record2\n" ; # matching
}
}
close (IN1);
close (IN2);
close (OUT1);
close (OUT2);
+++++++++++++++++++++++++++++++++++++


Thanks in advance,
JC
clearguy02@yahoo.com
      11-28-2007
On Nov 28, 3:12 pm, (E-Mail Removed) wrote:


I forgot to add "my" before the variables while typing. Sorry about
that.

--JC
A. Sinan Unur
      11-28-2007
(E-Mail Removed) wrote in news:b20d8640-91c1-41d7-a46a-ab04bf405239@d21g2000prf.googlegroups.com:

>
> Now I want to check if each id from the second file exists in the
> first one or not. I want the output of both matching and non-matching
> id's.


Read

perldoc -q intersection

Parse the files into hashes, using the ID field values as keys.
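A minimal sketch of that hash technique, using the IDs from the sample data above. The sub name split_by_membership is made up for illustration; it is not from perlfaq or any module:

```perl
use strict;
use warnings;

# Split the IDs from the second file into those that also appear in the
# first file (matching) and those that do not (non-matching).
sub split_by_membership {
    my ( $first_ids, $second_ids ) = @_;

    # Build a lookup hash with one key per ID from the first file.
    my %seen;
    $seen{$_} = 1 for @$first_ids;

    my ( @matching, @non_matching );
    for my $id (@$second_ids) {
        if ( exists $seen{$id} ) {
            push @matching, $id;
        }
        else {
            push @non_matching, $id;
        }
    }
    return ( \@matching, \@non_matching );
}

# First-field IDs from the sample test1.txt and test2.txt.
my @first  = qw(jcarter mstella msmith bborders swatson rcasey);
my @second = qw(aaboss jcarter msmith ssullivan rcasey usmiths);

my ( $match, $nonmatch ) = split_by_membership( \@first, \@second );
print "matching:     @$match\n";       # jcarter msmith rcasey
print "non-matching: @$nonmatch\n";    # aaboss ssullivan usmiths
```

Each ID from the second file costs one hash lookup, no matter how many lines the first file has.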

> use strict;
> use warnings;
>
> open (IN1, "c:\test1.txt") || die "Can not open the file: $!";


This will almost certainly fail: inside double quotes, \t is the escape
for a TAB character, so Perl tries to open a file named c:{TAB}est1.txt
rather than c:\test1.txt.
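A small sketch showing what happens to that path string. Nothing is opened; it only compares a double-quoted, a single-quoted and a forward-slash version of the same name:

```perl
use strict;
use warnings;

# Inside double quotes, \t is the TAB escape, so the path is mangled.
my $double = "c:\test1.txt";

# Single quotes do not process \t, so the backslash survives.
my $single = 'c:\test1.txt';

# Forward slashes also work as the directory separator on Windows.
my $slash = "c:/test1.txt";

printf "double-quoted: %d chars, contains TAB: %s\n",
    length $double, ( $double =~ /\t/ ? 'yes' : 'no' );
printf "single-quoted: %d chars, contains TAB: %s\n",
    length $single, ( $single =~ /\t/ ? 'yes' : 'no' );
print "forward slash: $slash\n";
```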

> open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
> open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
> file: $!";
> open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
> file: $!";


I generally prefer to use lexical filehandles and the three-argument
form of open. Also, you can just use / as the directory separator on
Windows. For increased portability, I prefer to use File::Spec->catfile.
Note, too, that $dir1 is never set in the code you posted.
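A sketch of the three-argument open with lexical filehandles, building the path with File::Spec->catfile. The temporary directory from File::Temp is only a stand-in for $dir1; in real use you would point it at your own output directory:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory standing in for $dir1 (removed automatically at exit).
my $dir = tempdir( CLEANUP => 1 );

# catfile joins path components with the separator for the current platform.
my $path = File::Spec->catfile( $dir, 'matching.txt' );

# Three-argument open: the mode ('>') is kept separate from the file name.
# $out is a lexical filehandle; closing it explicitly lets us check for
# write errors.
open my $out, '>', $path or die "Cannot open '$path': $!";
print {$out} "jcarter\n";
close $out or die "Cannot close '$path': $!";

# Read the line back to confirm the write.
open my $in, '<', $path or die "Cannot open '$path': $!";
my $line = <$in>;
close $in;
chomp $line;
print "read back: $line\n";
```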

> @array1 = <IN1>;
> @array2 = <IN2>;


There is no need to slurp either file into an array; read each one line
by line with a while loop.
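A minimal sketch of reading line by line instead of slurping. To stay self-contained it reads from an in-memory filehandle (open on a scalar reference) rather than c:/test1.txt, assumes tab-separated fields as in the original split /\t/, and uses "email" as a placeholder for the removed address field:

```perl
use strict;
use warnings;

# Two sample lines from test1.txt, held in a string for this example.
my $data = "jcarter\tjohn\temail\tmstella\n"
         . "mstella\tmary\temail\tbborders\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @ids;

# Only the current line is held in memory, however large the file is.
while ( my $line = <$fh> ) {
    chomp $line;
    my ($id) = split /\t/, $line;    # first field only
    push @ids, $id;
}
close $fh;

print "@ids\n";    # jcarter mstella
```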

> foreach $record1 (@array1)
> {
> chomp $record1;
> @fields1= split /\t/, $record1;
> $fist_id = $fields1[0];


my $first_id = (split /\t/, $record1)[0];
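The same list slice in runnable form, applied to a hypothetical tab-separated record. A variant with split's optional limit argument is also shown, since only the first field is needed:

```perl
use strict;
use warnings;

# Hypothetical record; "email" stands in for the removed address field.
my $record1 = "jcarter\tjohn\temail\tmstella";

# List slice: element 0 of the list returned by split.
my $first_id = ( split /\t/, $record1 )[0];

# With a limit of 2, split stops after the first TAB.
my $first_id_limited = ( split /\t/, $record1, 2 )[0];

print "$first_id $first_id_limited\n";    # jcarter jcarter
```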

> }
>
> foreach $record2 (@array2)
> {
> chomp $record2;
> @fields2= split /\t/, $record2;
> $second_id = $fields2[0];



This nested-loop approach scales very badly: the number of comparisons
grows with the product of the line counts of the two files. Use hashes
instead, so that checking each ID is a single lookup.
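A rough illustration of that difference, counting string comparisons for a nested loop against hash lookups on made-up IDs (user1, user2, ...). It tallies operations only; it is not a benchmark:

```perl
use strict;
use warnings;

# Nested loop: compare each ID from the second list against the first list,
# stopping early on a match.
sub nested_loop_comparisons {
    my ( $first, $second ) = @_;
    my ( $comparisons, $found ) = ( 0, 0 );
    for my $id2 (@$second) {
        for my $id1 (@$first) {
            $comparisons++;
            if ( $id1 eq $id2 ) { $found++; last }
        }
    }
    return ( $comparisons, $found );
}

# Hash: one pass to build the lookup table, then one lookup per ID.
sub hash_lookups {
    my ( $first, $second ) = @_;
    my %seen;
    $seen{$_} = 1 for @$first;
    my ( $lookups, $found ) = ( 0, 0 );
    for my $id2 (@$second) {
        $lookups++;
        $found++ if exists $seen{$id2};
    }
    return ( $lookups, $found );
}

my $n      = 1000;
my @first  = map "user$_", 1 .. $n;              # user1 .. user1000
my @second = map "user" . ( $_ * 2 ), 1 .. $n;   # user2 .. user2000, half present

my ( $cmp, $found_nested ) = nested_loop_comparisons( \@first, \@second );
my ( $lk,  $found_hash )   = hash_lookups( \@first, \@second );

print "nested loop: $cmp comparisons, $found_nested found\n";
print "hash:        $lk lookups, $found_hash found\n";
```

With 1,000 IDs on each side, the nested loop makes hundreds of thousands of comparisons, while the hash approach makes 1,000 lookups plus one pass to build the hash.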

> foreach (@fields1)
> {
> if ($second_id eq $fist_id)
> {
> print OUT1 "$record2\n" ; # matching
> }
> else
> {
> print OUT1 "$record2\n" ; # matching
> }
> }


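So if $second_id eq $first_id, you write it to OUT1; otherwise, you also
write it to OUT1. What's the point? Also, by the time that loop runs,
$fist_id holds only the ID from the last line of the first file, so
every ID is compared against that single value.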

The script below represents my best guess as to what you are trying to
achieve.

#!/usr/bin/perl

use strict;
use warnings;

my %myconfig = (
    input1       => 'input1.txt',
    input2       => 'input2.txt',
    matching     => 'matching.txt',
    non_matching => 'non_matching.txt',
);

my %fields1;

{
    open my $input, '<', $myconfig{input1}
        or die "Cannot open '$myconfig{input1}': $!";

    while ( <$input> ) {
        if ( /^(\w+)/ ) {
            $fields1{ $1 } = 1;
        }
    }

    close $input
        or die "Cannot close '$myconfig{input1}': $!";
}

open my $input, '<', $myconfig{input2}
    or die "Cannot open '$myconfig{input2}': $!";

open my $matching, '>', $myconfig{matching}
    or die "Cannot open '$myconfig{matching}': $!";

open my $non_matching, '>', $myconfig{non_matching}
    or die "Cannot open '$myconfig{non_matching}': $!";

while ( <$input> ) {
    if ( /^(\w+)/ ) {
        if ( exists $fields1{ $1 } ) {
            print $matching "$1\n";
        }
        else {
            print $non_matching "$1\n";
        }
    }
}

close $non_matching
    or die "Cannot close '$myconfig{non_matching}': $!";
close $matching
    or die "Cannot close '$myconfig{matching}': $!";
close $input
    or die "Cannot close '$myconfig{input2}': $!";

__END__

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input1.txt
jcarter john (E-Mail Removed) mstella
mstella mary (E-Mail Removed) bborders
msmith martin (E-Mail Removed) mstella
bborders bob (E-Mail Removed) rcasey
swatson sush (E-Mail Removed) mstella
rcasey rick (E-Mail Removed) rcasey


C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input2.txt
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active


C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat matching.txt
jcarter
msmith
rcasey

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat non_matching.txt
aaboss
ssullivan
usmiths



--
A. Sinan Unur <(E-Mail Removed)>
(remove .invalid and reverse each component for email address)
clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>
John W. Krahn
      11-28-2007
(E-Mail Removed) wrote:
> Now I want to check if each id from the second file exists in the
> first one or not. I want the output of both matching and non-matching
> id's.



Something like this should work:


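#!/usr/bin/perl
use warnings;
use strict;

# $dir1 is not defined in the original post; '.' (the current
# directory) is only a placeholder, so point it wherever you need.
my $dir1 = '.';

# Build a hash of the IDs in the first file.
open my $fh1, '<', 'c:/test1.txt' or die "Cannot open 'c:/test1.txt': $!";

my %ids;
while ( <$fh1> ) {
    chomp;
    next unless length;    # skip blank lines
    $ids{ ( split /\t/ )[ 0 ] }++;
}

close $fh1;

open my $fh2,   '<', 'c:/test2.txt' or die "Cannot open 'c:/test2.txt': $!";
open my $match, '>', "$dir1/matching.txt"
    or die "Cannot open '$dir1/matching.txt': $!";
open my $nonm,  '>', "$dir1/not_matching.txt"
    or die "Cannot open '$dir1/not_matching.txt': $!";

# Look up each ID from the second file in that hash.
while ( <$fh2> ) {
    chomp;
    next unless length;
    my $id = ( split /\t/ )[ 0 ];
    if ( exists $ids{ $id } ) {
        print $match "$_\n";
    }
    else {
        print $nonm "$_\n";
    }
}

close $nonm;
close $match;
close $fh2;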

__END__



John
--
use Perl;
program
fulfillment