Velocity Reviews - Computer Hardware Reviews

Compare 2 files and put the matching part in a 3rd file

 
 
BerNaC
 
      01-21-2005
Hi all,

I need to compare two text files and put the matching lines in a third
file. Does anybody have an idea?

file1    file2    file3 (lines in both)
-----    -----    -----
1        3        3
2        4        4
3        5        5
4        6
5        7


Thank you

--
-----------------------
BerNaC
___________

 
Arndt Jonasson
 
      01-21-2005

BerNaC <(E-Mail Removed)> writes:
>
> I need to compare two text files and put the matching lines in a third
> file. Does anybody have an idea?


Is anything known about the format of the files and in what ways they
can differ? Doing a general comparison and presenting the result as
a minimal set of individual differences is quite complex. In that case
I would run the Unix 'diff' program on the files and post-process
its output.

As far as I can see, CPAN only offers "compare and stop at the first
difference".
 
BerNaC
 
      01-21-2005
Arndt Jonasson wrote on Friday:
> BerNaC <(E-Mail Removed)> writes:
>>
>> I need to compare two text files and put the matching lines in a third
>> file. Does anybody have an idea?

>
> Is anything known about the format of the files and in what ways they
> can differ? Doing a general comparison and presenting the result as
> a minimal set of individual differences is quite complex. In that case
> I would run the Unix 'diff' program on the files and post-process
> its output.
>
> As far as I can see, CPAN only offers "compare and stop at the first
> difference".


Well, each of the two text files holds one ID from the sendmail log per
line. They look like this:

1U34334Y34
1ZRTRG345
2SDFSDF17
and so on

One file holds the IDs of mail from a given sender, the other the IDs of
mail to a given recipient. If an ID appears in both files, a mail with
that ID was sent from that sender to that recipient.
So, as you can see, I'm trying to write a script that parses the sendmail
log to find all mail from one person to another.

--
-----------------------
BerNaC
___________

 
Arndt Jonasson
 
      01-21-2005

BerNaC <(E-Mail Removed)> writes:
> Arndt Jonasson wrote on Friday:
> > BerNaC <(E-Mail Removed)> writes:
> >> I need to compare two text files and put the matching lines in a
> >> third file. Does anybody have an idea?

> >
> > Is anything known about the format of the files and in what ways they
> > can differ? Doing a general comparison and presenting the result as
> > a minimal set of individual differences is quite complex. In that case
> > I would run the Unix 'diff' program on the files and post-process
> > its output.
> >
> > As far as I can see, CPAN only offers "compare and stop at the first
> > difference".

>
> Well, each of the two text files holds one ID from the sendmail log per
> line. They look like this:
>
> 1U34334Y34
> 1ZRTRG345
> 2SDFSDF17
> and so on
>
> One file holds the IDs of mail from a given sender, the other the IDs of
> mail to a given recipient. If an ID appears in both files, a mail with
> that ID was sent from that sender to that recipient.
> So, as you can see, I'm trying to write a script that parses the sendmail
> log to find all mail from one person to another.


That seems to mean that no valuable information is lost if you sort
the files first, which makes the job of comparing them much easier (I'd
say trivial, but maybe that's overstating it). Is that enough for an
idea, or is there some particular aspect of it which you don't know
how to do in Perl?

If the files are not very large, reading their contents into perl (*)
and sorting them there will be OK; otherwise it's better to sort them on disk.

(*) "perl", "Perl", what do I want here? I want a
"case-doesn't-matter-perl"...
 
John W. Krahn
 
      01-21-2005
BerNaC wrote:
>
> I need to compare two text files and put the matching lines in a third
> file. Does anybody have an idea?
>
> file1    file2    file3 (lines in both)
> -----    -----    -----
> 1        3        3
> 2        4        4
> 3        5        5
> 4        6
> 5        7


$ perl -ne'$a?$x{$_}&&print:$x{$_}++;$a||=eof' file1 file2
3
4
5
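
For readers who find the one-liner dense, here is a longer sketch of the same idea: remember every line of the first input in a hash, then keep each line of the second input that was seen. The sub name matching_lines and the in-memory demo data are assumptions made for illustration, not part of the original post. One deliberate difference: this version chomps each line, so a missing final newline in the first file does not hide a match.

```perl
use strict;
use warnings;

# Return the lines of the second input that also occur in the first.
# Both arguments are open filehandles.
sub matching_lines {
    my ($first_fh, $second_fh) = @_;

    # Pass 1: record every line of the first input.
    my %seen;
    while (my $line = <$first_fh>) {
        chomp $line;
        $seen{$line}++;
    }

    # Pass 2: keep lines of the second input that were recorded.
    my @matches;
    while (my $line = <$second_fh>) {
        chomp $line;
        push @matches, $line if $seen{$line};
    }
    return @matches;
}

# Demo on in-memory filehandles holding the sample data from the question.
my $file1 = join '', map { "$_\n" } 1 .. 5;
my $file2 = join '', map { "$_\n" } 3 .. 7;
open my $fh1, '<', \$file1 or die "cannot open in-memory file1: $!";
open my $fh2, '<', \$file2 or die "cannot open in-memory file2: $!";
print "$_\n" for matching_lines($fh1, $fh2);
```

Like the one-liner, this reports matches in the order of the second file and repeats a match if the second file repeats the line.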


John
--
use Perl;
program
fulfillment
 
David Combs
 
      01-25-2005
In article <(E-Mail Removed)>,
Arndt Jonasson <(E-Mail Removed)> wrote:
>

<SNIP>
>
>That seems to mean that no valuable information is lost if you sort
>the files first, which makes the job of comparing them much easier (I'd
>say trivial, but maybe that's overstating it). Is that enough for an
>idea, or is there some particular aspect of it which you don't know
>how to do in Perl?
>
>If the files are not very large, reading in their contents into perl (*)
>and sorting there will be OK, otherwise it's better to sort them on disk.


If you're allowed to sort them, then do that, and run "comm"
on the two sorted files.

(It's *exactly* what comm was designed for.)

David


PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort with the same scheme that comm uses,
otherwise comm won't think the input is sorted. I.e., beware of -u, -r, etc.




 
Anno Siegel
 
      01-25-2005
David Combs <(E-Mail Removed)> wrote in comp.lang.perl.misc:

[...]

> If you're allowed to sort them, then do that, and do "comm"
> on those two.
>
> (It's *exactly* what comm was designed for.)
>
> David
>
>
> PS: Question: does the following conjecture make any sense?
>
> Oh, by the way, make sure you sort with the same scheme that comm uses,
> otherwise comm won't think the input is sorted. I.e., beware of -u, -r, etc.


Conjecture?

No, the remark doesn't make sense. All comm requires is that identical
lines be next to each other. Any sort that considers the whole line will
guarantee that.

My comm man page doesn't even specify the sort to be ascending or descending,
though it does (unnecessarily) specify "lexically".

Anno
 
xhoster@gmail.com
 
      01-25-2005
(E-Mail Removed)-berlin.de (Anno Siegel) wrote:
> David Combs <(E-Mail Removed)> wrote in comp.lang.perl.misc:
>
> [...]
>
> > If you're allowed to sort them, then do that, and do "comm"
> > on those two.
> >
> > (It's *exactly* what comm was designed for.)
> >
> > David
> >
> >
> > PS: Question: does the following conjecture make any sense?:
> >
> > Oh, by the way, make sure you sort with the same scheme that comm uses,
> > otherwise comm won't think the input is sorted. I.e., beware of -u, -r, etc.

>
> Conjecture?
>
> No, the remark doesn't make sense. All comm requires is that identical
> lines be next to each other.


The only way you can ensure that identical lines are next to each other by
sorting the separate files is if the files are identical in the first
place. If you already know that, then you are already done.

In the non-trivial case, comm needs a way to re-align the files once it
encounters a non-identical line. For that to work, the files must be
sorted in the order that comm expects.

> My comm man page doesn't even specify the sort to be ascending or
> descending, though it does (unnecessarily) specify "lexically".


Apparently man isn't good enough any more; if you want to know how a
command-line tool works, you now have to read the "info" page too.


from info comm:<<EOF
Before `comm' can be used, the input files must be sorted using the
collating sequence specified by the `LC_COLLATE' locale. If an input
file ends in a non-newline character, a newline is silently appended.
The `sort' command with no options always outputs a file that is
suitable input to `comm'.
EOF
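
The re-alignment argument can be illustrated with a small Perl sketch. It walks two lists in step, advancing whichever side holds the smaller line, which is the general technique a merge over sorted input relies on. This is an illustration under assumptions, not comm's actual source: the sub name merge_common is made up, and plain string order (cmp) stands in for the locale collation that the info page mentions.

```perl
use strict;
use warnings;

# Walk two lists in step, assuming BOTH are in ascending string order.
# Whichever side holds the smaller line is advanced; equal lines match.
sub merge_common {
    my ($left, $right) = @_;
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @$left && $j < @$right) {
        my $cmp = $left->[$i] cmp $right->[$j];
        if    ($cmp < 0) { $i++ }    # line only on the left so far
        elsif ($cmp > 0) { $j++ }    # line only on the right so far
        else  { push @common, $left->[$i]; $i++; $j++ }
    }
    return @common;
}

# Sample data from the question.
my @file1 = (1 .. 5);
my @file2 = (3 .. 7);

# Both inputs sorted the same way: the walk stays aligned.
my @same_order = merge_common([sort @file1], [sort @file2]);

# Second input sorted in reverse: the walk runs off the end of the
# first list before it ever reaches the shared lines.
my @mixed_order = merge_common([sort @file1], [reverse sort @file2]);

print "same order:  @same_order\n";
print "mixed order: @mixed_order\n";
```

With matching sort orders the walk finds 3, 4 and 5; with the second list reversed it finds nothing, although the same lines are present in both inputs.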

Xho

 
Anno Siegel
 
      01-25-2005
<(E-Mail Removed)> wrote in comp.lang.perl.misc:
> (E-Mail Removed)-berlin.de (Anno Siegel) wrote:
> > David Combs <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> >
> > [...]
> >
> > > If you're allowed to sort them, then do that, and do "comm"
> > > on those two.
> > >
> > > (It's *exactly* what comm was designed for.)
> > >
> > > David
> > >
> > >
> > > PS: Question: does the following conjecture make any sense?:
> > >
> > > Oh, by the way, make sure you sort with the same scheme that comm uses,
> > > otherwise comm won't think the input is sorted. I.e., beware of -u, -r, etc.

> >
> > Conjecture?
> >
> > No, the remark doesn't make sense. All comm requires is that identical
> > lines be next to each other.

>
> The only way you can ensure that identical lines are next to each other by
> sorting the separate files is if the files are identical in the first
> place. If you already know that, then you are already done.


You are so very right. Both files must be sorted according to the same
sort specification, or comm can foul up. Sorry.

Anno
 
colin_lyse
 
      02-16-2005
In article <(E-Mail Removed)>, BerNaC <(E-Mail Removed)> wrote:
>Hi all,
>
>I need to compare two text files and put the matching lines in a third
>file. Does anybody have an idea?
>
>file1    file2    file3 (lines in both)
>-----    -----    -----
>1        3        3
>2        4        4
>3        5        5
>4        6
>5        7
>
>
>Thank you
>



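{
    # @file1 and @file2 are assumed to hold the chomped lines of each file.
    my %t;
    $t{$_} |= 1 for @file1;    # bit 1: line occurs in file1
    $t{$_} |= 2 for @file2;    # bit 2: line occurs in file2
    # Lines with both bits set occur in both files, even if repeated.
    @matching = grep { $t{$_} == 3 } keys %t;
}


No need to sort the files first, and it is very fast.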
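
To get from the snippet to the third file the original question asks for, something along these lines might do. It is a sketch under assumptions: the sub names read_lines and write_matches are invented for illustration, the demo writes the sample data to a scratch directory rather than touching real logs, and matches are written once each in file1 order, a choice the thread itself does not specify.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read a file into a list of chomped lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Write every line found in both inputs to $out_path, in file1 order.
sub write_matches {
    my ($path1, $path2, $out_path) = @_;
    my %in_file2 = map { $_ => 1 } read_lines($path2);
    my %written;
    open my $out, '>', $out_path or die "cannot write $out_path: $!";
    for my $line (read_lines($path1)) {
        next unless $in_file2{$line};
        next if $written{$line}++;    # emit each match only once
        print {$out} "$line\n";
    }
    close $out or die "cannot close $out_path: $!";
    return;
}

# Demo with the sample data from the question, in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $spec (['file1', 1 .. 5], ['file2', 3 .. 7]) {
    my ($name, @ids) = @$spec;
    open my $fh, '>', "$dir/$name" or die "cannot write $dir/$name: $!";
    print {$fh} "$_\n" for @ids;
    close $fh or die "cannot close $dir/$name: $!";
}
write_matches("$dir/file1", "$dir/file2", "$dir/file3");
print "$_\n" for read_lines("$dir/file3");
```

Only the second file is held in a hash, so the larger of the two files is the better candidate for the first argument.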
 