Compare 2 files and put the matching part in a 3rd file
Hi all,
I need to compare two text files and put the matching lines in another file. Does anybody have an idea?

file1   file2   compare and match = file3
-----   -----   -------------------------
1       3       3
2       4       4
3       5       5
4       6
5       7

Thank you

--
BerNaC
Re: Compare 2 files and put the matching part in a 3rd file
BerNaC <bernac001@aol.com> writes:
> I need to compare two text files and put the matching result in another
> file. Does anybody have an idea?

Is anything known about the format of the files and in what ways they can differ? Doing a general comparison and presenting the result as a minimal set of individual differences is quite complex. In that case I would run the Unix 'diff' program on the files and post-process its output.

CPAN seems to have only "compare and stop when finding a difference".
Re: Compare 2 files and put the matching part in a 3rd file
Arndt Jonasson wrote on Friday:
> BerNaC <bernac001@aol.com> writes:
>> I need to compare two text files and put the matching result in another
>> file. Does anybody have an idea?
>
> Is anything known about the format of the files and in what ways they
> can differ? [...]

Well, each of the two text files has one ID from the sendmail log per line. It looks like this:

1U34334Y34
1ZRTRG345
2SDFSDF17
and so on

One file holds the IDs of mail from someone, the other holds the IDs of mail to someone. If an ID appears in both, it means that a mail with this ID has been sent from this guy to this guy :).
So, as you can see, I'm trying to write a script that parses the sendmail log to find all email from someone to somebody.

--
BerNaC
Re: Compare 2 files and put the matching part in a 3rd file
BerNaC <bernac001@aol.com> writes:
> [...]
> Well the 2 text files have 1 ID from sendmail log per line, it looks
> like that :
>
> 1U34334Y34
> 1ZRTRG345
> 2SDFSDF17
> and so on
>
> So one file is ID from mail the other one is ID to mail so if they
> match that mean that one mail with this ID has been sent from this guy
> to this guy :).
> So as you can see i'm trying to make a script that parse sendmail log
> to find all email from someone to somebody.

That seems to mean that no valuable information is lost if you sort the files first, which makes the job of comparing them much easier (I'd say trivial, but maybe that's overstating it). Is that enough for an idea, or is there some particular aspect of it which you don't know how to do in Perl?

If the files are not very large, reading their contents into perl (*) and sorting there will be OK; otherwise it's better to sort them on disk.

(*) "perl", "Perl", what do I want here? I want a "case-doesn't-matter-perl"...
Re: Compare 2 files and put the matching part in a 3rd file
BerNaC wrote:
> I need to compare two text files and put the matching result in another
> file. Does anybody have an idea?
>
> file1   file2   compare and match = file3
> -----   -----   -------------------------
> 1       3       3
> 2       4       4
> 3       5       5
> 4       6
> 5       7

$ perl -ne'$a?$x{$_}&&print:$x{$_}++;$a||=eof' file1 file2
3
4
5

John
--
use Perl;
program
fulfillment
Re: Compare 2 files and put the matching part in a 3rd file
In article <yzdvf9qkcn7.fsf@invalid.net>,
Arndt Jonasson <do-not-use@invalid.net> wrote:
> <SNIP>
>
>That seems to mean that no valuable information is lost if you sort
>the files first, which makes the job of comparing them much easier (I'd
>say trivial, but maybe that's overstating it). Is that enough for an
>idea, or is there some particular aspect of it which you don't know
>how to do in Perl?
>
>If the files are not very large, reading in their contents into perl (*)
>and sorting there will be OK, otherwise it's better to sort them on disk.

If you're allowed to sort them, then do that, and run "comm" on those two.

(It's *exactly* what comm was designed for.)

David

PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort via the same scheme that comm uses, otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.
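A short sketch of the sort-then-comm suggestion above, using made-up sample files. The `.sorted` file names are only for illustration.

```shell
# Scratch directory with two unsorted sample files.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 5 1 4 2 3 > file1
printf '%s\n' 7 3 6 5 4 > file2

# Sort each file with the same (default) ordering.
sort file1 > file1.sorted
sort file2 > file2.sorted

# comm prints three columns: only in the first file, only in the second, in both.
# -1 and -2 suppress the first two columns, leaving just the common lines.
comm -12 file1.sorted file2.sorted > file3

cat file3
```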
Re: Compare 2 files and put the matching part in a 3rd file
David Combs <dkcombs@panix.com> wrote in comp.lang.perl.misc:
[...]
> If you're allowed to sort them, then do that, and do "comm"
> on those two.
>
> (It's *exactly* what comm was designed for.)
>
> David
>
> PS: Question: does the following conjecture make any sense?:
>
> Oh, by the way, make you sort via the same scheme that comm uses,
> otherwise comm won't think it's sorted. Ie, beware of -u, -r, etc.

Conjecture?

No, the remark doesn't make sense. All comm requires is that identical lines be next to each other. Any sort that considers the whole line will guarantee that.

My comm man page doesn't even specify the sort to be ascending or descending, though it does (unnecessarily) specify "lexically".

Anno
Re: Compare 2 files and put the matching part in a 3rd file
anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
> David Combs <dkcombs@panix.com> wrote in comp.lang.perl.misc:
>
> [...]
>
> > PS: Question: does the following conjecture make any sense?:
> >
> > Oh, by the way, make you sort via the same scheme that comm uses,
> > otherwise comm won't think it's sorted. Ie, beware of -u, -r, etc.
>
> Conjecture?
>
> No, the remark doesn't make sense. All comm requires is that identical
> lines be next to each other.

The only way you can ensure that identical lines are next to each other by sorting the separate files is if the files are identical in the first place. If you already know that, then you are already done.

In the non-trivial case, comm needs a way to re-align the files once it encounters a non-identical line. To do that, the files need to be sorted in the order that comm expects.

> My comm man page doesn't even specify the sort to be ascending or
> descending, though it does (unnecessarily) specify "lexically".

Apparently man wasn't good enough; now if you want to know how a command-line tool works you have to read the "info" page too.

From info comm:

   Before `comm' can be used, the input files must be sorted using the
   collating sequence specified by the `LC_COLLATE' locale. If an input
   file ends in a non-newline character, a newline is silently appended.
   The `sort' command with no options always outputs a file that is
   suitable input to `comm'.

Xho
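One way to act on the info-page excerpt above is to pin the same collation for `sort` and for `comm`, so that both agree on the ordering whatever the user's locale is. The sketch below uses `LC_ALL=C` for that. The file names are made up, and `9HYPOTHET1` is an invented ID added next to the sample IDs from earlier in the thread.

```shell
# Scratch directory with sample ID lists.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 2SDFSDF17 1U34334Y34 1ZRTRG345 > from_ids
printf '%s\n' 1ZRTRG345 9HYPOTHET1 2SDFSDF17 > to_ids

# Use the same collating sequence (plain byte order) for every command.
LC_ALL=C sort from_ids > from_ids.sorted
LC_ALL=C sort to_ids   > to_ids.sorted

# Keep only the IDs present in both sorted lists.
LC_ALL=C comm -12 from_ids.sorted to_ids.sorted > matched_ids

cat matched_ids
```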
Re: Compare 2 files and put the matching part in a 3rd file
<xhoster@gmail.com> wrote in comp.lang.perl.misc:
> anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
> > David Combs <dkcombs@panix.com> wrote in comp.lang.perl.misc:
> >
> > [...]
> >
> > No, the remark doesn't make sense. All comm requires is that identical
> > lines be next to each other.
>
> The only way you can ensure that identical lines are next to each other by
> sorting the separate files is if the files are identical in the first
> place. If you already know that, then you are already done.

You are so very right. Both files must be sorted according to the same sort specification, or comm can foul up. Sorry.

Anno
Re: Compare 2 files and put the matching part in a 3rd file
In article <mn.abb67d519ce75a90.18030@aol.com>, BerNaC <bernac001@aol.com> wrote:
>Hi all,
>
>I need to compare two text files and put the matching result in another
>file. Does anybody have an idea?
>
>file1   file2   compare and match = file3
>-----   -----   -------------------------
>1       3       3
>2       4       4
>3       5       5
>4       6
>5       7
>
>Thank you

    {
        my %t;
        $t{$_} .= "1" for @file1;    # tag every line read from file1
        $t{$_} .= "2" for @file2;    # tag every line read from file2
        # A line in both files ends up tagged "12", provided no line
        # is repeated within a file.
        @matching = grep $t{$_} eq "12", keys %t;
    }

No need to sort the files first. VERY fast.