Velocity Reviews - Computer Hardware Reviews



Trying to compare two files and output it into a third file.

 
 
Morris Keesan
07-30-2009
On Thu, 30 Jul 2009 07:27:58 -0400, chutsu <(E-Mail Removed)> wrote:

>
>> >> Clarification: It looks like you only want to find a match between
>> >> the two files if the matching base sequence is on the same line
>> >> number in both files? That appears to be the intent of your code.

>
> Yes, I'm trying to match the base sequence; however, a match does not
> necessarily mean they are both on the same line number. So my code was to:
> - read the base sequence from the first file
> - store that in some variable (i.e. tag_1)
> - read the second file to see if a match is found
> - if found, printf "match found"
> - and loop until there are no more base sequences in file 1

....
> These files are very large, about 120,000 lines long,


Honestly, I don't think C is the correct tool for this problem.
At the very least, you should sit down and think about this
algorithmically, independent of any programming language.

If the files are unsorted, then for each line of file1, you'll
be reading the entire contents of file2 if there's no match,
and on average half of file2 if there is a match. This means
your algorithm is O(n squared): if half of the lines in file1
have a match in file2, then you're reading
(60,000 * 60,000) + (60,000 * 120,000) = 10,800,000,000 lines from file2
(approximately 11 BILLION lines).
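For reference, here is a minimal C sketch of the nested scan chutsu describes. It assumes, hypothetically, that each line starts with the base sequence as its first whitespace-delimited field, followed by other data such as a count; the function name `naive_match` and the limits `MAX_LINE`/`MAX_SEQ` are illustrative, not from the thread. It counts every line read from file2, which makes the quadratic cost visible: each line of file1 triggers a `rewind()` and a rescan of file2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limits; adjust to the real data. MAX_SEQ must agree with
   the 255 in the "%255s" conversions below (plus one for the '\0').
   Lines longer than MAX_LINE - 1 bytes would be split by fgets; this
   sketch assumes there are none. */
#define MAX_LINE 1024
#define MAX_SEQ  256

/*
 * Nested scan: for each line of f1, rewind f2 and read it until a line
 * with the same base sequence turns up (or EOF). Assumes the base
 * sequence is the first whitespace-delimited field of each line.
 * Writes "match found: SEQ" to out for each match and returns the total
 * number of lines read from f2, which grows as n1 * n2 in the worst case.
 */
unsigned long naive_match(FILE *f1, FILE *f2, FILE *out)
{
    char line1[MAX_LINE], line2[MAX_LINE];
    char seq1[MAX_SEQ], seq2[MAX_SEQ];
    unsigned long reads = 0;

    while (fgets(line1, sizeof line1, f1) != NULL) {
        if (sscanf(line1, "%255s", seq1) != 1)
            continue;                       /* skip blank lines */

        rewind(f2);                         /* start file2 over for every tag */
        while (fgets(line2, sizeof line2, f2) != NULL) {
            reads++;
            if (sscanf(line2, "%255s", seq2) == 1 && strcmp(seq1, seq2) == 0) {
                fprintf(out, "match found: %s\n", seq1);
                break;                      /* first match is enough */
            }
        }
    }
    return reads;
}
```

Run on the two 120,000-line files, the returned count would land in the billions, in line with the estimate above.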

If you sort both files, then you can keep the files synchronized
while you're reading them, advancing file2 to keep up with file1.
Also, consider extracting just the base sequences from each file,
then using sort and comm (Unix programs) to find the base sequences
that are in common. Then you can go back and find those matching
sequences in the original files and extract the counts from them.
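Keeping two sorted files synchronized is a classic merge. A minimal C sketch follows, assuming both files have already been sorted by base sequence in byte order (for example with `LC_ALL=C sort`, so the order agrees with `strcmp`) and that each line starts with the base sequence as its first whitespace-delimited field. The names `merge_match` and `next_record` and the limits are illustrative. Each file is read exactly once, so the cost is proportional to n1 + n2 rather than n1 * n2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limits; MAX_SEQ must agree with the "%255s" below. */
#define MAX_LINE 1024
#define MAX_SEQ  256

/* Read the next non-blank line of f into line (MAX_LINE bytes) and copy
   its first whitespace-delimited field, the base sequence, into seq.
   Returns 1 on success, 0 at end of file. */
static int next_record(FILE *f, char *line, char *seq)
{
    while (fgets(line, MAX_LINE, f) != NULL)
        if (sscanf(line, "%255s", seq) == 1)
            return 1;
    return 0;
}

/*
 * Merge two files that are both sorted by base sequence in strcmp order.
 * Whichever side has the smaller key is advanced; on equal keys the
 * sequence is written to out and both sides advance, so duplicate
 * sequences pair up one-to-one. Returns the number of matches written.
 */
unsigned long merge_match(FILE *f1, FILE *f2, FILE *out)
{
    char line1[MAX_LINE], line2[MAX_LINE];
    char seq1[MAX_SEQ], seq2[MAX_SEQ];
    unsigned long matches = 0;
    int have1 = next_record(f1, line1, seq1);
    int have2 = next_record(f2, line2, seq2);

    while (have1 && have2) {
        int cmp = strcmp(seq1, seq2);
        if (cmp < 0) {
            have1 = next_record(f1, line1, seq1);   /* file1 is behind */
        } else if (cmp > 0) {
            have2 = next_record(f2, line2, seq2);   /* advance file2 to keep up */
        } else {
            fprintf(out, "%s\n", seq1);
            matches++;
            have1 = next_record(f1, line1, seq1);
            have2 = next_record(f2, line2, seq2);
        }
    }
    return matches;
}
```

To carry the counts through instead of going back to the original files, the equal-key branch could print `line1` and `line2` rather than just the sequence.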
 
jameskuyper
07-30-2009
Morris Keesan wrote:
> If you sort both files, then you can keep the files synchronized
> while you're reading them, advancing file2 to keep up with file1.
> Also, consider extracting just the base sequences from each file,
> then using sort and comm (Unix programs) to find the base sequences
> that are in common. Then you can go back and find those matching
> sequences in the original files and extract the counts from them.


If he's able to use Unix tools and is willing to sort the input files,
then I think that the 'join' command does pretty much exactly what he
wants done.
 