newbie question

 
 
scarlet
Guest
Posts: n/a
 
      12-13-2009
Hello,
I have two files: file A.tok and file B.lst.
File A contains a hash table of words and their frequency.
File B contains a list of words.
I have to generate a file C that contains the list of words from file A, and
if a word from file A matches a word from the list in file B, "VZ" has to
appear next to that word in file C.
How can I do this?

thank you

 
 
 
 
 
scarlet
Guest
Posts: n/a
 
      12-13-2009
Tad,
That is just the problem:
I don't know how to write a program.

Greetings,


"Tad McClellan" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> scarlet <(E-Mail Removed)> wrote:
>
>
>> Subject: newbie question

>
> Please put the subject of your article in the Subject of your article.
>
>
>> I have two files: file A.tok and file B.lst.
>> File A contains a hash table of words and their frequency.
>> File B contains a list of words.
>> I have to generate a file C that contains the list of words from file A, and
>> if a word from file A matches a word from the list in file B, "VZ" has to
>> appear next to that word in file C.
>> How can I do this?

>
>
> I think you may need to write a program to do this.
>
> If you get stuck, then post what you've written so far,
> and we will help you fix it.
>
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"


 
 
 
 
 
Jürgen Exner
Guest
Posts: n/a
 
      12-13-2009
"scarlet" <(E-Mail Removed)> wrote:
[Subject: newbie question]

The first half of your subject is irrelevant and actually may cause some
people to score down your posting.
The second half is redundant because most initial postings involve a
question.

>I have two files: file A.tok and file B.lst.
>File A contains a hash table of words and their frequency.
>File B contains a list of words.
>I have to generate a file C that contains the list of words from file A, and
>if a word from file A matches a word from the list in file B, "VZ" has to
>appear next to that word in file C.
>How can I do this?


What have you tried so far? Where are you stuck? Do you have a problem
with designing the algorithm? Or do you have a problem with a specific
function or feature? Or isn't your code doing what it is supposed to do?

Actually, your question smells a little bit like homework....

jue
 
 
scarlet
Guest
Posts: n/a
 
      12-13-2009
This is what I have already:

$file="VZ.lst";
open(FILE,"$file");
while ($lijn=<FILE>){


@words=split(/\n\,$lijn);
foreach $element(@words){


$in="krantenartikel.tok";
open(IN,"$in");
while ($lijn1=<IN>){
chomp $lijn1;
($token,$freq)=split(/\t/,$lijn1);
}


if ($element=$token){
$freq="VZ";
}
else {
$freq="";}
}
}
$out='#krantenartikel.vz#';
open(OUT,">$out");
print OUT "$token\t$freq\n";

First, I open the .lst file and define the array it contains. Then, I open
the other file and make a table of the words and their frequency. I want to
make a new file, "krantenartikel.vz", that contains the elements I mentioned
earlier.

I know the test "if ($element=$token)" is wrong, but my problem is that I
don't know how else to write it so that it works.

 
 
Jürgen Exner
Guest
Posts: n/a
 
      12-13-2009
[Do not stealth-CC me, I happen to read the NGs I am posting in]
[Do not top-post, that is poor style; trying to repair]

"scarlet" <(E-Mail Removed)> wrote:

>This is what I have already:
>


Missing
use strict; use warnings;

>$file="VZ.lst";
>open(FILE,"$file");


You should always test if an open() was successful:
open(FILE,"$file") or die("Could not open $file because $!\n");
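
A minimal sketch of the same check, written with the three-argument form of open() and a lexical filehandle instead of a bareword one. The helper name open_for_reading and the file name are made up for illustration; they do not come from the original code:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the thread's code): open a file for
# reading and die with the system error message if that fails.
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die("Could not open $path because $!\n");
    return $fh;
}

# Show the failure path without aborting the script: eval traps the die.
# The file name is assumed not to exist.
my $fh = eval { open_for_reading('no-such-file.lst') };
print "open failed: $@" unless $fh;
```

The three-argument form keeps the mode separate from the file name, so a name that happens to start with ">" or "|" cannot change what open() does.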

>while ($lijn=<FILE>){
>@words=split(/\n\,$lijn);


This line causes a syntax error. I think you meant
@words=split(/\n/,$lijn);
instead.

But I don't think it does what you meant it to do.
You are reading the file line by line. That means there is exactly one
newline at the very end of each string. Not much sense in splitting the
line at the very end. I think all you want here is a plain chomp() on
the line itself. Or, if each line can contain multiple words, a
split() on whitespace or whatever separates those words, but not on
newline.
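
To illustrate the difference, here is a small sketch that handles both cases, assuming the words on a line are separated by whitespace. The helper words_on_line and the sample words are invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: return the words found on one input line.
sub words_on_line {
    my ($line) = @_;
    chomp $line;                # drop the trailing newline, if there is one
    return split(' ', $line);   # split on runs of whitespace, skipping leading blanks
}

my @one  = words_on_line("van\n");          # one word per line
my @many = words_on_line("van  op\tde\n");  # several words on one line
print scalar(@one), " word, ", scalar(@many), " words\n";
```

If VZ.lst really holds one word per line, the chomp() alone is enough and the split() can be dropped.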

>foreach $element(@words){
>$in="krantenartikel.tok";


Proper indentation makes the scope of a loop and in particular nested
loops much, much easier to recognize.

>open(IN,"$in");


You should always test if an open() was successful:
open(IN,"$in") or die("Could not open $in because $!\n");

>while ($lijn1=<IN>){
>chomp $lijn1;


Good.

>($token,$freq)=split(/\t/,$lijn1);


Nice.

>}
>if ($element=$token){


As you noted yourself this is an assignment and certainly not what you
want. Even ($element==$token) would be wrong because it would compare
the numerical values of those two strings.
To compare the textual value of two scalars use
($element eq $token)
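
A short sketch of why == gives the wrong answer for words. The two helper subs and the sample words are made up for this example:

```perl
use strict;
use warnings;

# Numeric comparison: non-numeric strings are converted to 0 first.
sub same_number {
    my ($x, $y) = @_;
    no warnings 'numeric';   # silence "Argument ... isn't numeric" for this demo
    return $x == $y;
}

# String comparison: true only when the text is identical.
sub same_text {
    my ($x, $y) = @_;
    return $x eq $y;
}

my ($element, $token) = ('op', 'van');   # two different invented words

print "==: ", (same_number($element, $token) ? "true" : "false"), "\n";  # prints "==: true"
print "eq: ", (same_text($element, $token)   ? "true" : "false"), "\n";  # prints "eq: false"
```

With "use warnings" enabled and without the "no warnings" line, Perl reports the non-numeric arguments, which is one more reason to turn warnings on.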

>$freq="VZ";
>} else {
>$freq="";}
>}
>}
>$out='#krantenartikel.vz#';
>open(OUT,">$out");


You should always test if an open() was successful:
open(OUT,">$out") or die("Could not open $out because $!\n");

>print OUT "$token\t$freq\n";
>
>First, I open the .lst file and define the array it contains. Then, I open
>the other file and make a table of the words and their frequency. I want to
>make a new file, "krantenartikel.vz", that contains the elements I mentioned
>earlier.


There are a few more conceptual and algorithmic problems with your code.

The most obvious issue is that you are printing only one single item to
your output file. This is because the outermost while() ends before the
print(), so the print will only be called exactly once at the very end
of the program. Had you used proper indentation then this would have
been obvious (I actually ran your code through indent-region in emacs).

Same problem with the if(). It is executed AFTER the innermost while()
loop has already terminated, thus you are testing only against the very
last line of the krantenartikel.tok file.

Both issues can be fixed with little effort, but your code is also very
inefficient: for each line in VZ.lst you are looping through the whole
krantenartikel.tok file. That is very costly: at O(n*m) it is a quadratic
algorithm. It would be easy enough to do much better than that by just
reading all of krantenartikel.tok into memory once and then looping over
the in-memory copy.

However, Perl has a data structure that makes checking "does X exist"
really trivial and very fast: a hash.

So, the revised plan is:
- create a hash where the tokens from krantenartikel.tok are the keys
- open the output file
- open VZ.lst and for each word in that file
    check if it exists in the hash
    and print the proper output line
- close and clean up everything

Altogether I get this code, which compiles but which I couldn't
test further because I don't have any test data:

use strict; use warnings;

my %tokens;

my $in="krantenartikel.tok";
open(IN,"$in") or die("Cannot open $in: $!\n");
while (my $lijn1=<IN>){
    chomp $lijn1;
    my ($tok,$freq)=split(/\t/,$lijn1);
    $tokens{$tok} = $freq;
    #we don't really need to store the frequency, but because we
    #need some dummy value anyway we can just as well use that one
}
close(IN);

my $out='#krantenartikel.vz#';
open(OUT,">$out") or die("Cannot open $out: $!\n");
my $file="VZ.lst";
open(FILE,"$file") or die("Cannot open $file: $!\n");

while (my $lijn=<FILE>){
    #I am assuming VZ.lst contains one word per line
    chomp $lijn;
    if (exists($tokens{$lijn})){
        print OUT "$lijn\tVZ\n";
    } else {
        print OUT "$lijn\n";
    }
}

close FILE;
close OUT or die("Problem closing $out: $!\n");
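
The code above writes out the words of VZ.lst and marks those that occur in krantenartikel.tok. The original question can also be read the other way round: the output lists the words from the .tok file and marks those that are on the list. A minimal sketch of that variant follows, assuming one word per line in the list and tab-separated token and frequency in the .tok file. The sub mark_tokens and the sample data are invented for illustration, and in-memory filehandles stand in for the real files:

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread. Reads the word list from
# $list_fh, then copies every token from $tok_fh (token<TAB>frequency)
# to $out_fh, adding a tab and "VZ" after each token that is on the list.
sub mark_tokens {
    my ($list_fh, $tok_fh, $out_fh) = @_;

    # The hash keys are the listed words; the value 1 is only a placeholder.
    my %on_list;
    while (my $word = <$list_fh>) {
        chomp $word;
        next if $word eq '';           # skip blank lines
        $on_list{$word} = 1;
    }

    while (my $line = <$tok_fh>) {
        chomp $line;
        my ($token) = split(/\t/, $line);
        next unless defined $token;    # skip blank lines
        print {$out_fh} exists $on_list{$token} ? "$token\tVZ\n" : "$token\n";
    }
    return;
}

# Invented sample data instead of VZ.lst and krantenartikel.tok.
my $list   = "op\nvan\n";
my $tokens = "de\t12\nvan\t7\nkrant\t3\n";
open(my $list_fh, '<', \$list)   or die("Cannot open list: $!\n");
open(my $tok_fh,  '<', \$tokens) or die("Cannot open tokens: $!\n");
mark_tokens($list_fh, $tok_fh, \*STDOUT);
```

With the real files, the three handles would come from open() calls on VZ.lst, krantenartikel.tok and the output file, each checked for success as in the code above. The lookup stays a single hash access per token, so the cost is still linear in the size of the two files.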



jue
 
 
sln@netherlands.com
Guest
Posts: n/a
 
      12-13-2009
On Sun, 13 Dec 2009 13:36:51 +0100, "scarlet" <(E-Mail Removed)> wrote:

>Hello,
>I have two files: file A.tok and file B.lst.
>File A contains a hash table of words and their frequency.
>File B contains a list of words.
>I have to generate a file C that contains the list of words from file A, and
>if a word from file A matches a word from the list in file B, "VZ" has to
>appear next to that word in file C.
>How can I do this?
>
>thank you


-sln
-----
Output:
d
e
f
cVZ
d
aVZ
z
bVZ aVZ
-----

use strict;
use warnings;

my $tokstring = "a afreq \n b bfreq \n c cfreq ";
my $bstring = "d \ne \nf \nc \nd \na \nz \nb a\n ";

open my $tfile, '<', \$tokstring or die "can't open tok file: $!";
# Build token => frequency pairs; lines without a token are skipped.
my %toks = map { /\s*(\S+)\s+(\S*)/ ? ($1, $2) : () } <$tfile>;
close $tfile;

open my $bfile, '<', \$bstring or die "can't open bstr file: $!";
while (<$bfile>)
{
    s/([^\s]+)(?=\s+)/exists $toks{$1} ? $1.'VZ': $1/ge;
    print;
}
close $bfile;

 
 
ccc31807
Guest
Posts: n/a
 
      12-13-2009
On Dec 13, 7:36 am, "scarlet" <(E-Mail Removed)> wrote:
> Hello,
> I have two files : file A.tok and file B.lst


It would be helpful if you posted a sample of each file, so we would
know exactly what the files look like.

CC.
 