How to test if a string is already in a array

 
 
Marc Eggenberger
Guest
Posts: n/a
 
      04-21-2005
Hi there.

I haven't done any Perl coding for some years now and have forgotten a lot, so bear
with me ...

I open a text file, read it line by line and do some splitting and
substr to get an email address out of each line. Now the text file has some
10k lines and a lot of dublicate mail addresses. I only need each
email address once .. a bit like a select distinct emailaddress would be
in SQL.

My thought now was to create an array and test if the address is already
in the array and if not push it into the array. I don't need to have
the position of the address in the array ... so I thought of something
like

if(! exists $address_array[$address])
{
push(@address_array,$address);
}

this of course does not work ...

how would I achieve my goal?

Thanks for any help

Marc
 
 
 
 
 
John Bokma
Guest
Posts: n/a
 
      04-21-2005
Marc Eggenberger wrote:

> Hi there.
>
> I haven't done any Perl coding for some years now and have forgotten a lot, so bear
> with me ...


> I open a text file, read it line by line and do some splitting and
> substr to get an email address out of each line. Now the text file has some
> 10k lines and a lot of dublicate mail addresses. I only need each
> email address once .. a bit like a select distinct emailaddress would be
> in SQL.
>
> My thought now was to create an array and test if the address is already
> in the array and if not push it into the array. I don't need to have
> the position of the address in the array ... so I thought of something
> like
>
> if(! exists $address_array[$address])
> {
> push(@address_array,$address);
> }
>
> this of course does not work ...
>
> how would I achieve my goal?


my %address_hash;


and in your loop:

$address_hash{ $address } = 1;

( no need for the test thing )


keys %address_hash gives the unique addresses.

BTW: put use strict; use warnings; on top of your script.
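
A minimal sketch of the approach above. The @addresses list is hypothetical sample data standing in for whatever is parsed out of the file:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the addresses parsed from the file.
my @addresses = ('a@domain.ch', 'b@domain.ch', 'a@domain.ch', 'c@domain.ch', 'b@domain.ch');

my %address_hash;
$address_hash{$_} = 1 for @addresses;   # assigning to an existing key just overwrites it

# keys returns the addresses in no particular order, so sort for stable output.
my @unique = sort keys %address_hash;
print "$_\n" for @unique;
```

Because a hash key can only exist once, the duplicates disappear without any explicit test.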


--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
 
 
 
 
Maxim
Guest
Posts: n/a
 
      04-21-2005
>
> if(! exists $address_array[$address])
> {
> push(@address_array,$address);
> }


Checking the array for the string every time would yield O(n^2) complexity.
The easiest way (I guess) is to use a hash, which is O(n) on average
because each hash lookup takes constant time on average:

my %address_hash;

if( ! exists $address_hash{$address} )
{
$address_hash{$address} = 1;
}

my @address_array = keys %address_hash;
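
To make the difference concrete, here is a rough sketch that puts the two strategies side by side. The sub names and the sample list are made up for illustration; both versions keep the first occurrence of each address in its original order:

```perl
use strict;
use warnings;

# Linear scan: each new address is compared against everything kept so far,
# which is quadratic in the worst case.
sub unique_by_scan {
    my @kept;
    for my $address (@_) {
        push @kept, $address unless grep { $_ eq $address } @kept;
    }
    return @kept;
}

# Hash lookup: one average constant-time check per address.
sub unique_by_hash {
    my (%seen, @kept);
    for my $address (@_) {
        next if exists $seen{$address};
        $seen{$address} = 1;
        push @kept, $address;
    }
    return @kept;
}

# Hypothetical sample data.
my @sample = qw(x@domain.ch y@domain.ch x@domain.ch z@domain.ch y@domain.ch);
print join(', ', unique_by_hash(@sample)), "\n";
```

For 10k lines either version finishes quickly, but only the hash version keeps scaling linearly as the file grows.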

Hope this helps

--
Maxim Sloyko
 
 
Eric Bohlman
Guest
Posts: n/a
 
      04-21-2005
Marc Eggenberger wrote:

> I haven't done any Perl coding for some years now and have forgotten a lot, so bear
> with me ...
>
> I open a text file, read it line by line and do some splitting and
> substr to get an email address out of each line. Now the text file has some
> 10k lines and a lot of dublicate mail addresses. I only need each
> email address once .. a bit like a select distinct emailaddress would be
> in SQL.


When you start saying words like "duplicate" or "distinct," you should be
immediately thinking "hash."

> My thought now was to create an array and test if the address is already
> in the array and if not push it into the array. I don't need to have
> the position of the address in the array ... so I thought of something
> like
>
> if(! exists $address_array[$address])
> {
> push(@address_array,$address);
> }
>
> this of course does not work ...
>
> how would I achieve my goal?


my %addresses;
....
$addresses{$address}=1;
....
foreach my $address (keys %addresses) {
#do something with the address
}
 
 
marc.eggenberger@itc.alstom.com
Guest
Posts: n/a
 
      04-21-2005
ok ... I changed it .. but when I run my new script it prints addresses
more than once .. why is that?

Here's my code:

#!/usr/bin/perl
use strict;
use warnings;

my $textfile = 'empfaenger.txt';

open(EMPFAENGER, $textfile) || die("Could not open file $textfile");

my @raw_data = <EMPFAENGER>;
my %ad_hash;

foreach my $line (@raw_data)
{
    my @fields = split(/ /, $line);
    my @fields2 = split(/=/, $fields[6]);
    my $address = $fields2[1];
    $address = substr($address,1,length($address) - 3);

    if(index($address,"domain.ch") > 0)
    {
        $ad_hash{$address} = 1;
    }

    foreach my $key(keys(%ad_hash))
    {
        print $key . "\n";
    }
}
close(EMPFAENGER);

 
 
marc.eggenberger@itc.alstom.com
Guest
Posts: n/a
 
      04-21-2005
Argl ....
my last foreach shouldn't be in the foreach loop ...

stupid me

 
 
John Bokma
Guest
Posts: n/a
 
      04-21-2005
marc.eggenberger@itc.alstom.com wrote:

> ok ... I changed it .. but when I run my new script it prints addresses
> more than once .. why is that?


because you print the keys inside the loop

> Here's my code:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my $textfile = 'empfaenger.txt';
>
> open(EMPFAENGER, $textfile) || die("Could not open file $textfile");


open my $fh, $textfile or die "Can't open '$textfile': $!";

$! holds the reason the open failed; if you don't print it, you get a
fairly meaningless error message
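
A small sketch of what $! adds to the message. The path below is a made-up one that is assumed not to exist, and the three-argument form of open is used here instead of the two-argument form shown above:

```perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist, to force a failure.
my $textfile = 'no-such-dir/empfaenger.txt';

my $error = '';
if ( open my $fh, '<', $textfile ) {
    close $fh;
}
else {
    # $! carries the operating system's reason for the failure.
    $error = "Can't open '$textfile': $!";
}
print "$error\n" if $error;
```

The exact wording after the colon comes from the operating system, so it varies between platforms.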

> my @raw_data = <EMPFAENGER>;


if you slurp the whole file like this, you can close the filehandle right
away instead of after the loop. Or read it line by line:

my %ad_hash;

while ( my $line = <$fh> ) {

> my %ad_hash;
>
> foreach my $line (@raw_data)
> {
> my @fields = split(/ /, $line);
> my @fields2 = split(/=/, $fields[6]);
> my $address = $fields2[1];
> $address = substr($address,1,length($address) - 3);


this probably could be done in a shorter way

> if(index($address,"domain.ch") > 0)
> {
> $ad_hash{$address} = 1;
> }


$ad_hash{ $address } = 1 if index( $address, "domain.ch") > 0;
}

or:

index( $address, "domain.ch" ) > 0 and $ad_hash{ $address } = 1;
}

or:

index( $address, "domain.ch" ) > 0 or next;
$ad_hash{ $address } = 1;
}

then close:

close $fh or die "Can't close '$textfile': $!";


> foreach my $key(keys(%ad_hash))
> {
> print $key . "\n";
> }


You can write the print as:

print "$key\n";

a shorter way to write the print all:

print "$_\n" for keys %ad_hash;

or

print map { "$_\n" } keys %ad_hash;
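
Putting the pieces together, a sketch of the whole script might look like this. The sample lines are invented, since the real layout of empfaenger.txt is not shown in the thread; they are only a guess that fits the posted split and substr calls. An in-memory filehandle is used so the sketch runs without a file on disk:

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real file format is an assumption.
my $sample = <<'END';
Apr 21 10:00:01 mx smtp[1]: A1: to=<anna@domain.ch>, status=sent
Apr 21 10:00:02 mx smtp[1]: A2: to=<bert@example.org>, status=sent
Apr 21 10:00:03 mx smtp[1]: A3: to=<anna@domain.ch>, status=sent
Apr 21 10:00:04 mx smtp[1]: A4: to=<carl@domain.ch>, status=sent
END

# Read from an in-memory filehandle instead of a file on disk.
open my $fh, '<', \$sample or die "Can't open sample data: $!";

my %ad_hash;
while ( my $line = <$fh> ) {
    my @fields = split / /, $line;
    next unless defined $fields[6];          # skip lines that are too short
    my @fields2 = split /=/, $fields[6];
    next unless defined $fields2[1];         # skip fields without a value
    # Drop the leading "<" and the trailing ">,".
    my $address = substr $fields2[1], 1, length( $fields2[1] ) - 3;

    $ad_hash{$address} = 1 if index( $address, 'domain.ch' ) > 0;
}
close $fh or die "Can't close sample data: $!";

# Print once, after the loop has finished collecting.
print "$_\n" for sort keys %ad_hash;
```

With a real file, the open line would take the filename in place of the reference to $sample.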


--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
 
Tad McClellan
Guest
Posts: n/a
 
      04-21-2005
Marc Eggenberger wrote:

> I haven't done any Perl coding for some years now and have forgotten a lot, so bear
> with me ...



You are still expected to check the Perl FAQ *before* posting.


> a lot of dublicate mail addresses. I only need each
> emailaddress once



The answer is easy to find once you spell the search term correctly:

perldoc -q duplicate

How can I remove duplicate elements from a list or array?
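
A common idiom for this, along the lines of what that FAQ entry describes, is a %seen hash combined with grep. The sample list here is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @addresses = qw(a@domain.ch b@domain.ch a@domain.ch c@domain.ch);

my %seen;
# $seen{$_}++ is false the first time a value appears, so grep keeps only
# first occurrences and preserves their original order.
my @unique = grep { !$seen{$_}++ } @addresses;

print "@unique\n";
```

Unlike keys %hash, this keeps the addresses in the order they were first seen.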


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
 
Tad McClellan
Guest
Posts: n/a
 
      04-21-2005
marc.eggenberger@itc.alstom.com wrote:


> if(index($address,"domain.ch") > 0)



What would index() return if

$address = 'domain.ch';

??
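
A quick sketch to check this, using made-up strings. index() returns the zero-based position of the match and -1 when there is none:

```perl
use strict;
use warnings;

# index() returns the zero-based position of the match, or -1 when absent.
my $at_start = index( 'domain.ch',      'domain.ch' );   # match at position 0
my $inside   = index( 'user@domain.ch', 'domain.ch' );   # match at position 5
my $missing  = index( 'user@other.org', 'domain.ch' );   # no match

print "$at_start $inside $missing\n";   # prints "0 5 -1"

# So "> 0" rejects a match at the very start; ">= 0" accepts any match.
```

Whether a match at position 0 should count depends on the data: a bare 'domain.ch' is not an email address, so "> 0" may be what is wanted here, but it is worth choosing deliberately.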


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
 
Sherm Pendley
Guest
Posts: n/a
 
      04-21-2005
Tad McClellan wrote:

> Marc Eggenberger wrote:
>
>>a lot of dublicate mail addresses.


> The answer is easy to find once you spell the search term correctly:
>
> perldoc -q duplicate


There's a "dubya" joke in there somewhere...

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
 
 
 
 