Velocity Reviews - Computer Hardware Reviews


Help: Duplicate and Unique Lines Problem

 
 
Amy Lee
 
      09-29-2008
Hello,

Does Perl have functions like the UNIX commands sort and uniq that can output
duplicate lines and unique lines?

Here is my code. When I run it, it outputs many repeated lines, but I only
want to keep each duplicated line once, along with the unique lines.

while (<>)
{
    if (/^\>.*/)
    {
        s/\>//g;
        if (/\w+\s\w+\s(.*)\smiR.*\s\w+/g)
        {
            print "$1\n";
        }
    }
}

The output is like this:

.......
Homo sapiens
Homo sapiens
Homo sapiens
Homo sapiens
Homo sapiens
Homo sapiens
Homo sapiens
Caenorhabditis elegans
Caenorhabditis elegans
Caenorhabditis elegans
Caenorhabditis elegans
Mus musculus
Mus musculus
Mus musculus
Mus musculus
Mus musculus
Mus musculus
Mus musculus
Arabidopsis thaliana
.........

What I want is output like this:
........
Homo sapiens
Caenorhabditis elegans
Mus musculus
Arabidopsis thaliana
........

Thank you very much~

Best Regards,

Amy Lee
 
Peter Makholm
 
      09-29-2008
Amy Lee <(E-Mail Removed)> writes:

> Does Perl have functions like the UNIX commands sort and uniq that can output
> duplicate lines and unique lines?


There is a uniq function in the List::MoreUtils module; otherwise, the
standard way is to use the printed strings as keys in a hash to mark
which lines have already been printed.
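A minimal sketch of the hash approach described above, assuming the lines to be filtered arrive on a filehandle; the sub name print_first_occurrences is hypothetical. Unlike Unix uniq, it drops duplicates even when they are not adjacent, and it keeps the order in which lines first appear.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, printing each distinct
# line only the first time it appears (order of first appearance is kept).
sub print_first_occurrences {
    my ($in, $out) = @_;
    my %seen;                       # line => number of times seen so far
    while (my $line = <$in>) {
        chomp $line;
        # $seen{$line}++ returns the old count, which is 0 (false) only
        # on the first occurrence, so each distinct line is printed once.
        print {$out} "$line\n" unless $seen{$line}++;
    }
    return;
}

# Inside a loop like the one in the original post, the same idea is:
#     print "$1\n" unless $seen{$1}++;
```

For example, print_first_occurrences(\*STDIN, \*STDOUT) filters standard input to standard output.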

//Makholm

 
Amy Lee
 
      09-29-2008
On Mon, 29 Sep 2008 14:17:16 +0100, bugbear wrote:

> Amy Lee wrote:
>> Hello,
>>
>> Does Perl have functions like the UNIX commands sort and uniq that can output
>> duplicate lines and unique lines?
>>
>> Here is my code. When I run it, it outputs many repeated lines, but I only
>> want to keep each duplicated line once, along with the unique lines.
>>
>> while (<>)
>> {
>> if (/^\>.*/)
>> {
>> s/\>//g;
>> if (/\w+\s\w+\s(.*)\smiR.*\s\w+/g)
>> {
>> print "$1\n";
>> }
>> }
>> }

>
> If you're running on *NIX, just pipe your script to sort/uniq and you're done.
>
> BugBear

Thank you. But I'd like to do it in Perl, so that I can put the code into
another Perl script.

Regards,

Amy Lee
 
Amy Lee
 
      09-29-2008
On Mon, 29 Sep 2008 15:28:51 +0200, Peter Makholm wrote:

> Amy Lee <(E-Mail Removed)> writes:
>
>> Does Perl have functions like the UNIX commands sort and uniq that can output
>> duplicate lines and unique lines?

>
> There is a uniq function in the List::MoreUtils module; otherwise, the
> standard way is to use the printed strings as keys in a hash to mark
> which lines have already been printed.
>
> //Makholm

Hello,

I tried the List::MoreUtils module, but it still failed and output only the
last line. Here's my code:

use List::MoreUtils qw(any all none notall true false firstidx first_index
lastidx last_index insert_after insert_after_string
apply after after_incl before before_incl indexes
firstval first_value lastval last_value each_array
each_arrayref pairwise natatime mesh zip uniq minmax);

$file = $ARGV[0];
open FILE, '<', "$file";
while (<FILE>)
{
@raw_list = split /\n/, $_;
}
@list = uniq @raw_list;
foreach $single (@list)
{
print "$single\n";
}

Thank you very much.

Regards,

Amy
 
Bart Lateur
 
      09-29-2008
Amy Lee wrote:

>Does Perl have functions like the UNIX commands sort and uniq that can output
>duplicate lines and unique lines?


Perl has a built-in sort, and uniq can be implemented with a few lines
of code. They're even in the official FAQ:

perlfaq4: How can I remove duplicate elements from a list or
array?

http://perldoc.perl.org/perlfaq4.htm...st-or-array%3f
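A sketch along the lines of the common hash-based answer to that FAQ (the grep { !$seen{$_}++ } idiom), plus a counting variant that roughly corresponds to sort | uniq -d (lines occurring more than once) and sort | uniq -u (lines occurring exactly once), since the original question asked about both. Both sub names are hypothetical.

```perl
use strict;
use warnings;

# Order-preserving de-duplication: keep the first copy of each element.
sub uniq_keep_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Rough analogue of `sort | uniq -d` and `sort | uniq -u`:
# count every line, then split the distinct lines by their count.
sub duplicated_and_unique {
    my %count;
    $count{$_}++ for @_;
    my @duplicated = sort { $a cmp $b } grep { $count{$_} > 1 }  keys %count;
    my @once_only  = sort { $a cmp $b } grep { $count{$_} == 1 } keys %count;
    return (\@duplicated, \@once_only);
}
```

For instance, my ($dup, $once) = duplicated_and_unique(@lines); print "$_\n" for @$dup; prints each repeated line once.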


--
Bart.
 
Amy Lee
 
      09-29-2008
On Mon, 29 Sep 2008 16:54:15 +0200, Bart Lateur wrote:

> Amy Lee wrote:
>
>>Does Perl have functions like the UNIX commands sort and uniq that can output
>>duplicate lines and unique lines?

>
> Perl has a built-in sort, and uniq can be implemented with a few lines
> of code. They're even in the official FAQ:
>
> perlfaq4: How can I remove duplicate elements from a list or
> array?
>
> http://perldoc.perl.org/perlfaq4.htm...st-or-array%3f

Thanks, but my problem seems a little strange: I don't know whether the
uniq function can process an array such as @list. When I use uniq on it,
I only see the last line of the file.

Amy
 
Amy Lee
 
      09-29-2008
On Mon, 29 Sep 2008 16:54:15 +0200, Bart Lateur wrote:

> Amy Lee wrote:
>
>>Does Perl have functions like the UNIX commands sort and uniq that can output
>>duplicate lines and unique lines?

>
> Perl has a built-in sort, and uniq can be implemented with a few lines
> of code. They're even in the official FAQ:
>
> perlfaq4: How can I remove duplicate elements from a list or
> array?
>
> http://perldoc.perl.org/perlfaq4.htm...st-or-array%3f

Here's the code:

open FILE, '<', "$file";
while (<FILE>)
{
@raw_list = split /\n/, $_;
@list = uniq (@raw_list);
print "@list\n";
}

It seems that uniq does nothing! I don't know why.

Amy
 
Ben Morrow
 
      09-29-2008

Quoth Amy Lee <(E-Mail Removed)>:
> On Mon, 29 Sep 2008 15:28:51 +0200, Peter Makholm wrote:
>
> > Amy Lee <(E-Mail Removed)> writes:
> >
> >> Does Perl have functions like the UNIX commands sort and uniq that can output
> >> duplicate lines and unique lines?

> >
> > There is a uniq function in the List::MoreUtils module; otherwise, the
> > standard way is to use the printed strings as keys in a hash to mark
> > which lines have already been printed.

>
> I tried the List::MoreUtils module, but it still failed and output only the
> last line. Here's my code:
>
> use List::MoreUtils qw(any all none notall true false firstidx first_index
> lastidx last_index insert_after insert_after_string
> apply after after_incl before before_incl indexes
> firstval first_value lastval last_value each_array
> each_arrayref pairwise natatime mesh zip uniq
> minmax);


Don't import more than you need.

use List::MoreUtils qw(uniq);

> $file = $ARGV[0];


Your script should start with

use warnings;
use strict;

which will mean you need 'my' on all your variables

my $file = $ARGV[0];

> open FILE, '<', "$file";


Use lexical filehandles.
Always check the return value of open.
Don't quote things when you don't need to.

open my $FILE, '<', $file
or die "can't read '$file': $!";

> while (<FILE>)
> {
> @raw_list = split /\n/, $_;


while (<FILE>) reads the file one line at a time. You then split that
line on /\n/ (which won't do anything except remove the trailing
newline, since it's just a single line) and replace the contents of
@raw_list with the result. This means @raw_list never has more than one
element (the last line read).
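A small sketch contrasting the two behaviours; the sub names are hypothetical, and the input is read from a filehandle passed in by the caller (an in-memory one in the test) rather than from a real file.

```perl
use strict;
use warnings;

# Mirrors the original loop: each iteration replaces @raw_list,
# so only the last line read survives.
sub collect_by_assignment {
    my ($fh) = @_;
    my @raw_list;
    while (my $line = <$fh>) {
        @raw_list = split /\n/, $line;   # overwrites, does not append
    }
    return @raw_list;
}

# Appends each line instead, so every line is kept.
sub collect_by_push {
    my ($fh) = @_;
    my @raw_list;
    while (my $line = <$fh>) {
        chomp $line;                     # remove the trailing newline
        push @raw_list, $line;
    }
    return @raw_list;
}
```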

Since you want to keep all the lines, either push them onto the array:

my @raw_list;
while (<$FILE>) {
    chomp;              # remove the newline
    push @raw_list, $_;
}

or, better, read <$FILE> in list context, which returns all the lines:

my @raw_list = <$FILE>;
chomp @raw_list;    # remove all the newlines at once

> }
> @list = uniq @raw_list;
> foreach $single (@list)
> {
> print "$single\n";


Ben

--
Outside of a dog, a book is a man's best friend.
Inside of a dog, it's too dark to read.
Groucho Marx
 
Amy Lee
 
      09-29-2008
On Mon, 29 Sep 2008 16:29:26 +0100, Ben Morrow wrote:


Thank you very much. I solved it using your method.

Best Regards,

Amy
 
RedGrittyBrick
 
      09-29-2008

Amy Lee wrote:
> Hello,
>
> Does Perl have functions like the UNIX commands sort and uniq that can output
> duplicate lines and unique lines?
>
> Here is my code. When I run it, it outputs many repeated lines, but I only
> want to keep each duplicated line once, along with the unique lines.
>


#!/usr/bin/perl
use strict;
use warnings;

my %seen;
for (sort <DATA>) {
    chomp;
    # Capture the first two words (genus and species) without trailing space
    if (/^(\w+\s+\w+)/) {
        print "$1\n" unless $seen{$1}++;
    }
}


__END__
Homo sapiens E
Homo sapiens D
Arabidopsis thaliana S
Homo sapiens G
Mus musculus P
Mus musculus Q
Mus musculus R
Homo sapiens F
Caenorhabditis elegans H
Caenorhabditis elegans I
Homo sapiens A
Homo sapiens B
Homo sapiens C
Caenorhabditis elegans J
Mus musculus L
Mus musculus O
Mus musculus M
Mus musculus N
Caenorhabditis elegans K

--
RGB
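Because Unix uniq only collapses adjacent repeats, sort | uniq works by sorting first. A sketch of that same two-step approach in Perl (the sub name sort_uniq is hypothetical) shows that once the list is sorted, a hash is not strictly needed: each line only has to be compared with the last line kept. Note that Perl's default sort compares strings character by character, so the order may differ from a locale-aware sort(1).

```perl
use strict;
use warnings;

# Hypothetical helper mimicking `sort | uniq`: after sorting, equal lines
# are adjacent, so comparing with the last kept line is enough.
sub sort_uniq {
    my @sorted = sort { $a cmp $b } @_;
    my @out;
    for my $line (@sorted) {
        # Keep the line if nothing is kept yet or it differs from the last one.
        push @out, $line if !@out || $out[-1] ne $line;
    }
    return @out;
}
```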
 