sorting text

 
 
jamasd@hotmail.com
      06-16-2004
Here is a sample of my data (each column is separated by tabs):

1234123 jaesdf ytkyk 345234
1264345 ghgfdf ghjhg 657658
3456765 sdasdf ytkyk 456543
1231232 assffg werwe 123454
5447454 asdqfr ytkyk 254364

I am interested in creating a hash with two of the elements in the
list ("ytkyk" and "ghjhg"). I would like to create a program to read
only the third column and print the line (row) if it contains one of
the latter items. Can anyone help me write a program? Here is what I
have so far and I would like to create a more efficient program (I am
going to use it for writing a larger program later):

open( File, '<', 'file.txt' ) or die "$!\n";
while ( <File> ) {
    next unless ( index($_, 'ytkyk') >= 0 );
    next unless ( index($_, 'ghjhg') >= 0 );
    print;
}
close( File );

Thank you very much.
 
Gunnar Hjalmarsson
      06-16-2004
(E-Mail Removed) wrote:
> Here is a sample of my data (each column is separated by tabs):
>
> 1234123 jaesdf ytkyk 345234
> 1264345 ghgfdf ghjhg 657658
> 3456765 sdasdf ytkyk 456543
> 1231232 assffg werwe 123454
> 5447454 asdqfr ytkyk 254364
>
> I am interested in creating a hash with two of the elements in the
> list ("ytkyk" and "ghjhg"). I would like to create a program to read
> only the third column and print the line (row) if it contains one of
> the latter items. Can anyone help me write a program? Here is what I
> have so far and I would like to create a more efficient program (I am
> going to use it for writing a larger program later):
>
> open( File, '<', 'file.txt' ) or die "$!\n";
> while ( <File> ) {
> next unless ( index($_, 'ytkyk') >= 0 );
> next unless ( index($_, 'ghjhg') >= 0 );
> print;
> }
> close( File );


What makes you believe that what you have is not efficient?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
John Bokma
      06-16-2004
Gunnar Hjalmarsson wrote:

> (E-Mail Removed) wrote:
>
>> Here is a sample of my data (each column is separated by tabs):
>>
>> 1234123 jaesdf ytkyk 345234
>> 1264345 ghgfdf ghjhg 657658
>> 3456765 sdasdf ytkyk 456543
>> 1231232 assffg werwe 123454
>> 5447454 asdqfr ytkyk 254364
>>
>> I am interested in creating a hash with two of the elements in the
>> list ("ytkyk" and "ghjhg"). I would like to create a program to read
>> only the third column and print the line (row) if it contains one of
>> the latter items. Can anyone help me write a program? Here is what I
>> have so far and I would like to create a more efficient program (I am
>> going to use it for writing a larger program later):
>>
>> open( File, '<', 'file.txt' ) or die "$!\n";


my $filename = 'file.txt';
open my $fh, '<', $filename or die "Can't open '$filename' for reading: $!";

>> while ( <File> ) {


while ( <$fh> ) {

>> next unless ( index($_, 'ytkyk') >= 0 );

next unless index($_, 'ytkyk');

The >= 0 test can be replaced, since it's clear it's not the first
position. Even better (I guess): check for the string at its exact position.

>> next unless ( index($_, 'ghjhg') >= 0 );
>> print;
>> }
>> close( File );


close $fh or die "Can't close '$filename' after reading: $!";
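To see the lexical-filehandle suggestions in one piece, here is a minimal sketch. It assumes a throwaway temp file in place of file.txt (created with the core File::Temp module, which is not part of the original post), and the single data row and the $count variable are only there for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for file.txt so the sketch runs anywhere (hypothetical data).
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "1234123\tjaesdf\tytkyk\t345234\n";
close $tmp or die "Can't close '$filename' after writing: $!";

# Three-argument open with a lexical filehandle and a checked result.
open my $fh, '<', $filename
    or die "Can't open '$filename' for reading: $!";

my $count = 0;
while ( my $line = <$fh> ) {
    $count++;
    print $line;
}

# close can fail too, so its result is checked as well.
close $fh or die "Can't close '$filename' after reading: $!";
```

The explicit '<' mode keeps a filename with odd leading characters from being read as a mode.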

> What makes you believe that what you have is not efficient?


Maybe the OP forgot to explain the "sorting" part.

--
John MexIT: http://johnbokma.com/mexit/
personal page: http://johnbokma.com/
Experienced Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
 
Web Surfer
      06-16-2004
[This followup was posted to comp.lang.perl.misc]

In article <(E-Mail Removed) >,
(E-Mail Removed) says...
> Here is a sample of my data (each column is separated by tabs):
>
> 1234123 jaesdf ytkyk 345234
> 1264345 ghgfdf ghjhg 657658
> 3456765 sdasdf ytkyk 456543
> 1231232 assffg werwe 123454
> 5447454 asdqfr ytkyk 254364
>
> I am interested in creating a hash with two of the elements in the
> list ("ytkyk" and "ghjhg"). I would like to create a program to read
> only the third column and print the line (row) if it contains one of
> the latter items. Can anyone help me write a program? Here is what I
> have so far and I would like to create a more efficient program (I am
> going to use it for writing a larger program later):
>
> open( File, '<', 'file.txt' ) or die "$!\n";
> while ( <File> ) {
> next unless ( index($_, 'ytkyk') >= 0 );
> next unless ( index($_, 'ghjhg') >= 0 );
> print;
> }
> close( File );
>
> Thank you very much.
>


### Try this untested code ###

#!/usr/bin/perl
use strict;
use warnings;

my ( $buffer , @fields , $filename , %hash1 );

$filename = "file.txt";
open(INPUT,"<$filename") or
die("Can't open file \"$filename\" : $!\n");

%hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );

while ( $buffer = <INPUT> ) {
    chomp $buffer;
    @fields = split(/\t+/,$buffer);
    if ( 3 > @fields ) { # Ignore if less than 3 fields
        next;
    }
    unless ( exists $hash1{$fields[2]} ) {
        next;
    }
    print "$buffer\n";
}
close INPUT;
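
For comparison, a minimal sketch of the same third-column lookup, run against the OP's sample rows held in memory instead of file.txt. The names @lines, %wanted and matching_rows are made up for this illustration, and the rows are assumed to be separated by single tabs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows from the original post, rebuilt as tab-separated lines.
my @lines = map { join("\t", @$_) . "\n" } (
    [qw(1234123 jaesdf ytkyk 345234)],
    [qw(1264345 ghgfdf ghjhg 657658)],
    [qw(3456765 sdasdf ytkyk 456543)],
    [qw(1231232 assffg werwe 123454)],
    [qw(5447454 asdqfr ytkyk 254364)],
);

# Values to look for in the third column.
my %wanted = map { $_ => 1 } qw(ytkyk ghjhg);

# Return the lines whose third tab-separated field is a key of %wanted.
sub matching_rows {
    my @rows;
    for my $line (@_) {
        chomp( my $row = $line );          # work on a copy, keep $line intact
        my @fields = split /\t/, $row;
        next if @fields < 3;               # not enough columns
        push @rows, $line if exists $wanted{ $fields[2] };
    }
    return @rows;
}

print matching_rows(@lines);
```

Because the lookup is an exact match on one field, a row that happens to contain "ytkyk" in some other column would not be printed, which is a difference from the index() approach.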
 
John Bokma
      06-16-2004
Web Surfer wrote:

> [This followup was posted to comp.lang.perl.misc]
>
> In article <(E-Mail Removed) >,
> (E-Mail Removed) says...
>
>>Here is a sample of my data (each column is separated by tabs):
>>
>>1234123 jaesdf ytkyk 345234


> while ( $buffer = <INPUT> ) {
> chomp $buffer;


Why? Now you have to add back the \n in the print.

> @fields = split(/\t+/,$buffer);
> if ( 3 > @fields ) { # Ignore if less than 3 fields
> next;


Silly, the OP never specified that could happen. There are 4 fields btw, so
I would test for inequality, not less than.
Don't see any point in putting the constant to the left, btw. Silly C
coding convention IIRC.

--
John MexIT: http://johnbokma.com/mexit/
personal page: http://johnbokma.com/
Experienced Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
 
Jeff 'japhy' Pinyan
      06-16-2004
On Wed, 16 Jun 2004, John Bokma wrote:

>Web Surfer wrote:
>
>> if ( 3 > @fields ) { # Ignore if less than 3 fields

>
>Silly, the OP never specified that could happen. There are 4 fields btw, so
>I would test for inequality, not less than.


Because it was the *third* field that contained the string the OP is
searching for. Thus, skip any line that doesn't have enough fields.

>Don't see any point in putting the constant to the left, btw. Silly C
>coding convention IIRC.


There's nothing wrong with it. It's not "silly". There is a point to it.
It stops you from accidentally writing = instead of == if you mean to do a
comparison. Compare:

if ($foo = 2) { ... }

to

if (2 = $foo) { ... }

The coder *meant* to write ==, but only did =. The first one is not an
error, and the if block is reached all the time. The second one IS an
error.
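
A small sketch of the two cases, for anyone who wants to try it. The variable names ($foo, $bar, $ran) are illustrative only. The second case goes through a string eval so that the compile failure can be caught and printed instead of stopping the script, and the warning for the first case is switched off here just to show the bare behaviour.

```perl
use strict;
use warnings;

# Case 1: "=" typed instead of "==" with the variable on the left.
# This compiles; the assignment yields 2, which is true, so the block
# always runs and $foo is overwritten.
my $foo = 5;
my $ran = 0;
{
    no warnings 'syntax';    # silence "Found = in conditional" for the demo
    if ($foo = 2) { $ran = 1 }
}
print "block ran: $ran, \$foo is now $foo\n";

# Case 2: the same typo with the constant on the left.
# A constant cannot be assigned to, so this fails at compile time.
my $ok  = eval 'my $bar = 5; if (2 = $bar) { 1 } 1';
my $err = $@;
print "compile failed: $err" unless $ok;
```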

--
Jeff Pinyan RPI Acacia Brother #734 RPI Acacia Corp Secretary
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)


 
Gunnar Hjalmarsson
      06-16-2004
John Bokma wrote:
> Gunnar Hjalmarsson wrote:
>> (E-Mail Removed) wrote:
>>> Here is a sample of my data (each column is separated by tabs):
>>>
>>>
>>> 1234123 jaesdf ytkyk 345234
>>> 1264345 ghgfdf ghjhg 657658
>>> 3456765 sdasdf ytkyk 456543
>>> 1231232 assffg werwe 123454
>>> 5447454 asdqfr ytkyk 254364
>>>
>>> I am interested in creating a hash with two of the elements in
>>> the list ("ytkyk" and "ghjhg"). I would like to create a
>>> program to read only the third column and print the line (row)
>>> if it contains one of the latter items. Can anyone help me
>>> write a program? Here is what I have so far and I would like to
>>> create a more efficient program (I am going to use it for
>>> writing a larger program later):


<snip>

>>> next unless ( index($_, 'ytkyk') >= 0 );

>
> next unless index($_, 'ytkyk');
>
> The >= 0 test can be replaced, since it's clear it's not the first
> position.


No, it can't. If the string is not found in $_, index() returns -1
which is a true value.
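
A quick sketch of that point, using one of the sample rows (assumed to be tab-separated) that does not contain the search string. The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "1231232\tassffg\twerwe\t123454\n";

my $pos = index($line, 'ytkyk');    # substring is not in this line
print "index returned $pos\n";

# -1 is a true value in Perl, so a bare truth test treats "not found"
# the same as "found somewhere after position 0".
my $truthy_test  = index($line, 'ytkyk')      ? 'keep' : 'skip';

# Comparing against 0 gives the intended answer.
my $compare_test = index($line, 'ytkyk') >= 0 ? 'keep' : 'skip';

print "truth test: $truthy_test\n";
print ">= 0 test:  $compare_test\n";
```

The only false result index() can give is 0, meaning a match at the very start of the string, which is the opposite of what the bare test is meant to detect.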

>> What makes you believe that what you have is not efficient?

>
> Maybe the OP forgot to explain the "sorting" part.


Maybe. But it just struck me that the code will not print anything. I
would believe that this is what the OP meant to do:

while ( <File> ) {
print and next if index($_, 'ytkyk') >= 0;
print and next if index($_, 'ghjhg') >= 0;
}
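
To make the difference concrete, here is a rough sketch that applies both conditions to the OP's sample rows held in memory. The arrays @lines, @both and @either are names invented for this illustration, and grep stands in for the while loop: two chained "next unless" tests amount to requiring both strings, while the loop above prints a line that has either one.

```perl
use strict;
use warnings;

# Sample rows from the original post, rebuilt as tab-separated lines.
my @lines = map { join("\t", @$_) . "\n" } (
    [qw(1234123 jaesdf ytkyk 345234)],
    [qw(1264345 ghgfdf ghjhg 657658)],
    [qw(3456765 sdasdf ytkyk 456543)],
    [qw(1231232 assffg werwe 123454)],
    [qw(5447454 asdqfr ytkyk 254364)],
);

# Original logic: a line survives only if it contains BOTH strings.
my @both = grep {
    index($_, 'ytkyk') >= 0 && index($_, 'ghjhg') >= 0
} @lines;

# Corrected logic: a line survives if it contains EITHER string.
my @either = grep {
    index($_, 'ytkyk') >= 0 || index($_, 'ghjhg') >= 0
} @lines;

printf "both: %d line(s), either: %d line(s)\n", scalar @both, scalar @either;
print @either;
```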

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
Gunnar Hjalmarsson
      06-16-2004
Web Surfer wrote:
> (E-Mail Removed) says:
>> Here is a sample of my data (each column is separated by tabs):
>>
>> 1234123 jaesdf ytkyk 345234
>> 1264345 ghgfdf ghjhg 657658
>> 3456765 sdasdf ytkyk 456543
>> 1231232 assffg werwe 123454
>> 5447454 asdqfr ytkyk 254364
>>
>> I am interested in creating a hash with two of the elements in
>> the list ("ytkyk" and "ghjhg"). I would like to create a program
>> to read only the third column and print the line (row) if it
>> contains one of the latter items. Can anyone help me write a
>> program? Here is what I have so far and I would like to create a
>> more efficient program (I am going to use it for writing a larger
>> program later):
>>
>> open( File, '<', 'file.txt' ) or die "$!\n";
>> while ( <File> ) {
>> next unless ( index($_, 'ytkyk') >= 0 );
>> next unless ( index($_, 'ghjhg') >= 0 );
>> print;
>> }
>> close( File );

>
> ### Try this untested code ###
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my ( $buffer , @fields , $filename , %hash1 );
>
> $filename = "file.txt";
> open(INPUT,"<$filename") or
> die("Can't open file \"$filename\" : $!\n");
>
> %hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );
>
> while ( $buffer = <INPUT> ) {
> chomp $buffer;
> @fields = split(/\t+/,$buffer);
> if ( 3 > @fields ) { # Ignore if less than 3 fields
> next;
> }
> unless ( exists $hash1{$fields[2]} ) {
> next;
> }
> print "$buffer\n";
> }
> close INPUT;


Would creating a hash and involving the regex engine (through split())
be more efficient? What would a benchmark show?
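
One way to find out is the core Benchmark module. The following is only a sketch: the subroutine names by_index and by_split are invented here, the data is the five sample rows held in memory (assumed tab-separated), and timings on such a tiny input say little about a real file, so no result is claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample rows from the original post, rebuilt as tab-separated lines.
my @lines = map { join("\t", @$_) . "\n" } (
    [qw(1234123 jaesdf ytkyk 345234)],
    [qw(1264345 ghgfdf ghjhg 657658)],
    [qw(3456765 sdasdf ytkyk 456543)],
    [qw(1231232 assffg werwe 123454)],
    [qw(5447454 asdqfr ytkyk 254364)],
);

my %wanted = map { $_ => 1 } qw(ytkyk ghjhg);

# Substring search anywhere in the line; returns the number of hits.
sub by_index {
    return scalar grep {
        index($_, 'ytkyk') >= 0 || index($_, 'ghjhg') >= 0
    } @lines;
}

# Split on tabs and look the third field up in the hash.
sub by_split {
    return scalar grep {
        my @f = split /\t/;
        @f >= 3 && exists $wanted{ $f[2] }
    } @lines;
}

# Run each candidate for at least one CPU second and print a comparison.
cmpthese(-1, { index => \&by_index, split_hash => \&by_split });
```

Note that the two subs are not strictly equivalent: by_index matches the strings in any column, by_split only in the third.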

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
John Bokma
      06-17-2004


Jeff 'japhy' Pinyan wrote:
> On Wed, 16 Jun 2004, John Bokma wrote:
>
>
>>Web Surfer wrote:
>>
>>
>>> if ( 3 > @fields ) { # Ignore if less than 3 fields

>>
>>Silly, the OP never specified that could happen. There are 4 fields btw, so
>>I would test for inequality, not less than.

>
>
> Because it was the *third* field that contained the string the OP is
> searching for. Thus, skip any line that doesn't have enough fields.


Did the specification ever say that there could be fewer than 4
fields? No.

>>Don't see any point in putting the constant to the left, btw. Silly C
>>coding convention IIRC.

>
> There's nothing wrong with it. It's not "silly". There is a point to it.
> It stops you from accidentally writing = instead of == if you mean to do a
> comparison. Compare:
>
> if ($foo = 2) { ... }


Found = in conditional, should be ==

> The coder *meant* to write ==, but only did =. The first one is not an
> error, and the if block is reached all the time. The second one IS an
> error.


No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
you. And a programmer turning off those warnings is silly.

Most C, C++ compilers do warn, as does Perl (with use strict, use
warnings). It is IMNSHO a stupid coding convention, illogical,
unreadable, weird. Especially with *inequalities* as the prev post used.
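
For the Perl side of this, a minimal sketch that captures the warning rather than letting it go to STDERR. The @caught array and the snippet inside the eval are illustrative. As far as I know the diagnostic comes from the warnings pragma alone, and it is issued when the right-hand side of the assignment is a constant.

```perl
use strict;
use warnings;

my @caught;
{
    # Collect warnings instead of printing them to STDERR.
    local $SIG{__WARN__} = sub { push @caught, @_ };

    # String eval, so the snippet is compiled while the handler is active.
    eval 'use warnings; my $foo = 5; if ($foo = 2) { 1 } 1';
}

print "warning: $_" for @caught;
```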

--
John MexIT: http://johnbokma.com/mexit/
personal page: http://johnbokma.com/
Experienced Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
 
John Bokma
      06-17-2004
Gunnar Hjalmarsson wrote:

> John Bokma wrote:


> No, it can't. If the string is not found in $_, index() returns -1
> which is a true value.


Arrgh, stupid of me.

--
John MexIT: http://johnbokma.com/mexit/
personal page: http://johnbokma.com/
Experienced Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
 