Extracting Text

 
 
Jake Gottlieb
      06-10-2004
I am trying to extract lines with:

GO:0009986

out of:


ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at

ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at
ENSG00000113494.3 AAK32703.1 GO:0004567 206346_at
ENSG00000113494.3 AAH59392.1 GO:0000678 206346_at

ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAK32703.1 GO:0005764 211917_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at

ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
ENSG00000113494.3 AAD32032.1 GO:0003765 210476_s_at
ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
ENSG00000113494.3 AAH59392.1 GO:0005876 210476_s_at

I have been trying to write a program for it, but can't seem to do it.
If someone could help, I would be very appreciative (I am sure it's
really easy, but Perl is new to me).

Thanks
 
Paul Lalli
      06-10-2004
On Thu, 10 Jun 2004, Jake Gottlieb wrote:

> I am trying to extract lines with:
>
> GO:0009986
>
> out of:
>
>
> ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
> ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
> ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
> ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at
>
> ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
> ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at
> ENSG00000113494.3 AAK32703.1 GO:0004567 206346_at
> ENSG00000113494.3 AAH59392.1 GO:0000678 206346_at
>
> ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
> ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
> ENSG00000113494.3 AAK32703.1 GO:0005764 211917_s_at
> ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at
>
> ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
> ENSG00000113494.3 AAD32032.1 GO:0003765 210476_s_at
> ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
> ENSG00000113494.3 AAH59392.1 GO:0005876 210476_s_at
>
> I have been trying to write a program for it, but can't seem to do it.
> If someone could help, I would be very appreciative (I am sure it's
> really easy, but Perl is new to me).


Show us what you've written so far, so we can help you to see why it
"doesn't work". You've shown us the input and we can deduce the desired
output. Now show us your code, and what output it gave, so we may see how
it doesn't meet your specifications.

Paul Lalli
 
Gunnar Hjalmarsson
      06-10-2004
Jake Gottlieb wrote:
> I am trying to extract lines with:
>
> GO:0009986


<snip>

> I have been trying to write a program for it, but can't seem to do
> it. If someone could help, I would be very appreciative (I am sure
> it's really easy, but Perl is new to me).


http://learn.perl.org/

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
Jake Gottlieb
      06-11-2004
Gunnar Hjalmarsson wrote:
> Jake Gottlieb wrote:
> > I am trying to extract lines with:
> >
> > GO:0009986

>
> <snip>
>
> > I have been trying to write a program for it, but can't seem to do
> > it. If someone could help, I would be very appreciative (I am sure
> > it's really easy, but Perl is new to me).

>
> http://learn.perl.org/


Here is my code. I am sure it's wrong, and would be grateful if someone
could correct and complete it. I would like to extract lines from the
original code, and put them into another text file. I have been trying
for a while:

while (<file.txt>) {
$line = $_;
$yes = (index $line, 'GO:000');
if ($yes > -1) {
print "YES : $line";
}
if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
print "La GO! $line \n";
}
}
 
Peter Hickman
      06-11-2004
Jake Gottlieb wrote:
> Here is my code. I am sure it's wrong, and would be grateful if someone
> could correct and complete it. I would like to extract lines from the
> original code, and put them into another text file. I have been trying
> for a while:
>
> while (<file.txt>) {
> $line = $_;
> $yes = (index $line, 'GO:000');
> if ($yes > -1) {
> print "YES : $line";
> }
> if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
> print "La GO! $line \n";
> }
> }


If all you want is to display lines that contain the string GO:0009986 then this
will do the trick.

[peter@wasabi xxx]$ cat prog
#!/usr/bin/perl -w

use strict;
use warnings;

while ( my $line = <> ) {
    next unless $line =~ m/\s+GO:0009986\s+/;

    print $line;
}
[peter@wasabi xxx]$

Basically it reads each line from the files named on the command line (or from
standard input if none are given), skips any line that does not match the
regex, and prints the rest to standard output.

[peter@wasabi xxx]$ perl prog file.txt
ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at
ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
[peter@wasabi xxx]$

I'm not sure what the $yes code in your program was for, and <file.txt> is not
how you open or read a file. You had the right idea with the regex, although
yours is over-specified for this problem.
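
Since the original question asked for the matching lines to end up in another text file, here is a minimal sketch of that variant. The sub name extract_lines, the script name extract.pl, and the output name matches.txt are made up for illustration; only file.txt and the ID come from the thread. Redirecting the output of the program above with the shell would work just as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every line that contains $id surrounded by whitespace from the
# input handle to the output handle; return the number of lines copied.
# (extract_lines is a hypothetical helper, not part of the thread.)
sub extract_lines {
    my ($in, $out, $id) = @_;
    my $count = 0;
    while ( my $line = <$in> ) {
        # \Q...\E makes sure $id is matched literally
        next unless $line =~ /\s\Q$id\E\s/;
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Assumed usage: perl extract.pl file.txt matches.txt
# Nothing happens unless exactly two file names are given.
if ( @ARGV == 2 ) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!\n";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!\n";
    my $n = extract_lines($in, $out, 'GO:0009986');
    close $out or die "Cannot close $out_name: $!\n";
    close $in;
    print "$n matching line(s) written to $out_name\n";
}
```

Passing filehandles rather than file names keeps the matching logic separate from the file handling, so the same sub works on STDIN, a pipe, or an in-memory string.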
 
Anno Siegel
      06-11-2004
Peter Hickman wrote in comp.lang.perl.misc:
> Jake Gottlieb wrote:


[...]

> If all you want is to display lines that contain the string GO:0009986
> then this
> will do the trick.
>
> [peter@wasabi xxx]$ cat prog
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
>
> while ( my $line = <> ) {
> next unless $line =~ m/\s+GO:0009986\s+/;

                           ^            ^
The "+"es make no difference here.

> print $line;
> }


That can be simplified to

/\sGO:0009986\s/ and print while <>;
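
A quick way to convince yourself that the quantifiers are redundant: an unanchored \s+ succeeds exactly when at least one whitespace character sits next to the ID, which is all that a single \s asks for. The sketch below compares the two patterns; the sub names and the last two sample lines are invented for the comparison.

```perl
use strict;
use warnings;

my @samples = (
    "ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at\n",
    "ENSG00000113494.3 AAA60174.1   GO:0009986\t206346_at\n",  # made up: runs of whitespace
    "ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at\n",
    "GO:0009986 at the start of a line\n",                     # made up: no leading whitespace
);

# Return 1 if the line matches, 0 otherwise.
sub with_plus    { return $_[0] =~ /\s+GO:0009986\s+/ ? 1 : 0 }
sub without_plus { return $_[0] =~ /\sGO:0009986\s/   ? 1 : 0 }

# Print both results next to each line; the two columns always agree.
for my $line (@samples) {
    printf "%d %d %s", with_plus($line), without_plus($line), $line;
}
```

Note that both versions reject the last sample, because both require a whitespace character in front of the ID.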

Anno
 
Gunnar Hjalmarsson
      06-11-2004
Jake Gottlieb wrote:
> Here is my code. I am sure it's wrong,


Please be more specific about the problem. You'd better study the
posting guidelines for this group:

http://mail.augustmail.com/~tadmc/cl...uidelines.html

> and would be grateful if someone could correct and complete it. I
> would like to extract lines from the original code, and put them
> into another text file.


Below please find a couple of comments. If you want to write something
to another file, you should open that file for writing...

> while (<file.txt>) {


That does not open the file for reading. This does:

open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
while (<$fh>) {

See

perldoc -f open
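
For the curious, here is a small sketch of what the original loop actually did. Because "file.txt" is not a plain identifier, Perl treats <file.txt> as a filename pattern, i.e. glob('file.txt'), so the loop sees the file name rather than the file's lines. The sub read_lines is a made-up helper showing the open-based alternative.

```perl
use strict;
use warnings;

# <file.txt> is parsed as glob('file.txt'), not as a read from a file.
# A pattern without wildcards comes back unchanged, so the original
# loop body ran once with $_ set to the file *name*.
my @names = glob('file.txt');
print "glob returned: @names\n";

# Reading the contents needs a filehandle obtained from open().
# (read_lines is a hypothetical helper for illustration.)
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

That explains why the original program printed nothing: the string "file.txt" contains neither 'GO:000' nor anything the regex would match.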

> $line = $_;
> $yes = (index $line, 'GO:000');


You should have

use strict;
use warnings;

in the beginning of the program, and declare the variables you introduce:

    my $line = $_;
    my $yes = (index $line, 'GO:000');
----^^

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
Tore Aursand
      06-11-2004
On Fri, 11 Jun 2004 00:35:59 -0700, Jake Gottlieb wrote:
> while (<file.txt>) {


That doesn't read from "file.txt". This one does (untested):

open( FH, '<', 'file.txt' ) or die "$!\n";
while ( <FH> ) {
    # ...
}

> $line = $_;
> $yes = (index $line, 'GO:000');
> if ($yes > -1) {
> print "YES : $line";
> }
> if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
> print "La GO! $line \n";
> }
> }


If you are sure that you can match on 'GO:000', you're on the right track
using 'index'. You don't need any regular expressions, though (untested):

open( FH, '<', 'file.txt' ) or die "$!\n";
while ( <FH> ) {
    next unless ( index($_, 'GO:000') >= 0 );
    print;
}
close( FH );

Also: Be sure to 'use strict' and 'use warnings' in your script(s).
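
One caveat about the search string: every ID in the posted sample starts with 'GO:000', so that prefix would select every data line. A minimal sketch of the same index-based test using the full ID from the question (the sub name has_id is invented for illustration):

```perl
use strict;
use warnings;

# True if $line contains $needle as a literal substring.
# (has_id is a hypothetical helper, not from the thread.)
sub has_id {
    my ($line, $needle) = @_;
    return index($line, $needle) >= 0;
}

# Two lines taken from the sample data in the question.
my @lines = (
    "ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at\n",
    "ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at\n",
);

# The short prefix selects both lines; the full ID selects only the first.
my @by_prefix = grep { has_id($_, 'GO:000') }     @lines;
my @by_id     = grep { has_id($_, 'GO:0009986') } @lines;
printf "prefix: %d line(s), full ID: %d line(s)\n",
    scalar @by_prefix, scalar @by_id;
```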


--
Tore Aursand
"Poor management can increase software costs more rapidly than any
other factor." (Barry Boehm)
 
John Bokma
      06-11-2004
Tore Aursand wrote:

> next unless ( index($_, 'GO:000') >= 0 );


index($_, 'GO:000') > -1 or next;

--
John MexIT: http://johnbokma.com/mexit/
personal page: http://johnbokma.com/
Experienced Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
 
Anno Siegel
      06-11-2004
John Bokma wrote in comp.lang.perl.misc:
> Tore Aursand wrote:
>
> > next unless ( index($_, 'GO:000') >= 0 );

>
> index($_, 'GO:000') > -1 or next;


1 + index $_, 'GO:000' or next;
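
All three spellings rely on index returning -1 when the substring is absent and a position of 0 or more when it is present; adding 1 turns "absent" into 0, which is false. A small sketch checking that they agree, with made-up sub names:

```perl
use strict;
use warnings;

# Three ways to ask "does $s contain $sub?", each returning 1 or 0.
# (The sub names are invented for this comparison.)
sub test_ge   { my ($s, $sub) = @_; return index($s, $sub) >= 0 ? 1 : 0 }
sub test_gt   { my ($s, $sub) = @_; return index($s, $sub) > -1 ? 1 : 0 }
sub test_plus { my ($s, $sub) = @_; return 1 + index($s, $sub)  ? 1 : 0 }

# Substring in the middle, at position 0, and absent.
for my $s ( "x GO:0009986 y", "GO:0009986 y", "nothing here" ) {
    printf "%d %d %d  %s\n",
        test_ge($s, 'GO:000'), test_gt($s, 'GO:000'),
        test_plus($s, 'GO:000'), $s;
}
```

The position-0 case is the one to watch: index returns 0 there, which is why the bare return value cannot be used as a truth test without the comparison or the "1 +".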

Anno
 