Velocity Reviews - Computer Hardware Reviews

Extracting And Storing a String

 
 
Digger
Guest
Posts: n/a
 
      01-07-2005
I am trying to extract a url from a file and store it, the problem is
I only want the first occurrence of that url that meets certain
criteria.

How can I get that single url out of a file and store it to be used
for something else?

Thanks
 
 
 
 
 
Sherm Pendley
Guest
Posts: n/a
 
      01-07-2005
Digger wrote:

> How can I get that single url out of a file and store it to be used
> for something else?


You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
 
 
 
 
 
Digger
Guest
Posts: n/a
 
      01-07-2005
On Fri, 07 Jan 2005 12:48:20 -0500, Sherm Pendley wrote:

>Digger wrote:
>
>> How can I get that single url out of a file and store it to be used
>> for something else?

>
>You left out a critical bit of information: What format is the file in?
>If it's HTML, use HTML::Parser.
>
>sherm--


Sorry, yes......

It's a flat text log file.....

date : error message: url: other garbage



 
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      01-07-2005
Sherm Pendley wrote:
> Digger wrote:
>> I am trying to extract a url from a file and store it, the problem is
>> I only want the first occurrence of that url that meets certain
>> criteria.
>>
>> How can I get that single url out of a file and store it to be used
>> for something else?

>
> You left out a critical bit of information: What format is the file in?
> If it's HTML, use HTML::Parser.


Not necessarily. The OP didn't say which criteria will be used to
identify the URL, but if those criteria have nothing to do with the
position of the URL relative to the various HTML elements,
HTML::Parser won't be of much use for the task, even if the file
happens to be an HTML page.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
mjl69
Guest
Posts: n/a
 
      01-07-2005
Digger wrote:

> I am trying to extract a url from a file and store it, the problem is
> I only want the first occurrence of that url that meets certain
> criteria.
>
> How can I get that single url out of a file and store it to be used
> for something else?
>
> Thanks


use HTML::LinkExtor;

mjl
 
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      01-07-2005
Digger wrote:
> Sherm Pendley wrote:
>> Digger wrote:
>>> How can I get that single url out of a file and store it to be
>>> used for something else?

>>
>> You left out a critical bit of information: What format is the file
>> in? If it's HTML, use HTML::Parser.

>
> Sorry, yes......
>
> It's a flat text log file.....
>
> date : error message: url: other garbage


What part of the task do you have difficulties with? Show us what you
have tried so far, and somebody may be able to point you in the right
direction.

A hint: check out the split() function.
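
A minimal sketch of the split() idea, assuming the layout really is
"date : error message: url: other garbage" with the fields separated by
a colon followed by whitespace. The helper name url_field and the sample
lines are made up for illustration. Splitting on a colon that is followed
by whitespace leaves the colons inside the time stamp and inside
"http://" alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the URL field of one log line, or undef.
# Assumes "date : error message: url: other garbage", where each field
# separator is a colon followed by whitespace.
sub url_field {
    my ($line) = @_;
    chomp $line;
    # A limit of 4 keeps any later ": " inside the trailing garbage field.
    my @fields = split /\s*:\s+/, $line, 4;
    return $fields[2];
}

# Made-up sample data in the assumed layout.
my @lines = (
    "2005-01-07 12:00:01 : timeout: http://example.com/a: retry 1\n",
    "2005-01-07 12:00:09 : timeout: http://example.com/b: retry 2\n",
);

# Store the URL from the first line that has one, then stop reading.
my $first_url;
for my $line (@lines) {
    $first_url = url_field($line);
    last if defined $first_url;
}

print "$first_url\n" if defined $first_url;
```

Any test for the "certain criteria" would go just before the last
statement, for example a check on the error message field.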

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
mjl69
Guest
Posts: n/a
 
      01-07-2005
Gunnar Hjalmarsson wrote:

> Digger wrote:
> > Sherm Pendley wrote:
> >> Digger wrote:
> >>> How can I get that single url out of a file and store it to be
> >>> used for something else?
> >>
> >> You left out a critical bit of information: What format is the file
> >> in? If it's HTML, use HTML::Parser.

> >
> > Sorry, yes......
> >
> > It's a flat text log file.....
> >
> > date : error message: url: other garbage

>
> What part of the task do you have difficulties with? Show us what you
> have tried so far, and somebody may be able to point you in the right
> direction.
>
> A hint: check out the split() function.


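#!/usr/bin/perl

use strict;
use warnings;

# Three-argument open, so 'or die' applies to open and not to the file name
open my $file, '<', 'log.txt' or die "error: could not open file: $!";
while (<$file>)
{
    # Replace the line with just the URL field, and print it if that worked
    print if s/.*url:\s+(\S+)\s+.*/$1/;
}
close $file;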

For the flat text log file described, I was thinking of something like
this, but it won't work if the url has spaces in it (like local paths
in Windows) or if there is not at least one space on each side of the
url.


mjl
 
 
Digger
Guest
Posts: n/a
 
      01-07-2005

On 7 Jan 2005 18:20:33 GMT, "mjl69" wrote:

>Digger wrote:
>
>> I am trying to extract a url from a file and store it, the problem is
>> I only want the first occurrence of that url that meets certain
>> criteria.
>>
>> How can I get that single url out of a file and store it to be used
>> for something else?
>>
>> Thanks

>
>use HTML::LinkExtor;
>
>mjl


The criteria to extract the URL will be either "FAILED" or
"SUCCESS"...

Example...


[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

etc.....
 
 
Sherm Pendley
Guest
Posts: n/a
 
      01-07-2005
Digger wrote:

> The criteria to extract the URL will be either "FAILED" or
> "SUCCESS"...
>
> Example...
>
>
> [2004-12-25 9:20:12] FAILED http://hotmail.com/bla
> [2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
> [2004-12-25 9:26:12] FAILED http://abc.com
> [2004-12-25 9:27:12] FAILED http://123.com


Just loop through the lines in the file. Use a regex to examine each
line and use last to exit from the loop as soon as you find what you're
looking for.

For example:

#!/usr/bin/perl

use strict;
use warnings;

# These are declared outside the while loop so you
# can use them after the loop exits
my $flag;
my $url;

while (<>) {
    ($flag, $url) = /(FAILED|SUCCESS) (.*)$/;
    last if ($flag && $flag eq 'SUCCESS');
}

# Do something with $url ...
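
One thing to watch in the loop above: if no SUCCESS line turns up,
$url is left holding whatever the last line matched, which may be a
FAILED url. A possible variant, only a sketch, stores the url only when
the wanted status is found and returns nothing otherwise. The helper
name first_url_with_status is made up, and it assumes the url is a run
of non-whitespace right after the status word, as in the sample lines.

```perl
use strict;
use warnings;

# Hypothetical helper: first URL in @$lines whose status equals $wanted,
# or undef if no line qualifies.
sub first_url_with_status {
    my ($wanted, $lines) = @_;
    for my $line (@$lines) {
        # Status word, one space, then the URL as a run of non-whitespace.
        if ($line =~ /\b(FAILED|SUCCESS) (\S+)/ && $1 eq $wanted) {
            return $2;    # stop at the first qualifying line
        }
    }
    return;               # nothing matched
}

# Sample lines taken from the post being answered.
my @log = (
    "[2004-12-25 9:20:12] FAILED http://hotmail.com/bla\n",
    "[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla\n",
    "[2004-12-25 9:26:12] FAILED http://abc.com\n",
    "[2004-12-25 9:27:12] FAILED http://123.com\n",
);

my $found = first_url_with_status('SUCCESS', \@log);
print defined $found ? "$found\n" : "no match\n";
```

Reading from a file instead of a list works the same way: test each
line inside a while (<$fh>) loop and leave the loop with last on the
first hit.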

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
 
 
Joe Smith
Guest
Posts: n/a
 
      01-08-2005
mjl69 wrote:
> /.*url:\s+(\S+)\s+.*/;
> but it won't work if there is not at least one space on each side of the
> url.


Then use \s* instead of the first \s+ and get rid of the second.
You want either /.*?url:/ or /url:/ to ignore potential matches
in the garbage field.
-Joe
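
A small sketch of the difference, assuming (as mjl69's pattern does)
that each line carries a literal "url:" label. The helper names and the
sample line are made up. With a greedy .* in front, the regex backtracks
from the end of the line and lands on the last "url:", so a match in the
garbage field wins. Without it, the leftmost "url:" is used.

```perl
use strict;
use warnings;

# Hypothetical helper: URL after the leftmost "url:" label, or undef.
# \s* allows zero spaces after the label; nothing is required after the URL.
sub url_after_label {
    my ($line) = @_;
    return $line =~ /url:\s*(\S+)/ ? $1 : undef;
}

# Same pattern with a greedy .* in front, kept only for comparison.
sub url_after_last_label {
    my ($line) = @_;
    return $line =~ /.*url:\s*(\S+)/ ? $1 : undef;
}

# Made-up line with no space after the first label and a second
# "url:" in the garbage field.
my $line = "2005-01-07 : timeout: url:http://example.com/a see url: http://example.com/b\n";

print url_after_label($line), "\n";         # leftmost label
print url_after_last_label($line), "\n";    # label in the garbage field
```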
 
 
 
 