Rubish Way of extracting elements

 
 
Daniel Völkerts
 
      08-16-2004
I started written a little script to analyse my syslogs. The development
went on very fast, but today I'm searching the rubish way to dissect a
string into its parts. For example, my syslog contains this line (valid
as described in RFC 3164):

<165> Aug 16 17:01:35 localhost Just a test

I was trying to reach this form

var = content

pri = 165
timestamp = Aug 16 17:01:35
device = localhost
msg = Just a test

But how do I accomplish this? I read the Pickaxe book, but the example I
found was about repeated values with e.g. | as separator. Is a suitable
regexp the way to go, or should I use another technique, e.g. String#index?


Thanks for your time helping me, I'll pay it back if I become a little
more rubisher

--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek
 
Daniel Völkerts
 
      08-16-2004
Daniel Völkerts wrote:

> I started written a little script to analyse my syslogs.


I feel sorry, 'I started writting..' is the correct way.

--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek
 
David A. Black
 
      08-16-2004
Hi --

On Tue, 17 Aug 2004, Daniel Völkerts wrote:

> I started written a little script to analyse my syslogs. The development
> went on very fast, but today I'm searching the rubish way to dissect a
> string into its parts. For example, my syslog contains this line (valid
> as described in RFC 3164):
>
> <165> Aug 16 17:01:35 localhost Just a test
>
> I was trying to reach this form
>
> var = content
>
> pri = 165
> timestamp = Aug 16 17:01:35
> device = localhost
> msg = Just a test
>
> But how do I accomplish this? I read the Pickaxe book, but the example I
> found was about repeated values with e.g. | as separator. Is a suitable
> regexp the way to go, or should I use another technique, e.g. String#index?
>
>
> Thanks for your time helping me, I'll pay it back if I become a little
> more rubisher


You could match it to a regular expression, and grab the results in
()-expressions:

str = "<165> Aug 16 17:01:35 localhost Just a test"

pri, timestamp, device, msg =
/<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

Another way would be to use scanf. This has the advantage that you
get your 165 as an integer (if that's important):

require 'scanf'
pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")


(You might have to adjust either the regex or the format string
depending on how consistent and predictable the lines are.)
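
A minimal sketch of the regexp route above for anyone who wants pri as an integer without scanf (as far as I know, scanf left Ruby's standard library in 2.7 and now has to be installed as a gem). The constant SYSLOG_LINE and the helper parse_syslog_line are names made up for this illustration; the pattern is the one from this post, and returning nil for a non-matching line is an assumption about what the script would want.

```ruby
# Pattern taken from the post above; the constant name is hypothetical.
SYSLOG_LINE = /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/

# Hypothetical helper: returns nil when the line does not match,
# otherwise the four fields with pri converted to an Integer.
def parse_syslog_line(str)
  md = SYSLOG_LINE.match(str)
  return nil unless md

  pri, timestamp, device, msg = md.captures
  [pri.to_i, timestamp, device, msg]
end

pri, timestamp, device, msg =
  parse_syslog_line("<165> Aug 16 17:01:35 localhost Just a test")
puts "pri=#{pri} timestamp=#{timestamp} device=#{device} msg=#{msg}"
```

Checking for nil before calling captures avoids a NoMethodError on lines that do not fit the pattern.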


David

--
David A. Black



 
Charles Mills
 
      08-16-2004

On Aug 16, 2004, at 8:06 AM, Daniel Völkerts wrote:

> I started written a little script to analyse my syslogs. The
> development went on very fast, but today I'm searching the rubish way
> to dissect a string into its parts. For example, my syslog contains
> this line (valid as described in RFC 3164):
>
> <165> Aug 16 17:01:35 localhost Just a test
>
> I was trying to reach this form
>
> var = content
>
> pri = 165
> timestamp = Aug 16 17:01:35
> device = localhost
> msg = Just a test
>
> But how do I accomplish this? I read the Pickaxe book, but the example
> I found was about repeated values with e.g. | as separator. Is a suitable
> regexp the way to go, or should I use another technique, e.g. String#index?
>

Probably use regular expressions. You could have one big regexp or one
for each field like so:
var =~ /<([0-9]+)>/
pri = $1
$' =~ /some regexp/ # I'm lazy
timestamp = $1
# etc
You could also use \A along with the post match ($') to make sure the
fields come in the order you expect.
-Charlie
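
The field-by-field idea above could be spelled out roughly like this. It is only a sketch: it uses MatchData#post_match instead of the $' global (they hold the same text), the helper name split_step_by_step is invented for the example, and the individual patterns are guesses at what "some regexp" might be for this one sample line.

```ruby
# Hypothetical helper illustrating the one-regexp-per-field approach.
# Each pattern is anchored with \A so it must match at the very start
# of whatever the previous step left over.
def split_step_by_step(line)
  md = /\A<(\d+)>\s*/.match(line) or return nil
  pri = md[1]

  md = /\A(\w+\s+\d+\s+\d+:\d+:\d+)\s+/.match(md.post_match) or return nil
  timestamp = md[1]

  md = /\A(\S+)\s+/.match(md.post_match) or return nil
  device = md[1]

  # Whatever remains after the device field is the message.
  [pri, timestamp, device, md.post_match]
end

p split_step_by_step("<165> Aug 16 17:01:35 localhost Just a test")
# => ["165", "Aug 16 17:01:35", "localhost", "Just a test"]
```

Because every step is anchored, a line whose fields arrive in a different order yields nil instead of a partial result.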

>
> Thanks for your time helping me, I'll pay it back if I become a little
> more rubisher
>
> --
> Daniel Völkerts ::
> "I have a dragon, and I WILL use it!" - Donkey in Shrek
>





 
Florian Gross
 
      08-16-2004
Daniel Völkerts wrote:

> <165> Aug 16 17:01:35 localhost Just a test
> I was trying to reach this form
>
> var = content
>
> pri = 165
> timestamp = Aug 16 17:01:35
> device = localhost
> msg = Just a test


This ought to work, but there might be other ways to do this:

if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
pri, timestamp, device, msg = *md.captures
# Do something with the captures
end
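
In Ruby 1.9 and later the same pattern can carry named groups, which gives the var = content form from the question fairly directly. A small sketch, with the constant NAMED_LINE_RX and the helper fields_by_name being made-up names; the pattern itself is the one above with names added.

```ruby
# Same pattern as above, with (?<name>...) groups; the name is hypothetical.
NAMED_LINE_RX =
  /^<(?<pri>\d+)> (?<timestamp>\S+ \d+ \d+:\d+:\d+) (?<device>\S+) (?<msg>.*?)$/

# Hypothetical helper: returns a Hash of field name => text, or nil.
def fields_by_name(text)
  md = NAMED_LINE_RX.match(text)
  return nil unless md

  md.names.zip(md.captures).to_h
end

fields = fields_by_name("<165> Aug 16 17:01:35 localhost Just a test")
fields.each { |name, content| puts "#{name} = #{content}" }
```

The hash keys are strings ("pri", "timestamp", and so on), so fields["pri"] returns "165".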

Regards,
Florian Gross
 
Daniel Völkerts
 
      08-16-2004
Daniel Völkerts wrote:

> I feel sorry, 'I started writting..' is the correct way.


What the hell, "writting" is also wrong, tzzz. Too much caffeine in my head!

After posting the message above, I wrote this line:

pri,timestamp,device,msg = aMsg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)

Is this the right way? Please feel free to post comments. I'm looking
forward to them, since I want to improve my Ruby skills.
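
For reference, here is what that scan call returns on the sample line. The variable a_msg and the constant SCAN_PATTERN are names chosen for this sketch; the pattern is the one from the line above, joined onto a single line.

```ruby
SCAN_PATTERN = /<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/

a_msg = "<165> Aug 16 17:01:35 localhost Just a test"

# scan returns every non-overlapping match, so each word of the
# message becomes its own array element.
tokens = a_msg.scan(SCAN_PATTERN)
p tokens
# => ["<165>", "Aug 16 17:01:35", "localhost", "Just", "a", "test"]

# Only the first four elements are assigned; the rest are dropped.
pri, timestamp, device, msg = tokens
p pri  # => "<165>"
p msg  # => "Just"
```

So pri keeps its angle brackets and msg holds only the first word, which is why a single regexp with one group per field may fit this job better.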

--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek
 
Daniel Völkerts
 
      08-16-2004
David A. Black wrote:

> You could match it to a regular expression, and grab the results in
> ()-expressions:
>
> str = "<165> Aug 16 17:01:35 localhost Just a test"
>
> pri, timestamp, device, msg =
> /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures
>
> Another way would be to use scanf. This has the advantage that you
> get your 165 as an integer (if that's important):
>
> require 'scanf'
> pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")
>
>
> (You might have to adjust either the regex or the format string
> depending on how consistent and predictable the lines are.)


Thanks a lot. That's the way I expected it to look: simple and easy to
understand. I'll try it.

Many regards.

--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek
 
Robert Klemme
 
      08-16-2004

"Florian Gross" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Daniel Völkerts wrote:
>
> > <165> Aug 16 17:01:35 localhost Just a test
> > I was trying to reach this form
> >
> > var = content
> >
> > pri = 165
> > timestamp = Aug 16 17:01:35
> > device = localhost
> > msg = Just a test

>
> This ought to work, but there might be other ways to do this:
>
> if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
> pri, timestamp, device, msg = *md.captures
> # Do something with the captures
> end


Some more admittedly ugly constructions:

val = "<165> Aug 16 17:01:35 localhost Just a test"
unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
\d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
puts "matched"
end

pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
\s+ (\S+) \s+ (.*)$/x.match(val).to_a
if pri
puts "matched"
end

LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
puts "matched"
end

if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
\d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
puts "matched"
end

if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
puts "matched"
end
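
One detail worth checking before reusing the constructions above: in current Ruby, MatchData#to_a puts the whole match at index 0, so a left-hand side that starts with pri receives the entire line unless a leading variable such as line absorbs it. A small sketch of the difference; DEMO_RX and capture_fields are names made up for this illustration, and the pattern is the one used above.

```ruby
DEMO_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

val = "<165> Aug 16 17:01:35 localhost Just a test"
md  = DEMO_RX.match(val)

p md.to_a.first      # => "<165> Aug 16 17:01:35 localhost Just a test"
p md.captures.first  # => "165"

# Hypothetical helper: only the groups, or an empty array on no match.
def capture_fields(rx, str)
  md = rx.match(str)
  md ? md.captures : []
end

pri, timestamp, device, msg = capture_fields(DEMO_RX, val)
puts "matched" if pri
```

Using captures keeps the emptiness test from the first construction while lining the four variables up with the four groups.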



robert

 
Daniel Völkerts
 
      08-16-2004
Robert Klemme wrote:

> Some more admittedly ugly constructions:
>
> val = "<165> Aug 16 17:01:35 localhost Just a test"
> unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
> \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
> puts "matched"
> end
>
> pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
> \s+ (\S+) \s+ (.*)$/x.match(val).to_a
> if pri
> puts "matched"
> end
>
> LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
>
> unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
> puts "matched"
> end
>
> if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
> \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
> puts "matched"
> end
>
> if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
> puts "matched"
> end
>
>
>
> robert
>


*boom* That blew my mind! No no, thanks a lot for that piece of code.

But I prefer the scanf and one-line-regexp.

I'll test which one performs better for my needs. As I said, I'm a Ruby
newbie and my personal programming rule is: keep it simple! I have to
understand the things I write.

If my little script ever reaches the point where it becomes interesting
to people other than me, I'll post an [Ann] thread.


Bye,
--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek
 
Robert Klemme
 
      08-17-2004

"Daniel Völkerts" <(E-Mail Removed)> wrote in message
news:cfr4ml$d13$00$(E-Mail Removed)-online.com...
> Robert Klemme wrote:
>
> > Some more admittedly ugly constructions:
> >
> > val = "<165> Aug 16 17:01:35 localhost Just a test"


(1) This one converts the RX MatchData into an array and tests for emptiness
to determine whether it matched. Along the way, values are assigned to
local vars.

> > unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
> > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
> > puts "matched"
> > end


(2) Similar, but now just one local var is used as match check: if "pri" is
not nil, the RX matched.

> > pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
> > \s+ (\S+) \s+ (.*)$/x.match(val).to_a
> > if pri
> > puts "matched"
> > end


(3) Same approach as (1) but the regexp is defined as a constant to make
stuff more readable.

> > LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
> >
> > unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
> > puts "matched"
> > end


(4) Similar approach to (2) but the test is included ("&& line"). Note that
this time no conversion to array is done here so we need the additional
local "line" to receive the complete capture.

> > if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
> > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
> > puts "matched"
> > end


(5) Same as (4) but with regexp in constant as in (3).

> > if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
> > puts "matched"
> > end
> >
> >
> >
> > robert
> >

>
> *boom* That blew my mind! No no, thanks a lot for that piece of code.


I *should've* put some comments in... Ok, inserting them above now.

> But I prefer the scanf and one-line-regexp.


Basically I used extended regular expressions (enabled by the "/x" flag).
In that mode literal whitespace in the pattern is ignored, so every space
in the input has to be written as an explicit "\s+". That's why you see more
"\s+" in there, and why the regexp is longer.
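
To illustrate what the "/x" flag buys: the same pattern written compactly and in extended form with a comment per field. This is just a sketch, and the constant names COMPACT_RX and VERBOSE_RX are invented for it.

```ruby
# Without /x: every character in the pattern counts.
COMPACT_RX = /^<(\d+)>\s+(\S+\s+\d+\s+\d+:\d+:\d+)\s+(\S+)\s+(.*)$/

# With /x: whitespace and "#" comments inside the pattern are ignored,
# so the fields can be laid out one per line.
VERBOSE_RX = /
  ^<(\d+)>                       \s+  # priority, e.g. <165>
  (\S+ \s+ \d+ \s+ \d+:\d+:\d+)  \s+  # timestamp, e.g. Aug 16 17:01:35
  (\S+)                          \s+  # device or host name
  (.*)$                               # the rest is the message
/x

val = "<165> Aug 16 17:01:35 localhost Just a test"
p COMPACT_RX.match(val).captures == VERBOSE_RX.match(val).captures  # => true

# A literal space under /x matches nothing at all:
p(/a b/x =~ "ab")   # => 0
p(/a b/x =~ "a b")  # => nil
```

One caveat: a comment inside a /.../x literal must not contain a slash, since that would end the literal early.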

> I'll test which one performs better for my needs. As I said, I'm a Ruby
> newbie and my personal programming rule is: keep it simple! I have to
> understand the things I write.


That's an excellent road to walk down! Handcrafted, simple code is better
than a mindless copy of something found somewhere.

Kind regards

robert

 