splitting with a regex & keeping a ref?

 
 
Kyle Schmitt
 
      05-01-2008
I'm writing some scripts to help handle some ornery samba servers we
have: part of that is unfortunately reading the config scripts that
have built up over the years.

I was hoping to use the standard string method as a quick &
not-so-dirty way of parsing the files, given that samba uses a very
simple format.

#the sample_data variable is defined below
irb(main):sample_data.split(/\[[a-z0-9]+\]/i)
=> ["", "\ncomment = shared directory for the shop\npath =
/dept/shop\nvalid u ....(truncated)
Gives good results, but omits what's between the brackets. I expected
that part.

irb(main):sample_data.split(/(\[[a-z0-9]+\])/i)
=> ["", "[shop]", "\ncomment = shared directory for the shop\npath =
/dept/sho ....(truncated)
Neat, gives me the data between the brackets in an element before the
data itself.
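
For reference, a minimal sketch of the "zip through that array again" approach: split with a capturing group, then pair each header with the body that follows it. The shortened `sample` string and the names `parts`, `pairs` and `config` are made up for illustration; they are not from the original script.

```ruby
# Hypothetical, shortened sample in the same format as sample_data.
sample = "[shop]\npath = /dept/shop\npublic = no\n[bob]\npath = /users/bob\n"

# A capturing group in the pattern makes split keep the delimiters.
parts = sample.split(/(\[[a-z0-9]+\])/i)

# The string starts with a delimiter, so the first element is "".
parts.shift if parts.first.empty?

# Pair each header with the body that follows it.
pairs = parts.each_slice(2).map { |header, body| [header, body.to_s.strip] }
config = Hash[pairs]

p pairs
p config
```

This still walks the array a second time; split itself does not take a block, which is what the question is really about.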


I know quite well I can zip through that array again, but I was
wondering, hoping, that there would be a way of accessing that back
reference in a block as part of the split.

Is there any way to do that that I'm just missing?

Thanks,
Kyle

sample_data=%{[shop]
comment = shared directory for the shop
path = /dept/shop
valid users = @shop @admin
public = no
writable = yes
force group = shop
create mask = 0770
[bob]
comment = User files for bob
path = /users/bob
valid users = bob @admin
public = no
writable = yes
create mask = 0770}

 
David A. Black
 
      05-01-2008
Hi --

On Thu, 1 May 2008, Kyle Schmitt wrote:

> I know quite well I can zip through that array again, but I was
> wondering, hoping, that there would be a way of accessing that back
> reference in a block as part of the split.


I'm afraid I can't quite follow that sentence. What do you mean by a
back reference? Can you show some sample desired output?


David

--
Rails training from David A. Black and Ruby Power and Light:
INTRO TO RAILS June 9-12 Berlin
ADVANCING WITH RAILS June 16-19 Berlin
INTRO TO RAILS June 24-27 London (Skills Matter)
See http://www.rubypal.com for details and updates!

 
Kyle Schmitt
 
      05-01-2008
David, back reference as in a regex back reference.
In a nutshell, it stores what was matched, and allows you to do
something with it. You just place parentheses around the part of the
match you want to save.

They work like this in Ruby's gsub (but a little differently in sed,
if that's the regex you grew up with).

example=%{Brian had a dog
James had a cat
Allen has a hamster}
puts example
#If you wanted to change the type of pet with gsub, you could do it like this...
puts example.gsub(/[^ ]+$/,"grue")
#but if you wanted to describe the pet, and not change the type, you'd
#need a backreference
puts example.gsub(/([^ ]+$)/){|i| "big ugly #{i}"}
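
As a small sketch of the two usual ways to reach a captured group in gsub: `\1` inside a replacement string, and `$1` inside a block (the block argument itself is the whole match, not the group). The variable names `with_string` and `with_block` are only for illustration.

```ruby
example = "Brian had a dog\nJames had a cat"

# In a replacement string, \1 refers to the first capture group.
with_string = example.gsub(/([^ ]+)$/, 'big ugly \1')

# In a block, the argument would be the whole match; $1 is the first group.
with_block = example.gsub(/had a ([^ ]+)$/) { "had a big ugly #{$1}" }

puts with_string
puts with_block
```

Both calls give the same text here; the block form is the one to reach for when the replacement needs real Ruby logic.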

Kyle Schmitt
 
      05-01-2008
Ohh right, desired sample output.

What I'd really like is to split the string and either stuff it
straight into a hash at the same time or, more realistically since
it's splitting, into array tuples.
So...

sample_data.split(){magic happens here}
=>{"[shop]"=>"\ncomment = shared directory for the shop\npath..>"}

or
sample_data.split(){magic happens here}
=>[["[shop]","\ncomment = shared directory for the shop\npath..>"]]

 
Kyle Schmitt
 
      05-01-2008
David,
re-reading your sig, and that page, I've got to apologize,
you already knew that stuff in spades I'm sure!

What part doesn't quite make sense?

 
yermej
 
      05-01-2008


I think you might want scan instead of split.

sample_data.scan( /(\[[a-z0-9]+\])([^\[]*)/i) do |share, opts|
# create your hash or whatever here
end
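
To make that concrete, here is a rough sketch of scan producing both shapes asked for earlier in the thread: an array of [header, body] tuples without a block, and a hash built inside the block. The shortened `sample` string and the names `tuples` and `shares` are assumptions for the example only.

```ruby
# Hypothetical, shortened sample in the same format as sample_data.
sample = "[shop]\npath = /dept/shop\npublic = no\n[bob]\npath = /users/bob\n"
regex  = /(\[[a-z0-9]+\])([^\[]*)/i

# Without a block, scan returns one [group1, group2] array per match.
tuples = sample.scan(regex)

# With a block, each match's groups arrive as block arguments.
shares = {}
sample.scan(regex) do |share, opts|
  shares[share] = opts.strip
end

p tuples
p shares
```

One caveat with this pattern: `[^\[]*` stops at the next `[`, so an option value that itself contains a `[` would cut its section short.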
 
Kyle Schmitt
 
      05-01-2008
yermej,
scan you say. Heh, I never even thought of that one.
Makes the whole thing rather simple!

Thanks.

 
Kyle Schmitt
 
      05-01-2008
Robbert, yermej, David,

Thanks a bunch!
Here's what I finally came up with, in case anyone's bored enough to wonder.

file="/path/to/smb/file/sample.conf"
regex=/(\[[a-z0-9]+\])([^\[]*)/i
samba_config={}
File.open(file){|f| f.read()}.scan(regex) do |title,options|
  samba_config.store(title,{})
  # each_line rather than each: String#each exists only in Ruby 1.8
  options.strip.each_line do |l|
    # key is everything before the first "=", value is what follows it
    samba_config[title].store(l[/^[^=]*/].strip,l[/[^=]*[^\n]$/].strip)
  end
end

 
 
David A. Black
 
      05-01-2008
Hi --



I know you're not asking for refactoring advice, but here's some
anyway.

If you're just going to read a file's contents into a string, you can
use File.read, rather than the whole open/read thing. Also, I'd
encourage you to drop the empty parentheses after method names. The
message-sending dot tells you that it's a method; the () doesn't add
signal, just noise.

Anyway, here's a tweaked version, in case it's of interest. Nothing
too radical, just a couple of possibly fun alternative techniques:

File.read("filename").scan(regex) do |title,options|
  samba_config[title] = {}
  # each_line rather than each: String#each exists only in Ruby 1.8
  options.strip.each_line do |option|
    samba_config[title].update(Hash[*option.strip.split(/\s*=\s*/)])
  end
end
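
A possible variation on the same idea, sketched under the assumption that an option value may itself contain an `=`: passing a limit of 2 to split keeps everything after the first `=` together, where `Hash[*...]` would raise on an odd number of pieces. The inline `text` sample, including the `a = b` value, is made up for the example, and `each_line` is used so it runs on current Ruby.

```ruby
# Hypothetical sample; the "a = b" value is there to exercise the limit.
text = "[shop]\ncomment = shared directory for the shop\n" \
       "create mask = 0770\n[bob]\ncomment = a = b\n"
regex = /(\[[a-z0-9]+\])([^\[]*)/i

samba_config = {}
text.scan(regex) do |title, options|
  samba_config[title] = {}
  options.strip.each_line do |option|
    # A limit of 2 keeps any later "=" inside the value.
    key, value = option.strip.split(/\s*=\s*/, 2)
    samba_config[title][key] = value
  end
end

p samba_config
```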


David

 
Kyle Schmitt
 
      05-01-2008
David,
I don't mind it at all!

I agree that File.read().scan() is much cleaner. Out of curiosity, is
it just syntactic sugar for the same thing, or is it computationally
different?
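
A quick way to poke at that question, as a sketch rather than a benchmark: both forms open the file, read all of it, and close it again, so the resulting strings should be identical. The temp file and the names `via_open` and `via_read` are invented for the example.

```ruby
require "tempfile"

# Write a small throwaway file to read back.
tmp = Tempfile.new("smb_sample")
tmp.write("[shop]\npath = /dept/shop\n")
tmp.close

# Block form: File.open closes the file when the block returns.
via_open = File.open(tmp.path) { |f| f.read }

# Convenience form: File.read opens, reads everything, and closes.
via_read = File.read(tmp.path)

puts via_open == via_read
tmp.unlink
```

This only compares the results; it says nothing about relative speed.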

Thanks for the Hash[*Array] syntax btw, I've used it way way back, but
for the life of me couldn't remember it, thought maybe I was mistaken.
