double-slashes in input causing trouble

 
 
Jeremy Wells
      11-23-2006
I'm writing a Ruby program that reads a file, extracts sections from
that file, writes a header to each section and then writes the whole
file back to disk. The problem is that if a section contains "\\" (two
backslashes), which mine does in places, Ruby replaces them with a
single "\" without my asking it to.

Here are the basics of the program:
body = ""
File.open(input, 'r') do |file|
  body = file.read
end

if body =~ /^section\sheader(.*)section\sfooter/mi
  original_section = $1
  new_section = bit_at_top + original_section
  new_body = body.sub(original_section, new_section)

  File.open(input,'w') do |file|
    file.write new_body
  end
end

If original_section contains "\\" then it gets replaced by "\".
Can I stop this from happening?

Jeremy
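
For anyone trying to reproduce this, here is a minimal sketch of the behaviour described above. The sample section text and the value of bit_at_top are made up for illustration, and the file I/O is left out so it runs in memory.

```ruby
# Hypothetical sample data: a section containing doubled backslashes.
# In a single-quoted literal, '\\\\' is two actual backslash characters.
original_section = 'path\\\\to\\\\file'
body = "section header\n" + original_section + "\nsection footer"
bit_at_top = "# added header\n"   # assumed header text
new_section = bit_at_top + original_section

# Helper that counts actual backslash characters in a string.
backslashes = ->(s) { s.scan('\\').length }

# With a String replacement, sub interprets backslash sequences in the
# replacement, so each pair of backslashes collapses into one.
collapsed = body.sub(original_section, new_section)

puts "before: #{backslashes.call(body)} backslashes"
puts "after:  #{backslashes.call(collapsed)} backslashes"
```

The search string matches fine; it is only the replacement text that comes out with fewer backslashes.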

 
Hugh Sasse
      11-23-2006
On Fri, 24 Nov 2006, Jeremy Wells wrote:

[...]
> The problem is that if the section contains "\\" which mine does in places,
> ruby replaces these with a single "\" without my asking it to.
>
> Here is the basics of the program:
> body = ""
> File.open(input, 'r') do |file|
> body = file.read
> end
>
> if body =~ /^section\sheader(.*)section\sfooter/mi
> original_section = $1
> new_section = bit_at_top + original_section
> new_body = body.sub(original_section, new_section)

new_body = body.sub(Regexp.new(Regexp.quote(original_section) ),
new_section)
>
> File.open(input,'w') do |file|
> file.write new_body
> end
> end


# Hugh

 
Robert Klemme
      11-23-2006

Some more remarks:

On 23.11.2006 17:53, Hugh Sasse wrote:
> On Fri, 24 Nov 2006, Jeremy Wells wrote:
>
> [...]
>> The problem is that if the section contains "\\" which mine does in places,
>> ruby replaces these with a single "\" without my asking it to.
>>
>> Here is the basics of the program:
>> body = ""


Initializing body with an empty string is superfluous - nil is more
efficient, but:

>> File.open(input, 'r') do |file|
>> body = file.read
>> end


You could as well replace those lines with

body = File.read input

>> if body =~ /^section\sheader(.*)section\sfooter/mi


Using .* is dangerous: it is greedy, so the match will run from the
first header to the last footer if there is more than one section in
the file!
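
A small sketch of the difference, assuming a made-up body with two sections. The only change between the two patterns is `.*` versus `.*?`.

```ruby
# Hypothetical input with two sections.
body = <<~TEXT
  section header
  first
  section footer
  section header
  second
  section footer
TEXT

# Greedy: the capture runs up to the LAST "section footer".
greedy = body[/^section\sheader(.*)section\sfooter/mi, 1]

# Non-greedy: the capture stops at the FIRST "section footer".
lazy = body[/^section\sheader(.*?)section\sfooter/mi, 1]

puts greedy.inspect
puts lazy.inspect
```

With the greedy pattern the first footer and the second header end up inside the captured "section".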

>> original_section = $1
>> new_section = bit_at_top + original_section
>> new_body = body.sub(original_section, new_section)

> new_body = body.sub(Regexp.new(Regexp.quote(original_section) ),
> new_section)


Now you do a replacement with sub, which might replace some completely
different piece of text (especially if the text of original_section
appears outside a section or otherwise in multiple places).

>> File.open(input,'w') do |file|
>> file.write new_body
>> end
>> end

>
> # Hugh
>


So, combining these you get:

body = File.read input

if body.gsub!(%r{^(section\sheader)(.*?)(?=section\sfooter)}mi,
              '\\1your_head\\2')
  File.open(input, "w") {|io| io.write body}
end
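
To see the grouping approach on its own, here is an in-memory sketch with made-up input (two sections, each containing doubled backslashes). It uses the same pattern and the same placeholder your_head as above, but the non-mutating gsub so the before and after can be compared. Text inserted through \\2 is copied verbatim; only the literal part of the replacement string is parsed for backslash sequences.

```ruby
# Hypothetical input: two sections whose bodies contain doubled backslashes.
# In a double-quoted literal, "\\\\" is two actual backslash characters.
body = "section header\nC:\\\\dir\\\\one\nsection footer\n" \
       "section header\nC:\\\\dir\\\\two\nsection footer\n"

# \1 is the header line, \2 the section text up to (not including) the footer.
updated = body.gsub(%r{^(section\sheader)(.*?)(?=section\sfooter)}mi,
                    '\\1your_head\\2')

puts updated
puts "backslashes before: #{body.scan('\\').length}"
puts "backslashes after:  #{updated.scan('\\').length}"
```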

Kind regards

robert

 
Edwin Fine
      11-23-2006
Hugh Sasse wrote:
> On Fri, 24 Nov 2006, Jeremy Wells wrote:
>
> [...]
>> original_section = $1
>> new_section = bit_at_top + original_section
>> new_body = body.sub(original_section, new_section)

> new_body = body.sub(Regexp.new(Regexp.quote(original_section) ),
> new_section)
>>
>> File.open(input,'w') do |file|
>> file.write new_body
>> end
>> end

>
> # Hugh


Makes no difference. String#sub states that metacharacters in the
pattern will not be interpreted if the pattern is a String and not a
Regexp.

Check it out:

irb(main):061:0> x
=> "section header\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
irb(main):062:0> y
=> "\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
irb(main):063:0> z
=> "xyzzy\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
irb(main):064:0> x.sub(y,z)
=> "section headerxyzzy\nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
irb(main):065:0> x.sub(Regexp.new(Regexp.quote(y)),z)
=> "section headerxyzzy\nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"

Identical results.

The problem is that the backslashes in the REPLACEMENT string are being
interpreted.

The way to overcome this is to use the block form of sub:

new_body = body.sub(original_section) {|s| s = new_section}
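
A short sketch of the block form with made-up data, for comparison. The value returned by the block is inserted as-is, with no backslash processing.

```ruby
# Hypothetical sample data: 'a\\\\b' is a, two backslashes, b.
original_section = 'a\\\\b'
body = "section header\n" + original_section + "\nsection footer"
new_section = "xyzzy\n" + original_section

# String replacement: backslash pairs in new_section are collapsed.
with_string = body.sub(original_section, new_section)

# Block replacement: the block's return value is used verbatim.
with_block = body.sub(original_section) { new_section }

puts with_string.inspect
puts with_block.inspect
```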

 
Hugh Sasse
      11-23-2006
On Fri, 24 Nov 2006, Edwin Fine wrote:

> Hugh Sasse wrote:
> > On Fri, 24 Nov 2006, Jeremy Wells wrote:
> >> new_body = body.sub(original_section, new_section)

> > new_body = body.sub(Regexp.new(Regexp.quote(original_section) ),
> > new_section)

[...]
> >> end

> >
> > # Hugh

>
> Makes no difference. String#sub states that metacharacters in the
> pattern will not be interpreted if the pattern is a String and not a
> Regexp.

[...]
> The problem is that the backslashes in the REPLACEMENT string are being
> interpreted.


Oops!
Hugh

 
Jeremy Wells
      11-23-2006
Edwin Fine wrote:
> Makes no difference. String#sub states that metacharacters in the
> pattern will not be interpreted if the pattern is a String and not a
> Regexp.
>
> The problem is that the backslashes in the REPLACEMENT string are being
> interpreted.
>
> The way to overcome this is to use the block form of sub:
>
> new_body = body.sub(original_section) {|s| s = new_section}
>

Thanks, I might try that. It's better looking than my solution, which was:
m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)
body[(m.begin(1)..m.end(1)-1)] = new_section
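
For reference, a self-contained sketch of this index-based splice using made-up data. It sidesteps replacement-string parsing entirely, because the new text is assigned into the string rather than passed to sub. The three-dot range is equivalent to `..m.end(1)-1`, and the `if m` guard is an addition for the case where nothing matches.

```ruby
# Hypothetical sample data: the section text contains two backslashes.
original_section = "\na\\\\b\n"
body = "section header" + original_section + "section footer"
new_section = "\n# top" + original_section

# Regexp.escape makes the section text safe to use as a literal pattern.
m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)

# Overwrite the matched range in place; nothing interprets the backslashes.
body[m.begin(1)...m.end(1)] = new_section if m

puts body.inspect
```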



 
David Vallner
      11-23-2006
Edwin Fine wrote:
> new_body = body.sub(original_section) {|s| s = new_section}
>


Using only {new_section} for the block should suffice; I doubt assigning
to a block parameter actually does anything outside the block.

David Vallner


Robert Klemme
      11-23-2006
On 23.11.2006 19:50, Jeremy Wells wrote:
> Edwin Fine wrote:
>> Makes no difference. String#sub states that metacharacters in the
>> pattern will not be interpreted if the pattern is a String and not a
>> Regexp.
>>
>> The problem is that the backslashes in the REPLACEMENT string are
>> being interpreted.
>>
>> The way to overcome this is to use the block form of sub:
>>
>> new_body = body.sub(original_section) {|s| s = new_section}
>>

> Thanks, I might try that, it's better looking than my solution, which was:
> m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)
> body[(m.begin(1)..m.end(1)-1)] = new_section


Frankly, I don't understand why everybody is trying to fix backslashes
in replacement strings when there is gsub and grouping. It's easier and
more robust to use grouping and refer to those groups in the replacement.
No problems with backslashes in there (see my other posting).

Cheers

robert
 
Jan Svitok
      11-23-2006
Or you can use references:

old_section = $1
new_body = body.sub(old_section, bit_at_top + '\&')

\& refers to the text of the whole match.

If this used gsub instead of sub, it would be slower, as the
replacement would take place on every occurrence. In this case, however,
there is at most one occurrence.
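
A sketch of the \& approach with made-up data. One caveat worth spelling out: the matched text inserted by \& is copied verbatim, but bit_at_top is still part of the replacement string, so any backslashes in it would be interpreted. The escaping step below is an assumption added for that case, not something from the snippet above.

```ruby
# Hypothetical sample data; both the section and the header text
# contain two actual backslashes.
old_section = "\na\\\\b\n"
body = "section header" + old_section + "section footer"
bit_at_top = "\n# see C:\\\\tools"

# Double every backslash in the header so that sub's replacement parsing
# turns each pair back into one. The block form is used here so this
# escaping step is not itself subject to replacement parsing.
safe_top = bit_at_top.gsub('\\') { '\\\\' }

# '\&' (single-quoted) is a backslash followed by &: the whole match.
new_body = body.sub(old_section, safe_top + '\&')

puts new_body.inspect
```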

You can do as well:
- body = ""
- File.open(input, 'r') do |file|
- body = file.read
- end
+ body = File.read(input)

and

File.open(input,'w') do |file|
file.write new_body
- end
+ end unless new_body == body
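
Putting the two suggestions together, a runnable sketch might look like this. It works on a temporary file with made-up contents so it can be tried safely; the header text and the pattern used to place it are assumptions for illustration.

```ruby
require "tempfile"

result = nil

Tempfile.create("sections") do |tmp|
  # Hypothetical file contents with two actual backslashes.
  tmp.write("section header\na\\\\b\nsection footer\n")
  tmp.close
  input = tmp.path

  # One call instead of File.open plus file.read.
  body = File.read(input)

  # Block form, so backslashes in the file are left alone.
  new_body = body.sub(/^section\sheader/i) { |match| match + "\n# top" }

  # Skip the write entirely when nothing changed.
  File.open(input, "w") { |file| file.write(new_body) } unless new_body == body

  result = File.read(input)
end

puts result
```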

 
Jeremy Wells
      11-24-2006
Jan Svitok wrote:
> or you can use references:
>
> old_section = $1
> new_body = body.sub(old_section, bit_at_top + '\&')
>
> \& = the last match.
>
> if there was gsub instead of sub, this would be slower as the
> replacement takes place on every occurence. In this case, however,
> there's max 1 occurence.
>
> You can do as well:
> - body = ""
> - File.open(input, 'r') do |file|
> - body = file.read
> - end
> + body = File.read(input)
>
> and
>
> File.open(input,'w') do |file|
> file.write new_body
> - end
> + end unless new_body == body
>

Thanks, that's useful to know for the future. This was something of a
run-once-and-it's-done program, and I've, um, run it now, so it's done.

 