Velocity Reviews - Computer Hardware Reviews



Weird behaviour escaping special characters in a string

 
 
Greg Hurrell
 
      02-21-2007
This instance method added to the String class returns a copy of the
receiver with occurrences of \ replaced with \\, and occurrences of '
replaced with \':

class String
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

The idea is that it gives you a string that you can write out to a
Ruby file which will later print the original string. For example, take
the string foo (3 characters):

"puts '" + "foo".to_source_string + "'" # puts 'foo'

Or a string with special characters in it, like 'foo' (5 characters,
including the enclosing single quotes):

"puts '" + "'foo'".to_source_string + "'" # puts '\'foo\''
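A minimal sketch of how the round trip could be checked, assuming the generated text is only ever evaluated as Ruby source. The `round_trip?` helper and the sample strings are hypothetical, not part of the original method; the helper wraps the escaped text in single quotes, evaluates it, and compares the result with the input.

```ruby
class String
  # Escape \ and ' so the result can sit between single quotes in Ruby source.
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

# Hypothetical helper: build a single-quoted literal from the escaped text,
# evaluate it, and check that the original string comes back.
def round_trip?(s)
  literal = "'" + s.to_source_string + "'"
  eval(literal) == s
end

["foo", "'foo'", "back\\slash", "it's \\' tricky"].each do |s|
  puts "puts '" + s.to_source_string + "'"
  puts round_trip?(s)
end
```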

My RSpec specs and experimentation in irb confirm that the method
works but I am at a loss to explain one thing:

Why do I need so many backslashes in my replacement expression?

There are five backslashes in the replacement expression:

gsub(/(\\|')/, '\\\\\1')

But I would have thought that three would work:

gsub(/(\\|')/, '\\\1')

I basically want to replace "whatever is found in the pattern" with a
backslash (\\) followed by "whatever was found" (\1); so that's three
backslashes. But with only three backslashes Ruby gives me \1foo\1 instead
of \'foo\'. Four backslashes produce the same result. Five backslashes and
suddenly everything works (funnily enough, six also work). Two
backslashes or one have no effect (no escaping is performed).
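The behaviour described above can be reproduced with a short script. This is an illustrative sketch: it builds each single-quoted replacement literal with `eval`, so that the source text contains exactly `n` backslashes followed by `1`, then reports how many backslashes the parsed string really holds and what gsub produces with it.

```ruby
# For n = 1..6 backslashes written in a single-quoted literal before "1",
# show how many backslashes survive parsing and what gsub does with them.
(1..6).each do |n|
  source      = "'" + ("\\" * n) + "1'"   # e.g. n = 3 gives the source text '\\\1'
  replacement = eval(source)              # what Ruby's parser turns that literal into
  result      = "'foo'".gsub(/(\\|')/, replacement)
  puts "#{n} in source -> #{replacement.scan(/\\/).size} in string -> #{result}"
end
```

Pairs of counts (1 and 2, 3 and 4, 5 and 6) give identical strings, which matches the results reported above.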

I've got working code so it's not a huge problem, but my curiosity is
piqued. What's going on here that I don't understand?

Cheers,
Greg

 
 
 
 
 
Austin Ziegler
 
      02-21-2007
On 2/21/07, Greg Hurrell <(E-Mail Removed)> wrote:
> This instance method added to the String class returns a copy of the
> receiver with occurrences of \ replaced with \\, and occurrences of '
> replaced with \':
>
> class String
> def to_source_string
> gsub(/(\\|')/, '\\\\\1')
> end
> end


class String
  def to_source_string
    gsub(/(\\|')/) { "\\#$1" }
  end
end
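The block form needs fewer backslashes because gsub inserts a block's return value as-is. It does not scan that value for `\1` or `\\`, so only the string-literal layer of escaping remains. The small check below uses only core Ruby; the sample inputs are illustrative.

```ruby
class String
  # Block form: the block runs once per match and its result is used verbatim.
  def to_source_string
    gsub(/(\\|')/) { "\\#$1" }   # "\\" is one backslash; #$1 interpolates the current capture
  end
end

puts "'foo'".to_source_string   # prints \'foo\'
puts "a\\b".to_source_string    # prints a\\b
```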

-austin
--
Austin Ziegler * (E-Mail Removed) * http://www.halostatue.ca/
* (E-Mail Removed) * http://www.halostatue.ca/feed/
* (E-Mail Removed)

 
 
 
 
 
James Edward Gray II
 
      02-21-2007
On Feb 21, 2007, at 12:36 PM, Austin Ziegler wrote:

> On 2/21/07, Greg Hurrell <(E-Mail Removed)> wrote:
>> This instance method added to the String class returns a copy of the
>> receiver with occurrences of \ replaced with \\, and occurrences of '
>> replaced with \':
>>
>> class String
>> def to_source_string
>> gsub(/(\\|')/, '\\\\\1')
>> end
>> end

>
> class String
> def to_source_string
> gsub(/(\\|')/) { "\\#$1" }
> end
> end


It's probably better to use a character class [\\'] instead of
alternation (\\|').
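A character class has no capture group, so `\1` no longer refers to anything. The whole match, `\&` (or `\0`), takes its place in the replacement. Here is a sketch comparing the two forms; the method names are hypothetical.

```ruby
# Alternation with a capture group, as in the original post.
def escape_alternation(s)
  s.gsub(/(\\|')/, '\\\\\1')
end

# Character class: there is no group, so refer to the whole match with \&.
def escape_char_class(s)
  s.gsub(/[\\']/, '\\\\\&')
end

["foo", "'foo'", "a\\b'c"].each do |s|
  puts escape_char_class(s)
end
```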

James Edward Gray II

 
 
Brian Candler
 
      02-21-2007
On Thu, Feb 22, 2007 at 02:55:09AM +0900, Greg Hurrell wrote:
> Why do I need so many backslashes in my replacement expression?
>
> There are five slashes in the replacement expression:
>
> gsub(/(\\|')/, '\\\\\1')
>
> But I would have thought that three would work:
>
> gsub(/(\\|')/, '\\\1')


Because even in single quotes, backslashes must be doubled; this in turn is
because \' is the way that you insert a single quote within a single-quoted
string.

irb(main):001:0> a='\\'
=> "\\"
irb(main):002:0> a.size
=> 1
irb(main):003:0> b='\''
=> "'"
irb(main):004:0> b.size
=> 1
irb(main):005:0> c='\x'
=> "\\x"
irb(main):006:0> c.size
=> 2

> I basically want to replace "whatever is found in the pattern" with a
> backslash (\\) followed by "whatever was found" (\1); so that's three
> slashes. But with only three slashes Ruby gives me \1foo\1 instead of
> \'foo\'. Four slashes produces the same result. Five slashes and
> suddenly everything works (funnily enough, six slashes also works).
> Two slashes and one slash have no effect (no escaping is performed).
>
> I've got working code so it's not a huge problem, but my curiosity is
> piqued. What's going on here that I don't understand?


irb(main):009:0> a='\\\\1'
=> "\\\\1"
irb(main):010:0> a.size
=> 3
irb(main):011:0> a='\\\\\1'
=> "\\\\\\1"
irb(main):012:0> a.size
=> 4
irb(main):013:0> a='\\\\\\1'
=> "\\\\\\1"
irb(main):014:0> a.size
=> 4

In a single-quoted string:
\' => '
\\ => \
\x => \x for all other x

So '...\1' and '...\\1' are identical.
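These rules cover the first layer, the string literal. There is a second layer as well, because gsub itself treats a backslash in a replacement *string* as an escape: `\\` means one literal backslash and `\1` means group 1. The replacement therefore has to *contain* three backslashes followed by 1, and writing that in a single-quoted literal takes five (or six) backslashes. The sketch below spells out both layers.

```ruby
literal_5 = '\\\\\1'   # source text with five backslashes
literal_6 = '\\\\\\1'  # six backslashes: the trailing \\1 and \1 both come out as \1

# Layer 1: after parsing, both literals hold the same four characters.
p literal_5.chars     # => ["\\", "\\", "\\", "1"]
p literal_5 == literal_6   # => true

# Layer 2: gsub reads \\ as "one literal backslash" and \1 as "group 1".
p "'foo'".gsub(/(\\|')/, literal_5)   # => "\\'foo\\'"
```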

HTH,

Brian.

 
 
Greg Hurrell
 
      02-22-2007
On 21 feb, 20:50, Brian Candler <(E-Mail Removed)> wrote:

> In a single-quoted string:
> \' => '
> \\ => \
> \x => \x for all other x
>
> So '...\1' and '...\\1' are identical.


Excellent, that explains why I was getting the same results for 3 and
4 slashes, and the same for 5 and 6 slashes.

Cheers,
Greg

 
 
Greg Hurrell
 
      02-22-2007
On 21 feb, 19:45, James Edward Gray II <(E-Mail Removed)> wrote:
> On Feb 21, 2007, at 12:36 PM, Austin Ziegler wrote:
>
> It's probably better to use a character class [\\'] instead of
> alternation (\\|').
>
> James Edward Gray II


I did some quick and dirty benchmarks and using a character class is a
little bit quicker. Interpolation ("\\#$1") is slower but more
readable. I guess I'll stick with the character class and no
interpolation though.

require 'benchmark'
include Benchmark

bm(6) do |x|
  x.report('alternation') { 100_000.times { "'foo'".gsub(/(\\|')/, '\\\\\1') } }
  x.report('char class') { 100_000.times { "'foo'".gsub(/[\\']/, '\\\\\&') } }
  x.report('interpolation') { 100_000.times { "'foo'".gsub(/(\\|')/, "\\#$1") } }
  x.report('interpolation with char class') { 100_000.times { "'foo'".gsub(/[\\']/, "\\#$&") } }
end
                                  user     system      total        real
alternation                   0.450000   0.000000   0.450000 (  0.452661)
char class                    0.390000   0.000000   0.390000 (  0.396193)
interpolation                 0.540000   0.010000   0.550000 (  0.532106)
interpolation with char class 0.480000   0.000000   0.480000 (  0.485922)
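One caveat about the two interpolation rows. When `"\\#$1"` is passed as an argument instead of being returned from a block, Ruby builds that string once, before gsub is called. At that point `$1` holds whatever an earlier match left behind, not the current capture, so those rows time a different operation than the block version. The comparison below uses hypothetical method names to show the difference.

```ruby
# String argument: "\\#$1" is evaluated before gsub is called,
# using $1 from an earlier match in this method (nil on entry).
def escape_with_string_arg(s)
  s.gsub(/(\\|')/, "\\#$1")
end

# Block: evaluated once per match, so $1 is the current capture.
def escape_with_block(s)
  s.gsub(/(\\|')/) { "\\#$1" }
end

p escape_with_block("'foo'")        # => "\\'foo\\'"
p escape_with_string_arg("'foo'")   # the quotes are lost, not escaped
```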

 
 
David Vallner
 
      02-22-2007
On Thu, 22 Feb 2007 13:55:06 +0100, Greg Hurrell <(E-Mail Removed)> wrote:

> On 21 feb, 20:50, Brian Candler <(E-Mail Removed)> wrote:
>
>> In a single-quoted string:
>> \' => '
>> \\ => \
>> \x => \x for all other x
>>
>> So '...\1' and '...\\1' are identical.

>
> Excellent, that explains why I was getting the same results for 3 and
> 4 slashes, and the same for 5 and 6 slashes.
>


%q{...} is your friend.
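`%q{...}` follows single-quote rules with a different delimiter. It removes the need to escape `'`, but the backslash count stays the same, as this quick check shows.

```ruby
# %q{} is parsed like '...': only \\ and an escaped delimiter are special.
same = %q{\\\\\1} == '\\\\\1'
p same   # => true

# Where it helps: single quotes need no escaping when building the output line.
line = %q{puts '} + "'foo'".gsub(/[\\']/, '\\\\\&') + %q{'}
puts line   # prints puts '\'foo\''
```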

David Vallner

 
 
Robert Klemme
 
      02-22-2007
2007/2/21, Greg Hurrell <(E-Mail Removed)>:
> This instance method added to the String class returns a copy of the
> receiver with occurrences of \ replaced with \\, and occurrences of '
> replaced with \':
>
> class String
> def to_source_string
> gsub(/(\\|')/, '\\\\\1')
> end
> end
>
> The idea is that it will give you a string that you can write out a
> Ruby file that will later print the string. For, example, let's take
> the string, foo (3 characters):
>
> "puts '" + "foo".to_source_string + "'" # puts 'foo'
>
> Or a string with special characters in it like 'foo' (5 characters,
> including enclosing single quotes):
>
> "puts '" + "'foo'".to_source_string + "'" # puts '\'foo\''


Why don't you just use #inspect?
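In concrete form: `String#inspect` returns a double-quoted Ruby literal, which can be written out and evaluated back. The sketch below assumes plain ASCII input; the sample strings are illustrative.

```ruby
samples = ["foo", "'foo'", "back\\slash", "tab\tand \"quotes\"", 'hash #{x}']

samples.each do |s|
  line = "puts " + s.inspect   # e.g. puts "'foo'"
  puts line
  # Evaluating the inspected literal should give back the original string.
  raise "mismatch for #{s}" unless eval(s.inspect) == s
end
```

With this approach the generated file uses double quotes, so nothing has to be escaped by hand.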

Kind regards

robert

 
 
 
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to convert between a string w/ backslashes and a string w/special characters? Peng Yu Perl Misc 3 07-13-2010 05:48 AM
String#gsub escaping special characters Gary Yngve Ruby 5 02-24-2009 07:04 PM
SiteMap, SiteMapPath is Escaping Special Characters =?Utf-8?B?bWljaGFlbHJp?= ASP .Net 1 05-09-2007 10:40 PM
escaping special characters in JSON James Black Javascript 4 04-10-2006 12:41 AM
Newbie Question: Escaping special characters in array of strings Gene Kahn Ruby 5 11-22-2004 03:32 PM


