gsub and backslashes

 
 
Ralph Shnelvar
Guest
Posts: n/a
 
      11-20-2010
[Note: parts of this message were removed to make it a legal post.]

Consider the string
\1\2\3
that is
"\\1\\2\\3"

I feel really stupid ... but this simple substitution pattern does not do what I expect.

"\\1\\2\\3".gsub(/\\/,"\\\\")

What I want is to change single backslashes to double backslashes. The result of the above substitution is "no change"

On the other hand
"\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
does do what I want ... but I am clueless as to why.
 
 
 
 
 
Ammar Ali
Guest
Posts: n/a
 
      11-20-2010
On Sun, Nov 21, 2010 at 12:13 AM, Ralph Shnelvar <(E-Mail Removed)> wrote:
> Consider the string
>  \1\2\3
> that is
>  "\\1\\2\\3"
>
> I feel really stupid ... but this simple substitution pattern does not do
> what I expect.
>
>  "\\1\\2\\3".gsub(/\\/,"\\\\")
>
> What I want is to change single backslashes to double backslashes.
> The result of the above substitution is "no change"
>
> On the other hand
>  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
> does do what I want ... but I am clueless as to why.


Backslashes are tricky. What's happening here is that each escaped
backslash "\\" yields one backslash, which in turn affects (escapes) what
comes after it: in this case another escaped backslash, which also
yields one backslash. In other words, four backslashes yield two
backslashes, which gsub treats as an escaped backslash (i.e. one backslash).
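
A minimal sketch of the two layers described above, assuming Ruby 1.9 or later. The variable names are made up for illustration. Counting characters with `size` separates what the string literal produces from what gsub then does with it:

```ruby
# Layer 1: the Ruby string literal. Each "\\" typed in source is ONE backslash.
two  = "\\\\"        # holds 2 backslash characters
four = "\\\\\\\\"    # holds 4 backslash characters
puts two.size        # 2
puts four.size       # 4

# Layer 2: gsub's replacement string. There, a pair of backslashes
# stands for one literal backslash in the output.
input = "\\1\\2\\3"            # the six characters \1\2\3
unchanged = input.gsub(/\\/, two)
doubled   = input.gsub(/\\/, four)
puts unchanged                 # \1\2\3
puts doubled                   # \\1\\2\\3
```

Printing with `puts` rather than relying on IRB's echo shows the characters actually in the string.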

HTH,
Ammar

 
 
 
 
 
Ammar Ali
Guest
Posts: n/a
 
      11-20-2010
On Sun, Nov 21, 2010 at 12:34 AM, Ammar Ali <(E-Mail Removed)> wrote:
> On Sun, Nov 21, 2010 at 12:13 AM, Ralph Shnelvar <(E-Mail Removed)> wrote:
>> Consider the string
>>  \1\2\3
>> that is
>>  "\\1\\2\\3"
>>
>> I feel really stupid ... but this simple substitution pattern does not do
>> what I expect.
>>
>>  "\\1\\2\\3".gsub(/\\/,"\\\\")
>>
>> What I want is to change single backslashes to double backslashes.
>> The result of the above substitution is "no change"
>>
>> On the other hand
>>  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
>> does do what I want ... but I am clueless as to why.
>
> Backslashes are tricky. What's happening here is each escaped
> backslash "\\" yields one backslash, which affects (escapes) what
> comes after it, in this case another escaped backslash that in turn
> yields one backslash. In other words, four backslashes yield two
> backslashes, which is an escaped backslash (i.e. one backslash).
>


I should have added that you can get the same result with 3
backslashes. So 6 of them will give you two.

>> "\\1\\2\\3".gsub(/\\/,"\\\\\\").scan /./
=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
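
A short sketch of the three-backslash case, for anyone who wants to check the counts directly. The reading assumed here is that gsub takes the first two characters of the replacement as an escaped backslash and keeps the third, which has nothing left to escape, as it is:

```ruby
replacement = "\\\\\\"                 # 3 backslash characters
result = "\\1\\2\\3".gsub(/\\/, replacement)

puts replacement.size                  # 3
puts result                            # \\1\\2\\3
puts result.scan(/\\/).size            # 6
```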

Regards,
Ammar

 
 
botp
Guest
Posts: n/a
 
      11-21-2010
[Note: parts of this message were removed to make it a legal post.]

On Sun, Nov 21, 2010 at 6:13 AM, Ralph Shnelvar <(E-Mail Removed)> wrote:
> What I want is to change single backslashes to double backslashes. The
> result of the above substitution is "no change"
>
> On the other hand
> "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
> does do what I want ... but I am clueless as to why.


there are many ways,

#1
"\\1\\2\\3".gsub(/(\\)/,"\\1\\1").scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

#2
"\\1\\2\\3".gsub(/(\\)/,'\1\1').scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

#3
"\\1\\2\\3".gsub(/\\/){"\\\\"}.scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

#4
"\\1\\2\\3".gsub(/(\\)/){$1+$1}.scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]


Samples #1 & #2 use group backreferences, so ruby needs a second parsing pass
over the replacement string for this feature to work.

Samples #3 & #4 use code blocks, which need no second pass; backreferences can be
had there using the $n notation.
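
As a rough illustration of that difference, here is a sketch with a made-up input string. The same `'\1\1'` text is expanded when passed as a replacement string, but comes through untouched when returned from a block:

```ruby
s = "a-b"

with_string = s.gsub(/(-)/, '\1\1')      # replacement string: gsub expands \1
with_block  = s.gsub(/(-)/) { '\1\1' }   # block: the returned string is used verbatim
with_var    = s.gsub(/(-)/) { $1 + $1 }  # block: captures come from $1

puts with_string   # a--b
puts with_block    # a\1\1b
puts with_var      # a--b
```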

best regards -botp

 
 
Ammar Ali
Guest
Posts: n/a
 
      11-21-2010
On Sun, Nov 21, 2010 at 11:57 AM, botp <(E-Mail Removed)> wrote:
> On Sun, Nov 21, 2010 at 6:13 AM, Ralph Shnelvar <(E-Mail Removed)> wrote:
>> What I want is to change single backslashes to double backslashes. The
>> result of the above substitution is "no change"
>>
>> On the other hand
>>  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
>> does do what I want ... but I am clueless as to why.
>
> there are many ways,
>
> #1
> "\\1\\2\\3".gsub(/(\\)/,"\\1\\1").scan /./
> #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
>
> #2
> "\\1\\2\\3".gsub(/(\\)/,'\1\1').scan /./
> #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
>
> #3
> "\\1\\2\\3".gsub(/\\/){"\\\\"}.scan /./
> #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
>
> #4
> "\\1\\2\\3".gsub(/(\\)/){$1+$1}.scan /./
> #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
>
>
> #1 & #2 samples uses group backreferences, ruby may need second parsing pass
> for this feature to work...
>
> #3 & #4 uses code blocks. may not need second pass. backreferences can be
> had using $n notation.


botp's excellent suggestions reminded me of another one:

>> "\\1\\2\\3".gsub(/\\/, '\&\&')
=> "\\\\1\\\\2\\\\3"
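
For comparison, a small sketch showing that `\0` in a replacement string refers to the whole match just as `\&` does, so either spelling doubles each backslash here. The variable names are illustrative only:

```ruby
s = "\\1\\2\\3"

with_amp  = s.gsub(/\\/, '\&\&')   # \& is the whole match
with_zero = s.gsub(/\\/, '\0\0')   # \0 is the whole match as well

puts with_amp                # \\1\\2\\3
puts with_amp == with_zero   # true
```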

Regards,
Ammar

 
 
Brian Candler
Guest
Posts: n/a
 
      11-21-2010
Ralph Shnelvar wrote in post #962847:
> Consider the string
> \1\2\3
> that is
> "\\1\\2\\3"
>
> I feel really stupid ... but this simple substitution pattern does not
> do what I expect.
>
> "\\1\\2\\3".gsub(/\\/,"\\\\")


Here you are replacing one backslash with one backslash.

The trouble is, in the *replacement* string, '\1' has a special meaning
(insert the value of the first capture). Because of this, a literal
backslash is backslash-backslash.

So to replace with *two* backslashes you need
backslash-backslash-backslash-backslash. And inside a double or single
quoted string, a single backslash is represented as "\\" or '\\', which
is why the working call types eight of them:

irb(main):001:0> "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
=> "\\\\1\\\\2\\\\3"

The second level of backslashing isn't used with the block form, since
if you want to use captured subexpressions you can use #{$1} instead of
\1. Hence as an alternative:

irb(main):002:0> "\\1\\2\\3".gsub(/\\/) { "\\\\" }
=> "\\\\1\\\\2\\\\3"

--
Posted via http://www.ruby-forum.com/.

 
 
Ammar Ali
Guest
Posts: n/a
 
      11-21-2010
On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <(E-Mail Removed)> wrote:
> Ralph Shnelvar wrote in post #962847:
>>   "\\1\\2\\3".gsub(/\\/,"\\\\")

>
> Here you are replacing one backslash with one backslash.
>
> The trouble is, in the *replacement* string, '\1' has a special meaning
> (insert the value of the first capture). Because of this, a literal
> backslash is backslash-backslash.


That's a keen observation, but the fact that they happen to be
back-references doesn't seem to play a part in this situation.

>> "\\a\\b\\c".gsub(/\\/,"\\\\")
=> "\\a\\b\\c"
>> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
=> "\\\\a\\\\b\\\\c"

Regards,
Ammar

 
 
Robert Klemme
Guest
Posts: n/a
 
      11-22-2010
On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <(E-Mail Removed)> wrote:
> On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <(E-Mail Removed)> wrote:
>> Ralph Shnelvar wrote in post #962847:
>>>   "\\1\\2\\3".gsub(/\\/,"\\\\")
>>
>> Here you are replacing one backslash with one backslash.
>>
>> The trouble is, in the *replacement* string, '\1' has a special meaning
>> (insert the value of the first capture). Because of this, a literal
>> backslash is backslash-backslash.
>
> That's a keen observation, but the fact that they happen to be
> back-references doesn't seem to play a part in this situation.
>
>>> "\\a\\b\\c".gsub(/\\/,"\\\\")
> => "\\a\\b\\c"
>>> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
> => "\\\\a\\\\b\\\\c"


The key point to understand IMHO is that a backslash is special in
replacement strings. So whenever one wants a literal backslash in a
replacement string, one needs to escape it and hence have two
backslashes. Coincidentally, a backslash is also special in a string
literal (even in a single-quoted string). So you need two levels of
escaping, which makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.

Additionally, people are often confused by the fact that IRB by default
uses #inspect for showing expression values, which displays twice
as many backslashes as are actually present in the string.
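
A quick sketch of that display difference, assuming the default UTF-8 setup. `puts` writes the characters actually in the string, while `p` (like IRB's echo) writes the #inspect form:

```ruby
s = "\\1\\2\\3".gsub(/\\/, "\\\\\\\\")

puts s                 # \\1\\2\\3           what the string really contains
p s                    # "\\\\1\\\\2\\\\3"   inspect form, as IRB shows it
puts s.size            # 9
puts s.inspect.size    # 17
```

The inspect form is longer because every backslash is doubled and the surrounding quotes are added.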

<grumpy>Can we please make a big red sticker and put it on every Ruby
installer and source tar to inform people of this and the local
variable method ambiguity. These two seem to be the issues that pop
up most of the time.</grumpy>

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

 
 
Ammar Ali
Guest
Posts: n/a
 
      11-22-2010
On Mon, Nov 22, 2010 at 10:38 AM, Robert Klemme
<(E-Mail Removed)> wrote:
> On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <(E-Mail Removed)> wrote:
>> On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <(E-Mail Removed)> wrote:
>>> Ralph Shnelvar wrote in post #962847:
>>>>   "\\1\\2\\3".gsub(/\\/,"\\\\")
>>>
>>> Here you are replacing one backslash with one backslash.
>>>
>>> The trouble is, in the *replacement* string, '\1' has a special meaning
>>> (insert the value of the first capture). Because of this, a literal
>>> backslash is backslash-backslash.
>>
>> That's a keen observation, but the fact that they happen to be
>> back-references doesn't seem to play a part in this situation.
>>
>>>> "\\a\\b\\c".gsub(/\\/,"\\\\")
>> => "\\a\\b\\c"
>>>> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
>> => "\\\\a\\\\b\\\\c"
>
> The key point to understand IMHO is that a backslash is special in
> replacement strings. So, whenever one wants to have a literal
> backslash in a replacement string one needs to escape it and hence
> have two backslashes. Coincidentally a backslash is also special in a
> string (even in a single quoted string). So you need two levels of
> escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
> replacement backslash.


Actually, 3 backslashes will yield one backslash. The first two result
in one (escaped), and the third one, escaped by the previous escaped
backslash, ends up being one. My second example showed this, using 6
backslashes instead of 8. Using 4 backslashes works because the second
pair yields an escaped backslash, but it is not necessary.

Regards,
Ammar

 
 
Robert Klemme
Guest
Posts: n/a
 
      11-22-2010
On Mon, Nov 22, 2010 at 1:28 PM, Ammar Ali <(E-Mail Removed)> wrote:
> On Mon, Nov 22, 2010 at 10:38 AM, Robert Klemme
> <(E-Mail Removed)> wrote:
>> On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <(E-Mail Removed)> wrote:
>>> On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <(E-Mail Removed)> wrote:
>>>> Ralph Shnelvar wrote in post #962847:
>>>>>   "\\1\\2\\3".gsub(/\\/,"\\\\")
>>>>
>>>> Here you are replacing one backslash with one backslash.
>>>>
>>>> The trouble is, in the *replacement* string, '\1' has a special meaning
>>>> (insert the value of the first capture). Because of this, a literal
>>>> backslash is backslash-backslash.
>>>
>>> That's a keen observation, but the fact that they happen to be
>>> back-references doesn't seem to play a part in this situation.
>>>
>>>>> "\\a\\b\\c".gsub(/\\/,"\\\\")
>>> => "\\a\\b\\c"
>>>>> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
>>> => "\\\\a\\\\b\\\\c"
>>
>> The key point to understand IMHO is that a backslash is special in
>> replacement strings. So, whenever one wants to have a literal
>> backslash in a replacement string one needs to escape it and hence
>> have two backslashes. Coincidentally a backslash is also special in a
>> string (even in a single quoted string). So you need two levels of
>> escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
>> replacement backslash.
>
> Actually, 3 backslashes will yield one backslash. The first two result
> in one (escaped), and the third one, escaped by the previous escaped
> backslash ends up being one. My second example showed this, using 6
> backslashes instead of 8. Using 4 backslashes works because the second
> pair yields an escaped backslash, but it is not necessary.


That does not work reliably under all circumstances though:

irb(main):006:0> "abc".gsub /./, "\\\n"
=> "\\\n\\\n\\\n"
irb(main):007:0> puts("abc".gsub /./, "\\\n")
\
\
\
=> nil
irb(main):008:0> "abc".gsub /./, "\\\\n"
=> "\\n\\n\\n"
irb(main):009:0> puts("abc".gsub /./, "\\\\n")
\n\n\n
=> nil

It is safer to use 4 backslashes. This is the only robust way to do
this even though sometimes you can simply use a single backslash (e.g.
\1 instead of \\1) because string parsing is a bit tolerant under some
circumstances:

irb(main):014:0> '\1'
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

but

irb(main):019:0> "\n"
=> "\n"
irb(main):020:0> "\\n"
=> "\\n"
irb(main):021:0> "\1"
=> "\x01"
irb(main):022:0> "\\1"
=> "\\1"
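
One way to sidestep the counting altogether, sketched here under the assumption that the replacement should be inserted exactly as given. `gsub_literal` is a hypothetical helper, not part of Ruby. It relies on the block form of gsub, whose result is not scanned for backslash sequences:

```ruby
# Hypothetical helper (not part of Ruby): replace every match with `text`
# exactly as given, with no backslash processing.
def gsub_literal(str, pattern, text)
  str.gsub(pattern) { text }
end

literal   = gsub_literal("abc", /b/, "\\1")   # block form: \1 is inserted as is
by_hand   = "abc".gsub(/b/, "\\\\1")          # string form, escaped by hand
expanded  = "abc".gsub(/(b)/, "\\1")          # string form: \1 is a backreference

puts literal    # a\1c
puts by_hand    # a\1c
puts expanded   # abc
```

The hand-escaped call gives the same result as the helper, at the cost of getting the backslash count right each time.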


Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

 