Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Fast implementation of XML escape

Reply
Thread Tools

Fast implementation of XML escape

 
 
Bob Hutchison
Guest
Posts: n/a
 
      08-03-2007
Hi,

Does anyone know of a fast implementation of the XML escape method
(the one that converts '"<>& to &quot; etc.)?

I did some benchmarking on one of my applications and the
implementation I have, which I thought was okay -- simple minded for
sure, but okay -- turns out to be a bottle neck in certain operations.

I used ruby-prof with a simple test, running over a 400 character
string 50,000 times or so. Running the profiler on version0 (below)
took 1.39 seconds.

def version0(input)
# all kinds of other processing of input simulated by the input.dup
result = input.dup

return result
end

The original simple minded way was, under ruby-prof ran in 8.74 seconds:

def version1(input)
# all kinds of other processing of input simulated by the input.dup
result = input.dup

result.gsub!("&", "&amp;")
result.gsub!("<", "&lt;")
result.gsub!(">", "&gt;")
result.gsub!("'", "&apos;")
result.gsub!("\"", "&quot;")

return result
end

The best I've come up with so far is, under ruby-prof ran in 3.33:

def version2(input)
# all kinds of other processing of input simulated by the input.dup
result = input.dup

result.gsub!(/[&<>'"]/) do | match |
case match
when '&' then return '&amp;'
when '<' then return '&lt;'
when '>' then return '&gt;'
when "'" then return '&apos;'
when '"' then return '&quote;'
end
end

return result
end

After accounting for overhead, 3.8 times faster is good, I'd like it
faster still. BTW, gsub is only marginally slower that gsub! I've
tried using simple iteration, gsub with a hash to avoid the case, and
variations, all slower to a lot slower than version 1, nothing really
near version2 (which really was the first variation I tried).

Any ideas?

Cheers,
Bob


----
Bob Hutchison -- tumblelog at http://
www.recursive.ca/so/
Recursive Design Inc. -- weblog at http://www.recursive.ca/
hutch
http://www.recursive.ca/ -- works on http://www.raconteur.info/
cms-for-static-content/home/




 
Reply With Quote
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      08-03-2007
2007/8/3, Bob Hutchison <(E-Mail Removed)>:
> Hi,
>
> Does anyone know of a fast implementation of the XML escape method
> (the one that converts '"<>& to &quot; etc.)?
>
> I did some benchmarking on one of my applications and the
> implementation I have, which I thought was okay -- simple minded for
> sure, but okay -- turns out to be a bottle neck in certain operations.
>
> I used ruby-prof with a simple test, running over a 400 character
> string 50,000 times or so. Running the profiler on version0 (below)
> took 1.39 seconds.
>
> def version0(input)
> # all kinds of other processing of input simulated by the input.dup
> result = input.dup
>
> return result
> end
>
> The original simple minded way was, under ruby-prof ran in 8.74 seconds:
>
> def version1(input)
> # all kinds of other processing of input simulated by the input.dup
> result = input.dup
>
> result.gsub!("&", "&amp;")
> result.gsub!("<", "&lt;")
> result.gsub!(">", "&gt;")
> result.gsub!("'", "&apos;")
> result.gsub!("\"", "&quot;")
>
> return result
> end
>
> The best I've come up with so far is, under ruby-prof ran in 3.33:
>
> def version2(input)
> # all kinds of other processing of input simulated by the input.dup
> result = input.dup
>
> result.gsub!(/[&<>'"]/) do | match |
> case match
> when '&' then return '&amp;'
> when '<' then return '&lt;'
> when '>' then return '&gt;'
> when "'" then return '&apos;'
> when '"' then return '&quote;'
> end
> end
>
> return result
> end
>
> After accounting for overhead, 3.8 times faster is good, I'd like it
> faster still. BTW, gsub is only marginally slower that gsub! I've
> tried using simple iteration, gsub with a hash to avoid the case, and
> variations, all slower to a lot slower than version 1, nothing really
> near version2 (which really was the first variation I tried).
>
> Any ideas?


You are on the right track. There is just one thing to improve: get
rid of "case":

class Converter
MAP = {
"&" => "&amp;",
# ...
}

def self.convert(s)
s.gsub(/[&<>'"]/) do |m|
MAP[m] || "ERROR"
end
end
end

Also, I believe x.dup.gsub! is less efficient than doing just a single x.gsub.

Kind regards

robert

 
Reply With Quote
 
 
 
 
Bob Hutchison
Guest
Posts: n/a
 
      08-03-2007


Sorry, There are some corrections to this post. I made two STUPID
errors, one a cut and paste error into the message, and another in
recording my test results.

I apologise again.

Here is the real situation:

------

Does anyone know of a fast implementation of the XML escape method
(the one that converts '"<>& to &quot; etc.)?

I did some benchmarking on one of my applications and the
implementation I have, which I thought was okay -- simple minded for
sure, but okay -- turns out to be a bottle neck in certain operations.

I used ruby-prof with a simple test, running over a 400 character
string 50,000 times or so. Running the profiler on version0 (below)
took 1.39 seconds.

def version0(input)
# all kinds of other processing of input simulated by the input.dup
result = input.dup

return result
end

The original simple minded way was, under ruby-prof ran in 8.74 seconds:

def version1(input)
# all kinds of other processing of input simulated by the input.dup
result = input.dup

result.gsub!("&", "&amp;")
result.gsub!("<", "&lt;")
result.gsub!(">", "&gt;")
result.gsub!("'", "&apos;")
result.gsub!("\"", "&quot;")

return result
end

[ the version2 that I originally posted was a cut and paste error
from an old file (too many windows open), there should not have been
those returns in the gsub. Then to compound things, I had made a typo
in the test case and the loop was not run as often, so rather than
taking 3.33 seconds it in fact took 105 seconds ]

The simple-minded approach is the fastest I've come up with.

Any any better ideas?

Cheers,
Bob


----
Bob Hutchison -- tumblelog at http://
www.recursive.ca/so/
Recursive Design Inc. -- weblog at http://www.recursive.ca/
hutch
http://www.recursive.ca/ -- works on http://www.raconteur.info/
cms-for-static-content/home/





----
Bob Hutchison -- tumblelog at http://
www.recursive.ca/so/
Recursive Design Inc. -- weblog at http://www.recursive.ca/
hutch
http://www.recursive.ca/ -- works on http://www.raconteur.info/
cms-for-static-content/home/




 
Reply With Quote
 
Bob Hutchison
Guest
Posts: n/a
 
      08-03-2007
Hi Robert,

Thanks for the response.


On 3-Aug-07, at 8:47 AM, Robert Klemme wrote:
>
> You are on the right track. There is just one thing to improve: get
> rid of "case":
>
> class Converter
> MAP = {
> "&" => "&amp;",
> # ...
> }
>
> def self.convert(s)
> s.gsub(/[&<>'"]/) do |m|
> MAP[m] || "ERROR"
> end
> end
> end



My version3 was:

@@convert = {
'&' => '&amp;',
'<' => '&lt;',
'>' => '&gt;',
"'" => '&apos;',
'"' => '&quot;'
}

def version3(input)
# all kinds of other processing of input simulated by the input.dup
result = input.dup

result.gsub!(/[&<>'"]/) do | match |
@@convert[match]
end

return result
end

which is pretty much what you suggested, but it takes 29.71 seconds
(rather than 8.74 seconds).

>
> Also, I believe x.dup.gsub! is less efficient than doing just a
> single x.gsub.


That dup is just so I could 1) simulate some extra work that I had to
do; and 2) make sure I didn't permanently change the test string with
the gsub!.

Thanks!

Cheers,
Bob

>
> Kind regards
>
> robert
>


----
Bob Hutchison -- tumblelog at http://
www.recursive.ca/so/
Recursive Design Inc. -- weblog at http://www.recursive.ca/
hutch
http://www.recursive.ca/ -- works on http://www.raconteur.info/
cms-for-static-content/home/




 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      08-03-2007
2007/8/3, Bob Hutchison <(E-Mail Removed)>:
> Hi Robert,
>
> Thanks for the response.
>
>
> On 3-Aug-07, at 8:47 AM, Robert Klemme wrote:
> >
> > You are on the right track. There is just one thing to improve: get
> > rid of "case":
> >
> > class Converter
> > MAP = {
> > "&" => "&amp;",
> > # ...
> > }
> >
> > def self.convert(s)
> > s.gsub(/[&<>'"]/) do |m|
> > MAP[m] || "ERROR"
> > end
> > end
> > end

>
>
> My version3 was:
>
> @@convert = {
> '&' => '&amp;',
> '<' => '&lt;',
> '>' => '&gt;',
> "'" => '&apos;',
> '"' => '&quot;'
> }
>
> def version3(input)
> # all kinds of other processing of input simulated by the input.dup
> result = input.dup
>
> result.gsub!(/[&<>'"]/) do | match |
> @@convert[match]
> end
>
> return result
> end
>
> which is pretty much what you suggested, but it takes 29.71 seconds
> (rather than 8.74 seconds).


The overhead of the block form is significant. It probably pays off
when you have longer strings. That depends.

Kind regards

robert

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to read strings cantaining escape character from a file and useit as escape sequences? slomo Python 5 12-02-2007 11:39 AM
More memory: How fast is fast rfdjr1@optonline.net Computer Support 5 05-19-2004 05:45 PM
Canon S30 Fast shutter mode... Why so fast? mark popp Digital Photography 1 02-08-2004 10:07 PM
I NEED HELP FAST!!!!! REAL FAST!!!!! R. Jizzle MCSE 3 09-29-2003 08:51 PM
Super-fast AA Chargers: Anything as fast as the 15 minute Rayovac? David Chien Digital Photography 4 08-30-2003 07:49 PM



Advertisments