Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > question about speed of sequential string replacement vs regex or

Reply
Thread Tools

question about speed of sequential string replacement vs regex or

 
 
Xah Lee
Guest
Posts: n/a
 
      09-28-2011
curious question.

suppose you have 300 different strings and they need all be replaced
to say "aaa".

is it faster to replace each one sequentially (i.e. replace first
string to aaa, then do the 2nd, 3rd,...)
, or is it faster to use a regex with “or” them all and do replace one
shot? (i.e. "1ststr|2ndstr|3rdstr|..." -> aaa)

let's say the sourceString this replacement to be done on is 500k
chars.

Anyone? i suppose the answer will be similar for perl, python, ruby.

btw, the origin of this question is about writing a emacs lisp
function that replace ~250 html named entities to unicode char.

Xah
 
Reply With Quote
 
 
 
 
Chris Angelico
Guest
Posts: n/a
 
      09-28-2011
On Wed, Sep 28, 2011 at 7:28 PM, Xah Lee <(E-Mail Removed)> wrote:
> suppose you have 300 different strings and they need all be replaced
> to say "aaa".
>
> is it faster to replace each one sequentially


Before you ask "is it faster", you need to first be sure it's correct.
I would recommend doing all the replaces in a single function call to
avoid risk of overlaps or other issues causing problems.

ChrisA
 
Reply With Quote
 
 
 
 
Randal L. Schwartz
Guest
Posts: n/a
 
      09-28-2011
>>>>> "Xah" == Xah Lee <(E-Mail Removed)> writes:

Xah> curious question.
Xah> suppose you have 300 different strings and they need all be replaced
Xah> to say "aaa".

And then suppose this isn't the *real* question, but one entirely of
Fiction by Xah Lee.

How helpful do you want to be?

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<(E-Mail Removed)> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion
 
Reply With Quote
 
Xah Lee
Guest
Posts: n/a
 
      09-28-2011
On Sep 28, 3:57*am, (E-Mail Removed) (Randal L. Schwartz) wrote:
> >>>>> "Xah" == Xah Lee <(E-Mail Removed)> writes:

>
> Xah> curious question.
> Xah> suppose you have 300 different strings and they need all be replaced
> Xah> to say "aaa".
>
> And then suppose this isn't the *real* question, but one entirely of
> Fiction by Xah Lee.
>
> How helpful do you want to be?


it's a interesting question anyway.

the question originally came from when i was coding elisp of a
function that changes html entities to unicode char literal. The
problem is slightly complicated, involving a few questions about speed
in emacs. e.g. string vs buffer, and much more... i spent several
hours on this but it's probably too boring to detail (but i'll do so
if anyone wishes). But anyway, while digging these questions that's
not clear in my mind, i thought of why not generate a regex or
construct and do it in one shot, and wondered if that'd be faster. But
afterwards, i realized this wouldn't be applicable to my problem
because for my problem each string needs to be changed to a unique
string, not all to the same string.

Xah
 
Reply With Quote
 
Xah Lee
Guest
Posts: n/a
 
      09-28-2011
here's more detail about the origin of this problem. Relevant to emacs
lisp only.

------------------------------

in the definition of “replace-regexp-in-string”, there's this comment:

;; To avoid excessive consing from multiple matches in long strings,
;; don't just call `replace-match' continually. Walk down the
;; string looking for matches of REGEXP and building up a (reversed)
;; list MATCHES. This comprises segments of STRING which weren't
;; matched interspersed with replacements for segments that were.
;; [For a `large' number of replacements it's more efficient to
;; operate in a temporary buffer; we can't tell from the function's
;; args whether to choose the buffer-based implementation, though it
;; might be reasonable to do so for long enough STRING.]

note: «For a `large' number of replacements it's more efficient to
operate in a temporary buffer».

my question is, anyone got some idea, how “large” is large?

currently, i have a function replace-pairs-in-string which is
implemented by repeatedly calling “replace-pairs-in-string” like this:

(while (< ii (length pairs))
(setq mystr (replace-regexp-in-string
(elt tempMapPoints ii)
(elt (elt pairs ii) 1)
mystr t t))
(setq ii (1+ ii))
)

When there are 260 pairs of strings to be replaced on a file that's
26k in size, my function takes about 3 seconds (which i think is too
slow). I'm at pain deciding whether my function should be implemented
like this or whether it should create a temp buffer. The problem with
temp buffer is that, if you repeatedly call it, the overhead of
creating buffer is going to make it much slower.

i was actually surprised that replace-regexp-in-string isn't written
in C, which i thought it was.

is there technical reason the replace-regexp-in-string isn't C? (i
suppose only because nobody every did it and the need for speed didn't
suffice?) and also, shouldn't there also be a replace-in-string
(literal, not regex)? because i thought implementing replacement for
string should be much simpler and faster, because buffers comes with
it a whole structure such as “point”, text properties, buffer names,
buffier modifier, etc.

Xah

On Sep 28, 5:28*am, Xah Lee <(E-Mail Removed)> wrote:
> On Sep 28, 3:57*am, (E-Mail Removed) (Randal L. Schwartz) wrote:
>
> > >>>>> "Xah" == Xah Lee <(E-Mail Removed)> writes:

>
> > Xah> curious question.
> > Xah> suppose you have 300 different strings and they need all be replaced
> > Xah> to say "aaa".

>
> > And then suppose this isn't the *real* question, but one entirely of
> > Fiction by Xah Lee.

>
> > How helpful do you want to be?

>
> it's a interesting question anyway.
>
> the question originally came from when i was coding elisp of a
> function that changes html entities to unicode char literal. The
> problem is slightly complicated, involving a few questions about speed
> in emacs. e.g. string vs buffer, and much more... i spent several
> hours on this but it's probably too boring to detail (but i'll do so
> if anyone wishes). But anyway, while digging these questions that's
> not clear in my mind, i thought of why not generate a regex or
> construct and do it in one shot, and wondered if that'd be faster. But
> afterwards, i realized this wouldn't be applicable to my problem
> because for my problem each string needs to be changed to a unique
> string, not all to the same string.
>
> *Xah


 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      09-28-2011
On Wed, Sep 28, 2011 at 10:28 PM, Xah Lee <(E-Mail Removed)> wrote:
> each string needs to be changed to a unique
> string, not all to the same string.


And you'll find that this is, by and large, the most normal situation.
Folding many strings down to one string is a lot less common. So,
let's have some real-world use cases and then we can talk
recommendations.

ChrisA
 
Reply With Quote
 
Neil Cerutti
Guest
Posts: n/a
 
      09-28-2011
On 2011-09-28, Chris Angelico <(E-Mail Removed)> wrote:
> On Wed, Sep 28, 2011 at 10:28 PM, Xah Lee <(E-Mail Removed)> wrote:
>> each string needs to be changed to a unique string, not all to
>> the same string.

>
> And you'll find that this is, by and large, the most normal
> situation. Folding many strings down to one string is a lot
> less common. So, let's have some real-world use cases and then
> we can talk recommendations.


I'd like to know what "string replacement" is supposed to mean in
the context of Python.

--
Neil Cerutti
"A politician is an arse upon which everyone has sat except a man."
e. e. cummings
 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      09-28-2011
On Wed, Sep 28, 2011 at 11:14 PM, Xah Lee <(E-Mail Removed)> wrote:
> (while (< ii (length pairs))
> * * *(setq mystr (replace-regexp-in-string
> * * * * * * * * * (elt tempMapPoints ii)
> * * * * * * * * * (elt (elt pairs ii) 1)
> * * * * * * * * * mystr t t))
> * * *(setq ii (1+ ii))
> * * *)


from __future__ import lisp_syntax

There, that makes it on-topic for the Python list... I guess.

ChrisA
 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      09-28-2011
In article <(E-Mail Removed)>,
Neil Cerutti <(E-Mail Removed)> wrote:

> On 2011-09-28, Chris Angelico <(E-Mail Removed)> wrote:
> > On Wed, Sep 28, 2011 at 10:28 PM, Xah Lee <(E-Mail Removed)> wrote:
> >> each string needs to be changed to a unique string, not all to
> >> the same string.

> >
> > And you'll find that this is, by and large, the most normal
> > situation. Folding many strings down to one string is a lot
> > less common. So, let's have some real-world use cases and then
> > we can talk recommendations.

>
> I'd like to know what "string replacement" is supposed to mean in
> the context of Python.


You just need to use "string" in the more generic computer-sciency
sense, not in the python-data-type sense.

s = "I am an immutable string"
l = list(s) # now you can pretend strings are mutable
do_stuff_to_string(l)
s = ''.join(l)
 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      09-28-2011
On Wed, Sep 28, 2011 at 11:22 PM, Neil Cerutti <(E-Mail Removed)> wrote:
> I'd like to know what "string replacement" is supposed to mean in
> the context of Python.
>


Substring replacement, such as:
>>> "Hello, world!".replace(", "," -- ")

'Hello -- world!'

The str method doesn't accept a list, though, so it won't do
simultaneous replaces. (At least, I don't think it can. Tried it only
in Python 3.2 on Windows.)

ChrisA
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
question about speed of sequential string replacement vs regex or Xah Lee Perl Misc 6 09-28-2011 08:14 PM
regex =~ string or string =~ regex? Ruby Newbee Ruby 3 01-04-2010 06:04 PM
How make regex that means "contains regex#1 but NOT regex#2" ?? seberino@spawar.navy.mil Python 3 07-01-2008 03:06 PM
String.replaceAll(String regex, String replacement) question Mladen Adamovic Java 3 12-05-2003 04:20 PM
Re: String.replaceAll(String regex, String replacement) question Mladen Adamovic Java 0 12-04-2003 04:40 PM



Advertisments