Fastest way to detect a non-ASCII character in a list of strings.

 
 
Dun Peal
      10-17-2010
`all_ascii(L)` is a function that accepts a list of strings L, and
returns True if all of those strings contain only ASCII chars, False
otherwise.

What's the fastest way to implement `all_ascii(L)`?

My ideas so far are:

1. Match against a regexp with a character range: `[ -~]`
2. Use s.decode('ascii')
3. `return all(31 < ord(c) < 127 for s in L for c in s)`
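
For reference, the three ideas above could be sketched roughly as follows. This is only a sketch, and it assumes Python 3 text strings, so idea 2 becomes `s.encode('ascii')` instead of the Python 2 `s.decode('ascii')`. The function names are made up for illustration. Note that the ideas do not agree with each other: ideas 1 and 3 accept only the printable range (space through tilde), while the codec accepts the whole 0-127 range.

```python
import re

# Idea 1: printable ASCII only (space through tilde), matched against the whole string.
_PRINTABLE = re.compile(r'[ -~]*')

def all_ascii_regex(L):
    return all(_PRINTABLE.fullmatch(s) is not None for s in L)

# Idea 2: let the ASCII codec decide. This accepts every code point in 0-127,
# including control characters such as tab and newline.
def all_ascii_codec(L):
    try:
        for s in L:
            s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

# Idea 3: compare each code point against the printable range.
def all_ascii_ord(L):
    return all(31 < ord(c) < 127 for s in L for c in s)
```

With these definitions, a string containing a tab passes `all_ascii_codec` but fails the other two, so the definition needs to be settled before any timing comparison is meaningful.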

Any other ideas? Which one do you think will be fastest?

Will reply with final benchmarks and implementations if there's any interest.
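
One way such a benchmark could be set up is with the standard `timeit` module. The sketch below is an assumption about how the comparison might be run, not the poster's actual harness; the helper `benchmark` and both candidate functions are hypothetical names. It assumes Python 3.7 or later, where `str.isascii()` is available as one more candidate (it accepts the full 0-127 range).

```python
import timeit

def all_ascii_ord(L):
    # Full 0-127 range, checked one character at a time.
    return all(ord(c) < 128 for s in L for c in s)

def all_ascii_builtin(L):
    # str.isascii() exists in Python 3.7+ and also accepts the full 0-127 range.
    return all(s.isascii() for s in L)

def benchmark(funcs, L, number=100, repeat=3):
    """Return {function name: best total seconds for `number` calls}."""
    results = {}
    for f in funcs:
        times = timeit.repeat(lambda: f(L), number=number, repeat=repeat)
        results[f.__name__] = min(times)
    return results

if __name__ == '__main__':
    # Worst case for an early-exit check: the offending string comes last.
    sample = ['hello world'] * 1000 + ['caf\u00e9']
    timings = benchmark([all_ascii_ord, all_ascii_builtin], sample)
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f'{name}: {seconds:.4f}s')
```

Taking the minimum over several repeats reduces noise from other processes. The result will depend on where the first non-ASCII character appears, so it is worth timing both all-ASCII input and input that fails early.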

Thanks, D
 
 
Seebs
      10-17-2010
On 2010-10-17, Dun Peal wrote:
> What's the fastest way to implement `all_ascii(L)`?


Start by defining it.

> 1. Match against a regexp with a character range: `[ -~]`


What about tabs and newlines? For that matter, what about DEL and
BEL? It seems to me that everything in the 0-127 range is an "ASCII
character". Perhaps you mean "printable"?
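
To make the distinction concrete, here is a small sketch of the two definitions in Python 3. Both function names are hypothetical, chosen only for this illustration.

```python
def is_ascii(s):
    # Every code point in 0-127 counts, control characters included.
    return all(ord(c) < 128 for c in s)

def is_printable_ascii(s):
    # Only space (32) through tilde (126); excludes tab, newline, BEL and DEL.
    return all(32 <= ord(c) <= 126 for c in s)

for sample in ['plain text', 'tab\there', 'bell\x07', 'del\x7f', 'caf\u00e9']:
    print(repr(sample), is_ascii(sample), is_printable_ascii(sample))
```

Tab, BEL and DEL pass the first test and fail the second, which is exactly the gap between the `[ -~]` range and "all ASCII".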

> Any other ideas? Which one do you think will be fastest?


I'd guess that a suitable regex (and see whether there's an
existing character class that already has the right semantics) will
be by far the fastest. Just anchor it on both ends and nothing will
have to do any fancy evaluation to test it.
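
A sketch of what the anchored regex might look like in Python 3, here using the full 0-127 range. `fullmatch` does the anchoring on both ends. The second function is an equivalent formulation that searches for the first offending character instead. The names are illustrative, and no claim is made here about which of the two is faster.

```python
import re

# Anchored form: the whole string must consist of characters in 0-127.
_ALL_ASCII = re.compile(r'[\x00-\x7f]*')

# Negated form: look for the first character outside 0-127.
_NON_ASCII = re.compile(r'[^\x00-\x7f]')

def all_ascii_anchored(L):
    return all(_ALL_ASCII.fullmatch(s) is not None for s in L)

def all_ascii_search(L):
    return not any(_NON_ASCII.search(s) is not None for s in L)
```

Compiling the patterns once at module level keeps the compilation cost out of the per-call time. To test for printable characters only, the class `[ -~]` from the original post can be substituted.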

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
 
Carl Banks
      10-18-2010
On Oct 17, 12:59 pm, Dun Peal wrote:
> `all_ascii(L)` is a function that accepts a list of strings L, and
> returns True if all of those strings contain only ASCII chars, False
> otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Match against a regexp with a character range: `[ -~]`
> 2. Use s.decode('ascii')
> 3. `return all(31 < ord(c) < 127 for s in L for c in s)`
>
> Any other ideas? Which one do you think will be fastest?


If you use numpy, the fastest way might be something like:

import numpy as np
# s is a single byte string; 32..126 is the printable range from idea 3
ns = np.frombuffer(s, dtype=np.uint8)
return np.all((ns >= 32) & (ns <= 126))
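
The snippet above handles one string. A possible extension to the whole list is sketched below, under the assumption that L holds Python 3 `bytes` objects, which are joined into one buffer so numpy does a single vectorized pass. The function name and the `lo`/`hi` parameters are hypothetical.

```python
import numpy as np

def all_ascii_numpy(L, lo=32, hi=126):
    # Join all byte strings so the range check runs once over one array.
    data = b''.join(L)
    if not data:
        return True  # nothing to check
    codes = np.frombuffer(data, dtype=np.uint8)
    return bool(((codes >= lo) & (codes <= hi)).all())
```

Calling `all_ascii_numpy(L, lo=0, hi=127)` checks the full ASCII range instead of the printable one. Unlike the generator-based versions, this does not stop at the first bad byte, and joining copies the data, so it is not obvious in advance that it wins on short lists.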


Carl Banks
 