Problems With Accented Characters

 
 
Fuzzyman
 
      02-22-2004
I've written an anagram finder that produces anagrams from a
dictionary of words. The user can load their own dictionary.

( http://www.voidspace.org.uk/atlantibots/nanagram.html )

To make sure it finds anagrams properly, I strip characters such as
punctuation from the words in the dictionary and from the words the
user enters. I test each character against the 26 English letters
(string.ascii_lowercase).

I now have someone who wants to use a French dictionary, with words
containing accented characters! I have two choices: either map the
accented characters to their unaccented equivalents (slightly
inaccurate) or treat each accented character as a separate letter (very
few anagrams). However, at the moment I can't experiment with either,
because my default codec is 7-bit ASCII and the program crashes
(sometimes!) when it meets accented characters.
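
The two choices described above can be sketched as a single key function with a switch. This is only an illustration written for Python 3, where every str is Unicode, and not the code Nanagram actually uses; the names fold_accents and anagram_key are hypothetical. It relies on the standard unicodedata module to split an accented letter into its base letter plus a combining mark.

```python
import unicodedata

def fold_accents(word):
    """Map accented letters to their unaccented base letter (e.g. é -> e)."""
    # NFD splits 'é' into 'e' plus a combining acute accent (U+0301);
    # dropping the combining marks leaves only the base letters.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def anagram_key(word, fold=True):
    """Return a key that is equal for any two words that are anagrams."""
    word = word.lower()
    if fold:
        word = fold_accents(word)
    else:
        # Compose first so 'é' is always one code point, however it was typed.
        word = unicodedata.normalize("NFC", word)
    return "".join(sorted(word))

print(anagram_key("degré"))              # accents folded onto plain letters
print(anagram_key("degré", fold=False))  # é kept as its own letter
```

With fold=True, 'degré' and 'degre' share a key; with fold=False they do not, which is why the second option yields fewer anagrams. Ligatures such as 'œ' have no decomposition, so they would need a separate mapping if they should count as 'oe'.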

Does anyone have any advice, or can anyone point me to resources, for
handling these characters effectively? I guess it's a Latin-1 encoding
I want to use... I can't even work out how to change the default
codec.
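
One common alternative to changing the interpreter-wide default codec is to decode at the boundary: name the codec explicitly when the dictionary file is read, and work with Unicode text everywhere else. The sketch below assumes Python 3 and assumes the word list is a Latin-1 file with one word per line; load_words and the demo file are hypothetical. (In the Python 2 of this thread's era, codecs.open(path, encoding=...) played a similar role.)

```python
import os
import tempfile

def load_words(path, encoding="latin-1"):
    """Read one word per line, decoding bytes to text with an explicit codec."""
    with open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]

# Demo: write a tiny Latin-1 word list to a temporary file, then read it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("degré\nélève\nhello\n".encode("latin-1"))
    demo_path = tmp.name

words = load_words(demo_path)
os.remove(demo_path)
print(words)
```

Passing the encoding as a parameter would also let the user pick a different codec for a dictionary that is not Latin-1.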

Thanks,

Fuzzy

http://www.voidspace.org.uk/atlantib...thonutils.html
 
Fuzzyman
 
      02-23-2004

It's particularly difficult for me to understand what is happening,
because Python's behaviour *seems* intermittent.

For example, if I run my program from IDLE and give it the word
'degré' (containing e-acute), I get this error:

Exception in Tkinter callback
Traceback (most recent call last):
[snip..]
  File "D:\Python Projects\Nanagram1.3\Nanagram-GUI.pyw", line 123, in prepare
    if letter in self.valid_letters:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 26: ordinal not in range(128)
Traceback (most recent call last):

It is testing each character of the user's input to remove invalid
characters (like "-" and "'")... It crashes when it comes to the
e-acute.
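
A possible reading of the traceback, offered as a guess rather than a diagnosis: position 26 is the first position after 26 ASCII letters, which suggests the undecodable byte sits in valid_letters rather than in the typed word. In Python 2, testing a unicode string (Tkinter can return one for non-ASCII input) against a byte string makes Python decode the byte string with the default 'ascii' codec. The sketch below reproduces the same message in Python 3 with an explicit decode; the contents of valid_letters here are a hypothetical reconstruction.

```python
import string

# Hypothetical reconstruction: 26 ASCII letters followed by one non-ASCII byte.
valid_letters = string.ascii_lowercase.encode("ascii") + b"\x83"

failure = None
try:
    # Mimics the implicit ASCII decode that Python 2 performed when a
    # byte string was mixed with a unicode string.
    valid_letters.decode("ascii")
except UnicodeDecodeError as error:
    failure = error
    print(error)

# An 8-bit codec such as Latin-1 maps every byte to a character, so it succeeds.
decoded = valid_letters.decode("latin-1")
print(len(decoded))
```

If that reading is right, keeping both the valid letters and the user's input as Unicode text would avoid the implicit decode altogether.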


*However*, if I run it by double-clicking on the file, it appears
to work fine. For example, if I ask it to find anagrams of
'degré hello ma', it strips out the e-acute (thinking it's an invalid
character) and finds anagrams of the rest:

gleam holder
hallo merged
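
The e-acute is dropped here because the filter only accepts the 26 ASCII letters. A minimal sketch of a filter that keeps accented letters, assuming Python 3 and Unicode input, is shown below. The function name prepare echoes the method in the traceback, but this body is hypothetical and not the original code.

```python
def prepare(text):
    """Lowercase the input and drop anything that is not a letter."""
    # str.isalpha() is True for accented letters such as 'é' and False
    # for spaces, hyphens, apostrophes and digits.
    return "".join(ch for ch in text.lower() if ch.isalpha())

print(prepare("degré hello ma"))
print(prepare("aujourd'hui"))
```

Whether the kept 'é' is then folded to 'e' or treated as its own letter can be decided in a later step, so the filter itself does not need to know about the user's choice.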

What I'd like to do is switch by default to an 8-bit codec (Latin-1, I
think?) and then offer the user the choice of either mapping the
accented characters to their nearest equivalent (e-acute to e, for
example) *or* treating them as separate characters.


Is anyone able to help?



Fuzzy




 