Velocity Reviews - Computer Hardware Reviews

converting octal strings to unicode

 
 
flamingivanova@gmail.com
      12-24-2004
I have several ascii files that contain '\ooo' strings which represent
the octal value for a character. I want to convert these files to
unicode, and I came up with the following script. But it seems to me
that there must be a much simpler way to do it. Could someone more
experienced suggest some improvements?

For example, I want to convert a file containing:

hello \326du

into a unicode file containing:

hello Ödu


----------8<---------------------------------------
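#!/usr/bin/python

import re, string, sys

# Read all lines from the file named on the command line.
if len(sys.argv) > 1:
    file = open(sys.argv[1],'r')
    lines = file.readlines()
    file.close()
else:
    print "give a filename"
    sys.exit()

def to_unichr(str):
    # str is the regex match object; group(1) holds the three digits,
    # which are parsed as a base-8 number.
    oct = string.atoi(str.group(1), 8)
    return unichr(oct)

for line in lines:
    line = string.rstrip(unicode(line,'Latin-1'))
    # Only run the substitution on lines that contain a \ooo sequence.
    if re.compile(r'\\\d\d\d').search(line):
        line = re.sub(r'\\(\d\d\d)', to_unichr, line)
    line = line.encode('utf-8')
    print line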

----------8<---------------------------------------
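
For readers on Python 3, where `unichr`, `string.atoi`, and the `print` statement no longer exist, the same regex approach can be sketched roughly as follows. This is a minimal sketch, not the poster's code: the names `OCTAL_ESCAPE` and `decode_octal_escapes` are made up for illustration, and the pattern is narrowed to `[0-7]{3}` on the assumption that only valid octal digits should be converted.

```python
import re

# Hypothetical names for illustration.
# Matches a backslash followed by exactly three octal digits.
OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")

def decode_octal_escapes(text):
    """Replace each backslash-ooo sequence with the character at that octal code point."""
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)

# The input holds a literal backslash, as it would when read from the file.
line = "hello \\326du"
converted = decode_octal_escapes(line)  # octal 326 is U+00D6
```

Octal 326 is 3*64 + 2*8 + 6 = 214, which is U+00D6, so `converted` holds the same text as the target file in the question. Text without a matching sequence passes through unchanged, which makes the separate `search` check unnecessary.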

 
Christos TZOTZIOY Georgiou
      12-24-2004
On 23 Dec 2004 18:41:57 -0800, rumours say that flamingivanova@gmail.com
might have written:

>I have several ascii files that contain '\ooo' strings which represent
>the octal value for a character. I want to convert these files to
>unicode, and I came up with the following script. But it seems to me
>that there must be a much simpler way to do it. Could someone more
>experienced suggest some improvements?


decoded_string = "\326du".decode("string_escape")
unicode_text = unicode(decoded_string, "latin-1")
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
 
Christos TZOTZIOY Georgiou
      12-24-2004
On 23 Dec 2004 18:41:57 -0800, rumours say that flamingivanova@gmail.com
might have written:

>I have several ascii files that contain '\ooo' strings which represent
>the octal value for a character. I want to convert these files to
>unicode, and I came up with the following script. But it seems to me
>that there must be a much simpler way to do it. Could someone more
>experienced suggest some improvements?


(hope I cancelled the previous off-by-one-backslash post...)

your_string = "\\326du"
decoded_string = your_string.decode("string_escape")
unicode_text = unicode(decoded_string, "latin-1")
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
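
The `string_escape` codec and the `unicode` built-in used above exist only in Python 2. As a hedged sketch of a possible Python 3 counterpart (an assumption about how one might port the idea, not something posted in the thread), the `unicode_escape` codec decodes backslash escapes in a bytes object and treats the remaining bytes as Latin-1. The variable names loosely mirror the post above.

```python
# Bytes as they would be read from the ASCII file:
# a literal backslash followed by the digits 326.
your_bytes = b"hello \\326du"

# unicode_escape interprets \ooo (and other Python escapes such as \n or \x41)
# and maps any remaining bytes through Latin-1.
unicode_text = your_bytes.decode("unicode_escape")

# Encode as UTF-8 when writing the result out.
utf8_bytes = unicode_text.encode("utf-8")
```

This interprets every Python escape sequence in the input, not only octal ones, so it suits only files where a backslash never appears for any other reason.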
 