Problem reading/writing files

 
 
smeenehan@hmc.edu
 
      08-04-2006
This is a bit of a peculiar problem. First off, this relates to Python
Challenge #12, so if you are attempting those and have yet to finish
#12, be warned, as there are potential spoilers here.

I have five different image files shuffled up in one big binary file.
In order to view them I have to "unshuffle" the data, which means
moving bytes around. Currently my approach is to read the data from the
original, unshuffle as necessary, and then write to 5 different files
(2 .jpgs, 2 .pngs and 1 .gif).

The problem is with the read() method. If I read a byte valued as 0x00
(in hexadecimal), the read method returns a character with the value
0x20. When printed as strings, these two values look the same (null and
space, respectively), but obviously this screws with the data and makes
the resulting image file unreadable. I can add a simple if statement to
correct this, which seems to make the .jpgs readable, but the .pngs
still have errors and the .gif is corrupted, which makes me wonder if
the read method is not doing this to other bytes as well.

Now, the *really* peculiar thing is that I made a simple little file
and used my hex editor to manually change the first byte to 0x00. When
I read that byte with the read() method, it returned the correct value,
which boggles me.

Anyone have any idea what could be going on? Alternatively, is there a
better way to shift about bytes in a non-text file without using the
read() method (since returning the byte as a string seems to be what's
causing the issue)? Thanks in advance!
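
On the question of moving bytes around without calling read() one byte at a time, here is a minimal sketch. It assumes the whole file fits in memory and that the pieces are interleaved one byte at a time, which is an assumption about the data rather than something established here. The name `deinterleave` is hypothetical, and the code is written for Python 3, where a file opened with 'rb' yields a bytes object rather than a str.

```python
def deinterleave(data, parts):
    """Split interleaved bytes into `parts` streams using stride slicing.

    Stream k receives bytes k, k + parts, k + 2 * parts, and so on.
    """
    return [data[k::parts] for k in range(parts)]


# Small in-memory demonstration; the null byte (0x00) and the space (0x20)
# both land in the first stream unchanged.
sample = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x20, 0x55, 0x66, 0x77, 0x88])
streams = deinterleave(sample, 5)
print([s.hex() for s in streams])
```

With a real file, `data` would come from `open(path, 'rb').read()`, and each stream would be written to its own file opened with 'wb'.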

 
 
 
 
 
faulkner
 
      08-04-2006
have you been using text mode?

 
 
 
 
Simon Forman
 
      08-04-2006
(E-Mail Removed) wrote:
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.


No. It doesn't.

Ok, maybe it does, but I doubt this so severely that, without even
checking, I'll bet you a [virtual] beer it doesn't.

Are you opening the file in binary mode?


Ok, I did check, it doesn't.

|>> s = '\0'
|>> len(s)
1
|>> print s
\x00
|>> f = open('noway', 'wb')
|>> f.write(s)
|>> f.close()

Checking that the file is a length 1 null byte:

$ hexdump noway
0000000 0000
0000001
$ ls -l noway
-rw-r--r-- 1 sforman sforman 1 2006-08-03 23:40 noway

Now let's read it and see...

|>> f = open('noway', 'rb')
|>> s = f.read()
|>> f.close()
|>> len(s)
1
|>> print s
\x00

The problem is not with the read() method. Or, if it is, something
very very weird is going on.

If you can do the above and not get the same results I'd be interested
to know what file data you have, what OS you're using.

Peace,
~Simon

(Think about this: More people than you have tried the challenge, if
this happened to them they'd have mentioned it too, and it would have
been fixed or at least addressed by now. Maybe.)

(Hmm, or maybe this is *part* of the challenge?)

 
 
John Machin
 
      08-04-2006

(E-Mail Removed) wrote:
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.


I doubt it. What platform? What version of Python? Have you opened the
file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
parts of your code, plus what caused you to conclude that read()
changed data on the fly in an undocumented fashion.

> When printed as strings, these two values look the same (null and
> space, respectively),


Use the repr() function when you want to see what's *really* in an
object:

#>>> hasnul = 'a\x00b'
#>>> hasspace = 'a\x20b'
#>>> print hasnul, hasspace
a b a b
#>>> print repr(hasnul), repr(hasspace)
'a\x00b' 'a b'
#>>>


> but obviously this screws with the data and makes
> the resulting image file unreadable. I can add a simple if statement to
> correct this, which seems to make the .jpgs readable, but the .pngs
> still have errors and the .gif is corrupted, which makes me wonder if
> the read method is not doing this to other bytes as well.
>
> Now, the *really* peculiar thing is that I made a simple little file
> and used my hex editor to manually change the first byte to 0x00. When
> I read that byte with the read() method, it returned the correct value,
> which boggles me.
>
> Anyone have any idea what could be going on? Alternatively, is there a
> better way to shift about bytes in a non-text file without using the
> read() method (since returning the byte as a string seems to be what's
> causing the issue)?


"seems to be" != "is"

There is no simple better way. We need to establish what you are
actually doing to cause this problem to seem to happen. Kindly answer
the questions above.

Cheers,
John

 
 
smeenehan@hmc.edu
 
      08-04-2006
> What platform? What version of Python? Have you opened the
> file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
> parts of your code, plus what caused you to conclude that read()
> changed data on the fly in an undocumented fashion.


Yes, I've been reading and writing everything in binary mode. I'm using
version 2.4 on a Windows XP machine.

Here is the code that I have been using to split up the original file:

f = open('evil2.gfx','rb')
i1 = open('img1.jpg','wb')
i2 = open('img2.png','wb')
i3 = open('img3.gif','wb')
i4 = open('img4.png','wb')
i5 = open('img5.jpg','wb')


for i in range(0,67575,5):
    i1.write(f.read(1))
    i2.write(f.read(1))
    i3.write(f.read(1))
    i4.write(f.read(1))
    i5.write(f.read(1))

f.close()
i1.close()
i2.close()
i3.close()
i4.close()
i5.close()

I first noticed the problem by looking at the original file and
img1.jpg side by side with a hex editor. Since img1 contains every 5th
byte from the original file, I was able to find many places where \x00
should have been copied to img1.jpg, but instead a \x20 was copied.
What caused me to suspect the read method was the following:

>>> f = open('evil2.gfx','rb')
>>> s = f.read()
>>> print repr(s[19:22])
'\xe0 \r'

Now, I have checked many times with a hex editor that the 21st byte of
the file is \x00, yet above you can see that it is reading it as a
space. I've repeated this with several different nulls in the original
file and the result is always the same.

As I said in my original post, when I try simply writing a null to my
own file and reading it (as someone mentioned earlier) everything is
fine. It seems to be only this file which is causing the issue.
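
One way to cross-check a hex editor from within Python is to list offsets and byte values directly, so that a null and a space cannot be confused on screen. This is a minimal sketch, assuming Python 3; `dump_bytes` is a hypothetical helper, and the temporary file merely stands in for the real data.

```python
import os
import tempfile


def dump_bytes(path, start, stop):
    """Return (offset, hex value) pairs for bytes start..stop-1 of a file."""
    with open(path, 'rb') as f:      # binary mode: no newline translation
        f.seek(start)
        chunk = f.read(stop - start)
    # Iterating over a bytes object in Python 3 yields integers.
    return [(start + i, '0x%02x' % b) for i, b in enumerate(chunk)]


# Demonstration on a throwaway file containing both a null and a space.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xe0\x00\r \x01')

result = dump_bytes(demo_path, 0, 4)
os.remove(demo_path)
print(result)
```

Calling `dump_bytes('evil2.gfx', 19, 22)` would list the same three bytes as the `repr(s[19:22])` call above, one per line of output, with the null shown as 0x00 and a space as 0x20.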

 
 
smeenehan@hmc.edu
 
      08-04-2006
Ok, now I'm very confused, even though I just solved my problem. I
copied the entire contents of the original file (evil2.gfx) from my hex
editor and pasted it into a text file. When I read from *this* file
using my original code, everything worked fine. When I read the 21st
byte, it came up as the correct \x00. Why this didn't work in trying to
read from the original file, I don't know, since the hex values should
be the same, but oh well...

 
 
Roel Schroeven
 
      08-04-2006

Very weird. I tried your code on my system (Python 2.4, Windows XP) (but
using a copy of evil2.gfx I still had on my system), with no problems.

Are you sure that you don't have 2 copies of that file around, and that
your program is using the wrong one? Or is it possible that some module
imported with 'from blabla import *' clashes with the builtin open()?
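
Both possibilities can be checked from inside the session that misbehaves. This is a minimal sketch, assuming Python 3, where the builtins live in the `builtins` module (in Python 2.4 the module is `__builtin__`); `describe_open_target` is a hypothetical helper name.

```python
import builtins
import os


def describe_open_target(path, open_func):
    """Report which file a relative path resolves to and whether
    `open_func` is still the builtin open()."""
    full = os.path.abspath(path)
    found = os.path.exists(full)
    return {
        'cwd': os.getcwd(),
        'resolved_path': full,
        'exists': found,
        'size': os.path.getsize(full) if found else None,
        'open_is_builtin': open_func is builtins.open,
    }


# Pass whatever the name `open` currently refers to in this session.
info = describe_open_target('evil2.gfx', open)
for key in sorted(info):
    print(key, '=', info[key])
```

If the resolved path or the size differs between IDLE and a script run on its own, that would suggest the two are reading different files; if `open_is_builtin` is False, some import has replaced open().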

--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven
 
 
smeenehan@hmc.edu
 
      08-04-2006
Well, now I tried running the script and it worked fine with the .gfx
file. Originally I was working using the IDLE, which I wouldn't have
thought would make a difference, but when I ran the script on its own
it worked fine and when I ran it in the IDLE it didn't work unless the
data was in a text file. Weird.

 
 
Roel Schroeven
 
      08-04-2006
(E-Mail Removed) wrote:
> Well, now I tried running the script and it worked fine with the .gfx
> file. Originally I was working using the IDLE, which I wouldn't have
> thought would make a difference, but when I ran the script on its own
> it worked fine and when I ran it in the IDLE it didn't work unless the
> data was in a text file. Weird.


Weird indeed: I ran the script under IDLE too...



Roel Schroeven
 