Binary data, command output, and Ruby

 
 
Phrogz
      10-01-2007
I have a script that pulls pages from our wiki server. It was working
using Net::HTTP and open-uri with basic_authentication, but our
sysadmin disabled basic authentication and left NTLM as the only
authentication method.

Instead of trying to figure out how to use the Ruby NTLM library, I
decided to just use curl. It was working nicely for the HTML pages
using this form:
def fetch_http_ntlm( url )
  `curl #{url} --ntlm -# -u #{USER}:#{PASS}`
end

However, the above fails for binary files. (Pulling down images
embedded in pages.) So I had to switch it to this:
def fetch_http_ntlm( url )
  file_name = "C:\\tmp_#{Time.new.to_i}"
  `curl #{url} --ntlm -# -u #{USER}:#{PASS} -o #{file_name}`
  raw = File.open( file_name, 'rb' ){ |f| f.read }
  File.delete( file_name )
  raw
end

In other words, I have curl write the output to a file, and then read
in the file using binary mode, and delete the file.

Should I have to do this? Is it a general problem that commands can't
cleanly return binary data to the 'console', and hence can't be
captured using the above format? Or is curl on Windows at fault, and
should be doing something different? Or is Ruby Windows at fault? Or
is Windows itself at fault?


Also - I didn't try using the Tempfile library for the above, since
the documentation for Tempfile.new says:
'Creates a temporary file of mode 0600 in the temporary directory
whose name is basename.pid.n and opens with mode "w+".' If this
documentation is correct, does this mean that the Tempfile library
doesn't work for binary files on Windows?
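
One way to sidestep the mode question, sketched under assumptions: use Tempfile only to reserve a unique path, close it, let the external command write to that path, and then reopen the path yourself with 'rb'. The helper name read_binary_via_tempfile is hypothetical, and the block stands in for whatever command produces the file.

```ruby
require 'tempfile'

# Hypothetical helper: reserves a temp path, yields it so an external
# command can write there, then returns the file's bytes read in binary mode.
def read_binary_via_tempfile(prefix = 'fetch')
  tmp = Tempfile.new(prefix)
  tmp.close                      # release our handle; the path stays reserved
  yield tmp.path                 # caller writes the file (e.g. via curl -o)
  File.open(tmp.path, 'rb') { |f| f.read }
ensure
  tmp.unlink if tmp              # always remove the temp file
end

# Stand-in for the external command: write some awkward bytes directly.
sample = [13, 10, 26, 0, 255].pack('C*')
result = read_binary_via_tempfile { |path| File.open(path, 'wb') { |f| f.write(sample) } }
p result == sample
```

With the curl call above, the block would presumably run the command with -o pointing at the yielded path (quoted, in case the temp directory contains spaces).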

 
Phrogz
      10-02-2007

Followup - this does not seem to be a core problem of terminal
commands returning binary data, or a core failing of Ruby. From my OS
X box at home:

Slim2:~/Desktop phrogz$ cat send_bytes.rb
print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join

Slim2:~/Desktop phrogz$ cat get_bytes.rb
result = `ruby send_bytes.rb`
p result.length, result

Slim2:~/Desktop phrogz$ ruby get_bytes.rb
8
"\r\a\201\372\000Foo"

This is also not a problem with curl (at least on *nix):

Slim2:~/Desktop phrogz$ curl -s -O http://phrogz.net/tmp/gkhead.jpg
Slim2:~/Desktop phrogz$ irb
irb(main):001:0> good = IO.read( 'gkhead.jpg' ); good.length
=> 21443
irb(main):002:0> url = 'http://phrogz.net/tmp/gkhead.jpg'
=> "http://phrogz.net/tmp/gkhead.jpg"
irb(main):003:0> test = `curl -s #{url}`; test.length
=> 21443
irb(main):004:0> test == good
=> true

Tomorrow I'll see which of the above fails back on my Windows box.
Glad this isn't a fundamental Ruby or shell workflow problem, anyhow.

 
Phrogz
      10-02-2007

Here are the results from Windows. Binary per se doesn't fail, but
using it with curl makes it break eventually.

Any suggestions on how to further pare this down to see if this is a
Ruby-Windows problem, a Windows shell problem, or a Curl-Windows
problem?


c:\>type send_bytes.rb
print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join

c:\>type get_bytes.rb
result = `ruby send_bytes.rb`
p result.length, result

c:\>ruby get_bytes.rb
8
"\r\a\201\372\000Foo"


c:\>curl -s -O http://phrogz.net/tmp/gkhead.jpg

c:\>irb
irb(main):001:0> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }; good.length
=> 21443

irb(main):002:0> url = 'http://phrogz.net/tmp/gkhead.jpg'
=> "http://phrogz.net/tmp/gkhead.jpg"

irb(main):003:0> test = `curl -s #{url}`; test.length
=> 2010

irb(main):008:0> 0.step( test.length, 100 ){ |i|
irb(main):009:1* range = i...(i+100)
irb(main):010:1> if good[ range ] != test[ range ]
irb(main):011:2> p good[ range ], test[ range ], range
irb(main):012:2> break
irb(main):013:2> end
irb(main):014:1> }
"\000\000\000\004\000\000\000\0008BIM\004\032\006Slices\000\000\000\000m\000\000\000\006\000\000\000\000\000\000\000\000\000\000\001\276\000\000\001\231\000\000\000\006\000g\000k\000h\000e\000a\000d\000\000\000\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\001\231\000\000"
"\000\000\000\004\000\000\000\0008BIM\004$\023\222\vDW$\026\020EG\377\320\346\177\335q9}K\236:{5C\357L\026\372\330\251\207\261W>\372\301v\346O\222b\373\027/\276p\310\372\351\370\246\036\314\327~\366\260\\\t\037\002\236\253\356X\373\267\237\346)\352{\221\221\367I\352\177\322\2223z`\227\335W"
700...800
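
The chunked comparison can be narrowed to a single byte offset. A minimal sketch, assuming a Ruby with String#getbyte and String#bytesize (newer than the 1.8.6 used in this thread); first_mismatch is a hypothetical name, and the two sample strings are just the eight bytes around the difference seen in the output.

```ruby
# Returns the index of the first differing byte within the common prefix
# of the two strings, or nil if that prefix is identical.
def first_mismatch(a, b)
  limit = [a.bytesize, b.bytesize].min
  (0...limit).find { |i| a.getbyte(i) != b.getbyte(i) }
end

good = "8BIM\004\032\006S".b   # bytes from the file on disk
test = "8BIM\004$\023\222".b   # bytes captured from the command
index = first_mismatch(good, test)
puts "first difference at byte #{index}: " \
     "#{good.getbyte(index)} vs #{test.getbyte(index)}"
```

Comparing by byte avoids any dependence on string encoding, which matters once the data is not text.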


 
Phrogz
      10-03-2007
OK, so this seems like a Ruby Windows problem:

C:\>curl -s -O http://phrogz.net/tmp/gkhead.jpg
C:\>curl -s http://phrogz.net/tmp/gkhead.jpg > test.jpg
C:\>irb
irb(main):001:0> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }; good.length
=> 21443
irb(main):002:0> test = File.open( 'test.jpg', 'rb' ){ |f| f.read }; test.length
=> 21443
irb(main):003:0> suck = `curl -s http://phrogz.net/tmp/gkhead.jpg`; suck.length
=> 2010


good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
test = `curl -s http://phrogz.net/tmp/gkhead.jpg`

0.upto( test.length-1 ){ |i|
  if test[ i ] != good[ i ]
    s1 = good[ (i-5)..(i+2) ]
    s2 = test[ (i-5)..(i+2) ]
    p s1, s2
    puts
    [ s1, s2 ].each{ |str|
      puts str.unpack( 'B8'*str.length ).join('|')
    }
    break
  end
}

#=> "8BIM\004\032\006S"
#=> "8BIM\004$\023\222"
#=>
#=> 00111000|01000010|01001001|01001101|00000100|00011010|00000110|01010011
#=> 00111000|01000010|01001001|01001101|00000100|00100100|00010011|10010010


Windows console can properly redirect binary command output to a file,
but (after a certain point or certain binary sequence?) Ruby gets
munged binary data back instead.
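
A possibly relevant detail: the first byte that goes missing is \032, which is 0x1A (Ctrl-Z), and the Windows C runtime treats that byte as an end-of-file marker when reading a stream in text mode. That suggests, though nothing in this thread confirms it, that the backtick pipe is being read in text mode. A minimal sketch of one thing to try, assuming IO.popen accepts an 'rb' mode on the Ruby in use; capture_binary is a hypothetical name, and a child Ruby process stands in for curl so the example is self-contained.

```ruby
require 'rbconfig'

# Hypothetical helper: run a command and read its stdout through a pipe
# opened in binary mode, instead of using backticks.
def capture_binary(*command)
  IO.popen(command, 'rb') { |io| io.read }
end

# Child writes CR, LF, Ctrl-Z, NUL and 0xFF, with its own stdout in binary mode.
child = '$stdout.binmode; $stdout.write([13, 10, 26, 0, 255].pack("C*"))'
bytes = capture_binary(RbConfig.ruby, '-e', child)
p bytes.bytes
```

Whether this avoids the truncation on the 1.8.6 mswin32 build is untested here; it only shows the shape of the call.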

I'll take this to ruby-core unless someone can point out why this flaw
isn't Ruby's.

 
Phrogz
      10-04-2007
For my last post on this topic, a simpler test case showing Ruby on OS
X behaving as expected, and Ruby on Windows...not.

====

Darwin Slim2.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.9.1]

Slim2:~/Desktop phrogz$ cat put_bytes.rb
File.open( 'gkhead.jpg', 'rb' ){ |f| print f.read }

Slim2:~/Desktop phrogz$ cat get_bytes.rb
raw_bytes = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
rcv_bytes = `ruby put_bytes.rb`
p raw_bytes.length, rcv_bytes.length

Slim2:~/Desktop phrogz$ ruby get_bytes.rb
21443
21443

====

Windows XP SP 2 (Microsoft Windows XP [Version 5.1.2600])
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32] (latest one-click
installer)

C:\Documents and Settings\gavin.kistner\Desktop>type put_bytes.rb
File.open( 'gkhead.jpg', 'rb' ){ |f| print f.read }

C:\Documents and Settings\gavin.kistner\Desktop>type get_bytes.rb
raw_bytes = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
rcv_bytes = `ruby put_bytes.rb`
p raw_bytes.length, rcv_bytes.length

C:\Documents and Settings\gavin.kistner\Desktop>ruby get_bytes.rb
21443
5159

 
Daniel Sheppard
      10-04-2007
> I have a script that pulls pages from our wiki server. It was working
> using Net::HTTP and open-uri with basic_authentication, but our
> sysadmin disabled basic authentication and left NTLM as the only
> authentication method.


Install http://ntlmaps.sourceforge.net/ and direct Net::HTTP through
that as a proxy.


 
Daniel Sheppard
      10-04-2007
> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
> test = `curl -s http://phrogz.net/tmp/gkhead.jpg`


I would hazard a guess that if you took that 'b' off of the File.open,
you'd get the same bytes `` is returning?

 
Phrogz
      10-04-2007
On Oct 3, 10:06 pm, "Daniel Sheppard" <dani...@pronto.com.au> wrote:
> > good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
> > test = `curl -s http://phrogz.net/tmp/gkhead.jpg`

>
> I would hazard a guess that if you took that 'b' off of the File.open,
> you'd get the same bytes `` is returning?


I doubt it, but will try when I get into work. My understanding was
that (on Windows) opening a file without 'b' "helpfully" converts \n
bytes to \r\n pairs; the 'b' is needed to say "Hey, don't be munging
my data!".

But like I said, I'll give it a shot.
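
For checking that understanding, a small sketch: write the same string once with mode 'w' and once with 'wb', then compare the file sizes. On Windows the text-mode file should come out one byte larger per \n; on Unix-like systems the two sizes should match. The name mode_sizes and the sample string are made up for illustration.

```ruby
require 'tmpdir'

# Writes data in text mode and in binary mode inside a scratch directory
# and returns the two resulting file sizes as [text_size, binary_size].
def mode_sizes(data)
  Dir.mktmpdir do |dir|
    %w[w wb].map do |mode|
      path = File.join(dir, "sample_#{mode}.bin")
      File.open(path, mode) { |f| f.write(data) }
      File.size(path)
    end
  end
end

text_size, binary_size = mode_sizes("one\ntwo\n")
puts "text mode: #{text_size} bytes, binary mode: #{binary_size} bytes"
```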

 
Phrogz
      10-04-2007
On Oct 4, 8:03 am, Phrogz <phr...@mac.com> wrote:
> > > good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
> > > test = `curl -s http://phrogz.net/tmp/gkhead.jpg`

>
> > I would hazard a guess that if you took that 'b' off of the File.open,
> > you'd get the same bytes `` is returning?

>
> I doubt it, but will try when I get into work. My understanding was
> that (on Windows) opening a file without 'b' "helpfully" converts \n
> bytes to \r\n pairs; the 'b' is needed to say "Hey, don't be munging
> my data!".
>
> But like I said, I'll give it a shot.


OK, so this has nothing to do with reading files from disk. The crazy
thing is that it isn't even deterministic! See the following:

C:\>type put_bytes.rb
print (0..12000).map{ |i| ((i % 255) + 1).chr }.join
$stdout.flush
sleep 1
$stdout.flush

C:\>type get_bytes.rb
p `ruby put_bytes.rb`.length

C:\>type multiget.bat
@echo off
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb

C:\>multiget.bat
944
696
944
1192
944
919
1192
1192
944
944
1192
1192
944
1167
1192
1192
944
1192
1192
1192

Note that it also does the above with or without the sleep, and with
or without the $stdout.flush calls.

What is going on here?!
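
The batch file can be replaced by a loop that tallies how often each length occurs, which makes the variation easier to read. This is only a sketch: tally_lengths is a hypothetical name, the child forces binary mode on its own stdout (unlike the original put_bytes.rb), and the mode argument lets the same harness compare a text-mode pipe ('r') with a binary-mode one ('rb').

```ruby
require 'rbconfig'

# Child program: emits 12001 bytes cycling through 1..255, like put_bytes.rb,
# but switches its stdout to binary mode first (an addition to the original).
CHILD = '$stdout.binmode; print((0..12000).map { |i| (i % 255) + 1 }.pack("C*"))'

# Runs the child `runs` times, reading its output through a pipe opened in
# the given mode, and returns a hash of { received_length => count }.
def tally_lengths(runs, mode = 'rb')
  counts = Hash.new(0)
  runs.times do
    out = IO.popen([RbConfig.ruby, '-e', CHILD], mode) { |io| io.read }
    counts[out.bytesize] += 1
  end
  counts
end

p tally_lengths(20)
```

If every run is captured intact, the hash should have a single key, 12001; several keys would reproduce the non-determinism shown above.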

 
Peña, Botp
      10-05-2007
From: Phrogz
# OK, so this has nothing to do with reading files
# from disk. The crazy thing is that it isn't even
# deterministic! See the following:
# <snip>
#...
# What is going on here?!

can't help you there, but mine has a different yet consistent output...

C:\family\ruby>type put_bytes.rb
print (0..12000).map{ |i| ((i % 255) + 1).chr }.join
$stdout.flush
sleep 1
$stdout.flush

C:\family\ruby>type get_bytes.rb
p `ruby put_bytes.rb`.length

C:\family\ruby>type multi_get.bat
@echo off
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb
ruby get_bytes.rb

C:\family\ruby> multi_get.bat
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348
348

C:\family\ruby>ver

Microsoft Windows XP [Version 5.1.2600]

C:\family\ruby>ruby -v
ruby 1.8.6 (2007-09-23 patchlevel 110) [i386-mswin32]

maybe we differ on the patchlevel?

kind regards -botp

 