Velocity Reviews - Computer Hardware Reviews

Test if file is binary ?

 
 
Rebhan, Gilbert
Guest
Posts: n/a
 
      08-21-2007

Hi,

How can I test whether a file is binary or not?

There doesn't seem to be anything like File.binary?:
NoMethodError: undefined method `binary?' for File:Class

Any ideas or libraries available?

Regards, Gilbert

 
dima
Guest
Posts: n/a
 
      08-21-2007
On Aug 21, 8:04 am, "Rebhan, Gilbert" <(E-Mail Removed)>
wrote:
> Hi,
>
> How can I test whether a file is binary or not?
>
> There doesn't seem to be anything like File.binary?:
> NoMethodError: undefined method `binary?' for File:Class
>
> Any ideas or libraries available?
>
> Regards, Gilbert


What do you need to achieve with this is_binary? method?
All files are just collections of bytes, so from one perspective they
are all binary. We interpret them as suits our needs.

 
Rebhan, Gilbert
Guest
Posts: n/a
 
      08-21-2007
Hi,

-----Original Message-----
From: dima [(E-Mail Removed)]
Sent: Tuesday, August 21, 2007 8:50 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

On Aug 21, 8:04 am, "Rebhan, Gilbert" <(E-Mail Removed)>
wrote:
>> Hi,
>>
>> How can I test whether a file is binary or not?
>>
>> There doesn't seem to be anything like File.binary?:
>> NoMethodError: undefined method `binary?' for File:Class
>>
>> Any ideas or libraries available?


> What do you need to achieve with this is_binary? method?
> All files are just collections of bytes, so from one perspective they
> are all binary. We interpret them as suits our needs.


For example, this information is needed to decide whether
CVS should handle a file (or file extension) as binary or ASCII.

Regards, Gilbert


 
Robert Klemme
Guest
Posts: n/a
 
      08-21-2007
2007/8/21, Rebhan, Gilbert <(E-Mail Removed)>:
>
> Hi,
>
> How can I test whether a file is binary or not?
>
> There doesn't seem to be anything like File.binary?:
> NoMethodError: undefined method `binary?' for File:Class
>
> Any ideas or libraries available?


If I really needed it, I'd probably use a heuristic based on the
distribution of byte values across an initial portion of the file.
Something like this:

class File
  # Heuristic: classify the first 1024 bytes and compare the proportions.
  def self.binary?(name)
    ascii = control = binary = 0

    # read(1024) returns nil for an empty file, hence the to_s
    File.open(name, "rb") {|io| io.read(1024)}.to_s.each_byte do |bt|
      case bt
      when 0...32
        control += 1
      when 32...128
        ascii += 1
      else
        binary += 1
      end
    end

    control.to_f / ascii > 0.1 || binary.to_f / ascii > 0.05
  end
end
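
A quick way to try a heuristic like this is to run it against throwaway files. The sketch below restates the same byte-distribution idea as a standalone method and uses Tempfile from the standard library. The names byte_mix_binary? and check_sample and the sample contents are made up for illustration, and the thresholds are simply the ones from the code above, not tuned values.

```ruby
require "tempfile"

# Hypothetical standalone version of the byte-distribution heuristic:
# count control, printable ASCII and high-bit bytes in the first 1024 bytes.
def byte_mix_binary?(path)
  sample = File.open(path, "rb") { |io| io.read(1024) }.to_s
  control = ascii = high = 0
  sample.each_byte do |b|
    case b
    when 0...32   then control += 1
    when 32...128 then ascii += 1
    else               high += 1
    end
  end
  control.to_f / ascii > 0.1 || high.to_f / ascii > 0.05
end

# Write the given bytes to a temporary file and run the heuristic on it.
def check_sample(bytes)
  verdict = nil
  Tempfile.create("sample") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    verdict = byte_mix_binary?(f.path)
  end
  verdict
end

# Invented samples: ten lines of English text, and every byte value four times.
text_result   = check_sample("The quick brown fox jumps over the lazy dog.\n" * 10)
binary_result = check_sample((0..255).to_a.pack("C*") * 4)
puts "text sample classified as binary?   #{text_result}"
puts "binary sample classified as binary? #{binary_result}"
```

Note that newlines and tabs count as control characters here, so text made of very short lines can cross the 0.1 threshold.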

Kind regards

robert

 
Rebhan, Gilbert
Guest
Posts: n/a
 
      08-21-2007
Hi,

-----Original Message-----
From: Robert Klemme [(E-Mail Removed)]
Sent: Tuesday, August 21, 2007 9:05 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

2007/8/21, Rebhan, Gilbert <(E-Mail Removed)>:
>
> Hi,
>
> How can I test whether a file is binary or not?
>
> There doesn't seem to be anything like File.binary?:
> NoMethodError: undefined method `binary?' for File:Class
>
> Any ideas or libraries available?


/*

If I really needed it, I'd probably use a heuristic based on the
distribution of byte values across an initial portion of the file.
Something like this:

class File
  # Heuristic: classify the first 1024 bytes and compare the proportions.
  def self.binary?(name)
    ascii = control = binary = 0

    # read(1024) returns nil for an empty file, hence the to_s
    File.open(name, "rb") {|io| io.read(1024)}.to_s.each_byte do |bt|
      case bt
      when 0...32
        control += 1
      when 32...128
        ascii += 1
      else
        binary += 1
      end
    end

    control.to_f / ascii > 0.1 || binary.to_f / ascii > 0.05
  end
end

*/


Nice, thanks!!

Regards, Gilbert

 
Alex Gutteridge
Guest
Posts: n/a
 
      08-21-2007
On 21 Aug 2007, at 15:57, Rebhan, Gilbert wrote:

> Hi,
>
> -----Original Message-----
> From: dima [(E-Mail Removed)]
> Sent: Tuesday, August 21, 2007 8:50 AM
> To: ruby-talk ML
> Subject: Re: Test if file is binary ?
>
> On Aug 21, 8:04 am, "Rebhan, Gilbert" <(E-Mail Removed)>
> wrote:
>>> Hi,
>>>
>>> How can I test whether a file is binary or not?
>>>
>>> There doesn't seem to be anything like File.binary?:
>>> NoMethodError: undefined method `binary?' for File:Class
>>>
>>> Any ideas or libraries available?
>
>> What do you need to achieve with this is_binary? method?
>> All files are just collections of bytes, so from one perspective they
>> are all binary. We interpret them as suits our needs.
>
> For example, this information is needed to decide whether
> CVS should handle a file (or file extension) as binary or ASCII.
>
> Regards, Gilbert


One simple approach is this:

class File
  def is_binary?
    # Despite its name, ascii counts NUL and high-bit (non-ASCII) bytes.
    ascii = 0
    total = 0
    # read(1024) returns nil at end of file, hence the to_s
    self.read(1024).to_s.each_byte{|c| total += 1; ascii += 1 if c >= 128 or c == 0}
    ascii.to_f / total.to_f > 0.33 ? true : false
  end
end

You can tweak the 0.33 value if you like. There are probably better
(i.e. more robust) ways out there, though.

Alex Gutteridge

Bioinformatics Center
Kyoto University



 
Alex Gutteridge
Guest
Posts: n/a
 
      08-21-2007
Sorry for the duplicate! Robert is too fast for me.

Alex Gutteridge

Bioinformatics Center
Kyoto University



 
Robert Klemme
Guest
Posts: n/a
 
      08-21-2007
2007/8/21, Alex Gutteridge <(E-Mail Removed)-u.ac.jp>:
> Sorry for the duplicate! Robert is too fast for me.


It's always good to see more solutions. I like the conciseness of
your solution. But I think this should rather be a class method
because you would not do the test on an open stream. Dunno which of
the solutions is more realistic. Might be fun to let both approaches
test a large number of files and compare their results (probably also
with output from "file").
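
Such a comparison could be sketched roughly as follows. Both heuristics from this thread are restated as lambdas over a byte sample so the snippet stands alone. The names HEURISTICS, sample_of and compare are invented for this sketch, and the comparison against the output of "file" is left out.

```ruby
# Both heuristics from this thread, restated as lambdas on a byte sample.
HEURISTICS = {
  "distribution" => lambda { |sample|
    control = sample.each_byte.count { |b| b < 32 }
    ascii   = sample.each_byte.count { |b| b >= 32 && b < 128 }
    high    = sample.bytesize - control - ascii
    control.to_f / ascii > 0.1 || high.to_f / ascii > 0.05
  },
  "nul-or-high" => lambda { |sample|
    odd = sample.each_byte.count { |b| b >= 128 || b == 0 }
    odd.to_f / sample.bytesize > 0.33
  }
}

# First 1024 bytes of a file as a binary string ("" for an empty file).
def sample_of(path)
  File.open(path, "rb") { |io| io.read(1024) }.to_s
end

# Returns { path => { heuristic name => true/false } } for the given paths.
def compare(paths)
  paths.each_with_object({}) do |path, table|
    sample = sample_of(path)
    table[path] = HEURISTICS.transform_values { |h| h.call(sample) }
  end
end

# Print the files named on the command line on which the heuristics disagree.
compare(ARGV).each do |path, verdicts|
  puts "#{path}: #{verdicts.inspect}" if verdicts.values.uniq.size > 1
end
```

Run over a large directory tree, the disagreements are the interesting cases: they show where the two sets of thresholds actually differ in practice.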

Btw, you should get rid of the ternary operator - it's totally
superfluous because there is no point in converting a boolean value
into a boolean value.

Kind regards

robert

 
Rebhan, Gilbert
Guest
Posts: n/a
 
      08-21-2007
-----Original Message-----
From: Robert Klemme [(E-Mail Removed)]
Sent: Tuesday, August 21, 2007 9:41 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

2007/8/21, Alex Gutteridge <(E-Mail Removed)-u.ac.jp>:
> Sorry for the duplicate! Robert is too fast for me.


/*
It's always good to see more solutions. I like the conciseness of
your solution. But I think this should rather be a class method
because you would not do the test on an open stream. Dunno which of
the solutions is more realistic.
*/

You mean it should be something like this?

class File
  def self.is_binary?(name)
    ascii = total = 0
    # read(1024) returns nil for an empty file, hence the to_s
    File.open(name, "rb") { |io| io.read(1024) }.to_s.each_byte do |c|
      total += 1
      ascii += 1 if c >= 128 or c == 0
    end
    ascii.to_f / total.to_f > 0.33
  end
end


/*
Might be fun to let both approaches
test a large number of files and compare their results (probably also
with output from "file").
*/

Is there an existing standard for what is considered a binary file,
meaning a rule like: check the first block of a file and

- if control characters (ASCII 0-32) and "high ASCII" (> 128) make up
more than 30 %, it's considered a binary file, otherwise a text file

- if no control characters (ASCII 0-32) or "high ASCII" (> 128) are found
at all, it's always considered a text file

??


Regards, Gilbert
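
No formal standard is cited in this thread. One rule of thumb that is simple to implement is to call a file binary as soon as a NUL byte shows up in an initial chunk, since NUL essentially never occurs in plain 8-bit text (UTF-16 text being a notable exception). Git is commonly described as using a check along these lines. A minimal sketch, where the method names nul_byte_binary? and write_and_check and the 8000-byte window are assumptions chosen for illustration:

```ruby
require "tempfile"

# Hypothetical helper: treat a file as binary if a NUL byte occurs in the
# first `window` bytes. The default window size is an assumption.
def nul_byte_binary?(path, window = 8000)
  chunk = File.open(path, "rb") { |io| io.read(window) }.to_s
  chunk.include?("\0")
end

# Write the given bytes to a temporary file and apply the check to it.
def write_and_check(bytes)
  verdict = nil
  Tempfile.create("nulcheck") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    verdict = nul_byte_binary?(f.path)
  end
  verdict
end

puts write_and_check("just some text\n")
puts write_and_check("GIF89a\x00\x01".b)
```

Unlike the percentage-based rules above, this has no threshold to tune, at the cost of missing binary formats that happen to contain no NUL near the start.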




 
Xavier Noria
Guest
Posts: n/a
 
      08-21-2007
On Aug 21, 2007, at 10:21 AM, Rebhan, Gilbert wrote:

> Is there an existing standard for what is considered a binary file,
> meaning a rule like: check the first block of a file and
>
> - if control characters (ASCII 0-32) and "high ASCII" (> 128) make up
> more than 30 %, it's considered a binary file, otherwise a text file
>
> - if no control characters (ASCII 0-32) or "high ASCII" (> 128) are found
> at all, it's always considered a text file
>
> ??


What's the heuristic in Subversion?

-- fxn


 