Document identification

 
 
M. Eteum
 
      06-01-2005
Dear Ruby Gurus:
Is there a way to identify a document's type from its header? I have a
bunch of documents collected over the years from multi-platform systems
(Mac, Windows, and various Unix/Linux variants), and some of the documents
do not have a file extension. Is there a list that tells us what header
to expect for certain document types, e.g. txt, rtf, pdf, jpg, mpg,
Word, Excel, Visio, etc.?

Thanks
 
 
 
 
 
Robin Stocker
 
      06-01-2005
M. Eteum wrote:
> Is there a way to identify any documents from its header? I have a
> bunch of document collected over the year from multi platform system,
> Mac, Windows, and various unix/linux variant where some of the document
> does not have file extension. Are there a list that tells us what header
> should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> word, excel, visio, etc ...


Hi,

On a Unix system you could use the "file" command; it can detect
file types even when there is no extension.
I don't know whether a Ruby module exists for this purpose, though.

Regards,
Robin


 
 
 
 
 
Austin Ziegler
 
      06-01-2005
On 6/1/05, Robin Stocker <(E-Mail Removed)> wrote:
> M. Eteum wrote:
> > Is there a way to identify any documents from its header? I have a
> > bunch of document collected over the year from multi platform system,
> > Mac, Windows, and various unix/linux variant where some of the document
> > does not have file extension. Are there a list that tells us what header
> > should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> > word, excel, visio, etc ...

> On a Unix system you could use the "file" command, it is able to detect
> file types even when there's no extension.
> I don't know if a Ruby module exists for this purpose though.


Not yet. I do plan on adding it to MIME::Types in the future.

-austin
--
Austin Ziegler * (E-Mail Removed)
* Alternate: (E-Mail Removed)


 
 
M. Eteum
 
      06-01-2005
Robin Stocker wrote:
> M. Eteum wrote:
>
>> Is there a way to identify any documents from its header? I have
>> a bunch of document collected over the year from multi platform
>> system, Mac, Windows, and various unix/linux variant where some of the
>> document does not have file extension. Are there a list that tells us
>> what header should we expect for certain documents e.g. txt, rtf, pdf,
>> jpg, mpg, word, excel, visio, etc ...

>
>
> Hi,
>
> On a Unix system you could use the "file" command, it is able to detect
> file types even when there's no extension.
> I don't know if a Ruby module exists for this purpose though.
>
> Regards,
> Robin
>
>

Thanks for the reply.

I'm running on Windows as well as Mac. We exchange files between both
operating systems. A Ruby module that could handle this would be nice, but
I'll take anything for now.

Thanks again
 
 
M. Eteum
 
      06-01-2005
Austin Ziegler wrote:
> On 6/1/05, Robin Stocker <(E-Mail Removed)> wrote:
>
>>M. Eteum wrote:
>>
>>> Is there a way to identify any documents from its header? I have a
>>>bunch of document collected over the year from multi platform system,
>>>Mac, Windows, and various unix/linux variant where some of the document
>>>does not have file extension. Are there a list that tells us what header
>>>should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
>>>word, excel, visio, etc ...

>>
>>On a Unix system you could use the "file" command, it is able to detect
>>file types even when there's no extension.
>>I don't know if a Ruby module exists for this purpose though.

>
>
> Not yet. I do plan on adding it to MIME::Types in the future.
>
> -austin


Super! Oh, by the way, do you know if Perl or Python has it? I'm quite
desperate to find a solution, so I'll take anything while
waiting for the Ruby module.

Thanks
 
 
Ilmari Heikkinen
 
      06-01-2005
On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
> Dear Ruby Guru:
> Is there a way to identify any documents from its header? I have a
> bunch of document collected over the year from multi platform system,
> Mac, Windows, and various unix/linux variant where some of the document
> does not have file extension. Are there a list that tells us what header
> should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> word, excel, visio, etc ...
>
> Thanks


Hello,

If you have shared-mime-info database installed
( http://freedesktop.org/wiki/Software..._2dmime_2dinfo )
you can use this: http://www.code-monkey.de/projects/mimeInfoRb.html
Or my extended version: http://dark.fhtr.org/mime_info_rb.tar.gz

From the README:


MimeInfo class provides an interface to query freedesktop.org's
shared-mime-info database. It can be used to guess a filename's
Mimetype and to get the description for the Mimetype.

require 'mime_info'

info = MimeInfo.get('foo.xml') #=> Mimetype['text/xml']
info.description
#=> "eXtensible Markup Language document"
info.description("de") #=> "XML-Dokument"

info2 = MimeInfo.get('foo.rb') #=> Mimetype['application/x-ruby']
info2.description #=> "Ruby script"
info2.is_a? Mimetype['text/plain'] #=> true

t = Mimetype['audio/x-mp3'] #=> Mimetype['audio/x-mp3']
t.description #=> "MP3 audio"
t.description('cy') #=> "Sain MP3"
t.descriptions['fr'] #=> "audio MP3"
t == Mimetype['audio']['x-mp3'] #=> true
t.is_a? Mimetype['audio'] #=> true
t.ancestors #=> [Mimetype['audio/x-mp3'], Mimetype['audio'],
# Mimetype['application/octet-stream'], Mimetype,
# Module, Object, Kernel]


HTH,

Ilmari



 
 
Austin Ziegler
 
      06-01-2005
On 6/1/05, Ilmari Heikkinen <(E-Mail Removed)> wrote:
> On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
> > Dear Ruby Guru:
> > Is there a way to identify any documents from its header? I have a
> > bunch of document collected over the year from multi platform system,
> > Mac, Windows, and various unix/linux variant where some of the document
> > does not have file extension. Are there a list that tells us what header
> > should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> > word, excel, visio, etc ...
> >
> > Thanks


Most of this is covered by MIME::Types on RubyForge. However, the OP
indicated that the problem was related to NOT having proper filename
extensions. The OP wants to look for magic numbers and strings.
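
For anyone unsure what looking for magic numbers involves, here is a
minimal Ruby sketch. It is not part of MIME::Types: `MAGIC_SIGNATURES`
and `sniff_type` are hypothetical names, the table is a small hand-picked
sample rather than the complete list the OP asked for, and the
"application/x-ole-storage" label is just one convention for OLE2 files.

```ruby
# Minimal magic-number sniffer. MAGIC_SIGNATURES and sniff_type are
# hypothetical names for illustration; the table covers only a few formats.
MAGIC_SIGNATURES = {
  "%PDF-".b                            => "application/pdf",
  "{\\rtf".b                           => "application/rtf",
  "\xFF\xD8\xFF".b                     => "image/jpeg",
  "\x89PNG\r\n\x1A\n".b                => "image/png",
  "GIF8".b                             => "image/gif",
  "PK\x03\x04".b                       => "application/zip",
  # OLE2 compound file: legacy Word, Excel, and Visio files all share this
  # header, so the leading bytes alone cannot tell them apart.
  "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b => "application/x-ole-storage",
}.freeze

def sniff_type(path)
  # Read only as many bytes as the longest signature needs, in binary mode.
  longest = MAGIC_SIGNATURES.keys.map(&:bytesize).max
  header = File.binread(path, longest) || "".b # nil means an empty file
  MAGIC_SIGNATURES.each do |magic, type|
    return type if header.start_with?(magic)
  end
  nil # unknown; plain text, for example, has no magic number
end
```

Formats without a signature (txt being the obvious one) fall through to
nil here, which is why tools like "file" add content heuristics on top
of a much larger signature table.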

-austin
--
Austin Ziegler * (E-Mail Removed)
* Alternate: (E-Mail Removed)


 
 
Ilmari Heikkinen
 
      06-01-2005
On Wed, 2005-06-01 at 23:33, Austin Ziegler wrote:
> On 6/1/05, Ilmari Heikkinen <(E-Mail Removed)> wrote:
> > On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
> > > Dear Ruby Guru:
> > > Is there a way to identify any documents from its header? I have a
> > > bunch of document collected over the year from multi platform system,
> > > Mac, Windows, and various unix/linux variant where some of the document
> > > does not have file extension. Are there a list that tells us what header
> > > should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> > > word, excel, visio, etc ...
> > >
> > > Thanks

>
> Most of this is covered by MIME::Types on RubyForge. However, the OP
> indicated that the problem was related to NOT having proper filename
> extensions. The OP wants to look for magic numbers and strings.
>


Shared-mime-info does this as well, though it may fare worse than file in
some cases.

kig@bauhaus:~$ mv fire.avi fire
kig@bauhaus:~$ irb
irb(main):001:0> require 'mime_info'
=> true
irb(main):002:0> MimeInfo.get('fire')
=> Mimetype['video/x-msvideo']




 
 
Martin DeMello
 
      06-02-2005
M. Eteum <(E-Mail Removed)> wrote:
>
> Super! Oh by the way, do you know if Perl or Python has it? I'm quite
> desperate to find the solution, therefore I'll take any solution while
> waiting for the Ruby modules.


Your best bet would be to find a Windows port of Unix's 'file' (Mac OS X
is bound to have it). Sadly, it's a very hard thing to google
for.

martin
 
 
Martin DeMello
 
      06-02-2005
Martin DeMello <(E-Mail Removed)> wrote:
> M. Eteum <(E-Mail Removed)> wrote:
> >
> > Super! Oh by the way, do you know if Perl or Python has it? I'm quite
> > desperate to find the solution, therefore I'll take any solution while
> > waiting for the Ruby modules.

>
> Your best bet would be to find a windows port of unix's 'file' (Mac OSX
> is definitely bound to have it). Sadly, it's a very hard thing to google
> for


You're in luck - gnuwin32 includes a port of file.

http://gnuwin32.sourceforge.net/summary.html

All you need to do is a = `file.exe #{filename}`
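
One caveat with backticks: the string goes through the shell, so a
filename containing spaces or quotes can break the command. A slightly
more defensive sketch, assuming the program is on your PATH
(`describe_file` is a hypothetical helper name, not something gnuwin32
or Ruby ships):

```ruby
require "open3"

# Ask an external `file` program to describe a path. `describe_file` is a
# hypothetical helper; `command` defaults to "file" and would be "file.exe"
# (or a full path to it) for the gnuwin32 port.
def describe_file(path, command: "file")
  # Passing the arguments separately bypasses the shell, so filenames
  # containing spaces or quotes reach the program unchanged.
  output, status = Open3.capture2(command, path)
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  nil # the command is not installed or not on PATH
end
```

Usage would then be something like
a = describe_file(filename, command: "file.exe"), with nil meaning the
program could not be run or exited with an error.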

martin
 