Best way to store a large number of files?

 
 
heather.fraser@gmail.com
 
      10-08-2005
Hello everybody,

I am creating an Image library application with Java which will
store several million files on the file system.
Metadata describing the images will be stored in a database but
I think it's probably faster if the actual image files are stored
on the file system with a reference stored in the database.

As I understand it, storing all of the files in one single
directory would become slow in look-ups. And so I am thinking
of giving each image a 10-digit number and placing the image
in a directory structure such as this ~

/1/2/3/4/5/6/7/8/9/x.png

For example, if an image is 2749282749.png then it will be stored
as 9.png in the subdirectory /2/7/4/9/2/8/2/7/4

Is it really that simple? Are there any caveats that I should
be aware of?

thank you very much,

Heather

 
 
 
 
 
Mark Thornton
 
      10-08-2005
(E-Mail Removed) wrote:
> Hello everybody,
>
> I am creating an Image library application with Java which will
> store several million files on the file system.
> Metadata describing the images will be stored in a database but
> I think it's probably faster if the actual image files are stored
> on the file system with a reference stored in the database.
>
> As I understand it, storing all of the files in one single
> directory would become slow in look-ups.


It depends on the file system in use. This problem does occur for FAT
but not for NTFS, for example.

>And so I am thinking
> of giving each image a 10-digit number and placing the image
> in a directory structure such as this ~


A better approach might be to use MessageDigest to compute a hash of the
file and use that to derive the file path and name. This would result in
identical files being located in the same place. You should also
experiment with the number of 'digits' to use at each level; one is
probably too few, two or three is likely to be more efficient. Otherwise
the approach is reasonable and is used by a number of applications.
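
A minimal sketch of the hash-based layout described above. The class name HashedPath, the choice of SHA-256, the two-level split and the "images" root are all assumptions made for illustration; the post only says to use MessageDigest and to experiment with the number of digits per level.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashedPath {
    /** Returns the SHA-256 digest of the content as 64 lowercase hex characters. */
    static String sha256Hex(byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(content)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /**
     * Derives root/ab/cd/<full hash>.<extension> from the file content.
     * Two hex characters per level gives 256 entries per directory (assumed arity).
     */
    static Path pathFor(Path root, byte[] content, String extension) {
        String hex = sha256Hex(content);
        return root.resolve(hex.substring(0, 2))
                   .resolve(hex.substring(2, 4))
                   .resolve(hex + "." + extension);
    }

    public static void main(String[] args) {
        byte[] image = "example image bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(pathFor(Paths.get("images"), image, "png"));
    }
}
```

Because the path depends only on the bytes, two identical files map to the same location, which is the de-duplication effect mentioned above.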

Mark Thornton
 
 
 
 
 
Kenneth P. Turvey
 
      10-08-2005
On Sat, 08 Oct 2005 06:50:21 -0700, heather.fraser wrote:

[Snip]
> /1/2/3/4/5/6/7/8/9/x.png
>
> For example, if an image is 2749282749.jpg then the image 9.png
> will be placed in the subdirectory /2/7/4/9/2/8/2/7/4
>
> Is it really that simple? Are there any caveats that I should
> be aware of?


This is pretty much exactly how many news servers store articles in the
filesystem. You don't need that many levels of directories though. Your
design will work fine under Unix. I can't say for other platforms, but
I would expect it to be fine.

--
Kenneth P. Turvey <(E-Mail Removed)>
http://kt.squeakydolphin.com (not much there yet)
Jabber IM: (E-Mail Removed)
Phone: (314) 255-2199

 
 
Roedy Green
 
      10-09-2005
On 8 Oct 2005 06:50:21 -0700, (E-Mail Removed) wrote or quoted:

>Is it really that simple? Are there any caveats that I should
>be aware of?


Create your directory structure first. You can't create a file without
the directory structure in place.

It is primarily Windows 98 and its FAT file system that has trouble
with long linear searches of directories. Your scheme has 10
directories per level and 10 leaf files per directory.

You might try your code with 100, 256 or 1000 entries per node to find
the most efficient setting, perhaps even making the arity a
platform-specific configuration option.

You are probably best to put the entire index name in the leaf file
name to avoid confusion, especially if files are copied about.
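
A rough sketch of how the two suggestions above could be combined, with the arity adjustable and the full 10-digit number kept in the leaf name. The class name ShardedPath and the parameters digitsPerLevel and levels are hypothetical names introduced here, not part of the original scheme.

```java
class ShardedPath {
    /**
     * Builds a relative path for a 10-digit image number.
     * digitsPerLevel = 1 gives 10 directories per level, 2 gives 100, 3 gives 1000.
     * The leaf file name always carries the full zero-padded number.
     */
    static String pathFor(long id, int digitsPerLevel, int levels, String extension) {
        if (id < 0 || id > 9_999_999_999L) {
            throw new IllegalArgumentException("id must fit in 10 digits: " + id);
        }
        if (digitsPerLevel < 1 || levels < 1 || digitsPerLevel * levels >= 10) {
            throw new IllegalArgumentException("directory levels must use fewer than 10 digits");
        }
        String name = String.format("%010d", id);
        StringBuilder path = new StringBuilder();
        for (int level = 0; level < levels; level++) {
            int start = level * digitsPerLevel;
            // Each directory name is the next group of digits from the number.
            path.append(name, start, start + digitsPerLevel).append('/');
        }
        return path.append(name).append('.').append(extension).toString();
    }

    public static void main(String[] args) {
        // The original layout: one digit per level, nine levels.
        System.out.println(pathFor(2749282749L, 1, 9, "jpg")); // 2/7/4/9/2/8/2/7/4/2749282749.jpg
        // A wider, shallower tree: 1000 directories per level, three levels.
        System.out.println(pathFor(2749282749L, 3, 3, "jpg")); // 274/928/274/2749282749.jpg
    }
}
```

Timing stores and look-ups with a few (digitsPerLevel, levels) pairs on the target file system is one way to pick the setting.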
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.
 
 
Jon Martin Solaas
 
      10-09-2005
Mark Thornton wrote:
> (E-Mail Removed) wrote:
>
>> Hello everybody,
>>
>> I am creating an Image library application with Java which will
>> store several million files on the file system.
>> Metadata describing the images will be stored in a database but
>> I think it's probably faster if the actual image files are stored
>> on the file system with a reference stored in the database.
>>
>> As I understand it, storing all of the files in one single
>> directory would become slow in look-ups.

>
>
> It depends on the file system in use. This problem does occur for FAT
> but not for NTFS for example.


Are you joking? NTFS is better at handling large numbers of files, but
it surely becomes a problem when the number is large enough.

--
jon martin solaas
 
 
Mark Thornton
 
      10-09-2005
Jon Martin Solaas wrote:
> Mark Thornton wrote:
>
>> (E-Mail Removed) wrote:
>>
>>> Hello everybody,
>>>
>>> I am creating an Image library application with Java which will
>>> store several million files on the file system.
>>> Metadata describing the images will be stored in a database but
>>> I think it's probably faster if the actual image files are stored
>>> on the file system with a reference stored in the database.
>>>
>>> As I understand it, storing all of the files in one single
>>> directory would become slow in look-ups.

>>
>>
>>
>> It depends on the file system in use. This problem does occur for FAT
>> but not for NTFS for example.

>
>
> Are you joking? NTFS is better at handling large number of files, but
> sure it becomes a problem when the number is large enough.
>


Given that NTFS uses a tree structure for directories, it won't have any
more problem than using the hierarchy of directories proposed by the OP.
It certainly is happy with many thousands of entries in a directory. For
Linux fans I think ReiserFS has similar properties.

Mark Thornton
 
 
Drazen Gemic
 
      10-09-2005
> A better approach might be to use MessageDigest to compute a hash of the
> file and use that to derive the file path and name. This would result in


A similar approach is used by Squid, a caching proxy. It creates a hash
code out of URLs. Rest assured that it can store and access files quickly
and deals with millions of files without any effort.

DG
 
 
Andrey Kuznetsov
 
      10-09-2005
>> As I understand it, storing all of the files in one single
>> directory would become slow in look-ups.

>
> It depends on the file system in use. This problem does occur for FAT but
> not for NTFS for example.


But with Java you will get HUGE problems in this case - think about
File#list().

--
Andrey Kuznetsov
http://uio.imagero.com Unified I/O for Java
http://reader.imagero.com Java image reader
http://jgui.imagero.com Java GUI components and utilities


 
 
Andrey Kuznetsov
 
      10-09-2005
> Create your directory structure first. You can't create a file without
> the directory structure in place.


You can create missing directories with mkdirs().
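
A small sketch of that idea using java.io.File. The class name StoreFile and the method store are made up for the example; only File#mkdirs() comes from the post.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class StoreFile {
    /** Writes content to target, creating any missing parent directories first. */
    static void store(File target, byte[] content) throws IOException {
        File parent = target.getParentFile();
        // mkdirs() returns false when the directory already exists,
        // so isDirectory() is checked before treating that as a failure.
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Could not create directory " + parent);
        }
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location, relative to the working directory.
        store(new File("images/27/49/2749282749.jpg"), new byte[] {1, 2, 3});
    }
}
```

This way the directory tree fills in on demand instead of being created up front.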

--
Andrey Kuznetsov
http://uio.imagero.com Unified I/O for Java
http://reader.imagero.com Java image reader
http://jgui.imagero.com Java GUI components and utilities


 
 
Mark Thornton
 
      10-09-2005
Andrey Kuznetsov wrote:
>>>As I understand it, storing all of the files in one single
>>>directory would become slow in look-ups.

>>
>>It depends on the file system in use. This problem does occur for FAT but
>>not for NTFS for example.

>
>
> but with java you will get HUGE problems in this case - think about
> File#list().
>


The OP's task may not need to use the 'list' method. We also hope that
JSR-203 will eventually provide a way around this problem.
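
For readers on Java 7 or later: JSR-203 shipped there as the java.nio.file package, and its DirectoryStream iterates over entries lazily instead of building the whole array that File#list() returns. A minimal sketch, with the class name LazyListing and the method countEntries invented for illustration:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LazyListing {
    /** Counts the entries of a directory without loading all names at once. */
    static long countEntries(Path dir) throws IOException {
        long count = 0;
        // The stream must be closed, hence try-with-resources.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Uses the current directory unless a path is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countEntries(dir) + " entries in " + dir);
    }
}
```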

Mark Thornton
 
 
 
 