Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > ASP .Net > How Would you Store this Data?

Reply
Thread Tools

How Would you Store this Data?

 
 
Jonathan Wood
Guest
Posts: n/a
 
      02-28-2010
I'm thinking about building a site that would include user-contributed
articles, file attachments, images, and maybe videos.

I'm trying to determine the best way to store this data. Specifically:

A) What are the pros and cons of storing the file attachments, images, and
videos in a database vs. storing them as separate files?

B) I've decided user-contributed articles should not be submitted as HTML.
I'd have some code to convert newlines to <br /> and otherwise convert the
articles to HTML. I'm thinking about separate database columns for both the
original and the converted articles. The advantages would be that the site
would be fast because no conversion would be needed when serving the
articles, and it would be possible for a contributor to obtain the original
if they wanted to edit it and resubmit. I'd be curious if others thought
this was the best approach.

I'm interested in hearing all ideas.

Thanks for any suggestions

--
Jonathan Wood
SoftCircuits Programming
http://www.softcircuits.com


 
Reply With Quote
 
 
 
 
Mr. Arnold
Guest
Posts: n/a
 
      02-28-2010
Jonathan Wood wrote:
> I'm thinking about building a site that would include user-contributed
> articles, file attachments, images, and maybe videos.
>
> I'm trying to determine the best way to store this data. Specifically:
>
> A) What are the pros and cons of storing the file attachments, images,
> and videos in a database vs. storing them as separate files?
>
> B) I've decided user-contributed articles should not be submitted as
> HTML. I'd have some code to convert newlines to <br /> and otherwise
> convert the articles to HTML. I'm thinking about separate database
> columns for both the original and the converted articles. The advantages
> would be that the site would be fast because no conversion would be
> needed when serving the articles, and it would be possible for a
> contributor to obtain the original if they wanted to edit it and
> resubmit. I'd be curious if others thought this was the best approach.
>
> I'm interested in hearing all ideas.


Those are files and you need a control that does file uploads and
downloads. The files should be kept on a file share or file share array.

The database table would only have a record that has the name of the
file, and the path to the file on the file share a pointer record.

You need a viewer control so when the file is selected for viewing, it
takes the file extension and makes a determination as to what
application would be used to allow the viewing of the file. If is's a
*.doc, then the viewer is going to select MS Word to view the file, etc,
etc.

The various application used to view a file will reside on the Web
server like a file association that the Windows O/S does is most likely
going to be invoked.

That's how you do a document repository that's controlled by a Web
application using a browser.



 
Reply With Quote
 
 
 
 
Alexey Smirnov
Guest
Posts: n/a
 
      02-28-2010
On Feb 28, 4:18*am, "Jonathan Wood" <(E-Mail Removed)> wrote:
> I'm thinking about building a site that would include user-contributed
> articles, file attachments, images, and maybe videos.
>
> I'm trying to determine the best way to store this data. Specifically:
>
> A) What are the pros and cons of storing the file attachments, images, and
> videos in a database vs. storing them as separate files?
>
> B) I've decided user-contributed articles should not be submitted as HTML..
> I'd have some code to convert newlines to <br /> and otherwise convert the
> articles to HTML. I'm thinking about separate database columns for both the
> original and the converted articles. The advantages would be that the site
> would be fast because no conversion would be needed when serving the
> articles, and it would be possible for a contributor to obtain the original
> if they wanted to edit it and resubmit. I'd be curious if others thought
> this was the best approach.
>
> I'm interested in hearing all ideas.
>
> Thanks for any suggestions
>
> --
> Jonathan Wood
> SoftCircuits Programminghttp://www.softcircuits.com


A) It depends on size of attachments and many other things. You should
estimate how many users, files and traffic you will get. Imagine that
your database would be 10 TB, are you sure that you can manage it? In
most cases you will get an answer that going for a file share is the
easiest and scalable way.

B) Replacing <br> on-the-fly should not be a performance problem
 
Reply With Quote
 
Andy B.
Guest
Posts: n/a
 
      02-28-2010

"Jonathan Wood" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> I'm thinking about building a site that would include user-contributed
> articles, file attachments, images, and maybe videos.
>
> I'm trying to determine the best way to store this data. Specifically:
>
> A) What are the pros and cons of storing the file attachments, images, and
> videos in a database vs. storing them as separate files?
>

There are lots of differences in storing in databases VS file system files.
The main thing for me is if the files them selves are going to be somewhat
on the huge size, consider file system storage. The good side of database
storage is that your files/videos are secured by default (you have to do
almost no extra work to keep them away from prying fingers) except the
streaming factor from the database to the browser. In file system storage,
you need to implement file security (and that can be quite a difficult area
to deal with). Either way you decide, make sure not to put your files that
will be accessed through the browser in the app_data folder. I tried this
and coldn't figure out why nothing worked. Come to find out, it has special
permissions on it preventing browser access of any kind.

> B) I've decided user-contributed articles should not be submitted as HTML.
> I'd have some code to convert newlines to <br /> and otherwise convert the
> articles to HTML. I'm thinking about separate database columns for both
> the original and the converted articles. The advantages would be that the
> site would be fast because no conversion would be needed when serving the
> articles, and it would be possible for a contributor to obtain the
> original if they wanted to edit it and resubmit. I'd be curious if others
> thought this was the best approach.
>

Converting from html to plain text on the fly shouldn't be a problem. Here
is a good article that shows you how to do it with Regex.
http://www.4guysfromrolla.com/webtech/042501-1.shtml

If you don't want people sending articles and stuff in html format, there
isn't any point in keeping the html version. If you don't be careful, it
could leave room for hackers into your database. Just be careful to close
all your holes in database access code if you do decide to keep the original
html version. You also might want to consider scanning the incoming article
text for any kind of scripting languages as well as any sql code. That way
you disarm someone if the intent is to mess with your website. Wish I had
some links for the scripting/sql scanning/removal code, but I use DotNetNuke
5.2 which has the security features I talked about already built in. Any
html/scripting/sql code will be rendered as plain text. The link above just
removes the html/xml code alltogether.


 
Reply With Quote
 
Jonathan Wood
Guest
Posts: n/a
 
      02-28-2010
Alexey Smirnov wrote:

> A) It depends on size of attachments and many other things. You should
> estimate how many users, files and traffic you will get. Imagine that
> your database would be 10 TB, are you sure that you can manage it? In
> most cases you will get an answer that going for a file share is the
> easiest and scalable way.


What do you mean by "are you sure that you can manage it?" What part of
managing do you question if I can do? (Sure not disk space as that would be
needed if I used files as well.)

> B) Replacing <br> on-the-fly should not be a performance problem


There would be other changes too, which could possibly include color-coding
source code.

Thanks.

--
Jonathan Wood
SoftCircuits Programming
http://www.softcircuits.com


 
Reply With Quote
 
Jonathan Wood
Guest
Posts: n/a
 
      02-28-2010
Andy B. wrote:

> There are lots of differences in storing in databases VS file system
> files. The main thing for me is if the files them selves are going to be
> somewhat on the huge size, consider file system storage. The good side of
> database storage is that your files/videos are secured by default (you
> have to do almost no extra work to keep them away from prying fingers)
> except the streaming factor from the database to the browser. In file
> system storage, you need to implement file security (and that can be quite
> a difficult area to deal with). Either way you decide, make sure not to
> put your files that will be accessed through the browser in the app_data
> folder. I tried this and coldn't figure out why nothing worked. Come to
> find out, it has special permissions on it preventing browser access of
> any kind.


Thanks for the last tip.

I can't think of any case where access to the files would be restricted to
anybody--they would be available to all anonymous users. If that's what you
meant by "secured", then that won't be an issue.

>> B) I've decided user-contributed articles should not be submitted as
>> HTML. I'd have some code to convert newlines to <br /> and otherwise
>> convert the articles to HTML. I'm thinking about separate database
>> columns for both the original and the converted articles. The advantages
>> would be that the site would be fast because no conversion would be
>> needed when serving the articles, and it would be possible for a
>> contributor to obtain the original if they wanted to edit it and
>> resubmit. I'd be curious if others thought this was the best approach.
>>

> Converting from html to plain text on the fly shouldn't be a problem. Here
> is a good article that shows you how to do it with Regex.
> http://www.4guysfromrolla.com/webtech/042501-1.shtml


Thanks for the link, but the articles won't initially have HTML. They might
be closer to Wikipedia articles that have their own coding, which must then
be converted to HTML. Also, as I mentioned elsewhere, my conversion might be
fairly involved and include color coding source code and other processing.

> html version. You also might want to consider scanning the incoming
> article text for any kind of scripting languages as well as any sql code.
> That way you disarm someone if the intent is to mess with your website.
> Wish I had some links for the scripting/sql scanning/removal code, but I
> use DotNetNuke 5.2 which has the security features I talked about already
> built in. Any html/scripting/sql code will be rendered as plain text.


Right, that's the main reason I don't want to accept articles that already
have HTML. (Any existing HTML not removed on review will simply be
HTML-encoded.)

Thanks.

Jon


 
Reply With Quote
 
Jonathan Wood
Guest
Posts: n/a
 
      02-28-2010
Well, that doesn't add much to the discuss about the pros and cons of
storing as files vs. storing in the database, but thanks for your comments.

Jon

"Mr. Arnold" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Jonathan Wood wrote:
>> I'm thinking about building a site that would include user-contributed
>> articles, file attachments, images, and maybe videos.
>>
>> I'm trying to determine the best way to store this data. Specifically:
>>
>> A) What are the pros and cons of storing the file attachments, images,
>> and videos in a database vs. storing them as separate files?
>>
>> B) I've decided user-contributed articles should not be submitted as
>> HTML. I'd have some code to convert newlines to <br /> and otherwise
>> convert the articles to HTML. I'm thinking about separate database
>> columns for both the original and the converted articles. The advantages
>> would be that the site would be fast because no conversion would be
>> needed when serving the articles, and it would be possible for a
>> contributor to obtain the original if they wanted to edit it and
>> resubmit. I'd be curious if others thought this was the best approach.
>>
>> I'm interested in hearing all ideas.

>
> Those are files and you need a control that does file uploads and
> downloads. The files should be kept on a file share or file share array.
>
> The database table would only have a record that has the name of the file,
> and the path to the file on the file share a pointer record.
>
> You need a viewer control so when the file is selected for viewing, it
> takes the file extension and makes a determination as to what application
> would be used to allow the viewing of the file. If is's a *.doc, then the
> viewer is going to select MS Word to view the file, etc, etc.
>
> The various application used to view a file will reside on the Web server
> like a file association that the Windows O/S does is most likely going to
> be invoked.
>
> That's how you do a document repository that's controlled by a Web
> application using a browser.
>
>
>

 
Reply With Quote
 
Andy B.
Guest
Posts: n/a
 
      03-01-2010

"Jonathan Wood" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Andy B. wrote:
>
>> There are lots of differences in storing in databases VS file system
>> files. The main thing for me is if the files them selves are going to be
>> somewhat on the huge size, consider file system storage. The good side of
>> database storage is that your files/videos are secured by default (you
>> have to do almost no extra work to keep them away from prying fingers)
>> except the streaming factor from the database to the browser. In file
>> system storage, you need to implement file security (and that can be
>> quite a difficult area to deal with). Either way you decide, make sure
>> not to put your files that will be accessed through the browser in the
>> app_data folder. I tried this and coldn't figure out why nothing worked.
>> Come to find out, it has special permissions on it preventing browser
>> access of any kind.

>
> Thanks for the last tip.
>
> I can't think of any case where access to the files would be restricted to
> anybody--they would be available to all anonymous users. If that's what
> you meant by "secured", then that won't be an issue.
>


That's what I originally meant. The thing you would want to decide is this:
1. Do you want the files to be accessed directly from a URL? Something like
www.yourDomain.com/Filename.mv4. This would allow anyone to bookmark/put it
in their favorites. Depending, this [might] cause some possible security
issues but I wouldn't know at this point what ones they would be. or
2. Have the website be a proxy for the files themselves. Possibly have a
generic handler dealing with the files directly. This would give you better
control on what happens to the files as they go to the browser stream.
Meaning you can tell the server what to do with the files before they are
downloaded. A good example would be when you see an mv4 file that is 300MB
in size on a website, you download it but find out it was actually zipped
and is now 200MB instead of the original uncompressed video. I want to use
this idea for my rss feeds sometime down the road. or
3. Database option. The link would be something like
www.yourDomain.com/fileID=12948&compress=1. This would get the file from the
database and set a query string flag to let you compress the file (if you
wanted to provide that feature). Database option for files has some
drawbacks:
A. It takes more load on the server especially for larger files because the
webserver has to make 2 trips. 1 to the database to get the file stream and
1 to the browser to give the user their content. All of a sudden, a 300MB
file just used 600MB of bandwidth. If you have to worry about bandwidth with
huge files, this might not be for you.
B. You have to write the code to get the files (binary) out of the database,
figure out how to temporarly store it (if you don't just stream it directly)
and then how you are going to get it to the user.
- Are you going to directly stream it with Response.Write? or
- Write it to the filesystem and generate a page with a link on it for
download.



 
Reply With Quote
 
Alexey Smirnov
Guest
Posts: n/a
 
      03-01-2010
On Feb 28, 4:30*pm, "Jonathan Wood" <(E-Mail Removed)> wrote:
> Alexey Smirnov wrote:
> > A) It depends on size of attachments and many other things. You should
> > estimate how many users, files and traffic you will get. Imagine that
> > your database would be 10 TB, are you sure that you can manage it? In
> > most cases you will get an answer that going for a file share is the
> > easiest and scalable way.

>
> What do you mean by "are you sure that you can manage it?" What part of
> managing do you question if I can do? (Sure not disk space as that would be
> needed if I used files as well.)
>
> > B) Replacing <br> on-the-fly should not be a performance problem

>
> There would be other changes too, which could possibly include color-coding
> source code.
>
> Thanks.
>
> --
> Jonathan Wood
> SoftCircuits Programminghttp://www.softcircuits.com


Managing large database could be a challenge. It's not only about disk
space but also hardware and software requirements, and technical
personnel. The database server for a large database would need more
support than a file share.

I have a project with a small 1GB SQL Server database where I have
about 20,000 small pictures (~50-100 Kb). It works well. This way was
chosen because of data replication over multiple web and SQL Servers,
easy backup and security. In order to decrease the load on the
database and to improve the system's performance, I use pre-cached
files for most requested pictures.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: How include a large array? Edward A. Falk C Programming 1 04-04-2013 08:07 PM
How would you design an XML file to store key-value pairs? Ramon F Herrera XML 3 07-27-2011 08:00 AM
sqldict - You have a dict with unlimited capacity, what do you do?(it can store arbitrary objects too) Krister Hedfors Python 0 04-15-2010 12:21 PM
an oddball scary kind of thing you would think would never happen richard Computer Support 4 01-31-2010 06:34 PM
to store or not to store an image =?Utf-8?B?UnVkeQ==?= ASP .Net 6 03-30-2005 05:51 AM



Advertisments