Design of a pipelined architecture/framework for handling large data sets

 
 
nish
Guest
11-30-2006
I am facing an inconvenience which I believe other Java developers must
have faced before, but I am finding it difficult to articulate it in
keywords so that Google will give me the right answers. So here goes:

1. I am using the Eclipse IDE with multiple Java projects, each one
sourced from a CVS repository on an external server on the local LAN.
2. Almost all of these projects handle big data sets (read: 100 MB -
500 MB of XML and text files), basically data crawled from the web.
Each project acts on the data, transforms it in some way, and then
passes it along for other projects to act on. Some of the data is in
single big files and some of it is in hundreds of small files inside a
single directory. (A rough sketch of the pipeline shape I have in mind
follows this list.)

Basically, what I am looking for is a better way to handle this data.
Currently, if I put the data in CVS it is not that efficient; plus,
there needs to be some central lookup for all the data. I guess this is
partly a Java design question and partly ignorance on my part about the
right tools for the job.
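
For the central lookup, what I picture is a small catalog file on the
server that maps logical data set names to their locations, so that no
project hard-codes paths. Again just a sketch; the file format and the
class name DataCatalog are made up:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/**
 * Loads a properties file mapping data set names to directories, e.g.
 *   crawl-2006-11=/mnt/data/crawl/2006-11
 *   crawl-index=/mnt/data/index
 */
public class DataCatalog {
    private final Properties entries = new Properties();

    public DataCatalog(File catalogFile) throws IOException {
        InputStream in = new FileInputStream(catalogFile);
        try {
            entries.load(in);
        } finally {
            in.close();
        }
    }

    /** Returns the location of the named data set. */
    public File locate(String dataSetName) {
        String path = entries.getProperty(dataSetName);
        if (path == null) {
            throw new IllegalArgumentException("Unknown data set: " + dataSetName);
        }
        return new File(path);
    }
}

That way the catalog file is the one small thing checked into CVS,
while the big files themselves live outside it.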

Thanks for any help.

 
nish
Guest
11-30-2006
Other issues I could think of:

3. I should be able to specify how a data set is archived. For example,
for some large data sets I don't want revisioning in CVS because the
data is never going to change; for others I might want them checked
into CVS so that they get revisioned. (A sketch of the per-data-set
policy I mean is below.)
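
So each data set would carry its own archiving policy that the
framework reads before deciding whether to commit anything to CVS. A
minimal sketch, with made-up names:

/** How a data set should be archived; read by the framework, not by CVS. */
public enum ArchivePolicy {
    NONE,          // static data: store once outside CVS, never revision
    CVS_VERSIONED  // changing data: check into CVS so it gets revisioned
}

/** Metadata attached to each data set. */
class DataSetInfo {
    private final String name;
    private final ArchivePolicy policy;

    DataSetInfo(String name, ArchivePolicy policy) {
        this.name = name;
        this.policy = policy;
    }

    String getName()          { return name; }
    ArchivePolicy getPolicy() { return policy; }
}

The framework would skip CVS entirely for NONE data sets and only
commit the ones marked CVS_VERSIONED.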




 
