#1 |
anal_aviator wrote:
> Hi,
>
> I have about 40,000 blocks sequentially in a file. Each block is 1024 bytes.
>
> My application can either read or write blocks, but what I need to do is
> keep the original file intact, and build up a log, so that when a block is
> written, any further reads read from 'that' block and not the original
> file.

Sounds like a cache. You could hold the updated blocks in a hash table or tree or some such, keyed on block number, search the cache first and refer to the original file only when the cache doesn't hold the block of interest.

But at forty megabytes it's probably simpler just to make a working copy of the original file and muck with the copy as freely as you like. Maybe just read it all into memory and hammer on it if persistence isn't needed.

> So basically we start off with a virgin file, which is the root, then log and
> continually repoint as blocks are written to some place else.
> The sticky problem is that we need to keep a history of the blocks that were
> re-written along with the data that they held.

So you create another sequential file in which you log each change as you make it. You'll surely want the pre-change data, possibly the post-change data as well, maybe with time stamps and other decorations of your choosing.

> Ideally each block in the master file should have some sort of linked list/
> history attached to it, that not only shows the history of writes/reads to
> a given block, but maintains them in a context of r/w to the overall file.

Hunh? Didn't you say you needed to keep the original file intact? How does that square with making changes to the file? I don't get it.

You can record the reads as well as the writes in your log, if you feel like it. (What is this: Some kind of security app where you want to be able to prove that So-And-So peeked at Such-And-Such's tax records outside office hours?) You'll need to make some decisions about how to archive your ever-expanding log, though.

> it is very much like a transactional file system, fortunately the number of
> writes will be minimal, so it is easily manageable from a storage point of
> view.

If you want the semantics of a transactional file system, you might consider using -- forgive me if this idea is simply too weird -- a transactional file system ...

> It must also be very fast, generally it should not get slower if a
> particular block starts to build up a chain of writes, the idea being that a
> read goes straight to the head of the last write to a particular block. (so
> a linked list is out), also maintenance should be minimal.

Sorry; I'm unable to decipher this last set of requirements. What, exactly, must be "fast?" How "fast" must it be to qualify as "very fast?" Are you talking about throughput or about latency? And what do you mean by "maintenance" (programmer time, reorg time, backup time, ...)? And how far are you prepared to compromise on the other requirements to keep it "minimal?"

--
Eric Sosman
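For anyone who wants to see the cache idea from this post in code, here is a minimal sketch, assuming the blocks sit back to back in the file at offset blockNo * 1024. The class name `OverlayBlockStore` and its methods are made up for illustration; nothing here comes from the original poster's application. Writes land in a hash table keyed on block number, reads check the table first, and the file is opened read-only so it cannot be changed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical copy-on-write view of a block file: writes stay in memory. */
class OverlayBlockStore implements AutoCloseable {
    static final int BLOCK_SIZE = 1024;

    // Opened in "r" mode, so this class cannot modify the original file.
    private final RandomAccessFile original;
    // Updated blocks, keyed on block number.
    private final Map<Integer, byte[]> updated = new HashMap<>();

    OverlayBlockStore(String path) throws IOException {
        this.original = new RandomAccessFile(path, "r");
    }

    /** Returns the cached copy if the block was written, else the original. */
    byte[] read(int blockNo) throws IOException {
        byte[] cached = updated.get(blockNo);
        if (cached != null) {
            return cached.clone();
        }
        byte[] buf = new byte[BLOCK_SIZE];
        // Assumption: block n starts at byte offset n * BLOCK_SIZE.
        original.seek((long) blockNo * BLOCK_SIZE);
        original.readFully(buf);
        return buf;
    }

    /** Stores a new version of the block in the cache only. */
    void write(int blockNo, byte[] data) {
        if (data.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("block must be " + BLOCK_SIZE + " bytes");
        }
        updated.put(blockNo, data.clone());
    }

    @Override
    public void close() throws IOException {
        original.close();
    }
}
```

This keeps only the newest version of each block and loses everything on exit, so on its own it gives the 'snapshot' behaviour and not a history; a change log would have to be added alongside it.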
#2 |
On Sat, 21 Feb 2009 08:57:03 -0500, Eric Sosman wrote:
> anal_aviator wrote:
>> it is very much like a transactional file system, fortunately the
>> number of writes will be minimal, so it is easily manageable from a
>> storage point of view.
>
> If you want the semantics of a transactional file system,
> you might consider using -- forgive me if this idea is simply too weird
> -- a transactional file system ...
>

So why not make it one and use a database table? Each row would hold the original block together with a two-field prime key. The first field would be the sequence number of the original block in its file and the second would start from zero and be incremented for each edited copy of the block. Add any more information you might need, e.g. the edit timestamp and user name, and you're done.

Access is fast since the prime key index is only 40,000+ entries. Changes are easily found by comparing the block keyed with n,k-1 with n,k, and change logs are easily extracted.

>> It must also be very fast , generally it should not get slower if a
>> particular block starts to build up a chain of writes,
>

The speed impact should be small:

    select block_content from block_table
    where seqno=? order by editno desc limit 1;

The rows are fairly small (a payload of only 1K bytes is nothing), so the extract and sort needed to retrieve the latest edit should be fast, and only the required data would be returned to the JDBC client.

--
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
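To show why the two-field key makes the "latest edit" lookup cheap, here is a small in-memory analogue in Java. It is not the database table itself and uses no JDBC: a `TreeMap` stands in for the prime key index, and `BlockTable` with its method names is hypothetical. It also assumes seqno and editno are non-negative. Because the keys sort by seqno first and editno second, the newest edit of a block is the last entry at or below (seqno, maximum editno), which an ordered index finds in one probe however many edits exist.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a table keyed on (seqno, editno). */
class BlockTable {
    // Sorted like a composite key index: seqno in the high 32 bits,
    // editno in the low 32 bits. Assumes both values are non-negative.
    private final TreeMap<Long, byte[]> rows = new TreeMap<>();

    private static long key(int seqno, int editno) {
        return ((long) seqno << 32) | (editno & 0xFFFFFFFFL);
    }

    /** Adds the next edit of a block and returns its editno (0 is the first row). */
    int insert(int seqno, byte[] content) {
        int editno = latestEditNo(seqno) + 1;
        rows.put(key(seqno, editno), content.clone());
        return editno;
    }

    /** Highest editno stored for seqno, or -1 if the block has no rows. */
    int latestEditNo(int seqno) {
        Map.Entry<Long, byte[]> e = rows.floorEntry(key(seqno, Integer.MAX_VALUE));
        // The floor entry may belong to a smaller seqno, so check it.
        if (e == null || (int) (e.getKey() >>> 32) != seqno) {
            return -1;
        }
        return (int) (e.getKey() & 0xFFFFFFFFL);
    }

    /** A specific edit, or null if there is no such row. */
    byte[] get(int seqno, int editno) {
        byte[] content = rows.get(key(seqno, editno));
        return content == null ? null : content.clone();
    }

    /** In-memory counterpart of "order by editno desc limit 1" for one seqno. */
    byte[] latest(int seqno) {
        int editno = latestEditNo(seqno);
        return editno < 0 ? null : get(seqno, editno);
    }
}
```

Comparing `get(n, k - 1)` with `get(n, k)` gives the change made by edit k, which mirrors the n,k-1 versus n,k comparison described above.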
#3 |
On Sat, 21 Feb 2009 21:57:03 +0800, Eric Sosman wrote
(in article <gnp17k$kpm$>):

> anal_aviator wrote:
>> Hi,
>>
>> I have about 40,000 blocks sequentially in a file. Each block is 1024 bytes.
>>
>> My application can either read or write blocks, but what I need to do is
>> keep the original file intact, and build up a log, so that when a block is
>> written, any further reads read from 'that' block and not the original
>> file.
>
> Sounds like a cache. You could hold the updated blocks in
> a hash table or tree or some such, keyed on block number, search
> the cache first and refer to the original file only when the
> cache doesn't hold the block of interest.
>
> But at forty megabytes it's probably simpler just to make
> a working copy of the original file and muck with the copy as
> freely as you like. Maybe just read it all into memory and
> hammer on it if persistence isn't needed.
>

Yep. We could do that, but it does not give us a history, only a 'snapshot'.

>> So basically we start off with a virgin file, which is the root, then log
>> and continually repoint as blocks are written to some place else.
>> The sticky problem is that we need to keep a history of the blocks that
>> were re-written along with the data that they held.
>
> So you create another sequential file in which you log each
> change as you make it. You'll surely want the pre-change data,
> possibly the post-change data as well, maybe with time stamps
> and other decorations of your choosing.
>
>> Ideally each block in the master file should have some sort of linked list/
>> history attached to it, that not only shows the history of writes/reads
>> to a given block, but maintains them in a context of r/w to the overall file.
>
> Hunh? Didn't you say you needed to keep the original file
> intact? How does that square with making changes to the file?
> I don't get it.

Basically the master file is the 'root', and writes would branch off in some sort of data structure.

>
> You can record the reads as well as the writes in your log,
> if you feel like it. (What is this: Some kind of security app
> where you want to be able to prove that So-And-So peeked at
> Such-And-Such's tax records outside office hours?) You'll need
> to make some decisions about how to archive your ever-expanding
> log, though.
>
>> it is very much like a transactional file system, fortunately the number
>> of writes will be minimal, so it is easily manageable from a storage
>> point of view.
>
> If you want the semantics of a transactional file system,
> you might consider using -- forgive me if this idea is simply
> too weird -- a transactional file system ...
>

That was just an example to try and convey the idea. The only issue with a TFS is that the history is trashed once the transactions are written to disk.

>> It must also be very fast, generally it should not get slower if a
>> particular block starts to build up a chain of writes, the idea being that
>> a read goes straight to the head of the last write to a particular block.
>> (so a linked list is out), also maintenance should be minimal.
>
> Sorry; I'm unable to decipher this last set of requirements.
> What, exactly, must be "fast?" How "fast" must it be to qualify
> as "very fast?" Are you talking about throughput or about
> latency? And what do you mean by "maintenance" (programmer time,
> reorg time, backup time, ...)? And how far are you prepared to
> compromise on the other requirements to keep it "minimal?"

As fast as it can be, compared to other available solutions. Linked lists are potentially slow when they get longer, because you have to chain down the length of the transactions.

>
>

steve
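One way to keep a full history without walking a chain on every read is to give each block an array-backed list of versions and always read the last element. The sketch below assumes everything fits in memory, and the names (`BlockHistory`, `Version`, `globalSeq`) are invented for illustration. Reading the head costs the same whether a block has one write or a thousand, and the global sequence number places each write in the context of all writes to the file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-block write history with constant-time access to the newest write. */
class BlockHistory {
    /** One logged write. */
    static final class Version {
        final long globalSeq;        // order of this write among all writes to the file
        final long timestampMillis;  // wall-clock time the write was logged
        final byte[] data;           // block contents after the write

        Version(long globalSeq, long timestampMillis, byte[] data) {
            this.globalSeq = globalSeq;
            this.timestampMillis = timestampMillis;
            this.data = data;
        }
    }

    private final Map<Integer, List<Version>> versions = new HashMap<>();
    private long nextSeq = 0;

    /** Appends a new version; earlier versions are kept. */
    void write(int blockNo, byte[] data) {
        versions.computeIfAbsent(blockNo, k -> new ArrayList<>())
                .add(new Version(nextSeq++, System.currentTimeMillis(), data.clone()));
    }

    /** Newest version, or null if never written (caller then reads the original file). */
    Version latest(int blockNo) {
        List<Version> history = versions.get(blockNo);
        return history == null ? null : history.get(history.size() - 1);
    }

    /** All versions of a block, oldest first; empty if never written. */
    List<Version> history(int blockNo) {
        List<Version> history = versions.get(blockNo);
        return history == null
                ? Collections.<Version>emptyList()
                : Collections.unmodifiableList(history);
    }
}
```

A persistent variant would append each version to a log file and keep only the offset of the newest record per block in memory; that part is left out here.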
#4 |
Please do not top-post.
anal_aviator wrote:
> Yes It could be a solution, but it means cracking open a database, extra
> support infrastructure and something else to go wrong.

Don't be afraid of database programming. Once you get used to it, it isn't all that hard. The built-in Java DB (a.k.a. "Derby") is fairly easy to set up, comes free with Sun's JDK, and is well worth learning.

> I had looked at oracle [sic], but things just seemed to get more complicated and
> bigger.

Go with Java DB, then.

> I'm looking for a single machine solution, plus i [sic] have to keep network
> traffic to a minimum , since my 'monitored' data traffic is coming in over
> tcp

Databases need not add to network traffic, nor be excessively complicated.

I'm not saying that a database is necessarily the answer for you, only that complexity and network traffic concerns need not stop you from using one. It is true that mastering database usage involves a learning curve, but simple uses with simple table structures don't take all that long to put together, nor do they require massive administration.

--
Lew