Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > moving data in a file without using system memory

Reply
Thread Tools

moving data in a file without using system memory

 
 
ulyses
Guest
Posts: n/a
 
      01-09-2006
Let's assume I have following file:

2938929384902491233.....
923949919199191919112....

File contains INTs only. What is more they are huge. For example first
row in file may contain integer which size is 50MB and the second 30MB.
Now we come to my problem. Is there possibility to swap this rows
without using system memory (preferably in Unix/Linux)? Is there any
function in C to do this?

Thanks for help,
John

 
Reply With Quote
 
 
 
 
Nelu
Guest
Posts: n/a
 
      01-09-2006
On 2006-01-09, ulyses <(E-Mail Removed)> wrote:
> Let's assume I have following file:
>
> 2938929384902491233.....
> 923949919199191919112....
>
> File contains INTs only. What is more they are huge. For example first
> row in file may contain integer which size is 50MB and the second 30MB.
> Now we come to my problem. Is there possibility to swap this rows
> without using system memory (preferably in Unix/Linux)? Is there any
> function in C to do this?
>

I assume you want to make the swap without using char *, because you
need to use some memory.
One way you could do this is to use two files: f1 and f2.
You can read the first line, character by character and put it in the
f1 file using fgetc and fputc.
Read the second line in the same way and put it in f2.
Create f3 or rewrite the initial file, write the content of f2 (also
with fgetc and fputc) to f3, add the EOL and then add the content
of f1 to f3. Your lines should be swapped. Now, you have to pay
attention at what EOL means both when reading and writing. Also,
given the fact that you have integers, you could use isdigit() (pay
attention to negative numbers).

--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)
 
Reply With Quote
 
 
 
 
Alex Colvin
Guest
Posts: n/a
 
      01-09-2006

>2938929384902491233.....
>923949919199191919112....


>File contains INTs only. What is more they are huge. For example first
>row in file may contain integer which size is 50MB and the second 30MB.
>Now we come to my problem. Is there possibility to swap this rows
>without using system memory (preferably in Unix/Linux)? Is there any
>function in C to do this?


First you have to scan the first 50MB to locate the end-of-line, unless
you happen to know it. That doesn't take a whole lot of memory.

Swapping the lines using a second file is pretty trivial. Just start
copying the second line to the destination. When you're done, rewind the
source and copy the first.

If you can't actually afford a second file, the problem gets more
interesting. You want to slide the second 30MB forward in the file, and
the first 50MB back. Although it's tricky, this too can be done
incrementally, using very little memory. Look up peristaltic in-place
out-of-core permutation.

--
mac the naf
 
Reply With Quote
 
Pascal Bourguignon
Guest
Posts: n/a
 
      01-09-2006
Nelu <(E-Mail Removed)> writes:

> On 2006-01-09, ulyses <(E-Mail Removed)> wrote:
>> Let's assume I have following file:
>>
>> 2938929384902491233.....
>> 923949919199191919112....
>>
>> File contains INTs only. What is more they are huge. For example first
>> row in file may contain integer which size is 50MB and the second 30MB.
>> Now we come to my problem. Is there possibility to swap this rows
>> without using system memory (preferably in Unix/Linux)? Is there any
>> function in C to do this?
>>

> I assume you want to make the swap without using char *, because you
> need to use some memory.
> One way you could do this is to use two files: f1 and f2.
> You can read the first line, character by character and put it in the
> f1 file using fgetc and fputc.
> Read the second line in the same way and put it in f2.
> Create f3 or rewrite the initial file, write the content of f2 (also
> with fgetc and fputc) to f3, add the EOL and then add the content
> of f1 to f3. Your lines should be swapped. Now, you have to pay
> attention at what EOL means both when reading and writing. Also,
> given the fact that you have integers, you could use isdigit() (pay
> attention to negative numbers).


This wouldn't prevent the use of system memory, since any I/O will be
cached by default.

To avoid using system memory you'd have to use the raw device drive on
linux. Perhaps there is something similar on other OSes. Anyways, it's
out of topic on comp.lang.c...

Perhaps the OP could explain what he wants exactly, since it's rather
silly to buy GigaBytes of RAM not to use it later...


--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
 
Reply With Quote
 
Walter Roberson
Guest
Posts: n/a
 
      01-09-2006
In article <(E-Mail Removed)>,
Pascal Bourguignon <(E-Mail Removed)> wrote:
>Nelu <(E-Mail Removed)> writes:
>
>> On 2006-01-09, ulyses <(E-Mail Removed)> wrote:


>>> File contains INTs only. What is more they are huge. For example first
>>> row in file may contain integer which size is 50MB and the second 30MB.
>>> Now we come to my problem. Is there possibility to swap this rows
>>> without using system memory (preferably in Unix/Linux)?


>> I assume you want to make the swap without using char *, because you
>> need to use some memory.
>> One way you could do this is to use two files: f1 and f2.


>This wouldn't prevent the use of system memory, since any I/O will be
>cached by default.


>Perhaps the OP could explain what he wants exactly, since it's rather
>silly to buy GigaBytes of RAM not to use it later...


My speculation would be that the OP does not want to read the entire
file into memory, as that might be a strain on the system resources.

The OP gave, by the way, no indication that gigabytes of RAM are available
nor that the file contents would be able to fit within the address space
available.


If the OP has numerous rows to exchange, then a "create a new file
each time" algorithm could get rather slow. What might be practical,
though, is to create a couple of I/O wrapper routines that kept
track of the files and which "logically current" portions of the files
are physically somewhere else (probably easiest in this circumstance
if the I/O wrappers operated at the "record" level.) Then, at the
end, create as many temporary files as there were files logically
written to, and for each logically-written file do a linear run
pulling the data out of the original data files in the appropriate order;
when the temporary files are fully generated, then either rename
them into existance where the original data files were, or else
[e.g., for permissions or NFS reasons] then copy the data out of the
temporary files into the original files and trunc() the original files
and throw away the temporary files.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      01-09-2006
"ulyses" <(E-Mail Removed)> writes:
> Let's assume I have following file:
>
> 2938929384902491233.....
> 923949919199191919112....
>
> File contains INTs only. What is more they are huge. For example first
> row in file may contain integer which size is 50MB and the second 30MB.
> Now we come to my problem. Is there possibility to swap this rows
> without using system memory (preferably in Unix/Linux)? Is there any
> function in C to do this?


Do you mean that your file is a text file where each line consists of
a sequence decimal digits representing a large integer?

If so, for purposes of you problem, you can just treat them as
sequences of characters (which happen to be decimal digits); the fact
that they represent large integers is irrelevant.

Can the file contain arbitrarily many lines?

Here's one possible approach. Read the input file, keeping track of
where each new line starts. Use ftell() or fgetpos() and build a list
or array of indices.

Then traverse your index in reverse order. For each index, use
fseek() or fsetpos() to jump to that location in the file; read up to
the end of the current line and write to your output file.

(Note that some systems may provide a command to do this -- and 50MB
isn't all that much memory these days. Since you mentioned
Unix/Linux, <OT>"man tac" or "info tac"</OT>.)

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
 
Reply With Quote
 
David Schwartz
Guest
Posts: n/a
 
      01-11-2006

"ulyses" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...

> Let's assume I have following file:
>
> 2938929384902491233.....
> 923949919199191919112....
>
> File contains INTs only. What is more they are huge. For example first
> row in file may contain integer which size is 50MB and the second 30MB.
> Now we come to my problem. Is there possibility to swap this rows
> without using system memory (preferably in Unix/Linux)? Is there any
> function in C to do this?


It's not clear what you mean by "without using system memory". Why
wouldn't you want to use memory if this results in faster operation?

For example, you could move 16Kb at a time, and rig it to use very
little system memory. But then the head of the disk would have to jump back
and forth 5,000 times. It would almost certainly make more sense to use more
memory.

Are you concerned about consumption of system cache?

It's a bit tricky, but there are exchange algorithms. You can tweak them
to minimize user memory consumption and then let the system figure out how
much cache is appropriate. You can hint to the cache to help with that.

DS



 
Reply With Quote
 
ulyses
Guest
Posts: n/a
 
      01-12-2006
David Schwartz napisal(a):
> "ulyses" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed) ups.com...
>
> > Let's assume I have following file:
> >
> > 2938929384902491233.....
> > 923949919199191919112....
> >
> > File contains INTs only. What is more they are huge. For example first
> > row in file may contain integer which size is 50MB and the second 30MB.
> > Now we come to my problem. Is there possibility to swap this rows
> > without using system memory (preferably in Unix/Linux)? Is there any
> > function in C to do this?

>
> It's not clear what you mean by "without using system memory". Why
> wouldn't you want to use memory if this results in faster operation?
>


I thouhg about something that would be some kind of low level function
that would modify pointers not real data, the same way as we do with
pointer to data in memory, e.g.:
int a*, b*, temp*;
....
//the swap
temp = a;
a = b;
b = temp;

And the data was swapped without moving it in memory. I asked if there
is such functionality that would enable me to do something like that
but with rows in file. Swap them without moving them on disk.

Thanks again for help,
John

 
Reply With Quote
 
Barry Margolin
Guest
Posts: n/a
 
      01-13-2006
In article <(E-Mail Removed) .com>,
"ulyses" <(E-Mail Removed)> wrote:

> David Schwartz napisal(a):
> > "ulyses" <(E-Mail Removed)> wrote in message
> > news:(E-Mail Removed) ups.com...
> >
> > > Let's assume I have following file:
> > >
> > > 2938929384902491233.....
> > > 923949919199191919112....
> > >
> > > File contains INTs only. What is more they are huge. For example first
> > > row in file may contain integer which size is 50MB and the second 30MB.
> > > Now we come to my problem. Is there possibility to swap this rows
> > > without using system memory (preferably in Unix/Linux)? Is there any
> > > function in C to do this?

> >
> > It's not clear what you mean by "without using system memory". Why
> > wouldn't you want to use memory if this results in faster operation?
> >

>
> I thouhg about something that would be some kind of low level function
> that would modify pointers not real data, the same way as we do with
> pointer to data in memory, e.g.:
> int a*, b*, temp*;
> ...
> //the swap
> temp = a;
> a = b;
> b = temp;
>
> And the data was swapped without moving it in memory. I asked if there
> is such functionality that would enable me to do something like that
> but with rows in file. Swap them without moving them on disk.


No, because the information in files is not organized by lines, it's
just sequences of bytes organized by disk blocks. If you wanted to
rearrange the blocks it would theoretically be possible to update the
block pointers in the inode. However, there's no API for this, so you
would have to do it by writing a new ioctl or device driver, or by
accessing the disk device directly (and you'd need to unmount the
filesystem first, to avoid conflicts with the kernel's in-memory copies
of inodes).

But to rearrange lines you have to read the file into memory, search for
the newline characters, then write the lines back out to the file.

--
Barry Margolin, (E-Mail Removed)
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      01-13-2006
Barry Margolin <(E-Mail Removed)> writes:
> In article <(E-Mail Removed) .com>,
> "ulyses" <(E-Mail Removed)> wrote:
>> David Schwartz napisal(a):
>> > "ulyses" <(E-Mail Removed)> wrote in message
>> > news:(E-Mail Removed) ups.com...
>> >
>> > > Let's assume I have following file:
>> > >
>> > > 2938929384902491233.....
>> > > 923949919199191919112....
>> > >
>> > > File contains INTs only. What is more they are huge. For example first
>> > > row in file may contain integer which size is 50MB and the second 30MB.
>> > > Now we come to my problem. Is there possibility to swap this rows
>> > > without using system memory (preferably in Unix/Linux)? Is there any
>> > > function in C to do this?
>> >
>> > It's not clear what you mean by "without using system memory". Why
>> > wouldn't you want to use memory if this results in faster operation?
>> >

>>
>> I thouhg about something that would be some kind of low level function
>> that would modify pointers not real data, the same way as we do with
>> pointer to data in memory, e.g.:
>> int a*, b*, temp*;
>> ...
>> //the swap
>> temp = a;
>> a = b;
>> b = temp;
>>
>> And the data was swapped without moving it in memory. I asked if there
>> is such functionality that would enable me to do something like that
>> but with rows in file. Swap them without moving them on disk.

>
> No, because the information in files is not organized by lines, it's
> just sequences of bytes organized by disk blocks. If you wanted to
> rearrange the blocks it would theoretically be possible to update the
> block pointers in the inode. However, there's no API for this, so you
> would have to do it by writing a new ioctl or device driver, or by
> accessing the disk device directly (and you'd need to unmount the
> filesystem first, to avoid conflicts with the kernel's in-memory copies
> of inodes).
>
> But to rearrange lines you have to read the file into memory, search for
> the newline characters, then write the lines back out to the file.


If you want to rearrange the lines so you can read the rearranged file
with ordinary stdio calls, you'll need to physically re-write the file
(unless you can manage to do some nasty low-level file system stuff).
But if you want to be able to access the lines in some specified order
other than their physical order in the file, you can just create a
separate index. Do one pass over the file, creating an index of the
position of the start of each line (using ftell() or fgetpos()). Once
you have the index, you can use fseek() or fsetpos() to jump directly
to the beginning of any line you want.

Reading the file in reverse order is likely to be less efficient than
if you had physically reversed the file; the performance tradeoff
depends on how often you read it.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Moving Guide Moving Companies Movers Storage linkswanted ASP .Net 0 01-23-2008 01:08 AM
Free Moving Estimate, Local Movers, Long Distance Moving, PackingSupplies, Storage Rental, Home Moving, Apartment Moving, Office Moving,Commercial Moving linkswanted ASP .Net 0 01-06-2008 04:45 AM
Moving to California Moving to L.A linkswanted Digital Photography 3 01-05-2008 04:22 PM
movers los angeles moving storage services movers in L.A companymoving companies moving in calfornia los angeles linkswanted HTML 0 12-21-2007 10:55 PM



Advertisments