Splitting a multirecord per file format to a single record per file format: Right approach?

 
 
Randy Kramer
      01-12-2007
I'm trying to write essentially what I guess you'd call a filter (or maybe not
quite exactly). It needs to:

* read multi-line records from a file (one record at a time)
* then, with that one record:
  * prepend some additional lines
  * make substitutions for some of the lines already in the record
  * grab some other portions of the record (less than a line, but usually
    multiple words), find the "non-null" pieces, and incorporate those in
    another header line
  * create a unique filename
  * write that (single) record to that file

I got started (maybe) by finding a likely looking piece of code in the Ruby
Cookbook, and tried to modify it to fit my situation:

open('/rhk/work/ask_notes/politics.twk') { |f| f.each('\x80\x81\x82\x83') { |record| p record } }

At this point, I'm stuck, and need some clues to move forward. (In addition,
I have a few not completely essential to understand questions, below.)

I think the next step is, within the code block / continuation (is that (or
one of those) the right name?), to slurp the entire record into a string,
prepend the additional lines, do the substitutions, ..., and finally write a
single record to the new filename.
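The steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `SEPARATOR`, `transform`, and `split_file`, the `X-Converted` header line, the `Title:` to `Subject:` substitution, and the `record_NNNN.txt` naming scheme are all hypothetical placeholders, not anything taken from the real .twk format. The separator is written with double quotes so that the `\x` escapes become actual bytes.

```ruby
# Separator as four raw bytes (double quotes so \x escapes are interpreted).
SEPARATOR = "\x80\x81\x82\x83".b

# Hypothetical per-record transformation: one substitution, one prepended line.
def transform(record)
  body = record.gsub(/^Title:/, 'Subject:')  # placeholder substitution
  "X-Converted: yes\n" + body                # placeholder prepended header
end

# Reads records one at a time and writes each to its own numbered file.
# out_dir must already exist. Returns the number of records written.
def split_file(path, out_dir)
  count = 0
  File.open(path, 'rb') do |f|
    f.each(SEPARATOR) do |record|
      record = record.chomp(SEPARATOR)   # drop the trailing separator, if any
      next if record.strip.empty?        # skip the empty chunk before the first header
      count += 1
      out = File.join(out_dir, format('record_%04d.txt', count))
      File.binwrite(out, transform(record))
    end
  end
  count
end
```

A sequential counter is the simplest way to get unique filenames; a name derived from the record's own content would work too, provided it is guaranteed not to collide.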

Main Question:

Am I on the right track, or must I take some different approach to be able to
process the content of a single record at a time? I ask because I did a little
experiment (possibly a bad experiment), like this:

rec_num = 0

open('/rhk/work/ask_notes/politics.twk') { |f| f.each('\x80\x81\x82\x83') { |record| rec_num = rec_num + 1 } }

p rec_num

It only counts to one--instead of 70 to reflect the 70 records I know are in
that particular file (and which are all printed out with the earlier version
which has the line "{ |record| p record }").
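
One likely explanation for the count of one, offered as a sketch rather than a confirmed diagnosis: Ruby does not process `\x` escapes inside single-quoted strings, so '\x80\x81\x82\x83' is 16 literal characters rather than 4 bytes. If that 16-character sequence never occurs in the file, `each` yields the whole file as a single record, which would also explain why `p record` appeared to print all 70 records. The constant and method names below are made up for illustration, and a StringIO stands in for the real file.

```ruby
require 'stringio'

SINGLE_QUOTED = '\x80\x81\x82\x83'    # 16 literal characters: backslash, x, 8, 0, ...
DOUBLE_QUOTED = "\x80\x81\x82\x83".b  # 4 raw bytes, tagged as binary

# Counts how many chunks IO#each yields for a given separator.
def count_records(data, separator)
  StringIO.new(data).each(separator).count
end

# Three small records joined by the real 4-byte separator.
SAMPLE = "one" + DOUBLE_QUOTED + "two" + DOUBLE_QUOTED + "three"

puts SINGLE_QUOTED.bytesize                # 16
puts DOUBLE_QUOTED.bytesize                # 4
puts count_records(SAMPLE, SINGLE_QUOTED)  # 1 (separator never matches)
puts count_records(SAMPLE, DOUBLE_QUOTED)  # 3
```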

Other questions: (I could start a thread for each, but I'll start this way and
split them up if I get either too much or not enough response.)

1. What is the right name for that construction: is it a continuation, a
(code?) block, or something else? (Is it possible that Ruby calls this a
code block and some other languages call it a continuation, or is it an
example of one kind of continuation available in Ruby?)
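
On the naming question: in Ruby the `{ |x| ... }` or `do |x| ... end` construction attached to a method call is called a block. Continuations are a separate feature in Ruby (created with `callcc`) and are not involved here. A minimal sketch of how a method hands values to its block follows; the method names `twice` and `demo` are invented for the example.

```ruby
# A method that calls its block two times, passing a value each time.
def twice
  yield 1
  yield 2
end

def demo
  seen = []
  twice { |n| seen << n }   # brace form, usually used for one-liners
  twice do |n|              # do...end form, usually used for multi-line blocks
    seen << n * 10
  end
  seen
end

p demo   # [1, 2, 10, 20]
```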

2. What's the story on whitespace in that kind of structure? I experimented
with trying to format it to make it (possibly) easier to read, something like
this:

open('/rhk/work/ask_notes/politics.twk') {
|f| f.each('\x80\x81\x82\x83') {
|record| p record

<anticipated location of code to process a single record>

}
}

But any whitespace (i.e., newlines) that I added just caused syntax errors.
Is there a way to "prettyformat" that structure?
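
One conventional multi-line layout, as a sketch: use `do ... end` instead of braces and keep the block parameters on the same line as `do`. Note that a placeholder line in angle brackets is not valid Ruby, so it would raise a syntax error by itself; a `#` comment is a safe stand-in. The names `LAYOUT_SEP` and `records_in` are hypothetical, and the separator is written in double quotes on the assumption that the four raw bytes are what is intended.

```ruby
LAYOUT_SEP = "\x80\x81\x82\x83".b

# Collects the raw chunks yielded by IO#each, one per record.
def records_in(path)
  records = []
  File.open(path, 'rb') do |f|
    f.each(LAYOUT_SEP) do |record|
      records << record
      # code to process a single record would go here
    end
  end
  records
end
```

Something like `records_in('/rhk/work/ask_notes/politics.twk').each { |r| p r }` would then print each chunk.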

3. The content of the files I have to convert is actually more like this:

<bof>
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
<eof>

The Ruby code that I copied from the Ruby Cookbook is more aimed at separating
records that end with a record separator (instead of starting with a record
header). I can work this way--I mean, worst case I modify every input file
to do something like remove the first record header from the file and add a
record header at the end of the file, but that's probably not really
necessary.

But, it seems like I'm using not quite the right tool. Is there a better
approach that more exactly fits the format of my files?
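
For records that start with a header rather than end with a separator, one option (a sketch, assuming each file is small enough to read into memory at once) is to split the whole content on the header and discard the empty piece that precedes the first header. `HEADER` and `records_after_headers` are hypothetical names.

```ruby
HEADER = "\x80\x81\x82\x83".b

# Splits file content on the record header and drops blank chunks,
# including the empty one before the first header.
def records_after_headers(data)
  data.split(HEADER).reject { |chunk| chunk.strip.empty? }
end

sample = HEADER + "\nrecord one\n\n" + HEADER + "\nrecord two\n"
p records_after_headers(sample).size   # 2
```

With a real file this would be called as `records_after_headers(File.binread(path))`; no editing of the input files is needed.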

Thanks!
Randy Kramer



 
Robert Klemme
      01-12-2007
On 12.01.2007 15:02, Randy Kramer wrote:
> I'm trying to write essentially what I guess you'd call a filter (or maybe not
> quite exactly). It needs to:
>
> * read multi-line records from a file (one record at a time)
> * then, with that one record:
> * prepend some additional lines
> * make substitutions for some of the lines already in the record
> * grab some other portions of the record (less than a line, but usually
> multiple words), find the "non-null" pieces, and incorporate those in
> another header line
> * create a unique filename
> * write that (single) record to that file

[...]
> 3. The content of the files I have to convert is actually more like this:
>
> <bof>
> Record header ('\x80\x81\x82\x83')
>
> Record (with blank lines)
> (trailing blank line)
> Record header ('\x80\x81\x82\x83')
>
> Record (with blank lines)
> (trailing blank line)
> Record header ('\x80\x81\x82\x83')
>
> Record (with blank lines)
> <eof>


You could do:

# create a class for your records or use OpenStruct
YourRecord = Struct.new(:name, :length, :foo, :bar) do
  # write this record to its own file
  def dump
    # file_name is not defined here; it must be derived from the record
    File.open(file_name, "w") do |io|
      # whatever
    end
  end
end


current = nil

File.foreach('your file') do |line|
  line.chomp!

  case line
  when /^<bof>$/
    current = YourRecord.new
  when /^<eof>$/
    current.dump
    current = nil
  when /Record header/
    ...
  else
    # ignore or whatever
  end
end

Kind regards

robert
 
Randy Kramer
      01-12-2007
On Friday 12 January 2007 09:15 am, Robert Klemme wrote:
> On 12.01.2007 15:02, Randy Kramer wrote:
> > I'm trying to write essentially what I guess you'd call a filter (or
> > maybe not quite exactly). It needs to:


> You could do:


Thanks--that will get me started!

Randy Kramer

 