Parsing challenge...

 
 
Artco News
      10-07-2003
I thought I'd ask the scripting gurus about the following.

I have a file containing records of data in the following format (the first
line holds the column labels):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse these records so I can insert them into a database, e.g. MySQL/Access?

Perhaps there is an advanced scripting language that can do this easily.

Thanks


 
Justin Koivisto
      10-07-2003
Artco News wrote:
> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first
> column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access?
>
> Perhaps there are an advanced scripting language can do this easily.


Regex is your friend...

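<?php
// Read the whole file into memory.
$content = file_get_contents('data.txt');
if ($content === false) {
    die('Cannot read data.txt');
}
// Swap every line break for a placeholder token so that "." in the
// pattern below can match across what used to be line boundaries.
$tmp = '__NL' . time() . '__';
$content = preg_replace('/(\r\n|\r|\n)/', $tmp, $content);
// Ungreedy match: capture the description field of record NN-111.
$pattern = '/NN-111\^(.*)\^/U';
if (preg_match($pattern, $content, $matches)) {
    // Split the captured field back into its original lines.
    $data = explode($tmp, $matches[1]);
    echo '<pre>';
    print_r($data);
    echo '</pre>';
}
unset($matches, $content, $tmp);
?>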

This will get you an array with each line of the NN-111 description field
as a separate element. You should be able to see from the example how to
extract the notes and the other fields. I may be wrong, but it looks like
the caret (^) is used as the field delimiter, while a newline can either
end a record or appear inside a field.
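
For comparison, here is a minimal Python sketch of the same regex idea, applied to every record instead of only NN-111. It assumes that each record starts at the beginning of a line with a code of the form NN- followed by digits, that every record has exactly four fields, and that no field contains a caret. The names SAMPLE, RECORD and parse_records are made up for this illustration.

```python
import re

# The sample data from the original post, header line included.
SAMPLE = """\
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"""

# One record: a code at the start of a line, then three more
# caret-separated fields. [^^] means "any character except a caret",
# newlines included, so fields may span several lines. The lookahead
# ends the notes field at the next record or at the end of the input.
# Assumption: codes always look like NN-<digits>.
RECORD = re.compile(
    r"^(NN-\d+)\^([^^]*)\^([^^]*)\^([^^]*?)\s*(?=^NN-\d+\^|\Z)",
    re.MULTILINE,
)


def parse_records(text):
    """Return a list of (code1, description, code2, notes) tuples."""
    return [m.groups() for m in RECORD.finditer(text)]


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```

Because the character class already matches newlines, no placeholder token is needed here. The header line is skipped simply because it does not start with NN-.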

--
Justin Koivisto
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.

 
Ed Morton
      10-07-2003


Artco News wrote:
> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first
> column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access?
>
> Perhaps there are an advanced scripting language can do this easily.
>
> Thanks
>


This will parse them to make the records/fields obvious:

gawk 'BEGIN { pat = "NN-"; RS = "\n" pat; FS = "^" }
{
    sub(/\n+$/, "")              # drop the newline left at the end of the last record
    if (NR > 1) $1 = pat $1      # put back the NN- consumed by RS (the header never had it)
    printf("Record %d = {\n", NR)
    for (i = 1; i <= NF; i++) {
        printf("\tField %d = { %s }\n", i, $i)
    }
    printf("}\n")
}' inputfile

It'd be trivial to modify the output to whatever format your database
expects. I used a newline followed by NN- as the record separator (a
multi-character RS is a gawk extension), hence the special handling of the
first field to put back the NN- that the separator consumes. The header
line never had that prefix, so it is left alone. When run on your sample
input file, this produces:

Record 1 = {
    Field 1 = { CODE#1 }
    Field 2 = { DESCRIPTION }
    Field 3 = { CODE#2 }
    Field 4 = { NOTES }
}
Record 2 = {
    Field 1 = { NN-110 }
    Field 2 = { an info of NN-001 }
    Field 3 = { BRY234 }
    Field 4 = { some notes }
}
Record 3 = {
    Field 1 = { NN-111 }
    Field 2 = { 1st line data
2nd line data
3rd line data }
    Field 3 = { BRT345 }
    Field 4 = { another notes }
}
Record 4 = {
    Field 1 = { NN-112 }
    Field 2 = { description of NN-112 }
    Field 3 = { BBC23 }
    Field 4 = { multiline
notes blah
blah
blah }
}
Record 5 = {
    Field 1 = { NN-113 }
    Field 2 = { info info }
    Field 3 = { MNO12 }
    Field 4 = { some notes here }
}
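
If you would rather go straight into a database, the same record-splitting idea can be sketched in Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for MySQL/Access, the table name items and its column names are invented, and a continuation line is assumed never to begin with NN-. The helper names split_records and load are hypothetical too.

```python
import re
import sqlite3

# The sample data from the original post, header line included.
SAMPLE = """\
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"""


def split_records(text, prefix="NN-"):
    """Split at every newline that is followed by prefix, then on '^'.

    The lookahead keeps the prefix in the record, so nothing has to be
    glued back on afterwards. Returns (header, data_rows).
    """
    chunks = re.split(r"\n(?=" + re.escape(prefix) + ")", text.rstrip("\n"))
    rows = [chunk.split("^") for chunk in chunks]
    header, data = rows[0], rows[1:]
    for row in data:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(row)}: {row!r}")
    return header, data


def load(conn, rows):
    """Create the (hypothetical) items table and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(code1 TEXT PRIMARY KEY, description TEXT, code2 TEXT, notes TEXT)"
    )
    # Bound parameters let the driver handle quoting, embedded newlines included.
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    header, rows = split_records(SAMPLE)
    conn = sqlite3.connect(":memory:")
    load(conn, rows)
    for row in conn.execute("SELECT code1, code2 FROM items ORDER BY code1"):
        print(row)
```

Using placeholders rather than building the INSERT statement by hand matters here because the description and notes fields can contain newlines and quote characters.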

Regards,

Ed.

 