Regex question(how easy/hard to do it in ruby)

 
 
Sarah Tanembaum
      05-04-2004
Pointers, please...

I have text in a comma-delimited file with the following
characteristics:

ccc-123456, <multiline data>,

Field number:

1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-

1b - the dash is followed by a number from
1 to 99999

2 - multiline data containing newline chars (\n),
carriage-return chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with ASCII number > 127 (accented characters,
umlauts, etc.)

3 - this field contains at least 2 and at most 5 lines of
data, where each line might
begin with 2-3 chars,
followed by an "@", 1-7 chars, and then
1-4 numbers, e.g. GH@OPRJGPF1234

My questions are:

1a. How do I parse the first field (field 1a) so I can manipulate/rename it to
a new label depending on what label it currently has?

1b. In field 1b, instead of just the bare number, I'd like to pad
it with leading zeros, so 1 -> 000001,
1494 -> 001494, 560987 -> 560987 (no change).

2. How do I capture the 2nd field and escape the special characters with ASCII number?

3. How do I capture the 3rd field and parse it as well, just like field 1?

Thanks


 
Ara.T.Howard
      05-04-2004
On Mon, 3 May 2004, Sarah Tanembaum wrote:

> Pointers, please...
>
> I have text in a comma-delimited file with the following
> characteristics:
>
> ccc-123456, <multiline data>,
>
> Field number:
>
> 1a - it always begins with 1 to 3 characters followed by
> a dash, e.g. JKL-, A-, NM-, PQ-
>
> 1b - the dash is followed by a number from
> 1 to 99999
>
> 2 - multiline data containing newline chars (\n),
> carriage-return chars (\r), or both (\r\n). This field
> might include special characters such as a
> single (') or double (") quote, a space, or characters
> with ASCII number > 127 (accented characters,
> umlauts, etc.)
>
> 3 - this field contains at least 2 and at most 5 lines of
> data, where each line might
> begin with 2-3 chars,
> followed by an "@", 1-7 chars, and then
> 1-4 numbers, e.g. GH@OPRJGPF1234
>
> My question is :
>
> 1a. How do I parse the first field (field 1a) so I can manipulate/rename it to
> a new label depending on what label it currently has?


what exactly do you mean by this? if you want to parse the fields themselves
out use the 'csv' module included with ruby...
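
A minimal sketch of that suggestion, in case it helps. The sample record, the `renames` table and the `relabel` helper are all made up for illustration, and it assumes the multiline fields are wrapped in double quotes with no space after the commas (the csv library will not read an unquoted field that spans lines). If the file mixes \r\n and \n, passing an explicit `row_sep:` option may be needed.

```ruby
require 'csv'

# Hypothetical sample record; the multiline fields are double-quoted.
sample = "JKL-1494,\"first line\nsecond 'line'\",\"GH@OPRJGPF1234\nAB@XYZ12\"\n"

# Hypothetical mapping from current label to new label.
renames = { 'JKL' => 'XYZ', 'A' => 'ALPHA' }

# Split "label-number" at the first dash and swap the label if a
# replacement is known; unknown labels are left unchanged.
def relabel(field_1, renames)
  label, number = field_1.split('-', 2)
  "#{renames.fetch(label, label)}-#{number}"
end

# CSV.parse keeps newlines that appear inside quoted fields.
rows = CSV.parse(sample)
rows.each do |field_1, _field_2, _field_3|
  puts relabel(field_1, renames)
end
```
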

> 1b. In field 1b, instead of just the bare number, I'd like to pad
> it with leading zeros, so 1 -> 000001,
> 1494 -> 001494, 560987 -> 560987 (no change).


~ > ruby -e 'p(sprintf("%06.6d", 42))'
"000042"

~ > man 3 printf
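
Applied to the whole first field, the same format string could be used roughly like this. `pad_number` is a hypothetical helper, and the pattern assumes the field is exactly 1-3 letters, a dash, then digits; anything else is returned untouched.

```ruby
# Pad the numeric part of a "label-number" field with leading zeros.
# Assumes 1-3 letters, a dash, then digits; other input is returned as-is.
def pad_number(field_1, width = 6)
  field_1.sub(/\A([A-Za-z]{1,3})-(\d+)\z/) do
    format("%s-%0#{width}d", $1, $2.to_i)
  end
end

puts pad_number('A-1')        # A-000001
puts pad_number('NM-1494')    # NM-001494
puts pad_number('PQ-560987')  # PQ-560987 (already six digits)
```
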

> 2. capture 2nd field and escape the special characters with ascii number


esc = '\\'[0]
munged = ''
field_2.each_byte{|c| munged << esc if c > 127; munged << c}
field_2 = munged

you could also use a regex to do this...

special = %r/([#{ 128.chr }-#{ 255.chr }])/o
field_2.gsub!(special){|match| "\\#{ match }"}
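
On Ruby 1.9 and later, strings carry an encoding, so it is probably safer to work on raw bytes explicitly. The sketch below is one way to do that, not the only one; both helper names are made up. The first mirrors the backslash-prefix idea above, the second is an alternative reading of "escape with ascii number" that replaces each high byte with a backslash and its decimal value. Both return binary (ASCII-8BIT) strings, and a multi-byte UTF-8 character gets each of its bytes escaped separately.

```ruby
# Prefix every byte above 127 with a backslash.
# String#b gives a binary copy, so the input encoding does not matter.
def escape_high_bytes(field)
  field.b.gsub(/[\x80-\xFF]/n) { |byte| "\\".b + byte }
end

# Alternative reading: replace every byte above 127 with a backslash
# followed by its decimal value, e.g. byte 233 becomes \233.
def numeric_escape(field)
  field.b.gsub(/[\x80-\xFF]/n) { |byte| format('\\%d', byte.ord).b }
end

latin1 = "caf\xE9".b                    # "cafe" with an accented e, as one Latin-1 byte
p escape_high_bytes(latin1).bytes       # [99, 97, 102, 92, 233]
puts numeric_escape(latin1)             # caf\233
```
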

>
> 3. How do I capture the 3rd field and parse it as well, just like field 1?
>
> Thanks



can you post some sample data? we could probably say more then...
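
In the meantime, going only by the description of field 3 (2 to 5 lines, each 2-3 chars, an "@", 1-7 chars, then 1-4 numbers), a rough sketch could look like the following. `parse_field_3` and the hash keys are hypothetical, and it guesses that "chars" means letters only; real data may well need a looser pattern.

```ruby
# Parse field 3 into one hash per line.
# Guessed layout per line: 2-3 letters, "@", 1-7 letters, 1-4 digits.
def parse_field_3(field_3)
  line_pattern = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

  # Accept \r\n, \r or \n as line endings and ignore blank lines.
  lines = field_3.split(/\r\n|\r|\n/).map(&:strip).reject(&:empty?)
  unless (2..5).cover?(lines.size)
    raise ArgumentError, "expected 2-5 lines, got #{lines.size}"
  end

  lines.map do |line|
    match = line_pattern.match(line)
    raise ArgumentError, "unrecognised line: #{line.inspect}" unless match
    { prefix: match[1], name: match[2], number: match[3].to_i }
  end
end

p parse_field_3("GH@OPRJGPF1234\r\nAB@XYZ12")
```
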


-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL :: http://www.ngdc.noaa.gov/stp/
| TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================

 