Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > removing paragraphs from text files


removing paragraphs from text files

 
 
alfonsobaldaserra
 
      07-13-2009
hello,

i have a specific paragraph in a bunch of configuration files that i
want to remove. the lines are as follows

define service{
use linux-service
host_name ninjasrv
service_description PING
check_command check_ping!100.0,20%!500.0,60%
action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
}

the 'use' and 'host_name' directives are different in each file. the
unique string is 'PING'.

i was just wondering if it is possible to do such a thing in Perl?

thanks.
 
 
 
 
 
alfonsobaldaserra
 
      07-13-2009
>     perl -p0777 -i -e 's/define service\{[^}]*PING[^}]*\}\s+//g' *.cf

that was so amazing, all done in a single shot. could you please also
help on what exactly -p0777 is and how this substitution works:
's/define service\{[^}]*PING[^}]*\}\s+//g'. i have never seen or read
such a regex.

thanks again.
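Since the question is how the substitution works, here is a sketch of the same pattern written with the /x modifier so each piece can carry a comment. The helper name strip_ping_blocks and the two-block sample (including the HTTP service) are made up for illustration. On the command line, -p supplies the loop that reads each record, runs the code and prints the result, and -i writes that output back to the file.

```perl
use strict;
use warnings;

# Hypothetical helper: Tad's substitution spelled out with the x modifier.
# It matches the same text as  s/define service\{[^}]*PING[^}]*\}\s+//g
sub strip_ping_blocks {
    my ($text) = @_;
    $text =~ s/
        define\ service\{   # literal "define service{" (the backslash keeps the space)
        [^}]*               # any run of non-"}" characters, newlines included
        PING                # the marker that identifies the block to delete
        [^}]*               # the rest of the block, still stopping short of "}"
        \}                  # the closing brace of that same block
        \s+                 # trailing whitespace: the newline and any blank lines
    //gx;
    return $text;
}

# Made-up sample with one PING block and one block that must survive.
my $cfg = <<'CFG';
define service{
use linux-service
host_name ninjasrv
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use linux-service
host_name ninjasrv
service_description HTTP
check_command check_http
}
CFG

print strip_ping_blocks($cfg);
```

Only the HTTP block is printed: the match cannot start inside it, because [^}]* would have to cross its closing brace to find a PING.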
 
 
 
 
 
alfonsobaldaserra
 
      07-13-2009
> help on what exactly is -p0777 and how did this substitution work
> 's/define service\{[^}]*PING[^}]*\}\s+//g'. i have never seen/read such
> regex.


i just found this:

-0777: the separator between records is 777 in octal; this is not a real
ASCII char, so the whole file is slurped in as a single record.
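To see what -0777 changes, the sketch below reads the same data twice: once with the default $/, which is a newline, and once with $/ localised to undef, the value -0777 gives it. An in-memory string stands in for a config file, an assumption made only so the example runs anywhere.

```perl
use strict;
use warnings;

my $data = "define service{\nservice_description PING\n}\n";

# Default input record separator: $/ is "\n", so each record is one line.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# With $/ undefined (what -0777 does), the whole input is a single record.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my @slurped = do { local $/; <$fh> };
close $fh;

printf "default \$/: %d records\n", scalar @lines;
printf "undef \$/:   %d record\n",  scalar @slurped;
```

Because the substitution then sees the whole file at once, a pattern can span the several lines of a "define service{ ... }" block.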

now my confusion is the regex match.
it goes like, search for
define service followed by a { then any characters but not } then PING
then any characters but not } then a } then at least one whitespace
character, and replace all that with nothing. i am just wondering what
exactly this [^}]* is doing. i tried it with .* like

s/define service\{.*PING.*\}\s+//g

but it would not replace.

my understanding is that it should work because [^}]* (any character
but not }) is the same as .* in this case, since I know there is no }
before the PING string.

what am i missing?
 
 
Peter J. Holzer
 
      07-13-2009
On 2009-07-13 08:52, alfonsobaldaserra <(E-Mail Removed)> wrote:
>> how did this substitution work
>> 's/define service\{[^}]*PING[^}]*\}\s+//g'. i have never seen/read such regex.

[...]
> now my confusion is the regex match.
> it goes like, search for
> define service followed by a { then any characters but not } then PING
> then any characters but not } then a } then at least one whitespace
> character, and replace all that with nothing. i am just wondering what
> exactly this [^}]* is doing. i tried it with .* like
>
> s/define service\{.*PING.*\}\s+//g
> but it would not replace.
>
> my understanding is that it should work because [^}]* (any character
> but not }) is the same as .* in this case, since I know there is no }
> before the PING string.


/./ is not "any character" but "any character except newline" unless you
use the /s modifier. So your substitution would only work if the whole
section was on a single line.

s/define service\{.*PING.*\}\s+//sg

OTOH would match anything from the first "define service{" to the last
"}" in the file (provided there's a PING somewhere between them) so it
would probably remove a lot more than you want. The /[^}]*/ in Tad's
regex is there to keep the match within a single brace-delimited block
(and it's a bit simple-minded: It won't work if you have a } inside a
comment, for example, but you probably don't, so that doesn't matter).

hp
 
 
sln@netherlands.com
 
      07-13-2009
On Mon, 13 Jul 2009 01:52:14 -0700 (PDT), alfonsobaldaserra <(E-Mail Removed)> wrote:

>> help on what exactly is -p0777 and how did this substitution work
>> 's/define service\{[^}]*PING[^}]*\}\s+//g'. i have never seen/read such
>> regex.

>
>i just found
>-0777
> the separator between records is 777 in octal; this is not a real
>ASCII char so the whole file is slurped in as a single record;
>
>now my confusion is the regex match.
>it goes like, search for
>define service followed by a { then any characters but not } then PING
>then any characters but not } then a } then at least one whitespace
>character, and replace all that with nothing. i am just wondering what
>exactly this [^}]* is doing. i tried it with .* like
>
>s/define service\{.*PING.*\}\s+//g
>but it would not replace.
>
>my understanding is that it should work because [^}]* (any character
>but not }) is the same as .* in this case, since I know there is no }
>before the PING string.
>
>what am i missing?


If you have never read such a regex, you don't know regex. This is very simple.
You should visit this group/site more often.

Assuming a slurped-in file and your test, s/define service\{.*PING.*\}\s+//g:
as Holzer said, '.' does not match the '\n' newline because you don't have the /s modifier,
so the greedy .* can only grab characters up to the end of the current line before the rest
of the pattern, 'PING.*\}\s+', has to match, and nothing matches.
Try 's/define service\{.*PING.*\}\s+//sg'.

Also, using greedy quantifiers with '.' is a tricky prospect. They have their place,
though. Most beginners just throw '.*' into the middle of their regex, when in reality
it should only be put in when the regex can already be described without it,
if at all.
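A related trap, sketched on another made-up sample: switching to the non-greedy .*? (with /s) does not fix the overmatch, because the match still begins at the first "define service{" it finds and simply stretches forward to the first PING, even when that PING sits in a later block.

```perl
use strict;
use warnings;

# Made-up sample: the HTTP block comes first, the PING block second.
my $cfg = <<'CFG';
define service{
service_description HTTP
}

define service{
service_description PING
}
CFG

# Non-greedy .*? still starts at the FIRST "define service{" and
# stretches forward until it reaches a PING, even one in a later block.
(my $lazy = $cfg) =~ s/define service\{.*?PING.*?\}\s+//sg;

# The character class cannot step over the "}" that ends the HTTP block.
(my $class = $cfg) =~ s/define service\{[^}]*PING[^}]*\}\s+//g;

print "lazy .*?  kept: ", ($lazy  =~ /HTTP/ ? "HTTP block\n" : "nothing\n");
print "[^}]*     kept: ", ($class =~ /HTTP/ ? "HTTP block\n" : "nothing\n");
```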

The reason is that there is no guarantee of the shape of text when it is written to
a file, none! For this reason, regexes should be molded with at least a certain level
of built-in error checking (qualification). And while not 100%, 90-95% will do as a
minimal QA check.

Thus, Tad used the '[^}]*' character class to describe all characters but one:
specifically NOT '}', which would signify the end of a block. That leads to the next
problem:

How do you know the syntax that the parser uses to extract information
from that file? Even if the writer's format is simple, even custom, there may be
anomalies introduced by the file system, and if the writer changes its format, then what?
Surely you would want a little robustness of QA built into the regex.

Tad gave you what you asked for in your simple problem statement. Indeed, it was stated
in terms too simple to be acceptable in a production environment.

Most of the time here on this group/site, that is the case.
It just amazes me sometimes that people come back with 'but it doesn't work if I
have this condition', a condition that was never stated.

Tad's regex could have been written (untested) like this:

s/define\s+service\s*\{[^}]*service_description\s+PING[^}]*\}\s*//g

and it would still work, while allowing for some of the variability that normal parsers tolerate.
But you didn't say where the file came from or how it is parsed:
whether 'use', 'service_description' or any other variable is present,
in what order, whether it is required, etc...
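The pattern above was marked untested, so here is a small check on a made-up sample. The host name PINGHOST and the extra spacing are assumptions chosen to show the difference: Tad's pattern misses a "define service {" written with a space and removes a block whose host name merely contains PING, while the looser pattern handles both. It still inherits the [^}]* limitation Holzer pointed out.

```perl
use strict;
use warnings;

# Made-up sample: the PING block has "service {" with a space and two spaces
# before PING; the HTTP block has a host name that happens to contain "PING".
my $cfg = <<'CFG';
define service {
host_name ninjasrv
service_description  PING
}

define service{
host_name PINGHOST
service_description HTTP
}
CFG

# Tad's original pattern.
(my $tad = $cfg) =~ s/define service\{[^}]*PING[^}]*\}\s+//g;

# The more tolerant pattern: flexible whitespace, PING tied to service_description.
(my $sln = $cfg) =~ s/define\s+service\s*\{[^}]*service_description\s+PING[^}]*\}\s*//g;

print "Tad's pattern leaves:\n$tad\n";
print "tolerant pattern leaves:\n$sln";
```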

No, you stated only that PING, the one constant, appears in this form:
'define service{PING}'

Not a lot to go on, but don't expect this to be a real parser unless you understand
the RULES.

Good luck.

-sln

 
 
Eric Pozharski
 
      07-14-2009
On 2009-07-13, Peter J. Holzer <(E-Mail Removed)> wrote:
*SKIP*
*skipping alfonsobaldaserra since he skipped Tad anyway*

> s/define service\{.*PING.*\}\s+//sg
>
> OTOH would match anything from the first "define service{" to the last
> "}" in the file (provided there's a PING somewhere between them) so it
> would probably remove a lot more than you want. The /[^}]*/ in Tad's
> regex is there to keep the match within a single brace-delimited block
> (and it's a bit simple-minded: It won't work if you have a } inside a
> comment, for example, but you probably don't, so that doesn't matter).


Then stricter

qr/\}\n+/

and stricter

qr/\}(?:\h*\n)+/ # needs 5.10

and stricter

qr/\}\h*\n(?:\h*\n)*/ # also needs 5.10 (\h)
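Assuming these patterns are meant to replace the \}\s+ tail of Tad's regex, this sketch compares the tails on a made-up sample in which the PING block is followed by a whitespace-only line and an indented comment. The last two patterns accept the same strings (X+ is the same as X followed by X*), so only (?:\h*\n)+ is exercised; both need \h and therefore Perl 5.10.

```perl
use strict;
use warnings;

# Made-up sample: after the PING block come a line holding only two spaces
# and then an indented comment that should keep its indentation.
my $cfg = "define service{\nservice_description PING\n}\n  \n    # indented comment\n";

my $block = qr/define service\{[^}]*PING[^}]*/;

(my $loose  = $cfg) =~ s/$block\}\s+//g;          # also eats the next line's indent
(my $lines  = $cfg) =~ s/$block\}\n+//g;          # stops at the whitespace-only line
(my $strict = $cfg) =~ s/$block\}(?:\h*\n)+//g;   # \h needs Perl 5.10 or later

# Show newlines as \n so the leftover whitespace is visible.
for my $pair (['\s+', $loose], ['\n+', $lines], ['(?:\h*\n)+', $strict]) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;
    printf "%-12s leaves \"%s\"\n", $pair->[0], $shown;
}
```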

Which leads us to

perldoc -q nesting

and to applying regexes to HTML.
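perldoc -q nesting leads to the FAQ entry on matching nested constructs. As a sketch of the 5.10 recursion feature only (Nagios service definitions are not assumed to nest; the "notes {nested braces}" line is hypothetical), the pattern below matches a whole brace-delimited block even when it contains balanced inner braces, and a /e replacement drops the block if its body mentions PING.

```perl
use strict;
use warnings;

# Made-up sample: the first block holds a nested "{...}" pair before PING,
# which is enough to defeat the [^}]* approach.
my $cfg = <<'CFG';
define service{
notes {nested braces}
service_description PING
}
define service{
service_description HTTP
}
CFG

# Group 2 is a brace-delimited body; (?2) re-enters it, so nested balanced
# pairs are allowed. Possessive ++ and (?2) both need Perl 5.10 or later.
my $block = qr/(define service(\{(?:[^{}]++|(?2))*\})\s*)/;

# Drop each block whose body mentions PING, keep the others unchanged.
(my $nested = $cfg) =~ s/$block/index($2, 'PING') >= 0 ? '' : $1/ge;

# For comparison: [^}]* stops at the inner "}" and never reaches PING.
(my $flat = $cfg) =~ s/define service\{[^}]*PING[^}]*\}\s+//g;

print "recursive pattern leaves:\n$nested";
print "[^}]* pattern changed the text: ", ($flat eq $cfg ? "no\n" : "yes\n");
```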


--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
 
 
alfonsobaldaserra
 
      07-15-2009
> Not a lot to go on, but don't expect this to be a real parser unless you understand
> the RULES.


that was an excellent explanation. thank you very much guys, i have
understood it now.

 
 
sln@netherlands.com
 
      07-15-2009
On Tue, 14 Jul 2009 16:04:44 +0300, Eric Pozharski <(E-Mail Removed)> wrote:

>On 2009-07-13, Peter J. Holzer <(E-Mail Removed)> wrote:
>*SKIP*
>*skipping alfonsobaldaserra since he skipped Tad anyway*
>
>> s/define service\{.*PING.*\}\s+//sg
>>
>> OTOH would match anything from the first "define service{" to the last
>> "}" in the file (provided there's a PING somewhere between them) so it
>> would probably remove a lot more than you want. The /[^}]*/ in Tad's
>> regex is there to keep the match within a single brace-delimited block
>> (and it's a bit simple-minded: It won't work if you have a } inside a
>> comment, for example, but you probably don't, so that doesn't matter).

>
>Then stricter
>
> qr/\}\n+/
>
>and stricter
>
> qr/\}(?:\h*\n)+/ # needs 5.10

^^^^
5.10 is great; there is a lot of new stuff in the regex engine:
new nesting (recursive patterns), etc.

When you can write a regex without the need for the
# needs 5.10
it might be useful.

Btw, I don't think anybody skipped Tad, who never skips
anybody.

-sln
 