wrote:
> > Regular expressions are for matching patterns, and you do not have a pattern
> > to match. You might use a regular expression to break up the sentences on
> > punctuation, but you're never going to write a regular expression to
> > determine what is and what isn't a "proper" sentence.
> >
> > Matt
>
> Thanks all for the inputs.
>
> Surely, though, there must be a regular expression saying $whatever
> starts with A-Z, has whatever in the middle and ends with .
> (punctuation) ?
>
> M
A starting point (in Ruby):
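# Will match multiple contiguous sentences.
re = /(?: ^ | \s )                # start of a line, or preceded by whitespace
      (
        (?:
          ["('`] *                # optional opening quotes or parenthesis
          [A-Z]                   # a sentence starts with a capital letter
          [- a-z \s ,;: () '`"]+  # body: lowercase words and light punctuation
          [.?!]                   # terminal punctuation
          [")'`] *                # optional closing quotes or parenthesis
          (?: \s+ | $ )           # followed by whitespace or end of line
        ) +
      )
     /xm
s = DATA.read
s.scan( re ){ |x|
  x = x.first.strip
  # Print only matches longer than four words.
  if x.split.size > 4
    puts x
  end
}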
__END__
pjkoqwe () asdkj() asdasd...... dasdkasjk ** This is a proper sentence,
right here. Hejrkjlekk werkwe wer werjlkj! Wedkljew erewrkjkj?
Wwlkjfdskjsdflk sdlkfjsdsd sdflkjsd sdfkjsdf, sdfklj sdflkjsdf lksdfj.
1223 sd dskj() sdkjas | asd| |sdasda sadkjasd
"I suppose this is a sentence," he said. THisdsa
askhwerjjk.vfklanf.,,dsf,, .
(A "sentence" at the very end.)
Output:
This is a proper sentence,
right here. Hejrkjlekk werkwe wer werjlkj! Wedkljew erewrkjkj?
Wwlkjfdskjsdflk sdlkfjsdsd sdflkjsd sdfkjsdf, sdfklj sdflkjsdf lksdfj.
"I suppose this is a sentence," he said.
(A "sentence" at the very end.)
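
Since the pattern above returns a run of contiguous sentences as one match, you may want to break each run into single sentences afterwards. Here is a rough sketch of one way to do that. The name split_sentences is my own invention for illustration, and the splitting rule (break on whitespace after . ? or !, optionally followed by one closing quote or parenthesis) is an assumption, not something the regular expression above guarantees.

```ruby
# Hypothetical helper (not part of the code above): break one matched run
# into individual sentences.
def split_sentences(run)
  # Split on whitespace that directly follows . ? or !, optionally with one
  # closing quote or parenthesis after the punctuation. Each alternative uses
  # a fixed-width lookbehind, so only the whitespace itself is consumed.
  run.split(/(?<=[.?!])\s+|(?<=[.?!][")'`])\s+/)
end

run = "This is a proper sentence,\nright here. Hejrkjlekk werkwe wer werjlkj! Wedkljew erewrkjkj?"
split_sentences(run).each { |sentence| puts sentence.inspect }
```

This is deliberately naive. A quoted exclamation in mid-sentence, such as "Stop!" he said., would be cut in two, which is one more reason a regular expression alone cannot decide what a "proper" sentence is.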