On Aug 1, 12:53 pm, dusans <dusan.smit...@gmail.com> wrote:
> On Jul 31, 10:07 pm, chrispoliq...@gmail.com wrote:
>
> > I am using regular expressions to search a string (always full
> > sentences, maybe more than one sentence) for common abbreviations and
> > remove their periods. I need to break the string into separate
> > sentences, but split('.') doesn't solve the whole problem because a
> > period can also appear in the middle of a sentence.
>
> > So I have...
>
> > ----------------
>
> > import re
>
> > middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
>
> > # This will find abbreviations like e.g. or i.e. in the middle of a
> > # sentence.
> > # Then I want to remove the periods.
>
> > ----------------
>
> > I want to keep the ie or eg but just take out the periods. Any
> > ideas? Of course, newString = middle_abbr.sub('', txt), where txt is
> > the string, removes the entire abbreviation, alphanumeric characters
> > included.
>
> It's impossible with regex. You could try it with a statistical
> analysis; and even this would give you a good split.
Correction: "and even this won't give you a good split."
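For the narrower question of dropping the periods while keeping the letters, a common approach is to capture the letters in groups and put them back with backreferences in the replacement string. The following is a minimal sketch, assuming Python 3 and the standard `re` module. The function names `strip_abbr_periods` and `naive_split` are hypothetical, and the leading `\b` is an addition to the original pattern. The splitter is only a heuristic, and its failure cases illustrate why a regex alone doesn't give a reliable sentence split.

```python
import re

# Same idea as the original pattern, but the two alphanumeric characters
# are captured in groups 1 and 2 so they can be kept. The leading \b
# (an assumption, not in the original) avoids matching mid-token.
middle_abbr = re.compile(r'\b([A-Za-z0-9])\.([A-Za-z0-9])\.')


def strip_abbr_periods(text):
    """Turn 'e.g.' into 'eg', 'i.e.' into 'ie', and so on."""
    # \1 and \2 re-insert the captured characters; only the periods go.
    return middle_abbr.sub(r'\1\2', text)


def naive_split(text):
    """Split after '.', '!' or '?' when followed by whitespace.

    Heuristic only: it still splits after other abbreviations
    ("Mr. Smith"), and it merges two sentences when the first one
    ended with an abbreviation whose periods were stripped ("U.S.").
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]
```

Usage: `naive_split(strip_abbr_periods(txt))` first removes the abbreviation periods and then splits on the remaining sentence-ending punctuation.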