Velocity Reviews - Computer Hardware Reviews


How to remove empty lines with re?

 
 
Tim Haynes
 
      10-10-2003
"ted" <(E-Mail Removed)> writes:

> f = open("old_site/index.html")
> for line in f:
>     line = re.sub(r'^\s+$|\n', '', line) # }
>     print line # }



If you set a variable to an empty string and then print it, you still get an
empty line printed: print always adds a newline of its own.
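A minimal Python 3 sketch of this point (the two sample lines are invented for illustration, and `print` here is the Python 3 function rather than the 2003-era statement): Ted's substitution reduces a blank line to `''`, but printing `''` still emits a newline, so the line has to be skipped rather than printed.

```python
import io
import re
from contextlib import redirect_stdout

# Hypothetical sample input: one line of markup, one blank line.
lines = ["<p>hello</p>\n", "\n"]

# Ted's approach: substitute, then print whatever is left.
buf = io.StringIO()
with redirect_stdout(buf):
    for line in lines:
        cleaned = re.sub(r'^\s+$|\n', '', line)  # blank line becomes ''
        print(cleaned)                            # print('') still writes '\n'
output = buf.getvalue()
print(repr(output))  # '<p>hello</p>\n\n'

# Fix: skip lines that were reduced to the empty string.
buf2 = io.StringIO()
with redirect_stdout(buf2):
    for line in lines:
        cleaned = re.sub(r'^\s+$|\n', '', line)
        if cleaned:
            print(cleaned)
print(repr(buf2.getvalue()))  # '<p>hello</p>\n'
```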

~Tim
--
Product Development Consultant
OpenLink Software
Tel: +44 (0) 20 8681 7701
Web: <http://www.openlinksw.com>
Universal Data Access & Data Integration Technology Providers
 
 
 
 
 
ted
 
      10-10-2003
I'm having trouble using the re module to remove empty lines in a file.

Here's what I thought would work, but it doesn't:

import re
f = open("old_site/index.html")
for line in f:
    line = re.sub(r'^\s+$|\n', '', line)
    print line

Also, when I try to remove some HTML tags, I get even more empty lines:

import re
f = open("old_site/index.html")
for line in f:
    line = re.sub('<.*?>', '', line)
    line = re.sub(r'^\s+$|\n', '', line)
    print line

I don't know what I'm doing. Any help appreciated.

TIA,
Ted
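The second snippet has the same cause: once `<.*?>` strips the tags, a line such as `<html>\n` is left holding nothing but its newline, which then prints as an empty line. A minimal Python 3 sketch, assuming every tag opens and closes on a single line (the helper name `strip_tags_and_blank_lines` and the sample lines are hypothetical), drops any line that becomes blank after tag removal. For real-world HTML, a parser such as the standard library's `html.parser` is more robust than a regex.

```python
import re

# Non-greedy pattern from the question: removes each <...> tag on a line.
TAG = re.compile(r'<.*?>')

def strip_tags_and_blank_lines(lines):
    """Yield lines with tags removed, skipping any that end up blank."""
    for line in lines:
        text = TAG.sub('', line)
        if text.strip():  # empty or whitespace-only after tag removal -> skip
            yield text

sample = ["<html>\n", "  <p>Hello</p>\n", "\n", "<b>bold</b> text\n"]
result = list(strip_tags_and_blank_lines(sample))
print(result)  # ['  Hello\n', 'bold text\n']
```

The surviving lines keep their own `\n`, so they can be written out with `sys.stdout.writelines(result)` without introducing extra blank lines.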








 
 
 
 
 
Peter Otten
 
      10-10-2003
ted wrote:

> I'm having trouble using the re module to remove empty lines in a file.
>
> Here's what I thought would work, but it doesn't:
>
> import re
> f = open("old_site/index.html")
> for line in f:
>     line = re.sub(r'^\s+$|\n', '', line)
>     print line


Try:

import sys
for line in f:
    if line.strip():
        sys.stdout.write(line)

Background: lines read from the file keep their trailing "\n", and the print
statement adds a second newline.
The strip() method returns a copy of the string with all leading and trailing
whitespace characters removed. Every string except the empty string evaluates
to True in the if statement.
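Peter's loop written as a small Python 3 function, as a sketch: the function name `write_nonblank` and the in-memory `io.StringIO` input are assumptions standing in for `old_site/index.html`, so the example runs without the file.

```python
import io
import sys

def write_nonblank(lines, out=sys.stdout):
    """Copy lines to out, skipping blank or whitespace-only ones.

    Each line keeps its own trailing newline, so write() is used
    instead of print() to avoid adding a second one.
    """
    for line in lines:
        if line.strip():  # '' (falsy) for blank or whitespace-only lines
            out.write(line)

# Hypothetical in-memory "file" standing in for old_site/index.html.
src = io.StringIO("first\n\n   \nsecond\n\t\n")
dst = io.StringIO()
write_nonblank(src, dst)
print(repr(dst.getvalue()))  # 'first\nsecond\n'
```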

Peter
 
 
Bror Johansson
 
      10-10-2003

"ted" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> I'm having trouble using the re module to remove empty lines in a file.
>
> Here's what I thought would work, but it doesn't:
>
> import re
> f = open("old_site/index.html")
> for line in f:
>     line = re.sub(r'^\s+$|\n', '', line)
>     print line
>


nonempty = [x for x in f if x.strip()]
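A quick Python 3 check of the filter's polarity, using hypothetical sample lines: keeping lines where `x.strip()` is true retains those with visible text, while `not x.strip()` selects only the blank or whitespace-only ones.

```python
# Hypothetical sample lines, as iterating over an open file would yield them.
lines = ["a\n", "\n", "  \n", "b\n"]

nonempty = [x for x in lines if x.strip()]   # keep lines with visible text
blank = [x for x in lines if not x.strip()]  # the inverse: only blank lines

print(nonempty)  # ['a\n', 'b\n']
print(blank)     # ['\n', '  \n']
```

The kept lines still end in `\n`, so writing them back with `sys.stdout.writelines(nonempty)` adds no extra newlines.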

/BJ


 
 
Anand Pillai
 
      10-10-2003
To do this, you just need to change your regex to this:

empty=re.compile('^$')

This, of course, looks for a pattern where there is beginning just
after end, i.e. the line is empty.

Here is the complete code.

import re

empty=re.compile('^$')
for line in open('test.txt').readlines():
    if empty.match(line):
        continue
    else:
        print line,

The comma at the end of the print statement stops it from printing another
newline, since the 'readlines()' method gives you each line with a '\n' at the end.
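Why `'^$'` matches a line that still carries its `'\n'`: in Python's `re`, `$` matches at the end of the string and also just before a newline at the very end. A short Python 3 check follows; the `^\s*$` variant (named `blankish` here purely for illustration) is an addition that also catches lines containing only spaces or tabs, which `'^$'` lets through.

```python
import re

empty = re.compile('^$')          # Anand's pattern
blankish = re.compile(r'^\s*$')   # hypothetical variant: also allows spaces/tabs

# '$' also matches just before a newline at the end of the string,
# so '^$' matches a line that consists only of '\n'.
assert empty.match('\n') is not None
assert empty.match('') is not None
# A line holding only spaces is not matched by '^$'...
assert empty.match('   \n') is None
# ...but it is matched by '^\s*$', which still rejects real text.
assert blankish.match('   \n') is not None
assert blankish.match('text\n') is None
print("all checks passed")
```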

Also, don't forget to compile your regexps for efficiency's sake.
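On the efficiency point, a hedged note: the `re` module keeps an internal cache of recently compiled patterns, so calling `re.match` with the same pattern string in a loop does not recompile it every time; a precompiled object mainly saves the cache lookup and keeps the pattern in one place. A Python 3 sketch that checks both spellings give the same result and times them (timings vary by machine, so none are claimed here):

```python
import re
import timeit

lines = ["one\n", "\n", "two\n", "\n"]  # hypothetical input

# Compiled once, outside the loop.
EMPTY = re.compile('^$')
kept_compiled = [line for line in lines if not EMPTY.match(line)]

# Module-level call: same result; re reuses its cached compiled pattern.
kept_module = [line for line in lines if not re.match('^$', line)]

assert kept_compiled == kept_module == ["one\n", "two\n"]

# Rough timing comparison; absolute numbers depend on the machine.
t_compiled = timeit.timeit(lambda: EMPTY.match("two\n"), number=100_000)
t_module = timeit.timeit(lambda: re.match('^$', "two\n"), number=100_000)
print(f"compiled: {t_compiled:.3f}s  module-level: {t_module:.3f}s")
```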

HTH

-Anand Pillai



 
 
Anand Pillai
 
      10-10-2003
Errata:

I meant "there is end just after the beginning" of course.

-Anand


 
 
Klaus Alexander Seistrup
 
      10-10-2003
Anand Pillai wrote:

> Here is the complete code.
>
> import re
>
> empty=re.compile('^$')
> for line in open('test.txt').readlines():
>     if empty.match(line):
>         continue
>     else:
>         print line,


The .readlines() method retains any line terminators, and using the
builtin print will suffix an extra line terminator to every line,
thus effectively producing an empty line for every non-empty line.
You'd want to use e.g. sys.stdout.write() instead of print.
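The behaviour Klaus describes, sketched in Python 3 with `contextlib.redirect_stdout` capturing the output (the sample line is invented): `print` appends its own newline, while `sys.stdout.write()` emits the string exactly as given.

```python
import io
import sys
from contextlib import redirect_stdout

line = "some text\n"  # as read from a file: the terminator is retained

buf = io.StringIO()
with redirect_stdout(buf):
    print(line)             # print appends its own '\n', so a blank line follows
    sys.stdout.write(line)  # write() emits the string exactly as given
print(repr(buf.getvalue()))  # 'some text\n\nsome text\n'
```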


// Klaus

--
><> unselfish actions pay back better

 
 
ted
 
      10-11-2003
Thanks Anand, works great.





 
 
Anand Pillai
 
      10-12-2003
You probably did not read my posting completely.

I added a comma after the print statement and included a comment
specifically about this.

The 'print line,' statement, with a comma after it, does not print
a newline (which you called a line terminator), whereas
'print' without a trailing comma does.

No wonder Python sometimes feels like high-level pseudocode;
it has that ultra-intuitive feel for most of its tricks.

In this case, the comma normally separates items when you have more
than one item to print, and Python puts a newline after the last item.
So it very intuitively follows that a trailing comma with nothing after
it does not print a newline! That is better than telling the programmer
to use another print function to avoid newlines, which you find in many
other 'un-pythonic' languages.
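In Python 3, `print` is a function and the trailing-comma form no longer exists; the `end` keyword argument does the same job. A minimal sketch with hypothetical input lines, capturing the output so it can be inspected:

```python
import io
from contextlib import redirect_stdout

lines = ["alpha\n", "\n", "beta\n"]  # hypothetical input lines

buf = io.StringIO()
with redirect_stdout(buf):
    for line in lines:
        if line.strip():
            # Python 3 spelling of Python 2's "print line,":
            # end='' stops print() from adding its own newline.
            print(line, end='')
print(repr(buf.getvalue()))  # 'alpha\nbeta\n'
```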

-Anand

Klaus Alexander Seistrup <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> Anand Pillai wrote:
>
> > Here is the complete code.
> >
> > import re
> >
> > empty=re.compile('^$')
> > for line in open('test.txt').readlines():
> >     if empty.match(line):
> >         continue
> >     else:
> >         print line,

>
> The .readlines() method retains any line terminators, and using the
> builtin print will suffix an extra line terminator to every line,
> thus effectively producing an empty line for every non-empty line.
> You'd want to use e.g. sys.stdout.write() instead of print.
>
>
> // Klaus

 
 
Klaus Alexander Seistrup
 
      10-12-2003
Anand Pillai wrote:

> You probably did not read my posting completely.
>
> I have added a comma after the print statement and mentioned
> a comment specifically on this.


You are completely right, I missed an important part of your posting.
I didn't know about the comma feature, so thanks for teaching me!

Cheers,

// Klaus

--
><> unselfish actions pay back better

 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Re: How include a large array? | Edward A. Falk | C Programming | 1 | 04-04-2013 08:07 PM
Asp.Net Calender, how to display 5 lines if there are only 5 lines in one month? | Jack | ASP .Net | 9 | 10-12-2005 03:44 AM
Modems, Analog Lines and ... Electrical Lines? | Sens Fan Happy In Ohio | Computer Support | 5 | 09-02-2004 04:15 AM
HOWTO remove empty lines with Regex | tor | Perl Misc | 6 | 12-10-2003 03:24 PM
Re: how to read 10 lines from a 200 lines file and write to a new file?? | Joe Wright | C Programming | 0 | 07-27-2003 08:50 PM


