Velocity Reviews - Computer Hardware Reviews


TSV to HTML

 
 
Brian
 
      05-31-2006
I was wondering if anyone here on the group could point me in a
direction that would explain how to use Python to convert a TSV file
to HTML. I have been searching for a resource but have only seen
information on converting CSV to TSV. Specifically, I want
to take the values and insert them into an HTML table.

I have been trying to figure it out myself, and in essence, this is
what I have come up with. Am I on the right track? I really have the
feeling that I am re-inventing the wheel here.

1) define a CSS block in the code
2) use a regex to extract the info between tabs
3) wrap the values in the appropriate tags and insert them into a table
4) write the .html file
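The four steps above map fairly directly onto Python. Here is a minimal sketch in Python 3, assuming UTF-8 input with no quoted fields; the function name `tsv_file_to_html` and the CSS rule are made up for illustration. Step 2 uses `str.split("\t")` instead of a regex, since splitting on one fixed character doesn't need one.

```python
import html

# Step 1: a CSS block defined in the code (the rule here is only an example).
CSS = """<style type="text/css">
td { border-left: 1px solid #000000; padding: 0 4px; }
</style>"""


def tsv_file_to_html(in_path, out_path):
    """Hypothetical helper: convert one TSV file to a standalone HTML page."""
    rows = []
    with open(in_path, encoding="utf-8") as fin:
        for line in fin:
            # Step 2: plain str.split is enough to break a line on tabs.
            fields = line.rstrip("\n").split("\t")
            # Step 3: wrap each escaped value in <td> and the row in <tr>.
            cells = "".join("<td>%s</td>" % html.escape(f) for f in fields)
            rows.append("<tr>%s</tr>" % cells)
    page = "<html><head>%s</head><body><table>\n%s\n</table></body></html>\n" % (
        CSS, "\n".join(rows))
    # Step 4: write the .html file.
    with open(out_path, "w", encoding="utf-8") as fout:
        fout.write(page)
```

Calling `tsv_file_to_html("in.tsv", "out.html")` would write one page; passing each field through `html.escape` keeps a stray `<` or `&` in the data from breaking the markup.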

Thanks again for your patience,
Brian

 
Tim Chase
 
      05-31-2006
> I was wondering if anyone here on the group could point me
> in a direction that would explain how to use python to
> convert a tsv file to html. I have been searching for a
> resource but have only seen information on dealing with
> converting csv to tsv. Specifically I want to take the
> values and insert them into an html table.
>
> I have been trying to figure it out myself, and in
> essence, this is what I have come up with. Am I on the
> right track? I really have the feeling that I am
> re-inventing the wheel here.
>
> 1) in the code define a css
> 2) use a regex to extract the info between tabs
> 3) wrap the values in the appropriate tags and insert into
> table.
> 4) write the .html file


Sounds like you just want to do something like this:

print "<table>"
for line in file("in.tsv"):
    print "<tr>"
    # drop the trailing newline so it doesn't end up in the last cell
    items = line.rstrip("\n").split("\t")
    for item in items:
        print "<td>%s</td>" % item
    print "</tr>"
print "</table>"

It gets a little more complex if you need to clean each item
for HTML entities/scripts/etc., but that's usually just a
function that you'd wrap around the item:

print "<td>%s</td>" % escapeEntity(item)

using whatever "escapeEntity" function you have on hand.
E.g.

from xml.sax.saxutils import escape
:
:
print "<td>%s</td>" % escape(item)
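To see what that escaping actually covers, here is `xml.sax.saxutils.escape` run on a few sample strings under Python 3 (hence `print(...)` as a function):

```python
from xml.sax.saxutils import escape

# escape() replaces &, < and > (the ampersand first, so nothing is double-escaped).
print(escape("Fish & Chips <b>"))           # Fish &amp; Chips &lt;b&gt;

# Quotes are left alone unless you pass extra entities; that only matters
# if a value goes inside an attribute rather than between <td> tags.
print(escape('say "hi"'))                    # say "hi"
print(escape('say "hi"', {'"': "&quot;"}))   # say &quot;hi&quot;
```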

It doesn't gracefully attempt to define headers using
<thead>, <tbody>, and <th> sorts of rows, but a little
toying should solve that.
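For the header case, one possible approach is to treat the first line of the file as column titles. The sketch below (Python 3; `tsv_lines_to_table` is a hypothetical name, and it assumes the input always starts with a header line) puts that row in `<thead>` with `<th>` cells and the remaining rows in `<tbody>`:

```python
from xml.sax.saxutils import escape


def tsv_lines_to_table(lines):
    """Hypothetical helper: the first TSV line becomes the header row."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    out = ["<table>", "<thead>"]
    out.append("<tr>" + "".join("<th>%s</th>" % escape(h) for h in header) + "</tr>")
    out.append("</thead>")
    out.append("<tbody>")
    for row in body:
        out.append("<tr>" + "".join("<td>%s</td>" % escape(c) for c in row) + "</tr>")
    out.append("</tbody>")
    out.append("</table>")
    return "\n".join(out)
```

For example, `tsv_lines_to_table(["name\tqty\n", "apple\t3\n"])` returns a table whose header row is `<tr><th>name</th><th>qty</th></tr>` and whose single body row is `<tr><td>apple</td><td>3</td></tr>`. An open file object works as the argument too, since it iterates over lines.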

-tim





 
Dan M
 
      05-31-2006
> 1) in the code define a css
> 2) use a regex to extract the info between tabs


In place of this, you might want to look at
http://effbot.org/librarybook/csv.htm
Around the middle of that page you'll see how to use a delimiter other
than a comma
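The relevant piece is the `delimiter` argument to `csv.reader`. A short Python 3 sketch, with made-up sample data, showing that a quoted field may even contain a tab:

```python
import csv
import io

# A small TSV sample; the second field of the data row is quoted
# because it contains a tab character.
data = 'name\tnote\nwidget\t"tab\there"\n'

# delimiter="\t" tells csv.reader to split on tabs instead of commas;
# for reading, the built-in "excel-tab" dialect amounts to the same thing.
for row in csv.reader(io.StringIO(data), delimiter="\t"):
    print(row)
# ['name', 'note']
# ['widget', 'tab\there']
```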

> 3) wrap the values in the appropriate tags and insert into table.
> 4) write the .html file
>
> Thanks again for your patience,
> Brian


 
Leif K-Brooks
 
      05-31-2006
Brian wrote:
> I was wondering if anyone here on the group could point me in a
> direction that would explain how to use python to convert a tsv file
> to html. I have been searching for a resource but have only seen
> information on dealing with converting csv to tsv. Specifically I want
> to take the values and insert them into an html table.


import csv
from xml.sax.saxutils import escape

def tsv_to_html(input_file, output_file):
    output_file.write('<table><tbody>\n')
    for row in csv.reader(input_file, 'excel-tab'):
        output_file.write('<tr>')
        for col in row:
            output_file.write('<td>%s</td>' % escape(col))
        output_file.write('</tr>\n')
    output_file.write('</tbody></table>')

Usage example:

>>> from cStringIO import StringIO
>>> input_file = StringIO('"foo"\t"bar"\t"baz"\n'
...                       '"qux"\t"quux"\t"quux"\n')
>>> output_file = StringIO()
>>> tsv_to_html(input_file, output_file)
>>> print output_file.getvalue()
<table><tbody>
<tr><td>foo</td><td>bar</td><td>baz</td></tr>
<tr><td>qux</td><td>quux</td><td>quux</td></tr>
</tbody></table>
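The same function runs on Python 3 with two small changes: `io.StringIO` replaces `cStringIO`, and `print` becomes a function. This sketch also swaps in `html.escape`, which escapes quotes as well as `&`, `<` and `>`; `xml.sax.saxutils.escape` would work equally well for cell text.

```python
import csv
import io
from html import escape


def tsv_to_html(input_file, output_file):
    # Same structure as Leif's function, ported to Python 3.
    output_file.write('<table><tbody>\n')
    for row in csv.reader(input_file, 'excel-tab'):
        output_file.write('<tr>')
        for col in row:
            output_file.write('<td>%s</td>' % escape(col))
        output_file.write('</tr>\n')
    output_file.write('</tbody></table>')


input_file = io.StringIO('"foo"\t"bar"\t"baz"\n'
                         '"qux"\t"quux"\t"quux"\n')
output_file = io.StringIO()
tsv_to_html(input_file, output_file)
print(output_file.getvalue())
```

With a real file, the `csv` module documentation recommends opening it with `newline=''`, e.g. `open("in.tsv", newline="")`, so that newlines inside quoted fields are handled correctly.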
 
Brian
 
      06-01-2006

First let me say that I appreciate the responses that everyone has
given.

A friend of mine is a Ruby programmer but knows nothing about Python.
He gave me the script below and it does exactly what I want, only it is
in Ruby. Not knowing Ruby, this is Greek to me, and I would like to
rewrite it in Python.

I ask, then: is this essentially what others here have shown me to do,
or is it in a different vein altogether?

Code:

class TsvToHTML
  @@styleBlock = <<-ENDMARK
  <style type='text/css'>
  td {
    border-left:1px solid #000000;
    padding-right:4px;
    padding-left:4px;
    white-space: nowrap;
  }
  .cellTitle {
    border-bottom:1px solid #000000;
    background:#ffffe0;
    font-weight: bold;
    text-align: center;
  }
  .cell0 { background:#eff1f1; }
  .cell1 { background:#f8f8f8; }
  </style>
  ENDMARK

  def TsvToHTML::wrapTag(data,tag,modifier = "")
    return "<#{tag} #{modifier}>" + data + "</#{tag}>\n"
  end # wrapTag

  def TsvToHTML::makePage(source)
    page = ""
    rowNum = 0
    source.readlines.each { |record|
      row = ""
      record.chomp.split("\t").each { |field|
        # replace blank fields with &nbsp;
        field.sub!(/^$/,"&nbsp;")
        # wrap in TD tag, specify style
        row += wrapTag(field,"td","class=\"" +
          ((rowNum == 0) ? "cellTitle" : "cell#{rowNum % 2}") +
          "\"")
      }
      rowNum += 1
      # wrap in TR tag, add row to page
      page += wrapTag(row,"tr") + "\n"
    }
    # finish page formatting
    [ [ "table","cellpadding=0 cellspacing=0 border=0" ], "body","html" ].each { |tag|
      page = wrapTag(@@styleBlock,"head") + page if tag == "html"
      page = wrapTag(page,*tag)
    }
    return page
  end # makePage
end # class

# stdin -> convert -> stdout
print TsvToHTML.makePage(STDIN)

 
Paddy
 
      06-01-2006
Brian wrote:
> First let me say that I appreciate the responses that everyone has
> given.
>
> A friend of mine is a ruby programmer but knows nothing about python.
> He gave me the script below and it does exactly what I want, only it is
> in Ruby. Not knowing ruby this is greek to me, and I would like to
> re-write it in python.
>
> I ask then, is this essentially what others here have shown me to do,
> or is it in a different vein all together?
>

Leif's Python example uses the csv module, which understands a lot more
about the peculiarities of the CSV/TSV formats.
The Ruby example prepends a <style>...</style> block.

The Ruby example splits the input into lines to form the table rows,
and splits each line on tabs to form the cells.

The thing about TSV/CSV formats is that there is no one format. You
need to check how your TSV creator generates the TSV file:
Does it put quotes around text fields?
What kind of quotes?
How does it represent null fields?
Might you get fields that include newlines?
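The last question matters most for naive splitting. A rough Python 3 illustration, with invented sample data, of how a quoted field containing a newline breaks line-by-line splitting while `csv.reader` keeps it whole, and how `quoting=csv.QUOTE_NONE` suits producers that never quote:

```python
import csv
import io

# A field that contains a newline must be quoted for csv to keep it intact.
data = 'id\tcomment\n1\t"line one\nline two"\n2\t\n'

# Naive approach: one table row per physical line.
naive = [line.split("\t") for line in data.splitlines()]
print(len(naive))   # 4: the quoted comment was split across two rows

# csv.reader understands the quoting, so the comment stays one field.
rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(len(rows))    # 3
print(rows[1])      # ['1', 'line one\nline two']
print(rows[2])      # ['2', '']  (an empty field comes back as '')

# If the producer never quotes and '"' can appear as ordinary text,
# turn quote handling off; otherwise the leading '"' below would be
# taken as the start of a quoted field.
plain = 'title\t"Hi" she said\n'
print(list(csv.reader(io.StringIO(plain), delimiter="\t",
                      quoting=csv.QUOTE_NONE)))   # [['title', '"Hi" she said']]
```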

- P.S. I'm not a Ruby programmer; I just read the source.

 
Dennis Lee Bieber
 
      06-01-2006
On 31 May 2006 18:48:30 -0700, "Brian" <(E-Mail Removed)> declaimed
the following in comp.lang.python:


> Code:
>
> class TsvToHTML
> @@styleBlock = <<-ENDMARK


<snip>

> print TsvToHTML.makePage(STDIN)


Given that no "instances" are created, there's no real need to use a
class (in Python, at least -- I don't know if Ruby is like Java, where
everything is embedded in a class). A simple module (file) is
sufficient.

I took a few liberties -- like splitting out the table generation
from the rest of the page, and adding argument parsing for input files
(so this version will create multiple tables if multiple files were
supplied). Be careful, one or two lines were wrapped by the news client.

-=-=-=-=-=-=-=-
# tsv2html.py
# function module

import sys

# define CSS style definition
STYLEBLOCK = """
<style type="text/css">
td {
    border-left:1px solid #000000;
    padding-right:4px;
    padding-left:4px;
    white-space: nowrap; }
.cellTitle {
    border-bottom:1px solid #000000;
    background:#ffffe0;
    font-weight: bold;
    text-align: center; }
.cell0 { background:#3ff1f1; }
.cell1 { background:#f8f8f8; }
</style>
"""

# utility function to wrap "data" within
# <tag modifier> data </tag>
def wrapTag(data, tag, modifier = ""):
    if isinstance(tag, tuple):  # check for complex (tag, modifier) tuple
        tag, modifier = tag
    return "<%s %s>%s</%s>\n" % (tag, modifier, data, tag)

# utility function to produce an HTML table
# from tab-separated data read from
# iterable source material
def makeTable(source):
    tableParts = []
    rowNum = 0
    # get each line of source
    for record in source:
        rowParts = []
        # get each field of source, splitting on tabs; only the line
        # ending is stripped (like Ruby's chomp) so empty edge fields survive
        for field in record.rstrip("\r\n").split("\t"):
            # convert empty fields to a non-breaking space
            if not field: field = "&nbsp;"
            if rowNum:
                # past the first row, alternate cell style
                tagged = wrapTag(field, "td",
                                 'class="cell%s"' % (rowNum % 2))
            else:
                # first row, use "title" style
                tagged = wrapTag(field, "td",  # I'd use "th"
                                 'class="cellTitle"')
            # collect the tagged field as a list of row parts
            rowParts.append(tagged)
        rowNum += 1
        # join the row parts, and wrap as a row, collecting rows in list
        tableParts.append(wrapTag("".join(rowParts), "tr"))
    # join the rows with a new-line separator
    return wrapTag("\n".join(tableParts),
                   ("table",
                    'align="center" cellpadding="0" cellspacing="0" border="0"'))

def makePage(data):
    # wrap the tables in the rest of the HTML tags: body, html
    for tag in ["body", "html"]:
        # if current tag is the <html>, insert a <head> block with
        # the CSS style definition
        if tag == "html":
            data = wrapTag(STYLEBLOCK, "head") + data
        data = wrapTag(data, tag)
    return data

if __name__ == "__main__":
    # if command line arguments supplied, treat as file names
    if len(sys.argv) > 1:
        # all tables go into a single output page
        fout = open("TSV2HTML.html", "w")
        tables = []
        # for each file supplied
        for fid in sys.argv[1:]:
            # open for read and build one table per input file
            fin = open(fid, "r")
            tables.append(makeTable(fin))
            fin.close()
        fout.write(makePage("\n".join(tables)))
        fout.close()
    else:
        # no arguments, read stdin, write stdout
        sys.stdout.write(makePage(makeTable(sys.stdin)))  # could use print

NOTE: no HTML escaping is done, and my test data sometimes caused
problems.
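If escaping is added to `makeTable`, its order relative to the `&nbsp;` substitution matters: escape first, then insert the placeholder, or the `&` in `&nbsp;` gets escaped too. A minimal Python 3 sketch, where `format_field` is a hypothetical helper that could stand in for the `if not field: field = "&nbsp;"` line:

```python
from html import escape


def format_field(field):
    """Hypothetical helper for the field loop in makeTable."""
    # Escape the raw text first...
    text = escape(field, quote=False)
    # ...then substitute the placeholder, so "&nbsp;" itself is not
    # turned into "&amp;nbsp;".
    return text if text else "&nbsp;"


print(format_field("a<b & c"))   # a&lt;b &amp; c
print(format_field(""))          # &nbsp;
# Doing it the other way round shows the problem:
print(escape("&nbsp;"))          # &amp;nbsp;
```

Inside the loop, `field = format_field(field)` would then replace the existing empty-field check.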
--
Wulfraed Dennis Lee Bieber KD6MOG
(E-Mail Removed) (E-Mail Removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (E-Mail Removed))
HTTP://www.bestiaria.com/
 
Brian
 
      06-01-2006

Dennis,

Thank you for that response. Your code was very helpful to me. I
think that actually seeing how it should be done in Python was a lot
more educational than spending hours with trial and error.

One question (and this is a topic that I still have trouble getting my
arms around). Why is the text in STYLEBLOCK triple quoted?

Thanks again,
Brian

 
Scott David Daniels
 
      06-01-2006
Brian wrote:
> One question (and this is a topic that I still have trouble getting my
> arms around). Why is the text in STYLEBLOCK triple quoted?


Because triple-quoted strings can span lines and include single quotes
and double quotes.
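A small Python 3 check of both points; the CSS content is arbitrary:

```python
# One triple-quoted literal: spans lines and holds both kinds of quote.
block = """<style type="text/css">
td { font-family: 'Courier New'; }
</style>"""

print(block.count("\n"))                 # 2: the line breaks are part of the string
print('"' in block and "'" in block)     # True
```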

--
--Scott David Daniels
(E-Mail Removed)
 
Dennis Lee Bieber
 
      06-01-2006
On 1 Jun 2006 03:29:35 -0700, "Brian" <(E-Mail Removed)> declaimed the
following in comp.lang.python:

> Thank you for that response. Your code was very helpful to me. I
> think that actually seeing how it should be done in Python was a lot
> more educational than spending hours with trial and error.
>

It's not the best code around -- I hacked it together pretty much
line-for-line from an assumption of what the Ruby was doing (I don't do
Ruby -- too much PERL idiom in it).

> One question (and this is a topic that I still have trouble getting my
> arms around). Why is the text in STYLEBLOCK triple quoted?
>

Triple quotes allow 1) quote characters within the block without
needing to escape them, and 2) the string to span multiple lines. A
plain quoted string must fit on one logical line for the parser.

I've practically never seen anyone use a line continuation character
in Python. And triple quoting looks cleaner than parser concatenation.

The alternatives would have been:

Line Continuation:
STYLEBLOCK = '\n\
<style type="text/css">\n\
td {\n\
border-left:1px solid #000000;\n\
padding-right:4px;\n\
padding-left:4px;\n\
white-space: nowrap; }\n\
.cellTitle {\n\
border-bottom:1px solid #000000;\n\
background:#ffffe0;\n\
font-weight: bold;\n\
text-align: center; }\n\
.cell0 { background:#3ff1f1; }\n\
.cell1 { background:#f8f8f8; }\n\
</style>\n\
'
Note the \n\ at the end of each line; the \n keeps the formatting
of the generated HTML (otherwise everything would be one long line),
and the final \ (which must be the physical end of the line) signifies
"this line is continued". Also note that I used ' rather than " to
avoid escaping the " in text/css.

Parser Concatenation:
STYLEBLOCK = (
'<style type="text/css">\n'
"td {\n"
" border-left:1px solid #000000;\n"
" padding-right:4px;\n"
" padding-left:4px;\n"
" white-space: nowrap; }\n"
".cellTitle {\n"
" border-bottom:1px solid #000000;\n"
" background:#ffffe0;\n"
" font-weight: bold;\n"
" text-align: center; }\n"
".cell0 { background:#3ff1f1; }\n"
".cell1 { background:#f8f8f8; }\n"
"</style>\n"
)

Note the use of ( ) where the original had """ """. Also note that
each line has quotes at start and end (the first uses ' to avoid escaping
the quotes in text/css). There are no commas separating the lines (and
the \n is still for formatting). Using the ( ) creates an expression, and
Python is nice enough to let one split expressions inside (), [lists],
or {dicts} over multiple lines (I used that feature in a few spots to put
call arguments on multiple lines). Two strings that are next to each other

"string1" "string2"

are parsed as one string

"string1string2"

Using """ (or ''') is the cleanest of those choices, especially if
you want to do preformatted layout of the text. It works much like the
Ruby/PERL here-document construct, which basically says: copy all text up to the next
occurrence of MARKER_STRING.
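A quick Python 3 check that the three spellings produce the identical string, using a shortened three-line style block for brevity:

```python
# The same three-line text written the three ways described above.
triple = """<style type="text/css">
td { padding-left:4px; }
</style>
"""

continued = '<style type="text/css">\n\
td { padding-left:4px; }\n\
</style>\n\
'

concatenated = (
    '<style type="text/css">\n'
    "td { padding-left:4px; }\n"
    "</style>\n"
)

# Backslash-newline inside a literal is dropped, and adjacent literals
# are joined by the parser, so all three are equal.
print(triple == continued == concatenated)   # True
print("string1" "string2")                   # string1string2
```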




> Thanks again,
> Brian
