Velocity Reviews - Computer Hardware Reviews

Fetch info from website and write to txt file.

 
 
Pitmairen
      03-06-2006
I want to make a program that gets info from a website and writes it
to a txt file.

I made this:

import urllib
f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
s = f.read()
k = open("test.txt","w")
k.write(s)
k.close()
f.close()

That saves all the HTML code into the test.txt file. But what if I only
want, for example, the genre, plot outline and cast overview to be
written to the txt file? How can I do that?


And another problem I have:

If the txt file I want the information to be saved in already has some
text in it, how can I insert the info from the website between the
text that was there before?

For example:

blablablablablablablabla
blablablablablablablabla
blablablablablablablabla
(insert info from website here)
blablablablablablablabla
blablablablablablablabla
blablablablablablablabla


Pitmairen

 
gene tani
      03-06-2006

Pitmairen wrote:
> I want to make a program that gets info from a website and writes it
> to a txt file.
>
> I made this:
>
> import urllib
> f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
> s = f.read()
> k = open("test.txt","w")
> k.write(s)
> k.close()
> f.close()
>
> That saves all the HTML code into the test.txt file. But what if I only
> want, for example, the genre, plot outline and cast overview to be
> written to the txt file? How can I do that?
>
>
> And another problem I have:
>
> If the txt file I want the information to be saved in already has some
> text in it, how can I insert the info from the website between the
> text that was there before?
>
> For example:
>
> blablablablablablablabla
> blablablablablablablabla
> blablablablablablablabla
> (insert info from website here)
> blablablablablablablabla
> blablablablablablablabla
> blablablablablablablabla
>


To get a text file that looks like your web page, stripped of markup,
look at "lynx -dump" or "w3m -dump" (I think links2 does the same).
Otherwise:

http://groups.google.com/group/comp....k+to+Search&&d
http://groups.google.com/group/comp....k+to+Search&&d
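If installing lynx or w3m is not an option, a rough pure-Python analogue of a "-dump" can be sketched with the standard library's html.parser module. This is a swapped-in technique, not what lynx does internally: it only collects text nodes and skips script and style contents, with no layout at all. The names TextDumper and dump_text and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style elements.
        if text and not self._skip_depth:
            self.chunks.append(text)


def dump_text(html):
    """Return the text content of an HTML string, one text node per line."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample page, not real IMDb markup.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Plot <b>outline</b> here.</p></body></html>"
)
dumped = dump_text(sample)
print(dumped)
```

The result is cruder than a real text browser's output (every text node lands on its own line), but it is enough to get readable text into a file without an external program.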

 
gene tani
      03-06-2006

Pitmairen wrote:
> I want to make a program that gets info from a website and writes it
> to a txt file.
>
> I made this:
>
> import urllib
> f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")


path of even less resistance
http://imdbpy.sourceforge.net/

 
Dennis Lee Bieber
      03-06-2006
On 6 Mar 2006 10:08:44 -0800, "Pitmairen"
declaimed the following in comp.lang.python:

> That saves all the HTML code into the test.txt file. But what if I only
> want, for example, the genre, plot outline and cast overview to be
> written to the txt file? How can I do that?
>

Well, how would you do it by hand? Write down the steps you go
through to extract that information from your HTML file by hand... Clean
that up into a generalized algorithm... Write code that performs that
algorithm...

IOW: You're going to have to write code to parse the HTML (there may be
libraries available to help, but you still need to write the recognizer
for the parts you want).
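As a minimal sketch of such a recognizer, the following uses the standard library's html.parser. It assumes, purely for illustration, that each section of the page is introduced by a label in an h5 element (for example "Genre:") and that the section's text runs until the next h5. That markup is hypothetical and is not taken from the real IMDb page, so the recognizer has to be adapted to whatever the saved HTML actually contains. In Python 3 the page would be fetched with urllib.request.urlopen and the bytes decoded before being fed to the parser.

```python
from html.parser import HTMLParser


class SectionRecognizer(HTMLParser):
    """Collect the text that follows selected <h5> labels (assumed markup)."""

    def __init__(self, wanted, label_tag="h5"):
        super().__init__()
        self.wanted = set(wanted)
        self.label_tag = label_tag
        self.sections = {}
        self._in_label = False
        self._label_text = []
        self._current = None  # label whose body text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == self.label_tag:
            # A new label ends whatever section was being collected.
            self._in_label = True
            self._label_text = []
            self._current = None

    def handle_endtag(self, tag):
        if tag == self.label_tag and self._in_label:
            self._in_label = False
            label = "".join(self._label_text).strip().rstrip(":")
            if label in self.wanted:
                self._current = label
                self.sections[label] = []

    def handle_data(self, data):
        if self._in_label:
            self._label_text.append(data)
        elif self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def extract_sections(html, wanted):
    """Return {label: text} for every wanted label found in the HTML."""
    parser = SectionRecognizer(wanted)
    parser.feed(html)
    parser.close()
    return {label: " ".join(parts) for label, parts in parser.sections.items()}


# Hypothetical sample in the assumed layout.
sample = """
<div><h5>Genre:</h5> <a href="#">Comedy</a> / <a href="#">Drama</a></div>
<div><h5>Runtime:</h5> 90 min</div>
<div><h5>Plot Outline:</h5> A hypothetical plot summary.</div>
"""
found = extract_sections(sample, ["Genre", "Plot Outline"])
for label, text in found.items():
    print(label + ": " + text)
```

The dictionary can then be written to the txt file one "label: text" line at a time instead of dumping the whole page.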

>
> And another problem I have:
>
> If the txt file I want the information to be saved in already has some
> text in it, how can I insert the info from the website between the
> text that was there before?
>


{I'm making enemies today}

Same answer... How would you do this by hand? Translate that
procedure to code.

Though I suspect, in this case, "by hand" would be to open the
entire file into memory (using Notepad or some editor). Open the other
text in another memory-based editor. Select, copy, paste... But that
puts all the work of the insertion on the editor program (i.e., someone
else had to code the same thing you are asking about to make the editor work).

Question: how do you identify /where/ to do the insert... By number
of lines, by some keyword, etc.?

http://cis.stvincent.edu/swd/extsort/extsort.html

Modify as needed (it assumes each "line" is a record to be
sorted/merged, while you want to merge on some arbitrary boundary)
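The question of where to insert can be made concrete with a small in-memory sketch: read the file into a list of lines, then splice the new lines in either after a fixed number of lines or after the first line containing a keyword. The function names and the sample data are illustrative assumptions, and the approach only suits files small enough to hold in memory.

```python
def insert_after_line_count(lines, new_lines, count):
    """Return a new list with new_lines placed after the first `count` lines."""
    if not 0 <= count <= len(lines):
        raise ValueError("count is outside the file")
    return lines[:count] + new_lines + lines[count:]


def insert_after_keyword(lines, new_lines, keyword):
    """Return a new list with new_lines placed after the first line containing keyword."""
    for index, line in enumerate(lines):
        if keyword in line:
            return lines[:index + 1] + new_lines + lines[index + 1:]
    raise ValueError("keyword %r not found" % keyword)


# Stand-ins for the result of readlines() and for the fetched information.
existing = ["bla one\n", "bla two\n", "bla three\n", "bla four\n"]
website_info = ["Genre: Comedy\n"]

by_count = insert_after_line_count(existing, website_info, 2)
by_keyword = insert_after_keyword(existing, website_info, "three")
print("".join(by_count))
print("".join(by_keyword))
```

Either result can be written back with writelines(). Both functions leave the original list untouched, which makes it easy to compare before and after.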
--
Wulfraed  Dennis Lee Bieber  KD6MOG
Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

 
Bruno Desthuilliers
      03-06-2006
Pitmairen wrote:
> I want to make a program that gets info from a website and writes it
> to a txt file.
>
> I made this:
>
> import urllib
> f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
> s = f.read()
> k = open("test.txt","w")
> k.write(s)
> k.close()
> f.close()
>
> That saves all the HTML code into the test.txt file. But what if I only
> want, for example, the genre, plot outline and cast overview to be
> written to the txt file? How can I do that?
>


Seems like you want BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/


> And another problem I have:
>
> If the txt file I want the information to be saved in already has some
> text in it, how can I insert the info from the website between the
> text that was there before?
>
> For example:
>
> blablablablablablablabla
> blablablablablablablabla
> blablablablablablablabla
> (insert info from website here)
> blablablablablablablabla
> blablablablablablablabla
> blablablablablablablabla
>


You need to be able to identify the place where you want to insert your
data. Then it's a matter of reading the original file, creating a temp
file, writing the lines before the insertion point, writing the data to
insert, writing the remaining lines, closing all files, and replacing the
original file with the temp file.
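The steps above can be sketched roughly as follows, assuming the insertion point is identified by a marker string and that the new text goes right after the first line containing it. The function name insert_into_file, the marker convention and the demo file contents are illustrative choices, not something specified in the thread.

```python
import os
import tempfile


def insert_into_file(path, text_to_insert, marker):
    """Insert text after the first line containing marker, via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final replace
    # stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    inserted = False
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as temp, \
                open(path, encoding="utf-8", newline="") as original:
            for line in original:
                temp.write(line)  # lines before (and after) the insertion point
                if not inserted and marker in line:
                    temp.write(text_to_insert)
                    if not text_to_insert.endswith("\n"):
                        temp.write("\n")
                    inserted = True
        if not inserted:
            raise ValueError("marker %r not found in %s" % (marker, path))
        os.replace(temp_path, path)  # swap the temp file in for the original
    except Exception:
        # On any failure, drop the temp file and leave the original untouched.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


# Demo in a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "test.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("bla one\nbla two\nbla three\n")
    insert_into_file(demo_path, "info from website", marker="bla two")
    with open(demo_path, encoding="utf-8") as handle:
        demo_result = handle.read()
    demo_leftovers = os.listdir(demo_dir)
print(demo_result)
```

Writing to a temp file and replacing at the end means the original is never left half-written if something fails midway, and the file is processed line by line rather than loaded whole.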
 