not sure who to ask... sorting data from a webpage...
Hi there, I'm wondering if anyone might know how I can sort through data from a web site.

Here's what I mean: I go to a page like this,
http://biz.yahoo.com/research/earncal/20050727.html

and make lists in a text file that look like this,

""""""""
July 27/05
am:
zbra ycc xel wec wlp wlm vcg vitx uco umc tup trps twti tmo mos faf ba tin tds tem sup su seo fon see std res rcl rol rok resp quot pub px prai plug plc pas pfsb ptc pnp pfcb oxgn ocas nus nsc nfx mpp mnst mx mtlk mdp mwv mso mpx mmp lz liz tvl lii kyo komg iris ips ipt intt iff ifcj ilog ibas.ob holx hit hw hhs gifi gbbk gemp grmn fcl forr fsrv fmsb fmbi fnf eqr eog dyax dtc dbd do cfr cgx cop cgen cbbo cnh cksw ctec cbi gib cra csar caj calp cach bc biom brg bhl bms beav bol rate ava attu arw ant apu ahc amrn agn ati apd amg actu acpw
time not:
wgbc wri wlt vitr upl ttmi toc eml skx rai rjet rgen o rndc pnw ptnr oste opy omx nwpx nu nem njr nls mnc mips mesa mth lpx kmg kmt hmc hlt hca gsic sab flyi flml fe xide exac eeft eqix eni ele csx covd.ob cnxt cpts chrz cl chir cra belfb augt aspm amkr alda agu aby
pm:
zmh xl wxs wits wsh wpi wsii vas vrtx vrlk vtr vvc vari var uhs xprsa tyl trid twp twi thrx ttek tk te talx smbi sxl stnr stts sfn sspi ssi sp sfcc sero sanyy rop rsg rrc rhd qdel quik str phm pgi plxs pdg pxlw pmtr osip open ntri ntct cetl mtsc mrvc motv mrh mcel mcrl wfr mdsi mck mxo mant mtw lmnx lsi linn.ob ltbg lpnt psco kex jll ipas issx imgc ingr ifsia idti imdc htrn hlex hrs hgr gmk gva job fbn fbr foxh fbc chrx fcgi fic eyet esrx esst eres epix ets edap dre hill driv dtpi ddr dnb cytk cybe cts clb cnqr ctg clrk cogt clf cenx cv ctlm cldn cald cdn cbt vnt bne bpfh bcgi blkb bjfi bjct belm bgf acls atsi ahl arrs amcc appb apac anik adpi atgn alex acl arg atac actu (1) akr atx
"""""""""

I do this by hand. As you can see, there are three main categories: 1) before market open, 2) time not supplied, and 3) after market close, plus some specific times of earnings release.

Can anyone tell me how to create these lists without typing them all out by hand?

thanks for any help Eric |
Re: not sure who to ask... sorting data from a webpage...
Gazing into my crystal ball I observed Eric <1@1.com> writing in
news:7i3ee1122hc8sjfjdss8epmddprau9j1a0@4ax.com:

> Hi there, I'm wondering if anyone might now how I can sort through
> data from a web site.
>
> Here's what I mean: I go to a page like this,
> http://biz.yahoo.com/research/earncal/20050727.html
>
> and make lists in a text file that look like this,
> """"""""
<snip list>
> I do this by hand. As you can see there are 3 main categories,
> 1)before market open, 2) time not supplied and, 3) after market close
> and some specific times of earnings release.
>
> Can any one tell me how to create these lists without typing them all
> out by hand?
>
> thanks for any help

It's a cheat, but it works. Open the page you want in IE, and open Excel. Copy the information from IE, paste into Excel. Then you can use Excel to manipulate it and save it as a text file, or save it as a dbf file, whichever is better for you.

I do not think this will work with any other browser except IE. Of course, I could be wrong.

--
Adrienne Boswell (Opera lover)
http://www.cavalcade-of-coding.info
Please respond to the group so others can share |
Re: not sure who to ask... sorting data from a webpage...
Eric wrote:
> Hi there, I'm wondering if anyone might now how I can sort through
> data from a web site.
>
> Here's what I mean: I go to a page like this,
> http://biz.yahoo.com/research/earncal/20050727.html
>
> and make lists in a text file that look like this,
> """"""""
> July 27/05
> am:
> zbra ycc xel wec wlp wlm vcg vitx uco umc tup trps twti tmo mos faf ba
> tin tds tem sup su seo fon see std res rcl rol rok resp quot pub px
> I do this by hand. As you can see there are 3 main categories,
> 1)before market open, 2) time not supplied and, 3) after market close
> and some specific times of earnings release.
>
> Can any one tell me how to create these lists without typing them all
> out by hand?
>
> thanks for any help
> Eric

It could be completely automated, all the way from the web page to a formatted file on your local machine.

You could use Perl's LWP::Simple module to get the webpage and put it into a variable.

Next you could use Perl's HTML::Parser module to extract the plain text you want from the HTML. You would likely also have to use the split function and regular expressions as supplements to this.

Perl has sophisticated sorting facilities once you get the information you want sucked into an array. The array could then be written to a file in whatever format you want.

There is lots of Perl documentation online, and you can get ActivePerl for Windows at activestate.com. If you haven't programmed Perl before there will be a learning period, but it will automate your task completely. Similar facilities exist for Python, a language Google has used heavily.

--
mbstevens
http://www.mbstevens.com/ |
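The fetch, parse, sort, and write steps described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a tested scraper for the Yahoo page: it assumes the calendar is an HTML table whose second column is the symbol and whose third column is the time, and that the time cells read "Before Market Open", "After Market Close", or "Time Not Supplied". The names `RowCollector`, `bucket`, `group_symbols`, `format_lists`, and the `SAMPLE` markup are all made up for illustration.

```python
import urllib.request
from collections import defaultdict
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def fetch(url):
    """Download a page as text (the latin-1 encoding is an assumption)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("latin-1")


def bucket(time_text):
    """Map the assumed time column onto the headings used in the text file."""
    t = time_text.lower()
    if "before market open" in t:
        return "am"
    if "after market close" in t:
        return "pm"
    if "time not supplied" in t:
        return "time not"
    return time_text.strip()  # a specific release time keeps its own heading


def group_symbols(html, symbol_col=1, time_col=2):
    """Return {heading: sorted list of lowercase symbols}."""
    parser = RowCollector()
    parser.feed(html)
    groups = defaultdict(list)
    for row in parser.rows:
        if len(row) <= max(symbol_col, time_col):
            continue  # not a data row
        symbol = row[symbol_col].lower()
        if not symbol or symbol == "symbol":
            continue  # skip blanks and the header row
        groups[bucket(row[time_col])].append(symbol)
    return {heading: sorted(symbols) for heading, symbols in groups.items()}


def format_lists(date_label, groups):
    """Lay the groups out as a date line, then a heading and a symbol line each."""
    fixed = ["am", "time not", "pm"]
    extras = sorted(k for k in groups if k not in fixed)
    lines = [date_label]
    for heading in fixed + extras:
        if heading in groups:
            lines.append(heading + ":")
            lines.append(" ".join(groups[heading]))
    return "\n".join(lines)


# Made-up markup standing in for the real page, whose layout is not known here.
SAMPLE = """
<table>
<tr><th>Company</th><th>Symbol</th><th>Time</th></tr>
<tr><td>Example Co A</td><td>ZBRA</td><td>Before Market Open</td></tr>
<tr><td>Example Co B</td><td>BA</td><td>Before Market Open</td></tr>
<tr><td>Example Co C</td><td>O</td><td>Time Not Supplied</td></tr>
<tr><td>Example Co D</td><td>ZMH</td><td>After Market Close</td></tr>
</table>
"""

if __name__ == "__main__":
    print(format_lists("July 27/05", group_symbols(SAMPLE)))
```

To try it against a live page, something like `group_symbols(fetch(url))` would be the starting point, with `symbol_col` and `time_col` adjusted after looking at the page source.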
Re: not sure who to ask... sorting data from a webpage...
mbstevens wrote:
> Eric wrote:
>
<snip>
>
> There is lots of Perl documentation online, and you can get ActivePerl
> for Windows at activestate.com. If you havn't programmed Perl before
> there will be a learning period, but it will automate your task
> completely. Similar facilities exist for Python, the language the
> Google search engine was written in.

Among all that documentation, I find the following the most useful and succinct:

http://www.comp.leeds.ac.uk/Perl/start.html

I tried installing ActiveState Perl but I didn't like it. It takes way too long to install and doesn't run properly on Win-XP with SP2. Instead I use Perl inside Cygwin. Soon I will get back to Linux, like the good old days.

Best
A |
Re: not sure who to ask... sorting data from a webpage...
Animesh Kumar wrote:
> mbstevens wrote:
>
>> Eric wrote:
>>
> <snip>
>
>> There is lots of Perl documentation online, and you can get ActivePerl
>> for Windows at activestate.com. If you havn't programmed Perl before
>> there will be a learning period, but it will automate your task
>> completely. Similar facilities exist for Python, the language the
>> Google search engine was written in.
>
> Among the lot of documentation, I find the following most useful and
> succinct:
>
> http://www.comp.leeds.ac.uk/Perl/start.html
>
> I tried installing ActiveState Perl but I didnt like it. It takes way
> too long ot install and doesn't runs properly on Win-XP with SP2.
> Instead I use perl inside Cygwin.

Hmm. I haven't tried it on Windows since SP2. I would be interested in knowing if anyone else is having trouble running ActiveState Perl on Windows with SP2.

> Soon I will get back to Linux like
> good old days.

An operating system that comes with Perl, Python, and Common Lisp is much more comfortable than one that comes with proprietary languages, all right. You can buy a big hard disk for $50 US these days, leave your XP on the machine, and install 4 or 5 Linux systems on the same machine. Just study GRUB and LILO. |
Re: not sure who to ask... sorting data from a webpage...
> You could use Perl's LWP::Simple module to get the webpage and put it
> into a variable.
>
> Next you could use Perl's HTML::Parser module to extract the plain text
> you want from the HTML. You would likely also have to use the split
> function and regular expressions as suppliments to this.

Actually, in this case I would suggest Template::Extract rather than HTML::Parser as a simpler way of extracting data. But then, with Perl there's usually more than one way of doing it.

data64 |
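The idea behind the template approach is to describe what one row of the page looks like and let the tool pull out the fields. The sketch below is not Template::Extract; it is a rough Python analogue that uses a regular expression with named groups as the "template". The three-cell row layout, the field names, and the `ROW_TEMPLATE`, `extract_rows`, and `SAMPLE_ROWS` names are assumptions made for illustration, and a pattern like this would break as soon as the cells carry attributes or nested tags.

```python
import re

# The "template": one table row with three plain cells (an assumed layout).
ROW_TEMPLATE = re.compile(
    r"<tr>\s*"
    r"<td>(?P<company>[^<]*)</td>\s*"
    r"<td>(?P<symbol>[^<]*)</td>\s*"
    r"<td>(?P<time>[^<]*)</td>\s*"
    r"</tr>",
    re.IGNORECASE,
)


def extract_rows(html):
    """Return one dict per row that matches the template, values stripped."""
    return [
        {field: value.strip() for field, value in match.groupdict().items()}
        for match in ROW_TEMPLATE.finditer(html)
    ]


# Made-up rows standing in for the real page.
SAMPLE_ROWS = (
    "<tr><td>Example Co A</td><td>ZBRA</td><td>Before Market Open</td></tr>\n"
    "<tr><td>Example Co B</td><td>ZMH</td><td>After Market Close</td></tr>"
)

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_ROWS):
        print(row["symbol"].lower(), "|", row["time"])
```

The trade-off is the usual one: a template is quicker to write than an event-driven parser, but it is tied more tightly to the exact markup of the page.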