Date in CSV/TSV question
I have a Tab Separated File of roughly 1000 lines with the first fields like

"07 Jan 2011" "TFR"
"05 Jan 2011" "DR"

I need to change the first field to look like

2011-01-07 "TFR"
2011-01-05 "DR"

for all lines, of course :-)-O

Can someone point me to where I can read this up? Or send me a code fragment?

Thanks, el

--
if you want to reply, replace nospam with my initials
Re: Date in CSV/TSV question
On Tue, 1 Jan 2013 23:56:14 UTC, Dr Eberhard Lisse <nospam@lisse.NA> wrote:
> I have a Tab Separated File of roughly 1000 lines with the first fields like
>
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
> I need to change the first field to look like
>
> 2011-01-07 "TFR"
> 2011-01-05 "DR"
>
> for all lines, of course :-)-O
>
> Can someone point me to where I can read this up? Or send me a code
> fragment?

Not clear if the file has the quotes or you are using them to show the fields.

Assuming you have extracted the first field, split it on space into day, month and year. Set up an array of month names. Find the index of the given month. Regenerate the field with sprintf:

$new = sprintf("%s-%02d-%s", $year, $index, $day);

For simplicity put a dummy month on the front of the list, since perl arrays index from 0, so

@months = qw(crap Jan Feb ..........

HTH

--
Regards
Dave Saville
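The steps Dave describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `convert_date` is hypothetical, the field is assumed to be "DD Mon YYYY" with English three-letter month names, and surrounding double quotes are stripped if present.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month names with a dummy entry in slot 0, so that Jan sits at index 1.
my @months = qw(none Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# convert_date is a hypothetical helper: "07 Jan 2011" -> "2011-01-07".
# It returns undef when the field has too few parts or an unknown month.
sub convert_date {
    my ($field) = @_;
    $field =~ s/^"|"$//g;                          # drop surrounding quotes, if any
    my ($day, $month, $year) = split ' ', $field;  # split on whitespace
    return unless defined $year;

    # Find the index of the given month, skipping the dummy entry.
    my ($index) = grep { $months[$_] eq $month } 1 .. $#months;
    return unless defined $index;

    return sprintf '%04d-%02d-%02d', $year, $index, $day;
}

print convert_date('"07 Jan 2011"'), "\n";
```

A linear search through twelve names is cheap enough for a file of about 1000 lines; a hash lookup would be the usual choice for larger inputs.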
Re: Date in CSV/TSV question
Thanks.
el

On 2013-01-02 15:01, Henry Law wrote:
> On 01/01/13 23:56, Dr Eberhard Lisse wrote:
>> I have a Tab Separated File of roughly 1000 lines with the first
>> fields like
>>
>> "07 Jan 2011" "TFR"
>> "05 Jan 2011" "DR"
>>
>> I need to change the first field to look like
>>
>> 2011-01-07 "TFR"
>> 2011-01-05 "DR"
>
> OK, couldn't resist having a bash at this. Didn't spend a lot of time
> on it but this does what you want.
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use 5.010;
>
> use Date::Calc qw( Decode_Date_EU );
> use Text::CSV;
>
> my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
>   or die "Failed to create CSV object: $!\n";
> while ( 1 ) {
>   my $row = $csv->getline( \*DATA );
>   last unless $row->[0];  # getline returns zero-length arrayref; irritating
>   my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
>   die "Bad date" unless $year;
>   printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
> }
>
> __DATA__
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
>> henry@eris:~/Perl/tryout$ ./tryout
>> 2011-01-07 TFR
>> 2011-01-05 DR
>
> It could be improved, and made more Perlish (I write code in isolation,
> rather, which isn't a good idea). In particular I was maddened by the
> need to check the EOF condition explicitly. "while my $row =
> getline..." returns a one-element array containing a null value when it
> hits EOF; you'd think it would return undef. (And yes I did try
> "defined" as suggested in perldoc IO::Handle but the arrayref is
> actually defined, despite not containing anything useful).
>

--
If you want to email me, replace nospam with el
Re: Date in CSV/TSV question
Dr Eberhard Lisse <nospam@lisse.NA> writes:
> I have a Tab Separated File of roughly 1000 lines with the first fields like
>
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
> I need to change the first field to look like
>
> 2011-01-07 "TFR"
> 2011-01-05 "DR"
>
> for all lines, of course :-)-O
>
> Can someone point me to where I can read this up? Or send me a code
> fragment?

-----------
%months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

while (<>) {
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    print;
}
-----------
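For comparison, here is a hedged variant of the same hash-plus-regex idea that runs under strict and warnings. It is a sketch, not Rainer's code: the helper name `reformat_line` is hypothetical, the date is emitted without quotes (as in the output the original question asks for), and a line whose first field does not look like a quoted "DD Mon YYYY" date with a known month is passed through unchanged, which is an assumption about the desired behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map month names to zero-padded month numbers: Jan => "01" ... Dec => "12".
my $n = 0;
my %months = map { ( $_ => sprintf('%02d', ++$n) ) }
             qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# reformat_line is a hypothetical helper. It rewrites a leading
# "DD Mon YYYY" field to YYYY-MM-DD and leaves any other line untouched.
sub reformat_line {
    my ($line) = @_;
    if ($line =~ /^"(\d{1,2})\s+(\w{3})\s+(\d{4})"/ && exists $months{$2}) {
        my $iso = sprintf '%04d-%s-%02d', $3, $months{$2}, $1;
        substr($line, 0, $+[0], $iso);   # $+[0] is the end offset of the match
    }
    return $line;
}

# Small demonstration on in-memory sample lines (tab-separated).
print reformat_line($_) for qq{"07 Jan 2011"\t"TFR"\n}, qq{"05 Jan 2011"\t"DR"\n};
```

To process a real file, the demonstration line could be replaced by a `while (my $line = <>) { print reformat_line($line) }` loop.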
Re: Date in CSV/TSV question
Henry Law <news@lawshouse.org> writes:
[...]
> You could use Date::Calc, particularly the Decode_Date_EU function; it's
> overkill if what you've described is really all there is, but it saves
> programming. A truly lazy^H^H^H^Hcreative programmer would look for
> something to decode the tab-separated file too; maybe Text::CSV would do
> that? I've only ever used it for comma separated data, (which, er, is
> what it's for).

Yes, quoting "perldoc Text::CSV":

    The module accepts either strings or files as input and can utilize
    any user-specified characters as delimiters, separators, and escapes
    so it is perhaps better called ASV (anything separated values) rather
    than just CSV.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Re: Date in CSV/TSV question
Henry Law <news@lawshouse.org> writes:
> On 02/01/13 10:22, Dave Saville wrote:
>> On Tue, 1 Jan 2013 23:56:14 UTC, Dr Eberhard Lisse <nospam@lisse.NA>
>> wrote:
>>
>>> I have a Tab Separated File of roughly 1000 lines with the first fields like
>>>
>>> "07 Jan 2011" "TFR"
>>> "05 Jan 2011" "DR"
>>
>> Not clear if the file has the quotes or you are using them to show the
>> fields. Assuming you have extracted the first field then split on
>> space to day month year. Set up an array of month names. Find the
>> index of the given month. Regenerate the field with sprintf. $new =
>> sprintf($year-%2.2d-$day, $index); For simplicity put a dummy month on
>> the front of the list, perl arrays index from 0, so @months = qw(crap
>> Jan Feb ..........
>
> You could use Date::Calc, particularly the Decode_Date_EU function;
> it's overkill if what you've described is really all there is, but it
> saves programming. A truly lazy^H^H^H^Hcreative programmer would look
> for something to decode the tab-separated file too; maybe Text::CSV
> would do that?

Nice example of how it 'saves programming':

,----
| #!/usr/bin/perl
| use strict;
| use warnings;
| use 5.010;
|
| use Date::Calc qw( Decode_Date_EU );
| use Text::CSV;
|
| my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
|   or die "Failed to create CSV object: $!\n";
| while ( 1 ) {
|   my $row = $csv->getline( \*DATA );
|   last unless $row->[0];  # getline returns zero-length arrayref; irritating
|   my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
|   die "Bad date" unless $year;
|   printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
| }
`----

That's 14 lines of code. Alternate version without Date::Calc and Text::CSV:

,----
| %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
|
| while (<>) {
|     s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
|     print;
| }
`----

That's good enough for the problem which was described and it's four lines of code.

"Truly creative": -10 lines of code were saved here, and a comment explaining an 'ugly' workaround for a deficiency in the downloaded code had to be added as well[*], while (1) {
Re: Date in CSV/TSV question
On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
> Dr Eberhard Lisse <nospam@lisse.NA> writes:
>
> > I have a Tab Separated File of roughly 1000 lines with the first fields like
> >
> > "07 Jan 2011" "TFR"
> > "05 Jan 2011" "DR"
> >
> > I need to change the first field to look like
> >
> > 2011-01-07 "TFR"
> > 2011-01-05 "DR"
> >
> > for all lines, of course :-)-O
> >
> > Can someone point me to where I can read this up? Or send me a code
> > fragment?
>
> -----------
> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>
> while (<>) {
>     s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>     print;
> }
> -----------

Maybe even shrink it to a long one-liner:

perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

--
Charles DeRykus
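If installing Date::Manip is not an option, a similar conversion can be sketched with Time::Piece, which has shipped with Perl since 5.10. This is an illustrative alternative, not what was posted in the thread: the helper name `iso_date` is hypothetical, English month abbreviations are assumed, and `strptime` dies on input it cannot parse, so malformed dates would abort the run unless wrapped in `eval`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# iso_date is a hypothetical helper: "07 Jan 2011" -> "2011-01-07".
# Time::Piece->strptime croaks if the text does not match the format.
sub iso_date {
    my ($text) = @_;
    return Time::Piece->strptime($text, '%d %b %Y')->strftime('%Y-%m-%d');
}

# Demonstration on in-memory sample lines (tab-separated).
for my $line (qq{"07 Jan 2011"\t"TFR"\n}, qq{"05 Jan 2011"\t"DR"\n}) {
    (my $out = $line) =~ s{^"(\d{1,2}\s+\w{3}\s+\d{4})"}{ iso_date($1) }e;
    print $out;
}
```

Letting the library parse the whole date also validates it to some degree, which the plain month-name lookup does not attempt.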
Re: Date in CSV/TSV question
"C.DeRykus" <derykus@gmail.com> writes:
> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
>> Dr Eberhard Lisse <nospam@lisse.NA> writes:
>>> I have a Tab Separated File of roughly 1000 lines with the first
>>> fields like
>>>
>>> "07 Jan 2011" "TFR"
>>> "05 Jan 2011" "DR"
>>>
>>> 2011-01-07 "TFR"
>>> 2011-01-05 "DR"

[...]

>> -----------
>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>>
>> while (<>) {
>>     s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>>     print;
>> }
>> -----------
>
> Maybe even shrink it to a long one-liner:
>
> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
>     {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

Considering the situation of the OP, he has a 'zero line' solution because all code was written by someone else. I don't know how this is for other people, however, I can type

    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)

much faster than I can download anything from the net, especially considering that I'd have to read the documentation for this anything, too, making this a very bad tradeoff. And if I had to rely on someone else's code for totally trivial stuff such as splitting a text file with n 'somehow separated' data columns into an array, I would have a very hard time solving the much more complicated problems I usually need to deal with.

Actually, I regularly search CPAN whenever I have a reasonably complex and self-contained subtask for which 'using a module', if one existed, would be a good idea. The most common result of these searches, however, is 'nada', the second most common is some totally bizarre implementation of 25% of the features I actually need and the third 'implementation is total crap' aka 'IO::Poll' (and the original author abandoned the code in question in 1975 in order to become a missionary in Gabon or something like that).

CPAN is mostly a load of tripe resulting from fifteen years of bored 'hobbyists' (here supposed to mean people whose actual job isn't programming) trying whatever weirdo-approach for solving fifty different but vaguely related _trivial_ problems with the help of a steam-engine powered motor umbrella constructed out of yellow, magenta and purple lego bricks happened to come to their mind. And downloading all these 'incredible machines' is - except in case of 500 SLOC throw-away 'oneliners' - not the end of the story: I have to maintain the code because the people who use the software I'm responsible for come to me with any problems resulting from that.

The rule of thumb I usually follow is that 'using a library' (or - something I very much prefer - an already written program somebody actually used to solve a real problem) is only worth the effort if it saves a significant amount of work, at least something like 500 lines of code and preferably a few thousand. And even then, more often than not, I end up 'maintaining' seriously byzantine workarounds for all the problems in the 'free' code until I grow tired of that and replace it with something which actually works (in the sense that it reliably does what is needed to solve the problem I have to solve and nothing else).
Re: Date in CSV/TSV question
The OP is an elderly Obstetrician & Gynecologist, who occasionally needs
to Practically Extract and Report stuff.

el

On 2013-01-05 21:56, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
>> Maybe even shrink it to a long one-liner:
>
> Considering the situation of the OP, he has a 'zero line' solution
> because all code was written by someone else.

[...]

--
if you want to reply, replace nospam with my initials
Re: Date in CSV/TSV question
On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
> > Maybe even shrink it to a long one-liner:
> >
> > perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
> >     {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile
>
> Considering the situation of the OP, he has a
> 'zero line' solution because all code was written
> by someone else.

Hm, it sounded like he just had a separate tab-delimited file he needed in a different format (ideal for a 1-liner). The -i switch is especially useful for just this if the scenario allows it.

> I don't know how this is for other people, however, I can type
>
> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
>
> much faster than I can download anything from the net, especially
> considering that I'd have to read the documentation for this anything,
> too, making this a very bad tradeoff.

[...]

> The rule of thumb I usually follow is that 'using a library' (or -
> something I very much prefer - an already written program somebody
> actually used to solve a real problem) is only worth the effort if it
> saves a significant amount of work, at least something like 500 lines
> of code and preferably, a few thousands.

[...]

I can appreciate your viewpoint. Date::Manip though is well-maintained and extraordinarily useful. There are several other very good Date modules as well.

Leveraging a small bit of module code for a tedious, surprisingly frequent little chore appeals to the very lazy. So, it's worth it IMO :)

--
Charles DeRykus