Velocity Reviews


Dr Eberhard Lisse 01-01-2013 11:56 PM

Date in CSV/TSV question
 
I have a Tab Separated File of roughly 1000 lines with the first fields like

"07 Jan 2011" "TFR"
"05 Jan 2011" "DR"

I need to change the first field to look like

2011-01-07 "TFR"
2011-01-05 "DR"

for all lines, of course :-)-O

Can someone point me to where I can read this up? Or send me a code
fragment?

Thanks, el
--
if you want to reply, replace nospam with my initials

Dave Saville 01-02-2013 10:22 AM

Re: Date in CSV/TSV question
 
On Tue, 1 Jan 2013 23:56:14 UTC, Dr Eberhard Lisse <nospam@lisse.NA>
wrote:

> I have a Tab Separated File of roughly 1000 likes with the first fields like
>
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
> I need change the first field to look like
>
> 2011-01-07 "TFR"
> 2011-01-05 "DR"
>
> for all lines, of course :-)-O
>
> Can someone point me to where I can read this up? Or send me a code
> fragment?


It is not clear whether the file contains the quotes or you are only
using them to show the fields. Assuming you have extracted the first
field, split it on space into day, month and year. Set up an array of
month names and find the index of the given month. Regenerate the
field with sprintf: $new = sprintf("$year-%2.2d-$day", $index);
For simplicity put a dummy month at the front of the list, since Perl
arrays index from 0: @months = qw(crap Jan Feb ..........

HTH
--
Regards
Dave Saville
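
The steps Dave describes (split the field, look up the month's index, rebuild the date with sprintf) can be sketched roughly as below. This is only an illustration: the sub name reformat_date and the placeholder entry 'none' are made up here, and the sketch assumes the surrounding quotes have already been removed from the field.

```perl
use strict;
use warnings;

# Index 0 holds a placeholder so that Jan => 1, Feb => 2, and so on.
# ('none' is an arbitrary dummy value, not part of the original post.)
my @months = qw(none Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: turn "07 Jan 2011" into "2011-01-07".
# Returns undef if the field has fewer than three parts or an unknown month.
sub reformat_date {
    my ($field) = @_;
    my ($day, $month, $year) = split ' ', $field;
    return unless defined $year;

    # Find the position of the month name in the list.
    my ($index) = grep { $months[$_] eq $month } 1 .. $#months;
    return unless defined $index;

    return sprintf '%04d-%02d-%02d', $year, $index, $day;
}

print reformat_date('07 Jan 2011'), "\n";
```

Padding the day with %02d as well means a single-digit day such as "5" still comes out as "05".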

Dr Eberhard W Lisse 01-02-2013 01:47 PM

Re: Date in CSV/TSV question
 
Thanks.

el

On 2013-01-02 15:01 , Henry Law wrote:
> On 01/01/13 23:56, Dr Eberhard Lisse wrote:
>> I have a Tab Separated File of roughly 1000 likes with the first
>> fields like
>>
>> "07 Jan 2011" "TFR"
>> "05 Jan 2011" "DR"
>>
>> I need change the first field to look like
>>
>> 2011-01-07 "TFR"
>> 2011-01-05 "DR"

>
> OK, couldn't resist having a bash at this. Didn't spend a lot of time
> on it but this does what you want.
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use 5.010;
>
> use Date::Calc qw( Decode_Date_EU );
> use Text::CSV;
>
> my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
>     or die "Failed to create CSV object: $!\n";
> while ( 1 ) {
>     my $row = $csv->getline( \*DATA );
>     last unless $row->[0]; # getline returns zero-length arrayref; irritating
>     my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
>     die "Bad date" unless $year;
>     printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
> }
>
> __DATA__
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
>> henry@eris:~/Perl/tryout$ ./tryout
>> 2011-01-07 TFR
>> 2011-01-05 DR

>
> It could be improved, and made more Perlish (I write code in isolation,
> rather, which isn't a good idea). In particular I was maddened by the
> need to check the EOF condition explicitly. "while my $row =
> getline..." returns a one-element array containing a null value when it
> hits EOF; you'd think it would return undef. (And yes I did try
> "defined" as suggested in perldoc IO::Handle but the arrayref is
> actually defined, despite not containing anything useful).
>



--
If you want to email me, replace nospam with el

Rainer Weikusat 01-02-2013 03:37 PM

Re: Date in CSV/TSV question
 
Dr Eberhard Lisse <nospam@lisse.NA> writes:
> I have a Tab Separated File of roughly 1000 likes with the first fields like
>
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
> I need change the first field to look like
>
> 2011-01-07 "TFR"
> 2011-01-05 "DR"
>
> for all lines, of course :-)-O
>
> Can someone point me to where I can read this up? Or send me a code
> fragment?


-----------
%months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

while (<>) {
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    print;
}
-----------
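
A note on the snippet above: its replacement text keeps the double quotes around the converted date and copies the day through unchanged, whereas the sample output in the question shows the date unquoted. A minimal sketch of a variant that drops the quotes, pads the day and runs under strict follows. The sub name convert_line is made up for illustration, and lines whose first field does not look like a date are deliberately passed through untouched.

```perl
use strict;
use warnings;

# Month name => two-digit month number, built the same way as above.
my $n = 0;
my %months = map { $_ => sprintf('%02d', ++$n) }
             qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: rewrite a leading "DD Mon YYYY" field as an
# unquoted YYYY-MM-DD. Lines that do not match are returned unchanged.
sub convert_line {
    my ($line) = @_;
    if ($line =~ /^"(\d{1,2})\s+(\S+)\s+(\d{4})"/ && exists $months{$2}) {
        my $date = sprintf '%s-%s-%02d', $3, $months{$2}, $1;
        $line =~ s/^"[^"]*"/$date/;    # replace only the first quoted field
    }
    return $line;
}

print convert_line(qq{"07 Jan 2011"\t"TFR"\n});
```

To process a whole file the same way, the sub could be called inside a `while (<>) { print convert_line($_) }` loop.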

Keith Thompson 01-04-2013 09:35 PM

Re: Date in CSV/TSV question
 
Henry Law <news@lawshouse.org> writes:
[...]
> You could use Date::Calc, particularly the Decode_Date_EU function; it's
> overkill if what you've described is really all there is, but it saves
> programming. A truly lazy^H^H^H^Hcreative programmer would look for
> something to decode the tab-separated file too; maybe Text::CSV would do
> that? I've only ever used it for comma separated data, (which, er, is
> what it's for).


Yes, quoting "perldoc Text::CSV":

The module accepts either strings or files as input and
can utilize any user-specified characters as delimiters,
separators, and escapes so it is perhaps better called ASV
(anything separated values) rather than just CSV.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Rainer Weikusat 01-04-2013 09:55 PM

Re: Date in CSV/TSV question
 
Henry Law <news@lawshouse.org> writes:

> On 02/01/13 10:22, Dave Saville wrote:
>> On Tue, 1 Jan 2013 23:56:14 UTC, Dr Eberhard Lisse <nospam@lisse.NA>
>> wrote:
>>
>>> I have a Tab Separated File of roughly 1000 likes with the first fields like
>>>
>>> "07 Jan 2011" "TFR"
>>> "05 Jan 2011" "DR"

>>
>> Not clear if the file has the quotes or you are using them to show the
>> fields. Assuming you have extracted the first field then split on
>> space to day month year. Set up an array of month names. Find the
>> index of the given month. Regenerate the field with sprintf. $new =
>> sprintf($year-%2.2d-$day, $index); For simplicity put a dummy month on
>> the front of the list, perl arrays index from 0, so @months = qw(crap
>> Jan Feb ..........

>
> You could use Date::Calc, particularly the Decode_Date_EU function;
> it's overkill if what you've described is really all there is, but it
> saves programming. A truly lazy^H^H^H^Hcreative programmer would look
> for something to decode the tab-separated file too; maybe Text::CSV
> would do that?


Nice example how it 'saves programming':

,----
| #!/usr/bin/perl
| use strict;
| use warnings;
| use 5.010;
|
| use Date::Calc qw( Decode_Date_EU );
| use Text::CSV;
|
| my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
|     or die "Failed to create CSV object: $!\n";
| while ( 1 ) {
|     my $row = $csv->getline( \*DATA );
|     last unless $row->[0]; # getline returns zero-length arrayref; irritating
|     my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
|     die "Bad date" unless $year;
|     printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
| }
`----

That's 14 lines of code. Alternate version without Date::Calc and
Text::CSV

,----
| %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
|
| while (<>) {
| s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
| print;
| }
`----

That's good enough for the problem which was described and it's four
lines of code. "Truly creative", -10 lines of code were saved here
and a comment explaining an 'ugly' workaround for a deficiency in the
downloaded code had to be added as well[*],

while (1) {

C.DeRykus 01-05-2013 09:47 AM

Re: Date in CSV/TSV question
 
On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
> Dr Eberhard Lisse <nospam@lisse.NA> writes:
> > I have a Tab Separated File of roughly 1000 likes with the first fields like
> >
> > "07 Jan 2011" "TFR"
> > "05 Jan 2011" "DR"
> >
> > I need change the first field to look like
> >
> > 2011-01-07 "TFR"
> > 2011-01-05 "DR"
> >
> > for all lines, of course :-)-O
> >
> > Can someone point me to where I can read this up? Or send me a code
> > fragment?
>
> -----------
> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>
> while (<>) {
> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> print;
> }
> -----------


Maybe even shrink it to a long one-liner:

perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
{"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

--
Charles DeRykus



Rainer Weikusat 01-05-2013 07:56 PM

Re: Date in CSV/TSV question
 
"C.DeRykus" <derykus@gmail.com> writes:
> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
>> Dr Eberhard Lisse <nospam@lisse.NA> writes:
>> > I have a Tab Separated File of roughly 1000 likes with the first

>> fields like
>>
>> > "07 Jan 2011" "TFR"
>> > "05 Jan 2011" "DR"

>>
>>> 2011-01-07 "TFR"
>>> 2011-01-05 "DR"


[...]

>> -----------
>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>>
>> while (<>) {
>> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>> print;
>> }
>> -----------

>
> Maybe even shrink it to a long one-liner:
>
> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile


Considering the situation of the OP, he has a 'zero line' solution
because all code was written by someone else. I don't know how this is
for other people, however, I can type

qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)

much faster than I can download anything from the net, especially
considering that I'd have to read the documentation for this anything,
too, making this a very bad tradeoff. And if I had to rely on someone
else's code for totally trivial stuff such as splitting a text file
with n 'somehow separated' data columns into an array, I would have a
very hard time solving the much more complicated problems I usually
need to deal with. Actually, I regularly search CPAN whenever I have a
reasonably complex and self-contained subtask for which 'using a
module', if one existed, would be a good idea. The most common result
of these searches, however, is 'nada', the second most common is some
totally bizarre implementation of 25% of the features I actually need
and the third 'implementation is total crap' aka 'IO::Poll' (and the
original author abandoned the code in question in 1975 in order to
become a missionary in Gabon or something like that).

CPAN is mostly a load of tripe resulting from fifteen years of bored
'hobbyists' (here supposed to mean people whose actual job isn't
programming) trying whatever weirdo-approach for solving fifty
different but vaguely related _trivial_ problems with the help of a
steam-engine powered motor umbrella constructed out of yellow,
magenta and purple lego bricks happened to come to their mind. And
downloading all these 'incredible machines' is - except in case of
500 SLOC throw-away 'oneliners' - not the end of the story: I have to
maintain the code because the people who use the software I'm
responsible for come to me with any problems resulting from that.

The rule of thumb I usually follow is that 'using a library' (or -
something I very much prefer - an already written program somebody
actually used to solve a real problem) is only worth the effort if it
saves a significant amount of work, at least something like 500 lines
of code and preferably, a few thousand. And even then, I end up
'maintaining' seriously byzantine workarounds for all the problems in
the 'free' code until I grow tired of that and replace it with
something which actually works (in the sense that it reliably does
what is needed to solve the problem I have to solve and nothing else)
more often than not.

Dr Eberhard Lisse 01-05-2013 09:33 PM

Re: Date in CSV/TSV question
 
The OP is an elderly Obstetrician & Gynecologist, who occasionally needs
to Practically Extract and Report stuff.

el

On 2013-01-05 21:56 , Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
>> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
>>> Dr Eberhard Lisse <nospam@lisse.NA> writes:
>>>> I have a Tab Separated File of roughly 1000 likes with the first
>>> fields like
>>>
>>>> "07 Jan 2011" "TFR"
>>>> "05 Jan 2011" "DR"
>>>
>>>> 2011-01-07 "TFR"
>>>> 2011-01-05 "DR"

>
> [...]
>
>>> -----------
>>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>>>
>>> while (<>) {
>>> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>>> print;
>>> }
>>> -----------

>>
>> Maybe even shrink it to a long one-liner:
>>
>> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
>> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

>
> Considering the situation of the OP, he has a 'zero line' solution
> because all code was written by someone else. I don't know how his is
> for other people, however, I can type
>
> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
>
> much faster than I can download anything from the net, especially
> considering that I'd have to read to documentation for this anything,
> too, making this a very bad tradeoff. And if I had to rely one someone
> else's code for totally trivial stuff such as splitting a text file
> with n 'somehow separated' data columns into an array, I would have a
> very hard time solving the much more complicated problems I usually
> need to deal with. Actually, I regularly search CPAN whenever I have a
> reasonably complex and self-contained subtask of something that 'using
> a module' if one existed would be a good idea. The most common result
> of this searches, however, is 'nada', the second most common is some
> totally bizarre implementation of 25% of the features I actually need
> and the third 'implementation is total crap' aka 'IO::Poll' (and the
> original author abandoned the code in question in 1975 in order to
> become a missionary in Gabun or something like that).
>
> CPAN is mostly a load of tripe resulting from fifteen years of bored
> 'hobbyists' (here supposed to mean people whose actual job isn't
> programming) trying whatever weirdo-approach for solving fifty
> different but vaguely related _trivial_ problems with the help of a
> steam-engine powered motor umbrella constructed out of yellow,
> magenta and purple lego bricks happened to come to their mind. And
> downloading all these 'incredible machines' is - except in case of
> 500 SLOC throw-away 'oneliners' - not the end of the story: I have to
> maintain the code because the people who use the software I'm
> responsible for come to me with any problems resulting from that.
>
> The rule of thumb I usually follow is that 'using a library' (or -
> something I very much prefer - an already written program somebody
> actually used to solve a real problem) is only worth the effort if it
> saves a significant amount of work, at least something like 500 lines
> of code and preferably, a few thousands. And even then, I end up
> 'maintaining' seriously byzantine workarounds for all the problems in
> the 'free' code until I grow tired of that and replace it with
> something which actually works (in the sense that it reliably does
> what is needed to solve the problem I have to solve and nothing else)
> more often than not.
>



--
if you want to reply, replace nospam with my initials

C.DeRykus 01-05-2013 09:51 PM

Re: Date in CSV/TSV question
 
On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
>
> > On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
> >> Dr Eberhard Lisse <nospam@lisse.NA> writes:
> >> > I have a Tab Separated File of roughly 1000 likes with the first
> >> fields like
> >>
> >> > "07 Jan 2011" "TFR"
> >> > "05 Jan 2011" "DR"
> >>
> >>> 2011-01-07 "TFR"
> >>> 2011-01-05 "DR"
>
> [...]
>
> >> -----------
> >> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
> >>
> >> while (<>) {
> >> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> >> print;
> >> }
> >> -----------
> >
> > Maybe even shrink it to a long one-liner:
> >
> > perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
> > {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile
>
> Considering the situation of the OP, he has a
> 'zero line' solution because all code was written
> by someone else.


Hm, it sounded like he just had a separate tab-delimited
file he needed in a different format (ideal for a
1-liner). The -i switch is especially useful for just
this if the scenario allows it.

> I don't know how his
> for other people, however, I can type
>
> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
>
> much faster than I can download anything from the net, especially
> considering that I'd have to read to documentation for this anything,
> too, making this a very bad tradeoff. And if I had to rely one someone
> else's code for totally trivial stuff such as splitting a text file
> with n 'somehow separated' data columns into an array, I would have a
> very hard time solving the much more complicated problems I usually
> need to deal with. Actually, I regularly search CPAN whenever I have a
> reasonably complex and self-contained subtask of something that 'using
> a module' if one existed would be a good idea. The most common result
> of this searches, however, is 'nada', the second most common is some
> totally bizarre implementation of 25% of the features I actually need
> and the third 'implementation is total crap' aka 'IO::Poll' (and the
> original author abandoned the code in question in 1975 in order to
> become a missionary in Gabun or something like that).
>
> CPAN is mostly a load of tripe resulting from fifteen years of bored
> 'hobbyists' (here supposed to mean people whose actual job isn't
> programming) trying whatever weirdo-approach for solving fifty
> different but vaguely related _trivial_ problems with the help of a
> steam-engine powered motor umbrella constructed out of yellow,
> magenta and purple lego bricks happened to come to their mind. And
> downloading all these 'incredible machines' is - except in case of
> 500 SLOC throw-away 'oneliners' - not the end of the story: I have to
> maintain the code because the people who use the software I'm
> responsible for come to me with any problems resulting from that.
>
> The rule of thumb I usually follow is that 'using a library' (or -
> something I very much prefer - an already written program somebody
> actually used to solve a real problem) is only worth the effort if it
> saves a significant amount of work, at least something like 500 lines
> of code and preferably, a few thousands. And even then, I end up
> 'maintaining' seriously byzantine workarounds for all the problems in
> the 'free' code until I grow tired of that and replace it with
> something which actually works (in the sense that it reliably does
> what is needed to solve the problem I have to solve and nothing else)
> more often than not.


I can appreciate your viewpoint. Date::Manip though
is well-maintained and extraordinarily useful. There
are several other very good Date modules as well.

Leveraging a small bit of module code for a tedious,
surprisingly frequent little chore appeals to the
very lazy. So, it's worth it IMO :)

--
Charles DeRykus
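
As one illustration of the "other Date modules" mentioned above, and a middle ground between the hand-rolled month table and a CPAN download, here is a minimal sketch using Time::Piece, which has shipped with Perl itself since 5.10. Nobody in the thread proposed this; the sub name to_iso_date is made up, and the sketch assumes the field has already had its quotes stripped.

```perl
use strict;
use warnings;
use Time::Piece;    # core module, no installation needed

# Hypothetical helper: parse "DD Mon YYYY" and return "YYYY-MM-DD".
# strptime dies on input it cannot parse, so the eval traps that
# and the sub returns undef instead.
sub to_iso_date {
    my ($text) = @_;
    my $t = eval { Time::Piece->strptime($text, '%d %b %Y') };
    return defined $t ? $t->ymd : undef;
}

print to_iso_date('07 Jan 2011'), "\n";
```

The trade-off is the one debated above: the month names and the zero padding come from the module rather than from a hand-typed list, at the cost of one more dependency to read up on.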



