Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Date in CSV/TSV question


Date in CSV/TSV question

 
 
Rainer Weikusat
Guest
Posts: n/a
 
      01-06-2013
"C.DeRykus" <(E-Mail Removed)> writes:
> On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
>> "C.DeRykus" <(E-Mail Removed)> writes:
>>> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
>>>> Dr Eberhard Lisse <(E-Mail Removed)> writes:
>>>>> I have a Tab Separated File of roughly 1000 lines with the first
>>>> fields like
>>>>
>>>> "05 Jan 2011" "DR"
>>>


[and need to translate that to]

>>
>> >>> 2011-01-07 "TFR"
>> >>> 2011-01-05 "DR"


[...]

>>> Maybe even shrink it to a long one-liner:
>>>
>>> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
>>> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

>> Considering the situation of the OP, he has a
>> 'zero line' solution because all code was written
>> by someone else.

>
> Hm, it sounded like he just had a separate tab-delimited
> file he needed in a different format (ideal for a 1-
> liner.) The -i switch is especially useful for just
> this if the scenario allows it.


If you weren't using -i, it wasn't necessary to worry about creating a
backup file since the modified content would end up in a new file.

>
>> I don't know how this is
>> for other people, however, I can type
>>
>> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
>>

>
>> much faster than I can download anything from the net,


[...]

> Date::Manip though is well-maintained and extraordinarily
> useful. There are several other very good Date modules as well.
>
> Leveraging a small bit of module code for a tedious,
> surprisingly frequent little chore appeals to the
> very lazy. So, it's worth it IMO


I would call this a case of 'false laziness': You happen to be
familiar with a certain 'date munging' module. The OP wanted to modify
some 'structured text field' which happened to be a date. Ergo:
Clearly, a case for using the date manipulation code. But nothing in
the described problem is related to dates. A sequence of text of the
form

"number0 string number1"

is supposed to be changed such that it becomes

number1-number2-number0

that is, the quotes are supposed to be deleted (I didn't realize
that), the first and the last subfield should be transposed and the
middle string replaced by a two-digit number using a simple,
"well-known" static mapping from twelve three character strings to
numbers. This is exactly the kind of stuff which can be done very
easily with perl, i.e.

-------------
%months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
-------------

and telling the OP that he should instead download a couple of
thousands (probably, I've only counted the DM6 file which figures at
691 LOC) of lines of code consisting of 972(!) different files, most
of which are documented(!) as broken and are totally useless for the
problem at hand is not something I'd call a sound piece of technical
advice. It is probably possible to use a combine harvester instead of
a lawnmower but nobody in his right mind would ever do that or suggest
that others do it.
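For readers who want the same idea under `use strict`, here is a minimal sketch of Rainer's %months approach with declared variables, wrapped in a hypothetical convert_line sub (not part of his post) so it can be checked against the OP's two sample lines. It assumes, as his regex does, that the quoted date is the first field on the line:

```perl
use strict;
use warnings;

# Month abbreviation => two-digit month number, built the same way as in
# the post: Jan => '01', Feb => '02', ..., Dec => '12'.
# The leading ';' makes sure perl parses the braces as a block.
my $n = 0;
my %months = map { ; $_ => sprintf('%02d', ++$n) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: rewrite a leading "DD Mon YYYY" field as YYYY-MM-DD,
# dropping the surrounding quotes. Lines that don't match come back unchanged.
sub convert_line {
    my ($line) = @_;
    $line =~ s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/;
    return $line;
}

print convert_line(qq{"07 Jan 2011"\t"TFR"\n});   # 2011-01-07, a tab, "TFR"
print convert_line(qq{"05 Jan 2011"\t"DR"\n});    # 2011-01-05, a tab, "DR"
```

Used as a filter, the sub can be applied with `print convert_line($_) while <>;`, leaving the output on stdout. A month abbreviation missing from %months (a typo in the data, say) would produce an "uninitialized value" warning and an empty month field, so a sturdier script might check `exists $months{$2}` first.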
C.DeRykus
Guest
Posts: n/a
 
      01-06-2013
On Sunday, January 6, 2013 9:12:35 AM UTC-8, Rainer Weikusat wrote:
> "C.DeRykus" <(E-Mail Removed)> writes:
> > On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
> >> "C.DeRykus" <(E-Mail Removed)> writes:
> >>> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
> >>>> Dr Eberhard Lisse <(E-Mail Removed)> writes:
> >>>>> I have a Tab Separated File of roughly 1000 lines with the first
> >>>> fields like
> >>>>
> >>>> "05 Jan 2011" "DR"
> >>>
>
> [and need to translate that to]
>
> >>
> >> >>> 2011-01-07 "TFR"
> >> >>> 2011-01-05 "DR"
>
> [...]
>
> >>> Maybe even shrink it to a long one-liner:
> >>>
> >>> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
> >>> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile
> >> Considering the situation of the OP, he has a
> >> 'zero line' solution because all code was written
> >> by someone else.
> >
> > Hm, it sounded like he just had a separate tab-delimited
> > file he needed in a different format (ideal for a 1-
> > liner.) The -i switch is especially useful for just
> > this if the scenario allows it.
>
> If you weren't using -i, it wasn't necessary to worry about creating a
> backup file since the modified content would end up in a new file.


-i is useful in case you're one of those whose
code never works the first time, though...
And you can always remove -i later.

> >
> >> I don't know how this is
> >> for other people, however, I can type
> >>
> >> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
> >>
> >
> >> much faster than I can download anything from the net,
>
> [...]
>
> > Date::Manip though is well-maintained and extraordinarily
> > useful. There are several other very good Date modules as well.
> >
> > Leveraging a small bit of module code for a tedious,
> > surprisingly frequent little chore appeals to the
> > very lazy. So, it's worth it IMO
>
> I would call this a case of 'false laziness': You happen to be
> familiar with a certain 'date munging' module. The OP wanted to modify
> some 'structured text field' which happened to be a date. Ergo:
> Clearly, a case for using the date manipulation code. But nothing in
> the described problem is related to dates. A sequence of text of the
> form
>
> "number0 string number1"
>
> is supposed to be changed such that it becomes
>
> number1-number2-number0
>
> that is, the quotes are supposed to be deleted (I didn't realize
> that), the first and the last subfield should be transposed and the
> middle string replaced by a two-digit number using a simple,
> "well-known" static mapping from twelve three character strings to
> numbers. This is exactly the kind of stuff which can be done very
> easily with perl, i.e.
>
> -------------
> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>
> s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
> ...



Sure, if you don't deal with this kind of
transform often, yet another incantation is
no big deal. And a simple regex can remain
blissfully ignorant of the fact that it's
dealing with dates. But then, if tweaks are
needed, it's "deja vu all over again". Can't
remember where to cut and paste your old tweak
from? No problem. Just wade in and watch out for typos.


>
> and telling the OP that he should instead download a couple of
> thousands (probably, I've only counted the DM6 file which figures at
> 691 LOC) of lines of code consisting of 972(!) different files, most
> of which are documented(!) as broken and are totally useless for the
> problem at hand is not something I'd call a sound piece of technical
> advice.



I'd agree there are probably better solutions
than pulling in the bloat of Date::Manip. But
there are several good Date modules and it's
all about leveraging code already written and
working. Concern with "pulling in a big module"
is almost always FUD, especially speed concerns.
Additionally, if the input format changes (and
those are dates, after all), a good Date module
probably has a method to cinch the code tweaks.
One that's already written...
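To illustrate the "let a Date module do it" position without a CPAN download, here is a sketch using Time::Piece, which has shipped with Perl since 5.10 (a different module from the Date::Manip discussed above). The helper names to_iso_date and convert_line are hypothetical, and the '%d %b %Y' format assumes English month abbreviations as in the OP's sample:

```perl
use strict;
use warnings;
use Time::Piece;   # core module since Perl 5.10

# Hypothetical helper: parse a "DD Mon YYYY" string with Time::Piece's
# strptime and format it as YYYY-MM-DD (ymd uses '-' as its default separator).
# strptime dies if the text does not match the format.
sub to_iso_date {
    my ($text) = @_;
    return Time::Piece->strptime($text, '%d %b %Y')->ymd;
}

# Hypothetical helper: replace the quoted first field of a line with its ISO date.
sub convert_line {
    my ($line) = @_;
    $line =~ s/^"([^"]+)"/to_iso_date($1)/e;
    return $line;
}

print convert_line(qq{"07 Jan 2011"\t"TFR"\n});
print convert_line(qq{"05 Jan 2011"\t"DR"\n});
```

If the input format changed, say to "2011/01/05", only the strptime format string would need editing, which is the kind of tweak DeRykus has in mind; the trade-off is that parsing now depends on the module's notion of month names rather than a table visible in the script.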


> It is probably possible to use a combine harvester instead of a
> lawnmower but nobody in his right mind would ever do that or suggest
> that others do it.


Then why do we use a simple module function to
escape HTML, for instance, rather than rolling
our own? Sometimes a Swiss army knife - rather
than scrounging around for a small pen knife -
is worth the extra weight in a knapsack.

--
Charles DeRykus
Dr Eberhard W Lisse
Guest
Posts: n/a
 
      01-07-2013
Ah, the Plonkers.

el


On 2013-01-05 23:49 , Henry Law wrote:
> On 05/01/13 21:33, Dr Eberhard Lisse wrote:
>> The OP is an elderly Obstetrician & Gynecologist, who occasionally needs
>> to Practically Extract and Report stuff.

>
> By the way, Meinheer Doctor, you might be interested to know that quite
> a lot of people who frequent this group won't have seen the article
> which you followed up here, having decided some time ago to block posts
> from its author at source.
>
> I leave it to you to determine the significance of this.
>
> PS I bet you're no more elderly than I am
>



--
If you want to email me, replace nospam with el
Rainer Weikusat
Guest
Posts: n/a
 
      01-07-2013
"C.DeRykus" <(E-Mail Removed)> writes:

Leading remark: I'm going to cut this somewhat short. I don't agree
with your opinion on this, however, essentially repeating myself
doesn't seem very useful to me, so I'm just going to address a few
isolated points.

>> On Sunday, January 6, 2013 9:12:35 AM UTC-8, Rainer Weikusat wrote:
>> "C.DeRykus" <(E-Mail Removed)> writes:


[...]


>> If you weren't using -i, it wasn't necessary to worry about creating a
>> backup file since the modified content would end up in a new file.

>
> -i is useful in case you're one of those whose
> code never works the first time, though...
> And you can always remove -i later.


What I was trying to get at was that it wouldn't be necessary to use
the 'automatic backup' feature of -i if 'overwriting' (aka
'destroying') the input file hadn't been requested to begin with: In this
case, the processed data would go to stdout, immediately available for
interactive inspection, and could be redirected to some other file if
so desired at the user's discretion.

[...]

>> -------------
>>
>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>>
>>
>>
>> s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
>> ...

>
> Sure, if you don't deal with this kind of
> transform often, yet another incantation is
> no big deal.


'Incantation' is IMO a very unfortunate choice for describing this. It
is a sequence of instructions with exactly defined meaning which
causes a machine to perform a specific function. That's a completely
mundane thing with absolutely no 'magic' of any kind involved (except
insofar as 'any sufficiently advanced technology is indistinguishable
from magic' [to someone who doesn't understand any of it]).

[...]

>> It is probably possible to use a combine harvester instead of a
>> lawnmower but nobody in his right mind would ever do that or
>> suggest that others do it.

>
> Then why do we use a simple module function to
> escape HTML for instance.. rather than rolling
> our own?


Hmm ... why would I?

$text =~ s/([<>"'&])/'&#'.ord($1).';'/ge;
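Rainer's one-liner, wrapped in a hypothetical escape_html sub to show what it produces. It emits decimal character references such as &#60; rather than named entities like &lt;:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above: replace each of
# < > " ' & with a decimal numeric character reference. All five characters
# are handled in one left-to-right pass, and s///g does not rescan its own
# replacements, so the '&' inside an inserted reference is never escaped again.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([<>"'&])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print escape_html(q{<a href="x">Tom & Jerry's</a>}), "\n";
# &#60;a href=&#34;x&#34;&#62;Tom &#38; Jerry&#39;s&#60;/a&#62;
```

The module route DeRykus alludes to would typically be encode_entities from HTML::Entities, which comes with the HTML-Parser distribution rather than with Perl itself.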
ccc31807
Guest
Posts: n/a
 
      01-08-2013
On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
> "07 Jan 2011" "TFR"
> "05 Jan 2011" "DR"
>
> I need to change the first field to look like
>
> 2011-01-07 "TFR"
> 2011-01-05 "DR"


For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change (without the surrounding double quotes):
1. my ($day, $month, $year) = split(/ /, $date);
2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);

Line 1 splits your date string into the three components: day, month, year.
Line 2 reassembles those three components and assigns the result back to $date.
The hash table %mo2num looks like this (the keys must be spelled exactly as in the file, e.g. Jan):
my %mo2num = (
Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
May => 5, Jun => 6,  Jul => 7,  Aug => 8,
Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);
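Putting ccc31807's two steps together for a whole line might look like the sketch below. It assumes, beyond what the post says, that the date is the first tab-separated field and still carries its double quotes, which are stripped before splitting; convert_line is a hypothetical name. A side benefit of %02d is that a single-digit day such as 5 is padded too:

```perl
use strict;
use warnings;

# Month abbreviation => month number, keyed the way the input spells it.
my %mo2num = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: split the line on tabs, remove the quotes from the
# first field, split that on spaces and reassemble it with sprintf.
sub convert_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;       # -1 keeps trailing empty fields
    (my $date = $fields[0]) =~ tr/"//d;       # "07 Jan 2011" -> 07 Jan 2011
    my ($day, $month, $year) = split / /, $date;
    $fields[0] = sprintf('%04d-%02d-%02d', $year, $mo2num{$month}, $day);
    return join("\t", @fields) . "\n";
}

print convert_line(qq{"07 Jan 2011"\t"TFR"\n});   # 2011-01-07, a tab, "TFR"
print convert_line(qq{"5 Jan 2011"\t"DR"\n});     # 2011-01-05, a tab, "DR"
```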

CC.
Dr Eberhard Lisse
Guest
Posts: n/a
 
      01-09-2013
Thanks,

el

on 2013-01-08 18:35 ccc31807 said the following:
> On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
>> "07 Jan 2011" "TFR"
>> "05 Jan 2011" "DR"
>>
>> I need to change the first field to look like
>>
>> 2011-01-07 "TFR"
>> 2011-01-05 "DR"

>
> For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
> 1. my ($day, $month, $year) = split(/ /, $date);
> 2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$mo}, $day);
>
> Line 1 splits your date string into the three components: day, month, year.
> Line 2 reassembles those three components and assigns the result back to $date.
> The hash table %mo2num looks like this:
> my %mo2num = (
> JAN => 1,
> FEB => 2,
> mar => 3,
> etc.
> );
>
> CC.
>


Rainer Weikusat
Guest
Posts: n/a
 
      01-09-2013
Dr Eberhard Lisse <(E-Mail Removed)> writes:

> Thanks,
>
> el
>
> on 2013-01-08 18:35 ccc31807 said the following:
>> On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
>>> "07 Jan 2011" "TFR"
>>> "05 Jan 2011" "DR"
>>>
>>> I need to change the first field to look like
>>>
>>> 2011-01-07 "TFR"
>>> 2011-01-05 "DR"

>>
>> For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
>> 1. my ($day, $month, $year) = split(/ /, $date);
>> 2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$mo}, $day);
>>
>> Line 1 splits your date string into the three components: day, month, year.
>> Line 2 reassembles those three components and assigns the result back to $date.
>> The hash table %mo2num looks like this:
>> my %mo2num = (
>> JAN => 1,
>> FEB => 2,
>> mar => 3,
>> etc.
>> );


And assuming the hash exists (I posted a command generating it
twice), the format can be transformed with a substitution expression
(which I also posted twice), namely

s/"(\d+)\s+(\S+)\s+(\d+)"/$3-$mo2num{$2}-$1/
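One detail worth checking: the substitution interpolates whatever the hash holds. The map in Rainer's earlier posts produces '01' to '12', but the %mo2num table quoted above holds plain 1 to 12, so with that table the month comes out unpadded. A sketch comparing the two, using sprintf under the /e modifier to pad:

```perl
use strict;
use warnings;

# ccc31807's table maps to plain numbers (Jan => 1), not to '01'.
my %mo2num = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

my $line = qq{"05 Jan 2011"\t"DR"\n};

# Plain interpolation keeps the unpadded value: 2011-1-05
(my $plain = $line) =~ s/"(\d+)\s+(\S+)\s+(\d+)"/$3-$mo2num{$2}-$1/;

# With /e the replacement is evaluated as code, so sprintf can pad: 2011-01-05
(my $padded = $line) =~
    s/"(\d+)\s+(\S+)\s+(\d+)"/sprintf('%s-%02d-%s', $3, $mo2num{$2}, $1)/e;

print $plain, $padded;
```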
Ben Goldberg
Guest
Posts: n/a
 
      02-12-2013
On Wednesday, January 2, 2013 10:37:02 AM UTC-5, Rainer Weikusat wrote:
> Dr Eberhard Lisse <(E-Mail Removed)> writes:
>
> > I have a Tab Separated File of roughly 1000 lines with the first
> > fields like
> >
> > "07 Jan 2011" "TFR"
> > "05 Jan 2011" "DR"
> >
> > I need to change the first field to look like
> >
> > 2011-01-07 "TFR"
> > 2011-01-05 "DR"
> >
> > for all lines, of course -O
> >
> > Can someone point me to where I can read this up? Or send me a code
> > fragment?

>
> -----------
>
> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>
>
>
> while (<>) {
>
> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>
> print;
>
> }
>
> -----------


Don't forget that you can use perl's "command line" switches even when you put your program in a file.
#!/usr/bin/perl -pi.bak
BEGIN {
%months = map {;$_, sprintf('%02d', ++$n)}
qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
}
s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
__END__
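Why the BEGIN block matters with -p: perlrun documents that -p wraps the whole program in a loop equivalent to `LINE: while (<>) { ... } continue { print or die ... }`. Every top-level statement therefore runs once per input line, so without BEGIN the %months assignment would be repeated for each line while the global $n kept counting up. The sketch below imitates that repetition with a hypothetical rebuild_months sub standing in for the loop body:

```perl
use strict;
use warnings;

# $n is shared between calls, like a global that the implicit -p loop
# never resets.
my $n = 0;

# Hypothetical stand-in for an initializer that is NOT inside BEGIN:
# under -p it would run again for every input line.
sub rebuild_months {
    return map { ; $_ => sprintf('%02d', ++$n) }
        qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
}

my %first  = rebuild_months();   # what the first input line would see
my %second = rebuild_months();   # what the second input line would see

print "line 1: Jan => $first{Jan}\n";    # line 1: Jan => 01
print "line 2: Jan => $second{Jan}\n";   # line 2: Jan => 13
```

So from the second line on, the converted month would be wrong; BEGIN runs the initialization once, at compile time, before the implicit loop starts.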
Rainer Weikusat
Guest
Posts: n/a
 
      02-12-2013
Ben Goldberg <(E-Mail Removed)> writes:
> On Wednesday, January 2, 2013 10:37:02 AM UTC-5, Rainer Weikusat wrote:


[...]

>> -----------
>>
>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>>
>> while (<>) {
>> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>> print;
>> }
>>
>> -----------

>
> Don't forget that you can use perl's "command line" switches even when you put your program in a file.
> #!/usr/bin/perl -pi.bak
> BEGIN {
> %months = map {;$_, sprintf('%02d', ++$n)}
> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
> }
> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> __END__


The 'BEGIN' serves no useful purpose here: %months needs to be
initialized before the while-loop uses it. Since statements in a file
are executed consecutively (anything else would probably be 'a little
confusing'), this will be the case with either variant.

As I wrote in another posting: If perl hadn't been told to destroy the
input file, also telling it to make a backup of that before doing so
wasn't necessary. While this probably doesn't matter much for a
trivial example like this, 'not using -i' also means that the code can
be debugged and fixed without constantly renaming files or losing the
original input file altogether in case the 'backup request' was
accidentally forgotten. This also enables use of the script(let) as
'another filter' in a more complicated pipeline.


Uri Guttman
Guest
Posts: n/a
 
      02-14-2013
>>>>> "BM" == Ben Morrow <(E-Mail Removed)> writes:

BM> There's no need to muck about with the #! line and BEGIN blocks, both of
BM> which would make it impossible to turn this into a subroutine later:

BM> my %months = ...;

BM> local $^I = ".bak";
BM> while (<>) { ... }

BM> The edit-in-place handling, including renaming the old file and opening
BM> and selecting ARGVOUT, is done by the no-filehandle <> operator (or an
BM> explicit <ARGV> or readline(ARGV)) whenever $^I is set. If you want to
BM> in-place edit a custom list of files, you can also localise @ARGV.

and File::Slurp has edit_file and edit_file_lines which are even easier
to use.

i do need to add a backup file option to those.
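Ben Morrow's outline (the same pattern appears in perlfaq5 under using -i from within a program) can be packaged as an ordinary subroutine. This is a sketch: fix_dates_in_place is a hypothetical name, and the demo writes a throwaway sample file with the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: edit the named files in place, keeping a copy of each
# original under the given backup suffix.
sub fix_dates_in_place {
    my ($backup_suffix, @files) = @_;
    my $n = 0;
    my %months = map { ; $_ => sprintf('%02d', ++$n) }
        qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    local $^I   = $backup_suffix;   # makes <> edit in place
    local @ARGV = @files;           # the files <> reads and rewrites
    local $_;
    while (<>) {
        s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/;
        print;                      # goes to ARGVOUT, i.e. the new file
    }
}

# Demo on a temporary copy of the OP's sample data.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.tsv";
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} qq{"07 Jan 2011"\t"TFR"\n}, qq{"05 Jan 2011"\t"DR"\n};
close $fh or die "Cannot close $file: $!";

fix_dates_in_place('.bak', $file);

open $fh, '<', $file or die "Cannot read $file: $!";
my $result = do { local $/; <$fh> };
close $fh;
print $result;
```

Because $^I and @ARGV are localized, the caller's values are restored when the sub returns, which is what keeps this usable as a subroutine rather than a one-off script.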

uri
