Velocity Reviews - Computer Hardware Reviews

tips requested for a log-processing script

 
 
Jaap
Guest
Posts: n/a
 
      11-05-2006
Pythoners,
As a relatively new user of Python I would like to ask your advice on
the following script I want to create.

I have a logfile which contains records. All records have the same
layout and are stored in CSV format. Each record is (non-uniquely)
identified by a date and an itemID. Each itemID can occur 0 or more times
per month. Each record contains a figure/amount which I need to sum per
month and per itemID. I have already managed to separate the individual
parts of each logfile record by using the csv module from Python 2.5.
Very simple indeed.

Apart from this I have a configuration file, which contains the list of
itemIDs I need to focus on per month. Not all itemIDs are relevant for
each month; some matter, for example, only every second or third month. All
records in the logfile with other itemIDs can be ignored. I have yet to
define the format of this configuration file, but am thinking about a 0
or 1 for each month, and then the itemID, like:
"1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
consideration in the first month of each quarter.

My question to this forum is: which data structure would you propose?
The logfile is not very big (about 200k max, average 200k) so I assume I
can store it in memory, in a list?

How would you propose I tackle the filtering of relevant/non-relevant
items from the logfile? Would you propose I use filter(func, list) for
this task, or is something else better?

In the end I want to mail the outcome of my process, but this seems
straightforward from the documentation I have found, although I must
connect to an external SMTP server.

Any tips, views, or advice are highly appreciated!


Jaap

PS: when I load the logfile in a spreadsheet I can create a pivot table
which does about the same ;-] but that is not what I want; the
processing must be automated in the end with a periodic script which
e-mails the summary of the key figure every month.
 
 
 
 
 
martdi
Guest
Posts: n/a
 
      11-05-2006
If you are running on Windows, you can use the win32com module to
automate generating a pivot table in Excel, and then add code
to send it via e-mail.





 
 
 
 
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      11-05-2006
On Sun, 05 Nov 2006 12:00:07 +0100, Jaap <> declaimed
the following in comp.lang.python:

>
> Apart from this I have a configuration file, which contains the list of
> itemIDs I need to focus on per month. Not all itemIDs are relevant for
> each month; some matter, for example, only every second or third month. All
> records in the logfile with other itemIDs can be ignored. I have yet to
> define the format of this configuration file, but am thinking about a 0
> or 1 for each month, and then the itemID, like:
> "1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
> consideration in the first month of each quarter.
>

Personally -- I'd put the ID first... Maybe even in an INI-file
style:

123456: 1, 0, 0, 1, ...

and use the config parser to read those... Question though:
"consideration at first month of each quarter" => does that mean only
process that month, or process the entire quarter just in that month?
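A minimal sketch of the INI-style idea, assuming Python 3's configparser module (the successor to the ConfigParser module of the 2.5 era). The section name [items], the second item ID, and the helper load_item_months are hypothetical, chosen only for illustration; the twelve flags for 123456 follow the example line from the original post.

```python
import configparser

# Hypothetical config text; the [items] section name is an assumption.
CONFIG_TEXT = """
[items]
123456: 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0
654321: 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0
"""


def load_item_months(text):
    """Return {item_id: set of month numbers 1..12 whose flag is 1}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    item_months = {}
    for item_id, flags in parser.items("items"):
        bits = [int(flag) for flag in flags.split(",")]
        if len(bits) != 12:
            raise ValueError("expected 12 month flags for item %s" % item_id)
        # enumerate from 1 so that the first flag maps to January
        item_months[item_id] = {
            month for month, bit in enumerate(bits, start=1) if bit
        }
    return item_months
```

Storing the flags as a set of month numbers makes the later relevance check a simple membership test. Whether a flagged month means "process that month only" or "process the whole quarter" is still the open question above; the sketch assumes the former.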

> My question to this forum is: which data structure would you propose?
> The logfile is not very big (about 200k max, average 200k) so I assume I
> can store in internal memory/list?
>

SQLite in-memory database... Should be relatively easy to then
select/group the data with SQL statements. Even the summation might be
coded directly into the SQL. If the processing configuration (above)
doesn't change often, go to an SQLite physical file database, and store
the ID/Month pairs that need processing -- then a subselect to get the
months per ID could be used to select the ID records for summation.
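One way the in-memory SQLite suggestion might look, as a sketch only. It assumes Python 3's sqlite3 module, dates stored as ISO 'YYYY-MM-DD' strings, and a three-column record layout (date, item ID, amount); the table names, column names, and the function summarize are hypothetical. It uses a JOIN against the ID/month table rather than the subselect mentioned above, which is a swapped-in but equivalent way to restrict the rows.

```python
import sqlite3


def summarize(records, wanted):
    """Sum amounts per item and month, keeping only wanted (item, month) pairs.

    records: iterable of (day 'YYYY-MM-DD', item_id, amount)
    wanted:  iterable of (item_id, month_no) with month_no in 1..12
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (day TEXT, item_id TEXT, amount REAL)")
    conn.execute("CREATE TABLE wanted (item_id TEXT, month_no INTEGER)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", records)
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", wanted)
    rows = conn.execute(
        """
        SELECT log.item_id, strftime('%Y-%m', log.day) AS ym, SUM(log.amount)
        FROM log
        JOIN wanted
          ON wanted.item_id = log.item_id
         AND wanted.month_no = CAST(strftime('%m', log.day) AS INTEGER)
        GROUP BY log.item_id, ym
        ORDER BY log.item_id, ym
        """
    ).fetchall()
    conn.close()
    return rows
```

Both the filtering and the summation happen inside the SQL statement, so the Python side only has to load the rows and format the result for the e-mail.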

--
Wulfraed Dennis Lee Bieber KD6MOG

HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: web-)
HTTP://www.bestiaria.com/
 
 
George Sakkis
Guest
Posts: n/a
 
      11-05-2006
Jaap wrote:

> Apart from this I have a configuration file, which contains the list of
> itemIDs I need to focus on per month. Not all itemIDs are relevant for
> each month; some matter, for example, only every second or third month. All
> records in the logfile with other itemIDs can be ignored. I have yet to
> define the format of this configuration file, but am thinking about a 0
> or 1 for each month, and then the itemID, like:
> "1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
> consideration in the first month of each quarter.


It's probably not necessary if your records number on the order of 100K,
but if you're dealing with millions and above, you can write your
config file in binary using the struct module and condense it down to 6
bytes per record (32 bits for the ID and 12 bits for the month
occurrences, rounded up to whole bytes). Filtering will also be faster, as
for each record you just have to do a bitwise AND with the 0..010...0 mask
corresponding to a given month.
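A rough sketch of the binary layout, assuming Python 3 (Struct.iter_unpack needs 3.4 or later). The format "<IH" is one possible choice: a little-endian 4-byte unsigned ID followed by a 2-byte field holding the 12 month bits, 6 bytes in total with no padding. The bit order (bit 0 for January) and the helper names pack_item and wanted_ids are assumptions made for illustration.

```python
import struct

# 4-byte unsigned ID + 2-byte month mask = 6 bytes per record
RECORD = struct.Struct("<IH")


def pack_item(item_id, months):
    """Pack one config record; months is an iterable of month numbers 1..12."""
    mask = 0
    for month in months:
        mask |= 1 << (month - 1)  # bit 0 = January, bit 11 = December
    return RECORD.pack(item_id, mask)


def wanted_ids(blob, month):
    """Return the set of item IDs whose mask has the bit for `month` set."""
    bit = 1 << (month - 1)
    return {
        item_id
        for item_id, mask in RECORD.iter_unpack(blob)
        if mask & bit  # the bitwise AND described above
    }
```

For example, packing item 123456 with months 1, 4, 7 and 10 and then calling wanted_ids(blob, 4) would yield a set containing 123456.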

George

 
 
Hendrik van Rooyen
Guest
Posts: n/a
 
      11-06-2006
"Jaap" <> wrote:


> My question to this forum is: which data structure would you propose?
> The logfile is not very big (about 200k max, average 200k) so I assume I
> can store in internal memory/list?
>
> How would you propose I tackle the filtering of relevant/non-relevant
> items from logfile? Would you propose I use a filter(func, list) for
> this task or is another thing better?
>



I would do something like this: (obviously untested)

    item_dict = {}
    for line in open(logfile):
        # (code to get hold of item, date, amount)
        if item not in item_dict:
            item_dict[item] = [(date, amount)]
        else:
            # append() takes a single argument, so pass the pair as a tuple
            item_dict[item].append((date, amount))

This will give you, for each unique item, a direct reference to wherever
it's been used.

I would then work through the config file, and extract the items of interest for
the run date...
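As a hedged variation on the dictionary idea, the filtering and the summation could also be done in a single pass while reading the CSV, instead of collecting every record first. The sketch assumes Python 3, a three-column layout (date as 'YYYY-MM-DD', item ID, amount), and a config already loaded as {item_id: set of month numbers}; the function name monthly_totals and that layout are assumptions, not something the original post specifies.

```python
import csv
import io
from collections import defaultdict


def monthly_totals(csv_file, item_months):
    """Sum amounts per (item_id, 'YYYY-MM'), skipping irrelevant records.

    csv_file:    an open text file (or file-like object) with rows of
                 date, item_id, amount
    item_months: {item_id: set of month numbers 1..12 to consider}
    """
    totals = defaultdict(float)
    for day, item_id, amount in csv.reader(csv_file):
        year, month = day[:4], int(day[5:7])
        # Unknown items get an empty tuple, so they are never matched.
        if month in item_months.get(item_id, ()):
            totals[(item_id, "%s-%02d" % (year, month))] += float(amount)
    return dict(totals)


# Usage with an in-memory stand-in for the logfile:
sample = io.StringIO(
    "2006-01-03,123456,10.0\n"
    "2006-01-20,123456,5.0\n"
    "2006-02-01,123456,7.0\n"
    "2006-01-05,999999,3.0\n"
)
print(monthly_totals(sample, {"123456": {1, 4, 7, 10}}))
```

A plain if-test inside the loop does the same job as filter(func, list) here, without building an intermediate list.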

HTH - Hendrik



 
 
Jaap
Guest
Posts: n/a
 
      11-06-2006
Hendrik van Rooyen wrote:
> "Jaap" <> wrote:
>
>
>> Python ers,

Thanks!
All your replies have been both to the point and helpful for me.

You have proven that both Python and its community are open and welcoming
to new users.

Jaap
 