search for messages in large files

 
 
Jman
      06-25-2003
I am working with files that grow to a size of 1-2 mb each day
of the month. The file is closed at the end of each month.
The format of the messages is:

aaaaaaaa YY-MN-DY HR:MN:SC MSG1 BBBB
qqqq wwww eeee rrrr tttt
yyyyyyy uuuuuuuuu
iiii

and

aaaaaaaa yy-mn-dy hr:mn:sc MSG2 BBBB
zzzz cccc
kkkkkkkk

lllllllll mmmmm nnnn

I want to search the files each day for some of the previous day's messages.
The important data in the message to me is the date (YY-MN-DY),
and the MSG1 (actually MSG[1-50]). Some of the messages have data
in every line (MSG1), and some messages have lines that are blank followed
by lines with data. Is there a good or simple way to gather into a new file
all of the previous day's MSGs that I want? I hope my question makes sense.
Thanks



 
Jim McTiernan
      06-25-2003

"Martien Verbruggen" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed). ..
> On Tue, 24 Jun 2003 19:27:20 -0700,
> Jman <(E-Mail Removed)> wrote:
> > I am working with files that grow to a size of 1-2 mb each day
> > of the month. The file is closed at the end of each month.
> > The format of the messages is:
> >

snip
>
> It would be better to include _real_ data from your log file, and even
> better to show more than one record, so we can see whether there is
> anything between records/messages that can be used.
>
> > I want to do a search of the files each day for some previous days messages.
> > The important data in the message to me is the date (YY-MN-DY),
> > and the MSG1 (actually MSG[1-50]). Some of the messages have data
> > in every line (MSG1), and some messages have lines that are blank followed
> > by lines with data. Is there a good, or simple way to gather into a new
> > file
> > all of the previous days MSGs that I want? Hope my question makes sense.
>
> Maybe something like (untested):
>
> my $yesterday = "03-06-25"; # assuming that that is the format
> open F, "mylogfile" or die $!;
> while (<F>)
> {
>     if (/$yesterday.*MSG(\d\d?)/)
>     {
>         # We now have the message number in $1
>         # Since you're only interested in yesterday, you already know
>         # the date. No need to capture it.
>         print;
>     }
> }
> close F;
>
> I am assuming that none of the other lines have that pattern. I'm also
> assuming that the BBBB bits above don't contain anything matching
> 'MSG\d\d?', or if it does that it's actually the correct number as
> well.
>
> Hard to tell whether this is sufficient. You give us very little
> information about what exactly you're having trouble with. next time,
> apart from showing real data, also show us what you have tried (real
> code), and which bit exactly you're having trouble with.
>
> Martien
> --
> |
> Martien Verbruggen | True seekers can always find something to
> Trading Post Australia | believe in.
> |


Below is an example of the data that I was trying to reflect.
There is a CTRL-M at the end of each line after the line that starts "-----New",
and there is a CTRL-Y on the line prior to the line that starts "-----New".
I am new at this, obviously. My original approach was to delete the data that
I don't need, to try to group the messages into paragraphs:
#!/usr/bin/perl -w
while (<>) {
s/^M|^Y|^-.*//;
print;
}
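
If the `^M` and `^Y` in the script above stand for the literal control characters (carriage return and ctrl-Y) rather than two typed characters each, writing them as escapes removes the ambiguity: a typed caret followed by M in a regex means "an M at the start of the line". A minimal sketch under that assumption follows; `clean_line` is a hypothetical helper name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip the control characters and empty out the
# "-----New Message ..." separator lines. The newline is kept, so a later
# paragraph-mode read ($/ = "") can still group the remaining lines.
sub clean_line {
    my ($line) = @_;
    $line =~ tr/\cM\cY//d;    # delete every ctrl-M (CR) and ctrl-Y
    $line =~ s/^-.*//;        # blank any line that starts with a dash
    return $line;
}

# Typical use as a filter over files or standard input:
#     print clean_line($_) while <>;
```

Using `tr///d` instead of an alternation deletes the control characters wherever they occur in the line, not only at the first match.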

Then I pipe that to another program:
#!/usr/bin/perl -w
$/ = "";
while (<>) {
print if / 03-06-01 /;
}

Here is some file data:

-----New Message Received on 06-01-2003 at 00:00:03 -----

S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
REPT TIME 03-06-01 00:00:03

-----New Message Received on 06-01-2003 at 00:00:06 -----

S570-58785830 03-06-01 00:00:06 611262 SLC SANF
* REPT RT SID=2050 DNUSRT=2-0-60 MINOR FAR END EVENT=12489

-----New Message Received on 06-01-2003 at 02:47:03 -----

S570-58785830 03-06-01 02:47:03 612603 MDIIMON SANF
A REPT MDII CVN SIGTYPE ISUP TKGMN 303-4 SZ 168 OOS 0 ID
SUPRVSN TIME 02:47:03 NEN=2-0-0-1-1-4-3-4 TRIAL 1 CARRFLAG NC
OGT NORMAL CALL CALLED-NO 1288 CALLING-NO 9033
DISCARD 0
OPC 123083056 DPC 456041003 CIC 3004

-----New Message Received on 06-01-2003 at 02:53:01 -----

S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
M REPT AUDSTAT COMPLETED

ROUTINE AUDIT SCHEDULING IS ALLOWED

-----New Message Received on 06-01-2003 at 02:54:01 -----

S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
A TRC IPCT EVENT 2621

DN=9759 TERM=3-H'329f DIALED


DN=5551212
TIME 02:54:01



 
Jman
      06-26-2003
Actually, as I completely understand the contents of the file, and you do not,
I am trying to explain what the contents of the file look like, and have not
changed my mind on anything. I was attempting to show how I have
tried to handle my task, like you requested. I thought that it would be
better to remove the control characters first; maybe this isn't necessary.
The "MSG" data that I mentioned in my original posting are the second-to-last
words on the first line of each message, e.g. TIME, SLC, MDIIMON, etc.
Let's say I want to retrieve all of the MAINT messages from 03-06-13:
what is the best way to do it? Using my style I end up creating large files,
against which I run another script, creating another large file,
and running another script against it, until I finally get the data I want.
I would like to be able to run one script, looking for any day of the month
with a particular MSG.
If you can offer anything, thanks; if not, thanks anyway.
I am doing my best to explain.
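
One way to do this in a single pass, sketched under assumptions: each message starts after a "-----New Message Received" line, the date is the second word of the first data line, and the message type is the second-to-last word of that line. `select_messages` is a hypothetical helper, and the sample records are trimmed copies of the data posted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every message whose first data line has the
# given date (second word) and message type (second-to-last word).
sub select_messages {
    my ($fh, $date, $type) = @_;
    my (@found, @body);

    # Decide whether the message collected so far should be kept.
    my $flush = sub {
        shift @body while @body && $body[0]  !~ /\S/;   # trim leading blanks
        pop   @body while @body && $body[-1] !~ /\S/;   # trim trailing blanks
        if (@body) {
            my @words = split ' ', $body[0];
            push @found, join("\n", @body) . "\n"
                if @words >= 3 && $words[1] eq $date && $words[-2] eq $type;
        }
        @body = ();
    };

    while (my $line = <$fh>) {
        $line =~ tr/\cM\cY//d;          # drop ctrl-M and ctrl-Y
        chomp $line;
        if ($line =~ /^-----New Message Received/) {
            $flush->();                 # a separator ends the previous message
            next;
        }
        push @body, $line;
    }
    $flush->();                         # the last message has no separator after it
    return @found;
}

# Demo on two records, read from an in-memory filehandle.
my $sample = join '',
    "-----New Message Received on 06-01-2003 at 02:53:01 -----\cM\n",
    "\cM\n",
    "S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF\cM\n",
    "M REPT AUDSTAT COMPLETED\cM\n",
    "\cM\n",
    "ROUTINE AUDIT SCHEDULING IS ALLOWED\cM\n",
    "\cY\cM\n",
    "-----New Message Received on 06-01-2003 at 02:54:01 -----\cM\n",
    "\cM\n",
    "S570-58785830 03-06-01 02:54:01 612619 TRCE SANF\cM\n",
    "A TRC IPCT EVENT 2621\cM\n";

open my $fh, '<', \$sample or die "cannot open sample: $!";
print select_messages($fh, '03-06-01', 'MAINT');
```

For a real log, the handle would come from opening the month's file, and the date and type could be taken from the command line. Because the file is read one line at a time and only one message is held in memory, no intermediate files are needed.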


"Martien Verbruggen" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed). ..
> On Wed, 25 Jun 2003 11:53:37 -0700,
> Jim McTiernan <(E-Mail Removed)> wrote:
> >
> > "Martien Verbruggen" <(E-Mail Removed)> wrote in message
> > news:(E-Mail Removed). ..
> >> On Tue, 24 Jun 2003 19:27:20 -0700,
> >> Jman <(E-Mail Removed)> wrote:
> >> > I am working with files that grow to a size of 1-2 mb each day
> >> > of the month. The file is closed at the end of each month.
> >> > The format of the messages is:
> >> >

> > snip
> >>
> >> It would be better to include _real_ data from your log file, and even
> >> better to show more than one record, so we can see whether there is
> >> anything between records/messages that can be used.
> >>
> >> > I want to do a search of the files each day for some previous days messages.
> >> > The important data in the message to me is the date (YY-MN-DY),
> >> > and the MSG1 (actually MSG[1-50]). Some of the messages have data
> >> > in every line (MSG1), and some messages have lines that are blank followed
> >> > by lines with data. Is there a good, or simple way to gather into a new
> >> > file
> >> > all of the previous days MSGs that I want? Hope my question makes sense.
> >>
> >> Maybe something like (untested):
> >>
> >> my $yesterday = "03-06-25"; # assuming that that is the format
> >> open F, "mylogfile" or die $!;
> >> while (<F>)
> >> {
> >>     if (/$yesterday.*MSG(\d\d?)/)
> >>     {
> >>         # We now have the message number in $1
> >>         # Since you're only interested in yesterday, you already know
> >>         # the date. No need to capture it.
> >>         print;
> >>     }
> >> }
> >> close F;
> >>
> >> I am assuming that none of the other lines have that pattern. I'm also
> >> assuming that the BBBB bits above don't contain anything matching
> >> 'MSG\d\d?', or if it does that it's actually the correct number as
> >> well.
> >>
> >> Hard to tell whether this is sufficient. You give us very little
> >> information about what exactly you're having trouble with. next time,
> >> apart from showing real data, also show us what you have tried (real
> >> code), and which bit exactly you're having trouble with.

> >
> > Below is an example of the data that I was trying to reflect.
> > There is a CTRL M at end of each line after the line that starts "-----New".
> > and there is a CTRL Y on the line prior to the line that starts "-----New".
> > I am new at this obviously, my original approach was to delete the data that
> > I don't need to try to group the messages into paragraphs:
> > #!/usr/bin/perl -w
> > while (<>) {
> > s/^M|^Y|^-.*//;
> > print;
> > }

>
> So... You're removing any initial M or Y, or anything in a line that
> initially starts with -?
>
> > Then I pipe that to another program:
> > #!/usr/bin/perl -w
> > $/ = "";
> > while (<>) {
> > print if / 03-06-01 /;
> > }

>
> And now you print "paragraphs" that contain that date.
>
> > Here is some file data:
> >
> > -----New Message Received on 06-01-2003 at 00:00:03 -----
> >
> > S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
> > REPT TIME 03-06-01 00:00:03
> >

>
> Well.. That data doesn't look at all like what you described in your
> original post. In your OP, you were talking about being interested in
> some message number, and the date only. I don't see any message
> number.
>
> Given that ctrl-Y seems to be the record separator, or terminator, I'd
> probably set $/ to ctrl-Y, and then process the file message by
> message, selecting on whichever criteria you want, and I'm more
> confused now about what you do and don't want. I'll just make up
> something, and leave it up to you to change it. You're not clear on
> whether all of the dates in those messages can be used, or whether it
> has to be one in the capitalised bits. I'll simply select on that
> first line, because it's easier.
>
>
> #!/usr/local/bin/perl
> use strict;
> use warnings;
>
> # Set record separator to ctrl-Y followed by a newline
> $/ = "\cY\n";
> my $target_date = "06-01-2003";
>
> while (<DATA>)
> {
>     chomp;
>
>     # We're only interested in records that contain our target date
>     next unless /Received on $target_date at/;
>
>     # Remove any M or Y following a newline (Just following your code,
>     # I think)
>     s/\n(M|Y)/\n/g;
>
>     # Remove that first line. We are not interested in it.
>     s/\A.*--\n//;
>
>     # Print what's left
>     print;
> }
>
> __DATA__
> -----New Message Received on 06-01-2003 at 00:00:03 -----
>
> S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
> REPT TIME 03-06-01 00:00:03
> 
> -----New Message Received on 06-01-2003 at 00:00:06 -----
>
> S570-58785830 03-06-01 00:00:06 611262 SLC SANF
> * REPT RT SID=2050 DNUSRT=2-0-60 MINOR FAR END EVENT=12489
> 
> -----New Message Received on 06-01-2003 at 02:47:03 -----
>
> S570-58785830 03-06-01 02:47:03 612603 MDIIMON SANF
> A REPT MDII CVN SIGTYPE ISUP TKGMN 303-4 SZ 168 OOS 0 ID
> SUPRVSN TIME 02:47:03 NEN=2-0-0-1-1-4-3-4 TRIAL 1 CARRFLAG NC
> OGT NORMAL CALL CALLED-NO 1288 CALLING-NO 9033
> DISCARD 0
> OPC 123083056 DPC 456041003 CIC 3004
> 
> -----New Message Received on 06-01-2003 at 02:53:01 -----
>
> S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
> M REPT AUDSTAT COMPLETED
>
> ROUTINE AUDIT SCHEDULING IS ALLOWED
> 
> -----New Message Received on 06-01-2003 at 02:54:01 -----
>
> S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
> A TRC IPCT EVENT 2621
>
> DN=9759 TERM=3-H'329f DIALED
>
>
> DN=5551212
> TIME 02:54:01
> 
>
> Martien
> --
> |
> Martien Verbruggen | Useful Statistic: 75% of the people make up
> Trading Post Australia | 3/4 of the population.
> |



 
Sam Holden
      06-26-2003
On Wed, 25 Jun 2003 20:59:07 -0700, Jman <(E-Mail Removed)> wrote:
> Actually, as I completely understand the contents of the file, and you do
> not,
> I am trying to explain what the contents of the file looks like, and have
> not
> changed my mind on anything. I was attempting to show how I have
> tried to handle my task, like you requested. I thought that it would be
> better to remove the control characters first, maybe this isn't necessary.
> The "MSG" data that I mentioned in my original posting are the second to
> last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...


Of course, all the readers are psychic and knew that when you said "actually
MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
TIME, SLC, MDIIMON, etc...

How foolish of those of us who can't read minds.

--
Sam Holden

 
Martien Verbruggen
      06-26-2003
[Don't top post]


On Wed, 25 Jun 2003 20:59:07 -0700,
Jman <(E-Mail Removed)> wrote:
> Actually, as I completely understand the contents of the file, and you do
> not,
> I am trying to explain what the contents of the file looks like, and have
> not
> changed my mind on anything. I was attempting to show how I have
> tried to handle my task, like you requested. I thought that it would be
> better to remove the control characters first, maybe this isn't necessary.
> The "MSG" data that I mentioned in my original posting are the second to
> last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...
> Let's say I want to retrieve all of the MAINT messages from 03-06-13,
> what is the best way to do it. Using my style I end up creating large
> files,


How are we supposed to know that? You initially said something totally
different from what is in the actual data that you finally posted.
Your data does NOT contain any MSG followed by a number between 1 and
50 at all, but that is what you originally stated. I provided some
code to find that.

Then you post actual data that looks completely different, and I again
do my best to interpret what it is you mean from your half-arsed
specification (including modifying the data according to your
instructions), and again provide some code for you to start with.

All you do is whinge that you're not getting a complete solution to
your underspecified problem, instead of trying to clarify the
confusion that you, yourself, created in the first place.

> against which I run another script against, creating another large file,
> and running another script against it, until I finally get the data I want.
> I would like to be able to run one script, looking for any day of the month
> with a particular MSG.
> If you can offer anything, thanks, if not thanks anyway
> I am doing my best to explain


What was wrong with the suggestions I posted already? if you answer,
please realise that i will not be reading it anymore.

*plonk*

[SNIP of TOFU]

Martien
--
|
Martien Verbruggen | Never hire a poor lawyer. Never buy from a
Trading Post Australia | rich salesperson.
|
 
Jim McTiernan
      06-26-2003

"Sam Holden" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed). ..
> On Wed, 25 Jun 2003 20:59:07 -0700, Jman <(E-Mail Removed)> wrote:
> > Actually, as I completely understand the contents of the file, and you do
> > not,
> > I am trying to explain what the contents of the file looks like, and have
> > not
> > changed my mind on anything. I was attempting to show how I have
> > tried to handle my task, like you requested. I thought that it would be
> > better to remove the control characters first, maybe this isn't necessary.
> > The "MSG" data that I mentioned in my original posting are the second to
> > last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...
>
> Of course, all the readers are psychic and knew that when you said "actually
> MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
> TIME, SLC, MDIIMON, etc...
>
> How foolish of those of us who can't read minds.

I didn't think that it was that hard to understand.
I attempted to recreate the format manually in my first posting.
Sorry this bothered you.
I am thru with this thread.
>
> --
> Sam Holden
>



 
Jim McTiernan
      06-26-2003

"Martien Verbruggen" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed). ..
> [Don't top post]
>
>
> On Wed, 25 Jun 2003 20:59:07 -0700,
> Jman <(E-Mail Removed)> wrote:
> > Actually, as I completely understand the contents of the file, and you do
> > not,
> > I am trying to explain what the contents of the file looks like, and have
> > not
> > changed my mind on anything. I was attempting to show how I have
> > tried to handle my task, like you requested. I thought that it would be
> > better to remove the control characters first, maybe this isn't necessary.
> > The "MSG" data that I mentioned in my original posting are the second to
> > last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...
> > Let's say I want to retrieve all of the MAINT messages from 03-06-13,
> > what is the best way to do it. Using my style I end up creating large
> > files,

>
> How are we supposed to know that? You initially said something totally
> different from what is in the actual data that you finally posted.
> Your data does NOT contain any MSG followed by a number between 1 and
> 50 at all, but that is what you originally stated. I provided some
> code to find that.
>
> Then you post actual data that looks completely different, and I again
> do my best to interpret what it is you mean from your half-arsed
> specification (including modifying the data according to your
> instructions), and again provide some code for you to start with.

You seem to be a little thick, you can't even see that I was using
substitution in the original post for the actual data. In retrospect
I would not do that again, it leads to a whole lot of complaining.
>
> All you do is whinge that you're not getting a complete solution to
> your underspecified problem, instead of trying to clarify the
> confusion that you, yourself, created in the first place.

Where did I whinge that I am not getting a complete solution?
I attempted to adjust my explanation to your crankiness.
>
> > against which I run another script against, creating another large file,
> > and running another script against it, until I finally get the data I want.
> > I would like to be able to run one script, looking for any day of the month
> > with a particular MSG.
> > If you can offer anything, thanks, if not thanks anyway
> > I am doing my best to explain

>
> What was wrong with the suggestions I posted already? if you answer,
> please realise that i will not be reading it anymore.

The only thing wrong is your annoying attitude, goodbye.
>
> *plonk*
>
> [SNIP of TOFU]
>
> Martien
> --
> |
> Martien Verbruggen | Never hire a poor lawyer. Never buy from a
> Trading Post Australia | rich salesperson.
> |



 
Tad McClellan
      06-26-2003
Jim McTiernan <(E-Mail Removed)> wrote:
> "Sam Holden" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed). ..
>> On Wed, 25 Jun 2003 20:59:07 -0700, Jman <(E-Mail Removed)> wrote:


>> > Actually, as I completely understand the contents of the file, and you do
>> > not,



Right. So it is *your* responsibility to convey what you know to us
if we are to be able to help you.


>> Of course, all the readers are psychic and knew that when you said "actually
>> MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
>> TIME, SLC, MDIIMON, etc...
>>
>> How foolish of those of us who can't read minds.


> I didn't think that it was that hard to understand.



That is irrelevant, since you were not explaining it to yourself.

When writing, what matters is the _reader's_ perception, not
the author's perception.


> Sorry this bothered you.
> I am thru with this thread.



I am through with this poster.

*plonk*


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Sam Holden
      06-26-2003
On Thu, 26 Jun 2003 08:14:23 -0700,
Jim McTiernan <(E-Mail Removed)> wrote:
>
> "Martien Verbruggen" <(E-Mail Removed)> wrote in message
>>
>> What was wrong with the suggestions I posted already? if you answer,
>> please realise that i will not be reading it anymore.

> The only thing wrong is your annoying attitude, goodbye.


Let's hope you don't have any future perl problems/questions/issues since
the 'experts' of the group (of which I am not one, obviously) aren't going
to be reading them here...

--
Sam Holden

 
Jim McTiernan
      06-26-2003

"Sam Holden" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed). ..
> On Thu, 26 Jun 2003 08:14:23 -0700,
> Jim McTiernan <(E-Mail Removed)> wrote:
> >
> > "Martien Verbruggen" <(E-Mail Removed)> wrote in message
> >>
> >> What was wrong with the suggestions I posted already? if you answer,
> >> please realise that i will not be reading it anymore.

> > The only thing wrong is your annoying attitude, goodbye.

>
> Let's hope you don't have any future perl problems/questions/issues since
> the 'experts' of the group (of which I am not one, obviously) aren't going
> to be reading them here...

That's fine. I just won't be able to learn anything else about perl,
or get to be part of these lively conversations.
>
> --
> Sam Holden
>



 