How to get file count under a directory?

 
 
rockdale
Guest
Posts: n/a
 
      09-28-2009
Hi,

I have an application that writes out log files. If the log file
size is greater than, let's say, 1M, the application creates a new log
file with a sequence number. The log file names look like
mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt, ... with no upper
limit.

Now the problem is that when my application is restarted, I need to know
the largest sequence number among my log files. I am thinking of
a loop from 1 to something like 100000 that checks whether each file
exists; the first number whose file does not exist gives me the max
sequence number I need. But this method looks
very awkward. Is there another way to do this (get the max number for a
series of similar files)?

My application runs on the Windows platform but does not use MFC
functions very much.

Thanks in advance
-Rockdale
 
Victor Bazarov
Guest
Posts: n/a
 
      09-28-2009
rockdale wrote:
> I have an application which writes log files out. If then log file
> size is great than let's say 1M, the application will create a new log
> file with sequence number. the log file format likes
> mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
> limit.
>
> Now the problem is if my application get restarted, I need to know
> what is the largest sequence number of my log file. I am thinking in
> a loop from 1 to like 100000, check if the file exist, if it does
> not , then I get the max sequence number I need. But this method looks
> very awkward. Is there another way to do this(get the max number for a
> series of similar files)?


Yes, and it's platform-specific. You can probably obtain a list of (or
enumerate) the files whose name fits a certain pattern, like "log_*.*",
and then find your largest number (behind the '*')...
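
For readers finding this thread later, here is a rough sketch of that enumerate-and-parse idea. It assumes C++17's std::filesystem, which did not exist when this thread was written (on Windows one would have used FindFirstFile/FindNextFile or Boost.Filesystem instead). The prefix "mylogfile_092809_" and the helper names parseSequence and maxSequence are made up for illustration.

```cpp
#include <charconv>
#include <filesystem>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

namespace fs = std::filesystem;

// Extracts N from "<prefix>N.txt"; returns nothing if the name does not match.
std::optional<unsigned long> parseSequence(std::string_view name,
                                           std::string_view prefix) {
    constexpr std::string_view suffix = ".txt";
    if (name.size() <= prefix.size() + suffix.size()) return std::nullopt;
    if (name.substr(0, prefix.size()) != prefix) return std::nullopt;
    if (name.substr(name.size() - suffix.size()) != suffix) return std::nullopt;

    const std::string_view digits =
        name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
    unsigned long value = 0;
    const char* last = digits.data() + digits.size();
    auto [ptr, ec] = std::from_chars(digits.data(), last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // not purely digits
    return value;
}

// Scans `dir` once and returns the largest sequence number found, or 0 if none.
unsigned long maxSequence(const fs::path& dir, std::string_view prefix) {
    unsigned long best = 0;
    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        const std::string name = it->path().filename().string();
        if (auto n = parseSequence(name, prefix); n && *n > best) best = *n;
    }
    return best;
}
```

On restart the application would call something like maxSequence(logDir, "mylogfile_092809_") and continue from the returned number. This is a single pass over the directory, so it costs O(n) in the number of directory entries.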

> My applicaiton is running on windows platform but did not using MFC
> function very much.


Try posting to a relevant newsgroup in the 'microsoft.public.*' hierarchy,
where Windows platform-specific stuff is discussed.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
 
Sjouke Burry
Guest
Posts: n/a
 
      09-28-2009
rockdale wrote:
> Hi,
>
> I have an application which writes log files out. If then log file
> size is great than let's say 1M, the application will create a new log
> file with sequence number. the log file format likes
> mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
> limit.
>
> Now the problem is if my application get restarted, I need to know
> what is the largest sequence number of my log file. I am thinking in
> a loop from 1 to like 100000, check if the file exist, if it does
> not , then I get the max sequence number I need. But this method looks
> very awkward. Is there another way to do this(get the max number for a
> series of similar files)?
>
> My applicaiton is running on windows platform but did not using MFC
> function very much.
>
> Thanks in advance
> -Rockdale

Step 100 at a time to go past the last one,
then step 1 at a time through the last partial block.
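
A minimal sketch of that two-pass stepping, assuming the existing numbers form a gap-free run starting at 1. The `exists` callback is a stand-in for whatever file-existence check the application uses, and the name lastExisting is invented for this example.

```cpp
#include <cassert>
#include <functional>

// Returns the largest n >= 1 for which exists(n) is true, or 0 if exists(1) is false.
// Assumes files 1..max all exist with no gaps.
unsigned long lastExisting(const std::function<bool(unsigned long)>& exists,
                           unsigned long step = 100) {
    assert(step > 0);
    unsigned long last = 0;                       // highest number known to exist
    while (exists(last + step)) last += step;     // coarse pass: jump a block at a time
    while (exists(last + 1)) ++last;              // fine pass: walk the partial block
    return last;
}
```

With 250 files and a step of 100 this makes 3 coarse probes and 51 fine probes instead of 251 sequential ones.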
 
mzdude
Guest
Posts: n/a
 
      09-28-2009
On Sep 28, 12:50 pm, rockdale wrote:
> Hi,
>
> I have an application which writes log files out. If then log file
> size is great than let's say 1M, the application will create a new log
> file with sequence number. the log file format likes
> mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
> limit.
>
> Now the problem is if my application get restarted, I need to know
> what is the largest sequence number of my log file. I am thinking in
> a loop from 1 to like 100000, check if the file exist, if it does
> not , then I get the max sequence number I need. But this method looks
> very awkward. Is there another way to do this(get the max number for a
> series of similar files)?
>
> My applicaiton is running on windows platform but did not using MFC
> function very much.
>


Well, for starters you can create a simple text file that contains the
next number in your log sequence. Every time you increment
your log file number, update the text file.

Then it's simply a matter of opening the file and reading the number.
Which operating system (Windows, Linux, ...) or library (MFC,
Boost, ...)
you are using is irrelevant.


NextNumber.txt
1234
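
A small sketch of reading and updating such a counter file with plain iostreams. The function names are invented here, and falling back to 1 when the file is missing or unreadable is an assumption about the desired behaviour.

```cpp
#include <fstream>
#include <string>

// Reads the next sequence number; falls back to 1 if the file is missing or unreadable.
unsigned long readNextNumber(const std::string& counterPath) {
    std::ifstream in(counterPath);
    unsigned long n = 0;
    if (in >> n && n > 0) return n;
    return 1;
}

// Overwrites the counter file with `n`; returns false if the write failed.
bool writeNextNumber(const std::string& counterPath, unsigned long n) {
    std::ofstream out(counterPath, std::ios::trunc);
    out << n << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

One caveat with this scheme: if the counter file is deleted, or the program dies between creating a log file and updating the counter, the two can disagree, so a fallback check on startup may still be worthwhile.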
 
Juha Nieminen
Guest
Posts: n/a
 
      09-28-2009
rockdale wrote:
> Is there another way to do this(get the max number for a
> series of similar files)?


http://www.boost.org/doc/libs/1_40_0.../doc/index.htm
 
Marcel Müller
Guest
Posts: n/a
 
      09-28-2009
Hi,

rockdale wrote:
> I have an application which writes log files out. If then log file
> size is great than let's say 1M, the application will create a new log
> file with sequence number. the log file format likes
> mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
> limit.


Don't do that.

Use a time stamp and a naming convention that follows a canonical
sort order, e.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt. The people who have to
service your application will appreciate it greatly. Furthermore, you should
prefer UTC time stamps for logging to avoid confusion with daylight saving.
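
To illustrate, a minimal sketch that renders such a name from a UTC time stamp. The function name logFileName is made up, and std::gmtime is used for brevity even though it is not thread-safe.

```cpp
#include <ctime>
#include <string>

// Formats `when` (seconds since the epoch) as "mylogfile_yyyy-mm-dd_hh-mm-ss.txt" in UTC.
// Returns an empty string if the time cannot be converted.
std::string logFileName(std::time_t when) {
    const std::tm* utc = std::gmtime(&when);   // note: uses a shared static buffer
    if (!utc) return std::string();
    char buf[64];
    if (std::strftime(buf, sizeof buf, "mylogfile_%Y-%m-%d_%H-%M-%S.txt", utc) == 0)
        return std::string();
    return buf;
}
```

Because every field has a fixed width and runs from most to least significant, a plain string comparison of two such names orders them chronologically.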

> Now the problem is if my application get restarted, I need to know
> what is the largest sequence number of my log file.


Either always create a new log when the application is restarted, or
drop the size limit and use a time limit instead. I would
recommend the latter. If your application is under heavy load, the files
grow larger. What's bad about that?

From the service point of view it is a big advantage to have a
deterministic relation between the file name (in fact something like a
primary key) and the content. And it is even better if the canonical
file name ordering corresponds to their logical order.


> I am thinking in
> a loop from 1 to like 100000, check if the file exist, if it does
> not , then I get the max sequence number I need.


From that you can see how bad the idea is. Everyone who searches for a
certain entry has to do the same loop, whether program or human.
In fact you have absolutely no advantage over putting all logs of a day
into a single file in this case.

> But this method looks
> very awkward. Is there another way to do this(get the max number for a
> series of similar files)?


No. And since most file systems do not maintain a defined sort ordering,
there is no cheaper solution in general. You could scan the entire
directory content, but that is of the same order of complexity.

> My applicaiton is running on windows platform but did not using MFC
> function very much.


That makes no difference here.

Using rotating logs with a fixed time slice is straightforward to
implement, even in the case of application restarts. You could use a
simple and fast hash function on the time stamp that controls log file
switches. Every time the hash changes, a virtual method that switches the
log could be invoked. Only this method implements the full rendering of
the file name scheme.
This makes it easy to implement different cycle times with good
performance, e.g. once per week, once per day, or once per hour.
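
One way this could look, as a sketch only: the "hash" here is simply the time stamp divided by the slice length, and the class and method names (RotatingLog, touch, switchLog) are invented. It assumes std::time_t is an integer count of seconds, which holds on the common platforms but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <ctime>

// Maps a time stamp to the index of the slice it falls in.
// Two time stamps belong to the same log file exactly when their indices match.
long long sliceIndex(std::time_t when, long long sliceSeconds) {
    assert(sliceSeconds > 0);
    return static_cast<long long>(when) / sliceSeconds;
}

class RotatingLog {
public:
    explicit RotatingLog(long long sliceSeconds) : sliceSeconds_(sliceSeconds) {}
    virtual ~RotatingLog() = default;

    // Call before each write; the expensive switch happens only when the slice changes.
    void touch(std::time_t now) {
        const long long idx = sliceIndex(now, sliceSeconds_);
        if (idx != current_) {
            current_ = idx;
            switchLog(static_cast<std::time_t>(idx * sliceSeconds_));
        }
    }

protected:
    // The only place that renders the file name and reopens the file.
    virtual void switchLog(std::time_t sliceStart) = 0;

private:
    long long sliceSeconds_;
    long long current_ = -1;   // no slice opened yet
};
```

A subclass would pass 3600 for hourly, 86400 for daily, or 604800 for weekly slices and open the file named after sliceStart in switchLog. The per-write cost is one division and one comparison.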

And if you are even smarter, you could add functionality that cleans up
old logs automatically once they exceed a configured age. This prevents
the common problem of full volumes.
Again, a fixed relation between the file name and the content is helpful.
All you have to do is calculate the file name that corresponds to now
minus a configured period and delete all files in the folder whose names
compare less than this name and that match the pattern of your log files,
e.g. mylogfile_*.txt. You neither have to touch their content nor
parse their names.
Unfortunately this will always be O(n), so it should not be invoked too
often (e.g. once a day).
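
The selection step might be sketched like this. The helper names isLogName and expiredLogs are hypothetical; the list of names is assumed to come from a directory scan, and cutoffName is the name rendered for "now minus the retention period".

```cpp
#include <string>
#include <vector>

// True if `name` matches the pattern "mylogfile_*.txt".
bool isLogName(const std::string& name) {
    const std::string prefix = "mylogfile_";
    const std::string suffix = ".txt";
    return name.size() >= prefix.size() + suffix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the log file names that sort strictly before `cutoffName`.
// No file is opened and no date is parsed; only strings are compared.
std::vector<std::string> expiredLogs(const std::vector<std::string>& names,
                                     const std::string& cutoffName) {
    std::vector<std::string> expired;
    for (const std::string& name : names)
        if (isLogName(name) && name < cutoffName) expired.push_back(name);
    return expired;
}
```

The caller would then delete each returned file, for example with std::filesystem::remove in C++17. This only works because the yyyy-mm-dd_hh-mm-ss layout sorts chronologically as text.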


Marcel
 
Suraj
Guest
Posts: n/a
 
      09-28-2009
On Sep 28, 11:27 pm, mzdude wrote:
> On Sep 28, 12:50 pm, rockdale wrote:
>
>
>
> > Hi,

>
> > I have an application which writes log files out. If then log file
> > size is great than let's say 1M, the application will create a new log
> > file with sequence number. the log file format likes
> > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
> > limit.

>
> > Now the problem is if my application get restarted, I need to know
> > what is the largest sequence number of my log file. I am thinking in
> > a loop from 1 to like 100000, check if the file exist, if it does
> > not , then I get the max sequence number I need. But this method looks
> > very awkward. Is there another way to do this(get the max number for a
> > series of similar files)?

>
> > My applicaiton is running on windows platform but did not using MFC
> > function very much.

>
> Well for starters you can create simple text file to contain the
> next numeric number in your log sequence. Every time you increment
> your log file number, update the text file.
>
> Then it's simply a matter of opening and reading the number. The
> which Operating System (windows, linux, ..) or library (mfc,
> boost, ...)
> you are using is irrelevant.
>
> NextNumber.txt
> 1234


It may be "for starters", but we have been using a similar
technique for years in the product I work on. Maintaining a file
that contains the current sequence number is what we do.

The log files are named LogFile_SeqNo.txt (LogFile_1.txt and so
on), and we maintain a file called CurrentSeqNo.txt that contains the
current sequence number.
The log is written to the file with the current sequence number.

If the application restarts, or even Windows for that matter, the
application tries to write to the file with the current sequence number.
If the file exceeds a particular size, a new file is created with a
new sequence number, and the new sequence number is written to
CurrentSeqNo.txt.

Best Regards,
Suraj
 
robertwessel2@yahoo.com
Guest
Posts: n/a
 
      09-28-2009
On Sep 28, 3:18 pm, Marcel Müller wrote:
> Hi,
>
> rockdale wrote:
> > I have an application which writes log files out. If then log file
> > size is great than let's say 1M, the application will create a new log
> > file with sequence number. the log file format likes
> > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
> > limit.

>
> don't do that.
>
> Use a time stamp and use a naming convention that follows a canonical
> sort order. E.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt. The guys that must
> service your application will appreciate greatly. Furthermore you should
> prefer UTC time stamps for logging to avoid confusion with daylight saving.



Depending on what the log file is logging, a useful alternative is to
generate log file names with the application's startup time, *plus* a
unique identifier (like a sequence number). Especially if your
application handles something along the lines of sessions, which may
show up logged in other places, a name like
"yyyymmdd-hhmmss-TypeOfLog-nnn.log" may make associating the various
bits back together easier.
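
For example, a sketch of rendering such a name. The function name sessionLogName is invented, and formatting the startup time in UTC is an assumption made here rather than something the scheme requires.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Builds "yyyymmdd-hhmmss-TypeOfLog-nnn.log" from the startup time (rendered in UTC),
// a log type label, and a sequence number zero-padded to three digits.
std::string sessionLogName(std::time_t startup, const std::string& typeOfLog,
                           unsigned seq) {
    char stamp[32] = "";
    if (const std::tm* utc = std::gmtime(&startup))
        std::strftime(stamp, sizeof stamp, "%Y%m%d-%H%M%S", utc);
    char number[16];
    std::snprintf(number, sizeof number, "%03u", seq);
    return std::string(stamp) + "-" + typeOfLog + "-" + number + ".log";
}
```

Since the startup time is fixed for the lifetime of the process, the sequence number restarts at 1 on every launch, and nothing has to be looked up on disk after a restart.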
 
Marcel Müller
Guest
Posts: n/a
 
      09-28-2009
robertwessel2@yahoo.com wrote:
> Depending on what the log file is logging, a useful alternative is to
> generate log file names with the application's startup time, *plus* a
> unique identifier (lie a sequence number). Especially if your
> applications handles something along the lines of sessions, which may
> show up logged in other places, then a name like "yyyymmdd-hhmmss-
> TypeOfLog-nnn.log" may make associating the various bits back together
> easier.


Usually I use dedicated columns in the log for session identification.
This preserves the strict event sequence, so potential concurrency
issues can be tracked even if the time stamps are not accurate enough. A
viewer could filter on those columns; in the simplest case, grep will do.
Merging different logs is more complicated.
However, a set of different files can be useful too. E.g. samba uses
this kind of session-specific log file.


Marcel
 
James Kanze
Guest
Posts: n/a
 
      09-29-2009
On Sep 28, 9:18 pm, Marcel Müller wrote:
> rockdale wrote:
> > I have an application which writes log files out. If then
> > log file size is great than let's say 1M, the application
> > will create a new log file with sequence number. the log
> > file format likes mylogfile_mmddyy_1.txt,
> > mylogfile_mmddyy_2.txt. ....without upper limit.


> don't do that.


> Use a time stamp and use a naming convention that follows a
> canonical sort order. E.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt.
> The guys that must service your application will appreciate
> greatly. Furthermore you should prefer UTC time stamps for
> logging to avoid confusion with daylight saving.


That sounds like a good idea. I'm used to putting the date in
the logfile name, and using a sequential number (with a fixed
number of digits, so a straight sort will put them in order),
but using the time does sound better.

> > Now the problem is if my application get restarted, I need
> > to know what is the largest sequence number of my log file.


> Either create always a new log if the application gets
> restarted or forbear from the size limit and use a time limit
> instead. I would recommend the latter. If your application is
> under heavy load the files grow larger. What's bad with that?


Files that are too large are hard to read and to manipulate.
Depending on the application, a time limit might either result
in an occasional file which is awkwardly large, or a lot of very
small files.

That doesn't mean that you should forego using time completely.
If there are particular moments when the application is largely
quiescent, those are good times to rotate the log; it reduces
the probability of a sequence which interests someone spanning
two different files. (Ideally, of course, the files should be
small enough so that the reader can easily concatenate two of
them, in cases where what interests him spans a rotation.)

> From the service point of view it is a big advantage to have a
> deterministic relation between the file name (in fact
> something like a primary key) and the content. And it is even
> better if the canonical file name ordering corresponds to
> their logical order.


> > I am thinking in a loop from 1 to like 100000, check if the
> > file exist, if it does not , then I get the max sequence
> > number I need.


> From that you see how bad the idea is. Everyone who searches
> for a certain entry has to do the same loop, regardless if
> program or human. In fact you have absolutely no advantage
> over putting all logs of a day into a single file in this
> case.


The readers can do a binary search. For that matter, so could
the program. (But again depending on the application, there may
be so few files that it isn't worth it.)
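
A sketch of how the program could do it. Since there is no known upper limit, this version first doubles a probe until it finds a missing number and then bisects, which is an addition on top of the plain binary search mentioned above. The `exists` callback and the function name are placeholders, and the numbering is assumed to be gap-free from 1.

```cpp
#include <functional>

// Returns the largest n >= 1 for which exists(n) is true, or 0 if exists(1) is false.
// Assumes files 1..max all exist with no gaps.
unsigned long lastExistingBinary(const std::function<bool(unsigned long)>& exists) {
    if (!exists(1)) return 0;
    unsigned long lo = 1;   // known to exist
    unsigned long hi = 2;   // candidate upper bound
    while (exists(hi)) {    // doubling phase: find some number that is missing
        lo = hi;
        hi *= 2;
    }
    // Invariant from here on: exists(lo) is true and exists(hi) is false.
    while (hi - lo > 1) {
        const unsigned long mid = lo + (hi - lo) / 2;
        if (exists(mid)) lo = mid; else hi = mid;
    }
    return lo;
}
```

This needs on the order of 2*log2(max) existence checks, so roughly 34 probes for 100000 files.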

> > But this method looks very awkward. Is there another way to
> > do this(get the max number for a series of similar files)?


> No. And since most file systems do not maintain a defined sort
> ordering, there is no cheaper solution in general. You could
> scan the entire directory content, but this is in the same
> order.


> > My applicaiton is running on windows platform but did not
> > using MFC function very much.


> That makes no difference here.


> Using rotating logs with a fixed time slice is straight
> forward to implement, although in case of application
> restarts. You could use a simple and fast hash function on the
> time stamp, that controls log file switches.


You don't even need that. On program start-up, it's easy to
calculate the last rotation time from current time; just open
that file for append. There is some argument, however, for
always opening a new log file on program start-up.
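
As a sketch of that start-up calculation, assuming std::time_t counts whole seconds and rotations are aligned to multiples of the period since the epoch (the function name lastRotation is made up):

```cpp
#include <cassert>
#include <ctime>

// Rounds `now` down to the most recent rotation boundary.
std::time_t lastRotation(std::time_t now, long long periodSeconds) {
    assert(periodSeconds > 0);
    const long long t = static_cast<long long>(now);
    assert(t >= 0);
    return static_cast<std::time_t>(t - t % periodSeconds);
}
```

The program would render the file name for lastRotation(std::time(nullptr), period) and open it with std::ios::app, so a restart in the middle of a slice continues the same file.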

> Every time the hash changes a virtual method that switches the
> log could be invoked. Only his method implements the full
> rendering of the file name scheme.
> This makes it very easy and with good performance to implement
> different cycle times, e.g once per week, once per day and
> once per hour.


> And if you are even smarter you could add a functionality that
> cleans up old log automatically once they exceed a configured
> age. This prevents from the common issue of full volumes.


This is usually done by means of a cronjob (or whatever it is
called under Windows---it surely exists), using a fairly simple
script. Typically, the log files will go through a stage where
they are compressed, before being completely deleted. (E.g.
compress anything older than a day, and delete anything older
than a week.)

--
James Kanze
 