[ANN] KirbyBase 2.2

 
 
Jamey Cribbs
      05-03-2005
I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
database management system that stores its data in plain-text files.

You can download the new version here:

Windows: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.zip
Linux/Unix: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.tar.gz

You can find out more about KirbyBase at:

http://www.netpromi.com/kirbybase_ruby.html

I would like to thank Hugh Sasse for his bug fixes and code enhancements,
and Emiel van de Larr for his bug fixes.


List of changes:

* By far the biggest change in this version is that I have completely
redesigned the internal structure of the database code. Because the
KirbyBase and KBTable classes were too tightly coupled, I have created
a KBEngine class and moved all low-level I/O logic and locking logic
to this class. This allowed me to restructure the KirbyBase class to
remove all of the methods that should have been private, but couldn't be
because of the coupling to KBTable. In addition, it has allowed me to
take all of the low-level code that should not have been in the KBTable
class and put it where it belongs, as part of the underlying engine. I
feel that the design of KirbyBase is much cleaner now. No changes were
made to the class interfaces, so you should not have to change any of
your code.

* Changed str_to_date and str_to_datetime to use the Date.parse method.

* Changed #pack method so that it no longer reads the whole file into
memory while packing it.

* Changed code so that special character sequences like &linefeed; can be
part of input data without KirbyBase interpreting them as special
characters.
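The usual way to make a change like the &linefeed; fix work is to escape the escape character itself. The Ruby below is a hypothetical sketch of that technique, not KirbyBase's actual code. It assumes one record per line with "|" between fields. The names `encode_field`, `decode_field`, and the `&pipe;` token are made up for illustration.

```ruby
# Hypothetical field encoding, NOT KirbyBase's actual implementation.
# Assumes records are stored one per line with "|" between fields, so raw
# newlines and pipes inside a value must be replaced by entity-like tokens.
ENCODE = { "&" => "&amp;", "\n" => "&linefeed;", "|" => "&pipe;" }.freeze
DECODE = ENCODE.invert.freeze

# Replace every special character in a single pass. Because "&" is escaped
# too, user data that already contains "&linefeed;" becomes "&amp;linefeed;".
def encode_field(value)
  value.gsub(/[&\n|]/, ENCODE)
end

# Decode in a single pass as well: "&amp;linefeed;" turns back into the
# literal text "&linefeed;" and is never decoded a second time into "\n".
def decode_field(value)
  value.gsub(/&(?:amp|linefeed|pipe);/, DECODE)
end

raw = "line one\nuser typed &linefeed; literally | with a pipe"
stored = encode_field(raw)
puts stored                       # line one&linefeed;user typed &amp;linefeed; literally &pipe; with a pipe
puts decode_field(stored) == raw  # true
```

Escaping "&" is what makes the encoding unambiguous. Without it, a stored "&linefeed;" could mean either a real newline or text the user typed.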

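The #pack change corresponds to a common streaming pattern. The table file is copied one line at a time into a new file, which is then renamed over the original, so memory use does not grow with table size. Below is a hypothetical Ruby sketch. It assumes one record per line and that deleted records have been blanked out to whitespace; that is an assumption about the file format, not a description of KirbyBase's. The `pack_table` name is made up.

```ruby
require "tmpdir"

# Hypothetical sketch, NOT KirbyBase's #pack. Assumes one record per line
# and that deleted records are lines containing only whitespace.
def pack_table(path)
  tmp_path = "#{path}.packing"
  File.open(tmp_path, "w") do |out|
    # File.foreach yields one line at a time, so only the current line
    # is held in memory no matter how large the table is.
    File.foreach(path) do |line|
      out.write(line) unless line.strip.empty?
    end
  end
  # Swap the packed copy in only after it has been written completely.
  File.rename(tmp_path, path)
end

Dir.mktmpdir do |dir|
  table = File.join(dir, "messages.tbl")
  File.write(table, "1|first\n          \n3|third\n")
  pack_table(table)
  print File.read(table)   # prints "1|first" and "3|third" on two lines
end
```

A real implementation would presumably also hold the table's lock for the whole copy-and-rename.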
Enjoy!

Jamey Cribbs



 
Oliver Cromm
      05-17-2005
* Jamey Cribbs wrote:

> I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
> database management system that stores its data in plain-text files.


The idea of plain-text files appealed to me a lot (I had been pondering
something similar myself, but couldn't have implemented it in such a
general fashion), so I decided to try it in my Usenet news statistics
script, through which I'm learning lots of Ruby techniques.

So for a start, I plugged KirbyBase in just as a cache - where before, I
was reading header data from a news server each time, in the new version
I save the raw data to a KirbyBase, add only recent messages, then read
the part of the data I want (by date) from the KirbyBase.

Unfortunately, it turned out to be no faster. I wonder if I'm doing
anything wrong. What I save in time waiting for the server, KirbyBase
seems to eat up in processing time (disk access is negligible with my
6000 rows and 10KB of data). Is it true that you need a lot of
processing power to use it, and my PIII-500 (Win-2K/Cygwin) is just not
up to the task?

You said:
| Right now, it performs pretty well on small databases

and even

| It is fairly fast, comparing favorably to SQLite

Well, one reason to try it was that I had installation problems with
SQLite, so I can't compare directly, but now I wonder how it could ever
compete. One select for string equality on my 6000 rows takes half a
second or so, so I gave up on that completely.
--
Oliver C.
45n31, 73w34
Temperature: 6.9C (13 May 2005 10:00 AM EDT)
 
Jamey Cribbs
      05-18-2005
Oliver Cromm wrote:

>So for a start, I plugged KirbyBase in just as a cache - where before, I
>was reading header data from a news server each time, in the new version
>I save the raw data to a KirbyBase, add only recent messages, then read
>the part of the data I want (by date) from the KirbyBase.

This might be the source of the slowness. Is the field you are
selecting on defined as a Date field in the KirbyBase table? If it
is, that is probably the problem. As I note in the manual, Ruby's
Date/DateTime libraries are S-L-O-W! They really need to be rewritten as
C libraries. Every time KirbyBase does a select on a Date field, it has
to read each record from the table's physical file and do a Date.new
on the data. Like I said, this is slow!

Here is an alternative to try: define this field in the table as a
String field instead of a Date field. Selects will still work pretty
much the same way because, for example:

"2005-05-25" > "2005-05-24"

and

Date.new(2005, 5, 25) > Date.new(2005, 5, 24)

are both true. In other words, strings formatted the way Dates print
(YYYY-MM-DD) compare the same way.
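Here is a quick Ruby check of that comparison. It only holds for strings in a fixed-width, most-significant-first format such as YYYY-MM-DD. An RFC 2822 header like "Wed, 18 May 2005 ..." or a date with an unpadded month will not sort chronologically.

```ruby
require "date"

# Zero-padded YYYY-MM-DD strings sort in the same order as the dates they name.
puts "2005-05-25" > "2005-05-24"                    # true
puts Date.new(2005, 5, 25) > Date.new(2005, 5, 24)  # true

# Without zero padding the orders disagree: "1" < "9" at the sixth character,
# so October 1 sorts before September 30 as a string.
puts "2005-10-01" > "2005-9-30"                     # false
puts Date.new(2005, 10, 1) > Date.new(2005, 9, 30)  # true

# Date#to_s always produces the padded ISO 8601 form, so it is a safe way
# to build the string that gets stored in a String field.
puts Date.new(2005, 9, 30).to_s                     # 2005-09-30
```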

Give this a try and see whether you get a speed improvement. I have tried
it and have seen dramatic improvements.
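The Ruby sketch below approximates this comparison without KirbyBase itself. It filters an in-memory array of 6,000 hypothetical rows twice: once by converting each stored string with Date.parse, standing in for a Date-typed field, and once by plain string comparison. Timings depend on the machine and Ruby version. Ruby's date library has since gained a C implementation, so on a current interpreter the gap is likely much smaller than it was in 2005. Both filters select the same rows.

```ruby
require "date"
require "benchmark"

# 6,000 hypothetical rows whose date is stored as a "YYYY-MM-DD" string.
start = Date.new(2005, 1, 1)
ROWS = (0...6000).map { |i| { id: i, posted: (start + (i % 365)).to_s } }

# Stand-in for a Date-typed field: every row is turned into a Date object
# before it can be compared.
def select_by_date(rows, cutoff)
  rows.select { |row| Date.parse(row[:posted]) >= cutoff }
end

# Stand-in for a String-typed field: plain string comparison, no parsing.
def select_by_string(rows, cutoff)
  rows.select { |row| row[:posted] >= cutoff }
end

cutoff = Date.new(2005, 5, 1)

Benchmark.bm(7) do |x|
  x.report("Date:")   { select_by_date(ROWS, cutoff) }
  x.report("String:") { select_by_string(ROWS, cutoff.to_s) }
end

puts select_by_date(ROWS, cutoff) == select_by_string(ROWS, cutoff.to_s)  # true
```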

Let me know how it goes.

Jamey


 
gabriele renzi
      05-18-2005
Jamey Cribbs wrote:

> Here is an alternative to try: define this field in the table as a
> String field instead of a Date field. Selects will still work pretty
> much the same way because, for example:
>
> "2005-05-25" > "2005-05-24"
>
> and
>
> Date.new(2005, 5, 25) > Date.new(2005, 5, 24)
>
> are both true. In other words, strings formatted the way Dates print
> (YYYY-MM-DD) compare the same way.
>
> Give this a try and see whether you get a speed improvement. I have tried
> it and have seen dramatic improvements.
>
> Let me know how it goes.


Why not use a Time object?
 
Oliver Cromm
      05-18-2005
Jamey Cribbs wrote:

> Oliver Cromm wrote:
>
>>So for a start, I plugged KirbyBase in just as a cache - where before, I
>>was reading header data from a news server each time, in the new version
>>I save the raw data to a KirbyBase, add only recent messages, then read
>>the part of the data I want (by date) from the KirbyBase.
>>

> This might be the source of the slowness. Is the field you are
> selecting on defined as a Date field in the KirbyBase table? [...]
>
> Here is an alternative to try: define this field in the table as a
> String field instead of a Date field. Selects will still work pretty
> much the same way because, for example:
>
> "2005-05-25" > "2005-05-24"


I left the Date field as a string in the format in which I originally
receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
message, I use ParseDate. This is overhead for sure, but the point is
that it is the same thing I do in the non-caching version (receive a
specified number of Dates and decide which are within my limits).

But I'll go ahead and try a version where I parse at read-in time and
store the result, which would be a number (or two numbers, as I'd want
to keep the time zone separate).
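One way to do the "parse at read-in time" step in Ruby is with Time.rfc2822 and Time.zone_offset from the standard `time` library. The sketch below is hypothetical: the `header_to_ints` name and the two-integer layout are illustrative only. It assumes the zone is the last token of the header, e.g. "+0900". A header ending in a comment such as "(JST)" would need extra handling.

```ruby
require "time"

# Hypothetical helper: turn an RFC 2822 Date header into two integers,
# seconds since the Unix epoch (an absolute instant) and the sender's
# UTC offset in seconds. Assumes the zone is the header's last token;
# Time.zone_offset returns nil for zones it does not recognise.
def header_to_ints(header)
  epoch  = Time.rfc2822(header).to_i
  offset = Time.zone_offset(header.split.last)
  [epoch, offset]
end

epoch, offset = header_to_ints("Wed, 18 May 2005 10:29:44 +0900")
puts offset                # 32400
puts Time.at(epoch).utc    # 2005-05-18 01:29:44 UTC
```

The epoch value sorts chronologically no matter which time zone the sender used, so a select on it reduces to integer comparison.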
--
WinErr 008: Erroneous error. Nothing is wrong.
 
Jamey Cribbs
      05-18-2005
gabriele renzi wrote:

> Why not use a Time object?
>

I chose to have Date/DateTime be field types in KirbyBase, rather than
Time, because Time can only store dates back to 1970.

Jamey


 
Jamey Cribbs
      05-24-2005
Christian Neukirchen wrote:

>Jamey Cribbs writes:
>
>>gabriele renzi wrote:
>>
>>>Why not use a Time object?
>
>>I chose to have Date/DateTime be field types in KirbyBase, rather than
>>Time, because Time can only store dates back to 1970.
>
>ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]
>
>irb(main):006:0> Time.at -1600000000
>=> Sun Apr 20 12:33:20 CET 1919


When I tried this on my Windows XP machine, I got the following error:

irb(main):001:0> Time.at -1600000000
ArgumentError: time must be positive
from (irb):1:in `at'
from (irb):1
irb(main):002:0>


So Ruby on XP does not let you use negative Times. That's why I had to
use Date/DateTime.
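The Date class sidesteps the platform question because it does its own calendar arithmetic, so a pre-1970 date behaves the same on every OS. The minimal Ruby illustration below uses the date from the Darwin example. It also shows Date#jd and Date.jd (the Julian Day Number) as one possible way to keep such a date as a plain integer. That storage idea is an assumption for illustration, not something KirbyBase does.

```ruby
require "date"

# The date printed by Time.at(-1600000000) in the Darwin example above.
d = Date.new(1919, 4, 20)
puts d          # 1919-04-20
puts d.wday     # 0 (Sunday)

# A Julian Day Number is a plain Integer that preserves ordering and
# converts back to the same Date without loss.
jd = d.jd
puts Date.jd(jd) == d                 # true
puts jd < Date.new(1970, 1, 1).jd     # true
```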

Jamey


Confidentiality Notice: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and/or privileged information. If you are not the intended recipient(s), you are hereby notified that any dissemination, unauthorized review, use, disclosure or distribution of this email and any materials contained in any attachments is prohibited. If you receive this message in error, or are not the intended recipient(s), please immediately notify the sender by email and destroy all copies of the original message, including attachments.


 
Oliver Cromm
      05-25-2005
* Oliver Cromm wrote:

> Jamey Cribbs wrote:
>
>> Oliver Cromm wrote:
>>
>>>So for a start, I plugged KirbyBase in just as a cache - where before, I
>>>was reading header data from a news server each time, in the new version
>>>I save the raw data to a KirbyBase, add only recent messages, then read
>>>the part of the data I want (by date) from the KirbyBase.
>>>

>> This might be the source of the slowness. Is the field you are
>> selecting on defined as a Date field in the KirbyBase table? [...]
>>
>> Here is an alternative to try: define this field in the table as a
>> String field instead of a Date field. Selects will still work pretty
>> much the same way because, for example:
>>
>> "2005-05-25" > "2005-05-24"

>
> I left the Date field as a string in the format in which I originally
> receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
> message, I use ParseDate. This is overhead for sure, but the point is
> that it is the same thing I do in the non-caching version (receive a
> specified number of Dates and decide which are within my limits).
>
> But I'll go ahead and try a version where I parse at read-in time and
> store the result, which would be a number (or two numbers, as I'd want
> to keep the time zone separate).


I have now found some time for further experiments, and stored the time
as an integer. And yes, it is significantly faster this way, even slightly
faster than my first attempt to do the same with SQLite.

Timings from some tests with similar but not identical tasks, so take
them with spoons of salt:
- reading data fresh from News server: 50s
- reading from KirbyBase with original format (rfc2822) Date field: 45s
- reading from KirbyBase with Date as Integer: 12s
- reading from SQLite with Date as Integer: 16s

I have to do quite a number of calculations on that field; for every
record selected (and in my simple experiments, that is nearly all of
them), I need to extract at least the day of the week and the day
number. But apparently, that doesn't take nearly as much time as a
KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
what is going on with the select, but I know how to circumvent the
problem.
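If the stored values are the two integers mentioned earlier (seconds since the epoch plus the UTC offset), the day of the week and a day number can be computed without any string parsing. Below is a hypothetical Ruby sketch. The `local_fields` name is made up, and "day number" is taken here to mean day of the year, with day of the month included as well.

```ruby
# Hypothetical helper: given seconds since the epoch and the sender's UTC
# offset in seconds, return calendar fields in the sender's local time.
# Shifting the instant by the offset and reading it as UTC yields the
# local wall-clock fields without touching the machine's own time zone.
def local_fields(epoch, offset)
  t = Time.at(epoch + offset).utc
  { wday: t.wday, yday: t.yday, mday: t.mday }
end

# "Wed, 18 May 2005 10:29:44 +0900" stored as two integers.
epoch  = Time.utc(2005, 5, 18, 1, 29, 44).to_i
offset = 9 * 3600

fields = local_fields(epoch, offset)
puts fields[:wday]   # 3 (Wednesday)
puts fields[:yday]   # 138
```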
--
Oliver C.
45n31, 73w34
Temperature: 14.9C (25 May 2005 11:00 AM EDT)
 
Jamey Cribbs
      05-26-2005
Oliver Cromm wrote:

>I have now found some time for further experiments, and stored the time
>as an integer. And yes, it is significantly faster this way, even slightly
>faster than my first attempt to do the same with SQLite.
>
>Timings from some tests with similar but not identical tasks, so take
>them with spoons of salt:
>- reading data fresh from News server: 50s
>- reading from KirbyBase with original format (rfc2822) Date field: 45s
>- reading from KirbyBase with Date as Integer: 12s
>- reading from SQLite with Date as Integer: 16s
>
>I have to do quite a number of calculations on that field; for every
>record selected (and in my simple experiments, that is nearly all of
>them), I need to extract at least the day of the week and the day
>number. But apparently, that doesn't take nearly as much time as a
>KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
>what is going on with the select, but I know how to circumvent the
>problem.

If I remember correctly, when I first ported KirbyBase from Python to
Ruby I noticed a significant speed difference when using Date/DateTime
fields, and my guess was that nothing in #select itself is causing the
slowness. It is just that, in Ruby, creating a new Date/DateTime object
is relatively slow compared to Python. My further guess as to why is
that, in Python, the datetime library is written in C, while in Ruby,
the Date/DateTime library is written in Ruby. How's that for exhaustive
scientific analysis?

I could be totally wrong about this, but I am guessing that if the
Date/DateTime library were rewritten in C, it would be significantly
faster and you would likewise notice a marked speed improvement when
using Date/DateTime fields in KirbyBase. Unfortunately, since I am not
a C programmer, I can't actually do this to test my theory. Hence, my
workaround is usually to define any date fields I need as String
fields. It speeds things up and, for comparison purposes, things work
pretty much the same way.

Jamey



 