Velocity Reviews - Computer Hardware Reviews



Efficiently Parsing Data

 
 
Jasper
 
      12-14-2007
Hi,

I have multiple data files which need parsing in realtime so high
performance is *crucial*.

I don't have a format definition, but from what I can see there is a
hierarchy of data.
Each data field is named thus <"name":> (the <> are mine).
The data can be quoted text, unquoted text, or a composite hierarchical field.
Each name/data pair is terminated by a comma unless it is the last in the
group.

A comma can also appear within a quoted text data field.

The hierarchical tokens are open and close braces <{}> and open and
close square brackets <[]>.

That's all there is to it.
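The format described above (quoted names, quoted or unquoted values, commas, colons and the four bracket characters) can be split into tokens in one left-to-right pass, which is usually the cheapest first step for a hand-written parser. A minimal C++17 sketch, assuming the input contains no whitespace and that quoted text never contains an embedded double quote; `TokKind`, `Token` and `tokenize` are hypothetical names. Tokens are `std::string_view`s pointing into the input buffer, so no text is copied.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Kinds of token in the format described above (hypothetical names).
enum class TokKind { LBrace, RBrace, LBracket, RBracket, Colon, Comma, Quoted, Bare };

struct Token {
    TokKind kind;
    std::string_view text;  // points into the input buffer; nothing is copied
};

// Single pass over the input. A comma inside quotes never splits a field,
// because a quoted field is consumed up to its closing quote in one step.
// Assumes no whitespace and no embedded double quotes in quoted text.
std::vector<Token> tokenize(std::string_view in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        switch (in[i]) {
            case '{': out.push_back({TokKind::LBrace, in.substr(i, 1)}); ++i; break;
            case '}': out.push_back({TokKind::RBrace, in.substr(i, 1)}); ++i; break;
            case '[': out.push_back({TokKind::LBracket, in.substr(i, 1)}); ++i; break;
            case ']': out.push_back({TokKind::RBracket, in.substr(i, 1)}); ++i; break;
            case ':': out.push_back({TokKind::Colon, in.substr(i, 1)}); ++i; break;
            case ',': out.push_back({TokKind::Comma, in.substr(i, 1)}); ++i; break;
            case '"': {
                std::size_t end = in.find('"', i + 1);
                if (end == std::string_view::npos) end = in.size();  // unterminated: take the rest
                out.push_back({TokKind::Quoted, in.substr(i + 1, end - i - 1)});
                i = end + 1;
                break;
            }
            default: {
                // Unquoted value such as null, A or 68: runs up to the next delimiter.
                std::size_t end = in.find_first_of("{}[]:,\"", i);
                if (end == std::string_view::npos) end = in.size();
                out.push_back({TokKind::Bare, in.substr(i, end - i)});
                i = end;
            }
        }
    }
    return out;
}
```

For `{"Room":null}` this yields LBrace, Quoted(Room), Colon, Bare(null), RBrace; the hierarchy is then recovered from the bracket tokens by whatever parser consumes this stream.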

The data describes, say, a school class, so we have a rigid set of data
groups: e.g. data describing the teacher, data describing the class taken,
and a repeating group describing each kid and their grades.

So it would be nice to be able to parse this data out into appropriate
structures.

Below is a snippet of dummy data (in reality there is much more). I have
added the spacing and carriage returns for clarity; the real data has no
whitespace. There may be a variable number of parameters (I think), so it
would be useful to be able to identify and potentially store each variable
name with its data value.
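Since the sample repeats the name "Test" within one group, a `std::map` keyed on the name would silently drop entries; a vector of (name, value) pairs keeps every pair in file order and still allows lookup by name. A minimal sketch for one flat group (no nested {} or []), assuming names are quoted and quoted text contains no embedded double quotes; `Field`, `splitFlatGroup` and `valuesOf` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Field = std::pair<std::string, std::string>;  // (name, value)

// Hypothetical helper: splits one flat group such as
//   {"name":"Mr Borat","age":"35","Nationality":"Kazakhstan"}
// into (name, value) pairs in file order. Nested {} or [] are not handled.
// Quotes are stripped; unquoted values (null, 68) are kept verbatim.
std::vector<Field> splitFlatGroup(const std::string& g) {
    std::vector<Field> fields;
    std::size_t i = 0;
    // Reads one quoted or unquoted item starting at i and advances i past it.
    auto readItem = [&]() -> std::string {
        if (i >= g.size()) return std::string();
        if (g[i] == '"') {
            std::size_t end = g.find('"', i + 1);
            if (end == std::string::npos) end = g.size();
            std::string item = g.substr(i + 1, end - i - 1);
            i = end + 1;
            return item;
        }
        std::size_t end = g.find_first_of(",}", i);
        if (end == std::string::npos) end = g.size();
        std::string item = g.substr(i, end - i);
        i = end;
        return item;
    };
    if (!g.empty() && g[0] == '{') i = 1;
    while (i < g.size() && g[i] != '}') {
        std::string name = readItem();
        if (i < g.size() && g[i] == ':') ++i;
        std::string value = readItem();
        fields.emplace_back(std::move(name), std::move(value));
        if (i < g.size() && g[i] == ',') ++i;
    }
    return fields;
}

// Every value recorded under `name`, so repeated names are not lost.
std::vector<std::string> valuesOf(const std::vector<Field>& fields, const std::string& name) {
    std::vector<std::string> out;
    for (const auto& f : fields)
        if (f.first == name) out.push_back(f.second);
    return out;
}
```

For the teacher group this gives three pairs: (name, Mr Borat), (age, 35) and (Nationality, Kazakhstan). The linear scan in `valuesOf` is adequate for groups of a handful of fields.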

Anyone got any ideas/code snips/references of the best, most speedy (at run
time), way to go about it? A tight, pure c++ solution (with or without the
stl) would be needed.

Thanks in advance for any help


{

"teacher":{
"name":
"Mr Borat",
"age":
"35",
"Nationality":
"Kazakhstan"},


"Class":{
"Semester":
"Summer",
"Room":
null,
"Subject":
"Politics",
"Notes":
"We're happy, you happy?"},

"Students":
[
{
"Smith":
[{"First Name":"Mary","sex":"Female"}],
"Brown":
[{"First Name":"John","sex":"Male"}],
"Jackson":
[{"First Name":"Jackie","sex":"Female"}]
}
],


"Grades":
[
{
"Test":
[{"grade":A,"points":68},{"grade":B,"points":25},{"grade":C,"points":15}],
"Test":
[{"grade":C,"points":2},{"grade":B,"points":29},{"grade":A,"points":55}],
"Test":
[{"grade":C,"points":2},{"grade":A,"points":72},{"grade":A,"points":65}]
}
]

}











 
 
Victor Bazarov
 
      12-14-2007
Jasper wrote:
> I have multiple data files which need parsing in realtime so high
> performance is *crucial*.
>
> I dont have a format definition, but from what I can see there is a
> hierarchy of data.


You better come up with a definition, otherwise you're programming
without a spec. Even if you are reverse-engineering, you need to
begin by writing a specification. A good spec gets you half way
to the solution.

Once you have the definition, you can write a flex/yacc grammar for
it, and then you generate the code that handles that file. Simple
as that.

> [..]
> Anyone got any ideas/code snips/references of the best, most speedy
> (at run time), way to go about it? A tight, pure c++ solution (with
> or without the stl) would be needed.


I can only say, good luck with your homework!

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


 
 
Jasper
 
      12-14-2007

> You better come up with a definition, otherwise you're programming
> without a spec. Even if you are reverse-engineering, you need to
> begin by writing a specification. A good spec gets you half way
> to the solution.


The only thing that defines the data, as far as I can see, is its
hierarchical structure as given by the braces and square brackets.
The token names and data values are irrelevant to this.

> Once you have the definition, you can write a flex/yacc grammar for
> it, and then you generate the code that handles that file. Simple
> as that.


I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar with
it, but it seems a bit of overkill for what I want.
I wondered if I could adapt a lightweight XML parser, but all I need is a
sort of DOM based on the "{}" and "[]"
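A "DOM" for this format needs only one recursive node type: a {} group holds named members, a [] group holds an ordered list, and everything else is a scalar. A minimal C++17 sketch (a `std::vector` of the still-incomplete `Node` type is allowed from C++17 on); `Node`, `get` and `at` are hypothetical names. Member names are kept in a vector parallel to `children` rather than in a `std::map`, so file order and repeated names survive; `get` returns only the first match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One node of a minimal document tree for the {} / [] format (hypothetical type).
struct Node {
    enum class Kind { Object, Array, Scalar };
    Kind kind = Kind::Scalar;
    std::string text;               // Scalar: value text, quotes stripped
    std::vector<std::string> keys;  // Object: member names, parallel to children
    std::vector<Node> children;     // Object members or Array elements

    // First member of an Object with the given name, or nullptr if absent.
    const Node* get(const std::string& name) const {
        for (std::size_t i = 0; i < keys.size() && i < children.size(); ++i)
            if (keys[i] == name) return &children[i];
        return nullptr;
    }

    // Element i of an Array (or member i of an Object), or nullptr if out of range.
    const Node* at(std::size_t i) const {
        return i < children.size() ? &children[i] : nullptr;
    }
};
```

With a tree of this shape built from the sample, `root.get("teacher")->get("name")->text` would be "Mr Borat". No parser is shown here; the type only fixes the shape a parser has to produce.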


>> [..]
>> Anyone got any ideas/code snips/references of the best, most speedy
>> (at run time), way to go about it? A tight, pure c++ solution (with
>> or without the stl) would be needed.

>
> I can only say, good luck with your homework!
>


Homework? What's that supposed to mean?



 
 
anon
 
      12-14-2007
Jasper wrote:
>> Once you have the definition, you can write a flex/yacc grammar for
>> it, and then you generate the code that handles that file. Simple
>> as that.

>
> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar with
> it, but it seems a bit of overkill for what I want.
> I wondered if I could adapt a lightweight XML parser, but all I need is a
> sort of DOM based on the "{}" and "[]"


Maybe this can help you:
http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html

>
>
>>> [..]
>>> Anyone got any ideas/code snips/references of the best, most speedy
>>> (at run time), way to go about it? A tight, pure c++ solution (with
>>> or without the stl) would be needed.

>> I can only say, good luck with your homework!
>>

>
> Homework? What's that supposed to mean?


Here is a definition:
http://en.wikipedia.org/wiki/Homework
 
 
Jasper
 
      12-14-2007

"anon" <(E-Mail Removed)> wrote in message
news:fjtgb9$ndn$(E-Mail Removed)...
> Jasper wrote:
>>> Once you have the definition, you can write a flex/yacc grammar for
>>> it, and then you generate the code that handles that file. Simple
>>> as that.

>>
>> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar
>> with it, but it seems a bit of overkill for what I want.
>> I wondered if I could adapt a lightweight XML parser, but all I need is
>> a sort of DOM based on the "{}" and "[]"

>
> Maybe this can help you:
> http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html
>


Thanks, I'll take a look. I assume you know about the tool and XML; I
don't.
Will I have to rewrite the parser to handle the brackets or are they part of
the XML spec (in some way)?
(I'm just asking for a "quickstart").


>>
>> Homework? What's that supposed to mean?

>
> Here is a definition:
> http://en.wikipedia.org/wiki/Homework



Oh, that's what it is. Thanks.


 
 
anon
 
      12-14-2007
Jasper wrote:
> "anon" <(E-Mail Removed)> wrote in message
> news:fjtgb9$ndn$(E-Mail Removed)...
>> Jasper wrote:
>>>> Once you have the definition, you can write a flex/yacc grammar for
>>>> it, and then you generate the code that handles that file. Simple
>>>> as that.
>>> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familiar
>>> with it, but it seems a bit of overkill for what I want.
>>> I wondered if I could adapt a lightweight XML parser, but all I need is
>>> a sort of DOM based on the "{}" and "[]"

>> Maybe this can help you:
>> http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html
>>

>
> Thanks, I'll take a look. I assume you know about the tool and XML; I
> don't.


There is a tutorial, explaining xml format and the library.

> Will I have to rewrite the parser to handle the brackets or are they part of
> the XML spec (in some way)?
> (I'm just asking for a "quickstart").


I don't know about brackets. Maybe
 
 
Frank Bergemann
 
      12-14-2007
Hi Jasper,

you might want to have a look at boost::serialization:

http://boost.org/libs/serialization/doc/index.html

It is a bit tricky to get to grips with at first,
but once understood, it is straightforward to use.

rgds!

Frank
 
 
Matthias Buelow
 
      12-14-2007
Jasper wrote:

> I wondered if I could adapt a lightweight XML parser, but all I need is a


Wrong turn. I know, today it's XML for any conceivable i/o situation all
hyped up so that people hardly know that software can actually exist
that doesn't use XML, but still. You said you want something fast (and
probably simple).

> sort of DOM based on the "{}" and "[]"


You can easily write a simple recursive-descent parser for that. A
recursive descent parser is just another name for the intuitive solution.
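To illustrate, here is a minimal recursive-descent parser for the data in this thread, in which one recursive function (`parseValue`) mirrors the nesting of {} and []. It is a sketch under stated assumptions, not a complete JSON parser: no whitespace, no backslash escapes inside strings, and unquoted values are taken verbatim up to the next delimiter. `Value`, `Parser` and the grammar in the comment are illustrative, not from the thread. It reads the input once, front to back, and throws `std::runtime_error` on malformed input.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Parsed node: a scalar, an object or an array. For objects keys[i] names
// children[i] (order and duplicate names kept); arrays leave keys empty.
struct Value {
    enum class Kind { Scalar, Object, Array };
    Kind kind = Kind::Scalar;
    std::string text;               // scalar text, quotes stripped
    std::vector<std::string> keys;  // object member names
    std::vector<Value> children;    // object members or array elements
};

// Recursive-descent parser for the grammar
//   value  := object | array | quoted | bare
//   object := '{' [ quoted ':' value { ',' quoted ':' value } ] '}'
//   array  := '[' [ value { ',' value } ] ']'
// Assumes no whitespace and no backslash escapes inside quoted strings.
class Parser {
public:
    explicit Parser(std::string_view in) : s_(in) {}

    Value parseDocument() {
        Value v = parseValue();
        if (pos_ != s_.size()) fail("trailing characters");
        return v;
    }

private:
    std::string_view s_;
    std::size_t pos_ = 0;

    [[noreturn]] void fail(const char* what) const {
        throw std::runtime_error(std::string(what) + " at offset " + std::to_string(pos_));
    }
    char peek() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }
    void expect(char c) {
        if (peek() != c) fail("unexpected character");
        ++pos_;
    }

    // value: dispatch on the first character.
    Value parseValue() {
        switch (peek()) {
            case '{': return parseGroup('}', Value::Kind::Object);
            case '[': return parseGroup(']', Value::Kind::Array);
            case '"': { Value v; v.text = parseQuoted(); return v; }
            default:  return parseBare();
        }
    }

    // object / array share one loop; objects also read "name": before each value.
    Value parseGroup(char close, Value::Kind kind) {
        Value v;
        v.kind = kind;
        ++pos_;  // skip '{' or '[' (already checked by parseValue)
        if (peek() != close) {
            for (;;) {
                if (kind == Value::Kind::Object) {
                    v.keys.push_back(parseQuoted());
                    expect(':');
                }
                v.children.push_back(parseValue());
                if (peek() != ',') break;
                ++pos_;  // skip ','
            }
        }
        expect(close);
        return v;
    }

    std::string parseQuoted() {
        expect('"');
        std::size_t end = s_.find('"', pos_);
        if (end == std::string_view::npos) fail("unterminated string");
        std::string out(s_.substr(pos_, end - pos_));
        pos_ = end + 1;
        return out;
    }

    // Unquoted scalar such as null, A or 68: runs up to the next delimiter.
    Value parseBare() {
        std::size_t end = s_.find_first_of(",:{}[]\"", pos_);
        if (end == std::string_view::npos) end = s_.size();
        if (end == pos_) fail("expected a value");
        Value v;
        v.text = std::string(s_.substr(pos_, end - pos_));
        pos_ = end;
        return v;
    }
};
```

For example, `Parser(text).parseDocument()` on the teacher group returns an Object whose `keys` are name, age and Nationality. Recursion depth equals the nesting depth of the data, which is shallow here.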
 
 
Jasper
 
      12-14-2007

"Matthias Buelow" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Jasper wrote:
>
>> I wondered if I could adapt a lightweight XML parser, but all I need is
>> a

>
> Wrong turn. I know, today it's XML for any conceivable i/o situation all
> hyped up so that people hardly know that software can actually exist
> that doesn't use XML, but still. You said you want something fast (and
> probably simple).


Actually, I have just discovered that the format of the data is JSON.
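Knowing the data is JSON settles one detail the earlier description left open: per the JSON specification a string may contain backslash escapes such as \" and \\, so the closing quote cannot be found simply by searching for the next double quote. A minimal sketch of the two helpers a hand-written parser would then need; `findStringEnd` and `unescape` are hypothetical names, and \uXXXX escapes are deliberately passed through undecoded.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Index of the quote that closes the JSON string opened at s[open] (which must
// be '"'), or npos if it is unterminated. A backslash always escapes the next
// character, so \" and \\ inside the string are stepped over correctly.
std::size_t findStringEnd(std::string_view s, std::size_t open) {
    for (std::size_t i = open + 1; i < s.size(); ++i) {
        if (s[i] == '\\') ++i;             // skip the escaped character
        else if (s[i] == '"') return i;
    }
    return std::string_view::npos;
}

// Decodes the single-character JSON escapes in a string body (quotes removed).
// In this sketch \uXXXX sequences are copied through unchanged.
std::string unescape(std::string_view body) {
    std::string out;
    out.reserve(body.size());
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c != '\\' || i + 1 == body.size()) { out += c; continue; }
        char e = body[++i];
        switch (e) {
            case 'n': out += '\n'; break;
            case 't': out += '\t'; break;
            case 'r': out += '\r'; break;
            case 'b': out += '\b'; break;
            case 'f': out += '\f'; break;
            case 'u': out += "\\u"; break;  // left undecoded; hex digits follow as-is
            default:  out += e;             // \" \\ and \/
        }
    }
    return out;
}
```

Decoding \uXXXX to UTF-8 (including surrogate pairs) is the remaining piece if the real data uses it.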


 
 
AnonMail2005@gmail.com
 
      12-15-2007
On Dec 14, 6:34 am, "Jasper" <(E-Mail Removed)> wrote:
> "Matthias Buelow" <(E-Mail Removed)> wrote in message
>
> news:(E-Mail Removed)...
>
> > Jasper wrote:

>
> >> I wondered if I could adapt a lightweight XML parser, but all I need is
> >> a

>
> > Wrong turn. I know, today it's XML for any conceivable i/o situation all
> > hyped up so that people hardly know that software can actually exist
> > that doesn't use XML, but still. You said you want something fast (and
> > probably simple).

>
> Actually, I have just discovered that the format of the data is JSON.


As someone already mentioned, figuring out the format gets you well on
your way to a solution. Google JSON C++ and you will see that there are
existing solutions written to parse that format. So now the next step is
to determine whether that work is sufficient for your task.
 
Similar Threads
Thread Thread Starter Forum Replies Last Post
parsing tab separated data efficiently into numpy/pylab arrays per Python 2 03-24-2009 04:15 PM
Parsing content from textboxes efficiently taa ASP .Net 0 07-24-2008 10:42 AM
Parsing large web server logfiles efficiently ashutosh.gaur@gmail.com Perl Misc 14 01-19-2006 05:08 PM
Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently Jane Austine Python 14 10-09-2004 05:54 PM
- Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently Jane Austine Python 2 10-05-2004 01:54 PM


