canonical way for handling raw data

 
 
Matthias Czapla
Guest
Posts: n/a
 
      08-24-2003
Hi!

What's the canonical way of handling raw data? I want to read a file without
making any assumptions about its structure, store portions of it in memory,
and compare ranges with constant byte sequences. _I_ would read it
into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
novice C++ programmer and think that there's some better, typically used
way.

Regards
lal
 
 
 
 
 
Gianni Mariani
Guest
Posts: n/a
 
      08-24-2003
I've seen all kinds of messes when handling raw data!

Before you go down the road of writing memcmp everywhere, ask yourself: what
do these "chunks of raw data" do?

Do you:
- concatenate them
- write to them
- convert them
- break them up into smaller chunks

... write a list of the operations you perform on them.

Sometimes you'll benefit from using a regular vector<char> and sometimes
you need something a little fancier.

I tend to write code that avoids copying data, so I usually have a
"Buffer" class with which I can create chunks of raw data and
reference chunks within those chunks, etc. The idea is that the data is
not copied.
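
To make the no-copy idea concrete, here is a minimal sketch of a non-owning view over a byte range. The class name ByteView and its members are hypothetical and not taken from the "Buffer" class mentioned above; the sketch assumes the underlying storage outlives every view that refers to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical non-owning view of a byte range; it never copies the data.
// The storage it points at must outlive the view.
class ByteView
{
public:
    ByteView(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // A view of `length` bytes starting at `offset` within this view.
    ByteView sub(std::size_t offset, std::size_t length) const
    {
        assert(offset <= size_ && length <= size_ - offset);
        return ByteView(data_ + offset, length);
    }

    // True if both views have the same length and the same bytes.
    bool equals(const ByteView& other) const
    {
        return size_ == other.size_
            && (size_ == 0 || std::memcmp(data_, other.data_, size_) == 0);
    }

private:
    const unsigned char* data_;
    std::size_t size_;
};
```

Taking a sub-view only adjusts a pointer and a length, so chunks within chunks cost nothing to create.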




 
 
 
 
 
Matthias Czapla
Guest
Posts: n/a
 
      08-24-2003
OK, I have an image file of a smart card used in a digital camera, which was
accidentally deleted/formatted. I want to search this file for occurrences
of any of several byte sequences that indicate the start of a JPEG picture,
so I'm interested in the positions of these sequences in the file.
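
One way to express this search in C++ is std::search over a vector of unsigned char. The sketch below is an illustration only: the helper name find_all and the Bytes typedef are made up here, and it assumes the data has already been read into memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Return the offset of every occurrence of `pattern` in `data`.
std::vector<std::size_t> find_all(const Bytes& data, const Bytes& pattern)
{
    // An empty pattern would match at every position; treat it as a caller error.
    assert(!pattern.empty());

    std::vector<std::size_t> offsets;
    Bytes::const_iterator pos = data.begin();
    while (true) {
        pos = std::search(pos, data.end(), pattern.begin(), pattern.end());
        if (pos == data.end())
            break;
        offsets.push_back(static_cast<std::size_t>(pos - data.begin()));
        ++pos;  // resume one byte further so overlapping matches are found too
    }
    return offsets;
}
```

A JPEG stream starts with the bytes FF D8; searching for FF D8 FF E0 (JFIF) or FF D8 FF E1 (Exif) is one plausible choice of patterns, though the sequences the original program looks for are not stated in the thread.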

I have already written a pure C program that seems to work well, but I'm
currently in the process of grokking C++ and want to reimplement the program
the C++ way.

Regards
lal
 
 
Thomas Matthews
Guest
Posts: n/a
 
      08-25-2003
Matthias Czapla wrote:
> Hi!
>
> What's the canonical way of handling raw data? I want to read a file without
> making any assumptions about its structure, store portions of it in memory,
> and compare ranges with constant byte sequences. _I_ would read it
> into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
> novice C++ programmer and think that there's some better, typically used
> way.
>
> Regards
> lal


The method for handling raw unstructured data is to read it into a
buffer, then parse the buffer.

One process that I use is to have a class for each datum type and have
each class provide "load from buffer" and "store to buffer"
methods. I then pass a pointer to the buffer and call the load
method of the class. The load method advances the buffer
pointer:
    class MyClass
    {
    public:
        void load_from_buffer(unsigned char * & buffer_pointer);
    };

    void
    MyClass ::
    load_from_buffer(unsigned char * & buffer_pointer)
    {
        // Note: the cast assumes buffer_pointer is suitably aligned
        // for the type of my_item.
        my_item = *((/* type of my_item */ *) buffer_pointer);
        buffer_pointer += sizeof (/* type of my_item */);
        // ...
        return;
    }

also:
    template <class AnyType>
    AnyType load_from_buffer(unsigned char * & buffer_pointer)
    {
        AnyType value = *((AnyType *) buffer_pointer);
        buffer_pointer += sizeof (AnyType);  // advance past the value just read
        return value;
    }
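
Dereferencing a cast pointer can misbehave when the buffer position is not suitably aligned for the target type. As an alternative technique (not the one shown in the post), the value can be copied out with std::memcpy. The function name load_value and the extra `end` parameter are assumptions made for this sketch, which also assumes T is a trivially copyable type and that the bytes are already in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy one value of type T out of the buffer and advance the position.
// `end` points one past the last valid byte and is used for a bounds check.
template <class T>
T load_value(const unsigned char*& pos, const unsigned char* end)
{
    assert(static_cast<std::size_t>(end - pos) >= sizeof(T));
    T value;
    std::memcpy(&value, pos, sizeof(T));  // safe for any alignment of pos
    pos += sizeof(T);                     // advance past the value just read
    return value;
}
```

Because the bytes are copied rather than reinterpreted, the read works at odd offsets as well.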



--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

 
 
Matthias Czapla
Guest
Posts: n/a
 
      08-25-2003

Thanks for your reply. I thought about using a separate class for I/O too.
The most important point for me in your explanation is the use of unsigned
char to hold the data. May I ask what the advantage of using unsigned over
signed char is? Do you agree with using std::ifstream::read() for
reading the data?
 
 
Thomas Matthews
Guest
Posts: n/a
 
      08-26-2003
Matthias Czapla wrote:
> Thanks for your reply. I thought about using a separate class for I/O too.
> The most important point for me in your explanation is the use of unsigned
> char to hold the data. May I ask what the advantage of using unsigned over
> signed char is? Do you agree with using std::ifstream::read() for
> reading the data?


Unsigned char allows use of all the bits, without any worries about
overflow and sign. I just want a simple 'byte', the smallest
accessible unit. The 'signed' quantities have issues when it comes
to bit manipulation (such as shifting).
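
A small illustration of the kind of problem meant here, with function names invented for this sketch: combining two bytes into a 16-bit value works as expected with unsigned char, while a signed char holding the bit pattern 0xFF is the value -1 and gets sign-extended when widened.

```cpp
// Combine two bytes into a 16-bit value, high byte first.
constexpr unsigned int combine_unsigned(unsigned char high, unsigned char low)
{
    return (static_cast<unsigned int>(high) << 8) | low;
}

// The same operation on signed char: a negative byte is sign-extended
// when converted to a wider type, which sets all the upper bits.
constexpr unsigned int combine_signed(signed char high, signed char low)
{
    return (static_cast<unsigned int>(high) << 8)
         | static_cast<unsigned int>(low);
}
```

combine_unsigned(0x12, 0xFF) yields 0x12FF, whereas combine_signed(0x12, -1) yields a value with every bit set, because -1 converts to the largest unsigned int.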

I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.

You can use ifstream::read() as long as the file is opened in
binary mode. Binary mode tells the stream library/platform
_NOT_ to perform any translations on the data (such as newline conversion).
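
A minimal sketch of reading a whole file in binary mode into a vector of unsigned char. The function name read_file is made up for this example, and it assumes the file is small enough to fit in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read the whole file into `out` as raw bytes.
// Returns false if the file cannot be opened or fully read.
bool read_file(const std::string& path, std::vector<unsigned char>& out)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    // Determine the file size by seeking to the end.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        return false;
    in.seekg(0, std::ios::beg);

    out.resize(static_cast<std::size_t>(size));
    if (size > 0) {
        // read() takes a char*, so the unsigned buffer is cast for the call.
        in.read(reinterpret_cast<char*>(&out[0]),
                static_cast<std::streamsize>(size));
        if (!in)
            return false;
        assert(in.gcount() == static_cast<std::streamsize>(size));
    }
    return true;
}
```

Without std::ios::binary, some platforms translate line endings or stop at certain control characters, which would corrupt raw data.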

There are also claims that fread() is simpler and faster.
However, since developer time and quality are more important
than speed, go with ifstream::read().

In my Binary_Stream class, I have a pure virtual function:
    virtual unsigned long size_on_stream() const = 0;
All classes that use the Binary_Stream interface must provide
the size that they occupy on the stream. This allows one to
query an object about the size of data it requires in order
to allocate a buffer for reading:
    unsigned long buffer_size = my_msg.size_on_stream();
    unsigned char * buffer = new unsigned char[buffer_size];
    // ifstream::read() takes a char *, so the unsigned buffer needs a cast
    my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
    unsigned char * buf_ptr(buffer);
    my_msg.load_from_buffer(buf_ptr);
    delete [] buffer;
One nice benefit is that objects can be written to and read
from a stream without knowing any details about the object!
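
The interface described above might look roughly like the following. Only the names Binary_Stream, size_on_stream and load_from_buffer come from the post; the Tagged_Flag record, its layout, and the read_object helper are assumptions made for illustration, and a vector replaces the manual new/delete.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Sketch of the interface: every object reports its size on the stream
// and can load itself from a buffer of that size.
class Binary_Stream
{
public:
    virtual ~Binary_Stream() {}
    virtual unsigned long size_on_stream() const = 0;
    virtual void load_from_buffer(const unsigned char*& buffer_pointer) = 0;
};

// Hypothetical record: a 2-byte tag followed by a 1-byte flag.
class Tagged_Flag : public Binary_Stream
{
public:
    Tagged_Flag() : flag(0) { tag[0] = tag[1] = 0; }

    unsigned long size_on_stream() const { return sizeof tag + sizeof flag; }

    void load_from_buffer(const unsigned char*& p)
    {
        std::memcpy(tag, p, sizeof tag);
        p += sizeof tag;
        flag = *p;
        p += sizeof flag;
    }

    unsigned char tag[2];
    unsigned char flag;
};

// Read exactly as many bytes as the object asks for, then let it parse them.
// The caller needs no knowledge of the object's layout.
bool read_object(std::istream& in, Binary_Stream& object)
{
    std::vector<unsigned char> buffer(object.size_on_stream());
    if (buffer.empty())
        return true;
    if (!in.read(reinterpret_cast<char*>(&buffer[0]),
                 static_cast<std::streamsize>(buffer.size())))
        return false;
    const unsigned char* pos = &buffer[0];
    object.load_from_buffer(pos);
    assert(pos == &buffer[0] + buffer.size());  // object consumed what it declared
    return true;
}
```

Since read_object only talks to the base class, it works for any derived type, and the vector releases the buffer even if loading throws.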


 
 
Thomas Matthews
Guest
Posts: n/a
 
      08-26-2003
Thomas Matthews wrote:

> I guess it's just my style. You can find good discussions about
> signed and unsigned integral types in this newsgroup and
> our neighbor news:comp.lang.c++.


That should be news:comp.lang.c.


 
 
Matthias Czapla
Guest
Posts: n/a
 
      08-27-2003
Thomas Matthews wrote:
> > char to hold the data. May I ask what the advantage of using
> > unsigned over signed char is? Do you agree with using
> > std::ifstream::read() for reading the data?

>
> Unsigned char allows usage of all the bits, without any worries about
> overflow and signing. I just want a simple 'byte' or smallest
> accessible unit. The 'signed' quantities have issues when it comes
> to bitmanipulation (such as shifting).


I see.

> I guess it's just my style. You can find good discussions about
> signed and unsigned integral types in this newsgroup and
> our neighbor news:comp.lang.c++.
>
> You can use ifstream::read() as long as the file is opened in
> binary mode. The binary mode tells the compiler/platform to
> _NOT_ perform any translations on the data.


I'll remember that.

> There are also claims that fread() is simpler and faster.
> However, since developer time and quality is more important
> than speed, go with ifstream::read().


And as I stated elsewhere I want to do it the "C++ way".

> In my Binary_Stream class, I have a pure virtual function:
>     virtual unsigned long size_on_stream() const = 0;
> All classes that use the Binary_Stream interface must provide
> the size that they occupy on the stream. This allows one to
> query an object about the size of data it requires in order
> to allocate a buffer for reading:
>     unsigned long buffer_size = my_msg.size_on_stream();
>     unsigned char * buffer = new unsigned char[buffer_size];
>     // ifstream::read() takes a char *, so the unsigned buffer needs a cast
>     my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
>     unsigned char * buf_ptr(buffer);
>     my_msg.load_from_buffer(buf_ptr);
>     delete [] buffer;
> One nice benefit is that objects can be written to and read
> from a stream without knowing any details about the object!


Very nice. That has given me an idea about the topic. It seems that raw data
handling in C++ isn't too different from C's, and when I think about it this
is logical, since it is very low level. Thank you for your help.

Regards
lal
 
 
 
 