Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Tree in C++

Reply
Thread Tools

Tree in C++

 
 
Ian
Guest
Posts: n/a
 
      08-13-2004
Casper wrote:
> In scanning a drive and comparing file content, I need to remember
> filepaths to every single file I encounter so I can take actions on
> certain files later on.
>
> Rather than having a huge list enumerating the complete filepath to
> every file it seems the smarter way (faster, more memmory efficient) is
> to model the filesystem treestructure in a abstract tree - and having
> only the filenames & node pointer in an Array.
>
> struct tree {
> struct tree *parent;
> struct tree *child;
> CString filename;
> };
>

What's a CString?

> struct file {
> CString name;
> struct tree *path;
> };
>
> Does this sound like a good idea? Specifically, can I utilize an
> existing data structure? What about stack/heap issues and tradeoffs?
>
> Not being awfully experienced with trees and pointers, why won't this run:
>
> struct tree *root;
> root = ( struct tree * ) malloc ( sizeof( struct tree ) );
> root->filename = "Test";
>

Why use malloc?

tree* root = new tree;

would be better.

Ian
 
Reply With Quote
 
 
 
 
David Hilsee
Guest
Posts: n/a
 
      08-13-2004
"Casper" <(E-Mail Removed)> wrote in message
news:eAVSc.14460$(E-Mail Removed).. .
> In scanning a drive and comparing file content, I need to remember
> filepaths to every single file I encounter so I can take actions on
> certain files later on.
>
> Rather than having a huge list enumerating the complete filepath to
> every file it seems the smarter way (faster, more memmory efficient) is
> to model the filesystem treestructure in a abstract tree - and having
> only the filenames & node pointer in an Array.
>
> struct tree {
> struct tree *parent;
> struct tree *child;
> CString filename;
> };
>
> struct file {
> CString name;
> struct tree *path;
> };


This doesn't make much sense to me. Perhaps you meant something like this:

struct dir_or_file {
// Note: "struct" not necessary for C++
dir_or_file *parent;
int numChildren;
// pointer to first element of array of size numChildren
dir_or_file *child;
CString name;

// maybe some flag here indicating whether or not it is a file or a
directory
};

> Does this sound like a good idea? Specifically, can I utilize an
> existing data structure? What about stack/heap issues and tradeoffs?
>
> Not being awfully experienced with trees and pointers, why won't this run:
>
> struct tree *root;
> root = ( struct tree * ) malloc ( sizeof( struct tree ) );
> root->filename = "Test";


Your mistake is that you used malloc. The object to which "root" points did
not have its constructor executed before you attempted to assign the value
"Test" to one of its members. It it important to execute constructors
before attempting to use the object, so you should have instead used "new":

root = new tree();

This will cause the (compiler-generated) constructor for the object to be
executed. Now, the object will be properly initialized and the assignment
should not cause any problems.

--
David Hilsee


 
Reply With Quote
 
 
 
 
Casper
Guest
Posts: n/a
 
      08-13-2004
In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};

struct file {
CString name;
struct tree *path;
};

Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";

---
Unhandled exception at 0x0041b07b in DubWiz.exe: 0xC0000005: Access
violation reading location 0xcdcdcdc1.
In "atlsimpstr.h", GetLength() method (VC++ 7.0 compiler using MFC/ATL)
---

Thanks in advance,
Casper
 
Reply With Quote
 
Casper
Guest
Posts: n/a
 
      08-13-2004
David Hilsee wrote:
> This doesn't make much sense to me. Perhaps you meant something like this:
>
> struct dir_or_file {
> // Note: "struct" not necessary for C++
> dir_or_file *parent;
> int numChildren;
> // pointer to first element of array of size numChildren
> dir_or_file *child;
> CString name;
>
> // maybe some flag here indicating whether or not it is a file or a
> directory
> };


Actually that would mix path and filename which is not optimum for my
problem - but I can understand why my proposed structures seem weird to
the uninvited.
Basically what I do is to collect ALL filenames and its size in a list.
The list uses insertion sort so that after scanning, I will be able to
determine which files are *exactly* the same size. When I have similar
sized files I do a binary compare (or CRC not sure yet) to verify they
are identical. When dublets are found I will delete all but one copy and
create hard links for the remaining copies. But to be able to delete
(and know where to point the hard link at) I need the path part of this
list item - and that's where the tree comes in - the list item holds a
pointer to a node in my directory tree.

Its all part pf a space saver experiment of mine.

> Your mistake is that you used malloc. The object to which "root" points did
> not have its constructor executed before you attempted to assign the value
> "Test" to one of its members. It it important to execute constructors
> before attempting to use the object, so you should have instead used "new":
>
> root = new tree();
>
> This will cause the (compiler-generated) constructor for the object to be
> executed. Now, the object will be properly initialized and the assignment
> should not cause any problems.


Indeed that did the trick. I will continue my experiment then, thanks
for the feedback!

Casper

 
Reply With Quote
 
Bill Thompson
Guest
Posts: n/a
 
      08-13-2004
"Casper" <(E-Mail Removed)> wrote in message
news0XSc.18846$(E-Mail Removed).. .
> David Hilsee wrote:
> > This doesn't make much sense to me. Perhaps you meant something like

this:
> >
> > struct dir_or_file {
> > // Note: "struct" not necessary for C++
> > dir_or_file *parent;
> > int numChildren;
> > // pointer to first element of array of size numChildren
> > dir_or_file *child;
> > CString name;
> >
> > // maybe some flag here indicating whether or not it is a file or a
> > directory
> > };

>
> Actually that would mix path and filename which is not optimum for my
> problem - but I can understand why my proposed structures seem weird to
> the uninvited.
> Basically what I do is to collect ALL filenames and its size in a list.
> The list uses insertion sort so that after scanning, I will be able to
> determine which files are *exactly* the same size. When I have similar
> sized files I do a binary compare (or CRC not sure yet) to verify they
> are identical. When dublets are found I will delete all but one copy and
> create hard links for the remaining copies. But to be able to delete
> (and know where to point the hard link at) I need the path part of this
> list item - and that's where the tree comes in - the list item holds a
> pointer to a node in my directory tree.
>
> Its all part pf a space saver experiment of mine.
>
> > Your mistake is that you used malloc. The object to which "root" points

did
> > not have its constructor executed before you attempted to assign the

value
> > "Test" to one of its members. It it important to execute constructors
> > before attempting to use the object, so you should have instead used

"new":
> >
> > root = new tree();
> >
> > This will cause the (compiler-generated) constructor for the object to

be
> > executed. Now, the object will be properly initialized and the

assignment
> > should not cause any problems.

>
> Indeed that did the trick. I will continue my experiment then, thanks
> for the feedback!
>
> Casper
>


Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an
array (insertion sort); the tree should be reasonably well balanced.


 
Reply With Quote
 
David Fisher
Guest
Posts: n/a
 
      08-13-2004
"Bill Thompson" wrote:

>> Basically what I do is to collect ALL filenames and its size in a list.
>> The list uses insertion sort so that after scanning, I will be able to
>> determine which files are *exactly* the same size. When I have similar
>> sized files I do a binary compare (or CRC not sure yet) to verify they
>> are identical. When dublets are found I will delete all but one copy and
>> create hard links for the remaining copies. But to be able to delete
>> (and know where to point the hard link at) I need the path part of this
>> list item - and that's where the tree comes in - the list item holds a
>> pointer to a node in my directory tree.

>
> Why not store the file size information in the tree as well, and sort by
> that? The performance should be much better than either a linked list or

an
> array (insertion sort); the tree should be reasonably well balanced.


Sounds like a job for a priority queue keyed on the file size ...

No need for post-sorting afterwards (and the hard link could even be created
during the creation of the priority queue, whenever a duplicate key is
found). Needs to handle multiple entries with the same file size, though -
maybe a priority queue of linked lists ? Pointing into another (tree)
structure which stores the filenames in a memory effecient way ?

http://www.dinkumware.com/manuals/re...priority_queue

David Fisher
HSA Systems


 
Reply With Quote
 
Casper
Guest
Posts: n/a
 
      08-13-2004

> Why not store the file size information in the tree as well, and sort by
> that? The performance should be much better than either a linked list or an
> array (insertion sort); the tree should be reasonably well balanced.
>

Sounds very very interesting! I am not 100% sure I understand how a
fixed/immutable nested tree structure can be sorted by a node data
member?! Would it work as a secondard access path to the structure? I
would love to see a prototype example of the struct you have in mind!

Sincerely,
Casper

 
Reply With Quote
 
Bill Thompson
Guest
Posts: n/a
 
      08-13-2004
"Casper" <(E-Mail Removed)> wrote in message
news:9a_Sc.25209$(E-Mail Removed).. .
>
> > Why not store the file size information in the tree as well, and sort by
> > that? The performance should be much better than either a linked list

or an
> > array (insertion sort); the tree should be reasonably well balanced.
> >

> Sounds very very interesting! I am not 100% sure I understand how a
> fixed/immutable nested tree structure can be sorted by a node data
> member?! Would it work as a secondard access path to the structure? I
> would love to see a prototype example of the struct you have in mind!
>
> Sincerely,
> Casper
>


My apologies for the poorly worded statement: when building the tree, use
the filesize as the sort field.

You can find C++ code for simple binary trees very easily on the internet.
Without looking too hard I found this link:
http://www.cs.bu.edu/teaching/cs112/...0/binary-tree/

The code would have to be modified to deal with duplicate file sizes.

It appears that you aren't particularly interested in sorting by the
filename. After building the tree, you can traverse it in order by file
size; as you traverse you can pull out sets of files with the same size and
deal with them as you see fit.

Another possibility is finding and using a database api. Or, easier still,
writing a program to translate the file system into a file that can be
imported into a database, processing the data, export a file back out, and
write another program to deal with the exported data.

As you've indicated, you could also build two trees, one sorted by name, the
other by size; all refering to the same bank of file data.



 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
B+ Tree versus Ternary Search Tree Ramkumar Menon Java 2 08-16-2005 08:13 PM
B+ Tree versus Ternary Search Tree Ramkumar Menon Java 1 08-16-2005 09:46 AM
B+ Tree versus Ternary Search Tree Ramkumar Menon Java 0 08-16-2005 09:01 AM
B tree, B+ tree and B* tree Stub C Programming 3 11-12-2003 01:51 PM
Spanning Tree And Per Vlan Spanning Tree Amy L. Cisco 0 07-24-2003 10:01 PM



Advertisments