Velocity Reviews > C++ > Re: Sorting vectors (multi-dimensional array)

# Re: Sorting vectors (multi-dimensional array)

jason.cipriani@gmail.com
Guest
Posts: n/a

 01-04-2009
On Jan 4, 4:11*am, "(2b|!2b)==?" <(E-Mail Removed)> wrote:
> I have a multi-dimentional array (for the purposes of this post - lets
> assume the dimensionality = 2 (to keep things simple). The array
> (implemented by vectors), consists of "columns", where each column is of
> a particular data type (similar to a traditional database table).
>
> I want to be able to sort the array by its columns (like in Excel or in
> a databse). I need some pseudocode or code sippet that will show how I
> can sort the table (i.e. vector of columns) by specified columns.
>
> Ideally, the algorithm will be generic enough to apply to an
> N-dimensional table.

The easiest thing to do is use std::sort:

http://www.sgi.com/tech/stl/sort.html

And supply it with a predicate function that compares the columns of
interest.

A general predicate for multiple fields is usually something like (say
you want to sort by field1, then field2, then ..., then fieldN):

bool isbefore (const Thing &a, const Thing &b) {

if (a.field1 < b.field1)
return true;
else if (b.field1 < a.field1)
return false;
else if (a.field2 < b.field2)
return true;
else if (b.field2 < a.field2)
return true;
// ... and so on ...
else
return a.fieldN < b.fieldN;

}

Jason

jason.cipriani@gmail.com
Guest
Posts: n/a

 01-04-2009
On Jan 4, 4:40*am, "(2b|!2b)==?" <(E-Mail Removed)> wrote:
> (E-Mail Removed) wrote:
> > On Jan 4, 4:11 am, "(2b|!2b)==?" <(E-Mail Removed)> wrote:
> >> I have a multi-dimentional array (for the purposes of this post - lets
> >> assume the dimensionality = 2 (to keep things simple). The array
> >> (implemented by vectors), consists of "columns", where each column is of
> >> a particular data type (similar to a traditional database table).

>
> >> I want to be able to sort the array by its columns (like in Excel or in
> >> a databse). I need some pseudocode or code sippet that will show how I
> >> can sort the table (i.e. vector of columns) by specified columns.

>
> >> Ideally, the algorithm will be generic enough to apply to an
> >> N-dimensional table.

>
> > The easiest thing to do is use std::sort:

>
> > *http://www.sgi.com/tech/stl/sort.html

>
> > And supply it with a predicate function that compares the columns of
> > interest.

>
> > A general predicate for multiple fields is usually something like (say
> > you want to sort by field1, then field2, then ..., then fieldN):

>
> > bool isbefore (const Thing &a, const Thing &b) {

>
> > * if (a.field1 < b.field1)
> > * * return true;
> > * else if (b.field1 < a.field1)
> > * * return false;
> > * else if (a.field2 < b.field2)
> > * * return true;
> > * else if (b.field2 < a.field2)
> > * * return true;
> > * // ... and so on ...
> > * else
> > * * return a.fieldN < b.fieldN;

>
> > }

>
> > Jason

>
> Hi Jason, could you please elaborate some more?. I mean, suppose I have
> this:
>
> class DataColumn
> {
> public:
> * * std::string name;
> * * std::vector<Variant> m_rows ;
>
> };
>
> class MemoryTable
> {
> public:
> * * *void SortByColumns(const std::vector<std::string>& colnames);
>
> private:
> * * std::vector<DataColumn> m_cols ;
>
> };
>
> How would I go about implementing SortByColumns(), using std::sort and a
> predicate function?

You're going to have a lot of problems if you store data by column.
Store it by rows instead, and the predicate function will be easy to
implement; note that you will have to store the column names
separately:

class DataRow {
public:
std::vector<Variant> m_cols;
};

class MemoryTable {
public:
void SortByColumns(const std::vector<std::string>& colnames);
private:
// Note: let N be number of columns:
// m_colnames has exactly N elements.
// All m_rows[].m_cols have exactly N elements.
std::vector<std::string> m_colnames;
std::vector<DataRow> m_rows;
};

You can use a functor for the predicate, so you might do something
like this (error checking and most code let out for clarity):

class ColSortPredicate {
public:
ColSortPredicate (const std::vector<int> &col_indexes);
bool operator () (const DataRow &a, const DataRow &b);
private:
std::vector<int> m_col_indexes;
};

void MemoryTable::SortByColumns (const vector<string> &names) {

vector<int> indexes = MapNamesToIndexes(names);
std::sort(m_rows.begin(), m_rows.end(), ColSortPredicate(indexes));

}

bool ColSortPredicate:perator () (const DataRow &a, const DataRow
&b) {

// compare columns of 'a' and 'b' in order specified
// by m_col_indexes.
return result_of_that_comparison;

}

The reason for using column indexes was mostly an optimization,
there's no sense search for the columns by name every time you do a
comparison while sorting -- mapping column names to indexes into
DataRow::m_cols can be done once before sorting assuming it remains
constant during the sort.

Jason

Marcel Müller
Guest
Posts: n/a

 01-04-2009
Hi,

(2b|!2b)==? wrote:
> class DataColumn
> {
> public:
> std::string name;
> std::vector<Variant> m_rows ;
> };

then you have a data model that makes things as complicated as possible.
You collected objects of multiple rows in one container, althoungt they
are obviously unrelated. But you want to sort the /rows/ with /related/
objects, which are distributed over several containers.
From the logical point of view you want to transpose the data first,
then sort on a certain criterion and then transpose back.

I strongly recommend to reconsider the data model and put objects that
are related together in rows, not the other way around. Every abstract
data implementation operates this way.

However, if you couldn't resist, create an integer array with 1..n and
sort it by the desired column criterion. You can use this array as index
table, when enumerating the rows.

Marcel