What is the best way to store a table in C++?

Posted by Topo on Programmers
Published on 2013-02-23T06:31:46Z Indexed on 2013/10/28 10:01 UTC

I'm programming a decision tree in C++ using a slightly modified version of the C4.5 algorithm. Each node represents an attribute (a column of the data set) and has one child per possible value of that attribute.

My problem is how to store the training data set, keeping in mind that each node works on a subset of it, so I need a quick way to select only a subset of rows and columns.

The main goal is to do it in the most memory- and time-efficient way possible (in that order of priority).

The best way I have thought of is to keep the data in an array of arrays (or std::vector of vectors) and, for each node, keep a list (array, vector, etc.) of the (column, row) pairs, probably as tuples, that are valid for that node.

I know there should be a better way to do this. Any suggestions?

UPDATE: What I need is something like this:

In the beginning I have this data:

Paris    4    5.0    True
New York 7    1.3    True
Tokio    2    9.1    False
Paris    9    6.8    True
Tokio    0    8.4    False

But for the second node I just need this data:

Paris    4    5.0
New York 7    1.3
Paris    9    6.8

And for the third node:

Tokio    2    9.1
Tokio    0    8.4

The real table, however, has millions of records and up to hundreds of columns.

What I have in mind is to keep all the data in a single matrix and, for each node, keep only the indices of its current columns and rows. Something like this:

Paris    4    5.0    True
New York 7    1.3    True
Tokio    2    9.1    False
Paris    9    6.8    True
Tokio    0    8.4    False

Node 2:

columns = [0,1,2]
rows = [0,1,3]

Node 3:

columns = [0,1,2]
rows = [2,4]
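
A minimal sketch of this idea, not a tuned implementation. It assumes every cell is stored as a string to keep the example short (a real table would probably use typed columns), and the names Table, NodeView, split and example_table are made up for illustration. Each node holds only index lists into the one shared table, and splitting a node on a column produces one child per distinct value:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical types: cells are strings only to keep the sketch short.
using Row = std::vector<std::string>;
using Table = std::vector<Row>;  // the single shared copy of the training data

// A node stores indices into the shared table, never the data itself.
struct NodeView {
    std::vector<std::size_t> columns;
    std::vector<std::size_t> rows;
};

// Split a view on one column: one child per distinct value in that column.
// Each child keeps the parent's other columns and only the matching rows.
std::map<std::string, NodeView> split(const Table& table,
                                      const NodeView& parent,
                                      std::size_t split_column) {
    std::vector<std::size_t> remaining;
    for (std::size_t c : parent.columns) {
        if (c != split_column) remaining.push_back(c);
    }

    std::map<std::string, NodeView> children;
    for (std::size_t r : parent.rows) {
        NodeView& child = children[table[r][split_column]];
        if (child.columns.empty()) child.columns = remaining;
        child.rows.push_back(r);
    }
    return children;
}

// The five-row example table from the question.
Table example_table() {
    return {
        {"Paris", "4", "5.0", "True"},
        {"New York", "7", "1.3", "True"},
        {"Tokio", "2", "9.1", "False"},
        {"Paris", "9", "6.8", "True"},
        {"Tokio", "0", "8.4", "False"},
    };
}
```

Splitting a root view that covers all rows and columns on column 3 gives the "True" child rows 0, 1 and 3 and the "False" child rows 2 and 4, both with columns 0, 1 and 2, which matches Node 2 and Node 3 above.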

This way, in the worst case, I only have to spend

sizeof(int) * (number_of_columns + number_of_rows) * number_of_nodes

That is a lot less than having an independent data matrix for each node.
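
To put rough numbers on that comparison, here is a small sketch of the arithmetic. The function and parameter names are illustrative, and index_size stands in for sizeof(int); the second function assumes, as the worst case, that every node would otherwise copy the full matrix with fixed-size cells.

```cpp
#include <cstdint>

// Worst-case bytes for the per-node index lists, following the formula above.
// index_size stands in for sizeof(int); all names here are illustrative.
std::uint64_t index_overhead_bytes(std::uint64_t rows, std::uint64_t columns,
                                   std::uint64_t nodes, std::uint64_t index_size) {
    return index_size * (columns + rows) * nodes;
}

// Bytes needed if every node instead held its own full copy of the matrix,
// assuming each cell occupies cell_size bytes.
std::uint64_t copied_matrix_bytes(std::uint64_t rows, std::uint64_t columns,
                                  std::uint64_t nodes, std::uint64_t cell_size) {
    return cell_size * rows * columns * nodes;
}
```

With made-up figures of 1,000,000 rows, 100 columns, 10 nodes and 4-byte indices and cells, the index lists come to 40,004,000 bytes (about 40 MB), against 4,000,000,000 bytes (about 4 GB) for full copies.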
