To accommodate different dataset sizes, CART is available in several memory sizes. The standard memory version of CART for Windows is compiled for a machine with at least 64 MB of memory (RAM) and can analyze more than 4.5 million learn sample values (rows by columns). The table below shows the approximate number of learn sample values that can be used in an analysis for a given CART version size.
Formerly, CART was compiled into distinct memory versions (64 MB, 128 MB, etc.). A user's license determined which memory version was delivered. Thus, the license was tied to the amount of workspace inherent in the program and (loosely) to the amount of data, the type of data (categorical vs. continuous), the size of the final tree, and so on, that the user could analyze.
Licensing and workspace are handled differently in CART 5. A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to grow the maximal tree. Note that there is no limit to the number of test sample data points that may be analyzed.
For example, suppose the 32 MB version sets a learn sample limit of 8 MB. Each data point occupies 4 bytes. Therefore, an 8 MB license allows up to 8 * 1024 * 1024 / 4 = 2,097,152 learn sample data points to be analyzed. A data point is a single value: one variable for one observation (1 row by 1 column).
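The arithmetic above can be sketched in a few lines of Python. The function names `max_learn_values` and `max_learn_rows` are illustrative only and are not part of CART; the sketch assumes every value occupies 4 bytes, as stated above.

```python
BYTES_PER_VALUE = 4          # each learn sample data point occupies 4 bytes
BYTES_PER_MB = 1024 * 1024   # 1 MB = 1,048,576 bytes


def max_learn_values(limit_mb: int) -> int:
    """Number of learn sample values (rows by columns) a data limit allows."""
    return limit_mb * BYTES_PER_MB // BYTES_PER_VALUE


def max_learn_rows(limit_mb: int, n_variables: int) -> int:
    """Number of complete observations that fit when each has n_variables values."""
    return max_learn_values(limit_mb) // n_variables


print(max_learn_values(8))    # 8 MB limit, as in the example above
print(max_learn_rows(8, 20))  # same limit with 20 variables per observation
```

Because the limit counts values rather than rows, the number of observations that fit shrinks as the number of variables in the learn sample grows.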
In general, we feel that the analysis workspace provided to build the tree will be adequate for "most" modeling scenarios. However, a user who models a large number of high-level categorical predictors, or who uses a high-level categorical target, may encounter workspace limitations that do not allow the entire learn sample to be used. In these special cases the user will have to upgrade to a larger memory version.
The following is a table that describes the current set of "sizes" available. Please note that the minimum required RAM is not the same as the learn sample limitation. If you have any questions regarding the following information, please contact a sales representative.
- Size = minimum recommended physical memory (RAM) in MB.
- Data Limit MB = Licensed learn sample data size in MB (1 MB = 1,048,576 bytes)
- Data Limit # of values = Licensed # of learn sample values (rows by columns)
- SP Cells = maximum number of 4-byte (single-precision) workspace elements the program can use.
| Size (MB) | Data Limit (MB) | Data Limit # of values | SP Cells (4-byte) |
| --- | --- | --- | --- |
| 32 | 8 | 2,097,152 | 10,000,000 |
| 64 | 18 | 4,718,592 | 13,500,000 |
| 128 | 45 | 11,796,480 | 33,750,000 |
| 256 | 100 | 26,214,400 | 75,000,000 |
| 512 | 200 | 52,428,800 | 150,000,000 |
| 1024 | 400 | 104,857,600 | 250,000,000 |
| 2048 | 800 | 209,715,200 | 356,000,000 |
* Custom compiles of up to 32 GB are available.
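As a rough illustration of how the table might be read, the sketch below looks up the smallest listed size whose data limit covers a learn sample of a given shape. The name `smallest_size_for` is hypothetical and not part of CART, and the check covers only the licensed data limit; it does not model the workspace (SP cells) needed for high-level categorical variables, so a sample that passes here could still run into workspace limits.

```python
# (Size in MB, licensed number of learn sample values), copied from the table.
SIZE_TABLE = [
    (32, 2_097_152),
    (64, 4_718_592),
    (128, 11_796_480),
    (256, 26_214_400),
    (512, 52_428_800),
    (1024, 104_857_600),
    (2048, 209_715_200),
]


def smallest_size_for(n_rows: int, n_columns: int):
    """Return the smallest listed size whose data limit covers rows by columns.

    Returns None when the learn sample exceeds every listed size.
    """
    n_values = n_rows * n_columns
    for size_mb, value_limit in SIZE_TABLE:
        if n_values <= value_limit:
            return size_mb
    return None


print(smallest_size_for(100_000, 20))     # 2,000,000 values
print(smallest_size_for(1_000_000, 50))   # 50,000,000 values
print(smallest_size_for(10_000_000, 50))  # 500,000,000 values, beyond the table
```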
The number of variables CART can handle can be significantly increased if node sub-sampling is used when searching for the optimal split. In node sub-sampling, all the data are used to grow the tree, but only a sub-sample of the data is actually searched in the largest nodes near the top of the tree. Judiciously chosen sub-sampling can sometimes double the number of variables CART can search while growing the tree on all the data.
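The idea of node sub-sampling can be sketched as follows. This is a minimal illustration, not CART's actual procedure: it assumes a simple uniform random sample without replacement, and the function name `rows_to_search` and the parameter `max_search_rows` are hypothetical. Every row still belongs to the node and is passed down the tree; only the split search is restricted to the sample.

```python
import random


def rows_to_search(node_rows, max_search_rows, rng):
    """Pick the rows scanned for candidate splits in a single node.

    Nodes at or below the threshold are searched in full; larger nodes
    (typically those near the top of the tree) are searched on a uniform
    random sub-sample drawn without replacement.
    """
    rows = list(node_rows)
    if len(rows) <= max_search_rows:
        return rows
    return rng.sample(rows, max_search_rows)


rng = random.Random(0)                   # fixed seed for repeatability
root_rows = range(1_000_000)             # row indices of the root node
searched = rows_to_search(root_rows, 50_000, rng)
print(len(searched))                     # rows actually searched in the root

small_node = range(1_200)
print(len(rows_to_search(small_node, 50_000, rng)))  # small node searched in full
```

Since the search cost in a node depends on the number of rows examined times the number of variables, capping the rows in the largest nodes leaves room to examine more variables within the same workspace.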

