“year”, “model”, “price”, “mileage”, “color”, “transmission”
Tabular data may be character, numeric, or boolean. The data set may be loaded by any means desired into a two-dimensional array (with the first row consisting of the column headers), which is then passed, together with an array of column types, to the fromArray method of the Table class.
var types = [ __table.NUMERIC, __table.CHARACTER, __table.NUMERIC, __table.NUMERIC, __table.CHARACTER, __table.CHARACTER ];
__table.fromArray( data, types );
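As a rough sketch of the input shape described above, the array below puts the column headers in the first row and one record in each row after that. The three records are made-up illustrative values, not the actual data set, and the rectangularity check is only a suggested precaution, not something the toolkit is known to require.

```javascript
// Illustrative records only (hypothetical values); the first row holds the column headers.
const data = [
  ["year", "model", "price", "mileage", "color", "transmission"],
  [2011, "SEL", 21992, 7413, "Yellow", "AUTO"],
  [2010, "SE", 17809, 21026, "Silver", "MANUAL"],
  [2009, "SES", 15500, 36252, "Black", "AUTO"]
];

// Sanity check before handing the array to fromArray(): every row should
// have exactly as many entries as there are headers.
const columnCount = data[0].length;
const isRectangular = data.every((row) => row.length === columnCount);
console.log(isRectangular); // true
```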
In addition to a variety of common statistics on columns of data, quantiles may also be directly computed. Quintiles, for example, are computed by
var q = __table.get_quantiles("price", 0.2);
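To make the idea concrete, here is a minimal stand-alone sketch of computing cut points at a fixed step such as 0.2. It assumes linear interpolation between order statistics, which is one common convention; the toolkit's get_quantiles() may use a different rule, and the function names and sample values here are hypothetical.

```javascript
// Quantile by linear interpolation between order statistics (an assumed
// convention; the toolkit's get_quantiles() may use a different rule).
function quantile(values, p) {
  const sorted = values.slice().sort((a, b) => a - b);
  const h = (sorted.length - 1) * p; // fractional index into the sorted data
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Cut points at every multiple of `step` strictly between 0 and 1.
function quantiles(values, step) {
  const count = Math.round(1 / step);
  const cuts = [];
  for (let k = 1; k < count; k++) {
    cuts.push(quantile(values, k / count));
  }
  return cuts;
}

const prices = [5, 1, 3, 2, 6, 4]; // hypothetical values
console.log(quantiles(prices, 0.2));
```

With a step of 0.2 this returns four cut points, which divide the data into five groups.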
One-way tables (which count the number of occurrences of each unique item in a data column) are very easy. The method returns an Object that can be converted into a 2D table for convenience,
var obj = __table.oneWayTable("year");
var tbl = __table.__tblToArray(obj);
from which the output is
If you prefer output in percentages,
obj = __table.oneWayTable("color", true);
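For readers who want to see what a one-way table involves, the following is a small independent sketch, not the toolkit's implementation. The function name `oneWay`, its return shape, and the sample column are assumptions made for illustration.

```javascript
// Count occurrences of each unique value; optionally convert to percentages.
function oneWay(values, asPercent = false) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  const rows = [];
  for (const [item, count] of counts) {
    rows.push([item, asPercent ? (100 * count) / values.length : count]);
  }
  return rows; // [[item, count-or-percent], ...] in first-seen order
}

const colors = ["Black", "Red", "Black", "Silver"]; // hypothetical column
console.log(oneWay(colors));       // [["Black", 2], ["Red", 1], ["Silver", 1]]
console.log(oneWay(colors, true)); // [["Black", 50], ["Red", 25], ["Silver", 25]]
```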
One-way tables are cool, but the real fun is cross-tab analysis. Two methods are provided for this type of analysis, and both produce the same form of output. The crossTable() method analyzes one column vs. another (both must contain character data). The dependent category may be further organized into groups. For instance, consider the example in Lantz where car model is analyzed vs. groups of colors. In the book, colors were separated into conservative and non-conservative colors to determine whether there was a possible correlation between the model of vehicle and the color selected. In JS Math Toolkit, this same analysis can be performed with a single method call,
var output = __table.crossTable("model", "color", ["Black Silver White Gray", "Blue Gold Green Red Yellow"], ["Simple-Color", "Bold-Color"] );
The entire collection of colors was divided into two groups and the cross-table analysis was done by group, not by unique color. The output column names are provided at the end of the argument list.
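The grouping step can be sketched independently of the toolkit as follows. This is only an illustration of counting by group rather than by unique category: the function name `groupedCrossCounts`, the returned object shape, and the sample data are hypothetical, and categories not listed in any group are simply skipped here, which may not match the library's behavior.

```javascript
// Count rows vs. column groups. Each group is a space-delimited string of
// category names, mirroring the argument style of the crossTable() call above.
function groupedCrossCounts(rowValues, colValues, groups, groupNames) {
  // Map each category (e.g. "Black") to the index of its group.
  const groupOf = new Map();
  groups.forEach((group, g) => {
    group.split(" ").forEach((category) => groupOf.set(category, g));
  });

  const table = {}; // table[rowValue][groupName] = count
  rowValues.forEach((row, i) => {
    const g = groupOf.get(colValues[i]);
    if (g === undefined) return; // category not in any group: skipped here
    if (!table[row]) {
      table[row] = {};
      groupNames.forEach((name) => { table[row][name] = 0; });
    }
    table[row][groupNames[g]] += 1;
  });
  return table;
}

const models = ["SE", "SE", "SEL", "SES"];           // hypothetical data
const colors = ["Black", "Red", "Silver", "Yellow"]; // hypothetical data
const counts = groupedCrossCounts(
  models, colors,
  ["Black Silver White Gray", "Blue Gold Green Red Yellow"],
  ["Simple-Color", "Bold-Color"]
);
console.log(counts["SE"]); // { 'Simple-Color': 1, 'Bold-Color': 1 }
```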
The output consists of four properties:
chi2 – Total table chi-squared value
df – Table degrees of freedom
q – Q-value from chi-squared or probability that table results were obtained by chance
table – Output table with cell count, row and column percentages, and percentage of cell count vs. entire table count.
The cell chi-squared may be added in the future as part of the output.
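The chi2 and df values listed above follow from the standard Pearson chi-squared calculation, which can be sketched as below. This is a generic textbook version, not the toolkit's code; the function name `chiSquared`, the `cells` property holding per-cell contributions, and the 2x2 counts are assumptions for illustration.

```javascript
// Pearson chi-squared for a table of observed counts (array of rows).
function chiSquared(observed) {
  const rowTotals = observed.map((row) => row.reduce((a, b) => a + b, 0));
  const colTotals = observed[0].map((_, j) =>
    observed.reduce((sum, row) => sum + row[j], 0)
  );
  const total = rowTotals.reduce((a, b) => a + b, 0);

  // Per-cell contribution: (observed - expected)^2 / expected, where
  // expected = rowTotal * colTotal / total under independence.
  const cells = observed.map((row, i) =>
    row.map((count, j) => {
      const expected = (rowTotals[i] * colTotals[j]) / total;
      return (count - expected) ** 2 / expected;
    })
  );

  const chi2 = cells.reduce((s, row) => s + row.reduce((a, b) => a + b, 0), 0);
  const df = (observed.length - 1) * (observed[0].length - 1);
  return { chi2, df, cells };
}

const result = chiSquared([[10, 20], [20, 10]]); // hypothetical counts
console.log(result.df);   // 1
console.log(result.chi2); // approximately 6.667
```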
In contrast, the crossTabulation() method performs a traditional cross-table or contingency-table analysis. As an illustration, consider a case that can be found online, where city of residence is studied vs. favorite baseball team. Cell counts could represent responses from a survey.
City          Blue Jays   Red Socks   Yankees
Boston        11          33          7
Montreal      23          14          9
Montpellier   26          60          30
A table is created with column labels “City”, “Blue Jays”, “Red Socks”, “Yankees”. The remainder of the data is supplied to the table via the fromArray() method. Since the data is already organized for a full cross-table analysis, the method call is very simple,
output = __table.crossTabulation();
Partial output is:
CrossTabulation of city vs. baseball teams
Table degrees of freedom: 4
Total chi-squared: 19.35140903152151
Q-value: 0.0006703343353674507
along with the table of cell summaries.
From the chi-squared analysis, there is less than a one-in-one-thousand chance that the table results were obtained by chance, which indicates that the relationship between city and favorite baseball team warrants further study.
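The reported Q-value can be checked by hand. For an even number of degrees of freedom, the upper-tail chi-squared probability has a simple closed form, sketched below; how the toolkit itself computes Q is not stated here, and the function name `chiSquaredQEvenDf` is hypothetical.

```javascript
// Upper-tail chi-squared probability for an EVEN number of degrees of freedom:
// Q = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
function chiSquaredQEvenDf(chi2, df) {
  if (df <= 0 || df % 2 !== 0) {
    throw new Error("this closed form only applies to even, positive df");
  }
  const half = chi2 / 2;
  let term = 1; // (x/2)^i / i!, starting at i = 0
  let sum = 0;
  for (let i = 0; i < df / 2; i++) {
    sum += term;
    term *= half / (i + 1);
  }
  return Math.exp(-half) * sum;
}

const q = chiSquaredQEvenDf(19.35140903152151, 4);
console.log(q); // approximately 0.00067, in line with the Q-value reported above
```

Odd degrees of freedom need the incomplete gamma function instead, so a general-purpose routine would look different.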
Update: The Table class now includes methods to auto-normalize or z-score columns and split the internal table into 2D arrays for training and test data sets.