Other Methods | The Gmax

STRUCTURAL ASSUMPTIONS OF OTHER METHODS

Regression-based methods [multiple, stepwise, logistic, probit, et al.] assume that:

a) .. the model structure is a linear weighted sum, which is extremely doubtful.
b) .. the variables are [statistically] well behaved, which is unlikely, and
c) .. the errors are Gaussian, which is seldom achievable without including irrelevant variables.
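To make assumption (a) concrete, here is a minimal Python sketch of the linear weighted sum structure for a single predictor, fitted by ordinary least squares. The data are hypothetical: y is exactly x squared, so y depends completely on x. Even so, the fitted slope is zero and the residuals follow an obvious pattern rather than random noise.

```python
# Minimal sketch: ordinary least squares for one predictor, i.e. the
# "linear weighted sum" structure y_hat = b0 + b1 * x.

def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data with a purely quadratic relationship, y = x**2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print("intercept:", b0, "slope:", b1)
print("residuals:", residuals)
```

The fitted line is flat (intercept 4, slope 0) because the linear correlation between x and y is zero over a symmetric range, although the relationship is fully deterministic. The residuals (5, 0, -3, -4, -3, 0, 5) show the curvature the linear structure cannot express.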

Many packages offer regression-based methods. However, these methods should only be used by professional statisticians. Too many non-statisticians mistake correlation for cause.

Artificial Neural Networks assume that:

a) .. the user is able to specify the architecture and squashing functions appropriate to the domain.
b) .. a connectionist approach contains the answer.

The above assumptions might be true:

c) .. if the net is trained on a non-parametric cost function rather than back-propagated squared error.
d) .. if it is possible to vary the squashing function between layers, which is very complicated.
e) .. if the resulting matrix of weights were meaningful.
f) .. if the trained net predicted well outside the envelope described by the training data.
These conditions are all rather doubtful.

Genetic Algorithms assume that:

a) .. the user already knows which variables go into the solution rule, but
b) .. does not know the proportions or the sequence of their inclusion.

The model is a repeated application of a fixed sub-structure (chromosome) that must be predetermined by the user. The GA merely evolves the coefficients.
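The following is a minimal sketch of that idea in plain Python, not the interface of any particular GA package. It assumes a hypothetical, user-fixed rule y_hat = a*x1 + b*x2. The chromosome is just the coefficient pair (a, b), and selection, crossover and mutation only ever change those numbers. The data, population size, number of generations and mutation scale are all illustrative assumptions.

```python
import random

# The rule structure y_hat = a * x1 + b * x2 is fixed in advance (hypothetical);
# the GA only searches over the coefficients (a, b).
random.seed(0)

# Hypothetical training data generated from y = 2*x1 - 3*x2.
data = [(x1, x2, 2 * x1 - 3 * x2) for x1 in range(-2, 3) for x2 in range(-2, 3)]

def error(chrom):
    """Sum of squared errors of the fixed rule with coefficients chrom."""
    a, b = chrom
    return sum((a * x1 + b * x2 - y) ** 2 for x1, x2, y in data)

def mutate(chrom, scale=0.3):
    """Add Gaussian noise to every gene (coefficient)."""
    return tuple(gene + random.gauss(0, scale) for gene in chrom)

def crossover(p, q):
    """Uniform crossover: each gene is copied from one parent or the other."""
    return tuple(random.choice(pair) for pair in zip(p, q))

population = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(30)]
history = []  # best error seen in each generation
for generation in range(60):
    population.sort(key=error)
    history.append(error(population[0]))
    elite = population[:10]  # selection: keep the 10 best unchanged (elitism)
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(20)]
    population = elite + children

best = min(population, key=error)
print("best (a, b):", best, "error:", error(best))
```

Because the best chromosomes are carried forward unchanged, the best error in `history` can never rise from one generation to the next. No amount of evolution will add another variable to the rule or change its linear form; that choice was made by the user before the run.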

Not particularly helpful if there are thousands of variables. Better to narrow the field first, perhaps with PCA.
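As an illustration of narrowing the field, here is a minimal PCA sketch using NumPy's singular value decomposition. It assumes numeric data in an observations-by-variables array. The synthetic data are hypothetical: 50 variables generated from 3 underlying factors plus a little noise, so the first 3 components should account for nearly all of the variance.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto its first k principal components.

    Returns (scores, explained_variance_ratio) for those k components."""
    Xc = X - X.mean(axis=0)                  # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2                             # proportional to component variances
    ratio = var / var.sum()                  # share of total variance per component
    scores = Xc @ Vt[:k].T                   # coordinates in the reduced space
    return scores, ratio[:k]

# Hypothetical data: 200 observations of 50 variables driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.01 * rng.normal(size=(200, 50))

scores, ratio = pca(X, 3)
print(scores.shape, ratio.round(3), ratio.sum())
```

PCA ignores the outcome variable, so the retained components are the directions of greatest variance, not necessarily the ones that predict best.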

SVM, CART and MARS assume that:

.. the model structure can be reduced to a collection of linear sub-spaces.

These methods are effectively equivalent to a collection of mini regression-based models and are subject to the same caveats as regression above.
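To illustrate the "collection of linear sub-spaces" structure, the sketch below uses MARS-style hinge functions, max(0, x - t) and max(0, t - x), at a single knot t and fits their coefficients by ordinary least squares with NumPy. It is a simplification under stated assumptions: the knot is fixed by hand at 2.0 and the data are hypothetical, whereas MARS proper searches over variables and knots.

```python
import numpy as np

def hinge_basis(x, knot):
    """MARS-style reflected pair of hinge functions at a single knot."""
    return np.column_stack([
        np.ones_like(x),                 # intercept
        np.maximum(0.0, x - knot),       # non-zero only to the right of the knot
        np.maximum(0.0, knot - x),       # non-zero only to the left of the knot
    ])

# Hypothetical "bent line" data: slope -1 left of x = 2, slope 3 right of it.
x = np.linspace(0.0, 5.0, 51)
y = 1.0 + 3.0 * np.maximum(0.0, x - 2.0) + 1.0 * np.maximum(0.0, 2.0 - x)

B = hinge_basis(x, knot=2.0)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)  # ordinary least squares on the basis
print(np.round(coef, 6))
```

The fit recovers the coefficients [1, 3, 1]. On each side of the knot the model is an ordinary straight line (slope -1 to the left, slope 3 to the right), which is the sense in which such models are stitched together from small regressions.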

Clever but cumbersome. Statisticians only.

Partitioning methods such as CHAID and Random Forests assume that:

.. hypercubes of sufficient fineness can describe homogeneous groups.

Since this is always true in extremis (i.e. one observation per cube), the trick is to find a minimum spanning tree that actually adds information.
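The in-extremis case is easy to demonstrate. The minimal sketch below, on hypothetical data, cuts a 10 by 10 grid of points into cubes of a given side and measures the weighted Gini impurity of the resulting cells. The labels are assigned at random, so position carries no information about them, yet cells holding a single observation are still perfectly homogeneous.

```python
import random
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of one group: 0.0 means perfectly homogeneous."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def partition_impurity(points, labels, cell_size):
    """Cut space into cubes of side cell_size; return (weighted impurity, cell count)."""
    cells = defaultdict(list)
    for point, label in zip(points, labels):
        key = tuple(int(coord // cell_size) for coord in point)  # which cube
        cells[key].append(label)
    n = len(labels)
    impurity = sum(len(group) / n * gini(group) for group in cells.values())
    return impurity, len(cells)

# Hypothetical data: a 10 x 10 grid of points with labels assigned at random.
random.seed(1)
points = [(i, j) for i in range(10) for j in range(10)]
labels = [random.choice("AB") for _ in points]

for size in (10, 5, 2, 1):
    impurity, n_cells = partition_impurity(points, labels, size)
    print(f"cube side {size}: {n_cells} cells, weighted Gini impurity {impurity:.3f}")
```

At side 1 every cube holds one observation and the impurity is exactly 0, which says nothing about the data. A useful partition has to reach low impurity with far fewer, larger cells.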

CHAID is suboptimal because rectilinear hyperspace is a doubtful analogue of real-world data, which is why splitting models tend to overparameterize and employ too many variables.

With Random Forests there is no single model but rather a collection of imperfect models whose average or modal output is judged to be a prediction. Clever but cumbersome.
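Below is a minimal sketch, using only Python's standard library, of how a forest's separate outputs are combined. The five tree outputs are hypothetical stand-ins for trained trees: regression takes the average and classification takes the modal vote. Some libraries, scikit-learn's RandomForestClassifier for example, average the trees' class probabilities instead of counting hard votes.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(predictions):
    """Forest-style regression output: the average of the individual trees."""
    return mean(predictions)

def aggregate_classification(predictions):
    """Forest-style classification output: the modal (majority) vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs from five trees for a single observation.
tree_values = [2.0, 2.5, 1.5, 3.0, 2.0]
tree_votes = ["yes", "no", "yes", "yes", "no"]

print(aggregate_regression(tree_values))
print(aggregate_classification(tree_votes))
```

Here the average is 2.2 and the modal vote is "yes" (three of five trees). The value 2.2 is not the output of any single tree, which illustrates the point that there is no single model behind the prediction.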

The Gmax

There are no assumptions and the structure is free to form itself as needed.