Salford Analytics and Data Mining Conference 2012

Insight For Data Enthusiasts • San Diego, CA • May 24-25
Training May 21-23 • Welcome Reception May 23

CART for Selecting Features in Biomedical Context

Among the original motivations for Classification and Regression Trees was work by Morgan, Sonquist, and others on “automatic interaction detection.” A CART model is a weighted sum of indicators of subsets of feature space determined by terminal nodes of a rooted binary tree. For real-valued features and ordinary splits, each terminal node corresponds to the indicator of an intersection of semi-infinite “intervals,” that is to say, the indicator of easy-to-understand interactions of subsets of features. Thus, it is plausible that CART can be used in selecting interactions of features to describe biomedical phenomena directly or for input to subsequent analyses. One application to be reported concerns findings for non-sentinel lymph nodes following mammography in which a dye was injected to aid imaging, and malignancy was found in a sentinel node (or nodes).
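The "weighted sum of indicators" description can be made concrete with a small sketch. The tree, its thresholds, and its leaf values below are hypothetical and chosen only for illustration; they do not come from the studies discussed in the talk. The sketch assumes ordinary splits of the form "feature <= threshold" and shows that walking the tree gives the same prediction as summing leaf values times the indicators of the corresponding boxes.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, List, Sequence, Tuple, Union


@dataclass
class Leaf:
    value: float  # weight attached to this terminal node


@dataclass
class Split:
    feature: int      # index of the feature being tested
    threshold: float  # go left when x[feature] <= threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]
# A box maps a feature index to bounds (low, high), meaning low < x <= high.
Box = Dict[int, Tuple[float, float]]


def predict(node: Node, x: Sequence[float]) -> float:
    """Usual tree prediction: follow splits down to a terminal node."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def leaf_regions(node: Node, box: Union[Box, None] = None) -> List[Tuple[Box, float]]:
    """List each terminal node as (box, value); the box is an
    intersection of semi-infinite intervals accumulated along the path."""
    box = {} if box is None else box
    if isinstance(node, Leaf):
        return [(box, node.value)]
    lo, hi = box.get(node.feature, (-inf, inf))
    left_box = {**box, node.feature: (lo, min(hi, node.threshold))}
    right_box = {**box, node.feature: (max(lo, node.threshold), hi)}
    return leaf_regions(node.left, left_box) + leaf_regions(node.right, right_box)


def indicator(box: Box, x: Sequence[float]) -> int:
    """1 if x lies in the box, else 0."""
    return 1 if all(lo < x[j] <= hi for j, (lo, hi) in box.items()) else 0


def predict_as_sum(node: Node, x: Sequence[float]) -> float:
    """The same prediction written as a weighted sum of indicators."""
    return sum(value * indicator(box, x) for box, value in leaf_regions(node))


# Hypothetical two-feature tree; all numbers are made up for illustration.
tree = Split(0, 2.0, Leaf(0.1), Split(1, 5.0, Leaf(0.4), Leaf(0.9)))
```

In this toy tree the last terminal node corresponds to the box "feature 0 > 2.0 and feature 1 > 5.0", which is the kind of readable interaction of features the abstract refers to. Exactly one indicator is nonzero for any input, so `predict` and `predict_as_sum` agree.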

Another application concerns discovering combinations of genes and the environment associated with “insulin resistance” in Chinese women. The problem arose in a study of “candidate genes” and environmental features that together predispose to hypertension, with “insulin resistance” an “intermediate (pre-disposing) phenotype.” These two studies were about classification. A third application was an attempt to predict mean arterial blood pressure in certain individuals on the basis of environmental features and hundreds of thousands of variable sites (SNPs) spaced throughout the human genome. This application involves regression. The research cited involved contributions by many people.

*This presentation was given by Richard Olshen at the 2009 Salford Data Mining Conference.