Salford Analytics and Data Mining Conference 2012

Insight For Data Enthusiasts • San Diego, CA • May 24-25
Training May 21-23 • Welcome Reception May 23


Advances in Data Mining (TreeNet®, GPS, Rulefit and Interaction Detection)

Day 3

9am – 10am

Introduction to Boosting Using Decision Trees

TreeNet stochastic gradient boosting is Stanford University Professor Jerome Friedman's latest advance in data mining methodology. In TreeNet, classification and regression models are built up gradually through a potentially large collection of small trees, each of which improves on its predecessors through an error-correcting strategy. Although each tree may have only one split, the full model can be extraordinarily accurate. The final model takes the form of a series expansion in which every term is a (small) tree.
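The error-correcting strategy described above can be illustrated with a minimal sketch of stochastic gradient boosting for least-squares regression on a single predictor. This is not TreeNet's implementation: the function names (`fit_stump`, `boost`, `predict`), the parameter defaults, and the toy sine data are all assumptions chosen for illustration. Each one-split tree is fit to the current residuals on a random half of the rows and added to the series with a small learning rate.

```python
import math
import random


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the best single split."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=200, learn_rate=0.1, subsample=0.5, seed=0):
    """Build the series one small tree at a time (illustrative defaults)."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)          # term zero of the expansion
    preds = [base] * len(xs)
    stumps = []
    n_sub = max(2, int(subsample * len(xs)))
    for _ in range(n_trees):
        # "Stochastic": each tree sees only a random subset of the rows.
        idx = rng.sample(range(len(xs)), n_sub)
        # "Error-correcting": the tree is fit to the current residuals.
        stump = fit_stump([xs[i] for i in idx],
                          [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        preds = [p + learn_rate * stump_value(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learn_rate, stumps


def predict(model, x):
    """Evaluate the series expansion: base plus shrunken tree outputs."""
    base, learn_rate, stumps = model
    return base + learn_rate * sum(stump_value(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration): a noiseless sine curve.
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) for x in xs]
baseline = (sum(ys) / len(ys), 0.1, [])   # a model with no trees
model = boost(xs, ys)
```

Comparing `mean_squared_error(model, xs, ys)` with `mean_squared_error(baseline, xs, ys)` shows how far a series of one-split trees can move from the constant starting point.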

TreeNet improves over conventional boosting in that:

  • It is relatively impervious to errors in the target, such as mislabeling
  • It is strongly resistant to overfitting
  • It generalizes well to future data
10:00am – 10:15am Break
10:15am – 11:15am

In-Depth Discussion of TreeNet Theory

Understand the theory of TreeNet with discussions focused on:

  • Least squares regression
  • Robust regression
  • Logistic regression
  • Variable importance
  • Classification
  • Poisson regression and survival modeling
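In gradient boosting generally, the topics above differ mainly in the loss function, which determines the residual each new tree is fit to. The sketch below lists the standard negative gradients for several common losses. The function name and loss labels are hypothetical, least absolute deviation is used as one common robust choice, and the mapping to TreeNet's own options is an assumption rather than something stated here.

```python
import math


def pseudo_residual(loss, y, f):
    """Negative gradient of the loss with respect to the current score f."""
    if loss == "least_squares":
        # Loss (y - f)^2 / 2: the ordinary residual.
        return y - f
    if loss == "least_absolute":
        # Loss |y - f|: only the sign of the residual, so outliers
        # cannot dominate the fit.
        if y > f:
            return 1.0
        if y < f:
            return -1.0
        return 0.0
    if loss == "logistic":
        # y in {0, 1}, f is the log-odds: observed minus predicted probability.
        return y - 1.0 / (1.0 + math.exp(-f))
    if loss == "poisson":
        # y is a count, f is the log of the expected count.
        return y - math.exp(f)
    raise ValueError("unknown loss: " + loss)
```

Under least squares a target of 100 against a score of 0 contributes a residual of 100, while under least absolute deviation it contributes only 1, which is why the latter is less sensitive to extreme values.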
11:15am – 11:30am Break
11:30am – 12:30pm

TreeNet in Action

Explore SPM's unique modeling automation capabilities while running multiple data sets through both the GUI and non-GUI interfaces, and weigh the advantages and disadvantages of each. We'll introduce the CART component of SPM, and explain:

  • Building models in SPM
  • Setting control parameters
  • Interpreting output
  • Variable importance
  • Introduction to battery shaving in SPM using TreeNet
12:30pm – 1:30pm Lunch
1:30pm – 2:30pm

Interpreting TreeNet Models and Interaction Detection

Interaction detection is TreeNet's component for finding and reporting interactions among predictors; it is controlled through the Interaction Control Language (ICL).

You will understand:

  • How to shape the structure of interactions
  • How to impose different interactions in TreeNet models
  • Dependency plots and how they are utilized
2:30pm – 2:45pm Break
2:45pm – 3:45pm

Modern Approaches To Regularized Regression

Generalized Path Seeker (GPS) is the most recent advance in regularized regression. This technology offers high-speed LASSO-style regression for extreme data set configurations with upwards of 100,000 predictors and possibly very few rows. Such data sets are commonplace in gene research, and GPS handles them quickly and efficiently.

  • Application using examples in data sets
  • How GPS is implemented in SPM
  • Command line operation of SPM
  • Comments on parallel processing
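To make "LASSO-style regression with more predictors than rows" concrete, here is a minimal sketch using plain cyclic coordinate descent with soft-thresholding. This is a substitute technique for illustration, not the GPS path-following algorithm and not SPM's implementation. The function names, the penalty value, and the synthetic data (30 rows, 60 predictors, only two of which matter) are all assumptions.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, clipping at exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, n_sweeps=100):
    """Minimize (1/2n) * ||y - Xb||^2 + penalty * ||b||_1, with no intercept."""
    n, p = len(rows), len(rows[0])
    coefs = [0.0] * p
    residuals = list(targets)  # equals y - Xb while every coefficient is zero
    col_sq = [sum(row[j] ** 2 for row in rows) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of predictor j with the residual that excludes
            # predictor j's own current contribution.
            rho = (sum(rows[i][j] * residuals[i] for i in range(n)) / n
                   + col_sq[j] * coefs[j])
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residuals[i] -= rows[i][j] * delta
                coefs[j] = new_coef
    return coefs


# Synthetic wide data (assumed): more predictors than rows, two true signals.
rng = random.Random(0)
n_rows, n_predictors = 30, 60
rows = [[rng.gauss(0, 1) for _ in range(n_predictors)] for _ in range(n_rows)]
targets = [3.0 * row[0] - 2.0 * row[1] for row in rows]

coefs = lasso_coordinate_descent(rows, targets, penalty=0.1)
n_selected = sum(1 for c in coefs if c != 0.0)
```

Because the soft-threshold step sets small coefficients to exactly zero, the penalty performs variable selection: `n_selected` counts the predictors that survive, and raising `penalty` shrinks that count until no predictors remain.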

Linking Engines

Explore how to:

  • Combine TreeNet’s power of transformation and variable selection with GPS
  • Identify the most influential trees in GPS with ISLE (Importance Sampled Learning Ensembles)
  • Use Rulefit to identify the most influential nodes and rules in a TreeNet model
3:45pm – 4pm Break
4pm – 5pm

Loose Ends and Application

Join a Q&A with the experts for further discussion, and apply SPM to your own data sets.