Satisfaction surveys are traditionally interpreted by building linear or logistic regression models. Overcoming data problems such as multicollinearity and missing data requires significant data cleansing and feature creation. Even after these steps are complete, the final results must still be interpreted. Logistic regression models, and even linear regression models, are notoriously difficult to interpret beyond indicating general trends: they do not tell us why an individual is predicted to have a particular level of satisfaction. Building decision trees with CART (Classification and Regression Trees), on the other hand, is a more effective approach because it handles correlated variables and missing data automatically and provides a transparent interpretation of the behavior of individuals.
In this application, several thousand YMCA surveys were analyzed to understand which members were most satisfied with the club, most likely to renew their membership, and most likely to recommend the club to a friend. The analysis determined which YMCA branch attributes were most associated with each of these target variables.
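To make the idea of a transparent, rule-based split concrete, here is a minimal sketch of the kind of search CART performs at a single tree node: it tries candidate thresholds on each survey question and keeps the one with the lowest weighted Gini impurity. The survey records, the question names (`staff`, `cleanliness`), and the target `satisfied` are hypothetical and are not taken from the YMCA data. Missing answers are simply skipped when a question is evaluated, which is a simplification assumed for brevity; CART itself handles missing values with surrogate splits.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target, features):
    """Return (weighted_gini, feature, threshold) for the best binary split.

    Rows whose value for a feature is None are skipped while that feature
    is evaluated. This is a simplification, not CART's surrogate splits.
    """
    best = None
    for f in features:
        known = [r for r in rows if r[f] is not None]
        values = sorted({r[f] for r in known})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[target] for r in known if r[f] <= t]
            right = [r[target] for r in known if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(known)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


# Hypothetical survey responses (1-5 ratings); None marks an unanswered question.
surveys = [
    {"staff": 5, "cleanliness": 4, "satisfied": "yes"},
    {"staff": 4, "cleanliness": None, "satisfied": "yes"},
    {"staff": 4, "cleanliness": 2, "satisfied": "yes"},
    {"staff": 2, "cleanliness": 4, "satisfied": "no"},
    {"staff": 1, "cleanliness": 2, "satisfied": "no"},
    {"staff": None, "cleanliness": 3, "satisfied": "no"},
]

score, feature, threshold = best_split(surveys, "satisfied", ["staff", "cleanliness"])
print(f"Best split: {feature} <= {threshold} (weighted Gini = {score:.3f})")
```

The chosen split reads directly as a rule about individual respondents, which is the kind of interpretability the regression models lack. A full tree would apply the same search recursively to each resulting subset.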