Source can be viewed on GitHub

Logistic modeling

First attempt

Begin by loading the data used for the visualizations in part 1.

#### Run Rscript0 to load useful packages and functions ----
source("Rscript0 - libraries.R")
source("f.Wrangle.R")
#### Model data cleanup ----
source("Rscript1a - modeling data appends.R")

Recall:

The existing model (“JMod”) has two components. First, the probability a solicitation \(S_{i}\) comes in before the end of the fiscal year is a function of the current stage progress \(c\) and time until the end of the year \(t\), \(P(S_{i}=1)=f(c_{i},t)\). The expected FRP in the current fiscal year is this probability times the expected amount, \(E(FRP)=P(S_{i}=1)E(S_{i})\).
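For example (hypothetical numbers): under the JMod weights shown below, a solicitation asked in March with an expected amount of \$100k would contribute \(P(S_{i}=1)\,E(S_{i})=\tfrac{1}{3}\times\$100\text{k}\approx\$33\text{k}\) to the current fiscal year forecast.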

Focus on the probability model. I assume that there should also be some effect due to the average close rate, season (e.g. people give more in December), planned ask amount, actual ask amount, and potentially interactions. Something like this:

\[P(S_{ij}=1)=f(c_{i},t,\mu,t_{i}^{*},a_{j}^{*},a_{j})\]

Start by trying to directly predict the booked-in-FY probabilities. The simplest approach is to treat everything that isn’t continuous as a factor and chuck it into a logistic regression model.

\[\text{logit}(p)=\log\Big(\frac{p}{1-p}\Big)=\eta\] \[E(\eta_{i})=c_{i}+t_{i}^{*}\]

In other words, the logit of the expected close rate is dependent on some intercept \(c_{i}\) due to stage, and some month effect \(t_{i}^{*}\).

# Load some functions I wrote to calculate model diagnostics
source("f.Diagnostics.R")
## Model each stage by when it was asked 
glm.pln.1 <- glm(FY.Plan.Book ~ as.factor(month(mdat$Planning.Dt)), data=mdat, family=binomial())
glm.clr.1 <- glm(FY.Clear.Book ~ as.factor(month(mdat$Planning.Dt)), data=mdat, family=binomial())
glm.ask.1 <- glm(FY.Ask.Book ~ as.factor(month(mdat$Planning.Dt)), data=mdat, family=binomial())
glm.ora.1 <- glm(FY.Oral.Book ~ as.factor(month(mdat$Planning.Dt)), data=mdat, family=binomial())
modnames <- c("Plan","Clear","Ask","Oral")
# Compare each of the models
kable(CompareDeviances(models=list(glm.pln.1, glm.clr.1, glm.ask.1, glm.ora.1), modnames=modnames, debug=F), digits=3)
       null.dev      dev    diff rel.diff df.p df.q     p
Plan   1792.257 1609.836 182.421    0.102 1308 1297 0.000
Clear  1647.203 1591.628  55.576    0.034 1192 1181 0.000
Ask    1551.004 1511.474  39.529    0.025 1136 1125 0.000
Oral    297.946  276.139  21.807    0.073  361  350 0.026

Month-by-month coefficients result in an unambiguously better model compared to just an intercept per stage.

Why four separate models rather than a factor for stage? Note that the goal is to determine whether hitting a certain stage results in a solicitation closing that fiscal year; the target \(Y\) changes from stage to stage, so the model needs to change as well.

JMod has the following structure. Cell entries are the probability to close, given that a solicitation is in the indicated stage and month and has an expected close date in that fiscal year:

Stage     July-Feb Mar-Apr  May June
Plan           1/6     1/8    0    0
Clear          1/3     1/6  1/8    0
Ask            2/3     1/3  1/6  1/8
Oral           2/3     1/3  1/6  1/8
Paperwork        1       1    1    1
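Since each cell is just a fixed weight, JMod’s probability component amounts to a table lookup. A minimal sketch of that logic, assuming fiscal months numbered July = 1 through June = 12 (jmod_prob() is a hypothetical name; the actual implementation is in f.JMod.R):

# Hypothetical sketch of the JMod lookup; the real code lives in f.JMod.R
jmod_prob <- function(stage, fiscal.mo) {
  # fiscal.mo: 1 = July, ..., 12 = June
  weights <- rbind(
    Plan      = c("July-Feb" = 1/6, "Mar-Apr" = 1/8, "May" = 0, "June" = 0),
    Clear     = c(1/3, 1/6, 1/8, 0),
    Ask       = c(2/3, 1/3, 1/6, 1/8),
    Oral      = c(2/3, 1/3, 1/6, 1/8),
    Paperwork = c(1, 1, 1, 1))
  period <- cut(fiscal.mo, breaks = c(0, 8, 10, 11, 12), labels = colnames(weights))
  weights[stage, as.character(period)]
}
jmod_prob("Ask", 9)  # an ask in fiscal month 9 (March) gets weight 1/3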

Here are the confusion matrices, with a classification threshold of \(p=.5\):

# Count tables
ConfusionMatrix(list(glm.pln.1, glm.clr.1, glm.ask.1, glm.ora.1), modnames=modnames)
## $Plan
##        pred
## truth   FALSE TRUE  Sum
##   FALSE   501  239  740
##   TRUE    205  364  569
##   Sum     706  603 1309
## 
## $Clear
##        pred
## truth   FALSE TRUE  Sum
##   FALSE   256  296  552
##   TRUE    191  450  641
##   Sum     447  746 1193
## 
## $Ask
##        pred
## truth   FALSE TRUE  Sum
##   FALSE   180  304  484
##   TRUE    165  488  653
##   Sum     345  792 1137
## 
## $Oral
##        pred
## truth   TRUE Sum
##   FALSE   52  52
##   TRUE   310 310
##   Sum    362 362
# Proportion tables
ConfusionMatrix(list(glm.pln.1, glm.clr.1, glm.ask.1, glm.ora.1), modnames=modnames, counts=F)
## $Plan
##        pred
## truth   FALSE TRUE  Sum
##   FALSE  0.38 0.18 0.57
##   TRUE   0.16 0.28 0.43
##   Sum    0.54 0.46 1.00
## 
## $Clear
##        pred
## truth   FALSE TRUE  Sum
##   FALSE  0.21 0.25 0.46
##   TRUE   0.16 0.38 0.54
##   Sum    0.37 0.63 1.00
## 
## $Ask
##        pred
## truth   FALSE TRUE  Sum
##   FALSE  0.16 0.27 0.43
##   TRUE   0.15 0.43 0.57
##   Sum    0.30 0.70 1.00
## 
## $Oral
##        pred
## truth   TRUE  Sum
##   FALSE 0.14 0.14
##   TRUE  0.86 0.86
##   Sum   1.00 1.00
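For reference, the count tables above are just cross-tabulations of the thresholded fitted probabilities against the observed outcomes. A minimal base-R equivalent for the plan stage (not the ConfusionMatrix() implementation from f.Diagnostics.R):

# Cross-tabulate truth against thresholded predictions, with marginal sums
addmargins(table(truth = model.frame(glm.pln.1)[, 1],
                 pred  = fitted(glm.pln.1) > 0.5))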

Compare errors for JMod versus the month intercepts logistic model by stage. My error function is the \(L_{1}\) norm:

\[L_{1}(\epsilon)=\sum_{i=1}^{n}\left|\epsilon_{i}\right| = \sum_{i=1}^{n}\left|\hat{p}_{i}-p_{i}\right|\]

I also included the squared \(L_{2}\) norm (equivalent to \(RSS\) in OLS regression) for comparison, but it seems like an odd metric for range-restricted data.
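In terms of the fitted probabilities and observed outcomes, these norms are simply the following (a sketch, not the code in f.Diagnostics.R):

# Residual norms for the plan-stage glm (sketch)
res.pln <- model.frame(glm.pln.1)[, 1] - fitted(glm.pln.1)
c(L1 = sum(abs(res.pln)),  # sum of absolute residuals
  L2 = sum(res.pln^2))     # squared L2 norm, i.e. RSS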

## Load functions for JMod
source("f.JMod.R")

Plan JMod versus glm residual error and bias:

##        JMod      glm
## L1 525.5833 557.8019
## L2 427.2917 278.9010

JMod wins on the sum of absolute residuals, but it is very negatively biased, to the tune of:

## [1] -0.3449198

Clear JMod versus glm residual error and bias:

##        JMod      glm
## L1 545.8333 566.2442
## L2 367.6667 283.1221

A tiny bit closer on the sum of absolute residuals, but JMod is still negatively biased, around:

## [1] -0.3174071

Ask JMod versus glm residual error and bias:

##        JMod      glm
## L1 454.5833 537.3278
## L2 267.2188 268.6639

A clear win for JMod; the bias is only:

## [1] -0.1302404

Oral JMod versus glm residual error and bias:

##         JMod      glm
## L1 157.12500 83.46150
## L2  92.75174 41.73075

JMod bias is:

## [1] -0.3942219

Basically, this comes down to how concerned we are about bias. The negative bias is due to overaggressive discounting as the end of the fiscal year (June 30) approaches.
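The bias figures reported above are just the mean signed residual; for the plan stage, a sketch (using the jmod.1$prob vector that also appears below):

# Mean signed residual (bias) for JMod at the plan stage -- sketch
mean(jmod.1$prob - mdat$FY.Plan.Book, na.rm = TRUE)  # roughly -0.34, as above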

One more idea for model comparison: why not look at the classification tables directly? Since JMod has hard cutoffs it doesn’t really make sense to use a threshold. Instead, add the predicted probabilities to the corresponding cells of a confusion matrix for each observation.
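Roughly, each observation adds its predicted probability \(\hat{p}_{i}\) to the (truth, TRUE) cell and \(1-\hat{p}_{i}\) to the (truth, FALSE) cell. A sketch of that idea (weighted_confusion() is just an illustrative name; the actual implementation is ConfusionMatrixWeighted() in f.Diagnostics.R):

# Weighted confusion matrix: sum predicted probabilities rather than counting classifications
weighted_confusion <- function(preds, truth) {
  out <- rbind(c(sum(1 - preds[!truth]), sum(preds[!truth])),  # truth = FALSE row
               c(sum(1 - preds[truth]),  sum(preds[truth])))   # truth = TRUE row
  dimnames(out) <- list(truth = c("FALSE", "TRUE"), pred = c("FALSE", "TRUE"))
  out
}
weighted_confusion(fitted(glm.pln.1), model.frame(glm.pln.1)[, 1])  # matches the glm table below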

## Confusion matrix summing probabilities, rather than counting classifications
# GLM for plan stage
ConfusionMatrixWeighted(preds=fitted(glm.pln.1), truth=model.frame(glm.pln.1)[,1], dimnames=c("truth", "pred"))
##        pred
## truth   FALSE  TRUE
##   FALSE 461.1 278.9
##   TRUE  278.9 290.1
# JMod for plan stage
(jmod.1$errtable <- ConfusionMatrixWeighted(preds=jmod.1$prob, truth=mdat$FY.Plan.Book, dimnames=c("truth", "pred")))
##        pred
## truth    FALSE   TRUE
##   FALSE 702.96  37.04
##   TRUE  488.54  80.46

Note that the antidiagonal terms sum to the corresponding \(L_{1}\) norm from the “Plan JMod versus glm residual error and bias” table above: an observation with \(y_{i}=1\) contributes \(1-\hat{p}_{i}=\left|\hat{p}_{i}-y_{i}\right|\) to the (TRUE, FALSE) cell, and an observation with \(y_{i}=0\) contributes \(\hat{p}_{i}=\left|\hat{p}_{i}-y_{i}\right|\) to the (FALSE, TRUE) cell.

Planned solicitations: while the glm is well balanced, JMod overwhelmingly classifies them as unlikely to come in. This could be because JMod is tuned toward directly predicting actual amounts, and uses its weights to discount MG rather than worrying about the close probability of every AF gift.

# GLM for clear stage
ConfusionMatrixWeighted(preds=fitted(glm.clr.1), truth=model.frame(glm.clr.1)[,1], dimnames=c("truth", "pred"))
##        pred
## truth    FALSE   TRUE
##   FALSE 268.88 283.12
##   TRUE  283.12 357.88
# JMod for clear stage
(jmod.2$errtable <- ConfusionMatrixWeighted(preds=na.omit(jmod.2$prob), truth=na.omit(mdat$FY.Clear.Book), dimnames=c("truth", "pred")))
##        pred
## truth    FALSE   TRUE
##   FALSE 468.42  83.58
##   TRUE  462.25 178.75

Same story – for cleared solicitations JMod overwhelmingly categorizes them as unlikely to close.

# GLM for ask stage
ConfusionMatrixWeighted(preds=fitted(glm.ask.1), truth=model.frame(glm.ask.1)[,1], dimnames=c("truth", "pred"))
##        pred
## truth    FALSE   TRUE
##   FALSE 215.34 268.66
##   TRUE  268.66 384.34
# JMod for ask stage
(jmod.3$errtable <- ConfusionMatrixWeighted(preds=na.omit(jmod.3$prob), truth=na.omit(mdat$FY.Ask.Book), dimnames=c("truth", "pred")))
##        pred
## truth    FALSE   TRUE
##   FALSE 330.75 153.25
##   TRUE  301.33 351.67

Same story for asked, but look how much better JMod does in the True-True quadrant, bottom right.

# GLM for oral stage
ConfusionMatrixWeighted(preds=fitted(glm.ora.1), truth=model.frame(glm.ora.1)[,1], dimnames=c("truth", "pred"))
##        pred
## truth    FALSE   TRUE
##   FALSE  10.27  41.73
##   TRUE   41.73 268.27
# JMod for oral stage
(jmod.4$errtable <- ConfusionMatrixWeighted(preds=na.omit(jmod.4$prob), truth=na.omit(mdat$FY.Oral.Book), dimnames=c("truth", "pred")))
##        pred
## truth    FALSE   TRUE
##   FALSE  44.79   7.21
##   TRUE  149.92 160.08

Oral is different; JMod is much too pessimistic. Oral pledges actually behave closer to paperwork in house than to ask, and should mainly be assigned \(p=0\) when scheduled to be closed in a future fiscal year.

Conclusions

  • Oral stage really needs its own weights.
  • The most important variable not currently captured by glm is whether the expected date is in a future year.

Variable selection

Definitely include an indicator for expected date in the future.

From part 1, some interesting variables include the expected amount, and whether the ratio of ask amount to planned ask amount is greater or less than 1.

Potentially interesting derived variables include the number of days left in the fiscal year, the number of days until the expected close, and the amount of time that has passed since the previous stage.

Some other thoughts.

  1. Is logistic regression really suitable (overdispersion)?
  2. Should it be restricted to MG-range gifts, \([\$25\text{k},\$5\text{M})\)?
  3. What nonparametric methods are worth trying?

First, let’s create those derived variables.

#### Derived variables ----
source("Rscript1b - derived variables.R")

Think about #3 first. I’m a fan of classification trees, and that general method is the closest to the existing approach. It would also be interesting to see which of the proposed variables end up being chosen.

Try the random forest approach to select variables. First, split the variables up by the stages at which it is legitimate to use them for prediction.

## Identify which dependent variables contain information specific to the various stages
## Obviously shouldn't include after-the-fact information (FY. objects, actual amount, etc.)
head.all <- names(mderived)[names(mderived) %in% c("Solicitation.Type.Desc", "Planned.Fiscal.Mo", "Planned.Amt", "Expected.Fiscal.Mo", "Expected.Amt", "Planned.Ask.Band")]
# Look for "plan" and "Plan" in header ([pP] = both cases), but ignore the FY. outcome objects
head.plan <- names(mderived)[setdiff(grep("[pP]lan", names(mderived)), grep("FY.", names(mderived)))]
head.plan <- setdiff(head.plan, c(head.all, "plan2clear")) #take out plan2clear which wouldn't have been known
# Look for "Clear" in header
head.clear <- names(mderived)[setdiff(grep("[cC]lear", names(mderived)), grep("FY.", names(mderived)))]
head.clear <- setdiff(head.clear, c(head.all, "clear2ask")) #take out clear2ask which wouldn't have been known
# Look for "Ask" in header
head.ask <- names(mderived)[setdiff(grep("[aA]sk", names(mderived)), grep("FY.", names(mderived)))]
head.ask <- setdiff(head.ask, c(head.all, "ask2oral")) #take out ask2oral which wouldn't have been known
head.ask <- c(head.ask, "plan2clear") #add fields that would have been known
# Look for "Oral" in header
head.oral <- names(mderived)[setdiff(grep("[oO]ral", names(mderived)), grep("FY.", names(mderived)))]
head.oral <- setdiff(head.oral, head.all)
head.oral <- c(head.oral, "plan2clear", "clear2ask", "Ask.Amt", "Ask.Band", "Ask.Amt.Over.Pln.Amt", "Ask.Amt.Over.Pln.Fac") #add fields that would've been known

Now grow a random forest for each stage.

library(randomForest)
set.seed(15584)
# Planned stage forest
treedat <- na.omit(mderived[,c("FY.Plan.Book", head.all, head.plan)])
rf.plan <- randomForest(factor(FY.Plan.Book) ~ ., data=treedat, importance=T)
# Clear stage forest
treedat <- na.omit(mderived[,c("FY.Clear.Book", head.all, head.clear)])
rf.clear <- randomForest(factor(FY.Clear.Book) ~ ., data=treedat, importance=T)
# Ask stage forest
treedat <- na.omit(mderived[,c("FY.Ask.Book", head.all, head.ask)])
rf.ask <- randomForest(factor(FY.Ask.Book) ~ ., data=treedat, importance=T)
# Oral stage forest
treedat <- na.omit(mderived[,c("FY.Oral.Book", head.all, head.oral)])
rf.oral <- randomForest(factor(FY.Oral.Book) ~ ., data=treedat, importance=T)

The green lines show the out-of-bag error (similar in spirit to cross-validation). Note that PlotForest() is included in f.Diagnostics.R.

PlotForest(rf.plan, title="Plan")

PlotForest(rf.clear, title="Clear")

PlotForest(rf.ask, title="Ask")

PlotForest(rf.oral, title="Oral")
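The quantities PlotForest() displays come straight from the randomForest objects, e.g.:

# Out-of-bag error rate at the final number of trees (used in the comparison tables below)
tail(rf.plan$err.rate[, "OOB"], 1)
# Permutation and Gini importance measures for the plan-stage forest
importance(rf.plan)
# randomForest's own importance plot, as an alternative to PlotForest()
varImpPlot(rf.plan, main = "Plan")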

Finally, compare misclassification rates, \(\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{x_{i}\text{ misclassified}\}\): the GLM errors from the count confusion matrix with threshold \(p=0.5\), the JMod errors from the weighted confusion matrix (expected misclassifications), and the out-of-bag random forest errors:

Model         Plan Clear  Ask Oral
GLM           0.34  0.41 0.42 0.14
JMod          0.40  0.46 0.40 0.43
Random forest 0.18  0.23 0.23 0.09

Great out-of-bag error rates across the board. The random forests are definitely doing something right. Which variables do they find the most helpful?
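For the plan stage, the three rows above amount to the following (a sketch of how the table was presumably assembled): the GLM row is the 0.5-threshold misclassification rate, the JMod row is the off-diagonal mass of the weighted confusion matrix divided by n, and the forest row is the out-of-bag error.

# Plan-stage entries of the error table above (sketch)
mean((fitted(glm.pln.1) > 0.5) != model.frame(glm.pln.1)[, 1])          # GLM, about 0.34
(jmod.1$errtable[1, 2] + jmod.1$errtable[2, 1]) / sum(jmod.1$errtable)  # JMod, about 0.40
tail(rf.plan$err.rate[, "OOB"], 1)                                      # random forest, about 0.18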

Re-fit the random forests, cutting out the clearly unimportant variables.

## Update the explanatory variables
exclude <- c("plan2EOFY", "clear2EOFY", "ask2EOFY", "oral2EOFY", "plan2clear", "clear2ask", "ask2oral", "Planned.Ask.Band", "Ask.Band", "Ask.Amt.Over.Pln.Amt", "Ask.Amt.Over.Pln.Fac")
head.all <- setdiff(head.all, exclude)
head.plan <- setdiff(head.plan, exclude)
head.clear <- setdiff(head.clear, exclude)
head.ask <- setdiff(head.ask, exclude)
head.oral <- setdiff(head.oral, exclude)
## Regrow the random forests
set.seed(66461)
# Planned stage forest
treedat <- na.omit(mderived[,c("FY.Plan.Book", head.all, head.plan)])
rf.plan <- randomForest(factor(FY.Plan.Book) ~ ., data=treedat, importance=T)
# Clear stage forest
treedat <- na.omit(mderived[,c("FY.Clear.Book", head.all, head.clear)])
rf.clear <- randomForest(factor(FY.Clear.Book) ~ ., data=treedat, importance=T)
# Ask stage forest
treedat <- na.omit(mderived[,c("FY.Ask.Book", head.all, head.ask)])
rf.ask <- randomForest(factor(FY.Ask.Book) ~ ., data=treedat, importance=T)
# Oral stage forest
treedat <- na.omit(mderived[,c("FY.Oral.Book", head.all, head.oral)])
rf.oral <- randomForest(factor(FY.Oral.Book) ~ ., data=treedat, importance=T)

Plotting the new importances:

And the new errors table:

Model           Plan Clear  Ask Oral
GLM             0.34  0.41 0.42 0.14
JMod            0.40  0.46 0.40 0.43
Random forest   0.18  0.23 0.23 0.09
Random forest 2 0.17  0.22 0.23 0.09

Looks like I might even be able to get away with taking out some more predictors. What if we exclude the redundant stage2expect variables, as well as Expected.Fiscal.Mo and Planned.Fiscal.Mo, which are overwhelmingly December or June?

## Update the explanatory variables
exclude <- c(exclude, "plan2expect", "clear2expect", "ask2expect", "oral2expect", "Expected.Fiscal.Mo", "Planned.Fiscal.Mo")
head.all <- setdiff(head.all, exclude)
head.plan <- setdiff(head.plan, exclude)
head.clear <- setdiff(head.clear, exclude)
head.ask <- setdiff(head.ask, exclude)
head.oral <- setdiff(head.oral, exclude)
## Regrow the random forests
set.seed(48676)
# Planned stage forest
treedat <- na.omit(mderived[,c("FY.Plan.Book", head.all, head.plan)])
rf.plan <- randomForest(factor(FY.Plan.Book) ~ ., data=treedat, importance=T)
# Clear stage forest
treedat <- na.omit(mderived[,c("FY.Clear.Book", head.all, head.clear)])
rf.clear <- randomForest(factor(FY.Clear.Book) ~ ., data=treedat, importance=T)
# Ask stage forest
treedat <- na.omit(mderived[,c("FY.Ask.Book", head.all, head.ask)])
rf.ask <- randomForest(factor(FY.Ask.Book) ~ ., data=treedat, importance=T)
# Oral stage forest
treedat <- na.omit(mderived[,c("FY.Oral.Book", head.all, head.oral)])
rf.oral <- randomForest(factor(FY.Oral.Book) ~ ., data=treedat, importance=T)
## randomForest errors
errors <- rbind(errors, data.frame(Model="Random forest 3", Plan=tail(rf.plan$err.rate[,"OOB"],1), Clear=tail(rf.clear$err.rate[,"OOB"],1), Ask=tail(rf.ask$err.rate[,"OOB"],1), Oral=tail(rf.oral$err.rate[,"OOB"],1), stringsAsFactors=F))
## error table
kable(errors, digits=2)
Model           Plan Clear  Ask Oral
GLM             0.34  0.41 0.42 0.14
JMod            0.40  0.46 0.40 0.43
Random forest   0.18  0.23 0.23 0.09
Random forest 2 0.17  0.22 0.23 0.09
Random forest 3 0.17  0.21 0.23 0.10

Pretty much the same as the other forest methods, so I prefer this last Random forest 3 as the more parsimonious model. Finally, how do random forests compare to JMod when given the same amount of information?

## Regrow the random forests
set.seed(30635)
# Planned stage forest
treedat <- na.omit(mderived[,c("FY.Plan.Book", "planning.fiscal.mo", "plan2EOFY")])
rf.plan.jm <- randomForest(factor(FY.Plan.Book) ~ ., data=treedat, importance=T)
# Clear stage forest
treedat <- na.omit(mderived[,c("FY.Clear.Book", "clear.fiscal.mo", "clear2EOFY")])
rf.clear.jm <- randomForest(factor(FY.Clear.Book) ~ ., data=treedat, importance=T)
# Ask stage forest
treedat <- na.omit(mderived[,c("FY.Ask.Book", "ask.fiscal.mo", "ask2EOFY")])
rf.ask.jm <- randomForest(factor(FY.Ask.Book) ~ ., data=treedat, importance=T)
# Oral stage forest
treedat <- na.omit(mderived[,c("FY.Oral.Book", "oral.fiscal.mo", "oral2EOFY")])
rf.oral.jm <- randomForest(factor(FY.Oral.Book) ~ ., data=treedat, importance=T)
## randomForest errors
errors <- rbind(errors, data.frame(Model="Random forest, JMod fields only", Plan=tail(rf.plan.jm$err.rate[,"OOB"],1), Clear=tail(rf.clear.jm$err.rate[,"OOB"],1), Ask=tail(rf.ask.jm$err.rate[,"OOB"],1), Oral=tail(rf.oral.jm$err.rate[,"OOB"],1), stringsAsFactors=F))
## error table
kable(errors, digits=2)
Model                           Plan Clear  Ask Oral
GLM                             0.34  0.41 0.42 0.14
JMod                            0.40  0.46 0.40 0.43
Random forest                   0.18  0.23 0.23 0.09
Random forest 2                 0.17  0.22 0.23 0.09
Random forest 3                 0.17  0.21 0.23 0.10
Random forest, JMod fields only 0.32  0.43 0.40 0.15
## save to disk for later
write.table(errors, "classification.errors.txt", sep="\t", row.names=F)

Interesting: limiting the random forest to JMod fields (allowing splits only on Stage, stage.fiscal.mo, and stage2EOFY) results in a model that does no better than the basic GLM. Random forest 3 is parsimonious, so it looks like we may have a final set of variables.

Plan

## [1] "Solicitation.Type.Desc" "Planned.Amt"           
## [3] "Expected.Amt"           "Plan.Future"           
## [5] "planning.fiscal.mo"

Clear

## [1] "Solicitation.Type.Desc" "Planned.Amt"           
## [3] "Expected.Amt"           "Clear.Future"          
## [5] "clear.fiscal.mo"

Ask

## [1] "Solicitation.Type.Desc" "Planned.Amt"           
## [3] "Expected.Amt"           "Ask.Amt"               
## [5] "Ask.Future"             "ask.fiscal.mo"

Oral

## [1] "Solicitation.Type.Desc" "Planned.Amt"           
## [3] "Expected.Amt"           "Oral.Future"           
## [5] "oral.fiscal.mo"         "Ask.Amt"

We see the following are consistently important:

  • Solicitation.Type.Desc, Planned.Amt, and Expected.Amt are useful across the board
  • Stage.Future and stage.fiscal.mo for the corresponding stages are useful
  • Ask.Amt is useful in the ask and oral stages

Stage.Future and stage.fiscal.mo are both already used by JMod, so this introduces three new predictors useful for assessing a solicitation’s probability to be booked in a given year (plus Ask.Amt for the last two stages).

Packages used

session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.0 (2016-05-03)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/Chicago             
##  date     2016-05-25
## Packages ------------------------------------------------------------------
##  package      * version date       source        
##  assertthat     0.1     2013-12-06 CRAN (R 3.3.0)
##  colorspace     1.2-6   2015-03-11 CRAN (R 3.3.0)
##  DBI            0.4-1   2016-05-08 CRAN (R 3.3.0)
##  devtools     * 1.11.1  2016-04-21 CRAN (R 3.3.0)
##  digest         0.6.9   2016-01-08 CRAN (R 3.3.0)
##  dplyr        * 0.4.3   2015-09-01 CRAN (R 3.3.0)
##  evaluate       0.9     2016-04-29 CRAN (R 3.3.0)
##  formatR        1.4     2016-05-09 CRAN (R 3.3.0)
##  ggplot2      * 2.1.0   2016-03-01 CRAN (R 3.3.0)
##  gridExtra    * 2.2.1   2016-02-29 CRAN (R 3.3.0)
##  gtable         0.2.0   2016-02-26 CRAN (R 3.3.0)
##  highr          0.6     2016-05-09 CRAN (R 3.3.0)
##  htmltools      0.3.5   2016-03-21 CRAN (R 3.3.0)
##  knitr        * 1.13    2016-05-09 CRAN (R 3.3.0)
##  labeling       0.3     2014-08-23 CRAN (R 3.3.0)
##  lazyeval       0.1.10  2015-01-02 CRAN (R 3.3.0)
##  lubridate    * 1.5.6   2016-04-06 CRAN (R 3.3.0)
##  magrittr       1.5     2014-11-22 CRAN (R 3.3.0)
##  memoise        1.0.0   2016-01-29 CRAN (R 3.3.0)
##  munsell        0.4.3   2016-02-13 CRAN (R 3.3.0)
##  plyr           1.8.3   2015-06-12 CRAN (R 3.3.0)
##  R6             2.1.2   2016-01-26 CRAN (R 3.3.0)
##  randomForest * 4.6-12  2015-10-07 CRAN (R 3.3.0)
##  Rcpp           0.12.4  2016-03-26 CRAN (R 3.3.0)
##  rmarkdown      0.9.6   2016-05-01 CRAN (R 3.3.0)
##  scales       * 0.4.0   2016-02-26 CRAN (R 3.3.0)
##  stringi        1.0-1   2015-10-22 CRAN (R 3.3.0)
##  stringr        1.0.0   2015-04-30 CRAN (R 3.3.0)
##  tidyr        * 0.4.1   2016-02-05 CRAN (R 3.3.0)
##  withr          1.0.1   2016-02-04 CRAN (R 3.3.0)
##  yaml           2.1.13  2014-06-12 CRAN (R 3.3.0)