Comprehensive Quiz
Multiple-choice checks and explain-it-back prompts across core analytics topics
Instructions
Time: 90-120 minutes suggested
Two question types:
- Multiple-choice (MC): Choose the best answer. Some are “select all that apply.”
- Feynman (F): Explain the concept as if teaching someone unfamiliar. Write your answer on paper or out loud, then click the hint and model answer to compare.
Answer reveals: Click the collapsed callout below each question to see the answer. For Feynman questions, try the Hint first before checking the Model Answer.
Feynman Scoring Rubric
Rate each of your Feynman explanations:
| Level | Score | Criteria |
|---|---|---|
| Incomplete | 0 | Cannot explain the concept, or explanation has fundamental errors |
| Surface | 1 | Correct but uses jargon without unpacking it, or skips the “why” |
| Feynman | 2 | A non-expert could follow your explanation. Uses concrete examples. Addresses WHY, not just WHAT. No hand-waving. |
Targets: 80%+ on MC questions AND average Feynman score \(\geq\) 1.5
1. Classification Foundations
Q1. SVM — Support Vectors
In a trained SVM classifier, you remove a data point that is NOT a support vector. What happens to the decision boundary?
(c) The boundary does not change. Only support vectors determine the decision boundary. All other points are irrelevant to the solution. This is a key property of SVMs — the boundary depends on a small subset of data, making it robust to non-boundary points.
Q2. SVM — Kernel Choice
Your data has two classes arranged in concentric circles (one class forms a ring around the other). Which kernel is most appropriate?
(c) RBF kernel. Concentric circles are not linearly separable in the original 2D space. The RBF kernel maps points into a higher-dimensional space where a linear separator exists. A polynomial kernel of degree 1 is just a linear kernel. Higher-degree polynomials might work for some configurations, but RBF is the standard choice for radially symmetric boundaries.
Q3. KNN — Curse of Dimensionality
You have a KNN model with 1,000 training points and 2 features that achieves 88% validation accuracy. You add 50 more features (total: 52) without adding more data. What is the most likely outcome?
(b) Accuracy drops. This is the curse of dimensionality. In high-dimensional space, all points become roughly equidistant from each other. With 52 features and only 1,000 points, the space is extremely sparse — “nearest” neighbors are no longer meaningfully near. KNN relies on meaningful distances, so it degrades severely in high dimensions without proportionally more data.
Q4. Classification — Misclassification Costs
A hospital builds a classifier to screen blood donations for a rare disease. Which error is more costly?
(b) False negative. Passing contaminated blood to a patient could be fatal. A false positive wastes a blood unit and requires retesting — inconvenient and costly, but not life-threatening. This is a classic asymmetric misclassification cost scenario. The classifier should be tuned to minimize false negatives even at the expense of more false positives.
F1. Explain: Why Must You Scale Features Before Using SVM or KNN?
A colleague says “I have income ($20K-$200K) and number of children (0-6) as features. Why can’t I just use them as-is?” Explain why scaling is mandatory and what happens without it.
Write your explanation, then check below.
Think about what “distance” means when one feature is measured in tens of thousands and another in single digits. Which feature dominates the distance calculation?
Both SVM and KNN use distance between points to make decisions. Distance is calculated using something like the Euclidean formula: \(\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}\).
If income ranges from 20,000 to 200,000 and children ranges from 0 to 6, the income differences are on the order of 100,000 while children differences are at most 6. When you square these and add them, income contributes roughly \(10^{10}\) while children contributes at most 36. The children feature is effectively invisible.
The model would make decisions based almost entirely on income, even if number of children is a strong predictor. Scaling (e.g., standardizing to mean 0 and standard deviation 1) puts both features on equal footing so the algorithm can learn which features actually matter.
This applies to SVM, KNN, k-means, and PCA — any method that uses distances or magnitudes.
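To make the dominance concrete, here is a minimal sketch (with made-up income/children values) comparing raw and standardized Euclidean distances using scikit-learn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [income in $, number of children]
X = np.array([[40_000, 0], [45_000, 6], [150_000, 0]], dtype=float)

# Raw Euclidean distances from customer 0: income dominates, children are invisible
d_raw = np.linalg.norm(X - X[0], axis=1)
print(d_raw.round(1))      # [0.0, 5000.0, 110000.0] — the 6-child gap barely registers

# After standardizing, both features contribute on comparable scales
Z = StandardScaler().fit_transform(X)
d_scaled = np.linalg.norm(Z - Z[0], axis=1)
print(d_scaled.round(2))   # the 6-child gap now matters about as much as the $110K gap
```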
F2. Explain: What Are Support Vectors and Why Do They Matter?
Explain to someone who has never taken a statistics class what “support vectors” are and why the SVM only cares about them.
Think of the decision boundary as a fence between two groups. Which data points determine where the fence goes?
Imagine you have red dots and blue dots on a table and you want to draw a line separating them. You could draw many possible lines, but SVM picks the line that maximizes the gap (margin) between the two groups.
The support vectors are the few points sitting right at the edge of this gap — the closest red points to the blue side and the closest blue points to the red side. They “support” the boundary like tent poles holding up a tent.
Why do only these matter? Because if you moved or removed any point that is far from the boundary, nothing would change — the gap is still determined by those edge points. But move a support vector and the entire boundary shifts. This means:
- SVM is efficient — it only needs to track a small number of critical points
- SVM is robust — it ignores noise far from the boundary
- Adding more data far from the boundary doesn’t change the model
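As a small illustration, here is a sketch using scikit-learn's SVC on hypothetical two-blob data — after fitting, the support_vectors_ attribute exposes exactly those boundary points:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: two blobs of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 0.5, size=(50, 2)),
               rng.normal([2, 2], 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the handful of points nearest the margin define the boundary
print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_)
```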
F3. Explain: KNN vs. SVM — When Would You Choose Each?
A friend asks “Both SVM and KNN do classification. When would you pick one over the other?” Give concrete scenarios.
Think about: dataset size, number of features, whether you need to explain the model, and what the decision boundary shape might look like.
Choose KNN when:
- You have a small dataset and want a quick, simple model
- The decision boundary is irregular (KNN adapts to any shape naturally)
- You don’t mind slow predictions (KNN stores all data and computes distances at prediction time — “lazy learner”)
Choose SVM when:
- You have many features (SVM handles high dimensions better than KNN)
- You want a fast model at prediction time (SVM only uses support vectors)
- The classes have a clear margin of separation
- You can afford more training time (fitting the optimization problem)
Neither is universally better. KNN is simpler to understand and implement but scales poorly with data size and dimensionality. SVM is more computationally expensive to train but produces a compact model. Both require feature scaling.
2. Validation & Clustering
Q5. Cross-Validation — Data Leakage
You scale all your features to mean 0 and standard deviation 1 using the entire dataset, then perform 10-fold cross-validation. What is wrong with this approach?
(b) Data leakage. When you compute the mean and standard deviation from the entire dataset, the validation fold’s data influences the scaling parameters. This means validation data “leaks” into training. The correct approach is to compute scaling parameters from the training folds only, then apply those parameters to the validation fold. This prevents the model from having any indirect knowledge of the validation data.
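One common way to enforce this (a sketch, assuming scikit-learn and its built-in breast-cancer dataset as a stand-in) is to put the scaler inside a Pipeline so that each CV split re-fits it on the training folds only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The scaler lives inside the pipeline, so within each of the 10 folds it is
# fit on the training portion only and then applied to the validation fold.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean())
```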
Q6. Cross-Validation — LOOCV Trade-off
Leave-one-out cross-validation (LOOCV) uses n-1 points for training in each fold. Why might 10-fold CV produce a better estimate of model performance than LOOCV?
(b) High variance due to correlated models. In LOOCV, each fold uses n-1 of the same n points, so the n trained models are nearly identical. Their validation scores are highly correlated, meaning the average of those scores has high variance (averaging correlated numbers reduces variance less than averaging uncorrelated numbers). 10-fold CV creates more diverse models (each trained on only 90% of data), producing a more stable estimate. LOOCV has lower bias but higher variance.
Q7. Clustering — k-means Failure
Which dataset would cause k-means to fail?
(b) Interlocking crescents. k-means assigns points to the nearest centroid, which creates spherical (convex) cluster boundaries. Crescent-shaped clusters are non-convex — the centroid of a crescent is outside the crescent itself. k-means would split each crescent roughly in half and merge halves from different crescents. This is a fundamental limitation of centroid-based clustering.
F4. Explain: Why Can’t You Evaluate a Model on Its Training Data?
Explain to a classmate why 100% training accuracy does not mean you have a good model. Use a concrete analogy.
Think about memorizing answers to a practice test versus understanding the material for a new scenario.
Imagine a student who memorizes the answer key to a practice set word-for-word. They score 100% on that exact practice set — but give them a new scenario with different questions and they fail.
Training accuracy is like scoring yourself on the practice set you memorized. The model has seen every training point and can “memorize” them (e.g., KNN with k=1 literally stores every point and achieves 100% training accuracy). But the real question is: can it handle new, unseen data?
That’s what validation accuracy measures. You hold out data the model has never seen and test on it. If training accuracy is 98% but validation accuracy is 62%, the model memorized the training data without learning the underlying pattern. This is overfitting.
The validation score is the truth. Training accuracy tells you almost nothing about how the model will perform in the real world.
F5. Explain: k-means vs. KNN — They Sound Similar But Are Completely Different
Someone confuses k-means and KNN because both have a “k.” Explain the difference clearly.
One is supervised, one is unsupervised. What does “k” mean in each?
Despite both having a “k,” these are fundamentally different:
KNN (K-Nearest Neighbors) is supervised — you have labeled data (you know the answer for each training point). To classify a new point, you find the k closest training points and take a majority vote. The “k” is the number of neighbors to consult. You choose k using cross-validation.
k-means is unsupervised — you have NO labels. You want to discover natural groupings. The algorithm places k centroids, assigns each point to its nearest centroid, moves centroids to the mean of their assigned points, and repeats until stable. The “k” is the number of clusters to find. You choose k using the elbow method.
Quick test: Does your data have labels (categories you want to predict)? Use KNN. No labels and you want to find structure? Use k-means.
F6. Explain: The Elbow Method for Choosing k
Your manager asks “How do you know how many clusters to use?” Walk them through the elbow method.
What do you plot on the x-axis and y-axis? What does the “elbow” look like and what does it mean?
You run k-means for k=1, k=2, k=3, and so on. For each k, you record the total within-cluster distance (how spread out the points are within their assigned clusters). Plot k on the x-axis and total distance on the y-axis.
As k increases, the distance always decreases (more clusters = smaller clusters = less spread). But the rate of decrease changes. At first, adding a cluster helps a lot (the curve drops steeply). Eventually, adding more clusters barely helps (the curve flattens out).
The “elbow” is where the curve bends — where adding another cluster stops giving you much improvement. That’s your suggested k.
Important caveats: (1) the elbow is not always obvious, (2) there is no single correct k, and (3) you should interpret clusters for business meaning, not just pick a number blindly.
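A minimal sketch of the procedure, assuming scikit-learn and synthetic blob data — KMeans exposes the total within-cluster sum of squares as inertia_:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Total within-cluster sum of squares (inertia_) for k = 1..9
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
# Plot k on the x-axis vs. inertia on the y-axis and look for the bend;
# with four true blobs, the elbow should appear near k = 4.
```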
3. Data Preparation & Outliers
Q8. Outlier Handling — Philosophy
A weather sensor records temperatures for a pharmaceutical shipping company. One reading shows 150°F during a summer heatwave in Arizona. What should you do first?
(c) Investigate first. The key outlier philosophy is “it depends.” 150°F is unrealistic for ambient temperature (world record is 134°F), suggesting a sensor malfunction — but you must verify. If the sensor was in direct sunlight on a truck surface, 150°F might be real. The investigation determines whether this is bad data (sensor error), a real but unpredictable event, or a systematic issue. Each case requires different handling.
Q9. Outlier Types
A hospital monitors patient heart rates. Patient A shows a rate of 220 bpm (extremely high but physiologically possible during a seizure). Patient B’s ECG flatlines for 3 seconds mid-recording before resuming normally. Which outlier types are these?
(c) A is a point outlier, B is a collective outlier. Patient A’s single extreme value (220 bpm) is a point outlier — one data point far from the rest. Patient B’s ECG flatline is a collective outlier — no single zero-value reading is necessarily unusual, but a sequence of flatline readings together is abnormal. A contextual outlier would be a normal value appearing in an unusual context (e.g., 100°F body temperature in a healthy person vs. an ICU patient).
Q10. Outlier — Two-Model Approach
Your sales data shows occasional extreme spikes (Black Friday, viral social media events). A single regression model either underpredicts during spikes or overpredicts during normal periods. What approach addresses this?
(b) Two-model approach. First, use logistic regression to estimate the probability of a spike event based on features (day of year, marketing spend, social media activity). Second, build separate predictive models for normal conditions and spike conditions. This avoids forcing one model to handle fundamentally different behaviors. Removing spikes (a) discards real information. ARIMA (c) handles regular seasonality but not irregular spikes. Box-Cox (d) addresses unequal variance, not bimodal behavior.
F7. Explain: “It Depends” — The Outlier Investigation Framework
You detect an outlier in manufacturing data. Walk through the three questions you should ask before deciding what to do with it.
The three categories are: bad data, real-but-unpredictable, and real-and-systematic. Each leads to a different action.
When you find an outlier, ask these three questions in order:
1. Is it bad data? Did a sensor malfunction? Was there a data entry error? Did a system glitch produce impossible values (negative temperatures in Kelvin, ages over 200)? If yes, either remove it or impute a reasonable value. This is the only case where removal is clearly justified.
2. Is it real but unpredictable? Did something genuinely unusual happen that is unlikely to repeat? Example: Chick-fil-A had a massive sales spike due to a one-time controversy. The data point is real, but including it in your model would distort normal predictions. You might remove it and note the event, or build a separate model for extraordinary events.
3. Is it real and systematic? Is this outlier caused by a factor your model should capture? Example: A shipping company sees extreme temperature readings during summer in Arizona — that’s a real, recurring pattern. Removing it would make the model dangerously optimistic. You should keep it and potentially add features (season, location) to explain it.
The default answer is always “investigate first.” Even experienced analysts can misidentify which category an outlier falls into.
F8. Explain: Why Might Removing Real Outliers Be Dangerous?
A colleague says “outliers mess up the model, so I always remove them.” Explain why this can be worse than keeping them.
Think about a model used for safety-critical decisions. What happens when the real world produces an event the model has never seen?
Consider a pharmaceutical company shipping temperature-sensitive medicine. Their historical data includes a few extreme heat events during transport. If you remove those outliers, the model predicts smooth, moderate temperatures — and the company designs packaging for normal conditions only.
Then a real heat event happens. The packaging fails, the medicine degrades, and patients receive ineffective drugs. The model was “cleaner” without outliers but dangerously optimistic about real-world conditions.
Removing real outliers teaches the model that extreme events don’t exist. But they do. A robust model should either account for them directly (include them in training) or acknowledge them through a two-model approach (one model for normal conditions, one that flags high-risk situations).
The lesson: removing outliers isn’t cleaning your data — it might be hiding the most important information in it.
4. Change Detection — CUSUM
Q11. CUSUM — Formula Mechanics
Given the CUSUM formula \(S_t = \max(0, S_{t-1} + (x_t - \mu) - C)\) with \(\mu = 100\), \(C = 5\), \(T = 12\), and \(S_0 = 0\): if the next three observations are \(x_1 = 108\), \(x_2 = 95\), \(x_3 = 112\), what are \(S_1\), \(S_2\), and \(S_3\)?
(a)
- \(S_1 = \max(0, 0 + (108 - 100) - 5) = \max(0, 3) = 3\)
- \(S_2 = \max(0, 3 + (95 - 100) - 5) = \max(0, -7) = 0\) (reset to zero)
- \(S_3 = \max(0, 0 + (112 - 100) - 5) = \max(0, 7) = 7\)
No alarm since \(S_t < T = 12\) for all t. Note how \(S_2\) resets to 0 — the \(\max(0, \ldots)\) prevents negative accumulation, so the CUSUM only tracks sustained upward shifts.
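The recursion is easy to reproduce in a few lines; this sketch (a hypothetical helper, not part of any library) replays the calculation above:

```python
def cusum(xs, mu, C, T, s0=0.0):
    """Return the running CUSUM values and whether the threshold T was reached."""
    s, out = s0, []
    for x in xs:
        s = max(0.0, s + (x - mu) - C)   # accumulate excess above mu + C, never negative
        out.append(s)
    return out, any(v >= T for v in out)

values, alarm = cusum([108, 95, 112], mu=100, C=5, T=12)
print(values, alarm)   # [3.0, 0.0, 7.0] False — matches S1 = 3, S2 = 0, S3 = 7, no alarm
```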
Q12. CUSUM — Parameter Trade-off
A factory manager says “I’m getting too many false alarms from CUSUM but I also need to detect real changes quickly.” What is the fundamental problem with this request?
(b) Fundamental trade-off. This is the core tension in CUSUM. Increasing T (threshold) or C (allowance) reduces false alarms but also means real changes take longer to detect. Decreasing them catches changes faster but triggers more false alarms. There is no free lunch — the right balance depends on the cost of a false alarm vs. the cost of delayed detection. A nuclear power plant prioritizes fast detection (low C, low T). A marketing team may tolerate slower detection to avoid costly false reactions (high C, high T).
Q13. CUSUM — Limitations
CUSUM detects a significant shift in a manufacturing process. Your manager asks “What caused the change?” Can CUSUM answer this?
(b) CUSUM detects change, not cause. CUSUM is a monitoring tool that signals when a process mean has shifted. It cannot explain why. The shift could be a new supplier, a machine wearing out, a seasonal effect, or anything else. Investigation is always needed after a CUSUM alarm. This is a general principle: models detect patterns, not explanations. Causation requires domain knowledge and controlled experiments.
F9. Explain: The CUSUM Formula in Plain English
Walk through \(S_t = \max(0, S_{t-1} + (x_t - \mu) - C)\) for someone who has never seen it. Explain each piece and why the \(\max(0, \ldots)\) matters.
Think of \(S_t\) as a running score that increases when observations are above normal and resets when things look fine.
Think of CUSUM as a suspicion meter:
- \(\mu\) is the “normal” value — what you expect the process to produce.
- \(x_t - \mu\) is how far today’s observation is from normal. Positive means above normal.
- \(C\) is the allowance — how much above-normal you’re willing to tolerate before getting suspicious. Small random fluctuations below C are ignored.
- \(S_{t-1}\) is yesterday’s suspicion level. Today’s suspicion builds on yesterday’s.
- \(\max(0, \ldots)\) means suspicion never goes negative. If a good observation drives the formula negative, it resets to zero. Without this, a long stretch of below-normal values would create a “buffer” that masks a future real shift.
- \(T\) (threshold): when suspicion \(S_t\) exceeds \(T\), you sound the alarm.
The genius of CUSUM is that it accumulates evidence. A single observation 2 units above normal might be noise. But five consecutive observations 2 units above normal (with \(C = 1\)) build \(S_t\) to \(5 \times (2-1) = 5\). CUSUM catches sustained shifts that individual measurements would miss.
F10. Explain: How Would You Set C and T for Different Contexts?
A nuclear power plant and a retail marketing team both want to use CUSUM. How should each set their parameters, and why?
What is the cost of a false alarm vs. the cost of a missed detection for each organization?
Nuclear power plant:
- Cost of missed detection: Catastrophic — meltdown, radiation exposure, loss of life
- Cost of false alarm: Expensive (shutdown, investigation) but manageable
- Settings: Low C (small allowance, suspicious of any deviation) and low T (low threshold, trigger alarm quickly). Accept many false alarms to ensure no real shift goes undetected.
Retail marketing team:
- Cost of missed detection: Moderate — a campaign underperforms for a few days before you notice
- Cost of false alarm: Wasted budget pulling a campaign that was actually fine, team disruption
- Settings: Higher C (tolerate normal variation in sales) and higher T (require strong evidence before reacting). Accept slower detection to avoid knee-jerk reactions to normal fluctuations.
The lesson: there are no universally “correct” parameter values. C and T encode your organization’s risk tolerance and the relative cost of each type of error. The same math, very different settings.
5. Time Series Forecasting
Q14. Exponential Smoothing — Naming
Why is it called “exponential” smoothing?
(b) Exponentially decaying weights. When you expand the recursive formula, an observation from \(k\) periods ago receives weight \(\alpha(1-\alpha)^k\). Since \(0 < (1-\alpha) < 1\), this weight decreases exponentially as \(k\) increases. Recent observations get the most weight, but old observations are never completely forgotten (their weight approaches zero but never reaches it).
Q15. Holt-Winters — Component Matching
In triple exponential smoothing (Holt-Winters), which parameter controls which component?
(b) \(\alpha\) controls the level (baseline value), \(\beta\) controls the trend (upward/downward direction), and \(\gamma\) controls the seasonality (repeating patterns). Simple exponential smoothing uses only \(\alpha\). Adding a trend requires Holt’s method (\(\alpha\) + \(\beta\)). Adding seasonality requires the full Holt-Winters (\(\alpha\) + \(\beta\) + \(\gamma\)).
Q16. ARIMA — Equivalence
What is ARIMA(0,1,1) equivalent to?
(b) Simple exponential smoothing. ARIMA(0,1,1) means: 0 autoregressive terms, 1 differencing step, and 1 moving average term. This mathematical specification produces forecasts equivalent to simple exponential smoothing. A random walk would be ARIMA(0,1,0). This fact highlights that exponential smoothing and ARIMA are different frameworks for the same underlying patterns.
Q17. Model Choice — Volatility vs. Values
A hedge fund wants two things: (1) forecast tomorrow’s stock price, and (2) forecast how volatile the market will be next week. Which models?
(c) ARIMA or exponential smoothing forecast values (tomorrow’s price). GARCH forecasts variance/volatility (how much the price is expected to fluctuate). GARCH does not predict direction — it predicts the magnitude of uncertainty. These are complementary models answering different questions.
F11. Explain: ARIMA vs. Exponential Smoothing — When to Use Each
Your team is deciding between ARIMA and exponential smoothing for quarterly sales forecasts. Explain the trade-offs.
Think about: dataset size, noise level, and complexity of the pattern.
Exponential smoothing is simpler and more robust:
- Works well with short time series (even 10-20 observations)
- Handles noisy data and outliers gracefully (recent data gets high weight, old outliers fade away)
- Few parameters to tune (\(\alpha\), optionally \(\beta\) and \(\gamma\))
- Good default choice when you’re unsure
ARIMA is more flexible and powerful:
- Needs more data (typically 40+ observations) to reliably estimate parameters
- Can model complex autocorrelation patterns that exponential smoothing cannot
- Requires stationarity (differencing handles trends, but you need to verify)
- More parameters (p, d, q) that require careful selection (often via AIC/BIC)
Rule of thumb: Start with exponential smoothing. If you have plenty of data and the residuals show patterns that exponential smoothing misses, try ARIMA. For most business forecasting with limited data, exponential smoothing is the pragmatic choice.
Fun fact: ARIMA(0,1,1) is mathematically equivalent to simple exponential smoothing — they’re different frameworks that can produce the same result.
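For orientation, here is a hedged sketch using statsmodels on a made-up quarterly series — Holt-Winters via ExponentialSmoothing and an ARIMA(0,1,1) fit side by side (the series, trend/seasonal settings, and orders are illustrative assumptions, not recommendations):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical quarterly sales: trend + seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(40)
y = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 4) + rng.normal(0, 3, size=40))

# Holt-Winters: level (alpha), trend (beta), seasonality (gamma)
hw = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=4).fit()
print(hw.forecast(4))

# ARIMA(0,1,1): 0 AR terms, 1 differencing step, 1 MA term
arima = ARIMA(y, order=(0, 1, 1)).fit()
print(arima.forecast(4))
```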
F12. Explain: What Does GARCH Forecast?
Your manager says “Let’s use GARCH to predict next month’s revenue.” Explain why this is a misunderstanding.
GARCH forecasts _______, not _______.
GARCH forecasts variance (volatility), not values.
If your manager asks “What will revenue be next month?” — GARCH cannot answer that. Use ARIMA or exponential smoothing.
But if your manager asks “How uncertain is next month’s revenue? Should we hold extra cash reserves?” — GARCH is the right tool. It models how the size of fluctuations changes over time. In financial markets, volatility clusters: big price swings tend to follow big swings, and calm periods follow calm periods. GARCH captures this pattern.
Think of it this way: ARIMA tells you the weather forecast is 72°F. GARCH tells you whether to trust that forecast — if volatility is high, the actual temperature might range from 60°F to 84°F; if volatility is low, maybe 70°F to 74°F.
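To see what “modeling the variance” means mechanically, here is a minimal simulation of a GARCH(1,1) recursion with illustrative parameters (a sketch, not a fitted model — in practice a library such as arch estimates the parameters from data):

```python
import numpy as np

# GARCH(1,1): the conditional variance, not the level, follows a recursion
# sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
rng = np.random.default_rng(0)
omega, alpha, beta = 0.1, 0.15, 0.80          # illustrative parameter values
n = 1000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha - beta)         # start at the unconditional variance
for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# The series has roughly zero mean throughout — nothing here predicts direction.
# What clusters is the width of the swings: large shocks raise sigma2, so more
# large shocks tend to follow, which is exactly what GARCH forecasts.
print(eps[:5].round(3), sigma2[:5].round(3))
```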
6. Regression
Q18. Regression — Practical vs. Statistical Significance
A company runs a regression on 500,000 customer records and finds: revenue = 10,000 + 0.002 * email_opens, with \(p < 0.001\) and \(R^2 = 0.001\). What should they conclude?
(b) Statistically significant but practically meaningless. With 500,000 records, even tiny effects achieve tiny p-values. The p-value tells you the relationship is unlikely due to chance, but \(R^2 = 0.001\) means email opens explain only 0.1% of revenue variation. The coefficient of 0.002 means each additional email open is associated with $0.002 more revenue. Statistically real? Yes. Worth acting on? Almost certainly not. Large samples make everything significant — practical significance requires judgment.
Q19. Regression — Causation Trap
A city finds a strong correlation (\(r = 0.92\)) between ice cream sales and drowning deaths across summer months. A city council member proposes restricting ice cream sales near beaches. What is wrong with this reasoning?
(b) Correlation \(\neq\) causation. Hot weather increases both ice cream purchases and swimming activity (which increases drowning risk). Temperature is the confounding variable driving both. Restricting ice cream would not reduce drownings. Establishing causation requires: (1) temporal precedence (cause before effect), (2) a plausible mechanism, and (3) ruling out confounders — ideally through controlled experiments. Regression alone cannot establish any of these.
Q20. Regression — Adjusted R-squared
You add 15 random noise variables (generated from random numbers with no relationship to the response) to a regression model. What happens to \(R^2\) and adjusted \(R^2\)?
(c) \(R^2\) always increases (or stays the same) when you add predictors, even random noise. It’s a mathematical property — more variables can only reduce residual variance on the training data, even by accident. Adjusted \(R^2\) penalizes model complexity, so adding useless variables causes the penalty to outweigh the tiny \(R^2\) gain, and adjusted \(R^2\) decreases. This is why adjusted \(R^2\) (or AIC/BIC) is preferred for model comparison over raw \(R^2\).
Q21. Regression — Residual Diagnostics
Your regression residual plot shows a clear fan shape (residuals spread wider as the fitted values increase). What does this indicate and what is the fix?
(c) Heteroscedasticity. The fan shape means the variance of errors is not constant — it increases with the fitted value. This violates a key regression assumption. Box-Cox transformation addresses this by finding a power transformation (\(y^\lambda\)) that stabilizes the variance. The special case \(\lambda = 0\) corresponds to \(\log(y)\), which is common for financial data where variance scales with magnitude.
F13. Explain: Why Doesn’t a Significant Regression Prove Causation?
A researcher shows you a regression with \(p < 0.001\) and says “This proves X causes Y.” Explain why they are wrong, using the ice cream and drowning example.
What three conditions are needed for causation? Which ones does regression check?
Regression tells you that two variables move together and that the pattern is unlikely due to chance. That’s it. It does not establish causation.
Causation requires three things that regression cannot verify:
Temporal precedence: The cause must come before the effect. Regression uses simultaneous data — it doesn’t know which happened first.
Plausible mechanism: There must be a logical reason why X would cause Y. Ice cream sales and drowning deaths correlate (\(r = 0.92\)) because hot weather drives both. There is no mechanism by which buying ice cream makes someone drown.
No confounders: You must rule out third variables that drive both X and Y. Temperature is the confounder here. Without controlling for it, the regression just measures the shadow of a hidden variable.
A p-value of 0.001 means that if there were truly no relationship, a correlation this strong would appear by chance only about 0.1% of the time. It says nothing about why the correlation exists. The only reliable way to establish causation is a controlled experiment (randomize who gets X, measure Y, control everything else).
F14. Explain: R-squared vs. Adjusted R-squared
A junior analyst shows you a model with \(R^2 = 0.92\) and 47 predictors. They say “This is a great model!” Explain why you’re not convinced.
What happens to \(R^2\) when you keep adding variables, even useless ones?
\(R^2\) measures what fraction of the response variable’s variation the model explains. Sounds great — higher is better, right?
The problem: \(R^2\) always increases when you add more predictors, even if those predictors are random noise. With 47 predictors, the model has enough flexibility to fit the training data well by chance. An \(R^2\) of 0.92 with 47 predictors might drop to 0.40 on new data.
Adjusted \(R^2\) fixes this by penalizing model complexity. It asks: “Is the improvement in fit worth the cost of adding this variable?” If a new variable doesn’t improve the model enough to justify its inclusion, adjusted \(R^2\) goes down even though \(R^2\) went up.
You should check: (1) adjusted \(R^2\) or AIC/BIC, (2) validation accuracy on held-out data, and (3) whether 47 predictors is reasonable for the problem. A model with 5 predictors and \(R^2 = 0.80\) might generalize far better than one with 47 predictors and \(R^2 = 0.92\).
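A quick numerical sketch of this effect, assuming scikit-learn, one genuinely predictive feature, and columns of pure noise added on top:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))
y = 3 * x[:, 0] + rng.normal(size=n)               # one real predictor

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

for extra in (0, 15, 45):
    X = np.hstack([x, rng.normal(size=(n, extra))])  # append pure-noise predictors
    r2 = LinearRegression().fit(X, y).score(X, y)    # training-data R^2
    print(extra, round(r2, 3), round(adjusted_r2(r2, n, X.shape[1]), 3))
# R^2 creeps upward as noise columns are added; adjusted R^2 penalizes the extra
# columns and stays flat or drifts down.
```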
7. Transformations & PCA
Q22. PCA — The Critical Weakness
You apply PCA to 50 correlated stock-market features and keep the first 3 principal components (explaining 95% of total variance in X). Your prediction model using these 3 components performs poorly. What is the most likely explanation?
(b) PCA optimizes for X variance, not Y prediction. This is PCA’s critical weakness. The components that explain the most variance in the predictors (X) might capture market-wide trends (like “stocks go up together”) that don’t predict your specific target (Y). The low-variance components you discarded might capture subtle sector-specific signals that are exactly what predicts Y. PCA does not know Y exists. Always validate PCA-reduced models against the original.
Q23. Box-Cox — Lambda Values
In a Box-Cox transformation, what does \(\lambda = 0\) mean?
(c) Log transformation. The Box-Cox transformation is \((y^\lambda - 1)/\lambda\) for \(\lambda \neq 0\) and \(\log(y)\) for \(\lambda = 0\) — the \(\lambda = 0\) case is defined as the limit of \((y^\lambda - 1)/\lambda\) as \(\lambda \to 0\), which is \(\log(y)\). Other common values: \(\lambda = 0.5\) corresponds to a square-root transform, \(\lambda = -1\) to a reciprocal. Software finds the optimal \(\lambda\) automatically.
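In practice the optimal \(\lambda\) is found numerically; a sketch using scipy.stats.boxcox on made-up right-skewed data:

```python
from numpy.random import default_rng
from scipy import stats

# Hypothetical right-skewed positive data (e.g., revenue per customer)
rng = default_rng(0)
y = rng.lognormal(mean=3, sigma=0.8, size=500)

# scipy searches for the lambda that makes the transformed data most normal-looking
y_transformed, lam = stats.boxcox(y)
print("estimated lambda:", round(lam, 2))   # near 0 here, i.e. close to a log transform
# Reference points: lambda = 0.5 ~ square root, 0 = log, -1 ~ reciprocal
```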
Q24. PCA — When to Use
In which scenario is PCA most helpful?
(b) PCA shines when you have many correlated predictors (multicollinearity) and especially when predictors outnumber observations. With 200 correlated predictors and only 50 observations, regression would fail (more unknowns than equations). PCA reduces 200 correlated variables to a handful of uncorrelated components, making the problem tractable. Scenario (a) doesn’t need PCA (predictors are already uncorrelated). Scenario (c) — PCA actually reduces interpretability. Scenario (d) calls for logistic regression.
F15. Explain: Why Must You Standardize Before PCA?
A colleague runs PCA on raw data where income is in dollars and age is in years. Explain why this is a problem.
PCA finds directions of maximum variance. What happens when one variable’s variance is millions while another’s is tens?
PCA looks for the direction in which the data varies the most. If income ranges from $20,000 to $200,000 and age ranges from 18 to 80, income has variance on the order of billions (in squared dollars) while age has variance around hundreds (in squared years).
Without standardizing, the first principal component will essentially be “income” because that’s where the most raw variance lives. Age is invisible — not because it’s unimportant, but because its numbers are smaller.
Standardizing (subtracting the mean and dividing by the standard deviation) puts every variable on the same scale: mean 0, standard deviation 1. Now PCA can find the directions of maximum variation without being dominated by whichever variable happens to have the largest units. This is the same reason you scale before SVM and KNN — these methods all use magnitudes or distances, so scale matters.
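A short sketch of the effect, assuming scikit-learn and made-up income/age data — compare the explained-variance split with and without standardizing:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: income in dollars, age in years (independent of each other)
rng = np.random.default_rng(0)
income = rng.normal(90_000, 40_000, size=500)
age = rng.normal(45, 15, size=500)
X = np.column_stack([income, age])

# Without standardizing, PC1 is essentially "income": its raw variance dominates
print(PCA().fit(X).explained_variance_ratio_)   # ~[1.0, 0.0]

# After standardizing, the variance splits roughly evenly across components
print(PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_)
```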
F16. Explain: PCA Components — What Are You Actually Keeping?
Your manager asks “You reduced 20 features to 5 using PCA. What are those 5 things?” Explain what principal components represent.
Components are linear combinations of the original features, not a subset of them.
Principal components are not 5 of your original 20 features. They are 5 new features, each one a weighted combination of all 20 originals.
Think of it like mixing paint colors. You start with 20 specific paint colors (features). PCA doesn’t pick 5 colors — it creates 5 new colors by blending all 20 in different proportions. The first blend (PC1) captures the most variation in your data. The second blend (PC2) captures the most remaining variation, and so on.
Each component’s “recipe” (which original features contribute most) is given by the eigenvector. The “importance” of each component (how much variation it captures) is given by the eigenvalue.
The trade-off: you’ve reduced 20 dimensions to 5, making your model faster and avoiding multicollinearity. But you’ve lost interpretability — “a coefficient of 3.2 on PC1” is harder to explain than “a coefficient of 3.2 on income.” This is the compression cost.
8. Trees, Forests & Logistic Regression
Q25. CART — Splitting Criterion
How does a classification tree (CART) decide which feature to split on at each node?
(b) Exhaustive search. At each node, CART evaluates every possible feature and every possible split point for that feature. For each candidate split, it computes the resulting impurity (Gini for classification, variance for regression) in the two child nodes. It picks the split that produces the greatest reduction in impurity. This greedy, exhaustive approach is why trees are computationally straightforward but can be slow with many features.
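A stripped-down sketch of that search for a single numeric feature (hypothetical helper functions, classification case with Gini impurity):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Try every threshold on one feature; return the split with the largest
    weighted impurity reduction (CART repeats this for every feature)."""
    parent = gini(y)
    best_threshold, best_gain = None, 0.0
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if parent - weighted > best_gain:
            best_threshold, best_gain = threshold, parent - weighted
    return best_threshold, best_gain

x = np.array([22, 25, 31, 40, 52, 60])   # hypothetical ages
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))                   # threshold 31 separates the classes cleanly
```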
Q26. CART — Overfitting
A decision tree achieves 100% accuracy on 500 training points. It has 200 leaf nodes. What is wrong?
(b) Overfitting. A practical heuristic is that each leaf should contain at least 5% of the training data — for 500 points, that’s at least 25 points per leaf. With 200 leaves averaging 2.5 points each, the tree has essentially memorized the training data. It will perform poorly on new data. The fix is pruning: grow the tree fully, then prune back nodes that don’t improve validation accuracy.
Q27. CART — Scale Invariance
Unlike SVM and KNN, CART does not require feature scaling. Why?
(b) One feature at a time. CART splits on a single feature at each node. It asks “Is income > $50,000?” or “Is age > 30?” — each split involves only one feature’s values. The scale of income doesn’t interact with the scale of age because they’re never combined in a distance calculation. SVM and KNN compute distances between points using all features simultaneously, so unequal scales create unequal influence. CART avoids this entirely.
Q28. Random Forest — Why It Works
Why does averaging 500 overfit trees (random forest) produce better predictions than a single carefully pruned tree?
(b) Diverse overfitting cancels out. Each tree is fit on a bootstrap sample (resampled from the data with replacement), and at each split considers only a random subset of features (a common heuristic is \(1 + \log_2(n)\), where \(n\) is the total number of features). This ensures the 500 trees are different — they overfit to different patterns and different noise. When you average their predictions, the idiosyncratic errors cancel out while the real signal reinforces. No pruning is needed because the averaging itself smooths out overfitting.
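As a rough illustration, a sketch using scikit-learn on its built-in breast-cancer data — note that max_features="sqrt" is one common per-split heuristic, not the only choice:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One fully grown, unpruned tree vs. 500 unpruned trees averaged together
single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(
    n_estimators=500,      # many diverse trees
    bootstrap=True,        # each tree sees a different resample of the data
    max_features="sqrt",   # each split considers a random subset of features
    random_state=0,
)

print(cross_val_score(single_tree, X, y, cv=10).mean())
print(cross_val_score(forest, X, y, cv=10).mean())   # typically noticeably higher
```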
Q29. Confusion Matrix — Cost Analysis
Two spam filters are evaluated on 1,000 emails (600 spam, 400 legitimate):
| | Filter A | Filter B |
|---|---|---|
| True Positives (spam caught) | 540 | 580 |
| False Positives (legit marked spam) | 40 | 100 |
| False Negatives (spam missed) | 60 | 20 |
| True Negatives (legit passed) | 360 | 300 |
A missed spam costs $1 (annoyance). A legitimate email marked as spam costs $50 (missed business opportunity). Which filter has lower total cost?
(a) Filter A: $2,060.
- Filter A: \((60 \times \$1) + (40 \times \$50) = \$60 + \$2,000 = \$2,060\)
- Filter B: \((20 \times \$1) + (100 \times \$50) = \$20 + \$5,000 = \$5,020\)
Filter B catches more spam (580 vs. 540), yet its overall accuracy is lower (88% vs. 90%) and it misclassifies 2.5x more legitimate emails. When the cost of a false positive ($50) far exceeds the cost of a false negative ($1), the filter that blocks fewer legitimate emails wins by a wide margin. Accuracy and spam-catch rate alone are misleading when misclassification costs are asymmetric — compare expected cost instead.
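The cost arithmetic is simple enough to script; a sketch reproducing the totals from the table:

```python
# Cost comparison straight from the confusion-matrix counts above
cost_fn, cost_fp = 1, 50     # missed spam vs. legitimate email blocked

filters = {"A": {"fn": 60, "fp": 40}, "B": {"fn": 20, "fp": 100}}
for name, f in filters.items():
    total = f["fn"] * cost_fn + f["fp"] * cost_fp
    print(name, total)       # A: 2060, B: 5020
```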
F17. Explain: Why Do Random Forests Sacrifice Interpretability?
A financial regulator says “I need to understand WHY your model denied this loan.” Explain why you can’t answer this with a random forest, and what model you’d use instead.
Can you trace a single decision path through 500 trees? What does a single CART tree give you instead?
A single decision tree gives you a clear narrative: “The loan was denied because income < $40K AND credit score < 620 AND debt-to-income ratio > 0.5.” You can trace the exact path from root to leaf and explain each decision point. A loan officer can point to the specific rules and say “here’s why.”
A random forest averages the predictions of 500 different trees. Each tree was built on a different random sample of data and considered different random subsets of features at each split. The trees disagree with each other — some might approve the loan, others deny it. The final answer is just the majority vote.
You cannot trace a meaningful narrative through 500 trees. You can report “variable importance” (which features were used most across all trees), but that tells you what matters in general, not why this specific loan was denied.
In finance and healthcare, explainability is often a legal requirement. When you must explain individual decisions, use a single CART tree, logistic regression, or linear regression. When prediction accuracy matters more than explanation (and regulations allow it), random forests are powerful.
F18. Explain: CART’s Grow-and-Prune Strategy
Explain why CART first grows a tree as deep as possible and then prunes it back, rather than stopping growth early.
What if a weak split at level 3 enables a very strong split at level 4? What happens if you stop at level 3?
Imagine a tree where splitting on “zipcode” at level 3 barely improves impurity. If you stop growing early (“this split isn’t good enough”), you’d never discover that within that zipcode group, splitting on “income” at level 4 produces nearly pure nodes. The level-3 split was weak by itself but essential as a stepping stone.
So CART takes a two-phase approach:
Phase 1 — Grow: Build the tree as deep as possible, splitting until leaves are pure or too small. This tree is overfit (possibly 100% training accuracy with tiny leaves), but it hasn’t missed any important splits.
Phase 2 — Prune: Walk back up the tree and remove splits that don’t improve validation accuracy. Use a pruning threshold: if removing a split reduces training accuracy by less than \(\Delta\), prune it (the complexity isn’t worth the tiny gain). Also enforce a minimum leaf size (heuristic: at least 5% of training data per leaf).
The result: you keep the important deep splits while removing the noise. This is better than stopping early because you never prematurely close off branches that might contain valuable structure.
F19. Explain: Logistic Regression — Why Not Just Use Linear Regression for Binary Outcomes?
Someone asks “If I want to predict yes/no, can’t I just use regular linear regression with 0 and 1 as the response?” Explain why logistic regression exists.
What values can linear regression predict? What values make sense for a probability?
Linear regression predicts unbounded numbers: \(\hat{y} = a_0 + a_1 x_1 + \ldots\) can produce any value from \(-\infty\) to \(+\infty\). If you use 0 and 1 as the response, the model might predict \(-0.3\) or \(1.7\) for some inputs. What does a probability of \(-0.3\) or \(1.7\) mean? Nothing — probabilities must be between 0 and 1.
Logistic regression fixes this by passing the linear combination through a sigmoid function: \(P(Y=1) = \frac{1}{1 + e^{-(a_0 + a_1 x_1 + \ldots)}}\). The sigmoid squashes any input into the (0, 1) range, so the output is always a valid probability.
Additionally, logistic regression handles the fact that the relationship between predictors and probability is typically S-shaped, not a straight line. As study hours increase from 0 to 100, the probability of passing goes from near 0 to near 1, but it doesn’t increase linearly — it accelerates in the middle and flattens at the extremes. The sigmoid naturally captures this shape.
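A small sketch contrasting the two on made-up study-hours data (scikit-learn; the pass/fail generating process is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical pass/fail outcomes as a function of study hours
rng = np.random.default_rng(0)
hours = rng.uniform(0, 100, size=200).reshape(-1, 1)
true_prob = 1 / (1 + np.exp(-(hours[:, 0] - 50) / 8))   # S-shaped relationship
passed = (rng.random(200) < true_prob).astype(int)

# Linear regression on 0/1 labels can output values outside [0, 1]
lin = LinearRegression().fit(hours, passed)
print(lin.predict([[0], [100]]))              # may fall below 0 or above 1

# Logistic regression passes the linear score through a sigmoid: always a valid probability
log = LogisticRegression().fit(hours, passed)
print(log.predict_proba([[0], [100]])[:, 1])  # bounded in (0, 1)
```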
9. Cross-Module Integration
Q31. Supervised vs. Unsupervised — Choosing the Framework
Match each scenario to the correct approach:
Select all correct pairings:
(a) and (c) are correct.
- No labels + want to discover groups = unsupervised = k-means
- Labeled data (default yes/no) calls for supervised classification (SVM, logistic, CART) — not k-means
- Labels + probability output = logistic regression
- Detecting drift in a process over time = CUSUM, not SVM. SVM classifies individual points, not temporal shifts.
Q32. Model Selection — Interpretability Constraint
A bank must legally explain why each loan application was approved or denied. They have labeled historical data with 15 features. Which model is most appropriate?
(c) CART or logistic regression. When explainability is a legal requirement, black-box models (random forests, SVM with non-linear kernels, KNN) are inappropriate even if they predict better. A single CART tree provides explicit if-then rules (“denied because income < $40K and debt ratio > 0.4”). Logistic regression provides coefficients showing each feature’s contribution. In finance and healthcare, the ability to explain individual decisions often outweighs marginal gains in accuracy.
Q33. Transformation Sequencing
Your regression residuals show a fan shape (unequal variance) AND a U-shaped pattern (non-linearity). In what order should you apply fixes?
(a) Fix non-linearity first. The apparent heteroscedasticity (fan shape) might be a symptom of the non-linear relationship, not a separate problem. If you add polynomial terms or other non-linear transformations and the fan shape disappears, you’ve solved both issues. If the fan shape persists after fixing non-linearity, then apply Box-Cox. Applying Box-Cox first to data with a non-linear relationship can obscure the underlying structure.
F20. Explain: How Would You Build a Complete Predictive System?
Your company wants to predict which customers will cancel their subscription next month AND understand why. You have 200 features, many correlated, and 10,000 labeled records. Walk through your modeling pipeline.
Think about: feature reduction, model choice for probability + interpretability, validation strategy.
Here’s a step-by-step pipeline:
1. Data preparation: Check for outliers (investigate, don’t auto-remove). Scale features if needed for distance-based methods. Handle missing values.
2. Dimensionality reduction: 200 correlated features is too many for direct modeling (curse of dimensionality, multicollinearity). Apply PCA to reduce to a manageable number of components (use scree plot to choose). Keep perhaps 10-20 components that capture 90%+ of variance.
3. Model selection: You need both probability output AND interpretability:
- For probability: Logistic regression on PCA components gives you a churn probability for each customer (bounded 0-1).
- For interpretability: Build a single CART tree on the original (non-PCA) features. This gives you rules like “customers with usage < 5 hrs/week AND tenure < 6 months have 78% churn probability.” The tree won’t be as accurate as the logistic model, but it explains the “why.”
- Optional: Random forest for maximum prediction accuracy, if you only need aggregate variable importance (not individual explanations).
4. Validation: Use 10-fold cross-validation to estimate real-world accuracy. Never evaluate on training data. Hold out a final test set that you touch only once.
5. Deployment decision: Use the logistic/PCA model for automated scoring (who to target with retention offers). Use the CART tree for business stakeholder presentations (why customers leave).
The key insight: you might use multiple models for different purposes. Prediction accuracy and interpretability often require different tools.
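A condensed sketch of steps 2-4, assuming scikit-learn and synthetic stand-in data (the feature counts and parameters are illustrative, not prescriptive):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the churn data: 10,000 records, 200 correlated features
X, y = make_classification(n_samples=10_000, n_features=200, n_informative=20,
                           n_redundant=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scoring model: scale -> PCA -> logistic regression, all inside one pipeline so the
# preprocessing is re-fit on training folds only during cross-validation (no leakage)
scorer = make_pipeline(StandardScaler(), PCA(n_components=20),
                       LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(scorer, X_train, y_train, cv=10).mean())

# Explanation model: a small tree on the original features for stakeholder-readable rules
explainer = DecisionTreeClassifier(max_depth=3,
                                   min_samples_leaf=int(0.05 * len(X_train)))
explainer.fit(X_train, y_train)

# Final, touch-once check on the held-out test set
scorer.fit(X_train, y_train)
print("Held-out accuracy:", scorer.score(X_test, y_test))
```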
F21. Explain: The Three Types of Analytics Questions
A new data science hire asks “What kinds of questions can analytics answer?” Explain the three types with examples.
Descriptive, predictive, prescriptive — and they build on each other.
Analytics answers three types of questions, each building on the previous:
1. Descriptive — “What happened?” Looking backward at historical data. Examples: “What were last quarter’s sales by region?” “Which products had the highest return rate?” Tools: dashboards, summary statistics, visualizations. This is the foundation — you can’t predict or prescribe without first understanding what happened.
2. Predictive — “What will happen?” Using patterns in historical data to forecast the future. Examples: “Which customers will churn next month?” (logistic regression) “What will Q3 revenue be?” (ARIMA, exponential smoothing) “Is this transaction fraudulent?” (SVM, CART). Most predictive modeling work sits here.
3. Prescriptive — “What should we do?” Recommending actions to achieve a desired outcome. Examples: “What price maximizes profit?” “Which patients should receive the expensive treatment?” “How should we allocate marketing budget across channels?” Tools: optimization, simulation, decision analysis. This is the hardest type and requires combining predictive models with business constraints and objectives.
Each level requires the previous: you can’t predict without descriptive understanding, and you can’t prescribe without predictions. Most organizations are still working on moving from descriptive to predictive.
Self-Assessment Scorecard
Fill in your scores:
| Section | MC Score | MC Total | Feynman Avg | Feynman Total |
|---|---|---|---|---|
| 1. Classification | ___ | /4 | ___ | /3 |
| 2. Validation & Clustering | ___ | /3 | ___ | /3 |
| 3. Data Prep & Outliers | ___ | /3 | ___ | /2 |
| 4. Change Detection | ___ | /3 | ___ | /2 |
| 5. Time Series | ___ | /4 | ___ | /2 |
| 6. Regression | ___ | /4 | ___ | /2 |
| 7. Transformations & PCA | ___ | /3 | ___ | /2 |
| 8. Trees & Forests | ___ | /5 | ___ | /3 |
| 9. Cross-Module Integration | ___ | /4 | ___ | /2 |
| Totals | ___ | /33 | ___ | /21 |
MC percentage: ___ / 33 = ___%
Feynman average: Total Feynman points / 21 = ___
Mastery Check
- 80%+ MC AND Feynman avg \(\geq\) 1.5 \(\rightarrow\) Analytics Ready (Level 3)
- 60-79% MC OR Feynman avg 1.0-1.4 \(\rightarrow\) Review weak areas below
- Below 60% MC OR Feynman avg < 1.0 \(\rightarrow\) Revisit the linked walkthroughs before retesting
If You Scored Low…
| Section | Review These Materials |
|---|---|
| 1. Classification | SVM, KNN |
| 2. Validation & Clustering | Cross-Validation, K-Means |
| 3. Data Prep & Outliers | Review missingness and transformation ideas in Missing Data and PCA & Box-Cox |
| 4. Change Detection | CUSUM |
| 5. Time Series | Time Series |
| 6. Regression | Regression |
| 7. Transformations & PCA | PCA & Box-Cox |
| 8. Trees & Forests | CART, Advanced Topics |
| 9. Cross-Module | Start at the home page and choose the topic by modeling goal |