import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Get wage data
from sklearn.datasets import fetch_openml
= fetch_openml(data_id=534, as_frame=True)
survey
= survey.data[survey.feature_names]
X
= survey.target y
Exercise Set 6: Explainability
In this exercise set, we will be looking at explainability.
Data
The dataset we will be looking at this time is a dataset regarding the determinants of wages from the 1985 Current Population Survey (Berndt, ER. The Practice of Econometrics. 1991. NY: Addison-Wesley), which is open source and available at OpenML. As last time, you’re welcome to use a dataset of your own.
Load data
Here we load our input data into a DataFrame
called X
and our target data into a Series
called y
.
Here we describe the data using both the documentation which came with the data, but also by computing summary statistics for the input data and target data.
print(survey.DESCR)
**Author**:
**Source**: Unknown - Date unknown
**Please cite**:
Determinants of Wages from the 1985 Current Population Survey
Summary:
The Current Population Survey (CPS) is used to supplement census information between census years. These data consist of a random sample of 534 persons from the CPS, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. We wish to determine (i) whether wages are related to these characteristics and (ii) whether there is a gender gap in wages.
Based on residual plots, wages were log-transformed to stabilize the variance. Age and work experience were almost perfectly correlated (r=.98). Multiple regression of log wages against sex, age, years of education, work experience, union membership, southern residence, and occupational status showed that these covariates were related to wages (pooled F test, p < .0001). The effect of age was not significant after controlling for experience. Standardized residual plots showed no patterns, except for one large outlier with lower wages than expected. This was a male, with 22 years of experience and 12 years of education, in a management position, who lived in the north and was not a union member. Removing this person from the analysis did not substantially change the results, so that the final model included the entire sample.
Adjusting for all other variables in the model, females earned 81% (75%, 88%) the wages of males (p < .0001). Wages increased 41% (28%, 56%) for every 5 additional years of education (p < .0001). They increased by 11% (7%, 14%) for every additional 10 years of experience (p < .0001). Union members were paid 23% (12%, 36%) more than non-union members (p < .0001). Northerns were paid 11% (2%, 20%) more than southerns (p =.016). Management and professional positions were paid most, and service and clerical positions were paid least (pooled F-test, p < .0001). Overall variance explained was R2 = .35.
In summary, many factors describe the variations in wages: occupational status, years of experience, years of education, sex, union membership and region of residence. However, despite adjustment for all factors that were available, there still appeared to be a gender gap in wages. There is no readily available explanation for this gender gap.
Authorization: Public Domain
Reference: Berndt, ER. The Practice of Econometrics. 1991. NY: Addison-Wesley.
Description: The datafile contains 534 observations on 11 variables sampled from the Current Population Survey of 1985. This data set demonstrates multiple regression, confounding, transformations, multicollinearity, categorical variables, ANOVA, pooled tests of significance, interactions and model building strategies.
Variable names in order from left to right:
EDUCATION: Number of years of education.
SOUTH: Indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere).
SEX: Indicator variable for sex (1=Female, 0=Male).
EXPERIENCE: Number of years of work experience.
UNION: Indicator variable for union membership (1=Union member, 0=Not union member).
WAGE: Wage (dollars per hour).
AGE: Age (years).
RACE: Race (1=Other, 2=Hispanic, 3=White).
OCCUPATION: Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other).
SECTOR: Sector (0=Other, 1=Manufacturing, 2=Construction).
MARR: Marital Status (0=Unmarried, 1=Married)
Therese Stukel
Dartmouth Hitchcock Medical Center
One Medical Center Dr.
Lebanon, NH 03756
e-mail: stukel@dartmouth.edu
Information about the dataset
CLASSTYPE: numeric
CLASSINDEX: none specific
Downloaded from openml.org.
="all") X.describe(include
EDUCATION | SOUTH | SEX | EXPERIENCE | UNION | AGE | RACE | OCCUPATION | SECTOR | MARR | |
---|---|---|---|---|---|---|---|---|---|---|
count | 534.000000 | 534 | 534 | 534.000000 | 534 | 534.000000 | 534 | 534 | 534 | 534 |
unique | NaN | 2 | 2 | NaN | 2 | NaN | 3 | 6 | 3 | 2 |
top | NaN | no | male | NaN | not_member | NaN | White | Other | Other | Married |
freq | NaN | 378 | 289 | NaN | 438 | NaN | 440 | 156 | 411 | 350 |
mean | 13.018727 | NaN | NaN | 17.822097 | NaN | 36.833333 | NaN | NaN | NaN | NaN |
std | 2.615373 | NaN | NaN | 12.379710 | NaN | 11.726573 | NaN | NaN | NaN | NaN |
min | 2.000000 | NaN | NaN | 0.000000 | NaN | 18.000000 | NaN | NaN | NaN | NaN |
25% | 12.000000 | NaN | NaN | 8.000000 | NaN | 28.000000 | NaN | NaN | NaN | NaN |
50% | 12.000000 | NaN | NaN | 15.000000 | NaN | 35.000000 | NaN | NaN | NaN | NaN |
75% | 15.000000 | NaN | NaN | 26.000000 | NaN | 44.000000 | NaN | NaN | NaN | NaN |
max | 18.000000 | NaN | NaN | 55.000000 | NaN | 64.000000 | NaN | NaN | NaN | NaN |
y.describe()
count 534.000000
mean 9.024064
std 5.139097
min 1.000000
25% 5.250000
50% 7.780000
75% 11.250000
max 44.500000
Name: WAGE, dtype: float64
In this exercise set, we will not perform hyperparameteroptimization, as this simplifies the code.
However, some methods can only explain the training data, whereas some can explain both the training and test data. To examine this, we split up the sample.
Exercise 1.1
Fill in the missing code such that we split the data into 80% train and 20% test.
Hints:
We have previously looked at datasplitting. Try looking at last sessions exercises.
from sklearn.model_selection import FILL IN
= FILL IN(FILL IN, random_state=73) X_train, X_test, y_train, y_test
### BEGIN SOLUTION
from sklearn.model_selection import train_test_split
= train_test_split(X, y, train_size = 0.8, random_state=73)
X_train, X_test, y_train, y_test
### END SOLUTION
Linear regression
In this section of the exercises, we will be looking at (regularized) linear regression.
We will be looking at different plots, and how regularizing linear regression with LASSO can create sparse models.
Exercise 2.1
What does this code do? Fill in the missing comments.
Hints:
We have not used this functionality before.
If you’re in doubt, try looking up the documentation.
Think about how numerical and categorical variables should be encoded.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
#
= ["RACE", "OCCUPATION", "SECTOR", "MARR", "UNION", "SEX", "SOUTH"]
categorical_columns = ["EDUCATION", "EXPERIENCE", "AGE"]
numerical_columns
#
= OneHotEncoder()
categorical_encoder
#
= ColumnTransformer(
col_transformer
["cat", categorical_encoder, categorical_columns)
(
],=False,
verbose_feature_names_out='passthrough'
remainder )
### BEGIN SOLUTION
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
## Split columns into categorical and numerical
= ["RACE", "OCCUPATION", "SECTOR", "MARR", "UNION", "SEX", "SOUTH"]
categorical_columns = ["EDUCATION", "EXPERIENCE", "AGE"]
numerical_columns
# Create a method to one hot encode the categorical features
= OneHotEncoder()
categorical_encoder
# Create a column transformer which one hot encodes categorical features and does nothing with the remainder
= ColumnTransformer(
col_transformer
["cat", categorical_encoder, categorical_columns)
(
],=False,
verbose_feature_names_out='passthrough'
remainder
)
### END SOLUTION
To start out with, we only introduce a slight amount of regularization.
Exercise 2.2
Fill in the missing code and comments to implement a LASSO regression with \(\alpha=0.0001\).
Hints:
We created a transformer for categorical columns in the previous exercise.
We have looked at this in previous sessions. Try looking at those.
from sklearn.pipeline import FILL IN
from sklearn.linear_model import FILL IN
from sklearn.preprocessing import FILL IN
#
= Pipeline([
lasso_model 'preprocessor',FILL IN),
('scaler', FILL IN),
('regr', FILL IN)
(
]
)
#
lasso_model.fit(X_train, y_train)
#
= lasso_model[:-1].get_feature_names_out()
feature_names
#
= pd.DataFrame(
lasso_coefs -1].coef_,
lasso_model[=["Coefficients"],
columns=feature_names,
index
)
lasso_coefs
c:\Users\wkg579\.conda\envs\vive_env\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:648: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.181e+02, tolerance: 9.878e-01
model = cd_fast.enet_coordinate_descent(
Coefficients | |
---|---|
RACE_Hispanic | -0.1641 |
RACE_Other | -0.1768 |
RACE_White | 0.0528 |
OCCUPATION_Clerical | -0.1391 |
OCCUPATION_Management | 0.7714 |
OCCUPATION_Other | -0.0206 |
OCCUPATION_Professional | 0.6706 |
OCCUPATION_Sales | -0.2746 |
OCCUPATION_Service | -0.2886 |
SECTOR_Construction | 0.0598 |
SECTOR_Manufacturing | 0.3674 |
SECTOR_Other | -0.0614 |
MARR_Married | 0.2284 |
MARR_Unmarried | -0.0000 |
UNION_member | 0.5786 |
UNION_not_member | -0.0000 |
SEX_female | -0.8502 |
SEX_male | 0.0000 |
SOUTH_no | 0.0680 |
SOUTH_yes | -0.0000 |
EDUCATION | 1.7499 |
EXPERIENCE | 1.0402 |
AGE | 0.1142 |
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
# Create a pipeline which
# - one hot encodes categorical features
# - standardscales everything
# - implements a lasso
= Pipeline([
lasso_model 'preprocessor', col_transformer),
('scaler', StandardScaler()),
('regr', Lasso(alpha=0.0001))
(
]
)
# Fit to training data
lasso_model.fit(X_train, y_train)
# Get the feature names from the last step (LASSO)
= lasso_model[:-1].get_feature_names_out()
feature_names
# Create a df with coefficients
= pd.DataFrame(
lasso_coefs -1].coef_,
lasso_model[=["Coefficients"],
columns=feature_names,
index
)
= '{:,.10f}'.format
pd.options.display.float_format lasso_coefs
c:\Users\wkg579\.conda\envs\vive_env\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:648: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.181e+02, tolerance: 9.878e-01
model = cd_fast.enet_coordinate_descent(
Coefficients | |
---|---|
RACE_Hispanic | -0.1640990547 |
RACE_Other | -0.1767820499 |
RACE_White | 0.0528335182 |
OCCUPATION_Clerical | -0.1390938486 |
OCCUPATION_Management | 0.7713821430 |
OCCUPATION_Other | -0.0205680735 |
OCCUPATION_Professional | 0.6706017443 |
OCCUPATION_Sales | -0.2745718105 |
OCCUPATION_Service | -0.2886302041 |
SECTOR_Construction | 0.0598077036 |
SECTOR_Manufacturing | 0.3673974096 |
SECTOR_Other | -0.0613621789 |
MARR_Married | 0.2284015432 |
MARR_Unmarried | -0.0000000000 |
UNION_member | 0.5786492961 |
UNION_not_member | -0.0000000000 |
SEX_female | -0.8502390471 |
SEX_male | 0.0000000000 |
SOUTH_no | 0.0680407634 |
SOUTH_yes | -0.0000000000 |
EDUCATION | 1.7498552116 |
EXPERIENCE | 1.0401587185 |
AGE | 0.1142108659 |
As we know from session 4, LASSO creates sparse models due to the corner solutions stemming from L1 normalization.
Exercise 2.3
Looking at the coefficients from exercise 2.2, which coefficients are equal to zero? Why is the second line of code necessary to get the answer we intuitively agree with?
Hints:
Think about numerical accuracy and computers.
print('Equal to zero')
print(lasso_coefs.loc[lasso_coefs.Coefficients == 0])
print()
print('Close to zero')
print(lasso_coefs.loc[np.isclose(lasso_coefs.Coefficients, 0)])
Equal to zero
Coefficients
SEX_male 0.0000000000
Close to zero
Coefficients
MARR_Unmarried -0.0000000000
UNION_not_member -0.0000000000
SEX_male 0.0000000000
SOUTH_yes -0.0000000000
### BEGIN SOLUTION
# This is due to float precision. The coefficients are very small, but techically not zero due to how computers work with floating point numbers.
# If one wants true zeroes, one could create a new model with the found coefficients changed to integer zero.
### END SOLUTION
Exercise 2.4
The following code creates a weight plot for the coefficients. How should these be interpreted? Fill in the missing comments.
Hints:
On what scale are the features measured?
= plt.subplots(1,1, figsize=(10,10))
fig, ax #
=lasso_coefs.reset_index(), x='Coefficients', y='index', join=False, ax=ax)
sns.pointplot(data#
"Lasso model, low regularization")
plt.title(#
=0, color="grey", linestyle='--')
plt.axvline(x#
"Coefficient values")
plt.xlabel('')
plt.ylabel( plt.show()
Exercise 2.5
The following code creates an effect plot for the model. How should these be interpreted? Fill in the missing comments.
Hints:
On what scale are the effects measured?
What does
[:-1]
and[-1]
select from the pipeline?
#
= lasso_model[:-1].transform(X_train) * lasso_model[-1].coef_
lasso_contributions = pd.DataFrame(lasso_contributions, columns=feature_names)
df_lasso_contributions
#
= plt.subplots(1, 1, figsize=(10,10))
fig, ax =df_lasso_contributions, orient='h', ax=ax)
sns.boxplot(data=0, color="grey", linestyle='--')
plt.axvline(x'Prediction contribution')
plt.xlabel( plt.show()
### BEGIN SOLUTION
# Calulate contributions as transformed covariates (both columntransformer and standardscaler) times coefficents (last pipeline step)
= lasso_model[:-1].transform(X_train) * lasso_model[-1].coef_
lasso_contributions = pd.DataFrame(lasso_contributions, columns=feature_names)
df_lasso_contributions
# Create a boxplot with inserted vertical line
= plt.subplots(1, 1, figsize=(10,10))
fig, ax =df_lasso_contributions, orient='h', ax=ax)
sns.boxplot(data=0, color="grey", linestyle='--')
plt.axvline(x'Prediction contribution')
plt.xlabel(
plt.show()### END SOLUTION
Exercise 2.6
The following code creates an effect plot for the model with an individual plotted as well. In what situations would a plot like this be useful? Fill in the missing comments.
Hints:
Who’s the receiver of the explanation?
= plt.subplots(1, 1, figsize=(10,10))
fig, ax =df_lasso_contributions, ax=ax)
sns.boxplot(data#
=df_lasso_contributions.iloc[0].T, s=200, color="red", marker="x")
sns.scatterplot(data=90)
plt.xticks(rotation=0, color="grey", linestyle='--')
plt.axhline(y'Prediction contribution')
plt.ylabel( plt.show()
### BEGIN SOLUTION
= plt.subplots(1, 1, figsize=(10,10))
fig, ax =df_lasso_contributions, ax=ax)
sns.boxplot(data# Add a scatterplot with the contributions for a single individual (here, individual zero)
=df_lasso_contributions.iloc[0].T, s=200, color="red", marker="x")
sns.scatterplot(data=90)
plt.xticks(rotation=0, color="grey", linestyle='--')
plt.axhline(y'Prediction contribution')
plt.ylabel(
plt.show()### END SOLUTION
Exercise 2.7
Try running the code with other values of \(\alpha\). Does the model become easier to interpret? Are the same coefficients always high/low?
Hints:
Multicollinearity.
Decision tree
In this section, we will be looking at how to plot decision trees as flowcharts.
Exercise 3.1
Fill in the missing code and comments to create a plot for a decision tree with max depth 1.
Hints:
We’ve covered this in previous sessions. Try looking at our previous materials.
from sklearn.tree import plot_tree, FILL IN
#
= Pipeline([
decision_model 'preprocessor', col_transformer),
('regr', FILL IN)
(
]
)
#
= decision_model[0].get_feature_names_out()
decision_feature_names
#
decision_model.fit(X_train, y_train)
#
=(10,10))
plt.figure(figsize-1], feature_names = decision_feature_names, impurity=False)
plot_tree(decision_model[ plt.show()
### BEGIN SOLUTION
from sklearn.tree import plot_tree, DecisionTreeRegressor
# Create pipeline with max_depth 1
= Pipeline([
decision_model 'preprocessor', col_transformer),
('regr', DecisionTreeRegressor(max_depth=1))
(
]
)
# Fit to train
decision_model.fit(X_train, y_train)
# Plot flowchart
=(10,10))
plt.figure(figsize-1], feature_names = feature_names, impurity=False)
plot_tree(decision_model[
plt.show()
### END SOLUTION
Exercise 3.2
Repeat the following exercise with max depth 2 and 5. Is the model still explainable?
Hints:
Are you explaining a single prediction or the full model?
To increase legibility, try changing the
figsize
parameter, e.g.(50,10)
for the larger model.
### BEGIN SOLUTION
= [(10,10), (50,10)]
fig_sizes = [2,5]
depths
for depth, fig_size in zip(depths, fig_sizes):
# Create pipeline with given max_depth
= Pipeline([
decision_model 'preprocessor', col_transformer),
('regr', DecisionTreeRegressor(max_depth=1))
(
]
)
# Fit to train
decision_model.fit(X_train, y_train)
# Plot flowchart
=fig_size)
plt.figure(figsize-1], feature_names = feature_names, impurity=False)
plot_tree(decision_model[
plt.show()
### END SOLUTION
Model agnostic methods
In this section of the exercises, we will cover model agnostic methods, such as permutation feature importance, partial dependence plots and SHAP values.
We first create a random forest model, which is not intrinsically interpretable. This is the model we want to understand.
from sklearn.ensemble import RandomForestRegressor
= Pipeline([
forest_model 'preprocessor', col_transformer),
('regr', RandomForestRegressor())
(
]
)
forest_model.fit(X_train, y_train)
= forest_model[:-1].get_feature_names_out() feature_names
Permutation feature importance
Exercise 4.1
Fill in the missing comments to create a feature importance plot. How does one interpret these plots? What’s the difference between the two plots?
Hints:
How are permutation feature importances calculated, and what is measured?
# One uses training data, other uses test data.
# If model has picked up spurious correlation,
# this will reduce training performance,
# but not test performance
from sklearn.inspection import permutation_importance
# Calculate permutation feature importance on train data
= permutation_importance(
result =10, random_state=73, n_jobs=2, scoring='neg_mean_squared_error'
forest_model, X_train, y_train, n_repeats
)
# Sort importances by mean
= result.importances_mean.argsort()
sorted_importances_idx = pd.DataFrame(
importances
result.importances[sorted_importances_idx].T,=X_test.columns[sorted_importances_idx],
columns
)
# Plot importances with boxplot
= importances.plot.box(vert=False, whis=10)
ax "Permutation Importances (train set)")
ax.set_title(=0, color="k", linestyle="--")
ax.axvline(x"Decrease in score")
ax.set_xlabel( ax.figure.tight_layout()
from sklearn.inspection import permutation_importance
# Calculate permutation feature importance on test data
= permutation_importance(
result =10, random_state=73, n_jobs=2, scoring='neg_mean_squared_error'
forest_model, X_test, y_test, n_repeats
)
# Sort importances by mean
= result.importances_mean.argsort()
sorted_importances_idx = pd.DataFrame(
importances
result.importances[sorted_importances_idx].T,=X_test.columns[sorted_importances_idx],
columns
)
# Plot importances with boxplot
= importances.plot.box(vert=False, whis=10)
ax "Permutation Importances (test set)")
ax.set_title(=0, color="k", linestyle="--")
ax.axvline(x"Decrease in score")
ax.set_xlabel( ax.figure.tight_layout()
Partial dependencies
sklearn.inspection
has built-in functionality for creating plots from partial dependencies. An example including both partial dependence plots and individual conditional expectations can be seen on this page.
Exercise 4.2
Create two one-way partial dependence plots for
EDUCATION
andAGE
, as well as a two-way partial dependence plot in a single figure. Use the training data as background data.Hints:
This corresponds to part of the first plot on the webpage.
The code uses feature column indices to note what features to include.
from sklearn.inspection import PartialDependenceDisplay
# Index to feature
for idx, col in enumerate(X_train.columns):
print(idx, col)
# Alternatively, use dictionary
= {name:idx for idx, name in enumerate(X.columns)}
mapper
#
'EDUCATION'], mapper['AGE'], (mapper['EDUCATION'], mapper['AGE'])])
PartialDependenceDisplay.from_estimator(forest_model, X_train, [mapper[ plt.show()
0 EDUCATION
1 SOUTH
2 SEX
3 EXPERIENCE
4 UNION
5 AGE
6 RACE
7 OCCUPATION
8 SECTOR
9 MARR
### BEGIN SOLUTION
from sklearn.inspection import PartialDependenceDisplay
# Index to feature
for idx, col in enumerate(X_train.columns):
print(idx, col)
# Alternatively, use dictionary
= {name:idx for idx, name in enumerate(X.columns)}
mapper
# Input model, data and columns
'EDUCATION'], mapper['AGE'], (mapper['EDUCATION'], mapper['AGE'])])
PartialDependenceDisplay.from_estimator(forest_model, X_train, [mapper[ plt.show()
Instead of masking the heterogeneity, we can also limit ourselves to one-way plots and display the heterogeneity in the partial dependencies, creating so-called individual expectation plots.
Exercise 4.3
Create an individual conditional expectation plot for both
EDUCATION
andAGE
in a single plot. It should first be uncentered and then centered, and always include the average. Which do you prefer?Hints:
Change the
kind
of the plot
### BEGIN SOLUTION
'EDUCATION'], mapper['AGE']], kind='both')
PartialDependenceDisplay.from_estimator(forest_model, X_train, [mapper[
plt.show()'EDUCATION'], mapper['AGE']], kind='both', centered=True)
PartialDependenceDisplay.from_estimator(forest_model, X_train, [mapper[
plt.show()
### END SOLUTION
SHAP
Finally, we have the game theoretically founded SHapley Additive exPlanations, which are calculated using the SHAP package. They have a lot of different examples on their page, so look to there for code!
Note that all of the following plots can be created using a single line of code, which perhaps is also one of the causes for it’s popularity.
Exercise 4.4
Fill in the missing comments, and explain to yourself what each of the inputs to the
shap.TreeExplainer
does.Does the fact that we use training data as our background data influence what observations we can calculate SHAP values for?
Hints:
The
shap.TreeExplainer
only accepts arrays as data input.What is the background data used for?
import shap
#
= forest_model[:-1].get_feature_names_out()
forest_feature_names
#
= forest_model[:-1].transform(X_train)
X_train_as_array
#
= shap.TreeExplainer(model = forest_model[-1], data = X_train_as_array, feature_names=forest_feature_names, seed=73, feature_perturbation = 'interventional')
explainer
#
= explainer(X_train_as_array,) shap_values
96%|=================== | 411/427 [00:15<00:00]
### BEGIN SOLUTION
import shap
# get feature names for plots
= forest_model[:-1].get_feature_names_out()
forest_feature_names
# convert df to array using columntransformer
= forest_model[:-1].transform(X_train)
X_train_as_array
# create an explainer, with the following parameters
# - Should explain the last step of the pipeline, i.e. the random forest
# - Should use the training data as the background covariates when perturbing the features
# - Use the given feature names for plots
# - Seed for replicability
# - Do interventional feature perturbation (true to the model), and not path dependent (true to the data). Would also not need data if path dependent
= shap.TreeExplainer(model = forest_model[-1], data = X_train_as_array, feature_names=forest_feature_names, seed=73, feature_perturbation = 'interventional')
explainer
# Calculate the SHAP values for the given data
= explainer(X_train_as_array,)
shap_values
# We could also calculate SHAP values for any other data with the same format, data is only used when perturbing
### END SOLUTION
Exercise 4.5
Create a
force
plot of the first observationHints:
If the package complains regarding initialization of js, try adding
matplotlib=True
### BEGIN SOLUTION
0], matplotlib=True)
shap.plots.force(shap_values[
### END SOLUTION
Exercise 4.5
Sometimes the force plots can get a bit messy with long column names.
Create a
waterfall
plot of the first observation.Did it help with the issue? What are possible downsides?
### BEGIN SOLUTION
0])
shap.plots.waterfall(shap_values[
# More legible, but takes up more space. This can be infeasible if more than a single observation is to be explained
### END SOLUTION
Exercise 4.6
Although SHAP values are local explanations, they can still be aggregated for groups to gain global insights.
Create a
bar
plot of the mean absolute SHAP value for each feature
### BEGIN SOLUTION
shap.plots.bar(shap_values)
### END SOLUTION
Exercise 4.7
We can also create plots akin to partial dependence plots, examining heterogeneity for different values of a feature
Create a
scatter
plot of the SHAP values forEDUCATION
### BEGIN SOLUTION
"EDUCATION"])
shap.plots.scatter(shap_values[:,
### END SOLUTION
Exercise 4.8
To examine interactions between features, dependence plots can be coloured by another feature.
Create a
scatter
plot of the SHAP values forEDUCATION
where the colour is decided by the value ofAGE
### BEGIN SOLUTION
'EDUCATION'], shap_values[:, 'AGE'])
shap.plots.scatter(shap_values[:,
### END SOLUTION
Finally, we can examine heterogeneity for multiple features at once using beeswarm
plots.
Exercise 4.9
Create a
beeswarm
plot of the SHAP values.
### BEGIN SOLUTION
shap.plots.beeswarm(shap_values)
### END SOLUTION