Modeling using Python
General comments
R equivalents
- glmnet: pyglmnet
- lme4: pymer4 and a sklearn wrapper, sklearn-lmer
Scikit-learn (sklearn)
- Mostly produces predictive models (fit, predict and score); no built-in inference mechanisms
- Easy to perform CV for parameter selection (GridSearchCV)
- Many metrics implemented
- Many preprocessing tools:
- Label encoding, scaling, standardization, transformations, etc.
- Many related packages:
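To illustrate the fit/predict/score workflow and GridSearchCV listed above, here is a minimal sketch. The synthetic data, the choice of LogisticRegression, and the grid of C values are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chain preprocessing (standardization) and the model into one estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation over the regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)               # predictions from the refit best model
test_accuracy = search.score(X_test, y_test)  # mean accuracy on held-out data
best_C = search.best_params_["logisticregression__C"]
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out fold does not leak into preprocessing.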
Statsmodels (statsmodels)
- Classical statistical techniques with inference
- ANOVAs, LMM, GLM, hypothesis testing, etc.
- Regularization (Elastic net, Ridge, LASSO)
- Rich family of GLM distributions
- Uses R-like formulas to describe models
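As a sketch of the R-like formula interface and the inference output, the example below fits an ordinary least squares model with statsmodels. The data frame, its column names (x, group, y), and the simulated coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a known slope of 2.0 on x (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "group": rng.choice(["a", "b"], size=200),
})
df["y"] = 1.0 + 2.0 * df["x"] + 0.5 * (df["group"] == "b") + rng.normal(scale=0.5, size=200)

# R-like formula: C(group) marks a categorical predictor.
result = smf.ols("y ~ x + C(group)", data=df).fit()

coefs = result.params          # point estimates
pvalues = result.pvalues       # per-coefficient p-values
conf_int = result.conf_int()   # 95% confidence intervals by default
```

Calling `print(result.summary())` gives the full regression table, similar in spirit to `summary(lm(...))` in R.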
SciPy stats module (scipy.stats)
- Implements some basic statistical functions:
- Distributions
- Estimators
- Hypothesis tests
- Transformations
- Gaussian KDE
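The categories above can be sketched with one call each. The samples and the specific functions chosen (normal distribution, two-sample t-test, Box-Cox) are illustrative picks, not an exhaustive list.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=1.0, scale=1.0, size=200)

# Distributions: CDF of the standard normal at 1.96.
p_below = stats.norm.cdf(1.96)

# Estimators: maximum-likelihood fit of a normal distribution.
loc_hat, scale_hat = stats.norm.fit(a)

# Hypothesis tests: two-sample t-test for equal means.
t_stat, p_value = stats.ttest_ind(a, b)

# Transformations: Box-Cox requires strictly positive data.
transformed, lam = stats.boxcox(np.exp(a))

# Gaussian KDE: density estimate evaluated at a single point.
kde = stats.gaussian_kde(a)
density_at_zero = kde([0.0])[0]
```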
Categorical Data
Logistic Regression
sklearn.linear_model.LogisticRegression:
- L1, L2 and elastic net penalties
- For multi-class problems: one-vs-all and multinomial
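A minimal sketch comparing the default L2 penalty with an L1 penalty follows. The synthetic data and the value C=0.1 are assumptions; the way the penalty is specified may also differ between scikit-learn versions, so treat the `penalty` argument as version-dependent.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with only a few informative features (illustrative only).
X, y = make_classification(n_samples=300, n_features=20, n_informative=4, random_state=0)

# L2 is the default penalty; smaller C means stronger regularization.
l2_model = LogisticRegression(C=0.1, max_iter=1000).fit(X, y)

# L1 needs a solver that supports it, such as liblinear or saga.
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

# L1 tends to set some coefficients exactly to zero; L2 only shrinks them.
n_zero_l1 = int((l1_model.coef_ == 0).sum())
n_zero_l2 = int((l2_model.coef_ == 0).sum())
```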
pyglmnet.GLM(distr="binomial")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="binomial")
- Mixed effect models
pyGAM.LogisticGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
Other GLM
- Probit:
pyglmnet.GLM(distr="probit")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
statsmodels:
Ridge Classifier (Ridge regression on -1/+1 responses)
sklearn.linear_model.RidgeClassifier
sklearn.linear_model.RidgeClassifierCV: performs CV on a solution path
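As a sketch of how RidgeClassifierCV selects the regularization strength, the example below passes a grid of alpha values and reads back the chosen one. The data and the grid are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate regularization strengths on a log scale.
alphas = np.logspace(-3, 3, 13)

# With cv=None (the default), an efficient leave-one-out scheme is used.
clf = RidgeClassifierCV(alphas=alphas).fit(X, y)

chosen_alpha = clf.alpha_          # alpha selected by cross-validation
train_accuracy = clf.score(X, y)   # accuracy on the training data
```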
Discriminant analysis
Ensemble and Tree-based Methods
Gaussian Process
Naive Bayes
K-Nearest-Neighbors
sklearn.neighbors.KNeighborsClassifier
- uniform weights, distance weights, custom weights
- multiple distance metrics
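The weighting and metric options can be sketched as follows. The iris data, the choice of 5 neighbors, and the hypothetical `inverse_square` weighting function are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def inverse_square(distances):
    # Custom weights: receives an array of distances and returns
    # an array of weights with the same shape.
    return 1.0 / (distances ** 2 + 1e-9)

configs = {
    "uniform": KNeighborsClassifier(n_neighbors=5, weights="uniform"),
    "distance_manhattan": KNeighborsClassifier(
        n_neighbors=5, weights="distance", metric="manhattan"
    ),
    "custom": KNeighborsClassifier(n_neighbors=5, weights=inverse_square),
}

# Mean 5-fold cross-validated accuracy for each configuration.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in configs.items()
}
```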
Neural Networks
sklearn.linear_model.Perceptron
sklearn.neural_network.MLPClassifier
- multiple layers
- activations: identity, logistic (sigmoid), ReLU, tanh
- weight decay
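A minimal MLPClassifier sketch showing the three options above: `hidden_layer_sizes` sets the layers, `activation` the nonlinearity, and `alpha` the L2 penalty (weight decay). The two-moons data and all hyperparameter values are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# A small nonlinear toy problem (illustrative only).
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(16, 16),  # two hidden layers of 16 units each
    activation="relu",            # also: "identity", "logistic", "tanh"
    alpha=1e-3,                   # L2 penalty strength (weight decay)
    max_iter=2000,
    random_state=0,
)
mlp.fit(X, y)

train_accuracy = mlp.score(X, y)
n_weight_matrices = len(mlp.coefs_)  # input->h1, h1->h2, h2->output
```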
sklearn.neural_network.BernoulliRBM
sknn.mlp.Classifier
- Compatible with sklearn
- Many more types of layers and activations
- PyTorch, TensorFlow (see also Keras)
Support Vector Machines
Multiclass and Multilabel Data
sklearn.multiclass
- meta-estimator for one-vs-one and one-vs-rest (one-vs-all)
sklearn.multioutput.MultiOutputClassifier
- to apply binary classifiers to multiple outputs
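The meta-estimators above wrap any binary classifier. The sketch below uses LogisticRegression as the base model and builds two artificial binary targets from the iris labels; both choices are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier

X, y = load_iris(return_X_y=True)
base = LogisticRegression(max_iter=1000)

# One-vs-rest: one binary model per class.
ovr = OneVsRestClassifier(base).fit(X, y)

# One-vs-one: one binary model per pair of classes.
ovo = OneVsOneClassifier(base).fit(X, y)

# Multi-output: two binary targets derived from the label, for illustration.
Y = np.column_stack([y == 0, y == 2]).astype(int)
multi = MultiOutputClassifier(base).fit(X, Y)
Y_pred = multi.predict(X)
```

With 3 classes, one-vs-rest and one-vs-one both fit 3 models; for K classes one-vs-one fits K(K-1)/2, which grows faster.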
Numerical Data
Linear Regression, ANOVA and Linear Mixed Models
sklearn.linear_model.LinearRegression
- Regularizations: Ridge, LASSO, Elastic net
- Multi-task/multi-output: Elastic net, LASSO
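In scikit-learn the regularized variants are separate estimators (Ridge, Lasso, ElasticNet, MultiTaskLasso) rather than options of LinearRegression. A minimal sketch, with simulated data and penalty strengths chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, MultiTaskLasso, Ridge

# Simulated data: only the first two of ten features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

# Multi-task LASSO: several responses share one sparsity pattern.
Y = np.column_stack([y, 2.0 * y])
multi_task = MultiTaskLasso(alpha=0.1).fit(X, Y)
```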
pymer4
- Mixed effect models
sklearn-lmer: a sklearn wrapper with CV
pyglmnet.GLM(distr="gaussian")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pyGAM.LinearGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels
GLM
- Count data (Poisson)
pyglmnet.GLM(distr="poisson")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="poisson")
- Mixed effect models
pyGAM.PoissonGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
- Count data (Binomial)
statsmodels:
- Count data (Negative Binomial)
statsmodels:
- Count data (Zero-Inflated Models)
- Right-skewed Continuous Data (Gamma)
pyglmnet.GLM(distr="gamma")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="gamma")
- Mixed effect models
pyGAM.GammaGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
- Right-skewed Continuous Data (Inverse Gaussian)
pymer4.Lmer(family="inverse_gaussian")
- Mixed effect models
pyGAM.InvGaussGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
- Right-skewed Continuous Data with Excess Zeros (Tweedie with $p\in(1,2)$)
statsmodels:
Kernel Linear Regression
sklearn.kernel_ridge.KernelRidge
- Kernels: linear, polynomial, Gaussian, etc.
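To show how the kernel choice matters, the sketch below fits KernelRidge with three kernels to a noisy sine curve. The data and the hyperparameters (alpha, gamma, degree) are assumptions for illustration.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Noisy sine curve on [0, 2*pi] (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=(150, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=150)

models = {
    "linear": KernelRidge(kernel="linear", alpha=1.0),
    "polynomial": KernelRidge(kernel="polynomial", degree=3, alpha=1.0),
    "rbf": KernelRidge(kernel="rbf", gamma=1.0, alpha=0.1),  # Gaussian kernel
}

# In-sample R^2 for each kernel.
r2 = {name: model.fit(X, y).score(X, y) for name, model in models.items()}
```

The linear kernel cannot follow the curve, while the Gaussian (RBF) kernel can; in practice alpha and gamma would be tuned by cross-validation.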
Ensemble and Tree-based Methods
Gaussian Process
K-Nearest-Neighbors
sklearn.neighbors.KNeighborsRegressor
- uniform weights, distance weights, custom weights
- multiple distance metrics
Neural Networks
sklearn.neural_network.MLPRegressor
- multiple layers
- activations: identity, logistic (sigmoid), ReLU, tanh
- weight decay
sknn.mlp.Regressor
- Compatible with sklearn
- Many more types of layers and activations
- PyTorch, TensorFlow (see also Keras)
Support Vector Machines
Unsupervised Learning
Clustering
Gaussian Mixture Model
Dimensionality Reduction
sklearn.decomposition:
sklearn.manifold:
- Isomap, t-SNE, etc.
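A minimal sketch of the two manifold methods named above, embedding a subset of the digits dataset into two dimensions. The dataset, the subset size, and the `n_neighbors` and `perplexity` values are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap, TSNE

# First 200 digit images, each a 64-dimensional vector (illustrative only).
X, _ = load_digits(return_X_y=True)
X = X[:200]

# Isomap: preserves geodesic distances along a nearest-neighbor graph.
X_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# t-SNE: preserves local neighborhoods; perplexity must be below n_samples.
X_tsne = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)
```

Isomap can embed new points through `transform`, whereas scikit-learn's TSNE only offers `fit_transform`.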
sknn.ae.AutoEncoder
- Neural network autoencoder