Modeling using Python
General comments
R equivalents
- glmnet: pyglmnet
- lme4: pymer4, and a sklearn wrapper, sklearn-lmer
Scikit-learn (sklearn)
- Mostly produces predictive models (fit, predict and score); no built-in inference mechanisms
- Easy to perform CV for parameter selection (GridSearchCV)
- Many metrics implemented
- Many preprocessing tools:
- Label encoding, scaling, standardization, transformations, etc.
- Many related packages:
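A minimal sketch of the fit/predict/score workflow and CV-based parameter selection, assuming scikit-learn is installed. The synthetic data, the pipeline and the grid of C values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling and model in one pipeline, so the scaler is refit inside every CV fold
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold CV over the inverse regularization strength C
grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, grid, cv=5).fit(X_train, y_train)

y_pred = search.predict(X_test)       # predicted class labels
acc = search.score(X_test, y_test)    # mean accuracy on held-out data
print(search.best_params_, round(acc, 3))
```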
Statsmodels (statsmodels)
- Classical statistical techniques with inference
- ANOVAs, LMM, GLM, hypothesis testing, etc.
- Regularization (Elastic net, Ridge, LASSO)
- Rich family of GLM distributions
- Uses R-like formulas to describe models
Scipy stats module (scipy.stats)
- Implements some basic statistical functions:
- Distributions
- Estimators
- Hypothesis tests
- Transformations
- Gaussian KDE
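One call from each category listed above, as a quick sketch with scipy.stats; the two simulated samples are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=100)
b = rng.normal(loc=0.5, scale=1.0, size=100)

# Distributions: a frozen normal exposes pdf, cdf, ppf, rvs, ...
q975 = stats.norm(loc=0, scale=1).ppf(0.975)

# Estimators: maximum-likelihood fit of a normal distribution to the sample
mu_hat, sigma_hat = stats.norm.fit(a)

# Hypothesis tests: Welch's two-sample t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)

# Transformations: Box-Cox on strictly positive data, lambda chosen by ML
transformed, lam = stats.boxcox(np.exp(a))

# Gaussian KDE, evaluated on a small grid
kde = stats.gaussian_kde(a)
density = kde(np.linspace(-3, 3, 7))
```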
Categorical Data
Logistic Regression
sklearn.linear_model.LogisticRegression:
- L1, L2 and elastic net penalties
- For multi-class problems: one-vs-all and multinomial
pyglmnet.GLM(distr="binomial")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="binomial")
- Mixed effect models
pyGAM.LogisticGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
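A minimal elastic-net logistic regression with scikit-learn, assuming the iris data as a stand-in 3-class problem; C and l1_ratio are illustrative values, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)   # 3-class problem, 4 features

# Elastic-net penalty needs the "saga" solver; l1_ratio=1 is pure L1, 0 is pure L2
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0, max_iter=5000),
)
clf.fit(X, y)

proba = clf.predict_proba(X[:3])    # one column per class; rows sum to 1
coef = clf[-1].coef_                # shape (n_classes, n_features)
```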
Other GLM
- Probit:
pyglmnet.GLM(distr="probit")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
statsmodels:
Ridge Classifier (Ridge regression on -1/+1 responses)
sklearn.linear_model.RidgeClassifier
sklearn.linear_model.RidgeClassifierCV
- performs CV on a solution path
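A short sketch of RidgeClassifierCV choosing its penalty from a grid; the data and the alphas grid are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Labels are encoded as -1/+1 and a ridge regression is fit on them;
# with the default cv=None, the alphas are compared by efficient leave-one-out CV
clf = RidgeClassifierCV(alphas=[0.1, 1.0, 10.0, 100.0]).fit(X, y)
labels = clf.predict(X[:5])           # decoded back to the original labels (0/1 here)
print(clf.alpha_, labels)
```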
Discriminant analysis
Ensemble and Tree-based Methods
Gaussian Process
Naive Bayes
K-Nearest-Neighbors
sklearn.neighbors.KNeighborsClassifier
- uniform weights, distance weights, custom weights
- multiple distance metrics
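A sketch comparing the weighting schemes and two distance metrics by 5-fold CV, assuming iris as example data and k=5 chosen arbitrarily.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Uniform vs inverse-distance weighting, crossed with two metrics
results = {}
for weights in ("uniform", "distance"):
    for metric in ("euclidean", "manhattan"):
        knn = KNeighborsClassifier(n_neighbors=5, weights=weights, metric=metric)
        results[(weights, metric)] = cross_val_score(knn, X, y, cv=5).mean()
print(results)
```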
Neural Networks
sklearn.linear_model.Perceptron
sklearn.neural_network.MLPClassifier
- multiple layers
- activations: identity, logistic (sigmoid), ReLU, tanh
- weight decay
sklearn.neural_network.BernoulliRBM
sknn.mlp.Classifier
- Compatible with sklearn
- Many more types of layers and activations
- PyTorch, TensorFlow (see also Keras)
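A minimal MLPClassifier sketch showing several layers, the activation choice and weight decay (alpha), assuming scikit-learn's toy two-moons data; the layer sizes and alpha are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers, tanh activation; alpha is the L2 weight-decay strength
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), activation="tanh", alpha=1e-3,
                  max_iter=2000, random_state=0),
)
mlp.fit(X_train, y_train)
acc = mlp.score(X_test, y_test)   # held-out accuracy
```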
Support Vector Machines
Multiclass and Multilabel Data
sklearn.multiclass
- meta-estimators for one-vs-one and one-vs-rest (one-vs-all)
sklearn.multioutput.MultiOutputClassifier
- to apply binary classifiers to multiple outputs
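A sketch of the meta-estimators wrapping a binary classifier, assuming iris for the multiclass case and two hypothetical binary targets derived from the same features for the multi-output case.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier

X, y = load_iris(return_X_y=True)
base = LogisticRegression(max_iter=1000)   # cloned internally by each meta-estimator

# Multiclass: explicit decomposition into binary problems
ovr = OneVsRestClassifier(base).fit(X, y)   # one binary model per class
ovo = OneVsOneClassifier(base).fit(X, y)    # one binary model per pair of classes

# Multi-output: two hypothetical binary targets, one classifier fitted per column
Y = np.column_stack([y == 0, X[:, 0] > X[:, 0].mean()]).astype(int)
multi = MultiOutputClassifier(base).fit(X, Y)
pred = multi.predict(X[:4])                 # shape (4, 2): one column per output
```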
Numerical Data
Linear Regression, ANOVA and Linear Mixed Models
sklearn.linear_model.LinearRegression
- Regularization via the separate classes Ridge, Lasso and ElasticNet
- Multi-task/multi-output: Elastic net, LASSO
pymer4
- Mixed effect models
sklearn-lmer: a sklearn wrapper with CV
pyglmnet.GLM(distr="gaussian")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pyGAM.LinearGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels
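A sketch contrasting unpenalized, L2, L1 and elastic-net fits, plus a multi-task LASSO, on simulated data with a sparse coefficient vector; all penalty values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, Lasso, LinearRegression, MultiTaskLasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
beta = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 1.5])   # sparse "true" coefficients
y = X @ beta + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)        # no penalty
ridge = Ridge(alpha=1.0).fit(X, y)        # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)        # L1 penalty sets some coefficients exactly to 0
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.9], cv=5).fit(X, y)   # CV over alpha and l1_ratio

# Multi-task LASSO: features are kept or dropped jointly across the outputs
Y = np.column_stack([y, X @ (0.5 * beta) + rng.normal(scale=0.5, size=200)])
mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)  # coef_ has shape (n_outputs, n_features)
```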
GLM
- Count data (Poisson)
pyglmnet.GLM(distr="poisson")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="poisson")
- Mixed effect models
pyGAM.PoissonGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
- Count data (Binomial)
statsmodels:
- Count data (Negative Binomial)
statsmodels:
- Count data (Zero-Inflated Models)
- Positive, right-skewed continuous data (Gamma)
pyglmnet.GLM(distr="gamma")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="gamma")
- Mixed effect models
pyGAM.GammaGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
- Positive, right-skewed continuous data (Inverse Gaussian)
pymer4.Lmer(family="inverse_gaussian")
- Mixed effect models
pyGAM.InvGaussGAM:
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
- Non-negative continuous data with excess zeros (Tweedie with $p\in(1,2)$)
statsmodels:
Kernel Linear Regression
sklearn.kernel_ridge.KernelRidge
- Kernels: linear, polynomial, Gaussian, etc.
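A sketch of kernel ridge regression with a Gaussian kernel, tuning the penalty and kernel width by CV; the noisy sine data and the grids are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# kernel="rbf" is the Gaussian kernel; alpha is the ridge penalty, gamma the inverse width
search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    {"alpha": [1e-3, 1e-2, 1e-1], "gamma": [0.1, 1.0, 10.0]},
    cv=5,
).fit(X, y)
y_hat = search.predict(np.array([[1.0], [2.5]]))
```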
Ensemble and Tree-based Methods
Gaussian Process
K-Nearest-Neighbors
sklearn.neighbors.KNeighborsRegressor
- uniform weights, distance weights, custom weights
- multiple distance metrics
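A sketch of the custom-weights option: the weights argument accepts a callable that maps an array of neighbor distances to same-shape weights. The Gaussian-shaped function gaussian_weights is a hypothetical choice made here, combined with the Manhattan distance (Minkowski with p=1).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(200, 2))
y = X[:, 0] + 0.5 * X[:, 1]

def gaussian_weights(distances):
    """Hypothetical custom weighting: closer neighbors get larger weights."""
    return np.exp(-(distances ** 2) / 2.0)

knn = KNeighborsRegressor(n_neighbors=10, weights=gaussian_weights, metric="minkowski", p=1)
knn.fit(X, y)
pred = knn.predict([[5.0, 5.0]])   # true value of the underlying function is 7.5
```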
Neural Networks
sklearn.neural_network.MLPRegressor
- multiple layers
- activations: identity, logistic (sigmoid), ReLU, tanh
- weight decay
sknn.mlp.Regressor
- Compatible with sklearn
- Many more types of layers and activations
- PyTorch, TensorFlow (see also Keras)
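A minimal MLPRegressor sketch with ReLU hidden layers and L2 weight decay, assuming a simulated nonlinear target; the architecture and alpha are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(500, 2))
y = X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.1, size=500)

# Two ReLU hidden layers; alpha is the L2 weight-decay strength
reg = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu", alpha=1e-4,
                 max_iter=2000, random_state=0),
)
reg.fit(X, y)
r2 = reg.score(X, y)   # coefficient of determination on the training data
```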
Support Vector Machines
Unsupervised Learning
Clustering
Gaussian Mixture Model
Dimensionality Reduction
sklearn.decomposition:
sklearn.manifold:
- Isomap, t-SNE, etc.
sknn.ae.AutoEncoder
- Neural network autoencoder
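A sketch of linear (PCA) and non-linear (Isomap, t-SNE) reduction with scikit-learn, assuming a 500-image subset of the bundled digits data to keep it fast; running t-SNE on PCA-reduced input is a common convention, not a requirement.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap

X, _ = load_digits(return_X_y=True)   # 8x8 digit images as 64 features
X = X[:500]                           # subset to keep the example fast

# Linear: project onto the 10 directions of largest variance
pca = PCA(n_components=10).fit(X)
X_pca = pca.transform(X)
ratio = pca.explained_variance_ratio_.sum()

# Non-linear 2-D embeddings
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)
```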