Modeling using Python

General comments

R equivalents

glmnet: pyglmnet
lme4: pymer4 and a sklearn wrapper sklearn-lmer

Scikit-learn (sklearn)

Mostly produces predictive models (fit, predict and score); no built-in inference mechanisms
Easy to perform cross-validation (CV) for hyperparameter selection (sklearn.model_selection.GridSearchCV)
Many metrics implemented
- Classification, Regression, Clustering, Distances and kernels
Many preprocessing tools:
- Label encoding, scaling, standardization, transformations, etc.
Many related packages:
- Related Projects
- SciKits
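
The fit/predict/score workflow and CV-based parameter selection described above can be sketched as follows. This is a minimal illustration, not a recommended setup: the iris dataset, the pipeline layout, and the grid of `C` values are all assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so it is re-fit within every CV fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Grid over the inverse regularization strength C (values are arbitrary).
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["logisticregression__C"]
best_score = search.best_score_  # mean CV accuracy of the best setting
print(best_C, round(best_score, 3))
```

After fitting, `search` behaves like the best estimator refit on all the data, so `search.predict` and `search.score` can be called directly.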

Statsmodels (statsmodels)

Classical statistical techniques with inference
- ANOVAs, LMM, GLM, hypothesis testing, etc.
- Regularization (Elastic net, Ridge, LASSO)
- Rich family of GLM distributions
Uses R-like formulas to describe models
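
As a rough sketch of the formula interface and the inference it provides, the example below fits an ordinary least squares model on simulated data. The data-generating process, the column names `x`, `group` and `y`, and the coefficient values are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": rng.choice(["a", "b"], size=n),
})
# Simulated response: intercept 1, slope 2, group effect 0.5, Gaussian noise.
df["y"] = (1.0 + 2.0 * df["x"] + 0.5 * (df["group"] == "b")
           + rng.normal(scale=0.5, size=n))

# R-like formula; C(group) marks a categorical predictor.
ols_fit = smf.ols("y ~ x + C(group)", data=df).fit()

print(ols_fit.summary())        # coefficients, standard errors, p-values
conf_int = ols_fit.conf_int()   # 95% confidence intervals by default
```

The fitted result exposes `params`, `pvalues` and `conf_int()`, which is the kind of inference output that scikit-learn estimators do not provide.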

Scipy stats module (scipy.stats)

Implements some basic statistical functions:
- Distributions
- Estimators
- Hypothesis tests
- Transformations
- Gaussian KDE
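
A short sketch of three of the items above (a distribution, a hypothesis test and a Gaussian KDE), using simulated samples. The sample sizes and means are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=300)
b = rng.normal(loc=1.0, scale=1.0, size=300)

# Distributions: evaluate the standard normal CDF.
p_below = stats.norm.cdf(1.96)

# Hypothesis tests: Welch's two-sample t-test (unequal variances).
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)

# Gaussian KDE: estimate the density of sample `a` at zero.
kde = stats.gaussian_kde(a)
density_at_zero = kde([0.0])[0]

print(round(p_below, 3), p_value, density_at_zero)
```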

Categorical Data

Logistic Regression

sklearn.linear_model.LogisticRegression:
- L1, L2 and elastic net penalties
- For multi-class problems: one-vs-all and multinomial
pyglmnet.GLM(distr="binomial")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pymer4.Lmer(family="binomial")
- Mixed effect models
pygam.LogisticGAM (pyGAM):
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels:
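
To illustrate the penalty options listed for `sklearn.linear_model.LogisticRegression`, here is a minimal sketch of an L1-penalized fit. The breast cancer dataset, the `liblinear` solver and `C=0.1` are assumptions picked for the example; the L1 penalty drives some coefficients exactly to zero.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# L1 (LASSO-type) penalty; smaller C means stronger regularization.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
coef = clf.named_steps["logisticregression"].coef_
n_nonzero = int((coef != 0).sum())  # features kept by the L1 penalty
print(test_accuracy, n_nonzero)
```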

Other GLM

Probit:
- pyglmnet.GLM(distr="probit")
  - Elastic net regularization (LASSO and Ridge)
  - Cross-validation
  - Group regularization
- statsmodels:
  - Probit GLM

Ridge Classifier (Ridge regression on -1/+1 responses)

sklearn.linear_model.RidgeClassifier
- sklearn.linear_model.RidgeClassifierCV selects the regularization strength by CV over a grid of candidate values

Discriminant analysis

sklearn.discriminant_analysis
- LDA, QDA
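
A minimal sketch comparing LDA and QDA by cross-validated accuracy. The iris dataset and 5-fold CV are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# LDA assumes a shared covariance matrix; QDA fits one per class.
lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
qda_acc = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()
print(lda_acc, qda_acc)
```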

Ensemble and Tree-based Methods

Gaussian Process

sklearn.gaussian_process.GaussianProcessClassifier

Naive Bayes

sklearn.naive_bayes

K-Nearest-Neighbors

sklearn.neighbors.KNeighborsClassifier
- uniform weights, distance weights, custom weights
- multiple distance metrics
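
The weighting schemes and distance metrics mentioned above map onto the `weights` and `metric` arguments of `sklearn.neighbors.KNeighborsClassifier`. The sketch below compares a few combinations; the dataset and `n_neighbors=5` are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for weights in ("uniform", "distance"):
    for metric in ("euclidean", "manhattan"):
        knn = KNeighborsClassifier(n_neighbors=5, weights=weights, metric=metric)
        # Mean 5-fold CV accuracy for this weighting/metric combination.
        scores[(weights, metric)] = cross_val_score(knn, X, y, cv=5).mean()

for setting, acc in scores.items():
    print(setting, round(acc, 3))
```

Custom weights are passed as a callable that maps an array of distances to an array of weights.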

Neural Networks

sklearn.linear_model.Perceptron
sklearn.neural_network.MLPClassifier
- multiple layers
- activations: identity, logistic (sigmoid), ReLU, tanh
- weight decay
sklearn.neural_network.BernoulliRBM
sknn.mlp.Classifier
- Compatible with sklearn
- Many more types of layers and activations
PyTorch, TensorFlow (see also Keras)
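
As a small sketch of the `MLPClassifier` options listed above, the example below sets the layer sizes, the activation, and the weight decay (the `alpha` argument, an L2 penalty). The digits dataset and all hyperparameter values are assumptions, not tuned choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(64, 32),  # two hidden layers
        activation="relu",            # or "identity", "logistic", "tanh"
        alpha=1e-3,                   # L2 penalty (weight decay)
        max_iter=500,
        random_state=0,
    ),
)
mlp.fit(X_train, y_train)
mlp_accuracy = mlp.score(X_test, y_test)
print(mlp_accuracy)
```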

Support Vector Machines

sklearn.svm
- Linear
- Kernel: linear, polynomial, Gaussian, etc.

Multiclass and Multilabel Data

sklearn.multiclass
- meta-estimator for one-vs-one and one-vs-rest (one-vs-all)
sklearn.multioutput.MultiOutputClassifier
- to apply binary classifiers to multiple outputs
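
The meta-estimators above wrap any binary classifier. A minimal sketch, assuming simulated data with four classes and logistic regression as the base learner: one-vs-rest fits one model per class, while one-vs-one fits one model per pair of classes.

```python
from sklearn.datasets import make_classification, make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Multiclass: a single label with 4 possible classes.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=4, random_state=0
)
base = LogisticRegression(max_iter=1000)
ovr = OneVsRestClassifier(base).fit(X, y)
ovo = OneVsOneClassifier(base).fit(X, y)
print(len(ovr.estimators_), len(ovo.estimators_))  # 4 and 4*3/2 = 6

# Multilabel: 3 binary outputs per sample, one classifier per output.
X_ml, Y_ml = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)
multi = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_ml, Y_ml)
Y_pred = multi.predict(X_ml)
```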

Numerical Data

Linear Regression, ANOVA and Linear Mixed Models

sklearn.linear_model.LinearRegression
- Regularizations: Ridge, LASSO, Elastic net
- Multi-task/multi-output: Elastic net, LASSO
pymer4
- Mixed effect models
- sklearn-lmer: a sklearn wrapper with CV
pyglmnet.GLM(distr="gaussian")
- Elastic net regularization (LASSO and Ridge)
- Cross-validation
- Group regularization
pygam.LinearGAM (pyGAM):
- GAM (with interactions), Cross-validation, similar to sklearn’s API
statsmodels
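
To illustrate the regularized variants on the scikit-learn side, here is a rough sketch comparing ordinary least squares with a LASSO fit whose penalty is chosen by cross-validation (`LassoCV`). The simulated data, with 5 informative features out of 20, is an assumption made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LinearRegression

# Simulated data: only 5 of the 20 features carry signal.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=1.0, random_state=0
)

ols = LinearRegression().fit(X, y)
lasso = LassoCV(cv=5, random_state=0).fit(X, y)  # alpha chosen by 5-fold CV

ols_r2 = ols.score(X, y)
lasso_r2 = lasso.score(X, y)
n_selected = int((lasso.coef_ != 0).sum())  # features with nonzero coefficient
print(lasso.alpha_, n_selected, round(lasso_r2, 3))
```

`RidgeCV` and `ElasticNetCV` follow the same pattern for the other two penalties.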

GLM

Count data (Poisson)
- pyglmnet.GLM(distr="poisson")
  - Elastic net regularization (LASSO and Ridge)
  - Cross-validation
  - Group regularization
- pymer4.Lmer(family="poisson")
  - Mixed effect models
- pygam.PoissonGAM (pyGAM):
  - GAM (with interactions), Cross-validation, similar to sklearn’s API
- statsmodels:
Count data (Binomial)
- statsmodels:
Count data (Negative Binomial)
- statsmodels:
  - Negative Binomial GLM
  - Poisson GLM GAM
Count data (Zero-Inflated Models)
- statsmodels:
Right-skewed Continuous Data (Gamma)
- pyglmnet.GLM(distr="gamma")
  - Elastic net regularization (LASSO and Ridge)
  - Cross-validation
  - Group regularization
- pymer4.Lmer(family="gamma")
  - Mixed effect models
- pygam.GammaGAM (pyGAM):
  - GAM (with interactions), Cross-validation, similar to sklearn’s API
- statsmodels:
  - Gamma GLM
  - Gamma GLM GAM
Right-skewed Continuous Data (Inverse Gaussian)
- pymer4.Lmer(family="inverse_gaussian")
  - Mixed effect models
- pygam.InvGaussGAM (pyGAM):
  - GAM (with interactions), Cross-validation, similar to sklearn’s API
- statsmodels:
  - Inverse Gaussian GLM
  - Inverse Gaussian GLM GAM
Right-skewed Continuous Data with Excess Zeros (Tweedie with $p\in(1,2)$)
- statsmodels:
  - Tweedie GLM
  - Tweedie GLM GAM

Kernel Linear Regression

sklearn.kernel_ridge.KernelRidge
- Kernels: linear, polynomial, Gaussian, etc.
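
The effect of the kernel choice can be sketched on a nonlinear toy problem. The sine-shaped data and the values of `alpha` and `gamma` below are assumptions. Note that `KernelRidge` with a linear kernel fits no intercept.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Same ridge penalty, two different kernels.
linear_model = KernelRidge(kernel="linear", alpha=0.1).fit(X, y)
rbf_model = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0).fit(X, y)

linear_r2 = linear_model.score(X, y)  # in-sample R^2
rbf_r2 = rbf_model.score(X, y)
print(round(linear_r2, 3), round(rbf_r2, 3))
```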

Ensemble and Tree-based Methods

Gaussian Process

sklearn.gaussian_process.GaussianProcessRegressor
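
A short sketch of Gaussian process regression with predictive uncertainty, under assumed simulated data and an assumed kernel (a scaled RBF plus a white-noise term). The kernel hyperparameters are optimized during `fit`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

# Signal kernel plus a noise term.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# One point inside the training range, one far outside it.
X_new = np.array([[2.5], [10.0]])
mean, std = gpr.predict(X_new, return_std=True)
print(gpr.kernel_)  # kernel with fitted hyperparameters
print(mean, std)
```

The predictive standard deviation is typically larger for the point far from the training data, which is the main practical attraction over point-prediction methods.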

K-Nearest-Neighbors

sklearn.neighbors.KNeighborsRegressor
- uniform weights, distance weights, custom weights
- multiple distance metrics

Neural Networks

sklearn.neural_network.MLPRegressor
- multiple layers
- activations: identity, logistic (sigmoid), ReLU, tanh
- weight decay
sknn.mlp.Regressor
- Compatible with sklearn
- Many more types of layers and activations
PyTorch, TensorFlow (see also Keras)

Support Vector Machines

sklearn.svm
- Linear
- Kernel: linear, polynomial, Gaussian, etc.

Unsupervised Learning

Clustering

sklearn.cluster
- K-means, Agglomerative clustering
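
A minimal sketch of both clustering methods on simulated blobs. The cluster centers and spread are assumptions. The adjusted Rand index is used to compare the recovered labels with the known ones.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated clusters with known labels.
X, true_labels = make_blobs(
    n_samples=300, centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=0.6, random_state=0,
)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(
    n_clusters=3, linkage="ward"
).fit_predict(X)

# ARI is invariant to label permutations; 1.0 means perfect agreement.
kmeans_ari = adjusted_rand_score(true_labels, kmeans_labels)
agglo_ari = adjusted_rand_score(true_labels, agglo_labels)
print(kmeans_ari, agglo_ari)
```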

Gaussian Mixture Model

sklearn.mixture.GaussianMixture
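
As a sketch of how `GaussianMixture` might be used, the example below fits mixtures with one to three components to simulated one-dimensional data and compares them by BIC (lower is better). The two-component data-generating process is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two Gaussian clusters centered at -3 and +3.
X = np.concatenate([
    rng.normal(-3.0, 1.0, size=(200, 1)),
    rng.normal(3.0, 1.0, size=(200, 1)),
])

models = {
    k: GaussianMixture(n_components=k, random_state=0).fit(X)
    for k in (1, 2, 3)
}
bic = {k: m.bic(X) for k, m in models.items()}
best_k = min(bic, key=bic.get)

# Estimated component means of the two-component model.
means_2 = np.sort(models[2].means_.ravel())
print(best_k, means_2)
```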

Dimensionality Reduction

sklearn.decomposition:
- Kernel PCA, PCA
sklearn.manifold:
- Isomap, t-SNE, etc.
sknn.ae.AutoEncoder
- Neural network autoencoder
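
A brief sketch of a linear and a manifold-based reduction to two dimensions. The digits dataset, the subsample of 500 points for Isomap, and `n_neighbors=10` are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, _ = load_digits(return_X_y=True)  # 64-dimensional image features

# Linear projection onto the two leading principal components.
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)
explained = pca.explained_variance_ratio_.sum()

# Nonlinear embedding based on a nearest-neighbor graph.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X[:500])

print(X_pca.shape, X_iso.shape, round(explained, 3))
```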

Last updated on May 4, 2020