Looking for a sequential progression of tutorials? See DataScienceTutorials.jl

Data Processing

Loading and Accessing Data

Intended Learning Outcomes:

  • Understand how to load and access various R datasets using RDatasets.jl
  • Learn how to save and load a local dataset in CSV format using CSV.jl
Manipulating Data Frames with DataFrames.jl

Intended Learning Outcomes:

  • Learn how to inspect, describe, and convert datasets into the form of Data Frames
  • Learn how to modify a Data Frame by adding columns and imputing missing values
  • Familiarize yourself with the groupby and combine operations on Data Frames
Working with Categorical Data

Intended Learning Outcomes:

  • Understand the different types of categorical data (e.g., nominal and ordinal data) via CategoricalArrays.jl
  • Learn how to work with and utilize such categorical arrays
Understanding Scientific Types

Intended Learning Outcomes:

  • Understand the rationale behind scientific types and their different categories
  • Learn how to inspect and modify the scientific types in your data using ScientificTypes.jl
  • Learn about practical tips and tricks related to scientific types
Data Processing and Visualization

Intended Learning Outcomes:

  • Learn how to apply common data processing techniques on a real-world dataset
  • Learn how to create various plots (e.g., bar charts and histograms) to analyze your data
Vectors, Matrices and Data Loading in Julia

Intended Learning Outcomes:

  • Understand how to work with vectors and matrices in Julia
  • Learn about loading and plotting datasets in Julia
MLJ for Data Scientists in Two Hours

Intended Learning Outcomes:

  • Get a grasp on using MLJ as a data scientist new to MLJ or Julia
  • Refresh your skills on building simple models
  • Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
  • Learn how to build pipelines in MLJ
  • Learn how to manually and automatically evaluate models in MLJ
  • Understand how to perform feature selection in MLJ
  • Learn how to wrap models in iterative strategies in MLJ
  • Learn how to tune hyperparameters in MLJ
  • Familiarize yourself with confusion matrices, ROC curves and stratified cross-validation
  • Learn how to save and perform final evaluations on your models in MLJ
  • Understand the different types and methods introduced by MLJ
Linear Regression on Temporal Power Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Gain an understanding of exploratory data analysis as a way to get to know the data before developing your model
  • Train and analyze linear regression models on temporal data with MLJ

Classification

Preparing data and model with Iris

Intended Learning Outcomes:

  • Understand why and how to coerce the data types of different variables in your dataset
  • Learn how to separate features and targets for training
  • Be able to find and load the models suitable for your data
Supervised and Unsupervised Workflows in MLJ

Intended Learning Outcomes:

  • Learn how to implement a supervised learning workflow with MLJ
  • Learn how to implement an unsupervised learning workflow with MLJ
  • Familiarize yourself with using MLJ's classification and transformation models
Hyperparameter Tuning for Single and Composite Models

Intended Learning Outcomes:

  • Learn how to optimize a single hyperparameter of your model
  • Learn how to tune multiple, possibly nested, hyperparameters and visualize the results
Logistic Regression & Friends on Stock Market Data

Intended Learning Outcomes:

  • Understand how to load and preprocess example datasets from RDatasets.jl
  • Explore how to train and analyze logistic regression on stock market data
  • Explore classification-related metrics such as cross-entropy loss, confusion matrix, and area under the ROC curve
  • Compare logistic regression to various other classifiers such as LDA, QDA, and KNN
  • Analyze training classification models on imbalanced datasets
Exploring Tree-based Models

Intended Learning Outcomes:

  • Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
  • Refresh your skills on hyperparameter tuning and building MLJ pipelines
Building and Tuning a Support Vector Machine

Intended Learning Outcomes:

  • Familiarize yourself with generating and visualizing custom classification data
  • Learn how to build and tune support vector machine (SVM) models with MLJ
MLJ for Data Scientists in Two Hours

Intended Learning Outcomes:

  • Get a grasp on using MLJ as a data scientist new to MLJ or Julia
  • Refresh your skills on building simple models
  • Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
  • Learn how to build pipelines in MLJ
  • Learn how to manually and automatically evaluate models in MLJ
  • Understand how to perform feature selection in MLJ
  • Learn how to wrap models in iterative strategies in MLJ
  • Learn how to tune hyperparameters in MLJ
  • Familiarize yourself with confusion matrices, ROC curves and stratified cross-validation
  • Learn how to save and perform final evaluations on your models in MLJ
  • Understand the different types and methods introduced by MLJ
KNN, Logistic Regression and PCA on Wine Dataset

Intended Learning Outcomes:

  • Familiarize yourself with the common data preprocessing steps in MLJ
  • Refresh your skills on building pipelines and comparing classification models with MLJ
  • Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
XGBoost on Crabs Dataset

Intended Learning Outcomes:

  • Learn how to build XGBoost models in MLJ
  • Familiarize yourself with various XGBoost hyperparameters and their effects
  • Refresh your skills on using learning curves and hyperparameter tuning in MLJ
EvoTree Classifier on Horse Colic Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing techniques in Julia
  • Get familiar with building baseline models for your learning task in MLJ
  • Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Exploring Generalized Linear Models

Intended Learning Outcomes:

  • Understand how to use generalized linear models from GLM.jl in MLJ
  • Practice examples of using linear regression and logistic regression models in MLJ
  • Understand how to interpret the outputs from linear and logistic regression models
Credit Fraud Detection with Classical and Deep Models

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
  • Build and hyperparameter tune logistic regression and SVM models
  • Learn how to build basic neural networks with MLJFlux.jl
  • Learn how to correct for class imbalance using the Imbalance.jl package
Benchmarking Classification Models on Breast Cancer Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Learn how MLJ can be used to benchmark a large set of models against some dataset
BMI Classification with Decision Trees

Intended Learning Outcomes:

  • Learn how to load tabular data, set up its scientific types and study any existing imbalance
  • Observe how basic random oversampling can significantly improve decision tree performance on imbalanced data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of Ratios Oversampling Hyperparameter

Intended Learning Outcomes:

  • Learn how to study the imbalance of an existing dataset
  • Get a stronger grasp on how the ratios hyperparameter, which reflects the amount of oversampling, can affect the classification decision boundaries
From RandomOversampling to ROSE

Intended Learning Outcomes:

  • Understand the relationship between pure random oversampling and the ROSE algorithm
  • Understand the effect of increasing the `s` hyperparameter for ROSE
SMOTE on Customer Churn Data

Intended Learning Outcomes:

  • Observe how SMOTE can be used to address class imbalances on a real dataset with logistic regression as the classifier
  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTEN on Mushroom Data

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Use SMOTEN to address class imbalances on a real dataset with over 20 categorical columns
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTENC on Customer Churn Data

Intended Learning Outcomes:

  • Observe how SMOTENC can be used to address class imbalances on a real dataset with categorical and continuous columns
  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of ENN Hyperparameters

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Explore the effects of the various hyperparameters of the ENN algorithm and how the algorithm can be useful for data cleaning
SMOTE-Tomek for Ethereum Fraud Detection

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Understand how hybrid resampling algorithms such as SMOTE-Tomek can be defined with the `BalancedModel` construct
Balanced Bagging for Cerebral Stroke Prediction

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Understand how balanced bagging can significantly improve classification performance on imbalanced data
Incremental Training of Neural Networks

Intended Learning Outcomes:

  • Explore incremental training with MLJ
Hyperparameter Tuning of Neural Networks

Intended Learning Outcomes:

  • Learn how to tune different hyperparameters of MLJFlux models with emphasis on training hyperparameters
MNIST Classification with Neural Networks

Intended Learning Outcomes:

  • Learn how to build and train neural networks for image classification
Spam Detection with RNNs

Intended Learning Outcomes:

  • Learn how to train a neural network for spam classification over SMS messages

Regression

Preparing data and model with Iris

Intended Learning Outcomes:

  • Understand why and how to coerce the data types of different variables in your dataset
  • Learn how to separate features and targets for training
  • Be able to find and load the models suitable for your data
Building and Tuning Bagging Ensemble Models

Intended Learning Outcomes:

  • Understand how to implement bagging ensemble models in MLJ and compare them to atomic models
  • Learn how to optimize the parameters of bagging ensemble models and visualize the results
Building Random Forests with Bagging Ensembles

Intended Learning Outcomes:

  • Familiarize yourself with dealing with real-world datasets such as the Boston Housing dataset
  • Understand how to implement Random Forests using bagging over Decision Trees
  • Learn how to analyze the effect of a specific hyperparameter using MLJ's learning curve
  • Learn how to tune the parameters of Random Forests
Composing Models and Target Transformations

Intended Learning Outcomes:

  • Learn how to transform the target of your regression data using MLJ
  • Understand how to combine models and transformation algorithms in MLJ
  • Gain an understanding of the benefits of using MLJ pipelines
Multivariate Linear Regression & Interactions

Intended Learning Outcomes:

  • Understand how to build single and multivariable linear regression models with MLJ
  • Learn how to add interaction terms to model nonlinear trends in your data
  • Learn how to plot regression fits and their residuals
Building Polynomial Regression Models and Tuning Them

Intended Learning Outcomes:

  • Understand how to build a polynomial regression model with MLJ
  • Learn how to use feature selectors and models in an MLJ pipeline
  • Analyze and hyperparameter tune polynomial regression models
Ridge & Lasso Regression on Hitters Dataset

Intended Learning Outcomes:

  • Strengthen your data preparation, plotting, and analysis skills
  • Compare different types of linear regression such as Lasso and Ridge regression
  • Refresh on hyperparameter tuning and model composition with MLJ
Exploring Tree-based Models

Intended Learning Outcomes:

  • Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
  • Refresh your skills on hyperparameter tuning and building MLJ pipelines
Tree-based models on King County Houses Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Explore different tree-based models such as decision trees, random forests and gradient boosters, and compare them
Tree-based models on Airfoil Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Explore different tree-based models such as decision trees and random forests, and compare them
  • Refresh your understanding of tuning hyperparameters with MLJ and analyzing tuning results
LightGBM on Boston Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Build and analyze LightGBM models in MLJ by utilizing learning curves and hyperparameter tuning
Exploring Generalized Linear Models

Intended Learning Outcomes:

  • Understand how to use generalized linear models from GLM.jl in MLJ
  • Practice examples of using linear regression and logistic regression models in MLJ
  • Understand how to interpret the outputs from linear and logistic regression models
Linear Regression on Temporal Power Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Gain an understanding of exploratory data analysis as a way to get to know the data before developing your model
  • Train and analyze linear regression models on temporal data with MLJ
Custom Neural Networks on Boston Data

Intended Learning Outcomes:

  • Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
  • Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
KNN & Ridge Regression Learning Network on AMES Pricing Data

Intended Learning Outcomes:

  • Get familiar with building baseline models for your machine learning task
  • Learn how to build simple learning networks (advanced model composition) in MLJ
  • Learn how to tune and analyze the evaluation results from learning networks
Build Basic Learning Networks with MLJ

Intended Learning Outcomes:

  • Have a clear understanding of how learning networks function in MLJ
  • Be able to construct basic learning networks with MLJ
  • Understand how to evaluate and tune learning networks

Clustering

Unsupervised Learning with PCA and Clustering

Intended Learning Outcomes:

  • Learn how to build unsupervised models such as KMeans and PCA in MLJ
  • Learn how to analyze and visualize results from unsupervised models such as KMeans and PCA

Dimensionality Reduction

Unsupervised Learning with PCA and Clustering

Intended Learning Outcomes:

  • Learn how to build unsupervised models such as KMeans and PCA in MLJ
  • Learn how to analyze and visualize results from unsupervised models such as KMeans and PCA
KNN, Logistic Regression and PCA on Wine Dataset

Intended Learning Outcomes:

  • Familiarize yourself with the common data preprocessing steps in MLJ
  • Refresh your skills on building pipelines and comparing classification models with MLJ
  • Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA

Neural Networks

Custom Neural Networks on Boston Data

Intended Learning Outcomes:

  • Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
  • Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
Credit Fraud Detection with Classical and Deep Models

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
  • Build and hyperparameter tune logistic regression and SVM models
  • Learn how to build basic neural networks with MLJFlux.jl
  • Learn how to correct for class imbalance using the Imbalance.jl package
Benchmarking Classification Models on Breast Cancer Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Learn how MLJ can be used to benchmark a large set of models against some dataset
Incremental Training of Neural Networks

Intended Learning Outcomes:

  • Explore incremental training with MLJ
Hyperparameter Tuning of Neural Networks

Intended Learning Outcomes:

  • Learn how to tune different hyperparameters of MLJFlux models with emphasis on training hyperparameters
Model Composition of Neural Networks

Intended Learning Outcomes:

  • Learn how to compose neural networks with other MLJ components
Comparing Neural Networks and Other Models

Intended Learning Outcomes:

  • Learn how to compare neural networks with other models
Early Stopping of Neural Networks

Intended Learning Outcomes:

  • Learn how early stopping can be applied to neural networks
Live Training of Neural Networks

Intended Learning Outcomes:

  • Train neural networks and see learning plots in real time
Basic Neural Architectural Search

Intended Learning Outcomes:

  • Learn how to naively search and compare different neural network architectures
MNIST Classification with Neural Networks

Intended Learning Outcomes:

  • Learn how to build and train neural networks for image classification
Spam Detection with RNNs

Intended Learning Outcomes:

  • Learn how to train a neural network for spam classification over SMS messages

Class Imbalance

Credit Fraud Detection with Classical and Deep Models

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
  • Build and hyperparameter tune logistic regression and SVM models
  • Learn how to build basic neural networks with MLJFlux.jl
  • Learn how to correct for class imbalance using the Imbalance.jl package
BMI Classification with Decision Trees

Intended Learning Outcomes:

  • Learn how to load tabular data, set up its scientific types and study any existing imbalance
  • Observe how basic random oversampling can significantly improve decision tree performance on imbalanced data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of Ratios Oversampling Hyperparameter

Intended Learning Outcomes:

  • Learn how to study the imbalance of an existing dataset
  • Get a stronger grasp on how the ratios hyperparameter, which reflects the amount of oversampling, can affect the classification decision boundaries
From RandomOversampling to ROSE

Intended Learning Outcomes:

  • Understand the relationship between pure random oversampling and the ROSE algorithm
  • Understand the effect of increasing the `s` hyperparameter for ROSE
SMOTE on Customer Churn Data

Intended Learning Outcomes:

  • Observe how SMOTE can be used to address class imbalances on a real dataset with logistic regression as the classifier
  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTEN on Mushroom Data

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Use SMOTEN to address class imbalances on a real dataset with over 20 categorical columns
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTENC on Customer Churn Data

Intended Learning Outcomes:

  • Observe how SMOTENC can be used to address class imbalances on a real dataset with categorical and continuous columns
  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of ENN Hyperparameters

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Explore the effects of the various hyperparameters of the ENN algorithm and how the algorithm can be useful for data cleaning
SMOTE-Tomek for Ethereum Fraud Detection

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Understand how hybrid resampling algorithms such as SMOTE-Tomek can be defined with the `BalancedModel` construct
Balanced Bagging for Cerebral Stroke Prediction

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Understand how balanced bagging can significantly improve classification performance on imbalanced data

Missing Value Imputation

EvoTree Classifier on Horse Colic Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing techniques in Julia
  • Get familiar with building baseline models for your learning task in MLJ
  • Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ

Encoders

Supervised and Unsupervised Workflows in MLJ

Intended Learning Outcomes:

  • Learn how to implement a supervised learning workflow with MLJ
  • Learn how to implement an unsupervised learning workflow with MLJ
  • Familiarize yourself with using MLJ's classification and transformation models
Composing Models and Target Transformations

Intended Learning Outcomes:

  • Learn how to transform the target of your regression data using MLJ
  • Understand how to combine models and transformation algorithms in MLJ
  • Gain an understanding of the benefits of using MLJ pipelines
Ridge & Lasso Regression on Hitters Dataset

Intended Learning Outcomes:

  • Strengthen your data preparation, plotting, and analysis skills
  • Compare different types of linear regression such as Lasso and Ridge regression
  • Refresh on hyperparameter tuning and model composition with MLJ
KNN, Logistic Regression and PCA on Wine Dataset

Intended Learning Outcomes:

  • Familiarize yourself with the common data preprocessing steps in MLJ
  • Refresh your skills on building pipelines and comparing classification models with MLJ
  • Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
Tree-based models on Airfoil Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Explore different tree-based models such as decision trees and random forests, and compare them
  • Refresh your understanding of tuning hyperparameters with MLJ and analyzing tuning results
Exploring Generalized Linear Models

Intended Learning Outcomes:

  • Understand how to use generalized linear models from GLM.jl in MLJ
  • Practice examples of using linear regression and logistic regression models in MLJ
  • Understand how to interpret the outputs from linear and logistic regression models
Credit Fraud Detection with Classical and Deep Models

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
  • Build and hyperparameter tune logistic regression and SVM models
  • Learn how to build basic neural networks with MLJFlux.jl
  • Learn how to correct for class imbalance using the Imbalance.jl package
Benchmarking Classification Models on Breast Cancer Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Learn how MLJ can be used to benchmark a large set of models against some dataset

Feature Engineering

Building Polynomial Regression Models and Tuning Them

Intended Learning Outcomes:

  • Understand how to build a polynomial regression model with MLJ
  • Learn how to use feature selectors and models in an MLJ pipeline
  • Analyze and hyperparameter tune polynomial regression models
MLJ for Data Scientists in Two Hours

Intended Learning Outcomes:

  • Get a grasp on using MLJ as a data scientist new to MLJ or Julia
  • Refresh your skills on building simple models
  • Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
  • Learn how to build pipelines in MLJ
  • Learn how to manually and automatically evaluate models in MLJ
  • Understand how to perform feature selection in MLJ
  • Learn how to wrap models in iterative strategies in MLJ
  • Learn how to tune hyperparameters in MLJ
  • Familiarize yourself with confusion matrices, ROC curves and stratified cross-validation
  • Learn how to save and perform final evaluations on your models in MLJ
  • Understand the different types and methods introduced by MLJ

Hyperparameter Tuning

Hyperparameter Tuning for Single and Composite Models

Intended Learning Outcomes:

  • Learn how to optimize a single hyperparameter of your model
  • Learn how to tune multiple, possibly nested, hyperparameters and visualize the results
Building and Tuning Bagging Ensemble Models

Intended Learning Outcomes:

  • Understand how to implement bagging ensemble models in MLJ and compare them to atomic models
  • Learn how to optimize the parameters of bagging ensemble models and visualize the results
Building Random Forests with Bagging Ensembles

Intended Learning Outcomes:

  • Familiarize yourself with dealing with real-world datasets such as the Boston Housing dataset
  • Understand how to implement Random Forests using bagging over Decision Trees
  • Learn how to analyze the effect of a specific hyperparameter using MLJ's learning curve
  • Learn how to tune the parameters of Random Forests
Building Polynomial Regression Models and Tuning Them

Intended Learning Outcomes:

  • Understand how to build a polynomial regression model with MLJ
  • Learn how to use feature selectors and models in an MLJ pipeline
  • Analyze and hyperparameter tune polynomial regression models
Ridge & Lasso Regression on Hitters Dataset

Intended Learning Outcomes:

  • Strengthen your data preparation, plotting, and analysis skills
  • Compare different types of linear regression such as Lasso and Ridge regression
  • Refresh on hyperparameter tuning and model composition with MLJ
Exploring Tree-based Models

Intended Learning Outcomes:

  • Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
  • Refresh your skills on hyperparameter tuning and building MLJ pipelines
Building and Tuning a Support Vector Machine

Intended Learning Outcomes:

  • Familiarize yourself with generating and visualizing custom classification data
  • Learn how to build and tune support vector machine (SVM) models with MLJ
XGBoost on Crabs Dataset

Intended Learning Outcomes:

  • Learn how to build XGBoost models in MLJ
  • Familiarize yourself with various XGBoost hyperparameters and their effects
  • Refresh your skills on using learning curves and hyperparameter tuning in MLJ
EvoTree Classifier on Horse Colic Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing techniques in Julia
  • Get familiar with building baseline models for your learning task in MLJ
  • Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Tree-based models on Airfoil Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Explore different tree-based models such as decision trees and random forests, and compare them
  • Refresh your understanding of tuning hyperparameters with MLJ and analyzing tuning results
LightGBM on Boston Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Build and analyze LightGBM models in MLJ by utilizing learning curves and hyperparameter tuning
Custom Neural Networks on Boston Data

Intended Learning Outcomes:

  • Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
  • Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
KNN & Ridge Regression Learning Network on AMES Pricing Data

Intended Learning Outcomes:

  • Get familiar with building baseline models for your machine learning task
  • Learn how to build simple learning networks (advanced model composition) in MLJ
  • Learn how to tune and analyze the evaluation results from learning networks
Stacking with Learning Networks

Intended Learning Outcomes:

  • Have a grasp of how to build and analyze complex learning networks (e.g., stacking)
  • Be able to evaluate and tune learning networks
Comparing Neural Networks and Other Models

Intended Learning Outcomes:

  • Learn how to compare neural networks with other models
Basic Neural Architectural Search

Intended Learning Outcomes:

  • Learn how to naively search and compare different neural network architectures

Pipelines

Composing Models and Target Transformations

Intended Learning Outcomes:

  • Learn how to transform the target of your regression data using MLJ
  • Understand how to combine models and transformation algorithms in MLJ
  • Gain an understanding of the benefits of using MLJ pipelines
Unsupervised Learning with PCA and Clustering

Intended Learning Outcomes:

  • Learn how to build unsupervised models such as KMeans and PCA in MLJ
  • Learn how to analyze and visualize results from unsupervised models such as KMeans and PCA
MLJ for Data Scientists in Two Hours

Intended Learning Outcomes:

  • Get a grasp on using MLJ as a data scientist new to MLJ or Julia
  • Refresh your skills on building simple models
  • Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
  • Learn how to build pipelines in MLJ
  • Learn how to manually and automatically evaluate models in MLJ
  • Understand how to perform feature selection in MLJ
  • Learn how to wrap models in iterative strategies in MLJ
  • Learn how to tune hyperparameters in MLJ
  • Familiarize yourself with confusion matrices, ROC curves and stratified cross-validation
  • Learn how to save and perform final evaluations on your models in MLJ
  • Understand the different types and methods introduced by MLJ
KNN, Logistic Regression and PCA on Wine Dataset

Intended Learning Outcomes:

  • Familiarize yourself with the common data preprocessing steps in MLJ
  • Refresh your skills on building pipelines and comparing classification models with MLJ
  • Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
EvoTree Classifier on Horse Colic Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing techniques in Julia
  • Get familiar with building baseline models for your learning task in MLJ
  • Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Exploring Generalized Linear Models

Intended Learning Outcomes:

  • Understand how to use generalized linear models from GLM.jl in MLJ
  • Practice examples of using linear regression and logistic regression models in MLJ
  • Understand how to interpret the outputs from linear and logistic regression models
Credit Fraud Detection with Classical and Deep Models

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
  • Build logistic regression and SVM models and tune their hyperparameters
  • Learn how to build basic neural networks with MLJFlux.jl
  • Learn how to correct for class imbalance using the Imbalance.jl package
SMOTE-Tomek for Ethereum Fraud Detection

Intended Learning Outcomes:

  • Familiarize yourself with common MLJ workflows related to loading and processing data
  • Understand how hybrid resampling algorithms such as SMOTE-Tomek can be defined with the `BalancedModel` construct
Model Composition of Neural Networks

Intended Learning Outcomes:

  • Learn how to compose neural networks with other MLJ components

Iterative Models

Exploring Tree-based Models

Intended Learning Outcomes:

  • Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
  • Refresh your skills on hyperparameter tuning and building MLJ pipelines
MLJ for Data Scientists in Two Hours

Intended Learning Outcomes:

  • Get a grasp on using MLJ as a data scientist new to MLJ or Julia
  • Refresh your skills on building simple models
  • Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
  • Learn how to build pipelines in MLJ
  • Learn about how to manually and automatically evaluate models in MLJ
  • Understand how to perform feature selection in MLJ
  • Learn how to wrap models in iterative strategies in MLJ
  • Learn how to tune hyperparameters in MLJ
  • Familiarize yourself with confusion matrices, ROC curves and stratified cross-validation
  • Learn how to save and perform final evaluations on your models in MLJ
  • Understand the different types and methods introduced by MLJ
XGBoost on Crabs Dataset

Intended Learning Outcomes:

  • Learn how to build XGBoost models in MLJ
  • Familiarize yourself with various XGBoost hyperparameters and their effects
  • Refresh your skills on using learning curves and hyperparameter tuning in MLJ
EvoTree Classifier on Horse Colic Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing techniques in Julia
  • Get familiar with building baseline models for your learning task in MLJ
  • Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Tree-based models on King County Houses Dataset

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Explore different tree-based models such as decision trees, random forests and gradient boosters, and compare them
LightGBM on Boston Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization techniques in Julia
  • Build and analyze LightGBM models in MLJ by utilizing learning curves and hyperparameter tuning
Custom Neural Networks on Boston Data

Intended Learning Outcomes:

  • Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
  • Understand how the hyperparameters of MLJFlux deep learning models can be tuned with MLJ
Benchmarking Classification Models on Breast Cancer Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Learn how MLJ can be used to benchmark a large set of models on a given dataset
BMI Classification with Decision Trees

Intended Learning Outcomes:

  • Learn how to load tabular data, set up its scientific types and study any existing imbalance
  • Observe how basic random oversampling can significantly improve decision tree performance on imbalanced data
  • Practice MLJ workflows related to evaluation such as cross-validation and new metrics

Ensemble Models

Building and Tuning Bagging Ensemble Models

Intended Learning Outcomes:

  • Understand how to implement bagging ensemble models in MLJ and compare them to atomic models
  • Learn how to optimize the parameters of bagging ensemble models and visualize the results
Building Random Forests with Bagging Ensembles

Intended Learning Outcomes:

  • Familiarize yourself with handling real-world datasets such as the Boston Housing dataset
  • Understand how to implement Random Forests using bagging over Decision Trees
  • Learn how to analyze the effect of a specific hyperparameter using MLJ's learning curve
  • Learn how to tune the parameters of Random Forests
Stacking with Learning Networks

Intended Learning Outcomes:

  • Have a grasp of how to build and analyze complex learning networks (e.g., stacking)
  • Be able to evaluate and tune learning networks

Bayesian Models

Logistic Regression & Friends on Stock Market Data

Intended Learning Outcomes:

  • Understand how to load and preprocess example datasets from RDatasets.jl
  • Explore how to train and analyze logistic regression on stock market data
  • Explore classification-related metrics such as cross-entropy loss, confusion matrix, and area under the ROC curve
  • Compare logistic regression to various other classifiers such as LDA, QDA, and KNN
  • Analyze the training of classification models on imbalanced datasets
Benchmarking Classification Models on Breast Cancer Data

Intended Learning Outcomes:

  • Familiarize yourself with common data preprocessing and visualization workflows
  • Learn how MLJ can be used to benchmark a large set of models on a given dataset