Data Processing
Loading and Accessing Data
Intended Learning Outcomes:
- Understand how to load and access various R datasets in Julia using RDatasets.jl
- Learn how to save and load a local dataset in CSV format using CSV.jl
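As a taste of what this tutorial covers, a minimal sketch of loading an R dataset and round-tripping it through CSV (the file name is illustrative) might look like:

```julia
using RDatasets, CSV, DataFrames

iris = dataset("datasets", "iris")             # load the classic iris data as a DataFrame
CSV.write("iris.csv", iris)                    # save a local copy in CSV format
iris_again = CSV.read("iris.csv", DataFrame)   # read it back into a DataFrame
```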
Manipulating Data Frames with DataFrames.jl
Intended Learning Outcomes:
- Learn how to inspect, describe, and convert datasets into Data Frames
- Learn how to modify a Data Frame by adding columns and imputing missing values
- Familiarize yourself with the groupby and combine operations on Data Frames
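A small sketch of these operations on a toy Data Frame (the column names are purely illustrative):

```julia
using DataFrames, Statistics

df = DataFrame(species = ["a", "a", "b", "b"], height = [1.0, missing, 2.0, 4.0])
describe(df)                                                      # inspect column summaries
df.height = coalesce.(df.height, mean(skipmissing(df.height)))    # impute missing values with the mean
df.height_sq = df.height .^ 2                                     # add a derived column
combine(groupby(df, :species), :height => mean => :mean_height)   # group-wise summaries
```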
Working with Categorical Data
Intended Learning Outcomes:
- Understand the different types of categorical data (e.g., nominal and ordinal data) via CategoricalArrays.jl
- Learn how to construct and work with such categorical arrays
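For instance, an ordinal (ordered) categorical vector can be built roughly as follows:

```julia
using CategoricalArrays

sizes = categorical(["M", "S", "L", "M"]; ordered = true)   # ordinal data
levels!(sizes, ["S", "M", "L"])                             # declare the ordering of the levels
sizes[2] < sizes[1]                                         # true, since "S" < "M"
```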
Understanding Scientific Types
Intended Learning Outcomes:
- Understand the rationale behind scientific types and their different categories
- Learn how to inspect and modify the scientific types in your data using ScientificTypes.jl
- Learn about practical tips and tricks related to scientific types
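A minimal sketch of inspecting and coercing scientific types (toy columns for illustration):

```julia
using ScientificTypes, DataFrames

df = DataFrame(age = [25, 40, 31], grade = ["A", "B", "A"])
schema(df)                                                     # machine types vs. scientific types
df = coerce(df, :age => Continuous, :grade => OrderedFactor)
schema(df)                                                     # :age is now Continuous, :grade an OrderedFactor
```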
Data Processing and Visualization
Intended Learning Outcomes:
- Learn how to apply common data processing techniques on a real-world dataset
- Learn how to create various plots (e.g., bar charts and histograms) to analyze your data
MLJ for Data Scientists in Two Hours
Intended Learning Outcomes:
- Get a grasp on using MLJ as a data scientist new to MLJ or Julia
- Refresh your skills on building simple models
- Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
- Learn how to build pipelines in MLJ
- Learn how to evaluate models in MLJ, both manually and automatically
- Understand how to perform feature selection in MLJ
- Learn how to wrap models in iterative strategies in MLJ
- Learn how to tune hyperparameters in MLJ
- Familiarize yourself with confusion matrices, ROC curves, and stratified cross-validation
- Learn how to save and perform final evaluations on your models in MLJ
- Understand the different types and methods introduced by MLJ
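As a flavour of the workflow, here is a compact, non-exhaustive sketch (it assumes DecisionTree.jl is installed; the model and measure are illustrative):

```julia
using MLJ

X, y = @load_iris                                      # example data as (features, target)
train, test = partition(eachindex(y), 0.7; shuffle = true, rng = 123)

Tree = @load DecisionTreeClassifier pkg=DecisionTree
mach = machine(Tree(), X, y)
fit!(mach, rows = train)                               # train on the training rows
yhat = predict(mach, rows = test)                      # probabilistic predictions

evaluate!(mach; resampling = CV(nfolds = 5), measure = log_loss)   # automatic evaluation
```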
Linear Regression on Temporal Power Data
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Use exploratory data analysis to better understand the data before developing your model
- Train and analyze linear regression models on temporal data with MLJ
Classification
Preparing data and model with Iris
Intended Learning Outcomes:
- Understand why and how to coerce the data types of different variables in your dataset
- Learn how to separate features and targets for training
- Be able to find and load the models suitable for your data
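A sketch of those steps, here using the iris data from RDatasets.jl purely for illustration:

```julia
using MLJ, RDatasets

iris = dataset("datasets", "iris")
coerce!(iris, :Species => Multiclass)             # fix the target's scientific type
y, X = unpack(iris, ==(:Species); rng = 123)      # separate the target from the features
models(matching(X, y))                            # list models compatible with this data
```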
Supervised and Unsupervised Workflows in MLJ
Intended Learning Outcomes:
- Learn how to implement a supervised learning workflow with MLJ
- Learn how to implement an unsupervised learning workflow with MLJ
- Familiarize yourself with using MLJ's classification and transformation models
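For instance, an unsupervised transformer follows the same machine workflow as a supervised model; a rough sketch:

```julia
using MLJ

X = (x1 = [1.0, 2.0, 3.0, 4.0], x2 = [10.0, 20.0, 30.0, 40.0])
mach = machine(Standardizer(), X)    # no target is bound for unsupervised models
fit!(mach)
W = transform(mach, X)               # standardized features
fitted_params(mach)                  # the learned means and standard deviations
```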
Logistic Regression & Friends on Stock Market Data
Intended Learning Outcomes:
- Understand how to load and preprocess example datasets from RDatasets.jl
- Explore how to train and analyze logistic regression on stock market data
- Explore classification-related metrics such as cross-entropy loss, confusion matrix, and area under the ROC curve
- Compare logistic regression to various other classifiers such as LDA, QDA, and KNN
- Analyze the training of classification models on imbalanced datasets
Exploring Tree-based Models
Intended Learning Outcomes:
- Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
- Refresh your skills on hyperparameter tuning and building MLJ pipelines
Building and Tuning a Support Vector Machine
Intended Learning Outcomes:
- Familiarize yourself with generating and visualizing custom classification data
- Learn how to build and tune support vector machine (SVM) models with MLJ
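A rough sketch of wrapping an SVM in a tuning strategy (LIBSVM.jl assumed installed; `X`, `y` and the grid are placeholders):

```julia
using MLJ

SVC = @load SVC pkg=LIBSVM
svc = SVC()
r = range(svc, :cost; lower = 0.1, upper = 10.0, scale = :log)
tuned_svc = TunedModel(model = svc, range = r, tuning = Grid(resolution = 10),
                       resampling = CV(nfolds = 5), measure = misclassification_rate)
# mach = machine(tuned_svc, X, y) |> fit!    # X, y: your classification data
# fitted_params(mach).best_model             # inspect the best `cost` found
```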
MLJ for Data Scientists in Two Hours
Intended Learning Outcomes:
- Get a grasp on using MLJ as a data scientist new to MLJ or Julia
- Refresh your skills on building simple models
- Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
- Learn how to build pipelines in MLJ
- Learn how to evaluate models in MLJ, both manually and automatically
- Understand how to perform feature selection in MLJ
- Learn how to wrap models in iterative strategies in MLJ
- Learn how to tune hyperparameters in MLJ
- Familiarize yourself with confusion matrices, ROC curves, and stratified cross-validation
- Learn how to save and perform final evaluations on your models in MLJ
- Understand the different types and methods introduced by MLJ
KNN, Logistic Regression and PCA on Wine Dataset
Intended Learning Outcomes:
- Familiarize yourself with the common data preprocessing steps in MLJ
- Refresh your skills on building pipelines and comparing classification models with MLJ
- Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
XGBoost on Crabs Dataset
Intended Learning Outcomes:
- Learn how to build XGBoost models in MLJ
- Familiarize yourself with various XGBoost hyperparameters and their effects
- Refresh your skills on using learning curves and hyperparameter tuning in MLJ
EvoTree Classifier on Horse Colic Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing techniques in Julia
- Get familiar with building baseline models for your learning task in MLJ
- Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Exploring Generalized Linear Models
Intended Learning Outcomes:
- Understand how to use generalized linear models from GLM.jl in MLJ
- Practice examples of using linear regression and logistic regression models in MLJ
- Understand how to interpret the outputs from linear and logistic regression models
Credit Fraud Detection with Classical and Deep Models
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
- Build and hyperparameter tune logistic regression and SVM models
- Learn how to build basic neural networks with MLJFlux.jl
- Learn how to correct for class imbalance using the Imbalance.jl package
BMI Classification with Decision Trees
Intended Learning Outcomes:
- Learn how to load tabular data, set up its scientific types and study any existing imbalance
- Observe how basic random oversampling can significantly improve decision tree performance on imbalanced data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of Ratios Oversampling Hyperparameter
Intended Learning Outcomes:
- Learn how to study the imbalance of an existing dataset
- Get a stronger grasp of how the ratios hyperparameter, which determines the amount of oversampling, affects the classification decision boundaries
From RandomOversampling to ROSE
Intended Learning Outcomes:
- Understand the relationship between pure random oversampling and the ROSE algorithm
- Understand the effect of increasing the `s` hyperparameter for ROSE
SMOTE on Customer Churn Data
Intended Learning Outcomes:
- Observe how SMOTE can be used to address class imbalances on a real dataset with logistic regression as the classifier
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
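A rough sketch of the oversampling step with Imbalance.jl's functional interface (the toy data and the ratio are placeholders):

```julia
using Imbalance

X = (x1 = rand(100), x2 = rand(100))                    # toy continuous features
y = vcat(fill("majority", 90), fill("minority", 10))    # imbalanced labels
Xover, yover = smote(X, y; k = 5, ratios = Dict("minority" => 1.0), rng = 42)
# "minority" is oversampled until it is as frequent as the majority class
```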
SMOTEN on Mushroom Data
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Use SMOTEN to address class imbalances on a real dataset with over 20 categorical columns
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTENC on Customer Churn Data
Intended Learning Outcomes:
- Observe how SMOTENC can be used to address class imbalances on a real dataset with categorical and continuous columns
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of ENN Hyperparameters
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Explore the effects of the ENN algorithm's hyperparameters and how the algorithm can be useful for data cleaning
SMOTE-Tomek for Ethereum Fraud Detection
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Understand how hybrid resampling algorithms such as SMOTE-Tomek can be defined with the `BalancedModel` construct
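Roughly, `BalancedModel` from MLJBalancing.jl fuses resamplers and a classifier into a single model; a sketch, assuming the relevant packages are installed:

```julia
using MLJ, MLJBalancing

LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels
SMOTE              = @load SMOTE pkg=Imbalance
TomekUndersampler  = @load TomekUndersampler pkg=Imbalance

balanced = BalancedModel(model = LogisticClassifier(),
                         balancer1 = SMOTE(),               # oversample the minority class
                         balancer2 = TomekUndersampler())   # then clean with Tomek links
# `balanced` behaves like any other MLJ model: it can be wrapped in a machine, evaluated, tuned, ...
```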
Balanced Bagging for Cerebral Stroke Prediction
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Understand how balanced bagging can significantly improve classification performance on imbalanced data
Spam Detection with RNNs
Intended Learning Outcomes:
- Learn how to train a neural network for spam classification over SMS messages
Regression
Preparing data and model with Iris
Intended Learning Outcomes:
- Understand why and how to coerce the data types of different variables in your dataset
- Learn how to separate features and targets for training
- Be able to find and load the models suitable for your data
Building and Tuning Bagging Ensemble Models
Intended Learning Outcomes:
- Understand how to implement bagging ensemble models in MLJ and compare them to atomic models
- Learn how to optimize the parameters of bagging ensemble models and visualize the results
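A minimal sketch of wrapping an atomic model in a bagging ensemble (DecisionTree.jl assumed installed; the hyperparameter values are illustrative):

```julia
using MLJ

Tree = @load DecisionTreeRegressor pkg=DecisionTree
forest = EnsembleModel(model = Tree(),
                       n = 100,                  # number of bagged trees
                       bagging_fraction = 0.7)   # fraction of rows sampled for each tree
# machine(forest, X, y) can then be fit, evaluated and tuned like any atomic model
```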
Building Random Forests with Bagging Ensembles
Intended Learning Outcomes:
- Familiarize yourself with handling real-world datasets such as the Boston Housing dataset
- Understand how to implement Random Forests using bagging over Decision Trees
- Learn how to analyze the effect of a specific hyperparameter using MLJ's learning curve
- Learn how to tune the parameters of Random Forests
Composing Models and Target Transformations
Intended Learning Outcomes:
- Learn how to transform the target of your regression data using MLJ
- Understand how to combine models and transformation algorithms in MLJ
- Gain an understanding of the benefits of using MLJ pipelines
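For example, a target transformation can be expressed with the `TransformedTargetModel` wrapper; a sketch assuming MLJLinearModels.jl is installed:

```julia
using MLJ

Ridge = @load RidgeRegressor pkg=MLJLinearModels
model = TransformedTargetModel(Ridge();
                               transformer = y -> log.(y),   # fit on the log of the target
                               inverse = z -> exp.(z))       # predictions are mapped back
pipe = Standardizer() |> model    # compose with input standardization in a pipeline
```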
Multivariate Linear Regression & Interactions
Intended Learning Outcomes:
- Understand how to build single-variable and multivariable linear regression models with MLJ
- Learn how to add interaction terms to model nonlinear trends in your data
- Learn how to plot regression fits and their residuals
Building Polynomial Regression Models and Tuning Them
Intended Learning Outcomes:
- Understand how to build a polynomial regression model with MLJ
- Learn how to use feature selectors and models in an MLJ pipeline
- Analyze and hyperparameter tune polynomial regression models
Ridge & Lasso Regression on Hitters Dataset
Intended Learning Outcomes:
- Strengthen your data preparation, plotting, and analysis skills
- Compare different types of linear regression such as Lasso and Ridge regression
- Refresh your skills on hyperparameter tuning and model composition with MLJ
Exploring Tree-based Models
Intended Learning Outcomes:
- Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
- Refresh your skills on hyperparameter tuning and building MLJ pipelines
Tree-based models on King County Houses Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Explore and compare different tree-based models such as decision trees, random forests, and gradient boosters
Tree-based models on Airfoil Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Explore and compare different tree-based models such as decision trees and random forests
- Refresh your understanding of tuning hyperparameters with MLJ and analyzing tuning results
LightGBM on Boston Data
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Build and analyze LightGBM models in MLJ by utilizing learning curves and hyperparameter tuning
Exploring Generalized Linear Models
Intended Learning Outcomes:
- Understand how to use generalized linear models from GLM.jl in MLJ
- Practice examples of using linear regression and logistic regression models in MLJ
- Understand how to interpret the outputs from linear and logistic regression models
Linear Regression on Temporal Power Data
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Use exploratory data analysis to better understand the data before developing your model
- Train and analyze linear regression models on temporal data with MLJ
Custom Neural Networks on Boston Data
Intended Learning Outcomes:
- Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
- Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
KNN & Ridge Regression Learning Network on AMES Pricing Data
Intended Learning Outcomes:
- Get familiar with building baseline models for your machine learning task
- Learn how to build simple learning networks (advanced model composition) in MLJ
- Learn how to tune and analyze the evaluation results from learning networks
Build Basic Learning Networks with MLJ
Intended Learning Outcomes:
- Have a clear understanding of how learning networks function in MLJ
- Be able to construct basic learning networks with MLJ
- Understand how to evaluate and tune learning networks
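The essence of a learning network — data wrapped in source nodes and machines chained through nodes — looks roughly like this (MLJLinearModels.jl assumed installed, synthetic data for illustration):

```julia
using MLJ

Ridge = @load RidgeRegressor pkg=MLJLinearModels

X, y = make_regression(100, 3)       # synthetic regression data
Xs, ys = source(X), source(y)        # wrap the data in source nodes

stand = machine(Standardizer(), Xs)
W = transform(stand, Xs)             # node delivering standardized features

ridge = machine(Ridge(), W, ys)
yhat = predict(ridge, W)             # node delivering predictions

fit!(yhat)                           # trains every machine in the network
yhat()                               # call the node to compute the predictions
```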
Clustering
Unsupervised Learning with PCA and Clustering
Intended Learning Outcomes:
- Learn how to build unsupervised models such as KMeans and PCA in MLJ
- Learn how to analyze and visualize results from unsupervised models such as KMeans and PCA
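A rough sketch of the two unsupervised steps (MultivariateStats.jl and Clustering.jl assumed installed; the data is synthetic):

```julia
using MLJ

PCA = @load PCA pkg=MultivariateStats
KMeans = @load KMeans pkg=Clustering

X, _ = make_blobs(300, 5; centers = 3, rng = 42)    # synthetic 5-dimensional data

pca = machine(PCA(maxoutdim = 2), X) |> fit!
Xproj = transform(pca, X)                           # project onto two principal components

kmeans = machine(KMeans(k = 3), Xproj) |> fit!
report(kmeans)                                      # inspect the clustering results
```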
Dimensionality Reduction
Unsupervised Learning with PCA and Clustering
Intended Learning Outcomes:
- Learn how to build unsupervised models such as KMeans and PCA in MLJ
- Learn how to analyze and visualize results from unsupervised models such as KMeans and PCA
KNN, Logistic Regression and PCA on Wine Dataset
Intended Learning Outcomes:
- Familiarize yourself with the common data preprocessing steps in MLJ
- Refresh your skills on building pipelines and comparing classification models with MLJ
- Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
Neural Networks
Custom Neural Networks on Boston Data
Intended Learning Outcomes:
- Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
- Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
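A rough sketch of defining such a model with MLJFlux.jl (the architecture and training settings are placeholders):

```julia
using MLJ, MLJFlux, Flux

NNRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

model = NNRegressor(
    builder = MLJFlux.Short(n_hidden = 32, dropout = 0.1, σ = Flux.relu),  # small MLP builder
    epochs = 50,
    batch_size = 32,
)
# ranges over e.g. :epochs or :(builder.n_hidden) can then be passed to TunedModel as usual
```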
Credit Fraud Detection with Classical and Deep Models
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
- Build and hyperparameter tune logistic regression and SVM models
- Learn how to build basic neural networks with MLJFlux.jl
- Learn how to correct for class imbalance using the Imbalance.jl package
Spam Detection with RNNs
Intended Learning Outcomes:
- Learn how to train a neural network for spam classification over SMS messages
Class Imbalance
Credit Fraud Detection with Classical and Deep Models
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
- Build and hyperparameter tune logistic regression and SVM models
- Learn how to build basic neural networks with MLJFlux.jl
- Learn how to correct for class imbalance using the Imbalance.jl package
BMI Classification with Decision Trees
Intended Learning Outcomes:
- Learn how to load tabular data, set up its scientific types and study any existing imbalance
- Observe how basic random oversampling can significantly improve decision tree performance on imbalanced data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of Ratios Oversampling Hyperparameter
Intended Learning Outcomes:
- Learn how to study the imbalance of an existing dataset
- Get a stronger grasp of how the ratios hyperparameter, which determines the amount of oversampling, affects the classification decision boundaries
From RandomOversampling to ROSE
Intended Learning Outcomes:
- Understand the relationship between pure random oversampling and the ROSE algorithm
- Understand the effect of increasing the `s` hyperparameter for ROSE
SMOTE on Customer Churn Data
Intended Learning Outcomes:
- Observe how SMOTE can be used to address class imbalances on a real dataset with logistic regression as the classifier
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTEN on Mushroom Data
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Use SMOTEN to address class imbalances on a real dataset with over 20 categorical columns
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
SMOTENC on Customer Churn Data
Intended Learning Outcomes:
- Observe how SMOTENC can be used to address class imbalances on a real dataset with categorical and continuous columns
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Effect of ENN Hyperparameters
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Explore the effects of the ENN algorithm's hyperparameters and how the algorithm can be useful for data cleaning
SMOTE-Tomek for Ethereum Fraud Detection
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Understand how hybrid resampling algorithms such as SMOTE-Tomek can be defined with the `BalancedModel` construct
Balanced Bagging for Cerebral Stroke Prediction
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Understand how balanced bagging can significantly improve classification performance on imbalanced data
Missing Value Imputation
EvoTree Classifier on Horse Colic Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing techniques in Julia
- Get familiar with building baseline models for your learning task in MLJ
- Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Encoders
Supervised and Unsupervised Workflows in MLJ
Intended Learning Outcomes:
- Learn how to implement a supervised learning workflow with MLJ
- Learn how to implement an unsupervised learning workflow with MLJ
- Familiarize yourself with using MLJ's classification and transformation models
Composing Models and Target Transformations
Intended Learning Outcomes:
- Learn how to transform the target of your regression data using MLJ
- Understand how to combine models and transformation algorithms in MLJ
- Gain an understanding of the benefits of using MLJ pipelines
Ridge & Lasso Regression on Hitters Dataset
Intended Learning Outcomes:
- Strengthen your data preparation, plotting, and analysis skills
- Compare different types of linear regression such as Lasso and Ridge regression
- Refresh your skills on hyperparameter tuning and model composition with MLJ
KNN, Logistic Regression and PCA on Wine Dataset
Intended Learning Outcomes:
- Familiarize yourself with the common data preprocessing steps in MLJ
- Refresh your skills on building pipelines and comparing classification models with MLJ
- Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
Tree-based models on Airfoil Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Explore and compare different tree-based models such as decision trees and random forests
- Refresh your understanding of tuning hyperparameters with MLJ and analyzing tuning results
Exploring Generalized Linear Models
Intended Learning Outcomes:
- Understand how to use generalized linear models from GLM.jl in MLJ
- Practice examples of using linear regression and logistic regression models in MLJ
- Understand how to interpret the outputs from linear and logistic regression models
Credit Fraud Detection with Classical and Deep Models
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
- Build and hyperparameter tune logistic regression and SVM models
- Learn how to build basic neural networks with MLJFlux.jl
- Learn how to correct for class imbalance using the Imbalance.jl package
Feature Engineering
Building Polynomial Regression Models and Tuning Them
Intended Learning Outcomes:
- Understand how to build a polynomial regression model with MLJ
- Learn how to use feature selectors and models in an MLJ pipeline
- Analyze and hyperparameter tune polynomial regression models
MLJ for Data Scientists in Two Hours
Intended Learning Outcomes:
- Get a grasp on using MLJ as a data scientist new to MLJ or Julia
- Refresh your skills on building simple models
- Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
- Learn how to build pipelines in MLJ
- Learn how to evaluate models in MLJ, both manually and automatically
- Understand how to perform feature selection in MLJ
- Learn how to wrap models in iterative strategies in MLJ
- Learn how to tune hyperparameters in MLJ
- Familiarize yourself with confusion matrices, ROC curves, and stratified cross-validation
- Learn how to save and perform final evaluations on your models in MLJ
- Understand the different types and methods introduced by MLJ
Hyperparameter Tuning
Building and Tuning Bagging Ensemble Models
Intended Learning Outcomes:
- Understand how to implement bagging ensemble models in MLJ and compare them to atomic models
- Learn how to optimize the parameters of bagging ensemble models and visualize the results
Building Random Forests with Bagging Ensembles
Intended Learning Outcomes:
- Familiarize yourself with handling real-world datasets such as the Boston Housing dataset
- Understand how to implement Random Forests using bagging over Decision Trees
- Learn how to analyze the effect of a specific hyperparameter using MLJ's learning curve
- Learn how to tune the parameters of Random Forests
Building Polynomial Regression Models and Tuning Them
Intended Learning Outcomes:
- Understand how to build a polynomial regression model with MLJ
- Learn how to use feature selectors and models in an MLJ pipeline
- Analyze and hyperparameter tune polynomial regression models
Ridge & Lasso Regression on Hitters Dataset
Intended Learning Outcomes:
- Strengthen your data preparation, plotting, and analysis skills
- Compare different types of linear regression such as Lasso and Ridge regression
- Refresh your skills on hyperparameter tuning and model composition with MLJ
Exploring Tree-based Models
Intended Learning Outcomes:
- Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
- Refresh your skills on hyperparameter tuning and building MLJ pipelines
Building and Tuning a Support Vector Machine
Intended Learning Outcomes:
- Familiarize yourself with generating and visualizing custom classification data
- Learn how to build and tune support vector machine (SVM) models with MLJ
XGBoost on Crabs Dataset
Intended Learning Outcomes:
- Learn how to build XGBoost models in MLJ
- Familiarize yourself with various XGBoost hyperparameters and their effects
- Refresh your skills on using learning curves and hyperparameter tuning in MLJ
EvoTree Classifier on Horse Colic Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing techniques in Julia
- Get familiar with building baseline models for your learning task in MLJ
- Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Tree-based models on Airfoil Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Explore and compare different tree-based models such as decision trees and random forests
- Refresh your understanding of tuning hyperparameters with MLJ and analyzing tuning results
LightGBM on Boston Data
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Build and analyze LightGBM models in MLJ by utilizing learning curves and hyperparameter tuning
Custom Neural Networks on Boston Data
Intended Learning Outcomes:
- Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
- Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
KNN & Ridge Regression Learning Network on AMES Pricing Data
Intended Learning Outcomes:
- Get familiar with building baseline models for your machine learning task
- Learn how to build simple learning networks (advanced model composition) in MLJ
- Learn how to tune and analyze the evaluation results from learning networks
Stacking with Learning Networks
Intended Learning Outcomes:
- Have a grasp of how to build and analyze complex learning networks (e.g., stacking)
- Be able to evaluate and tune learning networks
Pipelines
Composing Models and Target Transformations
Intended Learning Outcomes:
- Learn how to transform the target of your regression data using MLJ
- Understand how to combine models and transformation algorithms in MLJ
- Gain an understanding of the benefits of using MLJ pipelines
Unsupervised Learning with PCA and Clustering
Intended Learning Outcomes:
- Learn how to build unsupervised models such as KMeans and PCA in MLJ
- Learn how to analyze and visualize results from unsupervised models such as KMeans and PCA
MLJ for Data Scientists in Two Hours
Intended Learning Outcomes:
- Get a grasp on using MLJ as a data scientist new to MLJ or Julia
- Refresh your skills on building simple models
- Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
- Learn how to build pipelines in MLJ
- Learn how to evaluate models in MLJ, both manually and automatically
- Understand how to perform feature selection in MLJ
- Learn how to wrap models in iterative strategies in MLJ
- Learn how to tune hyperparameters in MLJ
- Familiarize yourself with confusion matrices, ROC curves, and stratified cross-validation
- Learn how to save and perform final evaluations on your models in MLJ
- Understand the different types and methods introduced by MLJ
KNN, Logistic Regression and PCA on Wine Dataset
Intended Learning Outcomes:
- Familiarize yourself with the common data preprocessing steps in MLJ
- Refresh your skills on building pipelines and comparing classification models with MLJ
- Learn how to reduce the dimensionality of high-dimensional data using dimensionality reduction techniques such as PCA
EvoTree Classifier on Horse Colic Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing techniques in Julia
- Get familiar with building baseline models for your learning task in MLJ
- Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Exploring Generalized Linear Models
Intended Learning Outcomes:
- Understand how to use generalized linear models from GLM.jl in MLJ
- Practice examples of using linear regression and logistic regression models in MLJ
- Understand how to interpret the outputs from linear and logistic regression models
Credit Fraud Detection with Classical and Deep Models
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization workflows
- Refresh your understanding of classification metrics such as the confusion matrix and ROC curves
- Build and hyperparameter tune logistic regression and SVM models
- Learn how to build basic neural networks with MLJFlux.jl
- Learn how to correct for class imbalance using the Imbalance.jl package
SMOTE-Tomek for Ethereum Fraud Detection
Intended Learning Outcomes:
- Familiarize yourself with common MLJ workflows related to loading and processing data
- Understand how hybrid resampling algorithms such as SMOTE-Tomek can be defined with the `BalancedModel` construct
Iterative Models
Exploring Tree-based Models
Intended Learning Outcomes:
- Explore various tree-based models for classification and regression including ordinary decision trees, random forests, and XGBoost
- Refresh your skills on hyperparameter tuning and building MLJ pipelines
MLJ for Data Scientists in Two Hours
Intended Learning Outcomes:
- Get a grasp on using MLJ as a data scientist new to MLJ or Julia
- Refresh your skills on building simple models
- Learn how to prepare example real-life data by loading, coercing, partitioning and unpacking data
- Learn how to build pipelines in MLJ
- Learn how to evaluate models in MLJ, both manually and automatically
- Understand how to perform feature selection in MLJ
- Learn how to wrap models in iterative strategies in MLJ
- Learn how to tune hyperparameters in MLJ
- Familiarize yourself with confusion matrices, ROC curves, and stratified cross-validation
- Learn how to save and perform final evaluations on your models in MLJ
- Understand the different types and methods introduced by MLJ
XGBoost on Crabs Dataset
Intended Learning Outcomes:
- Learn how to build XGBoost models in MLJ
- Familiarize yourself with various XGBoost hyperparameters and their effects
- Refresh your skills on using learning curves and hyperparameter tuning in MLJ
EvoTree Classifier on Horse Colic Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing techniques in Julia
- Get familiar with building baseline models for your learning task in MLJ
- Refresh your understanding of using pipelines, evaluation and hyperparameter tuning in MLJ
Tree-based models on King County Houses Dataset
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Explore and compare different tree-based models such as decision trees, random forests, and gradient boosters
LightGBM on Boston Data
Intended Learning Outcomes:
- Familiarize yourself with common data preprocessing and visualization techniques in Julia
- Build and analyze LightGBM models in MLJ by utilizing learning curves and hyperparameter tuning
Custom Neural Networks on Boston Data
Intended Learning Outcomes:
- Learn how to build and train arbitrary feedforward neural networks via MLJFlux.jl
- Understand how deep learning MLJFlux models can be hyperparameter tuned with MLJ
BMI Classification with Decision Trees
Intended Learning Outcomes:
- Learn how to load tabular data, set up its scientific types and study any existing imbalance
- Observe how basic random oversampling can significantly improve decision tree performance on imbalanced data
- Practice MLJ workflows related to evaluation such as cross-validation and new metrics
Ensemble Models
Building and Tuning Bagging Ensemble Models
Intended Learning Outcomes:
- Understand how to implement bagging ensemble models in MLJ and compare them to atomic models
- Learn how to optimize the parameters of bagging ensemble models and visualize the results
Building Random Forests with Bagging Ensembles
Intended Learning Outcomes:
- Familiarize yourself with handling real-world datasets such as the Boston Housing dataset
- Understand how to implement Random Forests using bagging over Decision Trees
- Learn how to analyze the effect of a specific hyperparameter using MLJ's learning curve
- Learn how to tune the parameters of Random Forests
Stacking with Learning Networks
Intended Learning Outcomes:
- Have a grasp of how to build and analyze complex learning networks (e.g., stacking)
- Be able to evaluate and tune learning networks
Bayesian Models
Logistic Regression & Friends on Stock Market Data
Intended Learning Outcomes:
- Understand how to load and preprocess example datasets from RDatasets.jl
- Explore how to train and analyze logistic regression on stock market data
- Explore classification-related metrics such as cross-entropy loss, confusion matrix, and area under the ROC curve
- Compare logistic regression to various other classifiers such as LDA, QDA, and KNN
- Analyze the training of classification models on imbalanced datasets