Lecture recording (Nov 12, 2024) here.
Lab recording (Nov 14, 2024) here.
This week we will look at two patterns that modify training: the transfer learning pattern and the hyperparameter tuning pattern. The transfer learning pattern takes parts of a previously trained model, freezes their weights, and uses these nontrainable layers in a new model that solves a similar problem. This is useful when the large datasets needed to train complex machine learning models are not available. The hyperparameter tuning pattern inserts the training loop into an optimization method to find the optimal set of model hyperparameters.
We will also look at the first of the resilience patterns: batch serving. The batch serving resilience design pattern focuses on ensuring the reliability and fault tolerance of batch processing systems. This pattern is particularly important in scenarios where large volumes of data are processed in scheduled batches, and any failure can result in significant delays or data loss.
Transfer Learning Design Patterns | Deep Learning Design Patterns - Jr Data Scientist - Part 7 - Transfer Learning
Transfer Learning
The Hyperparameter Tuning Pattern | Deep Learning Design Patterns - Jr Data Scientist - Part 6 - Hyperparameter Tuning
Hyperparameter Tuning in Practice
Assignment 5 - Multimodal Input: An Autonomous Driving System
In machine learning, transfer learning is the reuse of a pre-trained model on a new problem: the model exploits the knowledge gained from a previous task to improve generalization on another. For example, the knowledge a classifier gained while learning to predict whether an image contains food could be reused to help it recognize drinks. For more information see What Is Transfer Learning?
The Rationale
The rationale behind the transfer learning design pattern stems from the observation that deep learning models trained on large-scale datasets can learn generic features that are useful for a wide range of tasks. Transfer learning leverages this idea by reusing pre-trained models as a starting point for new tasks.
The UML
Here is the UML diagram for the transfer learning pattern:
+---------------------------------+
|        Pre-trained Model        |
+---------------------------------+
|                                 |
| - Trained on large-scale dataset|
| - Captures generic features     |
+---------------------------------+
                ^
                |
+---------------------------------+
|     New Task-Specific Model     |
+---------------------------------+
|                                 |
| - Reuses pre-trained model      |
| - Freezes pre-trained layers    |
| - Adds new task-specific layers |
| - Fine-tunes the model          |
+---------------------------------+
Code Example - Transfer Learning
In this code, we have two classes: PretrainedModel and NewTaskModel. The PretrainedModel class represents the pre-trained model, and it has methods to load the pre-trained model weights and extract features using the pre-trained layers. The NewTaskModel class represents the model for the new task. It has methods to add new task-specific layers and perform fine-tuning on the new task-specific data. In the main() function, we create an instance of PretrainedModel and load the pre-trained model weights. Then, we create an instance of NewTaskModel. The transfer learning process involves extracting features from the pre-trained model using pretrainedModel.extractFeatures(), adding task-specific layers using newTaskModel.addTaskSpecificLayers(), and fine-tuning the model on the new task-specific data using newTaskModel.fineTune(). Finally, you can use the trained new task-specific model for inference or any other desired tasks.
Note that this code is only a basic illustration of the concept of transfer learning. In practice, you would need to adapt it to the specific deep learning framework or library you are using and incorporate additional functionality as needed. The example is provided in the following languages:
C++: TransferLearning.cpp.
C#: TransferLearning.cs.
Java: TransferLearning.java.
Python: TransferLearning.py.
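The following Python sketch shows the core idea of the pattern under simplifying assumptions; it is not the contents of the TransferLearning files above. A hypothetical PretrainedModel has one dense + ReLU layer whose weights are fixed (standing in for weights loaded from a model trained on a large dataset), and a hypothetical NewTaskModel trains only a new logistic-regression head on the extracted features. The weights, data, learning rate, and epoch count are illustrative.

```python
import math


class PretrainedModel:
    """Stand-in for a pre-trained network: one dense + ReLU layer with frozen weights.

    In practice the weights would be loaded from a model trained on a large dataset.
    """

    def __init__(self, weights):
        self.weights = weights  # one row of input weights per output feature

    def extract_features(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


class NewTaskModel:
    """New task-specific head (logistic regression) trained on frozen features."""

    def __init__(self, backbone, num_features):
        self.backbone = backbone
        self.w = [0.0] * num_features
        self.b = 0.0

    def predict_proba(self, x):
        f = self.backbone.extract_features(x)
        z = sum(wi * fi for wi, fi in zip(self.w, f)) + self.b
        z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def fine_tune(self, data, labels, lr=0.1, epochs=500):
        # Stochastic gradient descent on log-loss. Only the head (w, b) is
        # updated; the backbone's weights are never touched, i.e. frozen.
        for _ in range(epochs):
            for x, y in zip(data, labels):
                f = self.backbone.extract_features(x)
                err = self.predict_proba(x) - y  # d(log-loss)/dz
                self.w = [wi - lr * err * fi for wi, fi in zip(self.w, f)]
                self.b -= lr * err


# Hypothetical "pre-trained" weights and a tiny labelled dataset for the new task
backbone = PretrainedModel([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
frozen_copy = [row[:] for row in backbone.weights]
DATA = [[2.0, 0.5], [0.5, 2.0], [3.0, 1.0], [1.0, 3.0]]
LABELS = [1, 0, 1, 0]

model = NewTaskModel(backbone, num_features=3)
model.fine_tune(DATA, LABELS)
print([round(model.predict_proba(x), 3) for x in DATA])
```

Because fine_tune only updates the head's w and b, the backbone weights are identical before and after training, which is what freezing means here. Fine-tuning in the fuller sense would additionally unfreeze some backbone layers and train them with a small learning rate.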
Common Usage
The transfer learning design pattern is commonly used in various scenarios to leverage the knowledge gained from pre-trained models and apply it to new tasks or domains. The following are some common usages of the transfer learning design pattern:
Code Problem - Sentiment Analysis
In this example, we have a PretrainedWordEmbeddings class that represents pre-trained word embeddings. It has
methods to load the pre-trained embeddings from a file and retrieve the word embeddings for specific words.
The SentimentAnalysisModel class represents the sentiment analysis model. It has a dependency on the
PretrainedWordEmbeddings class. The model loads the pre-trained word embeddings and uses them for training
and prediction.
In the main() function, we create an instance of SentimentAnalysisModel, load the pre-trained word
embeddings, and train the model using transfer learning by calling trainModel() with the training data file path.
Finally, we use the trained model for sentiment prediction by calling predictSentiment() with an input text.
The predicted sentiment class is returned and printed to the console.
PretrainedWordEmbeddings.h,
SentimentAnalysisModel.h,
SentimentAnalysisModel.cpp,
Sentiment.cpp.
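As a rough Python sketch of the same idea (not a translation of the C++ files above), the example below keeps a hypothetical table of 2-D "pre-trained" word embeddings frozen, represents a sentence as the average of its known word vectors, and trains a nearest-centroid classifier on top. The embedding values, the SentimentModel class, and the nearest-centroid choice are assumptions made for brevity; real embeddings such as word2vec or GloVe have hundreds of dimensions and are loaded from a file.

```python
import math

# Hypothetical pre-trained embeddings, kept frozen: training never modifies them.
PRETRAINED = {
    "good": [0.9, 0.1], "great": [1.0, 0.0], "love": [0.8, 0.2],
    "bad": [0.1, 0.9], "awful": [0.0, 1.0], "hate": [0.2, 0.8],
    "the": [0.5, 0.5], "movie": [0.5, 0.5],
}
DIM = 2


def embed(text, table=PRETRAINED):
    """Sentence vector = average of the pre-trained vectors of known words."""
    vectors = [table[w] for w in text.lower().split() if w in table]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(component) / len(vectors) for component in zip(*vectors)]


class SentimentModel:
    """Nearest-centroid classifier trained on top of the frozen embeddings."""

    def __init__(self):
        self.centroids = {}

    def train(self, examples):
        # examples: list of (text, label) pairs
        grouped = {}
        for text, label in examples:
            grouped.setdefault(label, []).append(embed(text))
        self.centroids = {
            label: [sum(c) / len(vecs) for c in zip(*vecs)]
            for label, vecs in grouped.items()
        }

    def predict(self, text):
        v = embed(text)
        return min(self.centroids, key=lambda label: math.dist(v, self.centroids[label]))


model = SentimentModel()
model.train([
    ("great movie", "positive"), ("love the movie", "positive"),
    ("awful movie", "negative"), ("hate the movie", "negative"),
])
print(model.predict("good movie"))
```

The word "good" never appears in the training sentences, yet it is classified correctly because its pre-trained vector already lies near the positive examples; that reuse of prior knowledge is the point of the pattern.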
Code Problem - Image Classification
In the following example, a pre-trained model is used to help with image classification.
PretrainedModel.h,
TaskSpecificModel.h,
ImageClassificationModel.h,
ImageClassificationMain.cpp.
A machine learning model is a mathematical model with a number of parameters that must be learned from the data. By training a model on existing data, we fit these parameters. However, there is another kind of parameter, known as a hyperparameter, that cannot be learned directly by the regular training process. Hyperparameters are usually fixed before the actual training process begins, and they express important properties of the model, such as its complexity or how fast it should learn. For more information see Hyperparameter Tuning - Brief Theory and What you won't find in the HandBook.
The Rationale
The rationale behind the Hyperparameter Tuning Design Pattern is to systematically explore different combinations of hyperparameter values to identify the configuration that produces the best results. By tuning the hyperparameters, we aim to improve the model's performance, generalization ability, and robustness.
The UML
Here is the UML diagram for the hyperparameter tuning design pattern:
----------------------------------
|     Hyperparameter Tuning      |
|         Design Pattern         |
----------------------------------
|                                |
| + Define hyperparameters       |
| + Define search space          |
| + Define evaluation metric     |
| + Select tuning strategy       |
| + Train and evaluate models    |
| + Select best hyperparameters  |
| + Test the model               |
----------------------------------
The hyperparameter tuning design pattern usually involves the steps listed in the diagram above, carried out in that order.
Code Example - Hyperparameter Tuning
In this program, we demonstrate each step of the Hyperparameter Tuning Design Pattern.
We define the hyperparameters (learningRate, numHiddenUnits, regularizationStrength) in Step 1.
We define the search space for each hyperparameter (learningRates, hiddenUnits, regularizationStrengths) in Step 2.
The evaluation metric in this example is the mean squared error (MSE); in Step 3 the variable bestMetric is set up to track the best (lowest) MSE found so far.
We use nested loops to iterate over the search space of each hyperparameter in Step 4, representing the tuning strategy (in this case, grid search).
Within the loops, we train and evaluate the model using the current hyperparameters in Step 5. The evaluation metric (MSE) is calculated by the trainAndEvaluateModel function.
We select the best hyperparameters based on the metric value in Step 6: if the current MSE is lower than the best so far, we update the best hyperparameters and bestMetric.
Finally, in Step 7, we print the best hyperparameters and the corresponding metric.
C++: Tuning.cpp.
C#: Tuning.cs.
Java: Tuning.java.
Python: Tuning.py.
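A self-contained Python sketch of the same seven steps follows, under stated assumptions: the model is a hypothetical one-feature linear regression trained by gradient descent, so numHiddenUnits is dropped (a linear model has no hidden units) and only a learning rate and an L2 regularization strength are tuned. The data, grid values, and the function names train_and_evaluate and grid_search are illustrative, not the contents of the Tuning files.

```python
import itertools

# Hypothetical toy data: y = 2x + 1, with small alternating noise on the training set
TRAIN = [(i / 10, 2 * (i / 10) + 1 + (0.05 if i % 2 else -0.05)) for i in range(20)]
VALID = [(i / 10 + 0.05, 2 * (i / 10 + 0.05) + 1) for i in range(20)]

# Step 1 + Step 2: hyperparameters and their search space
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.1, 0.3],
    "reg_strength": [0.0, 0.01, 0.1],
}


def train_and_evaluate(learning_rate, reg_strength, epochs=200):
    """Step 5: fit y = w*x + b by gradient descent with an L2 penalty on w,
    then return the validation MSE (Step 3: the evaluation metric)."""
    w, b = 0.0, 0.0
    n = len(TRAIN)
    for _ in range(epochs):
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in TRAIN) + 2 * reg_strength * w
        grad_b = 2 / n * sum(w * x + b - y for x, y in TRAIN)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return sum((w * x + b - y) ** 2 for x, y in VALID) / len(VALID)


def grid_search(space):
    """Step 4 (grid search) and Step 6 (keep the configuration with the lowest MSE)."""
    names = list(space)
    best_params, best_mse = None, float("inf")
    results = {}
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        mse = train_and_evaluate(**params)
        results[values] = mse
        if mse < best_mse:  # lower MSE is better
            best_params, best_mse = params, mse
    return best_params, best_mse, results


best_params, best_mse, results = grid_search(SEARCH_SPACE)
print("Best hyperparameters:", best_params, "validation MSE:", best_mse)  # Step 7
```

itertools.product plays the role of the nested loops in the original description; replacing it with random sampling from the same space would turn this into random search without changing train_and_evaluate.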
Common Usage
The hyperparameter tuning pattern is widely used in the software industry for optimizing the performance of machine learning models and algorithms. The following are some common usages of the hyperparameter tuning pattern:
Code Problem - F1 Score
This code demonstrates the generation of the F1 score using the hyperparameter tuning design pattern. Here's a step-by-step explanation of the code:
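One way to combine F1 with hyperparameter tuning, sketched here in Python as an assumption rather than as the course's code, is to treat the classifier's decision threshold as the hyperparameter and keep the threshold with the highest F1 on a validation set. F1 is the harmonic mean of precision and recall, 2PR / (P + R). The scores, labels, and candidate thresholds below are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, y_true, thresholds):
    """Try each candidate threshold and keep the one with the highest F1."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        f1 = f1_score(y_true, [1 if s >= t else 0 for s in scores])
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation-set scores produced by some classifier
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
best_t, best_f1 = tune_threshold(scores, y_true, [0.3, 0.5, 0.65, 0.8])
print(best_t, round(best_f1, 3))  # 0.65 0.857
```

With these numbers a low threshold (0.3) buys full recall at the cost of precision (F1 = 0.8), while 0.65 gives perfect precision and 0.75 recall (F1 = 6/7), so the search settles on 0.65.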
Code Example - Support Vector Machine Classification
Below is a complex example of the Hyperparameter Tuning Design Pattern using a hypothetical scenario of a support vector machine (SVM) for classification:
MLModel.h,
SVMClassifier.h,
HyperparameterTuningStrategy.h,
RandomSearchStrategy.h,
GridSearchStrategy.h,
HyperparameterTuningContext.h,
SVMMain.cpp.
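The file names above suggest a Strategy-pattern structure: a tuning-strategy interface, grid and random implementations, and a context that runs whichever strategy it holds. The Python sketch below is a guess at that shape, not the contents of those headers. In particular, svm_validation_score is a hypothetical smooth stand-in for "train an SVM with C and gamma and return validation accuracy", and this random search samples from the same discrete lists as the grid (real random search often samples from continuous distributions).

```python
import math
import random
from abc import ABC, abstractmethod
from itertools import product


class HyperparameterTuningStrategy(ABC):
    """Strategy interface: decides which configurations to try."""

    @abstractmethod
    def candidates(self, space):
        """Yield hyperparameter dicts drawn from `space` (name -> list of values)."""


class GridSearchStrategy(HyperparameterTuningStrategy):
    def candidates(self, space):
        names = list(space)
        for values in product(*(space[n] for n in names)):
            yield dict(zip(names, values))  # every combination, exhaustively


class RandomSearchStrategy(HyperparameterTuningStrategy):
    def __init__(self, num_trials, seed=0):
        self.num_trials = num_trials
        self.rng = random.Random(seed)  # seeded for reproducibility

    def candidates(self, space):
        for _ in range(self.num_trials):
            yield {name: self.rng.choice(values) for name, values in space.items()}


class HyperparameterTuningContext:
    """Context: runs whichever strategy it holds against an objective."""

    def __init__(self, strategy):
        self.strategy = strategy

    def set_strategy(self, strategy):
        self.strategy = strategy

    def tune(self, objective, space):
        best_params, best_score = None, -math.inf
        for params in self.strategy.candidates(space):
            score = objective(**params)
            if score > best_score:  # higher validation score is better
                best_params, best_score = params, score
        return best_params, best_score


def svm_validation_score(C, gamma):
    """Hypothetical stand-in for SVM validation accuracy; peaks at C=10, gamma=0.01."""
    return 1.0 - 0.05 * (math.log10(C) - 1) ** 2 - 0.05 * (math.log10(gamma) + 2) ** 2


SPACE = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
context = HyperparameterTuningContext(GridSearchStrategy())
print("grid:", context.tune(svm_validation_score, SPACE))
context.set_strategy(RandomSearchStrategy(num_trials=5, seed=42))
print("random:", context.tune(svm_validation_score, SPACE))
```

The context never needs to know how candidates are generated, so a new strategy (for example Bayesian optimization) can be added without touching the tuning loop.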
The batch serving design pattern focuses on the efficient and reliable processing of large volumes of data in scheduled batches, rather than in real-time. This pattern is commonly used in data processing pipelines for tasks such as ETL (Extract, Transform, Load), data warehousing, and offline machine learning model training.
The Rationale
The batch serving design pattern enables efficient, scalable processing of large data volumes by scheduling and distributing workloads across multiple nodes, optimizing resource utilization and minimizing costs. It enhances reliability and fault tolerance through robust error handling mechanisms, ensuring data consistency and integrity. Additionally, batch processing simplifies data pipeline management, testing, and debugging compared to real-time systems. It is particularly suited for use cases like ETL processes, analytics, reporting, and machine learning model training.
The UML
The following is a simplified diagram of the batch serving design pattern.
+-----------------------------------------------+
| Data Source                                   |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Data Ingestion                                |
| - Collect data from various sources           |
| - Store in centralized repository             |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Job Scheduler (e.g., Airflow)                 |
| - Manage and orchestrate batch jobs           |
| - Schedule jobs at regular intervals          |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Batch Processing Framework                    |
| (e.g., Hadoop, Spark)                         |
| - Distribute data processing                  |
| - Ensure fault tolerance and scalability      |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Data Transformation                           |
| - Clean, normalize, and aggregate data        |
| - Apply business logic                        |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Output Storage                                |
| - Store processed data for consumption        |
| - Use suitable data stores                    |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Monitoring and Logging                        |
| - Track job performance                       |
| - Log execution details and errors            |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
| Fault Tolerance and Recovery                  |
| - Implement retry logic                       |
| - Use checkpointing                           |
| - Ensure idempotent operations                |
+-----------------------------------------------+
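The last box of the diagram (retry logic, checkpointing, idempotent operations) can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions: the checkpoint and output sink are plain dicts standing in for durable storage, and run_batch_job and flaky_square are hypothetical names. A real deployment would delegate much of this to the scheduler and processing framework.

```python
import time


def run_batch_job(records, batch_size, process, sink, checkpoint,
                  max_retries=3, backoff_s=0.0):
    """Process records in batches with retries, checkpointing, and idempotent writes.

    checkpoint: dict standing in for durable storage; "next_index" is the first
                record that has not been committed yet.
    sink:       dict keyed by record index, so rewriting a batch after a crash
                overwrites earlier output instead of duplicating it.
    """
    if batch_size < 1 or max_retries < 1:
        raise ValueError("batch_size and max_retries must be >= 1")
    start = checkpoint.get("next_index", 0)  # resume where the last run stopped
    for begin in range(start, len(records), batch_size):
        batch = records[begin:begin + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                outputs = [process(r) for r in batch]
                break  # whole batch succeeded
            except Exception:
                if attempt == max_retries:
                    raise  # give up; checkpoint still points at this batch
                time.sleep(backoff_s * attempt)  # linear backoff before retrying
        for offset, value in enumerate(outputs):
            sink[begin + offset] = value  # keyed write: safe to repeat
        checkpoint["next_index"] = begin + len(batch)  # commit only after writing
    return sink


# Demo: a process function that fails once with a transient error.
calls = {"n": 0}


def flaky_square(x):
    calls["n"] += 1
    if calls["n"] == 3:  # third call fails, the retry succeeds
        raise RuntimeError("transient failure")
    return x * x


sink, checkpoint = {}, {}
run_batch_job(list(range(1, 8)), 3, flaky_square, sink, checkpoint)
print(sink, checkpoint)
```

Committing the checkpoint only after the batch's output is written means a crash can at worst cause one batch to be reprocessed, and the keyed writes make that reprocessing harmless.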
Code Example - Batch Serving
Here is a simple example to demonstrate the batch serving design pattern. In this example,
we'll simulate a scenario where we process a batch of numbers by squaring each number in the batch.
C++: BatchServing.cpp.
C#: BatchServing.cs.
Java: BatchServing.java.
Python: BatchServing.py.
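The squaring scenario itself reduces to two pieces: grouping incoming items into fixed-size batches, and running a job on each batch. The Python sketch below shows that shape; it is an illustration, not the contents of BatchServing.py, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def make_batches(items: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group a stream of items into lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def square_batch(batch: List[int]) -> List[int]:
    """The per-batch job: square every number in the batch."""
    return [x * x for x in batch]


def serve_in_batches(items: Iterable[int], batch_size: int) -> List[int]:
    results: List[int] = []
    for batch in make_batches(items, batch_size):
        results.extend(square_batch(batch))
    return results


print(serve_in_batches(range(1, 8), batch_size=3))  # [1, 4, 9, 16, 25, 36, 49]
```

Because make_batches is a generator, it works on streams that do not fit in memory; only one batch is held at a time.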
Common Usage
The batch serving design pattern is commonly used in various domains and applications where processing multiple requests or tasks in groups (batches) can significantly improve efficiency, increase throughput, and optimize resource utilization. Here are some common areas where this pattern is employed:
Code Problem - Batch Prediction
This example performs batch predictions using a linear regression model:
BatchPrediction.cpp.
data is defined as a constant vector of vectors of doubles. This represents our dataset directly in the code.
The LinearRegression class represents a simple linear regression model.
The model is initialized with a vector of coefficients and an intercept.
The predict() method makes a prediction for a single feature vector.
The batchPredict() method makes predictions for a batch of feature vectors.
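A Python sketch of the same structure is shown below. It mirrors the description above (coefficients plus intercept, a single-vector predict, and a batch variant) but is not a translation of BatchPrediction.cpp; the dataset values and model parameters are hypothetical.

```python
from typing import List, Sequence


class LinearRegression:
    """y = intercept + sum(coefficients[i] * features[i])."""

    def __init__(self, coefficients: Sequence[float], intercept: float):
        self.coefficients = list(coefficients)
        self.intercept = intercept

    def predict(self, features: Sequence[float]) -> float:
        """Prediction for a single feature vector."""
        if len(features) != len(self.coefficients):
            raise ValueError("feature vector length must match the number of coefficients")
        return self.intercept + sum(c * x for c, x in zip(self.coefficients, features))

    def batch_predict(self, batch: Sequence[Sequence[float]]) -> List[float]:
        """Predictions for a whole batch of feature vectors, in input order."""
        return [self.predict(features) for features in batch]


# Hypothetical dataset and parameters, hard-coded like `data` in the C++ example
DATA = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
model = LinearRegression(coefficients=[0.5, 0.25], intercept=1.0)
print(model.batch_predict(DATA))  # [2.0, 3.5, 5.0]
```

In a real batch-serving deployment the benefit of batching comes from executing the whole batch at once (for example as a single vectorized matrix-vector product, or spread across workers); the loop here only shows the interface.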
For a discussion of machine learning applied to the very difficult problem of routing and placing digital electronic components on a micro-chip, see Route and Place.