Lecture recording (Nov 19, 2024) here.
Lab recording (Nov 21, 2024) here.
This week we study two resilience patterns: the stateless serving function pattern and the continued model evaluation pattern. The stateless serving function pattern exports the machine learning model as a stateless function so that multiple clients can share it in a scalable way. This matters because production machine learning systems must be able to synchronously handle thousands to millions of prediction requests per second. The continued model evaluation pattern detects when a deployed model is no longer fit for purpose by continually monitoring model predictions and evaluating model performance. This matters because the performance of deployed models degrades over time due to data drift, concept drift, or other changes to the pipelines that feed data to the model.
We will also look at the transform design pattern. This pattern ensures that data transformations in processing systems can be consistently reproduced, providing data reliability, traceability, and ease of debugging.
The Stateless Serving Function Pattern | Machine Learning Design Patterns | Google Executive | Investor | Meet the Author (What is your favourite design pattern 7:37-10:25) |
The Continued Model Evaluation Pattern | Machine Learning Design Patterns | Dr Ebin Deni Raj (Design Patterns for Resilient Serving- Continuous Model Evaluation 1:12:50-1:32:30) |
Machine Learning Design Patterns (16:10-26:17) | |
Pipeline Design Patterns | What are some common data pipeline design patterns? (Extract, Transform and Load) |
Assignment 5 - Multimodal Input: An Autonomous Driving System
Assignment 6 - Investigation of Design Patterns for Machine Learning
The Rationale
The rationale behind the stateless serving function design pattern is to enable scalable and efficient handling of incoming requests in a distributed computing environment. In this pattern, each request is treated independently, and the server functions do not maintain any state or store any context about previous requests. Instead, they focus solely on processing the current request and generating a response.
When we implement the stateless design pattern, we create classes and objects that
do not retain state changes between uses. Each use of an object therefore sees the
object in its original form. In our context, state refers to the values of the
object's variables at a given moment in time, so there is no definitive list of states.
Because no request depends on state left behind by another request, copies of the
serving function can be added as load grows. With enough replicas, this lets a
production ML system synchronously handle millions of prediction requests per second.
For a discussion on stateful vs stateless, see RedHat: Stateful vs stateless.
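To make the distinction concrete, here is a minimal Python sketch. The names LinearModel, predict and StatefulPredictor are hypothetical and are not taken from the course code. The model's weights are fixed once loaded, so the serving function's output depends only on its arguments, while the stateful counter-example returns different results for identical requests.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LinearModel:
    """Hypothetical model: weights are loaded once and never mutated while serving."""
    weights: Tuple[float, ...]
    bias: float = 0.0


def predict(model: LinearModel, features: Sequence[float]) -> float:
    """Stateless serving function: the result depends only on the arguments."""
    return sum(w * x for w, x in zip(model.weights, features)) + model.bias


class StatefulPredictor:
    """Counter-example: keeps a running total, so results depend on call history."""

    def __init__(self, model: LinearModel) -> None:
        self.model = model
        self.running_total = 0.0

    def predict(self, features: Sequence[float]) -> float:
        self.running_total += predict(self.model, features)
        return self.running_total
```

Calling predict() twice with the same features gives the same answer, so any replica can serve any request. StatefulPredictor gives a different answer on the second call, which would force every request from a client to reach the same server instance.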
The UML
Here is the UML diagram for the stateless serving function pattern:
+-----------------+
|   Application   |
+-----------------+
| + serve(request)|
+-----------------+
         |
         V
+--------------------+
|  RequestProcessor  |
+--------------------+
| - process(request) |
+--------------------+
         |
         V
+-------------------+
|    Controller     |
+-------------------+
| - handle(request) |
+-------------------+
         |
         V
+------------------+
|     Business     |
+------------------+
| - doSomething()  |
+------------------+
Code Example - Stateless Serving Function Pattern
This example is a code representation of the UML above:
C++: Stateless.cpp.
C#: Stateless.cs.
Java: Stateless.java.
Python: Stateless.py.
Common Usage
The following are some common usages of the stateless serving function pattern:
Code Problem - Server Handler
In this example, a server application handles multiple types of requests using a stateless serving function design pattern. The Request class represents a request made to the server and provides a method to retrieve the request details. The Business class simulates processing the request by performing some business logic based on the request. In this example, it simply prints the request. The Controller class handles the request and delegates it to the Business class for processing. The RequestProcessor class maintains a mapping of endpoints to their respective controllers. It extracts the endpoint from the request and finds the corresponding controller to handle the request. The Application class serves as the entry point of the program. It allows registering controllers for specific endpoints and processes incoming requests.
In the main() function, we create an instance of the Application class and register two controllers for different endpoints.
We then create multiple requests and pass them to the serve() method of the application, which delegates the processing to the appropriate controller based on the request's endpoint.
When you run this program, you will see the requests being processed by the respective controllers based on their endpoints. If an invalid endpoint is provided, an error message will be displayed.
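The structure described above can be sketched in Python as follows. This is an illustrative outline only, not the contents of the linked C++ files. The method names details(), do_something(), register() and register_controller() are assumptions, and Business returns a string instead of printing so that the result is easy to inspect.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Request:
    endpoint: str
    payload: str

    def details(self) -> str:
        return f"{self.endpoint}: {self.payload}"


class Business:
    def do_something(self, request: Request) -> str:
        # Stand-in for real business logic; nothing is remembered between calls.
        return f"Processed {request.details()}"


class Controller:
    def __init__(self, business: Business) -> None:
        self._business = business

    def handle(self, request: Request) -> str:
        return self._business.do_something(request)


class RequestProcessor:
    def __init__(self) -> None:
        # Routing table: configuration fixed at startup, not per-request state.
        self._controllers: Dict[str, Controller] = {}

    def register(self, endpoint: str, controller: Controller) -> None:
        self._controllers[endpoint] = controller

    def process(self, request: Request) -> str:
        controller = self._controllers.get(request.endpoint)
        if controller is None:
            return f"Error: no controller for endpoint '{request.endpoint}'"
        return controller.handle(request)


class Application:
    def __init__(self) -> None:
        self._processor = RequestProcessor()

    def register_controller(self, endpoint: str, controller: Controller) -> None:
        self._processor.register(endpoint, controller)

    def serve(self, request: Request) -> str:
        return self._processor.process(request)
```

Note that the only data the server keeps is the endpoint-to-controller mapping, which is set up before any request arrives. Each call to serve() is handled from the request alone.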
Request.h,
Business.h,
Controller.h,
RequestProcessor.h,
RequestProcessor.cpp,
Application.h,
Server.cpp.
Code Problem - Socket Based Server
Sample code for a simple socket-based server for handling requests is given below. This version uses Winsock for socket programming on Windows.
Note that you'll need to link against the Ws2_32.lib library.
MLModel.h,
StatelessServingFunction.h,
RequestHandler.h,
RequestHandler.cpp,
Server.h,
Server.cpp,
ServerMain.cpp.
The Rationale
The rationale for the continued model evaluation pattern is to ensure that machine learning models perform effectively and reliably over time. This pattern involves regularly evaluating and monitoring models after they have been deployed in a production environment. The goal is to assess model performance, detect potential issues or drift, and take appropriate actions to maintain or improve model accuracy and reliability.
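As a rough illustration of the monitoring step, the Python sketch below keeps a rolling window of prediction errors and flags the model once the window's mean absolute error exceeds a threshold. The class name RollingEvaluator, the window size and the threshold are all assumptions chosen for the example. A real system would pick metrics and thresholds that suit the model, and ground-truth labels often arrive with a delay.

```python
from collections import deque


class RollingEvaluator:
    """Tracks recent prediction errors for a deployed model (hypothetical helper)."""

    def __init__(self, window_size: int = 100, mae_threshold: float = 1.0) -> None:
        self._errors = deque(maxlen=window_size)  # oldest errors fall out automatically
        self._threshold = mae_threshold

    def record(self, prediction: float, actual: float) -> None:
        # Called whenever the ground truth for an earlier prediction becomes known.
        self._errors.append(abs(prediction - actual))

    def mean_absolute_error(self) -> float:
        if not self._errors:
            return 0.0
        return sum(self._errors) / len(self._errors)

    def needs_retraining(self) -> bool:
        # Only decide once the window is full, to avoid reacting to a few samples.
        window_full = len(self._errors) == self._errors.maxlen
        return window_full and self.mean_absolute_error() > self._threshold
```

When needs_retraining() becomes true, the appropriate action (retraining, rolling back, or alerting a person) depends on the application.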
The UML
Here is the UML diagram for the continued model evaluation pattern:
+-------------------+
|  ModelEvaluator   |
+-------------------+
| - model: Model    |
+-------------------+
| + evaluate(data)  |
| + updateModel()   |
+-------------------+
         /\
         |
         |
         |
         |
         |
+-------------------+
|       Model       |
+-------------------+
| - parameters      |
+-------------------+
| + predict(data)   |
+-------------------+
Code Example - Continued Evaluation Pattern
In the provided code, the Model class is implemented with a constructor to initialize model parameters and a predict()
method to perform the prediction.
The ModelEvaluator class holds an instance of the Model class and provides the evaluate() method to evaluate
data using the model. It returns the prediction result based on the model's prediction method. The updateModel() method
is used to update the model with new data or retrain the model.
In the main() function, you can create a ModelEvaluator object, load the data, evaluate the model
using the evaluate() method, and update the model using the updateModel() method.
Remember to replace Data with the appropriate data type used in your implementation and adjust the methods
and parameters according to your specific requirements.
C++: ContinuedEval.cpp.
C#: ContinuedEval.cs.
Java: ContinuedEval.java.
Python: ContinuedEval.py.
Common Usage
The following are some common usages of the continued model evaluation pattern:
Code Problem - Model Modification
In this example, we have a Data class that represents the input data, containing a vector of features. The Model class represents the model used for prediction, which consists of a vector of weights. The ModelEvaluator class is responsible for evaluating the model using the provided data and updating the model with new weights. In the main() function, we create a ModelEvaluator object with initial weights, load the input data, and evaluate the model using the evaluate() method. The prediction result is then printed to the console. Next, we update the model with new weights using the updateModel() method and evaluate the model again with the updated weights. The updated prediction result is printed to the console.
You can customize the example by modifying the number of features, adding additional evaluation logic, or
adjusting the model's prediction mechanism to suit your specific requirements.
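A compact Python sketch of the same arrangement is shown below. It is not the linked C++ code. The weighted-sum prediction and the update_model() signature are assumptions made to keep the example small.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Data:
    features: List[float]


class Model:
    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def predict(self, data: Data) -> float:
        if len(data.features) != len(self.weights):
            raise ValueError("feature count does not match weight count")
        # Assumed prediction rule: a plain weighted sum of the features.
        return sum(w * x for w, x in zip(self.weights, data.features))


class ModelEvaluator:
    def __init__(self, weights: Sequence[float]) -> None:
        self._model = Model(weights)

    def evaluate(self, data: Data) -> float:
        return self._model.predict(data)

    def update_model(self, new_weights: Sequence[float]) -> None:
        # Swap in a fresh model instead of mutating the old one in place.
        self._model = Model(new_weights)
```

For example, an evaluator created with weights [0.5, 1.0, 1.5] returns 7.0 for the features [1.0, 2.0, 3.0], and returns 6.0 for the same features after update_model([1.0, 1.0, 1.0]).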
Data.h,
Model.h,
Model.cpp,
ModelEvaluator.h,
ModelEvaluator.cpp,
ModelMod.cpp.
Code Problem - Linear Regression Model Modification
This example is similar to the previous one. The code below creates a simple
linear regression model and continuously evaluates and updates it based on a stream of data.
Additionally, multithreading is used to simulate concurrent evaluation and updating.
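The idea can be sketched in Python as follows. This is not the linked C++ code. The stochastic gradient descent update, the learning rate and the synthetic stream y = 2x + 1 are assumptions chosen so that the example is small and converges. One thread updates the model while another evaluates it, and a lock keeps each read and write of the parameters consistent.

```python
import threading


class LinearRegressionModel:
    """y = slope * x + intercept, adjusted one sample at a time."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self._slope = 0.0
        self._intercept = 0.0
        self._learning_rate = learning_rate
        self._lock = threading.Lock()  # guards slope and intercept

    def predict(self, x: float) -> float:
        with self._lock:
            return self._slope * x + self._intercept

    def update(self, x: float, y: float) -> None:
        # One stochastic gradient descent step on the squared error.
        with self._lock:
            error = (self._slope * x + self._intercept) - y
            self._slope -= self._learning_rate * error * x
            self._intercept -= self._learning_rate * error


def data_stream(n_samples: int):
    """Hypothetical stream: noiseless samples of y = 2x + 1 with x in [0, 1)."""
    for i in range(n_samples):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


def run(n_samples: int = 5000):
    model = LinearRegressionModel()
    errors = []  # absolute errors observed by the evaluator thread

    def updater() -> None:
        for x, y in data_stream(n_samples):
            model.update(x, y)

    def evaluator() -> None:
        for x, y in data_stream(n_samples):
            errors.append(abs(model.predict(x) - y))

    threads = [threading.Thread(target=updater), threading.Thread(target=evaluator)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return model, errors
```

The individual errors recorded by the evaluator depend on how the two threads are scheduled, so they vary from run to run. The final model should still end up close to the line that generated the data.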
LinearRegressionModel.h,
DataStreamGenerator.h,
ModelEvaluator.h,
ModelUpdater.h,
LinearRegressionModMain.cpp.
The transform design pattern for machine learning focuses on ensuring the reproducibility and consistency of data transformations, which are critical for training, testing, and deploying models. This pattern emphasizes deterministic transformations, version control, and environmental consistency to maintain data integrity and facilitate debugging.
The Rationale
The problem is that the inputs to a machine learning model are not the features that the machine learning model uses in its computations. In a text classification model, for example, the inputs are the raw text documents and the features are the numerical embedding representations of this text. When we train a machine learning model, we train it with features that are extracted from the raw inputs. The solution is to explicitly capture the transformations applied to convert the model inputs into features.
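One way to capture a transformation explicitly is to store its fitted parameters next to the model and reload them at serving time. The Python sketch below uses a bag-of-words count vector as a simple stand-in for the embeddings mentioned above. The class name BagOfWordsTransform and its methods are hypothetical.

```python
import json
from typing import Dict, List


class BagOfWordsTransform:
    """Turns raw text into count features using a vocabulary fixed at training time."""

    def __init__(self) -> None:
        self.vocabulary: Dict[str, int] = {}

    def fit(self, documents: List[str]) -> "BagOfWordsTransform":
        tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
        self.vocabulary = {tok: index for index, tok in enumerate(tokens)}
        return self

    def transform(self, document: str) -> List[int]:
        counts = [0] * len(self.vocabulary)
        for tok in document.lower().split():
            index = self.vocabulary.get(tok)
            if index is not None:  # words unseen during training are ignored
                counts[index] += 1
        return counts

    def to_json(self) -> str:
        # Saved alongside the model so that serving reuses the exact same mapping.
        return json.dumps(self.vocabulary, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BagOfWordsTransform":
        restored = cls()
        restored.vocabulary = json.loads(text)
        return restored
```

A serving process that restores the transform with from_json() produces the same features for the same raw text as the training process did, without re-deriving the vocabulary.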
The UML
The following is a very basic UML diagram of the transform design pattern.
+-----------------------------------------------+
|                Raw Data Source                |
| - Collect data from various sources           |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
|             Data Ingestion Layer              |
| - Ingest and store raw data                   |
| - Ensure immutability                         |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
|          Data Transformation Engine           |
| - Apply transformations to data               |
| - Ensure transformations are deterministic    |
| - Version control for transformation logic    |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
|              Feature Store Layer              |
| - Store transformed features                  |
| - Ensure features are versioned and immutable |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
|        Machine Learning Model Training        |
| - Use versioned features for training         |
| - Ensure reproducibility of training process  |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
|         Model Serving and Deployment          |
| - Deploy trained models                       |
| - Use versioned features for prediction       |
+-----------------------------------------------+
                        |
                        v
+-----------------------------------------------+
|         Monitoring and Feedback Loop          |
| - Monitor model performance                   |
| - Collect feedback for continuous improvement |
+-----------------------------------------------+
Raw Data Source: Collects data from various sources.
Code Example - Transform
Below is a simple example of using the transform design pattern for a machine learning workflow. This example
demonstrates data normalization, which is a common transformation step. We'll use a basic class to handle the
normalization of a dataset.
C++: Transform.cpp.
C#: Transform.cs.
Java: Normalizer.java,
Transform.java.
Python: Transform.py.
Common Usage
The transform design pattern is widely used in machine learning for various purposes. Here are some common usages:
Code Problem - Data Normalization
The following code uses the transform design pattern for data normalization. The loadData function simulates the
loading of numerical data:
MultiTransform.cpp.
The Normalizer class performs data normalization.
The fit() method calculates the means and standard deviations of the features.
The transform() method normalizes a single feature vector using the stored means and standard deviations.
The fitTransform() method fits the normalizer to the data and then transforms it.
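The three methods described above could look like this in Python. This is a sketch rather than a translation of MultiTransform.cpp. The use of the population standard deviation and the choice to map constant features to 0.0 are assumptions.

```python
import math
from typing import List, Sequence


class Normalizer:
    def __init__(self) -> None:
        self.means: List[float] = []
        self.stds: List[float] = []

    def fit(self, data: Sequence[Sequence[float]]) -> "Normalizer":
        n = len(data)
        columns = list(zip(*data))  # one tuple per feature
        self.means = [sum(col) / n for col in columns]
        # Population standard deviation (divide by n); an assumption of this sketch.
        self.stds = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, self.means)
        ]
        return self

    def transform(self, row: Sequence[float]) -> List[float]:
        # Constant features (std of 0) are mapped to 0.0 to avoid dividing by zero.
        return [
            (v - m) / s if s > 0 else 0.0
            for v, m, s in zip(row, self.means, self.stds)
        ]

    def fit_transform(self, data: Sequence[Sequence[float]]) -> List[List[float]]:
        self.fit(data)
        return [self.transform(row) for row in data]
```

Because the means and standard deviations are stored on the object after fit(), the same Normalizer instance can later transform new rows at prediction time using the statistics learned from the training data.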