Week 8 - Data Representation Patterns: Embeddings; Problem Representation Patterns: Rebalancing

Lecture recording here.

Introduction

This week we begin design patterns for machine learning. We will look at one data representation pattern, the embeddings pattern, and one problem representation pattern, the rebalancing pattern. The embeddings pattern is for high-cardinality features where closeness relationships are important to preserve. It learns a data representation that maps high-cardinality data into a lower-dimensional space in such a way that the information relevant to the learning problem is preserved. The rebalancing pattern uses downsampling, upsampling, or a weighted loss function to handle heavily imbalanced data.

Videos

Machine Learning, Supervised Learning: #4 Machine Learning Specialization and #5 Machine Learning Specialization.
Machine Learning, Unsupervised Learning: #6 Machine Learning Specialization and #7 Machine Learning Specialization.
Machine Learning Design Patterns: ML Design Patterns by Lak (1 hour lecture) and Machine Learning Design Patterns (1 hour 20 minute lecture).
The Embeddings Pattern: Machine Learning Design Patterns Embeddings (7:05-12:40) and Machine Learning Design Patterns | Dr Ebin Deni Raj Embeddings (43:42-59:35).
The Rebalancing Pattern: Machine Learning Design Patterns | Dr Ebin Deni Raj Rebalancing (59:30-1:12:50) and Machine Learning Design Patterns | Michael Munn, Google (14:10-).

Assignment(s)

Assignment 4 - Multi-View Machine Learning Predictor

Common Categories of Machine Learning Design Patterns

Design patterns for machine learning can be broken into six categories:

1. Data Representation

These patterns are about how we prepare and organize data so a machine can understand it. They make sure the data is in the right form - clean, structured, and meaningful - before training a model.
Example: Using Embeddings to turn words into numbers that capture their meaning.

2. Problem Representation

These patterns deal with how we define the problem for the model to solve. They help us choose the best way to express the question - classification, regression, ranking, etc. - so the model can learn effectively.
Example: Turning "recommend a movie" into a ranking problem instead of a yes/no question.

3. Patterns That Modify Model Training

These patterns focus on how we train the model - improving accuracy, speed, and generalization. They might change how the model learns from data or how we combine multiple models.
Example: Using Ensemble Learning (combining several models) to get better results.

4. Resilience

These patterns help make ML systems strong and reliable in the real world. They handle unexpected inputs, missing data, or system failures, so the model doesn't break easily.
Example: Using Checkpoints to save progress during training, so it can restart if something fails.

5. Reproducibility

These patterns ensure that we can repeat the same experiment and get the same results. They standardize how data, models, and code are tracked and shared.
Example: Keeping exact versions of data and model configurations so results can be verified later.

6. Responsible AI

These patterns focus on doing AI the right way - fairly, safely, and ethically. They make sure models are transparent, unbiased, and respect privacy.
Example: Using a Fairness Lens pattern to detect and reduce bias in predictions.

Summary Table

Category                Simple Meaning                     Example
Data Representation     Getting data ready for ML          Embeddings
Problem Representation  Defining the problem clearly       Ranking, classification
Modify Model Training   Improving how models learn         Ensembles
Resilience              Making systems reliable            Checkpoints
Reproducibility         Ensuring results can be repeated   Version tracking
Responsible AI          Building fair, ethical systems     Fairness Lens

These are summarized in the image:

ML Design Patterns

These are also summarized in the second half of Common Patterns.docx. The bolded patterns are the patterns we will cover in class. The number in brackets shows the popularity rank of a particular pattern. The patterns in red were covered last year but due to declining popularity will not be covered in this year's class. The patterns in green will be covered for the first time this year due to increasing popularity. Note that we cover the 16 most popular machine learning design patterns.

Becoming More Popular

1. Responsible AI

There is growing pressure for fairness, transparency, interpretability, privacy, and ethical AI use. Organizations must meet internal governance and external regulations such as GDPR and the EU AI Act.
Implication: Patterns like "fairness lens", "explainable predictions", and "heuristic benchmark" are gaining attention.

2. Resilience

As ML systems move from research to production (MLOps), robustness, monitoring, drift detection, and reliability have become critical. Systems must operate reliably at scale in changing environments.
Implication: Patterns such as "continued model evaluation", "stateless serving function", and "two-phase predictions" are becoming more common.

3. Reproducibility

With complex models and collaborative teams, it's vital to reproduce experiments and results. This is also important for audits and compliance.
Implication: Patterns like "workflow pipeline", "model versioning", and "feature store" are increasingly standard in production systems.

Becoming Less Popular or Plateauing

1. Data Representation

Foundational techniques such as embeddings, feature crosses, and hashing are well established. The focus is shifting from creating representations to managing operational and ethical challenges.

2. Problem Representation

Problem formulation patterns (classification, regression, ranking) are mature and standardized. There is less emphasis on new approaches in this area compared to production and governance patterns.

3. Patterns That Modify Model Training

While still useful (transfer learning, hyperparameter tuning, distributed training), most of these are handled by existing frameworks. Focus is moving toward deployment and lifecycle management.

Reasons for These Shifts

The reasons behind these shifts are summarized in the Key Reasons column of the table below.

Summary Table

Category                Trend       Key Reasons
Data Representation     Plateauing  Foundational techniques are mature
Problem Representation  Plateauing  Standard formulations widely known
Modify Model Training   Stable      Frameworks automate common techniques
Resilience              Increasing  Focus on reliability, drift detection, and MLOps
Reproducibility         Increasing  Auditability and versioning demands
Responsible AI          Increasing  Ethical and regulatory requirements

Machine Learning Lectures - Stanford University

For a full course on machine learning, see the playlist Stanford CS229: Machine Learning Full Course taught by Andrew Ng, Autumn 2018. Of interest to our study of machine learning design patterns is the second lecture on linear regression and gradient descent. See Stanford CS229: Machine Learning - Linear Regression and Gradient Descent.

For a shorter course on machine learning, see the playlist Machine Learning Specialization by Andrew Ng. For shorter videos on training data, see the following videos on supervised learning: #4 Machine Learning Specialization and #5 Machine Learning Specialization. See also the following videos on unsupervised learning: #6 Machine Learning Specialization and #7 Machine Learning Specialization.

The Embeddings Design Pattern

The Rationale

The rationale for the embeddings design pattern is to represent high-cardinality categorical or discrete features in a lower-dimensional continuous vector space. Embeddings are learned representations that capture meaningful relationships and semantic information between the categories or entities present in the data.

The embeddings pattern translates many categories (words, users, products) into small, meaningful numeric vectors that a model can use to make predictions. There are two main parts:

  1. Embedding Layer – does the translating.
  2. Model – uses those translations to make predictions.

The UML

Here is a very rough UML diagram for the embeddings pattern:

  +----------------------+          +----------------------------------+
  |    EmbeddingLayer    |--------<>|              Model               |
  +----------------------+          +----------------------------------+
  | - inputDim: int      |          | - embeddingLayer: EmbeddingLayer |
  | - embeddingDim: int  |          +----------------------------------+
  +----------------------+          | + predict()                      |
  | + getEmbedding()     |          +----------------------------------+
  +----------------------+

1) Embedding Layer (the translator)

Turns each category ID into a short numeric vector. Example: instead of “word #37,” the layer outputs something like [0.2, -0.7, 0.5, ...].

2) Model (the learner)

Takes those vectors as input and learns to make predictions from them.

Simplified Analogy. Teaching a computer about movies:
the Embedding Layer converts each movie into a short numeric description;
the Model uses those descriptions to predict which movie a user will like next.

It may also contain the following components:

3) Training Data Class: The training data class holds the data the model learns from: the input categorical features, the target variables (the correct answers), and any other data relevant to training and adjusting the embeddings. It serves as the input to the model during the training phase.

4) Inference Data Class: The inference data class holds new input data used after training so the model can generate embeddings and make predictions. It may contain the categorical features for which embeddings are generated, as well as any additional input data required for prediction.
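
To make the pattern concrete before the full examples below, here is a minimal C++ sketch of the two core classes, assuming a plain lookup-table embedding with randomly initialized vectors. The class and member names mirror the UML above; the initialization and the toy predict() are illustrative only (real embedding values are learned during training):

  #include <cstdlib>
  #include <vector>

  // Lookup-table embedding: maps a category id to a dense vector.
  class EmbeddingLayer {
  public:
      EmbeddingLayer(int inputDim, int embeddingDim)
          : table(inputDim, std::vector<float>(embeddingDim)) {
          // Random initialization; in practice these values are learned.
          for (auto& row : table)
              for (auto& v : row)
                  v = static_cast<float>(std::rand()) / RAND_MAX - 0.5f;
      }
      const std::vector<float>& getEmbedding(int id) const { return table[id]; }
  private:
      std::vector<std::vector<float>> table;  // inputDim rows x embeddingDim columns
  };

  // Model that consumes the embedding of an input to produce a score.
  class Model {
  public:
      explicit Model(EmbeddingLayer& layer) : embeddingLayer(layer) {}
      float predict(int id) const {
          // Toy prediction: sum the embedding components.
          float score = 0.0f;
          for (float v : embeddingLayer.getEmbedding(id)) score += v;
          return score;
      }
  private:
      EmbeddingLayer& embeddingLayer;
  };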

Code Example - Embeddings Data Pattern

The following is a simple example of the embeddings data pattern:
C++: Embedding.cpp.
C#: Embeddings.cs.
Java: Embeddings.java.
Python: Embeddings.py.

Common Usage

The following are some common usages of the embeddings pattern:

  1. Natural Language Processing (NLP): Embeddings are widely used in NLP tasks such as text classification, sentiment analysis, named entity recognition, machine translation, and document similarity. Word embeddings, such as Word2Vec, GloVe, and fastText, capture semantic relationships between words and are used to represent textual data in a dense vector space.
  2. Recommender Systems: Embeddings play a crucial role in recommender systems to capture user preferences and item characteristics. User embeddings and item embeddings are learned from historical user-item interactions and used to generate personalized recommendations. Embeddings enable the system to find similar users or items based on their embedding vectors.
  3. Image and Video Processing: In computer vision tasks, embeddings are used to represent images and videos. Techniques like convolutional neural networks (CNNs) are used to learn image embeddings, which can be used for image classification, object detection, image retrieval, and more. Video embeddings can capture temporal information and are useful for tasks like action recognition and video summarization.
  4. Anomaly Detection: Embeddings can be used to detect anomalies in data. By learning embeddings that capture normal patterns, deviations from the normal behavior can be identified as anomalies. This approach is commonly used in fraud detection, network intrusion detection, and outlier detection.
  5. Knowledge Graphs: Embeddings are employed in knowledge graph applications to represent entities and relationships in a graph structure. Graph embeddings enable efficient similarity calculations and can be used for tasks like entity linking, link prediction, and graph-based recommendation systems.
  6. Sequence Modeling: Embeddings are utilized in sequence modeling tasks, such as natural language generation, machine translation, and speech recognition. Sequence embeddings capture dependencies and context in sequential data, enabling the model to understand and generate meaningful sequences.

Code Problem - Movie Recommendations

We want to implement a system that recommends movies to a user based on a list of watched movies. We need an EmbeddingLayer class responsible for generating and retrieving embeddings, a Movie class to represent a movie with an ID and a title, and a RecommenderSystem class that calls a recommendMovie function for a specific user, passing their ID and the list of movies they've already watched. The recommendMovie function takes a user ID and a list of watched movies and recommends a movie based on the user's embeddings and a similarity metric. The code is seen below, followed by a sketch of the similarity step.
Movie.h,
EmbeddingLayer.h,
RecommenderSystem.h,
MovieMain.cpp.
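
The core of recommendMovie is comparing embedding vectors with a similarity metric. Here is a hedged sketch of that step, assuming cosine similarity; the function name is illustrative and need not match the linked headers:

  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Cosine similarity between two embedding vectors: 1 means identical
  // direction, 0 means unrelated, -1 means opposite.
  float cosineSimilarity(const std::vector<float>& a,
                         const std::vector<float>& b) {
      float dot = 0.0f, normA = 0.0f, normB = 0.0f;
      for (std::size_t i = 0; i < a.size(); ++i) {
          dot   += a[i] * b[i];
          normA += a[i] * a[i];
          normB += b[i] * b[i];
      }
      return dot / (std::sqrt(normA) * std::sqrt(normB) + 1e-8f);
  }

A typical recommender averages the embeddings of the watched movies into a user profile vector and recommends the unwatched movie whose embedding is most similar to that profile.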

Code Problem - Predicting Financial Data

The following program uses historical prices and a vector of learned weights to predict a stock price for a given day. The prediction is the dot product of the two vectors (historical prices and weights); a minimal sketch of that computation follows the file list.
VectorOperations.h, vector dot product
FinancialData.h,
StockPredictionModel.h, contains the embedded data
FinancialDataMain.cpp.
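
A minimal sketch of the dot-product prediction (illustrative, not the exact contents of the linked headers):

  #include <cstddef>
  #include <vector>

  // Predicted price = dot product of historical prices and learned weights.
  float predictPrice(const std::vector<float>& prices,
                     const std::vector<float>& weights) {
      float prediction = 0.0f;
      for (std::size_t i = 0; i < prices.size(); ++i)
          prediction += prices[i] * weights[i];
      return prediction;
  }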

The Rebalancing Design Pattern

The Rationale

The rebalancing machine learning design pattern, also known as class rebalancing or data rebalancing, is employed to address class imbalance in datasets. Class imbalance refers to a situation where the number of samples in the different classes of a classification problem is significantly skewed, with one class having far more instances than the others.
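
One lightweight form of rebalancing mentioned in the introduction is a weighted loss function, where each class c is weighted by N / (K * n_c), with N the total number of samples, K the number of classes, and n_c the count of class c. A minimal sketch of that computation, under those assumptions:

  #include <cstddef>
  #include <vector>

  // Inverse-frequency class weights: rare classes receive larger weights,
  // so their misclassifications count more in the loss.
  // Assumes every class has at least one sample.
  std::vector<float> classWeights(const std::vector<int>& counts) {
      int total = 0;
      for (int c : counts) total += c;
      std::vector<float> weights(counts.size());
      for (std::size_t i = 0; i < counts.size(); ++i)
          weights[i] = static_cast<float>(total) /
                       (counts.size() * counts[i]);
      return weights;
  }

For example, class counts of {990, 10} yield weights of roughly {0.51, 50.0}, so each minority-class error costs about a hundred times more than a majority-class error.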

The UML Diagram

Here is a simple UML diagram for the rebalancing design pattern:

  +----------------+               +-----------------+
  |    Dataset     |<>------------>|   Rebalancer    |
  +----------------+               +-----------------+
  | - data         |               | + rebalance()   |
  | - labels       |               | + get_data()    |
  | - num_classes  |               | + get_labels()  |
  +----------------+               +-----------------+
          ^
          | uses
  +----------------+
  |   BaseModel    |
  +----------------+
  | + train()      |
  | + predict()    |
  | + evaluate()   |
  +----------------+
          ^
          | inherits
  +-------------------+
  |  RebalancedModel  |
  +-------------------+
  | - rebalancer      |
  | + train()         |
  | + predict()       |
  | + evaluate()      |
  +-------------------+


Here are the components of the rebalancing design pattern:
  1. The Dataset class represents the original dataset with its associated features (data) and labels (labels). It also maintains information about the number of classes in the dataset (num_classes).
  2. The Rebalancer class is responsible for rebalancing the dataset. It contains methods such as rebalance() to perform the rebalancing operation, and get_data() and get_labels() to retrieve the rebalanced data and labels, respectively.
  3. The BaseModel class represents the base machine learning model that can be trained, used for prediction, and evaluated. It encapsulates common functionalities like train(), predict(), and evaluate().
  4. The RebalancedModel class extends the BaseModel class and introduces a rebalancer object, which is an instance of the Rebalancer class. It utilizes the rebalanced data and labels obtained from the rebalancer during the training, prediction, and evaluation processes.
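
As a concrete illustration of the Rebalancer component, here is a minimal sketch that rebalances by duplicating minority-class samples until every class matches the majority count (simple oversampling). The interface mirrors the UML above; the implementation is illustrative, not the linked code:

  #include <algorithm>
  #include <cstddef>
  #include <map>
  #include <vector>

  class Rebalancer {
  public:
      void rebalance(const std::vector<std::vector<float>>& data,
                     const std::vector<int>& labels) {
          // Group sample indices by class label.
          std::map<int, std::vector<std::size_t>> byClass;
          for (std::size_t i = 0; i < labels.size(); ++i)
              byClass[labels[i]].push_back(i);
          // The majority class sets the target size for every class.
          std::size_t target = 0;
          for (const auto& kv : byClass)
              target = std::max(target, kv.second.size());
          // Rebuild the dataset, cycling through each class's samples
          // until the target count is reached.
          data_.clear();
          labels_.clear();
          for (const auto& kv : byClass) {
              for (std::size_t n = 0; n < target; ++n) {
                  std::size_t idx = kv.second[n % kv.second.size()];
                  data_.push_back(data[idx]);
                  labels_.push_back(kv.first);
              }
          }
      }
      const std::vector<std::vector<float>>& get_data() const { return data_; }
      const std::vector<int>& get_labels() const { return labels_; }
  private:
      std::vector<std::vector<float>> data_;
      std::vector<int> labels_;
  };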

Code Example - Rebalancing design pattern

The following is a simple code example of the rebalancing design pattern:
C++: Rebalancing.cpp.
C#: Rebalancing.cs.
Java: Rebalancing.java.
Python: Rebalancing.py.

Common Usage

The rebalancing design pattern is commonly used in various domains within the software industry where dealing with imbalanced datasets is a challenge. The following are some common usages of the rebalancing design pattern:

  1. Fraud detection: In fraud detection systems, the number of fraudulent instances is typically significantly lower than the number of non-fraudulent instances. Rebalancing techniques can be applied to ensure that the model is trained on a balanced dataset, improving the accuracy of fraud detection.
  2. Medical diagnosis: Medical datasets often suffer from class imbalance, where the number of instances belonging to certain rare medical conditions is much smaller than others. Rebalancing the dataset can help prevent the model from being biased towards the majority class and improve the accuracy of diagnosis for rare conditions.
  3. Anomaly detection: Anomaly detection involves identifying rare events or outliers in a dataset. Rebalancing can be useful in scenarios where the anomalies are significantly underrepresented compared to normal instances. By rebalancing the dataset, the model can be trained to better detect and classify anomalies.
  4. Credit risk assessment: When evaluating credit risk, the occurrence of default events is usually low compared to non-default events. By rebalancing the dataset, credit risk models can be trained to account for the imbalanced nature of default instances, leading to more accurate risk assessment.

Code Problem - SMOTE rebalancer (simple)

The rebalancing design pattern involves adjusting the class distribution in the training data to handle imbalanced datasets. Here is an example in C++ that demonstrates the pattern using the Synthetic Minority Over-sampling Technique (SMOTE). In a real-world scenario, you would integrate a more sophisticated SMOTE implementation or other methods to effectively rebalance the data before training your machine learning models.

The code is seen below:
DataSample.h,
Rebalancer.h,
RebalanceMain.cpp.

Code Problem - SMOTE algorithm

The following code contains pseudocode for the SMOTE algorithm:
Sample.h,
Smote.h,
SmoteMain.cpp.
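
The heart of SMOTE is a single interpolation step: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbors. A hedged sketch of just that step (the linked pseudocode adds the neighbor search and bookkeeping around it):

  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // One synthetic sample between x and a minority-class neighbor:
  // synthetic = x + gap * (neighbor - x), with gap drawn from [0, 1].
  std::vector<float> smoteSample(const std::vector<float>& x,
                                 const std::vector<float>& neighbor) {
      float gap = static_cast<float>(std::rand()) / RAND_MAX;
      std::vector<float> synthetic(x.size());
      for (std::size_t i = 0; i < x.size(); ++i)
          synthetic[i] = x[i] + gap * (neighbor[i] - x[i]);
      return synthetic;
  }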

Code Problem - SMOTE rebalancer (complex)

As above, the Rebalancer class uses the SMOTE algorithm to generate synthetic samples for the minority classes in the dataset. The rebalanced dataset is then used to train the base model (DecisionTreeModel) using the RebalancedModel class. The rebalanced model can then be used for predictions.

The code is seen below:
Dataset.h,
Rebalancer.h,
Rebalancer.cpp,
BaseModel.h,
DecisionTreeModel.h,
RebalancedModel.h,
RebalPredictor.cpp.