Assignment 4 - Multi-View Machine Learning Predictor

Assignment 4 Instructions: here.

Due - Sunday, November 10, 2024

This is a group project for groups of 2 or 3 students. The groups can be found in the document Assignment4_Groups.docx.

Introduction

For this assignment you will train a data model and make predictions based on this model. The model is based on the Taylor Series representation of a sine wave. The predictions will be displayed in multiple views, which in this case are a console view and a graphical view.

Design Considerations

There is a design pattern built for machine learning that suits the hosting of multiple views. You have to research this design pattern and use it in your assignment. You do not have to follow the design pattern if you do not wish to, but you must justify your reasons for doing so.

After you have decided on a design pattern, be sure to draw a UML diagram to show how you plan to approach this problem.

Although C++ is the only appropriate language for this assignment, you can write your code in one of the following four languages: C++, C#, Java or Python. Also keep in mind that at least one of your assignments has to be written in Java and at least one in C#.

The Components

There are two major components to this assignment: the machine learning data model and the viewing of the data. Each component can be broken into sub-components.

The Machine Learning Data Model

The machine learning data model consists of two parts: the training of the model and the predictor. For the purposes of this assignment, the weights of the model are trained on a sine wave. There are two builds for the data model: a test build, in which you can see debug prints as your code runs, and a release build, in which the debug prints are disabled. The customer will get the release build. The training model is intended to reside on a small device with limited memory, so a reduced instruction set will be used to save space. This reduced instruction set purposely excludes the maths library; therefore, the sine wave used in training has to be approximated by a Taylor Series.
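One common C++ way to get a test build and a release build from the same source is to wrap debug prints in a macro that is compiled out when NDEBUG is defined (NDEBUG is the standard macro that also disables assert, and release configurations commonly define it). The sketch below assumes that convention; the macro name DEBUG_PRINT and the helper factorial() are hypothetical and are not part of the supplied code.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical debug-print macro. In a test build (NDEBUG not defined) it
// writes a message to std::cerr; in a release build (compiled with -DNDEBUG)
// it expands to nothing, so the message costs no code space on the device.
#ifdef NDEBUG
#define DEBUG_PRINT(msg) ((void)0)
#else
#define DEBUG_PRINT(msg) (std::cerr << "DEBUG: " << msg << '\n')
#endif

// Hypothetical helper: computes n! iteratively, without the maths library,
// printing each partial result in a test build only.
double factorial(int n) {
    assert(n >= 0);  // assert is also compiled out when NDEBUG is defined
    double result = 1.0;
    for (int i = 2; i <= n; ++i) {
        result *= i;
        DEBUG_PRINT("factorial: i = " << i << ", partial result = " << result);
    }
    return result;
}
```

Building once without and once with -DNDEBUG shows the difference: factorial(5) returns 120 in both builds, but only the test build prints the intermediate lines.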

A Taylor Series represents a function as an infinite sum of polynomial terms computed from the function's derivatives at a single point; truncating the sum after a finite number of terms gives a polynomial approximation of the function. See Taylor's Series of sin x for the mathematical derivation of a Taylor Series that approximates a sine wave. As the derivation shows, the Taylor Series for a sine wave, expanded about x = 0, is as follows:
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots$$
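Because the maths library is unavailable, neither pow() nor a library factorial can be used. A minimal sketch, assuming a hypothetical function taylorSin() (not the supplied MathUtils code), shows that each term can instead be derived from the previous one by multiplying by -x^2 and dividing by the next two integers:

```cpp
#include <cassert>

// Evaluates the first `terms` terms of the Taylor Series for sin(x):
//   x - x^3/3! + x^5/5! - x^7/7! + ...
// Each term is computed from the previous one, so no pow() or factorial
// function from the maths library is needed.
double taylorSin(double x, int terms) {
    assert(terms >= 1);
    double term = x;   // first term: x^1 / 1!
    double sum = 0.0;
    for (int k = 0; k < terms; ++k) {
        sum += term;
        // Next term = current term * (-x^2) / ((2k + 2)(2k + 3)),
        // e.g. -x^3/3! = x * (-x^2) / (2 * 3).
        term *= -x * x / ((2.0 * k + 2.0) * (2.0 * k + 3.0));
    }
    return sum;
}
```

With terms = 1 this returns x, with terms = 2 it returns x - x^3/3!, and so on, which matches the step-by-step expansion described later in this section.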

Standard deviation is a measure of how far a set of values deviates from a reference value. See Variance and Standard Deviation. In our case, it measures how far our approximated sine wave deviates from the actual sine wave over n data points: take the difference between the estimate and the true value at each data point, square it, sum these squares over all data points, divide the sum by n - 1, and take the square root of the result:
$$\text{Std Dev} = \sqrt{\frac{\sum_{i=1}^{n} \left(\text{estimate}_i - \text{true}_i\right)^2}{n-1}}$$
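The formula translates directly into a loop. Since the maths library is excluded, this sketch also replaces std::sqrt with a hand-written Newton's-method square root; both newtonSqrt() and stdDev() are hypothetical names, and the supplied MathUtils code may compute the square root differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square root by Newton's method, written by hand because the reduced
// instruction set excludes the maths library (<cmath>).
double newtonSqrt(double v) {
    assert(v >= 0.0);
    if (v == 0.0) return 0.0;
    double x = v > 1.0 ? v : 1.0;  // initial guess, always >= sqrt(v)
    for (;;) {
        const double next = 0.5 * (x + v / x);
        if (next >= x) break;      // iterates shrink strictly until converged
        x = next;
    }
    return x;
}

// Std Dev = sqrt( sum((estimate_i - true_i)^2) / (n - 1) )
double stdDev(const std::vector<double>& estimate, const std::vector<double>& truth) {
    assert(estimate.size() == truth.size());
    assert(estimate.size() >= 2);  // n - 1 must be at least 1
    double sumSq = 0.0;
    for (std::size_t i = 0; i < estimate.size(); ++i) {
        const double diff = estimate[i] - truth[i];
        sumSq += diff * diff;
    }
    return newtonSqrt(sumSq / static_cast<double>(estimate.size() - 1));
}
```

For example, estimates {3, 4} against true values {0, 0} give sqrt((9 + 16) / 1) = 5.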

Code has been given to you that attempts to approximate a sine wave with a Taylor Series. The standard deviation of the approximation is calculated, and if it is below a tolerance, the approximation is accepted. If not, the Taylor Series is expanded by one term. See:
C++: MathUtils.cpp,
C#: MathUtils.cs (refer to MathUtils.cpp for detailed comments),
Java: MathUtils.java (refer to MathUtils.cpp for detailed comments),
Python: MathUtils.py (refer to MathUtils.cpp for detailed comments).

For instance, the Taylor Series for the sine wave is initially given as:
$$\sin x = x$$

Most likely the standard deviation will be well above the tolerance. Therefore, expand this series to:
$$\sin x = x - \frac{x^3}{3!}$$

The standard deviation for this approximation will probably still not meet the tolerance threshold, so the series might have to be expanded to:
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!}$$

You will have to modify the train() function in MathUtils.cpp (or its equivalent in your chosen language) so that this expansion continues until the tolerance threshold is met.
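The expansion loop can be sketched as follows. This is not the supplied train() from MathUtils.cpp: it assumes, hypothetically, that train() receives sample inputs and their true sine values as training data, starts from sin x = x, and adds one term per pass until the standard deviation falls below the tolerance. The messages mirror the wording of the sample run, but the numbers will differ because the training data here is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// First `terms` terms of x - x^3/3! + x^5/5! - ... (no maths library).
double taylorSin(double x, int terms) {
    double term = x, sum = 0.0;
    for (int k = 0; k < terms; ++k) {
        sum += term;
        term *= -x * x / ((2.0 * k + 2.0) * (2.0 * k + 3.0));
    }
    return sum;
}

// Newton's-method square root, since <cmath> is unavailable.
double newtonSqrt(double v) {
    if (v == 0.0) return 0.0;
    double x = v > 1.0 ? v : 1.0;
    for (double next = 0.5 * (x + v / x); next < x; next = 0.5 * (x + v / x)) {
        x = next;
    }
    return x;
}

// Std Dev = sqrt( sum((estimate_i - true_i)^2) / (n - 1) )
double stdDev(const std::vector<double>& est, const std::vector<double>& truth) {
    double sumSq = 0.0;
    for (std::size_t i = 0; i < est.size(); ++i) {
        sumSq += (est[i] - truth[i]) * (est[i] - truth[i]);
    }
    return newtonSqrt(sumSq / static_cast<double>(est.size() - 1));
}

// Hypothetical training loop: expands the series one term at a time until
// the standard deviation is below `tolerance`. Returns the number of terms
// used, or -1 if `maxTerms` is reached without meeting the tolerance.
int train(const std::vector<double>& xs, const std::vector<double>& truth,
          double tolerance, int maxTerms) {
    assert(xs.size() == truth.size() && xs.size() >= 2);
    for (int terms = 1; terms <= maxTerms; ++terms) {
        std::vector<double> estimate;
        estimate.reserve(xs.size());
        for (double x : xs) estimate.push_back(taylorSin(x, terms));
        const double sd = stdDev(estimate, truth);
        std::cout << "stdDev: " << sd << '\n';
        if (sd < tolerance) {
            std::cout << "The training model is accurate.\n";
            return terms;
        }
        std::cout << "ERROR: The training model is inaccurate!\n";
    }
    return -1;
}
```

The maxTerms cap is a design choice of this sketch: it guarantees the loop ends even if the tolerance is set too tight for double precision.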

The Views

There are two views in this assignment: a console view and a graphical view. There is no time to set up a proper console view or a proper graphical view, so to display the prediction, have each view simply print:
std::cout << "Console Prediction: " << prediction << std::endl; or
std::cout << "Graphical Prediction: " << prediction << std::endl;

MultiView Data Main Function and Other Classes

You should have an intermediary class that attaches and detaches views, trains the model, and calls a function that displays the prediction on every attached view.

Your main function will set everything up and train the model once, then generate some random data, make a prediction, and display the results on all the views. It will repeat the generate, predict, and display steps five times. Skeleton code for the main function can be seen at:
C++: MultiViewData.cpp,
C#: MultiViewData.cs,
Java: MultiViewData.java,
Python: MultiViewData.py.
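One possible arrangement of the views, the intermediary class, and the main-function loop is sketched below. All class and function names (View, SineModel, Intermediary, runFivePredictions) are hypothetical, the model's train() is a placeholder for the MathUtils training loop, and the random input range is an assumption; this is not the supplied MultiViewData skeleton, and it is not meant to settle which design pattern you should adopt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <ostream>
#include <random>
#include <vector>

// Interface shared by all views: each one only knows how to show a prediction.
class View {
public:
    virtual ~View() = default;
    virtual void displayPrediction(double prediction) = 0;
};

// Views write to a stream that defaults to std::cout, so the required output
// lines are unchanged but the output could be redirected for testing.
class ConsoleView : public View {
public:
    explicit ConsoleView(std::ostream& out = std::cout) : out_(out) {}
    void displayPrediction(double prediction) override {
        out_ << "Console Prediction: " << prediction << std::endl;
    }
private:
    std::ostream& out_;
};

class GraphicalView : public View {
public:
    explicit GraphicalView(std::ostream& out = std::cout) : out_(out) {}
    void displayPrediction(double prediction) override {
        out_ << "Graphical Prediction: " << prediction << std::endl;
    }
private:
    std::ostream& out_;
};

// Hypothetical stand-in for the data model.
class SineModel {
public:
    void train() { terms_ = 6; }  // placeholder: pretend training chose 6 terms
    double predict(double x) const {  // evaluates the truncated Taylor Series
        double term = x, sum = 0.0;
        for (int k = 0; k < terms_; ++k) {
            sum += term;
            term *= -x * x / ((2.0 * k + 2.0) * (2.0 * k + 3.0));
        }
        return sum;
    }
private:
    int terms_ = 1;
};

// Intermediary: keeps the attached views, trains the model, and forwards
// each new prediction to every attached view.
class Intermediary {
public:
    explicit Intermediary(SineModel& model) : model_(model) {}
    void attach(View* view) {
        assert(view != nullptr);
        views_.push_back(view);
    }
    void detach(View* view) {
        views_.erase(std::remove(views_.begin(), views_.end(), view), views_.end());
    }
    std::size_t viewCount() const { return views_.size(); }
    void trainModel() { model_.train(); }
    double displayPrediction(double input) {
        const double prediction = model_.predict(input);
        for (View* view : views_) view->displayPrediction(prediction);
        return prediction;
    }
private:
    SineModel& model_;
    std::vector<View*> views_;  // non-owning: views must outlive their use here
};

// What main() might do once the views are attached: train once, then
// generate five random inputs and display each prediction on every view.
void runFivePredictions(Intermediary& hub, unsigned seed) {
    hub.trainModel();
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-3.14159, 3.14159);  // assumed range
    for (int i = 0; i < 5; ++i) {
        hub.displayPrediction(dist(gen));
    }
}
```

In main(), you might create a SineModel, an Intermediary, a ConsoleView and a GraphicalView, attach both views, and call runFivePredictions(hub, seed). Because the model here is a placeholder, its values will not match the sample run below.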

Test Program

A sample run might look as follows:

stdDev: 0.222223
ERROR: The training model is inaccurate!

stdDev: 0.0229135
ERROR: The training model is inaccurate!

stdDev: 0.00117588
ERROR: The training model is inaccurate!

The training model is accurate.

Console Prediction: 3.94557
Graphical Prediction: 3.94557

Console Prediction: -2.73723
Graphical Prediction: -2.73723

Console Prediction: 3.29787
Graphical Prediction: 3.29787

Console Prediction: 0.801149
Graphical Prediction: 0.801149

Console Prediction: 2.27915
Graphical Prediction: 2.27915

Questions

  1. Why should this code be written in C/C++ only? Why not Java, C# or Python?
  2. Name two other common uses for this design pattern.
  3. What might cause a test build to behave differently from a release build?
  4. Did you use interface classes for all components in this assignment? If not, why?

Marking Rubric

You will be marked out of 10 according to the following:

Criterion | Does not meet expectations | Satisfactory | Good | Exceeds Expectations
UML Diagram (2 marks) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors
Machine Learning Data Model (2 marks) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors
The Views (1 mark) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors
Other Class(es) (1 mark) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors
The main function (1 mark) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors
Code Documentation (1 mark) | Does not contain documentation | Contains header documentation for either all files or all functions within each file | Contains header documentation for all files and for most functions within each file | Contains header documentation for all files and for all functions within each file, and documents unclear code
Questions (2 marks) | Answers no questions correctly | Answers some questions correctly | Answers most questions correctly | Answers all questions correctly

Submission

Please email all source code and answers to questions to: miguel.watler@senecapolytechnic.ca

Your answers to questions can be submitted in a separate document or embedded within your source code.

Late Policy

You will be docked 10% if your assignment is submitted 1-2 days late.
You will be docked 20% if your assignment is submitted 3-4 days late.
You will be docked 30% if your assignment is submitted 5-6 days late.
You will be docked 40% if your assignment is submitted 7 days late.
You will be docked 50% if your assignment is submitted over 7 days late.