Assignment 4 Instructions: here.
Due - Sunday, November 10, 2024
This is a group project of 2 or 3 students per group. The groups can be found in the document Assignment4_Groups.docx.
For this assignment you will train a data model and make predictions based on this model. The model is based on the Taylor Series representation of a sine wave. The predictions will be displayed in multiple views, which in this case are a console view and a graphical view.
There is a design pattern built for machine learning that suits the hosting of multiple views. You have to research this design pattern and use it in your assignment. You do not have to follow the design pattern if you do not wish to, but you must justify your reasons for not doing so.
After you have decided on a design pattern, be sure to draw a UML diagram to show how you plan to approach this problem.
Although C++ is the only appropriate language for this assignment, you can write your code in one of the following four languages: C++, C#, Java or Python. Also keep in mind that at least one of your assignments has to be written in Java and at least one in C#.
There are two major components to this assignment: the machine learning data model and the viewing of the data. Each component can be broken into sub-components.
The machine learning data model consists of two parts: the training of the model, and the predictor. For the purposes of this assignment, the training of the weights of the model is based on a sine wave. There are two builds for the data model: a test build where you can see debug prints as your code is running, and a release build where the debug prints are disabled. The customer will get the release build. The goal is that the training model will reside on a small device that is limited in memory, therefore a reduced instruction set will be used to save space. This reduced instruction set purposefully excludes the maths library, therefore the sine wave used in the training has to be approximated by a Taylor Series.
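One common way to get a test build with debug prints and a release build without them is a preprocessor switch. The sketch below is only an illustration: the macro names `DEBUG_BUILD` and `DEBUG_PRINT` and the function `checkAccuracy` are hypothetical and are not taken from the provided MathUtils files.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical switch: compile with -DDEBUG_BUILD for the test build.
// Without it, DEBUG_PRINT expands to nothing (release build).
#ifdef DEBUG_BUILD
#define DEBUG_PRINT(msg) do { std::cout << msg << std::endl; } while (0)
#else
#define DEBUG_PRINT(msg) do { } while (0)
#endif

// Hypothetical helper: reports whether a standard deviation is within
// the tolerance, printing the value only in the test build.
bool checkAccuracy(double stdDev, double tolerance) {
    assert(tolerance > 0.0);
    DEBUG_PRINT("stdDev: " << stdDev);
    return stdDev < tolerance;
}
```

With this arrangement the same source produces both builds, and only the compiler flag changes.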
The Taylor Series is a mathematical model that attempts to approximate mathematical
functions through a series of polynomials. See
Taylor's Series of sinx for the mathematical
derivation of a Taylor Series that approximates a sine wave. As you can see from the derivation,
the Taylor Series for a sine wave is as follows:
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots$
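As a rough illustration of the series above, the following sketch sums a chosen number of terms without using the maths library. The function name `taylorSine` and its `terms` parameter are hypothetical and are not necessarily what the provided MathUtils code uses.

```cpp
#include <cassert>

// Partial sum of the Taylor Series for sin(x) using `terms` terms:
// x - x^3/3! + x^5/5! - ...
// Each term is built from the previous one, so no factorial or power
// function (and no maths library) is needed.
double taylorSine(double x, int terms) {
    assert(terms >= 1);
    double term = x;  // first term: x^1 / 1!
    double sum = x;
    for (int k = 1; k < terms; ++k) {
        // next term = previous term * (-x^2) / ((2k) * (2k + 1))
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}
```

Calling `taylorSine(x, 1)` gives the one-term approximation x, and `taylorSine(x, 2)` gives x - x^3/3!.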
Standard deviation is used here as a measure of how close the estimated data is to the true data. It calculates
the sum of the squares of the difference between the two for all data points, divides this by the number of data points minus one,
and then takes the square root of this number. See
Variance and Standard Deviation. For our case,
the standard deviation will be calculated from the difference between our approximated sine wave and
an actual sine wave for n data points:
$\text{Std Dev} = \sqrt{\frac{\sum_{i=1}^{n} (\text{estimate}_i - \text{true}_i)^2}{n-1}}$
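The calculation described above can be sketched as follows. This is a minimal illustration and not the provided code: the function name `stdDev` and the use of `std::vector` are assumptions, and `std::sqrt` from `<cmath>` is used here for brevity even though the target device excludes the maths library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard deviation of the estimated values from the true values,
// dividing the sum of squared differences by (n - 1).
double stdDev(const std::vector<double>& estimate,
              const std::vector<double>& truth) {
    assert(estimate.size() == truth.size());
    assert(estimate.size() > 1);  // n - 1 must be non-zero
    double sumSq = 0.0;
    for (std::size_t i = 0; i < estimate.size(); ++i) {
        double diff = estimate[i] - truth[i];
        sumSq += diff * diff;
    }
    return std::sqrt(sumSq / static_cast<double>(estimate.size() - 1));
}
```

A result of zero means the estimate matches the true data exactly, and larger values mean a poorer approximation.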
Code has been given to you that attempts to approximate a sine wave with a Taylor Series. The
standard deviation is found for the approximation, and if the standard deviation is below a
tolerance, the approximation is accepted. If not, the Taylor Series is expanded by one term. See:
C++: MathUtils.cpp,
C#: MathUtils.cs (refer to MathUtils.cpp for detailed comments),
Java: MathUtils.java (refer to MathUtils.cpp for detailed comments),
Python: MathUtils.py (refer to MathUtils.cpp for detailed comments).
For instance, the Taylor Series for the sine wave is initially given as:
$\sin x = x$
Most likely the standard deviation will be well outside the tolerance. Therefore, expand this
series to:
$\sin x = x - \frac{x^3}{3!}$
The standard deviation for this approximation probably will not meet the tolerance threshold either,
so the series might have to be expanded to:
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!}$
You will have to modify the train() function in MathUtils.cpp so that this
expansion continues until the tolerance threshold is met.
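The expand-until-accurate idea can be sketched as a simple loop. This is not the train() function from MathUtils.cpp. The names `trainTerms`, `sampleStdDev` and `taylorSine`, the sample range of -pi/2 to pi/2, the 101 sample points and the `maxTerms` cap are all assumptions made for illustration. The sketch also assumes `std::sin` and `std::sqrt` are available wherever the training check runs.

```cpp
#include <cassert>
#include <cmath>

// Partial sum of the Taylor Series for sin(x) with `terms` terms.
double taylorSine(double x, int terms) {
    double term = x;
    double sum = x;
    for (int k = 1; k < terms; ++k) {
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}

// Standard deviation between the approximation and std::sin over
// evenly spaced sample points in [-pi/2, pi/2] (assumed range).
double sampleStdDev(int terms, int points = 101) {
    assert(points > 1);
    const double halfPi = 1.5707963267948966;
    double sumSq = 0.0;
    for (int i = 0; i < points; ++i) {
        double x = -halfPi + (2.0 * halfPi) * i / (points - 1);
        double diff = taylorSine(x, terms) - std::sin(x);
        sumSq += diff * diff;
    }
    return std::sqrt(sumSq / (points - 1));
}

// Starts with one term and expands the series by one term at a time
// until the standard deviation is below the tolerance.
// Returns the number of terms needed, or -1 if maxTerms is not enough.
int trainTerms(double tolerance, int maxTerms) {
    assert(tolerance > 0.0 && maxTerms >= 1);
    for (int terms = 1; terms <= maxTerms; ++terms) {
        if (sampleStdDev(terms) < tolerance) {
            return terms;
        }
    }
    return -1;
}
```

The cap on the number of terms is a design choice in this sketch so that the loop always ends, even if the tolerance is set too tight to reach.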
There are two views in this assignment: a console view and a graphical view. There is no time to
set up a proper console view or a proper graphical view, so for displaying the prediction,
please have each simply print out:
std::cout << "Console Prediction: " << prediction << std::endl; or
std::cout << "Graphical Prediction: " << prediction << std::endl;
You should have some kind of intermediary class that attaches and detaches a view, that somehow calls a display prediction function, and somehow trains the model.
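One possible shape for such an intermediary class is sketched below. It is only a starting point under stated assumptions: the names `View`, `ConsoleView`, `GraphicalView` and `PredictionHub` are hypothetical, the hub holds non-owning pointers to views, and the training step is left out. Your own design and UML diagram may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical interface that every view implements.
class View {
public:
    virtual ~View() = default;
    virtual void displayPrediction(double prediction) = 0;
};

class ConsoleView : public View {
public:
    void displayPrediction(double prediction) override {
        std::cout << "Console Prediction: " << prediction << std::endl;
    }
};

class GraphicalView : public View {
public:
    void displayPrediction(double prediction) override {
        std::cout << "Graphical Prediction: " << prediction << std::endl;
    }
};

// Hypothetical intermediary: keeps a list of attached views and
// forwards each prediction to all of them.
class PredictionHub {
public:
    void attach(View* view) {
        assert(view != nullptr);
        views_.push_back(view);
    }
    void detach(View* view) {
        // Remove every occurrence of this view from the list.
        views_.erase(std::remove(views_.begin(), views_.end(), view),
                     views_.end());
    }
    void displayPrediction(double prediction) {
        for (View* view : views_) {
            view->displayPrediction(prediction);
        }
    }
    std::size_t viewCount() const { return views_.size(); }

private:
    std::vector<View*> views_;  // non-owning; views must outlive the hub
};
```

With both views attached, a single call to `displayPrediction` on the hub prints one line per view, and a detached view no longer receives predictions.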
Your main function will set everything up, generate some random data, make a prediction,
and display the results on all the views. It will do this five times. Skeleton code for the
main function can be seen at:
C++: MultiViewData.cpp,
C#: MultiViewData.cs,
Java: MultiViewData.java,
Python: MultiViewData.py.
A sample run might look as follows:
stdDev: 0.222223
ERROR: The training model is inaccurate!
stdDev: 0.0229135
ERROR: The training model is inaccurate!
stdDev: 0.00117588
ERROR: The training model is inaccurate!
The training model is accurate.
Console Prediction: 3.94557
Graphical Prediction: 3.94557
Console Prediction: -2.73723
Graphical Prediction: -2.73723
Console Prediction: 3.29787
Graphical Prediction: 3.29787
Console Prediction: 0.801149
Graphical Prediction: 0.801149
Console Prediction: 2.27915
Graphical Prediction: 2.27915
You will be marked out of 10 according to the following:
Criterion | Does not meet expectations | Satisfactory | Good | Exceeds Expectations |
---|---|---|---|---|
UML Diagram (2 marks) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors |
Machine Learning Data Module (2 marks) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors |
The Views (1 mark) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors |
Other Class(es) (1 mark) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors |
The main function (1 mark) | Does not meet requirements | Meets the most important requirements | Meets all requirements with minor errors | Meets all requirements with no errors |
Code Documentation (1 mark) | Does not contain documentation | Contains header documentation for either all files or for all functions within each file | Contains header documentation for all files and for most functions within each file | Contains header documentation for all files and for all functions within each file. Documents unclear code. |
Questions (2 marks) | Answers no questions correctly | Answers some questions correctly | Answers most questions correctly | Answers all questions correctly |
Please email all source code and answers to questions to: miguel.watler@senecapolytechnic.ca
Your answers to questions can be submitted in a separate document or embedded within your source code.
You will be docked 10% if your assignment is submitted 1-2 days late.
You will be docked 20% if your assignment is submitted 3-4 days late.
You will be docked 30% if your assignment is submitted 5-6 days late.
You will be docked 40% if your assignment is submitted 7 days late.
You will be docked 50% if your assignment is submitted over 7 days late.