Feel free to use general-purpose libraries such as matplotlib, numpy and scipy for
Python. However, you should implement the algorithms yourself, which means you should
NOT use pre-existing implementations of the algorithms such as those found in scikit-learn,
TensorFlow, etc.!

Code:
- You may only use Python 3, and you must submit your solution as a Jupyter notebook.
- Make sure all the data files needed to run your code are within the folder and
loaded with relative paths. We should be able to run your code without making any
modifications.
Report:
- Your report should be brief and to the point.
- Report all the visualizations (learning curves, regression fit).
- Do not include your code in the report!




1 Sampling
-------------------
A grad student's daily routine is defined as a multinomial distribution, p, over the
following set of activities:
• Movies: 0.2
• Class: 0.4
• Playing: 0.1
• Studying: 0.3

1. Every morning, the student wakes up and randomly samples from this distribution
an activity to do for the rest of the day. Given that you can only sample from the
uniform distribution over (0, 1), write pseudocode to sample from the given
multinomial distribution.
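One common way to do this is inverse-transform sampling: split the interval into consecutive pieces whose lengths are the activity probabilities, draw u, and return the activity whose piece contains u. A minimal Python sketch, assuming the standard-library `random.random()` as the uniform source; the names `sample_activity`, `ACTIVITIES` and `PROBS` are illustrative, not part of the assignment.

```python
import random

# Activities and probabilities of the distribution p above.
ACTIVITIES = ["Movies", "Class", "Playing", "Studying"]
PROBS = [0.2, 0.4, 0.1, 0.3]


def sample_activity(activities=ACTIVITIES, probs=PROBS, u=None):
    """Draw one activity by inverting the cumulative distribution.

    u is a Uniform(0, 1) draw; it is generated here unless supplied,
    which makes the function easy to check with fixed values.
    """
    if u is None:
        u = random.random()  # float in [0, 1)
    cumulative = 0.0
    for activity, prob in zip(activities, probs):
        cumulative += prob
        # The first activity whose cumulative probability exceeds u is chosen.
        if u < cumulative:
            return activity
    # Guard against floating-point round-off leaving the total just below 1.
    return activities[-1]
```

For example, u = 0.3 falls in [0.2, 0.6) and therefore returns "Class".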

2. Implement your sampling algorithm and use it to sample the student's routine for
100 days. Report the fraction of days spent in each activity. Repeat for 1000 days
and report the fractions again. Compare both sets of fractions to the underlying
multinomial distribution.
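The same interval rule can be vectorized to simulate many days at once. This sketch assumes numpy's `Generator.uniform` as the only source of randomness and uses `np.searchsorted` on the cumulative probabilities (it deliberately avoids `Generator.choice`, since the point of the exercise is to implement the sampler yourself). The function name `simulate_days` and the fixed seed are illustrative.

```python
import numpy as np

ACTIVITIES = np.array(["Movies", "Class", "Playing", "Studying"])
PROBS = np.array([0.2, 0.4, 0.1, 0.3])


def simulate_days(n_days, rng):
    """Sample n_days activities and return the fraction of days per activity."""
    u = rng.uniform(0.0, 1.0, size=n_days)  # one uniform draw per day, in [0, 1)
    cdf = np.cumsum(PROBS)                  # approximately [0.2, 0.6, 0.7, 1.0]
    # side="right" gives the first index i with u < cdf[i]: the same rule
    # as the loop-based sampler.
    idx = np.searchsorted(cdf, u, side="right")
    idx = np.minimum(idx, len(PROBS) - 1)   # guard against round-off
    counts = np.bincount(idx, minlength=len(PROBS))
    return counts / n_days


rng = np.random.default_rng(0)  # fixed seed only for reproducibility
for n in (100, 1000):
    fractions = simulate_days(n, rng)
    print(n, dict(zip(ACTIVITIES.tolist(), fractions.round(3).tolist())))
```

The standard deviation of an empirical fraction is sqrt(p(1 - p)/n), so the 1000-day fractions are expected to lie closer to p than the 100-day ones, although any single run can deviate.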




2 Model Selection
-------------------
Use Dataset-1 for this experiment. Dataset-1 consists of train, validation, and
test files. The input is a real-valued scalar and the output is also a real-valued scalar. The
dataset is generated from a polynomial of some degree n, with small Gaussian noise added
to the target.
1. Fit a degree-20 polynomial to the data.
(a) Report the training and validation RMSE (root-mean-square error). Do not use
any regularization.
(b) Visualize the fit.
(c) Is the model overfitting or underfitting? Why?
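A minimal least-squares sketch, assuming the data have already been loaded into 1-D numpy arrays (loading depends on the file format and is omitted). `fit_polynomial`, `predict` and `rmse` are illustrative names. It uses the generic solver `np.linalg.lstsq` instead of forming the normal equations X^T X w = X^T y explicitly, because a degree-20 Vandermonde matrix is badly conditioned; whether a generic linear-algebra solver fits the "implement it yourself" rule is a judgment call.

```python
import numpy as np


def design_matrix(x, degree):
    """Columns 1, x, x^2, ..., x^degree for a 1-D input array x."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)


def fit_polynomial(x, y, degree):
    """Unregularized least-squares fit of a polynomial of the given degree."""
    X = design_matrix(x, degree)
    w, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return w  # w[j] is the coefficient of x^j


def predict(w, x):
    return design_matrix(x, len(w) - 1) @ w


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true, dtype=float) - y_pred) ** 2)))
```

With hypothetical arrays `x_train, y_train, x_val, y_val`, `w = fit_polynomial(x_train, y_train, 20)` followed by `rmse(y_val, predict(w, x_val))` gives the validation RMSE, and plotting `predict(w, grid)` over a sorted grid of inputs shows the fit.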

2. Now add L2 regularization to your model. Vary the value of λ from 0 to 1 with a step
size of 0.01.
(a) Plot the training RMSE and the validation RMSE as functions of λ.
(b) Find the best value of λ and report the test performance of the corresponding
model.
(c) Visualize the fit for the chosen model.
(d) Is the model overfitting or underfitting? Why?
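A sketch of L2-regularized (ridge) polynomial regression and the λ sweep. It assumes the objective ||Xw - y||^2 + λ||w[1:]||^2 with the intercept left unpenalized; both the sum-of-squares scaling and the unpenalized intercept are conventions that change which λ looks best, so treat them as assumptions. The problem is solved as ordinary least squares on an augmented system, which stays well defined at λ = 0. All function names are illustrative.

```python
import numpy as np


def poly_features(x, degree):
    """Columns 1, x, ..., x^degree for a 1-D input array."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)


def fit_ridge_polynomial(x, y, degree, lam):
    """Minimize ||X w - y||^2 + lam * ||w[1:]||^2 (intercept w[0] not penalized).

    Solved as least squares on the augmented system [X; sqrt(lam) * D] w = [y; 0].
    """
    X = poly_features(x, degree)
    D = np.sqrt(lam) * np.eye(degree + 1)
    D[0, 0] = 0.0  # assumption: the bias term is left unregularized
    X_aug = np.vstack([X, D])
    y_aug = np.concatenate([np.asarray(y, dtype=float), np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true, dtype=float) - y_pred) ** 2)))


def sweep_lambda(x_tr, y_tr, x_val, y_val, degree=20):
    """Training and validation RMSE for lambda = 0, 0.01, ..., 1."""
    lambdas = np.linspace(0.0, 1.0, 101)  # exact end points, no arange round-off
    train_err, val_err = [], []
    for lam in lambdas:
        w = fit_ridge_polynomial(x_tr, y_tr, degree, lam)
        train_err.append(rmse(y_tr, poly_features(x_tr, degree) @ w))
        val_err.append(rmse(y_val, poly_features(x_val, degree) @ w))
    return lambdas, np.array(train_err), np.array(val_err)
```

The λ with the lowest validation RMSE is `lambdas[np.argmin(val_err)]`; its model is then refit and evaluated once on the test file.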

3. What do you think is the degree of the source polynomial? Can you infer that from the
visualization produced in the previous question?




3 Gradient Descent for Regression
-------------------
Use Dataset-2 for this experiment. Dataset-2 consists of train, validation, and
test files. The input is a real-valued scalar and the output is also a real-valued scalar.

1. Fit a linear regression model to this dataset by using stochastic gradient descent (one
example at a time).
(a) Using a step size of 10^-4, plot the training and validation RMSE against the number
of epochs until convergence.
2. Try different step sizes and choose the best step size using the validation data.
(a) Report in a table the validation performance for the different step sizes.
(b) Report the test RMSE of the chosen model.
3. Report 5 visualizations, taken at randomly chosen points during training, to illustrate
how the regression fit evolves.
4. Repeat part 1 using full-batch gradient descent.
5. Comment on the difference between full-batch gradient descent and stochastic gradient
descent based on your experiments.
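A sketch of both optimizers for the model y ≈ w0 + w1·x on the loss 0.5·(error)^2. Apart from the 10^-4 default step size, everything here is an assumption rather than part of the assignment: zero initialization, a reshuffled example order in every SGD epoch, a mean (not summed) gradient for the full-batch version, a stopping rule based on the change in training RMSE, the candidate step sizes, and all function names. The returned parameter history is what one would plot at a few randomly chosen epochs to show how the fit evolves.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def train_linear(x_tr, y_tr, x_val, y_val, step=1e-4, stochastic=True,
                 max_epochs=1000, tol=1e-6, seed=0):
    """Gradient descent for y ~ w[0] + w[1] * x.

    stochastic=True: one update per training example (SGD).
    stochastic=False: one full-batch update per epoch with the mean gradient.
    Stops once training RMSE changes by less than tol between epochs.
    """
    x_tr, y_tr = np.asarray(x_tr, float), np.asarray(y_tr, float)
    x_val, y_val = np.asarray(x_val, float), np.asarray(y_val, float)
    rng = np.random.default_rng(seed)
    w = np.zeros(2)  # [intercept, slope], zero initialization assumed
    history, train_rmse, val_rmse = [w.copy()], [], []
    for _ in range(max_epochs):
        if stochastic:
            for i in rng.permutation(len(x_tr)):
                err = w[0] + w[1] * x_tr[i] - y_tr[i]
                w -= step * err * np.array([1.0, x_tr[i]])  # grad = err * [1, x_i]
        else:
            err = w[0] + w[1] * x_tr - y_tr
            w -= step * np.array([err.mean(), (err * x_tr).mean()])
        history.append(w.copy())
        train_rmse.append(rmse(y_tr, w[0] + w[1] * x_tr))
        val_rmse.append(rmse(y_val, w[0] + w[1] * x_val))
        if len(train_rmse) > 1 and abs(train_rmse[-2] - train_rmse[-1]) < tol:
            break
    return np.array(history), np.array(train_rmse), np.array(val_rmse)


def choose_step(x_tr, y_tr, x_val, y_val, steps=(1e-5, 1e-4, 1e-3, 1e-2)):
    """Final validation RMSE per step size; diverged runs count as infinite."""
    table = {}
    for s in steps:
        _, _, val = train_linear(x_tr, y_tr, x_val, y_val, step=s)
        table[s] = val[-1] if np.isfinite(val[-1]) else np.inf
    return min(table, key=table.get), table


def snapshot_epochs(history, k=5, seed=0):
    """k distinct, randomly chosen epoch indices for plotting the fit over time."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(len(history), size=k, replace=False))
```

Passing `stochastic=False` to the same function gives the full-batch run for part 4, so the two learning curves can be compared on equal terms.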




4 Real life dataset
-------------------
For this question, you will use the Communities and Crime Data Set from the UCI repository
(http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime).

1. This is a real-life data set and, as such, it lacks some of the convenient properties
we might expect. Your first job is to make this dataset usable by filling in all the missing values.

(a) Use the sample mean of each column to fill in the missing attributes. Is this a
good choice? Explain why or why not.
(b) What else might you use to fill in the missing attributes?
(c) If you have a better method, describe it, and use it for filling in the missing data.
Explain why your method is better.
(d) Turn in the completed data set.
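A sketch of column-mean imputation for part (a), assuming the missing entries have already been converted to NaN in a numeric array and that non-numeric columns (such as community names) have been dropped. The UCI file is assumed to mark missing values with "?", which `pandas.read_csv(path, header=None, na_values="?")` would convert to NaN. `mean_impute` is an illustrative name.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column.

    Returns the filled copy and the column means, so the same means can be
    reused on other splits. A column with no observed values stays NaN.
    """
    X = np.array(X, dtype=float)       # copy, so the input is not modified
    col_means = np.nanmean(X, axis=0)  # averages only the non-NaN entries
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X, col_means
```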

2. Keeping the rows in the order given in the dataset file, use the first 20% of the
dataset for testing and the remaining 80% for training.
(a) Report the 5-fold cross-validation average RMSE.
(b) Report the test RMSE.
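A sketch of the ordered 80/20 split and 5-fold cross-validation with an ordinary least-squares linear model, assuming `X` is the imputed feature matrix and `y` the target vector as numpy arrays. The folds are contiguous blocks in file order (no shuffling) and the 20% boundary is rounded to the nearest row; both are assumptions, as are the function names.

```python
import numpy as np


def add_bias(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_linear(X, y):
    """Least-squares linear regression; w[0] is the intercept."""
    w, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return w


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def ordered_split(X, y, test_frac=0.2):
    """First test_frac of the rows for testing, the rest for training.

    Returns X_train, y_train, X_test, y_test.
    """
    n_test = int(round(test_frac * len(y)))
    return X[n_test:], y[n_test:], X[:n_test], y[:n_test]


def kfold_rmse(X, y, k=5):
    """Average validation RMSE over k contiguous folds, plus per-fold scores."""
    idx = np.arange(len(y))
    scores = []
    for val_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, val_idx)
        w = fit_linear(X[train_idx], y[train_idx])
        scores.append(rmse(y[val_idx], add_bias(X[val_idx]) @ w))
    return float(np.mean(scores)), scores
```

The test RMSE for (b) would then come from `fit_linear(X_train, y_train)` evaluated once on `X_test`.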

3. We now use ridge regression on the above data.
(a) To choose the best λ, plot the average RMSE from 5-fold cross-validation
for various values of λ [x-axis: λ, y-axis: average RMSE]. Explain how you chose
the range of λ to explore.
(b) Which value of λ gives the best fit?
(c) Report the test RMSE using the value of λ you chose.
(d) Is it possible to use the information obtained during this experiment for feature
selection? If so, explain how.
(e) Report the test RMSE of the best fit you achieve with a reduced set of features.
(f) How different is the performance of the model with reduced features compared to
the model using all the features? Comment on the difference.
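One way to produce the curve in (a) is closed-form ridge regression inside the same contiguous 5-fold split. This sketch assumes an unpenalized intercept and a log-spaced λ grid; both are choices rather than requirements, and `fit_ridge` and `cv_ridge` are illustrative names. If the best λ lands on an edge of the grid, widening the range is a reasonable next step.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept.

    Solves (A^T A + lam * D) w = A^T y with A = [1, X] and D = diag(0, 1, ..., 1).
    The system is positive definite for any lam > 0.
    """
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    D = np.eye(A.shape[1])
    D[0, 0] = 0.0  # assumption: do not shrink the intercept
    return np.linalg.solve(A.T @ A + lam * D, A.T @ y)


def cv_ridge(X, y, lambdas, k=5):
    """Average validation RMSE over k contiguous folds for each lambda."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)  # folds follow the row order, no shuffling
    avg_rmse = []
    for lam in lambdas:
        errs = []
        for val_idx in folds:
            tr_idx = np.setdiff1d(idx, val_idx)
            w = fit_ridge(X[tr_idx], y[tr_idx], lam)
            pred = w[0] + X[val_idx] @ w[1:]
            errs.append(np.sqrt(np.mean((y[val_idx] - pred) ** 2)))
        avg_rmse.append(float(np.mean(errs)))
    return np.array(avg_rmse)


# Assumed search grid: log-spaced so that small and large penalties are both covered.
lambdas = np.logspace(-3, 3, 25)
```

With hypothetical training arrays `X_train, y_train`, the selected value would be `lambdas[np.argmin(cv_ridge(X_train, y_train, lambdas))]`, and plotting `cv_ridge(...)` against `lambdas` on a log x-axis gives the requested curve.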