HW 5
ISE 530 Optimization for Analytics Homework V: An AMPL exercise. Due 11:59 PM Wednesday October 21, 2020.
The crime dataset has 50 rows (data points) and 7 columns. The first two columns are observed crime rates to be predicted from the features in the last 5 columns. Formulate a least-squares regression problem with a regularizer term as the following quadratic program, where the columns of the matrix \(A\) are the five features from the dataset and \(y\) is the 50-dimensional vector of observed crime rates from the first column:

Tikhonov-regularized LASSO Regression with \(\lambda = 1\):

\[\min_{x \in \mathbb{R}^5} \; \tfrac{1}{2} \lVert Ax - y \rVert_2^2 + \lambda \lVert x \rVert_1\]

So the second column of the dataset is not used. Solve the problem using AMPL.
%load_ext autotime
%load_ext nb_black
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
plt.rcParams["figure.dpi"] = 300
plt.rcParams["figure.figsize"] = (16, 12)
import pandas as pd
import numpy as np
import cvxpy as cp
import scipy as sp
from scipy import optimize
import functools
crime = pd.read_table(
"./data/crime.txt",
header=None,
names=["y_1", "y_2", "x_1", "x_2", "x_3", "x_4", "x_5"],
)
crime.head()
| | y_1 | y_2 | x_1 | x_2 | x_3 | x_4 | x_5 |
|---|---|---|---|---|---|---|---|
| 0 | 478 | 184 | 40 | 74 | 11 | 31 | 20 |
| 1 | 494 | 213 | 32 | 72 | 11 | 43 | 18 |
| 2 | 643 | 347 | 57 | 70 | 18 | 16 | 16 |
| 3 | 341 | 565 | 31 | 71 | 11 | 25 | 19 |
| 4 | 773 | 327 | 67 | 72 | 9 | 29 | 24 |
# Helpers: least-squares loss, L1 regularizer, LASSO objective, and mean-squared error
loss_fn = lambda X, Y, β: 0.5 * (cp.norm2(Y - X @ β) ** 2)
regularizer = lambda β: cp.norm1(β)
objective_fn = lambda X, Y, β, λ: loss_fn(X, Y, β) + λ * regularizer(β)
mse = lambda X, Y, β: (1.0 / X.shape[0]) * loss_fn(X, Y, β).value
features = ["x_1", "x_2", "x_3", "x_4", "x_5"]
response = ["y_1"]
β = cp.Variable((len(features), 1))
λ = 1
y, A, x = crime[response].values, crime[features].values, β
prob = cp.Problem(cp.Minimize(objective_fn(X=A, Y=y, β=x, λ=λ)))
prob.solve()
print("Tikhonov-regularized LASSO Regression with λ=1 Solution")
print("=" * 40)
print(f"Status: {prob.status}")
print(f"The optimal value is: {np.round(prob.value, 2)}")
print(f"The optimal solution is: x = {[np.round(β_i, 2) for β_i in β.value.flatten()]}")
Tikhonov-regularized LASSO Regression with λ=1 Solution
========================================
Status: optimal_inaccurate
The optimal value is: 1445051.55
The optimal solution is: x = [11.2, -0.16, 13.8, 3.65, -1.45]
time: 19.6 ms
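The solver reports `optimal_inaccurate`, meaning the returned point may not meet the default accuracy tolerances. As an independent cross-check (a minimal sketch, not part of the AMPL requirement), the same objective can be minimized with plain-numpy proximal gradient descent (ISTA); synthetic data stands in here for the crime matrix `A` and vector `y`:

```python
import numpy as np


def ista_lasso(A, y, lam=1.0, iters=5000):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part's gradient
    t = 1.0 / L                    # fixed step size
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                                  # gradient step
        z = x - t * g
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)  # soft-thresholding
    return x


# Synthetic stand-in data (the crime A and y would be used instead):
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = np.array([2.0, 0.0, -1.5, 0.0, 0.5])
y = A @ x_true
x_hat = ista_lasso(A, y, lam=1.0)
```

Agreement between the ISTA iterate and the CVXPY solution (on the real data) would confirm the `optimal_inaccurate` answer is usable.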
Tikhonov-regularized Lasso Regression to Quadratic Program
From:

\[\min_{x} \; \tfrac{1}{2} \lVert Ax - y \rVert_2^2 + \lambda \lVert x \rVert_1 = \min_{x} \; \tfrac{1}{2} x^\top A^\top A x - y^\top A x + \tfrac{1}{2} y^\top y + \lambda \lVert x \rVert_1\]

Let’s set \(x = x^+ - x^-\), and so \(\vert x \vert = x^+ + x^-\), for \(x^{\pm} \geq 0\):

\[\min_{x^{\pm} \geq 0} \; \tfrac{1}{2} (x^+ - x^-)^\top A^\top A (x^+ - x^-) - y^\top A (x^+ - x^-) + \lambda \mathbf{1}^\top (x^+ + x^-) + \tfrac{1}{2} y^\top y\]

Expanding, and dropping the constant \(\tfrac{1}{2} y^\top y\) (which shifts the optimal value but not the minimizer):

\[\min_{x^{\pm} \geq 0} \; \tfrac{1}{2} (x^+ - x^-)^\top A^\top A (x^+ - x^-) - (y^\top A)(x^+ - x^-) + \lambda \mathbf{1}^\top (x^+ + x^-)\]
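The variable split can be sanity-checked numerically; a minimal numpy sketch using the fitted coefficients reported above:

```python
import numpy as np

x = np.array([11.2, -0.16, 13.8, 3.65, -1.45])  # fitted coefficients from above
x_pos = np.maximum(x, 0.0)   # x⁺: positive part
x_neg = np.maximum(-x, 0.0)  # x⁻: negative part

# The split recovers both x and its absolute value:
assert np.allclose(x_pos - x_neg, x)
assert np.allclose(x_pos + x_neg, np.abs(x))
```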
x = cp.Variable((len(features), 2))  # column 0: x⁺, column 1: x⁻
prob = cp.Problem(
    cp.Minimize(
        0.5 * cp.quad_form(x[:, 0] - x[:, 1], A.T @ A)  # ½ xᵀAᵀAx with x = x⁺ − x⁻
        - (y.T @ A) @ (x[:, 0] - x[:, 1])               # −yᵀAx
        + np.ones(len(features)) @ (x[:, 0] + x[:, 1])  # λ·1ᵀ(x⁺ + x⁻), with λ = 1
    ),
    constraints=[x >= 0],  # x⁺, x⁻ ≥ 0
)
prob.solve()
print("Tikhonov-regularized LASSO Regression Quadratic Program with λ=1 Solution")
print("=" * 40)
print(f"Status: {prob.status}")
print(f"The optimal value is: {np.round(prob.value, 2)}")
print(f"The optimal solution is: x = {[np.round(x_i, 2) for x_i in (x.value[:, 0] - x.value[:, 1])]}")
Tikhonov-regularized LASSO Regression Quadratic Program with λ=1 Solution
========================================
Status: optimal
The optimal value is: -13558345.15
The optimal solution is: x = [11.2, -0.16, 13.8, 3.65, -1.45]
time: 14.5 ms
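The minimizers of the two formulations agree, while the QP's optimal value is lower exactly because the constant \(\tfrac{1}{2} y^\top y\) was dropped during expansion. That relationship can be verified numerically; a sketch with synthetic data standing in for the crime matrix and response:

```python
import numpy as np

# Synthetic stand-ins for A, y, a candidate x, and λ:
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
x = rng.standard_normal(5)
lam = 1.0

# Original LASSO objective: ½‖Ax − y‖² + λ‖x‖₁
orig = 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

# Expanded QP objective with x = x⁺ − x⁻ and the constant ½yᵀy dropped:
x_pos, x_neg = np.maximum(x, 0.0), np.maximum(-x, 0.0)
expanded = (
    0.5 * (x_pos - x_neg) @ (A.T @ A) @ (x_pos - x_neg)
    - (y @ A) @ (x_pos - x_neg)
    + lam * np.ones(5) @ (x_pos + x_neg)
)

# The two objectives differ exactly by the dropped constant:
assert np.isclose(orig, expanded + 0.5 * y @ y)
```

The same identity explains the gap between the reported optimal values \(1445051.55\) and \(-13558345.15\) on the crime data.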