import pandas as pd
Sometimes, we may need to use a specific encoding:
encoding = "ISO-8859-1"
encoding = "utf-8"
football = pd.read_csv("football_2.csv", encoding = "ISO-8859-1")
football.head()
ID | Name | Age | Photo | Nationality | Flag | Overall | Potential | Club | Club Logo | ... | Composure | Marking | StandingTackle | SlidingTackle | GKDiving | GKHandling | GKKicking | GKPositioning | GKReflexes | Release Clause | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 207439 | L. Paredes | 24 | https://cdn.sofifa.org/players/4/19/207439.png | Argentina | https://cdn.sofifa.org/flags/52.png | 80 | 85 | NaN | https://cdn.sofifa.org/flags/52.png | ... | 74.0 | 73.0 | 75.0 | 72.0 | 9.0 | 14.0 | 6.0 | 9.0 | 10.0 | NaN |
1 | 156713 | A. Granqvist | 33 | https://cdn.sofifa.org/players/4/19/156713.png | Sweden | https://cdn.sofifa.org/flags/46.png | 80 | 80 | NaN | https://cdn.sofifa.org/flags/46.png | ... | 78.0 | 82.0 | 83.0 | 79.0 | 7.0 | 9.0 | 12.0 | 10.0 | 15.0 | NaN |
2 | 229909 | A. Lunev | 26 | https://cdn.sofifa.org/players/4/19/229909.png | Russia | https://cdn.sofifa.org/flags/40.png | 79 | 81 | NaN | https://cdn.sofifa.org/flags/40.png | ... | 69.0 | 18.0 | 20.0 | 12.0 | 80.0 | 73.0 | 65.0 | 77.0 | 85.0 | NaN |
3 | 187347 | I. Smolnikov | 29 | https://cdn.sofifa.org/players/4/19/187347.png | Russia | https://cdn.sofifa.org/flags/40.png | 79 | 79 | NaN | https://cdn.sofifa.org/flags/40.png | ... | 73.0 | 76.0 | 76.0 | 80.0 | 7.0 | 12.0 | 10.0 | 8.0 | 15.0 | NaN |
4 | 153260 | Hilton | 40 | https://cdn.sofifa.org/players/4/19/153260.png | Brazil | https://cdn.sofifa.org/flags/54.png | 78 | 78 | Montpellier HSC | https://cdn.sofifa.org/teams/2/light/70.png | ... | 70.0 | 83.0 | 77.0 | 76.0 | 12.0 | 7.0 | 11.0 | 12.0 | 13.0 | NaN |
5 rows × 88 columns
List the variable names. The plain list is hard to read, so we also put the names into an indexed DataFrame.
football.columns.values.tolist()
['ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag', 'Overall', 'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special', 'Preferred Foot', 'International Reputation', 'Weak Foot', 'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position', 'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until', 'Height', 'Weight', 'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB', 'Crossing', 'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes', 'Release Clause']
football_variables_df = pd.DataFrame(football.columns.values, columns = ["Variables"])
football_variables_df
Variables | |
---|---|
0 | ID |
1 | Name |
2 | Age |
3 | Photo |
4 | Nationality |
... | ... |
83 | GKHandling |
84 | GKKicking |
85 | GKPositioning |
86 | GKReflexes |
87 | Release Clause |
88 rows × 1 columns
Display all rows.
print(football_variables_df.to_string())
Variables 0 ID 1 Name 2 Age 3 Photo 4 Nationality 5 Flag 6 Overall 7 Potential 8 Club 9 Club Logo 10 Value 11 Wage 12 Special 13 Preferred Foot 14 International Reputation 15 Weak Foot 16 Skill Moves 17 Work Rate 18 Body Type 19 Real Face 20 Position 21 Jersey Number 22 Joined 23 Loaned From 24 Contract Valid Until 25 Height 26 Weight 27 LS 28 ST 29 RS 30 LW 31 LF 32 CF 33 RF 34 RW 35 LAM 36 CAM 37 RAM 38 LM 39 LCM 40 CM 41 RCM 42 RM 43 LWB 44 LDM 45 CDM 46 RDM 47 RWB 48 LB 49 LCB 50 CB 51 RCB 52 RB 53 Crossing 54 Finishing 55 HeadingAccuracy 56 ShortPassing 57 Volleys 58 Dribbling 59 Curve 60 FKAccuracy 61 LongPassing 62 BallControl 63 Acceleration 64 SprintSpeed 65 Agility 66 Reactions 67 Balance 68 ShotPower 69 Jumping 70 Stamina 71 Strength 72 LongShots 73 Aggression 74 Interceptions 75 Positioning 76 Vision 77 Penalties 78 Composure 79 Marking 80 StandingTackle 81 SlidingTackle 82 GKDiving 83 GKHandling 84 GKKicking 85 GKPositioning 86 GKReflexes 87 Release Clause
print(football.dtypes.to_string())
ID int64 Name object Age int64 Photo object Nationality object Flag object Overall int64 Potential int64 Club object Club Logo object Value int64 Wage int64 Special int64 Preferred Foot object International Reputation float64 Weak Foot float64 Skill Moves float64 Work Rate object Body Type object Real Face object Position object Jersey Number float64 Joined object Loaned From object Contract Valid Until object Height object Weight object LS object ST object RS object LW object LF object CF object RF object RW object LAM object CAM object RAM object LM object LCM object CM object RCM object RM object LWB object LDM object CDM object RDM object RWB object LB object LCB object CB object RCB object RB object Crossing float64 Finishing float64 HeadingAccuracy float64 ShortPassing float64 Volleys float64 Dribbling float64 Curve float64 FKAccuracy float64 LongPassing float64 BallControl float64 Acceleration float64 SprintSpeed float64 Agility float64 Reactions float64 Balance float64 ShotPower float64 Jumping float64 Stamina float64 Strength float64 LongShots float64 Aggression float64 Interceptions float64 Positioning float64 Vision float64 Penalties float64 Composure float64 Marking float64 StandingTackle float64 SlidingTackle float64 GKDiving float64 GKHandling float64 GKKicking float64 GKPositioning float64 GKReflexes float64 Release Clause object
Filter for strikers only.
football_2 = football[football["Position"] == "ST"]
football_2.head()
ID | Name | Age | Photo | Nationality | Flag | Overall | Potential | Club | Club Logo | ... | Composure | Marking | StandingTackle | SlidingTackle | GKDiving | GKHandling | GKKicking | GKPositioning | GKReflexes | Release Clause | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 187607 | A. Dzyuba | 29 | https://cdn.sofifa.org/players/4/19/187607.png | Russia | https://cdn.sofifa.org/flags/40.png | 78 | 78 | NaN | https://cdn.sofifa.org/flags/40.png | ... | 70.0 | 21.0 | 15.0 | 19.0 | 15.0 | 12.0 | 11.0 | 11.0 | 8.0 | NaN |
8 | 183389 | G. Sio | 29 | https://cdn.sofifa.org/players/4/19/183389.png | Ivory Coast | https://cdn.sofifa.org/flags/108.png | 77 | 77 | NaN | https://cdn.sofifa.org/flags/108.png | ... | 72.0 | 40.0 | 18.0 | 12.0 | 15.0 | 9.0 | 10.0 | 15.0 | 16.0 | NaN |
18 | 245683 | K. Fofana | 26 | https://cdn.sofifa.org/players/4/19/245683.png | Ivory Coast | https://cdn.sofifa.org/flags/108.png | 75 | 75 | NaN | https://cdn.sofifa.org/flags/108.png | ... | 83.0 | 23.0 | 37.0 | 46.0 | 7.0 | 11.0 | 7.0 | 11.0 | 14.0 | NaN |
45 | 190461 | B. Sigurðarson | 27 | https://cdn.sofifa.org/players/4/19/190461.png | Iceland | https://cdn.sofifa.org/flags/24.png | 73 | 74 | NaN | https://cdn.sofifa.org/flags/24.png | ... | 76.0 | 31.0 | 39.0 | 24.0 | 9.0 | 12.0 | 10.0 | 15.0 | 16.0 | NaN |
65 | 225900 | J. Sambenito | 26 | https://cdn.sofifa.org/players/4/19/225900.png | Paraguay | https://cdn.sofifa.org/flags/58.png | 71 | 74 | NaN | https://cdn.sofifa.org/flags/58.png | ... | 74.0 | 15.0 | 16.0 | 16.0 | 15.0 | 16.0 | 15.0 | 7.0 | 7.0 | NaN |
5 rows × 88 columns
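The same filter can also be written with DataFrame.query, if preferred (a sketch; strikers is just an illustrative name):
# Equivalent to football[football["Position"] == "ST"]
strikers = football.query("Position == 'ST'")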
Filter for the required variables.
import numpy as np
football_3 = football_2.iloc[:, np.r_[2, 13, 18, 67, 68, 73, 75, 78, 11]]
football_3.head()
Age | Preferred Foot | Body Type | Balance | ShotPower | Aggression | Positioning | Composure | Wage | |
---|---|---|---|---|---|---|---|---|---|
5 | 29 | Right | Stocky | 32.0 | 78.0 | 75.0 | 78.0 | 70.0 | 1105 |
8 | 29 | Left | Normal | 73.0 | 77.0 | 77.0 | 76.0 | 72.0 | 2138 |
18 | 26 | Right | Normal | 60.0 | 78.0 | 67.0 | 72.0 | 83.0 | 3875 |
45 | 27 | Right | Normal | 76.0 | 68.0 | 73.0 | 73.0 | 76.0 | 3661 |
65 | 26 | Right | Lean | 64.0 | 73.0 | 49.0 | 75.0 | 74.0 | 2445 |
# Or simply (if no ranges are used)
football_2.iloc[:, [2, 13, 18, 67, 68, 73, 75, 78, 11]]
Age | Preferred Foot | Body Type | Balance | ShotPower | Aggression | Positioning | Composure | Wage | |
---|---|---|---|---|---|---|---|---|---|
5 | 29 | Right | Stocky | 32.0 | 78.0 | 75.0 | 78.0 | 70.0 | 1105 |
8 | 29 | Left | Normal | 73.0 | 77.0 | 77.0 | 76.0 | 72.0 | 2138 |
18 | 26 | Right | Normal | 60.0 | 78.0 | 67.0 | 72.0 | 83.0 | 3875 |
45 | 27 | Right | Normal | 76.0 | 68.0 | 73.0 | 73.0 | 76.0 | 3661 |
65 | 26 | Right | Lean | 64.0 | 73.0 | 49.0 | 75.0 | 74.0 | 2445 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18181 | 19 | Right | Lean | 64.0 | 67.0 | 38.0 | 61.0 | 52.0 | 3399 |
18184 | 21 | Right | Stocky | 70.0 | 64.0 | 32.0 | 56.0 | 51.0 | 9389 |
18188 | 21 | Right | Normal | 53.0 | 61.0 | 62.0 | 60.0 | 61.0 | 10780 |
18190 | 19 | Right | Normal | 68.0 | 61.0 | 51.0 | 67.0 | 62.0 | 10121 |
18203 | 16 | Right | Lean | 60.0 | 61.0 | 36.0 | 62.0 | 63.0 | 8358 |
2152 rows × 9 columns
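Selecting by column name rather than by integer position is less fragile if the column order ever changes; a sketch that picks the same nine variables by name (cols and football_3_by_name are illustrative names, taken from the column list above):
# Same nine variables, selected by name instead of position
cols = ["Age", "Preferred Foot", "Body Type", "Balance", "ShotPower",
        "Aggression", "Positioning", "Composure", "Wage"]
football_3_by_name = football_2.loc[:, cols]
football_3_by_name.head()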
from sklearn.model_selection import train_test_split
Define the predictors and the target variable.
Dummy coding applies to the categorical variables only.
If a custom separator between the variable name and the category label is desired, get_dummies accepts prefix_sep:
X = pd.get_dummies(X, prefix_sep = 'dummy', drop_first = True)
X = football_3.drop(columns = ["Wage"])
# Get dummies for the categorical variables
X = pd.get_dummies(X, drop_first = True)
y = football_3["Wage"]
X
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
5 | 29 | 32.0 | 78.0 | 75.0 | 78.0 | 70.0 | 1 | 0 | 0 | 1 |
8 | 29 | 73.0 | 77.0 | 77.0 | 76.0 | 72.0 | 0 | 0 | 1 | 0 |
18 | 26 | 60.0 | 78.0 | 67.0 | 72.0 | 83.0 | 1 | 0 | 1 | 0 |
45 | 27 | 76.0 | 68.0 | 73.0 | 73.0 | 76.0 | 1 | 0 | 1 | 0 |
65 | 26 | 64.0 | 73.0 | 49.0 | 75.0 | 74.0 | 1 | 1 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18181 | 19 | 64.0 | 67.0 | 38.0 | 61.0 | 52.0 | 1 | 1 | 0 | 0 |
18184 | 21 | 70.0 | 64.0 | 32.0 | 56.0 | 51.0 | 1 | 0 | 0 | 1 |
18188 | 21 | 53.0 | 61.0 | 62.0 | 60.0 | 61.0 | 1 | 0 | 1 | 0 |
18190 | 19 | 68.0 | 61.0 | 51.0 | 67.0 | 62.0 | 1 | 0 | 1 | 0 |
18203 | 16 | 60.0 | 61.0 | 36.0 | 62.0 | 63.0 | 1 | 1 | 0 | 0 |
2152 rows × 10 columns
y
5         1105
8         2138
18        3875
45        3661
65        2445
         ...  
18181     3399
18184     9389
18188    10780
18190    10121
18203     8358
Name: Wage, Length: 2152, dtype: int64
Split the dataset into training and validation sets.
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size = 0.3, random_state = 666)
Check.
train_X.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
13946 | 26 | 69.0 | 71.0 | 64.0 | 72.0 | 72.0 | 1 | 1 | 0 | 0 |
7711 | 22 | 85.0 | 69.0 | 63.0 | 47.0 | 56.0 | 1 | 0 | 0 | 1 |
8402 | 25 | 65.0 | 52.0 | 22.0 | 55.0 | 52.0 | 1 | 1 | 0 | 0 |
13651 | 26 | 59.0 | 76.0 | 80.0 | 75.0 | 72.0 | 0 | 0 | 1 | 0 |
1625 | 28 | 55.0 | 70.0 | 55.0 | 61.0 | 71.0 | 0 | 1 | 0 | 0 |
len(train_X)
1506
train_y.head()
13946    22512
7711      6760
8402      5377
13651    13711
1625     10521
Name: Wage, dtype: int64
len(train_y)
1506
valid_X.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
7882 | 30 | 59.0 | 67.0 | 39.0 | 63.0 | 52.0 | 1 | 0 | 1 | 0 |
14555 | 24 | 55.0 | 80.0 | 59.0 | 71.0 | 65.0 | 1 | 0 | 0 | 1 |
16210 | 32 | 61.0 | 70.0 | 52.0 | 75.0 | 67.0 | 1 | 0 | 0 | 1 |
15847 | 24 | 60.0 | 75.0 | 49.0 | 71.0 | 70.0 | 1 | 0 | 1 | 0 |
12382 | 27 | 67.0 | 76.0 | 55.0 | 78.0 | 77.0 | 1 | 0 | 1 | 0 |
len(valid_X)
646
valid_y.head()
7882      5628
14555    28875
16210     6941
15847    10144
12382    45877
Name: Wage, dtype: int64
len(valid_y)
646
import sklearn
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(train_X, train_y)
LinearRegression()
train_y_pred = model.predict(train_X)
train_y_pred
array([24239.85483562, 3778.75482905, -4114.30802593, ..., 18813.39087789, 21533.92321643, 22278.55484414])
train_y_pred_df = pd.DataFrame(train_y_pred, columns = ["Training_Prediction"])
train_y_pred_df
Training_Prediction | |
---|---|
0 | 24239.854836 |
1 | 3778.754829 |
2 | -4114.308026 |
3 | 29762.024484 |
4 | 15704.772118 |
... | ... |
1501 | 25387.715396 |
1502 | 21812.772659 |
1503 | 18813.390878 |
1504 | 21533.923216 |
1505 | 22278.554844 |
1506 rows × 1 columns
print("model intercept: ", model.intercept_)
print("model coefficients: ", model.coef_)
print("Model score: ", model.score(train_X, train_y))
model intercept:  293888.9838689667
model coefficients:  [-9.16152410e+02  4.85334103e+01  4.63702248e+02  4.80950017e+01
  6.32137998e+02  3.74975144e+02 -2.20174023e+03 -3.55489317e+05
 -3.56667999e+05 -3.57613010e+05]
Model score:  0.4586287207175519
Coefficients, easier to read.
print(pd.DataFrame({"Predictor": train_X.columns, "Coefficient": model.coef_}))
              Predictor    Coefficient
0                   Age    -916.152410
1               Balance      48.533410
2             ShotPower     463.702248
3            Aggression      48.095002
4           Positioning     632.137998
5             Composure     374.975144
6  Preferred Foot_Right   -2201.740231
7        Body Type_Lean -355489.317404
8      Body Type_Normal -356667.999143
9      Body Type_Stocky -357613.009856
Get the RMSE for the training set.
mse_train = sklearn.metrics.mean_squared_error(train_y, train_y_pred)
mse_train
261404234.2377431
import math
rmse_train = math.sqrt(mse_train)
rmse_train
16168.000316605116
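Depending on the installed scikit-learn version, the square-root step can be skipped: older releases accept squared = False in mean_squared_error, and newer ones expose root_mean_squared_error instead (a sketch; check which one your version provides):
# One-step RMSE; availability depends on the scikit-learn version
rmse_train_alt = sklearn.metrics.mean_squared_error(train_y, train_y_pred, squared = False)
# In newer releases:
# from sklearn.metrics import root_mean_squared_error
# rmse_train_alt = root_mean_squared_error(train_y, train_y_pred)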
train_y.describe()
count      1506.000000
mean      12698.381142
std       21981.278007
min        1290.000000
25%        4692.500000
50%        6544.000000
75%       12364.250000
max      407609.000000
Name: Wage, dtype: float64
If using the dmba package:
pip install dmba
or
conda install -c conda-forge dmba
Then load the library
import dmba
from dmba import regressionSummary
regressionSummary(train_y, train_y_pred)
Regression statistics

                      Mean Error (ME) : 0.0000
       Root Mean Squared Error (RMSE) : 16168.0003
            Mean Absolute Error (MAE) : 8475.7614
          Mean Percentage Error (MPE) : -32.8265
Mean Absolute Percentage Error (MAPE) : 111.6103
Residuals.
train_residuals = train_y - train_y_pred
train_residuals
13946    -1727.854836
7711      2981.245171
8402      9491.308026
13651   -16051.024484
1625     -5183.772118
             ...     
12759    -3215.715396
17284   -15227.772659
1016     -7666.390878
16984    -9871.923216
16744     4022.445156
Name: Wage, Length: 1506, dtype: float64
type(train_residuals)
pandas.core.series.Series
import matplotlib.pyplot as plt
plt.hist(train_residuals, bins = 30)
plt.title("Residuals for Training")
plt.show()
train_residuals_df = train_residuals.to_frame(name = "Wage_Residuals")
train_residuals_df
Wage_Residuals | |
---|---|
13946 | -1727.854836 |
7711 | 2981.245171 |
8402 | 9491.308026 |
13651 | -16051.024484 |
1625 | -5183.772118 |
... | ... |
12759 | -3215.715396 |
17284 | -15227.772659 |
1016 | -7666.390878 |
16984 | -9871.923216 |
16744 | 4022.445156 |
1506 rows × 1 columns
import matplotlib.pyplot as plt
plt.hist(train_residuals_df["Wage_Residuals"], bins = 30)
plt.title("Residuals for Training")
plt.show()
Normality
import numpy as np
from scipy.stats import shapiro
shapiro(train_y)
ShapiroResult(statistic=0.38412952423095703, pvalue=0.0)
shapiro(train_residuals)
ShapiroResult(statistic=0.5784118175506592, pvalue=0.0)
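Both Shapiro–Wilk p-values are effectively zero; a Q-Q plot is a useful visual complement for judging how far the residuals deviate from normality (a minimal sketch using scipy.stats.probplot):
from scipy import stats
import matplotlib.pyplot as plt
# Q-Q plot of the training residuals against a theoretical normal distribution
stats.probplot(train_residuals, dist = "norm", plot = plt)
plt.title("Q-Q Plot of Training Residuals")
plt.show()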
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_df = pd.DataFrame()
vif_df["features"] = train_X.columns
vif_df["VIF"] = [variance_inflation_factor(train_X.values, i) for i in range(train_X.shape[1])]
print(vif_df)
               features         VIF
0                   Age   48.741902
1               Balance   33.364691
2             ShotPower  155.270485
3            Aggression   18.482052
4           Positioning  184.558829
5             Composure  115.774481
6  Preferred Foot_Right    7.714600
7        Body Type_Lean   32.118906
8      Body Type_Normal   59.412529
9      Body Type_Stocky   11.108926
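Note that these VIFs are computed on a design matrix without an intercept column, which tends to inflate them; the usual statsmodels convention is to add a constant first. A sketch under that assumption (train_X_const and vif_const_df are illustrative names):
import statsmodels.api as sm
# Add an intercept column before computing the VIFs
train_X_const = sm.add_constant(train_X)
vif_const_df = pd.DataFrame({
    "features": train_X_const.columns,
    "VIF": [variance_inflation_factor(train_X_const.values, i)
            for i in range(train_X_const.shape[1])]
})
print(vif_const_df)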
valid_y_pred = model.predict(valid_X)
valid_y_pred
array([ 2.66530045e+03, 2.39448807e+04, 1.52116766e+04, 2.42079729e+04, 2.96013138e+04, 1.24219313e+04, 7.03030428e+03, 1.95690670e+04, 1.65262949e+04, -1.47780320e+03, 1.30968036e+04, 1.46015036e+04, 1.65416124e+04, 1.23451712e+04, 2.41133395e+04, 3.44640274e+04, 1.67102936e+04, 1.29909076e+04, 3.97181605e+03, 2.20783678e+04, 5.38322832e+03, 2.09776571e+04, 2.08805388e+04, 2.34148863e+04, 1.06999140e+04, -5.56234183e+03, 1.17865372e+04, 2.23538133e+04, 1.43890326e+04, 1.75954292e+04, -1.22354489e+02, 1.29628192e+04, 8.41552263e+03, 2.27889941e+04, 2.31121184e+03, 1.94168827e+04, -6.96486103e+03, 1.37996432e+04, 3.27861195e+03, 2.86742008e+04, 2.97875564e+04, -6.32773021e+03, 2.31928580e+04, 3.04579799e+03, 1.54775276e+04, 2.42101352e+04, -5.84285114e+03, -1.29340208e+03, 4.06247257e+03, 1.80227373e+04, 2.17889372e+04, 9.70873486e+03, 8.02344053e+03, 1.92883880e+04, 1.77181908e+04, 2.18037493e+04, -4.33997863e+03, 1.56144785e+04, 9.71271318e+03, 2.05954475e+04, 4.75178565e+03, 2.30383707e+04, 1.02677166e+04, 2.35779602e+04, 9.87039872e+03, 1.68866474e+04, -1.31118203e+03, 8.87430877e+03, 3.24007684e+03, 3.23350289e+04, 1.53211409e+04, 1.07576794e+03, 1.62358353e+03, 1.73480299e+04, 1.79438503e+04, 2.49372041e+04, 2.00529623e+04, 1.68100187e+04, 8.56136513e+03, 3.46264372e+04, 2.54399862e+04, 4.68426335e+04, 9.91675364e+03, 1.83355436e+04, 3.54108387e+04, 1.70138955e+04, 1.46236932e+04, 1.02365282e+04, 1.62242848e+04, 2.37720957e+04, 3.08256560e+04, 1.59745648e+04, 1.59032676e+04, 3.19791193e+04, 2.04379622e+04, 1.55905542e+04, 2.37955466e+04, 1.04128105e+04, 9.19329209e+03, 4.21540077e+04, 1.27777969e+04, 2.41864412e+04, 2.14468738e+04, 8.06144737e+03, 2.36390151e+04, 1.44857796e+04, 2.78377031e+04, 2.65205841e+04, 2.38842489e+03, 4.01141506e+04, 1.19248133e+04, 1.74026797e+04, -3.55892891e+03, 1.72208335e+04, 1.22610883e+04, -5.54908320e+03, 9.56791282e+03, 3.64233850e+03, 3.18218976e+04, 1.47101698e+04, 3.73624918e+03, 5.33297175e+03, 2.29637658e+04, 2.10020910e+03, 2.39209825e+04, 4.18162608e+04, 3.26198242e+04, 3.84999758e+04, 2.36460768e+03, 2.00953034e+03, 2.70110133e+04, -2.01391023e+03, -2.20258250e+03, 1.69663027e+04, 3.12074612e+04, 2.05483828e+02, -3.60379347e+03, 1.35194619e+03, -2.64933746e+03, 8.64132573e+03, 1.58664227e+04, 1.67561284e+04, 1.34151485e+04, 3.70881499e+03, 4.13986671e+03, 3.08144727e+04, 1.23279327e+04, -2.22148139e+01, 5.57685656e+03, 2.56361861e+04, 2.59934302e+04, 1.34265919e+03, 2.65983638e+04, 1.18370810e+04, 1.09183425e+04, 1.43909810e+04, 3.88503740e+03, -2.33173138e+03, 2.02760295e+04, 1.87326850e+04, 1.96038255e+04, -9.93766117e+03, 7.47558360e+03, 1.66588172e+04, 1.16956311e+04, 3.90959788e+03, 7.51416417e+03, 2.06412445e+04, 1.75336594e+04, 1.33414610e+04, 1.96836251e+03, -4.82880254e+03, -3.71099746e+02, 7.66629848e+03, 6.14881071e+03, 1.23687003e+04, 1.21231323e+04, 1.35733402e+04, 1.83551007e+04, 1.70539379e+04, 3.02524002e+04, -1.07159337e+04, 6.01304553e+03, 1.14180403e+04, -3.34690338e+03, 1.00622718e+04, 2.06918571e+04, 1.73350758e+04, 2.29914945e+04, -5.50771205e+03, 2.85330575e+04, 1.98521492e+03, 1.06942632e+04, 1.73230322e+04, 2.49513789e+03, -2.79106246e+03, 4.39038348e+04, 1.43537643e+04, 1.85450045e+04, -6.16542967e+03, 1.63629421e+04, 2.79125830e+04, 1.73431638e+04, 3.14934519e+04, 4.66349422e+03, 1.16135024e+04, 3.35813201e+04, 4.53795643e+03, 3.03687846e+04, 5.61531696e+03, 7.07152851e+03, 3.18497487e+04, 1.53772460e+04, 2.67107658e+03, 2.33955389e+04, 2.40698188e+04, -4.75904102e+03, -3.45112153e+03, 2.41953892e+04, 1.23263230e+04, 
2.13734730e+04, 4.57129895e+03, 4.79790380e+03, 3.16235504e+04, 2.16213142e+04, 1.75585724e+04, 5.23850275e+03, 1.36123705e+04, 1.30277524e+04, 1.95346600e+04, 2.84475997e+03, 2.44038322e+04, 7.41182434e+03, 1.37544881e+04, -7.02660112e+02, 1.64984689e+04, 1.11341685e+04, 7.89961221e+03, 2.46923901e+04, 1.76240587e+04, 8.96931690e+03, 6.48185568e+03, -1.26222354e+03, -1.69218974e+03, 2.57214042e+03, 1.61502589e+04, 1.28138050e+04, 9.60814203e+03, 1.63259753e+04, 6.50911760e+03, -9.95762340e+03, 3.42057372e+02, 1.02190745e+04, 1.67276920e+04, 2.21600760e+04, 4.65490924e+03, 1.25402256e+04, 1.61009022e+04, 1.90925692e+04, 1.98924863e+04, 1.10061114e+04, 5.83648195e+03, 1.70692595e+04, 1.45910951e+04, 2.66219412e+04, 8.48576063e+03, 1.69487687e+04, 2.99414092e+03, 1.80086237e+04, 1.87362018e+04, 3.54003722e+04, -3.68164534e+02, 2.37253161e+04, 2.03710982e+04, 1.20499329e+04, 2.47400389e+04, -7.52342461e+03, 2.49023341e+04, 6.50286770e+03, 4.85828603e+03, 9.87788899e+03, 1.49300957e+04, 9.99666283e+03, 2.42095834e+04, 2.23392492e+03, 3.87104769e+03, 1.84381152e+04, 1.99189419e+04, 2.88502679e+04, 2.95576858e+03, 1.02739668e+04, 8.61212517e+03, 3.79011769e+04, 1.18256155e+04, 6.26209526e+03, 2.04705359e+04, 8.36627864e+03, 1.49795212e+03, 1.96112157e+04, 1.25041470e+04, 3.33043822e+03, 8.02272601e+03, 6.30190884e+01, 1.38667489e+04, -2.66181757e+03, 2.41768675e+02, 1.19301758e+04, 1.17904093e+04, 2.05007501e+04, -1.68721187e+04, 2.41243743e+04, 3.72096058e+03, 6.75666851e+03, 1.58471520e+04, 1.94148247e+04, 1.48026594e+03, 1.31713425e+04, 2.98162207e+04, -4.06148613e+03, 2.23375583e+04, 1.19660384e+04, 2.43317954e+04, 5.31000984e+04, 1.79335761e+04, 1.34151909e+04, -4.84794752e+03, 1.87710521e+04, 1.87750241e+04, 2.12204070e+04, 2.20818928e+04, 1.15592485e+03, -1.04175449e+04, -3.17875690e+03, 2.90112257e+04, 1.69765732e+04, 2.60161382e+04, 2.43898074e+04, 8.39128773e+02, 1.63886547e+04, -7.37612923e+03, 5.62485527e+03, 6.74975491e+03, 1.22944936e+04, 1.33810595e+04, 5.46762848e+03, 1.28159578e+04, 1.08639469e+04, 1.03348099e+03, 1.63132754e+03, 1.44072810e+04, 1.84246280e+04, -2.47909605e+03, 1.95474530e+04, 1.99982066e+04, 2.25021690e+04, 1.32872506e+04, 3.28203363e+03, 9.06686227e+03, 2.45238371e+04, 1.33661644e+04, -1.14433324e+03, 1.01026574e+04, 2.07442960e+04, 1.01544218e+04, 1.47549186e+04, 1.99151083e+04, 1.84707327e+04, 2.91420465e+03, 1.55694684e+04, 1.98216015e+04, 7.22538464e+03, 1.59688520e+04, 8.82264506e+03, -5.57106659e+03, -3.61325599e+03, 1.66954298e+04, 6.04080018e+03, 5.64313195e+03, 2.34846318e+04, 4.95252927e+03, 1.17609615e+04, 2.73460395e+02, 4.62524003e+04, 1.94219123e+04, 2.67636958e+03, 7.99161270e+02, 3.45401861e+04, 2.59576995e+03, 1.89228641e+04, 3.42841898e+04, 4.15955289e+03, 3.50195062e+04, 2.12903857e+04, -5.06633274e+03, 1.72083042e+04, 1.09837661e+04, 9.35620710e+03, 1.39696572e+04, 2.02365913e+04, 2.01273692e+04, 4.92787054e+03, 2.75630232e+04, 1.94245386e+04, 1.78279991e+04, 2.49990535e+04, 2.22097084e+04, 8.43216129e+03, 8.92326875e+02, 5.49338576e+03, 1.95493159e+04, 1.06933156e+04, 1.17082833e+04, 2.25027856e+04, 1.45716041e+04, 4.09754472e+03, 9.19128351e+03, 7.77620947e+03, 1.90177070e+04, 1.96619616e+04, 4.66622488e+04, 3.67881827e+03, 5.08111933e+03, 5.38303692e+03, 1.24476702e+04, 1.84716579e+04, -1.31531118e+03, 2.03243896e+04, 1.86851668e+04, 3.77256684e+03, 1.01734030e+04, 3.06560978e+04, 2.34528445e+04, 1.83387120e+04, 1.40144687e+04, 1.65206404e+04, 3.96428102e+04, 1.04960032e+04, -4.15137281e+03, 1.05653584e+04, 3.39106609e+04, 
2.97230029e+04, 1.53912380e+04, 1.11446062e+04, 2.24223425e+04, 2.68597465e+04, 1.22405459e+04, -2.66504745e+03, 1.20825230e+04, 2.03241862e+04, -3.92528017e+03, 5.56118121e+03, -5.96828907e+03, 1.60675081e+04, 8.47372390e+03, 1.66574753e+04, -1.08819482e+04, 2.20162527e+04, 1.36954201e+04, 1.52982927e+04, 2.86445902e+04, 2.15596058e+04, 3.72854768e+03, 6.01809567e+03, 2.62442863e+04, 1.78847640e+04, 2.60260009e+04, 2.21141255e+04, -2.91738379e+03, 2.81612491e+04, 4.74778654e+04, 8.43042557e+03, 2.17905824e+04, 1.78861898e+04, 2.81132855e+04, 2.24379049e+04, 4.87238954e+03, 3.48894848e+04, 1.06339999e+04, 4.88834577e+03, 2.37027527e+04, 2.77041377e+04, 1.67256495e+04, -1.23767615e+03, 2.45406129e+04, -6.73009778e+03, -3.71221165e+03, 1.87645554e+04, 3.45908317e+03, 9.69112280e+03, 9.79096864e+03, -6.92458269e+03, 3.78754518e+03, -3.03440938e+03, 2.51852102e+04, -4.47402223e+03, 3.98987083e+04, 2.19649914e+04, 1.91549960e+04, 8.60118620e+03, 2.05626620e+04, -3.69035608e+03, 1.48537949e+04, 4.04843946e+03, 3.16774267e+04, 1.70576259e+04, 1.73557586e+03, 3.20480351e+04, 1.56235213e+04, 5.63278927e+03, 2.32403127e+04, 2.46781892e+04, 7.00114435e+03, -3.58186265e+03, 2.21807504e+04, -6.37439516e+03, 7.92053004e+03, 5.22713559e+03, 3.85323630e+03, 1.61069767e+04, 1.75678237e+04, 1.81873006e+04, 1.00464058e+04, 8.84400565e+03, 1.72847394e+04, 5.17322566e+02, 2.56655118e+04, 2.62752003e+04, 1.07591937e+04, 1.51229379e+04, 1.93102802e+04, 9.00728050e+03, 2.97264751e+03, -1.65027255e+03, 1.94821918e+04, -4.41337726e+03, 1.89499469e+04, 3.01911889e+01, 1.34545696e+04, 5.57215010e+03, 4.50681337e+02, 3.92060784e+04, 1.32187052e+04, 1.10678007e+04, 1.06085720e+04, -4.72793002e+03, 2.23111111e+04, 4.89426890e+03, 1.58645569e+04, -4.78889524e+03, 2.36842533e+03, 3.30683841e+04, 4.89473966e+02, 1.38929836e+04, 2.39812906e+04, -8.08396068e+03, 1.55943106e+04, 2.26993605e+04, 3.51540327e+04, 3.59982676e+04, 2.57322782e+04, 2.85020030e+04, 1.23762047e+04, 1.77077060e+04, 6.05218625e+02, 2.12468521e+04, 1.98247308e+03, 9.75425736e+03, 1.91635760e+04, 1.11030216e+04, 2.59148087e+04, 1.04661967e+04, 3.23588749e+04, 3.13942429e+03, -4.08853810e+03, 1.52620187e+04, -1.58765960e+04, 1.92577674e+04, 2.37955347e+04, 1.31788779e+04, 1.08961748e+04, 1.81281065e+04, 1.83373496e+04, 9.36618933e+03, 2.37138764e+04, 1.44469600e+04, 2.24502896e+02, 1.87964914e+04, 1.71068838e+04, -1.06577785e+03, 1.00160136e+04, 1.60681170e+04, 3.22133872e+04, 2.30064316e+04, 6.75391713e+03, 1.60378128e+04, 1.32996181e+04, -1.49122318e+02, 8.22587441e+03, 3.26499372e+04, 1.85186487e+04, -4.30577382e+03, 1.16753828e+04, 5.05871265e+03, 4.04057843e+04, 7.68904389e+03, 1.82256266e+04, -2.70297307e+03, 2.81515606e+04, -5.27176698e+03, 1.86762020e+04, 1.06676760e+04, 3.22475566e+04, -5.52105925e+03, 2.62373303e+04, 1.44986397e+04, -4.73905063e+03, 3.65046624e+04, 4.02480572e+03, 8.28482915e+03, 1.95814608e+04, 8.41202780e+03, -3.84857827e+03, 8.40586509e+03, 1.03176404e+04, 2.43958443e+04, 1.38968987e+04, 3.67708230e+04, 1.72569153e+04, 6.03696891e+03, -1.43788106e+03, 3.32488578e+04, 1.62319091e+04, 1.61211529e+04, 4.90873986e+03, 9.34189590e+03, -5.86994130e+03, 1.42400588e+04, 3.39109284e+02, 3.07350763e+03, 1.75517998e+04, 1.60393277e+04, 5.81477513e+02, -1.61141500e+03, 2.06980825e+04, 1.15261767e+04, 3.04436687e+04])
valid_y_pred_df = pd.DataFrame(valid_y_pred, columns = ["Validation_Prediction"])
valid_y_pred_df
Validation_Prediction | |
---|---|
0 | 2665.300452 |
1 | 23944.880671 |
2 | 15211.676647 |
3 | 24207.972902 |
4 | 29601.313799 |
... | ... |
641 | 581.477513 |
642 | -1611.414995 |
643 | 20698.082536 |
644 | 11526.176711 |
645 | 30443.668736 |
646 rows × 1 columns
Get the RMSE for the validation set.
mse_valid = sklearn.metrics.mean_squared_error(valid_y, valid_y_pred)
mse_valid
380956622.57514906
# As before
# import math
rmse_valid = math.sqrt(mse_valid)
rmse_valid
19518.110117917386
valid_y.describe()
count       646.000000
mean      13535.160991
std       23624.770667
min        1105.000000
25%        4708.750000
50%        6750.500000
75%       12827.750000
max      301070.000000
Name: Wage, dtype: float64
# As before:
# If using the dmba package:
# pip install dmba
# Done earlier. Just for illustration
# import dmba
# from dmba import regressionSummary
regressionSummary(valid_y, valid_y_pred)
Regression statistics

                      Mean Error (ME) : 91.4105
       Root Mean Squared Error (RMSE) : 19518.1101
            Mean Absolute Error (MAE) : 9319.7708
          Mean Percentage Error (MPE) : -43.3987
Mean Absolute Percentage Error (MAPE) : 118.8390
Residuals.
valid_residuals = valid_y - valid_y_pred
valid_residuals.head()
7882      2962.699548
14555     4930.119329
16210    -8270.676647
15847   -14063.972902
12382    16275.686201
Name: Wage, dtype: float64
import matplotlib.pyplot as plt
plt.hist(valid_residuals, bins = 30)
plt.title("Residuals for Validation")
plt.show()
valid_residuals_df = valid_residuals.to_frame(name = "Wage_Residuals")
valid_residuals_df
Wage_Residuals | |
---|---|
7882 | 2962.699548 |
14555 | 4930.119329 |
16210 | -8270.676647 |
15847 | -14063.972902 |
12382 | 16275.686201 |
... | ... |
8620 | 5455.522487 |
10786 | 6010.414995 |
16154 | 13639.917464 |
4990 | -8092.176711 |
14654 | -14046.668736 |
646 rows × 1 columns
import matplotlib.pyplot as plt
plt.hist(valid_residuals_df["Wage_Residuals"], bins = 30)
plt.title("Residuals for Validation")
plt.show()
Scikit-learn does not provide traditional regression model summaries.
Use the statsmodels package if a traditional summary is desired.
conda install -c conda-forge statsmodels
or
pip install statsmodels
import statsmodels.api as sm
model_statsmodels = sm.OLS(train_y, train_X)
results = model_statsmodels.fit()
print(results.summary())
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                   Wage   R-squared (uncentered):              0.515
Model:                            OLS   Adj. R-squared (uncentered):         0.512
Method:                 Least Squares   F-statistic:                         159.1
Date:                Sat, 18 Feb 2023   Prob (F-statistic):              4.36e-227
Time:                        13:25:45   Log-Likelihood:                    -16865.
No. Observations:                1506   AIC:                             3.375e+04
Df Residuals:                    1496   BIC:                             3.380e+04
Df Model:                          10
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Age                   -852.0938    127.623     -6.677      0.000   -1102.432    -601.756
Balance                151.9979     40.583      3.745      0.000      72.392     231.604
ShotPower              633.7931     85.723      7.393      0.000     465.642     801.944
Aggression              15.2246     36.329      0.419      0.675     -56.037      86.486
Positioning            716.7647     94.926      7.551      0.000     530.562     902.968
Composure              382.7540     81.366      4.704      0.000     223.151     542.357
Preferred Foot_Right  -493.9671   1360.831     -0.363      0.717   -3163.306    2175.372
Body Type_Lean       -8.622e+04   4538.366    -18.999      0.000   -9.51e+04   -7.73e+04
Body Type_Normal     -8.807e+04   4632.299    -19.011      0.000   -9.72e+04    -7.9e+04
Body Type_Stocky     -8.937e+04   4906.471    -18.216      0.000    -9.9e+04   -7.97e+04
==============================================================================
Omnibus:                    2072.969   Durbin-Watson:                   1.959
Prob(Omnibus):                 0.000   Jarque-Bera (JB):           660076.649
Skew:                          7.548   Prob(JB):                         0.00
Kurtosis:                    104.446   Cond. No.                     2.46e+03
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 2.46e+03. This might indicate that there are strong multicollinearity or other numerical problems.
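The summary reports an uncentered R² because sm.OLS was fitted without a constant term. If an intercept is wanted (matching LinearRegression above, which fits one by default), the predictors can be augmented with sm.add_constant; a sketch:
# Refit with an explicit intercept term
train_X_const = sm.add_constant(train_X)
results_const = sm.OLS(train_y, train_X_const).fit()
print(results_const.summary())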
new_players_df = pd.read_csv("new_players.csv")
new_players_df
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 27 | 59 | 75 | 68 | 80 | 76 | 1 | 0 | 0 | 1 |
1 | 21 | 42 | 71 | 52 | 60 | 76 | 1 | 1 | 0 | 0 |
2 | 19 | 76 | 80 | 22 | 75 | 56 | 0 | 0 | 0 | 1 |
new_records_players_pred = model.predict(new_players_df)
new_records_players_pred
array([29318.86943019, 20847.31938856, 29120.84519698])
# As before
# import pandas as pd
new_records_players_pred_df = pd.DataFrame(new_records_players_pred, columns = ["Prediction"])
new_records_players_pred_df
# to export
# new_records_players_pred_df.to_csv("whatever_name.csv")
Prediction | |
---|---|
0 | 29318.869430 |
1 | 20847.319389 |
2 | 29120.845197 |
alpha = 0.05
ci = np.quantile(train_residuals, 1 - alpha)
ci
17225.89938935508
def generate_results_confint(preds, ci):
    # Attach an empirical +/- ci band (here, the 95th percentile of the
    # training residuals) to each point prediction
    df = pd.DataFrame()
    df["Prediction"] = preds
    if ci >= 0:
        df["upper"] = preds + ci
        df["lower"] = preds - ci
    else:
        # If the quantile were negative, swap the signs so upper stays above lower
        df["upper"] = preds - ci
        df["lower"] = preds + ci
    return df
new_records_players_pred_confint_df = generate_results_confint(new_records_players_pred, ci)
new_records_players_pred_confint_df
Prediction | upper | lower | |
---|---|---|---|
0 | 29318.869430 | 46544.768820 | 12092.970041 |
1 | 20847.319389 | 38073.218778 | 3621.419999 |
2 | 29120.845197 | 46346.744586 | 11894.945808 |
train_X.dtypes
Age                       int64
Balance                 float64
ShotPower               float64
Aggression              float64
Positioning             float64
Composure               float64
Preferred Foot_Right      uint8
Body Type_Lean            uint8
Body Type_Normal          uint8
Body Type_Stocky          uint8
dtype: object
train_X_variables_df = pd.DataFrame(train_X.columns.values, columns = ["Variables"])
train_X_variables_df
Variables | |
---|---|
0 | Age |
1 | Balance |
2 | ShotPower |
3 | Aggression |
4 | Positioning |
5 | Composure |
6 | Preferred Foot_Right |
7 | Body Type_Lean |
8 | Body Type_Normal |
9 | Body Type_Stocky |
train_X_log10 = np.log10(train_X.iloc[:,0:6])
train_X_log10.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | |
---|---|---|---|---|---|---|
13946 | 1.414973 | 1.838849 | 1.851258 | 1.806180 | 1.857332 | 1.857332 |
7711 | 1.342423 | 1.929419 | 1.838849 | 1.799341 | 1.672098 | 1.748188 |
8402 | 1.397940 | 1.812913 | 1.716003 | 1.342423 | 1.740363 | 1.716003 |
13651 | 1.414973 | 1.770852 | 1.880814 | 1.903090 | 1.875061 | 1.857332 |
1625 | 1.447158 | 1.740363 | 1.845098 | 1.740363 | 1.785330 | 1.851258 |
train_X2 = pd.concat((train_X_log10, train_X.iloc[:,6:10]), axis = 1)
train_X2.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
13946 | 1.414973 | 1.838849 | 1.851258 | 1.806180 | 1.857332 | 1.857332 | 1 | 1 | 0 | 0 |
7711 | 1.342423 | 1.929419 | 1.838849 | 1.799341 | 1.672098 | 1.748188 | 1 | 0 | 0 | 1 |
8402 | 1.397940 | 1.812913 | 1.716003 | 1.342423 | 1.740363 | 1.716003 | 1 | 1 | 0 | 0 |
13651 | 1.414973 | 1.770852 | 1.880814 | 1.903090 | 1.875061 | 1.857332 | 0 | 0 | 1 | 0 |
1625 | 1.447158 | 1.740363 | 1.845098 | 1.740363 | 1.785330 | 1.851258 | 0 | 1 | 0 | 0 |
valid_X_log10 = np.log10(valid_X.iloc[:,0:6])
valid_X_log10.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | |
---|---|---|---|---|---|---|
7882 | 1.477121 | 1.770852 | 1.826075 | 1.591065 | 1.799341 | 1.716003 |
14555 | 1.380211 | 1.740363 | 1.903090 | 1.770852 | 1.851258 | 1.812913 |
16210 | 1.505150 | 1.785330 | 1.845098 | 1.716003 | 1.875061 | 1.826075 |
15847 | 1.380211 | 1.778151 | 1.875061 | 1.690196 | 1.851258 | 1.845098 |
12382 | 1.431364 | 1.826075 | 1.880814 | 1.740363 | 1.892095 | 1.886491 |
valid_X2 = pd.concat((valid_X_log10, valid_X.iloc[:,6:10]), axis = 1)
valid_X2.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
7882 | 1.477121 | 1.770852 | 1.826075 | 1.591065 | 1.799341 | 1.716003 | 1 | 0 | 1 | 0 |
14555 | 1.380211 | 1.740363 | 1.903090 | 1.770852 | 1.851258 | 1.812913 | 1 | 0 | 0 | 1 |
16210 | 1.505150 | 1.785330 | 1.845098 | 1.716003 | 1.875061 | 1.826075 | 1 | 0 | 0 | 1 |
15847 | 1.380211 | 1.778151 | 1.875061 | 1.690196 | 1.851258 | 1.845098 | 1 | 0 | 1 | 0 |
12382 | 1.431364 | 1.826075 | 1.880814 | 1.740363 | 1.892095 | 1.886491 | 1 | 0 | 1 | 0 |
train_y2 = np.log10(train_y)
train_y2
13946    4.352414
7711     3.829947
8402     3.730540
13651    4.137069
1625     4.022057
           ...   
12759    4.345805
17284    3.818556
1016     4.047158
16984    4.066773
16744    4.419972
Name: Wage, Length: 1506, dtype: float64
valid_y2 = np.log10(valid_y)
valid_y2
7882     3.750354
14555    4.460522
16210    3.841422
15847    4.006209
12382    4.661595
           ...   
8620     3.780821
10786    3.643354
16154    4.535775
4990     3.535800
14654    4.214764
Name: Wage, Length: 646, dtype: float64
import sklearn
from sklearn.linear_model import LinearRegression
model2 = LinearRegression()
model2.fit(train_X2, train_y2)
LinearRegression()
train_y2_pred = model2.predict(train_X2)
train_y2_pred
array([4.16670693, 3.70430495, 3.5309925 , ..., 4.08446948, 4.07971215, 4.12972782])
train_y2_pred_df = pd.DataFrame(train_y2_pred, columns = ["Training_Prediction"])
train_y2_pred_df
Training_Prediction | |
---|---|
0 | 4.166707 |
1 | 3.704305 |
2 | 3.530993 |
3 | 4.228996 |
4 | 3.970243 |
... | ... |
1501 | 4.194870 |
1502 | 4.084830 |
1503 | 4.084469 |
1504 | 4.079712 |
1505 | 4.129728 |
1506 rows × 1 columns
print("model intercept: ", model2.intercept_)
print("model coefficients: ", model2.coef_)
print("Model score: ", model2.score(train_X2, train_y2))
model intercept:  -2.8669058840786708
model coefficients:  [-0.74070881  0.14474169  1.64201441  0.09668163  1.83662843  1.15205982
  0.00265055 -0.95251776 -0.96819688 -0.9947823 ]
Model score:  0.4981172727732758
Coefficients, easier to read.
print(pd.DataFrame({"Predictor": train_X2.columns, "Coefficient": model2.coef_}))
              Predictor  Coefficient
0                   Age    -0.740709
1               Balance     0.144742
2             ShotPower     1.642014
3            Aggression     0.096682
4           Positioning     1.836628
5             Composure     1.152060
6  Preferred Foot_Right     0.002651
7        Body Type_Lean    -0.952518
8      Body Type_Normal    -0.968197
9      Body Type_Stocky    -0.994782
Get the RMSE for the training set.
mse_train_2 = sklearn.metrics.mean_squared_error(train_y2, train_y2_pred)
mse_train_2
0.06356591642873398
import math
rmse_train_2 = math.sqrt(mse_train_2)
rmse_train_2
0.25212282012688575
train_y2.describe()
count    1506.000000
mean        3.904520
std         0.356004
min         3.110590
25%         3.671404
50%         3.815843
75%         4.092168
max         5.610244
Name: Wage, dtype: float64
If using the dmba package:
pip install dmba
or
conda install -c conda-forge dmba
Then load the library
import dmba
from dmba import regressionSummary
regressionSummary(train_y2, train_y2_pred)
Regression statistics

                      Mean Error (ME) : -0.0000
       Root Mean Squared Error (RMSE) : 0.2521
            Mean Absolute Error (MAE) : 0.1955
          Mean Percentage Error (MPE) : -0.3879
Mean Absolute Percentage Error (MAPE) : 4.9901
Normality
import numpy as np
from scipy.stats import shapiro
shapiro(train_y2)
ShapiroResult(statistic=0.9299247860908508, pvalue=5.641337850676267e-26)
train_residuals_2 = train_y2 - train_y2_pred
train_residuals_2
13946    0.185707
7711     0.125642
8402     0.199548
13651   -0.091927
1625     0.051814
           ...   
12759    0.150935
17284   -0.266274
1016    -0.037311
16984   -0.012939
16744    0.290244
Name: Wage, Length: 1506, dtype: float64
shapiro(train_residuals_2)
ShapiroResult(statistic=0.9941757917404175, pvalue=1.2548777704068925e-05)
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_df_2 = pd.DataFrame()
vif_df_2["features"] = train_X.columns
vif_df_2["VIF"] = [variance_inflation_factor(train_X2.values, i) for i in range(train_X2.shape[1])]
print(vif_df_2)
               features          VIF
0                   Age   503.047038
1               Balance   392.228433
2             ShotPower  2230.845164
3            Aggression   224.646809
4           Positioning  2688.086015
5             Composure  1657.030037
6  Preferred Foot_Right     7.725654
7        Body Type_Lean   234.209226
8      Body Type_Normal   421.031902
9      Body Type_Stocky    71.646782
valid_y2_pred = model2.predict(valid_X2)
valid_y2_pred
array([3.76367355, 4.15529918, 4.02761799, 4.17061134, 4.27664223, 3.90982955, 3.86919134, 4.05394802, 4.07575135, 3.58962247, 3.90714578, 3.99290823, 3.95256648, 3.85413803, 4.19055898, 4.3770946 , 4.06555802, 4.0018328 , 3.71793422, 4.15389889, 3.75427973, 4.11372724, 4.04790165, 4.16199766, 3.81913817, 3.45592807, 3.85618089, 4.11258275, 4.01680881, 4.0156488 , 3.57791354, 3.92013811, 3.93333019, 4.15243617, 3.64206255, 4.05268594, 3.55963358, 3.93038411, 3.65549926, 4.26277204, 4.2957239 , 3.407724 , 4.0802973 , 3.71756546, 4.01320061, 4.15470805, 3.43053436, 3.58982582, 3.77744561, 4.05297243, 4.1264634 , 3.86412457, 3.80646696, 4.10663088, 4.0277011 , 4.10682818, 3.4907613 , 3.9930389 , 3.7844055 , 4.07632993, 3.74574599, 4.13168887, 3.93609012, 4.15447129, 3.8351393 , 3.98164635, 3.70036723, 3.83283241, 3.6957984 , 4.32483198, 3.97680842, 3.64890098, 3.68376159, 4.02159597, 4.04864317, 4.17927691, 4.07174931, 3.99436296, 3.89363006, 4.35326354, 4.1946218 , 4.55669742, 3.83271191, 4.02022325, 4.38496503, 3.99722499, 3.93647576, 3.89577843, 3.99995635, 4.17175375, 4.29133548, 4.0160841 , 3.98917806, 4.31295093, 4.10110026, 4.01077971, 4.15171613, 3.93406065, 3.92381907, 4.47762954, 3.86056946, 4.14341326, 4.10795381, 3.78488352, 4.13184288, 3.96830151, 4.22809601, 4.17370611, 3.70893568, 4.41445846, 3.9860071 , 4.03624456, 3.54130701, 3.97681342, 3.93321275, 3.40274183, 3.77377519, 3.7484716 , 4.31232063, 3.99361937, 3.709269 , 3.76580361, 4.12616882, 3.62756573, 4.10717141, 4.4300846 , 4.26733575, 4.42559001, 3.68110372, 3.63376597, 4.20690577, 3.55348096, 3.58817486, 4.0028743 , 4.31354532, 3.60417179, 3.46791211, 3.63460305, 3.52797995, 3.80240164, 3.99804681, 4.07003803, 3.96740854, 3.67782452, 3.7088948 , 4.29154294, 3.91117274, 3.68003573, 3.72960344, 4.20021854, 4.16794103, 3.68047068, 4.24551816, 3.89801283, 3.85297611, 3.93996911, 3.79674516, 3.5260676 , 4.08968162, 4.04203417, 4.08326129, 3.31249389, 3.78856385, 4.03608251, 3.97750166, 3.73639092, 3.89686649, 4.08641956, 4.02202326, 3.95785434, 3.66121658, 3.47142139, 3.68216025, 3.79270249, 3.74367342, 3.90196004, 3.99075541, 3.9180192 , 4.06335476, 4.00007676, 4.27009768, 3.31072324, 3.74364683, 3.93199629, 3.66915503, 3.86109118, 4.12719138, 4.02302088, 4.12066631, 3.46212247, 4.24516929, 3.60772261, 3.86266376, 4.0068775 , 3.64991768, 3.57821803, 4.51164804, 3.86884302, 4.02278378, 3.40892806, 4.00929465, 4.27441749, 4.04650643, 4.31664087, 3.79570229, 3.87387915, 4.33204612, 3.73025151, 4.26612984, 3.739114 , 3.79062207, 4.35863673, 3.9667509 , 3.65394184, 4.08544516, 4.21615412, 3.61137378, 3.47775978, 4.18257474, 3.94734872, 4.09660859, 3.86340944, 3.76653697, 4.28916253, 4.12961482, 4.01531056, 3.73242103, 3.92353868, 4.01528881, 4.05693391, 3.64922279, 4.13251124, 3.78851204, 3.96812818, 3.60234792, 4.02649726, 3.87836136, 3.78527889, 4.17130734, 4.01839531, 3.87736521, 3.84803085, 3.56809734, 3.53356624, 3.74931027, 3.93071361, 3.92688701, 3.82006385, 3.99352542, 3.78762902, 3.3088708 , 3.6389334 , 3.85878766, 3.99660732, 4.15405735, 3.77264981, 3.96080938, 4.00249334, 4.06460802, 4.07503455, 3.92041065, 3.77078081, 4.00051175, 4.01299821, 4.22032504, 3.83147516, 4.01829502, 3.77217185, 4.04027499, 4.04894302, 4.37009593, 3.66198416, 4.16404301, 4.07537538, 3.81702852, 4.18393764, 3.38560675, 4.15056587, 3.76938473, 3.66133542, 3.83298873, 3.96379047, 3.88700295, 4.11912182, 3.76007694, 3.6960155 , 4.1211959 , 4.04400082, 4.21272665, 3.69404658, 3.87553024, 3.82367299, 4.44392643, 3.90035329, 3.82670037, 
4.09651587, 3.81539836, 3.62901484, 4.12364698, 3.87491952, 3.60137225, 3.8373793 , 3.60737549, 4.02177599, 3.50538178, 3.6237138 , 3.95906308, 3.93927344, 4.0979651 , 3.06117595, 4.11487814, 3.71943094, 3.7607101 , 4.00325951, 4.07500449, 3.63075039, 3.90162744, 4.24153272, 3.55020114, 4.11677891, 3.9519506 , 4.19331594, 4.64906177, 4.08705043, 3.91322188, 3.4643572 , 4.07389286, 4.01613118, 4.13090872, 4.12330112, 3.67620387, 3.3069908 , 3.50331655, 4.22838468, 4.06350805, 4.23418359, 4.16447788, 3.6789202 , 3.97233518, 3.39431118, 3.83794578, 3.83635319, 3.94286729, 3.9122974 , 3.85503556, 3.92286221, 3.86449427, 3.61902976, 3.64006414, 3.9710172 , 4.06949047, 3.51857609, 4.06396503, 4.08485153, 4.16137828, 3.96409872, 3.72730472, 3.92930897, 4.18743676, 3.9411608 , 3.66906524, 3.85038249, 4.04403355, 3.91918519, 4.06720527, 4.12111189, 4.01383475, 3.61697527, 3.97950546, 4.08334525, 3.78934118, 3.959101 , 3.80638364, 3.45757572, 3.49770466, 4.0626396 , 3.71218504, 3.8107466 , 4.14436282, 3.74999602, 3.92760265, 3.58786307, 4.53483917, 4.02076808, 3.79226698, 3.70145343, 4.35696259, 3.60565745, 4.05148097, 4.35182707, 3.69722441, 4.35512404, 4.10090967, 3.44589242, 3.94972836, 3.88857535, 3.93084888, 3.98319886, 4.12587376, 4.06091435, 3.76180862, 4.22510036, 4.04137816, 4.0553892 , 4.19899674, 4.09319953, 3.78428264, 3.64991702, 3.8155647 , 4.08733891, 3.9064268 , 3.90981766, 4.119122 , 3.95709316, 3.80575735, 3.83523119, 3.79857318, 4.045406 , 4.0474044 , 4.56175315, 3.75433912, 3.82384601, 3.81895492, 3.98409788, 4.04380715, 3.56992826, 4.08171437, 4.09281733, 3.62415787, 3.89357785, 4.32357572, 4.16051063, 4.0979448 , 3.97747216, 3.96721426, 4.43209841, 3.89480226, 3.49403115, 3.85021414, 4.31085473, 4.26492488, 3.96203775, 3.96303963, 4.13080105, 4.2010063 , 3.9434954 , 3.55218052, 3.90594563, 4.11490189, 3.53312359, 3.86513143, 3.46451959, 3.95465407, 3.80188125, 4.02308478, 3.27226357, 4.15753532, 3.94023949, 4.01913372, 4.22553447, 4.07815829, 3.74204974, 3.81970664, 4.2039179 , 4.02616824, 4.13984176, 4.1229543 , 3.55495757, 4.21420044, 4.56258001, 3.88078398, 4.1247032 , 4.04834385, 4.27668304, 4.12747814, 3.76683153, 4.33579598, 3.93189263, 3.72548763, 4.18447111, 4.23916186, 4.03404372, 3.61674952, 4.17929048, 3.41112525, 3.49051379, 4.09164834, 3.77737006, 3.82334394, 3.8455094 , 3.40492441, 3.70119257, 3.53003948, 4.18349422, 3.47598818, 4.44548399, 4.07921488, 4.03500231, 3.78440745, 4.10491587, 3.47807912, 3.92702743, 3.612196 , 4.24948279, 4.05430833, 3.63814312, 4.31071604, 4.02682925, 3.79727886, 4.17466371, 4.1356379 , 3.72467379, 3.67319298, 4.14294685, 3.40578843, 3.8218298 , 3.80856988, 3.69074869, 3.97580986, 4.05361426, 4.06683376, 3.93669446, 3.90456332, 3.99560591, 3.66866854, 4.13639514, 4.1891806 , 3.89513874, 4.00730782, 4.06946615, 3.86813897, 3.75452134, 3.55861187, 4.12137531, 3.44527625, 4.08483195, 3.5823102 , 4.00056343, 3.81471346, 3.59462776, 4.41776481, 3.90334739, 3.94811114, 3.93948265, 3.37279971, 4.09378912, 3.71299135, 3.94671423, 3.49139588, 3.66989535, 4.34275382, 3.58419407, 3.98424773, 4.15901486, 3.38451984, 3.92587981, 4.11555836, 4.35077569, 4.33447151, 4.18984644, 4.20444188, 3.98738318, 4.02013942, 3.6510333 , 4.14290566, 3.63184601, 3.90346708, 4.05734256, 3.89546729, 4.20709072, 3.91355501, 4.30099848, 3.65052262, 3.45561065, 3.93542772, 3.1519519 , 4.09416062, 4.12392262, 3.90775453, 3.85970599, 4.02417249, 3.98409722, 3.85601473, 4.18887768, 3.9512208 , 3.63514546, 4.04710009, 3.94807166, 3.5565326 , 3.84958316, 3.98819984, 
4.3147352 , 4.14870834, 3.79284242, 3.9958641 , 3.95117854, 3.69148335, 3.82822223, 4.30014425, 4.06913287, 3.50836167, 3.8211041 , 3.8298556 , 4.42267047, 3.84580223, 4.06574794, 3.47712171, 4.18324774, 3.55245306, 4.05191393, 3.91601428, 4.32217236, 3.42756651, 4.19186723, 3.97225306, 3.55904222, 4.39020702, 3.78896984, 3.87030635, 4.06807547, 3.82900839, 3.41678228, 3.79877734, 3.886422 , 4.17168873, 3.89690427, 4.38305566, 4.01866806, 3.78008935, 3.62017101, 4.3320427 , 3.96860992, 4.02460072, 3.73670569, 3.83703631, 3.42525497, 3.93023985, 3.58773105, 3.68050587, 4.01015867, 4.00341998, 3.66600645, 3.52794768, 4.10312619, 3.88325128, 4.26631878])
valid_y2_pred_df = pd.DataFrame(valid_y2_pred, columns = ["Validation_Prediction"])
valid_y2_pred_df
Validation_Prediction | |
---|---|
0 | 3.763674 |
1 | 4.155299 |
2 | 4.027618 |
3 | 4.170611 |
4 | 4.276642 |
... | ... |
641 | 3.666006 |
642 | 3.527948 |
643 | 4.103126 |
644 | 3.883251 |
645 | 4.266319 |
646 rows × 1 columns
Get the RMSE for the validation set.
mse_valid_2 = sklearn.metrics.mean_squared_error(valid_y2, valid_y2_pred)
mse_valid_2
0.06790509555931858
# As before
# import math
rmse_valid_2 = math.sqrt(mse_valid_2)
rmse_valid_2
0.2605860617134358
valid_y2.describe()
count    646.000000
mean       3.915194
std        0.373620
min        3.043362
25%        3.672905
50%        3.829336
75%        4.108147
max        5.478667
Name: Wage, dtype: float64
# As before:
# If using the dmba package:
# pip install dmba
# Done earlier. Just for illustration
# import dmba
# from dmba import regressionSummary
regressionSummary(valid_y2, valid_y2_pred)
Regression statistics

                      Mean Error (ME) : -0.0105
       Root Mean Squared Error (RMSE) : 0.2606
            Mean Absolute Error (MAE) : 0.2020
          Mean Percentage Error (MPE) : -0.7122
Mean Absolute Percentage Error (MAPE) : 5.1776
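These error measures are on the log10 scale, so they are not directly comparable with the first model's RMSE. To compare on the original wage scale, the predictions can be back-transformed with 10**x before computing the error (a sketch; rmse_valid_2_wage is an illustrative name):
# Back-transform the log10 predictions and compute RMSE in the original wage units
valid_y2_pred_wage = 10 ** valid_y2_pred
rmse_valid_2_wage = math.sqrt(sklearn.metrics.mean_squared_error(valid_y, valid_y2_pred_wage))
rmse_valid_2_wage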
new_players_df
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 27 | 59 | 75 | 68 | 80 | 76 | 1 | 0 | 0 | 1 |
1 | 21 | 42 | 71 | 52 | 60 | 76 | 1 | 1 | 0 | 0 |
2 | 19 | 76 | 80 | 22 | 75 | 56 | 0 | 0 | 0 | 1 |
new_players_df_log10 = np.log10(new_players_df.iloc[:,0:6])
new_players_df_log10.head()
Age | Balance | ShotPower | Aggression | Positioning | Composure | |
---|---|---|---|---|---|---|
0 | 1.431364 | 1.770852 | 1.875061 | 1.832509 | 1.903090 | 1.880814 |
1 | 1.322219 | 1.623249 | 1.851258 | 1.716003 | 1.778151 | 1.880814 |
2 | 1.278754 | 1.880814 | 1.903090 | 1.342423 | 1.875061 | 1.748188 |
new_players_df_2 = pd.concat((new_players_df_log10, new_players_df.iloc[:,6:10]), axis = 1)
new_players_df_2
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1.431364 | 1.770852 | 1.875061 | 1.832509 | 1.903090 | 1.880814 | 1 | 0 | 0 | 1 |
1 | 1.322219 | 1.623249 | 1.851258 | 1.716003 | 1.778151 | 1.880814 | 1 | 1 | 0 | 0 |
2 | 1.278754 | 1.880814 | 1.903090 | 1.342423 | 1.875061 | 1.748188 | 0 | 0 | 0 | 1 |
new_records_players_pred_2 = model2.predict(new_players_df_2)
new_records_players_pred_2
array([4.25518124, 4.07711107, 4.17585671])
# As before
# import pandas as pd
new_records_players_pred_df_2 = pd.DataFrame(new_records_players_pred_2, columns = ["Prediction"])
new_records_players_pred_df_2
# to export
# new_records_players_pred_df.to_csv("whatever_name.csv")
Prediction | |
---|---|
0 | 4.255181 |
1 | 4.077111 |
2 | 4.175857 |
alpha = 0.05
ci_2 = np.quantile(train_residuals_2, 1 - alpha)
ci_2
0.43633371119162057
# as before
def generate_results_confint_2(preds, ci_2):
    # Same helper as before, with the band width taken from the log-scale residuals
    df = pd.DataFrame()
    df["Prediction"] = preds
    if ci_2 >= 0:
        df["upper"] = preds + ci_2
        df["lower"] = preds - ci_2
    else:
        df["upper"] = preds - ci_2
        df["lower"] = preds + ci_2
    return df
new_records_players_pred_2_confint_df = generate_results_confint_2(new_records_players_pred_2, ci_2)
new_records_players_pred_2_confint_df
Prediction | upper | lower | |
---|---|---|---|
0 | 4.255181 | 4.691515 | 3.818848 |
1 | 4.077111 | 4.513445 | 3.640777 |
2 | 4.175857 | 4.612190 | 3.739523 |
def exp10(x):
    # Back-transform from the log10 scale to the original wage scale
    return 10**x
# execute the function
new_records_players_pred_df_2_exp = new_records_players_pred_2_confint_df.apply(exp10)
new_records_players_pred_df_2_exp
Prediction | upper | lower | |
---|---|---|---|
0 | 17996.217938 | 49149.030458 | 6589.425204 |
1 | 11942.935088 | 32617.057784 | 4372.978686 |
2 | 14991.900984 | 40944.013939 | 5489.376187 |
train_X3 = train_X.copy()
train_y3 = train_y.copy()
valid_X3 = valid_X.copy()
valid_y3 = valid_y.copy()
from sklearn.linear_model import Ridge
model_ridge = Ridge(alpha = 1.0)
model_ridge.fit(train_X3, train_y3)
Ridge()
train_y3_pred = model_ridge.predict(train_X3)
train_y3_pred
array([25188.51155671, 3860.14487559, -4188.44855596, ..., 19245.95968925, 21417.50420002, 23004.19110446])
train_y3_pred_df = pd.DataFrame(train_y3_pred, columns = ["Training_Prediction"])
train_y3_pred_df
Training_Prediction | |
---|---|
0 | 25188.511557 |
1 | 3860.144876 |
2 | -4188.448556 |
3 | 29995.100230 |
4 | 16109.083754 |
... | ... |
1501 | 26829.388840 |
1502 | 22783.529768 |
1503 | 19245.959689 |
1504 | 21417.504200 |
1505 | 23004.191104 |
1506 rows × 1 columns
print("model intercept: ", model_ridge.intercept_)
print("model coefficients: ", model_ridge.coef_)
print("Model score: ", model_ridge.score(train_X3, train_y3))
model intercept:  20081.83125341736
model coefficients:  [-9.27897306e+02  5.45457726e+01  5.02883492e+02  2.41877047e+01
  6.53156518e+02  4.20617179e+02 -1.86591933e+03 -8.72301759e+04
 -8.86049391e+04 -8.90541217e+04]
Model score:  0.3608967032383109
Coefficients, easier to read.
print(pd.DataFrame({"Predictor": train_X3.columns, "Coefficient": model_ridge.coef_}))
              Predictor   Coefficient
0                   Age   -927.897306
1               Balance     54.545773
2             ShotPower    502.883492
3            Aggression     24.187705
4           Positioning    653.156518
5             Composure    420.617179
6  Preferred Foot_Right  -1865.919332
7        Body Type_Lean -87230.175924
8      Body Type_Normal -88604.939142
9      Body Type_Stocky -89054.121691
Get the RMSE for the training set.
mse_train_3 = sklearn.metrics.mean_squared_error(train_y3, train_y3_pred)
mse_train_3
308594700.68349236
import math
rmse_train_3 = math.sqrt(mse_train_3)
rmse_train_3
17566.86371221375
train_y3.describe()
count      1506.000000
mean      12698.381142
std       21981.278007
min        1290.000000
25%        4692.500000
50%        6544.000000
75%       12364.250000
max      407609.000000
Name: Wage, dtype: float64
If using the dmba package:
pip install dmba
or
conda install -c conda-forge dmba
Then load the library
import dmba
from dmba import regressionSummary
regressionSummary(train_y3, train_y3_pred)
Regression statistics

                      Mean Error (ME) : -0.0000
       Root Mean Squared Error (RMSE) : 17566.8637
            Mean Absolute Error (MAE) : 8938.7057
          Mean Percentage Error (MPE) : -31.3821
Mean Absolute Percentage Error (MAPE) : 117.8144
Normality
import numpy as np
from scipy.stats import shapiro
shapiro(train_y3)
ShapiroResult(statistic=0.38412952423095703, pvalue=0.0)
train_residuals_3 = train_y3 - train_y3_pred
train_residuals_3
13946    -2676.511557
7711      2899.855124
8402      9565.448556
13651   -16284.100230
1625     -5588.083754
             ...     
12759    -4657.388840
17284   -16198.529768
1016     -8098.959689
16984    -9755.504200
16744     3296.808896
Name: Wage, Length: 1506, dtype: float64
shapiro(train_residuals_3)
ShapiroResult(statistic=0.5465858578681946, pvalue=0.0)
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_df_3 = pd.DataFrame()
vif_df_3["features"] = train_X3.columns
vif_df_3["VIF"] = [variance_inflation_factor(train_X3.values, i) for i in range(train_X3.shape[1])]
print(vif_df_3)
               features         VIF
0                   Age   48.741902
1               Balance   33.364691
2             ShotPower  155.270485
3            Aggression   18.482052
4           Positioning  184.558829
5             Composure  115.774481
6  Preferred Foot_Right    7.714600
7        Body Type_Lean   32.118906
8      Body Type_Normal   59.412529
9      Body Type_Stocky   11.108926
valid_y3_pred = model_ridge.predict(valid_X3)
valid_y3_pred
array([ 2.64972256e+03, 2.52642557e+04, 1.64240635e+04, 2.53329585e+04, 3.10955126e+04, 1.29528286e+04, 6.70941044e+03, 2.04919113e+04, 1.67847671e+04, -1.77608909e+03, 1.37232569e+04, 1.48205527e+04, 1.60211714e+04, 1.10493542e+04, 2.52134500e+04, 3.59934836e+04, 1.78365347e+04, 1.32628212e+04, 4.48094205e+03, 2.29869164e+04, 4.96584554e+03, 2.16229676e+04, 2.09228408e+04, 2.42567025e+04, 1.03654164e+04, -6.30245310e+03, 1.23663735e+04, 2.36282687e+04, 1.49069805e+04, 1.72923603e+04, -1.41187798e+03, 1.21934953e+04, 8.99116266e+03, 2.36012248e+04, 2.03620098e+03, 1.97430366e+04, -7.88156562e+03, 1.42296144e+04, 3.02349161e+03, 2.98615135e+04, 3.17155746e+04, -7.45185605e+03, 2.36998503e+04, 2.10088321e+03, 1.55835468e+04, 2.53623215e+04, -6.50981532e+03, -1.76196097e+03, 3.62986525e+03, 1.82776299e+04, 2.19955432e+04, 9.56082725e+03, 7.90816856e+03, 1.95834541e+04, 1.78544369e+04, 2.23809073e+04, -4.90366027e+03, 1.54642694e+04, 9.31753268e+03, 2.15480020e+04, 4.57990856e+03, 2.33541602e+04, 1.11657541e+04, 2.46551380e+04, 9.84817425e+03, 1.63822774e+04, -1.80807030e+03, 8.69464783e+03, 2.72496530e+03, 3.43697545e+04, 1.59330718e+04, 9.17012681e+02, 1.66581487e+03, 1.76629155e+04, 1.83800182e+04, 2.65577927e+04, 2.01509977e+04, 1.71244262e+04, 8.74705677e+03, 3.58738523e+04, 2.65122746e+04, 5.00604302e+04, 9.64618370e+03, 1.90684929e+04, 3.70800559e+04, 1.66337474e+04, 1.47011831e+04, 1.01626727e+04, 1.62589786e+04, 2.47699032e+04, 3.16840751e+04, 1.74982126e+04, 1.63608869e+04, 3.31509430e+04, 2.07902205e+04, 1.64597306e+04, 2.47251371e+04, 1.04628862e+04, 9.57437075e+03, 4.42431055e+04, 1.26561720e+04, 2.55199260e+04, 2.23256682e+04, 8.14743820e+03, 2.45764795e+04, 1.48054501e+04, 2.85598619e+04, 2.72748441e+04, 1.65894359e+03, 4.17885584e+04, 1.22392050e+04, 1.80790225e+04, -4.47457671e+03, 1.78499306e+04, 1.31001658e+04, -6.64508399e+03, 9.19121426e+03, 3.15130742e+03, 3.35840483e+04, 1.53441534e+04, 3.51981089e+03, 5.40674384e+03, 2.42962197e+04, 1.77612088e+03, 2.46176244e+04, 4.35989119e+04, 3.41348306e+04, 4.02437061e+04, 1.66983814e+03, 2.16856079e+03, 2.83205082e+04, -2.41088911e+03, -2.36682760e+03, 1.76805901e+04, 3.30181153e+04, -3.41788605e+02, -4.42985144e+03, 9.78417778e+02, -3.41201914e+03, 8.68924567e+03, 1.65537818e+04, 1.72906353e+04, 1.33560660e+04, 3.40153805e+03, 4.03253553e+03, 3.19964770e+04, 1.20998236e+04, -8.26064462e+02, 5.55617913e+03, 2.71488508e+04, 2.68272814e+04, 1.14432826e+03, 2.83124624e+04, 1.15858075e+04, 1.08250398e+04, 1.51075765e+04, 3.30687121e+03, -2.76195568e+03, 2.12094054e+04, 1.88687090e+04, 2.04936705e+04, -1.09126099e+04, 7.30930490e+03, 1.67882716e+04, 1.15685907e+04, 3.14702931e+03, 7.35938167e+03, 2.13262354e+04, 1.84566066e+04, 1.42514599e+04, 1.65441942e+03, -5.65924431e+03, -1.00778426e+03, 7.51549323e+03, 5.67264855e+03, 1.22640084e+04, 1.33564333e+04, 1.36739531e+04, 1.90346104e+04, 1.79189151e+04, 3.18978213e+04, -1.20856433e+04, 5.73937028e+03, 1.16894482e+04, -3.87423563e+03, 1.00288060e+04, 2.20562707e+04, 1.74938502e+04, 2.37843794e+04, -6.23145484e+03, 2.94267502e+04, 1.48092211e+03, 9.80861441e+03, 1.81335960e+04, 1.85554394e+03, -3.94264372e+03, 4.56839001e+04, 1.39029242e+04, 1.91594303e+04, -7.49391130e+03, 1.66476248e+04, 2.94697109e+04, 1.76200685e+04, 3.27957846e+04, 4.31614870e+03, 1.19215192e+04, 3.51546196e+04, 4.07900156e+03, 3.19670958e+04, 5.19281830e+03, 7.22590148e+03, 3.36253366e+04, 1.55881255e+04, 2.49167421e+03, 2.48504689e+04, 2.55159442e+04, -4.77474929e+03, -4.42846792e+03, 2.59056546e+04, 
1.26388624e+04, 2.15742548e+04, 4.08688167e+03, 5.14702418e+03, 3.31563915e+04, 2.27265145e+04, 1.79822987e+04, 4.88434587e+03, 1.41505327e+04, 1.41902561e+04, 2.02603488e+04, 2.05241854e+03, 2.55923095e+04, 7.44893187e+03, 1.39030195e+04, -2.08543587e+03, 1.76049436e+04, 1.16541566e+04, 7.37981695e+03, 2.52979520e+04, 1.82726281e+04, 8.90597823e+03, 6.23015687e+03, -1.55508696e+03, -2.63008981e+03, 2.13902693e+03, 1.61152051e+04, 1.22182585e+04, 9.64623645e+03, 1.63859443e+04, 6.14276618e+03, -1.11170706e+04, 2.10258398e+02, 9.68578502e+03, 1.69246037e+04, 2.31236071e+04, 3.64414735e+03, 1.25187206e+04, 1.70574873e+04, 1.90115826e+04, 2.04687259e+04, 1.10294579e+04, 6.15333543e+03, 1.71130713e+04, 1.49853247e+04, 2.73762381e+04, 8.27010205e+03, 1.71023730e+04, 2.89285921e+03, 1.92188863e+04, 1.92121483e+04, 3.67284627e+04, -6.07554785e+02, 2.50715536e+04, 2.16535634e+04, 1.15711762e+04, 2.61352627e+04, -8.60774818e+03, 2.55153591e+04, 6.52033108e+03, 3.83335571e+03, 1.01184551e+04, 1.49659750e+04, 1.01356377e+04, 2.50175606e+04, 1.87688328e+03, 3.39055929e+03, 1.94403783e+04, 2.04072294e+04, 2.95313635e+04, 1.93304405e+03, 1.03166764e+04, 8.90465883e+03, 4.03368608e+04, 1.15774038e+04, 6.91783988e+03, 2.08803031e+04, 7.72041122e+03, 7.47676798e+02, 2.04555233e+04, 1.13323091e+04, 2.19823569e+03, 8.45970953e+03, -1.60356018e+02, 1.51640672e+04, -3.23890555e+03, -5.41129809e+02, 1.18993186e+04, 1.17328722e+04, 2.14902835e+04, -1.84628171e+04, 2.48351686e+04, 3.78260306e+03, 6.02224315e+03, 1.64688353e+04, 1.97621129e+04, 9.78325175e+02, 1.26317064e+04, 3.04878019e+04, -5.12399165e+03, 2.35210556e+04, 1.21487080e+04, 2.50761478e+04, 5.54846852e+04, 1.81841465e+04, 1.40340286e+04, -5.67563080e+03, 1.93854989e+04, 1.91248164e+04, 2.23340938e+04, 2.27299834e+04, 5.01194389e+02, -1.15661438e+04, -3.77023194e+03, 3.00930391e+04, 1.72765208e+04, 2.69024012e+04, 2.54266342e+04, 7.74975266e+02, 1.64742338e+04, -8.23238191e+03, 4.84693210e+03, 6.87596180e+03, 1.32836570e+04, 1.37999954e+04, 5.47063742e+03, 1.32036417e+04, 1.09850059e+04, 7.52434673e+02, 1.40420709e+03, 1.45256380e+04, 1.92371576e+04, -2.95672651e+03, 1.96692843e+04, 2.09792274e+04, 2.30787686e+04, 1.31057062e+04, 2.81278489e+03, 9.19501376e+03, 2.54078773e+04, 1.39056907e+04, -1.48998032e+03, 9.43590431e+03, 2.08441471e+04, 1.08298218e+04, 1.57316997e+04, 2.06514061e+04, 1.89700762e+04, 1.97035528e+03, 1.53088095e+04, 2.05240124e+04, 6.72708804e+03, 1.60869486e+04, 8.65837159e+03, -7.18089848e+03, -4.26694157e+03, 1.76287093e+04, 5.62502207e+03, 4.82475782e+03, 2.42722486e+04, 3.59591071e+03, 1.17751784e+04, -2.93829377e+02, 4.80477917e+04, 1.93483751e+04, 2.38544972e+03, 6.92000223e+02, 3.60260896e+04, 1.49467166e+03, 2.00469931e+04, 3.58091380e+04, 3.22001833e+03, 3.60920606e+04, 2.20140015e+04, -6.37074328e+03, 1.72028120e+04, 1.08869315e+04, 9.15671520e+03, 1.41010720e+04, 2.11844206e+04, 2.12523380e+04, 5.05986592e+03, 2.87168385e+04, 1.98271066e+04, 1.78028470e+04, 2.55563086e+04, 2.26937924e+04, 7.65953819e+03, 1.14027031e+03, 4.95056974e+03, 2.05747082e+04, 1.06161080e+04, 1.19372254e+04, 2.38570744e+04, 1.47834347e+04, 3.92652328e+03, 9.07837512e+03, 7.70710391e+03, 1.95896382e+04, 1.98514394e+04, 4.87556054e+04, 3.65470215e+03, 5.10325452e+03, 5.10605410e+03, 1.25633366e+04, 1.87506241e+04, -2.57565109e+03, 2.06597667e+04, 1.95280430e+04, 2.73315802e+03, 1.03630664e+04, 3.26718614e+04, 2.44039167e+04, 1.88165479e+04, 1.41204917e+04, 1.67221228e+04, 4.06206105e+04, 1.07958563e+04, -4.61966557e+03, 1.04215163e+04, 
3.49053711e+04, 3.06430420e+04, 1.54659743e+04, 1.22507073e+04, 2.29838211e+04, 2.80689352e+04, 1.24090610e+04, -3.49672644e+03, 1.25655413e+04, 2.05060764e+04, -4.41494270e+03, 6.23406765e+03, -6.70792866e+03, 1.56835455e+04, 8.79905672e+03, 1.70954798e+04, -1.20006299e+04, 2.32764489e+04, 1.39722455e+04, 1.62556336e+04, 2.90585260e+04, 2.22614474e+04, 3.44931672e+03, 6.56518945e+03, 2.80309304e+04, 1.89065022e+04, 2.66251273e+04, 2.24008904e+04, -3.46664344e+03, 2.98405605e+04, 4.99986035e+04, 8.12192118e+03, 2.32108496e+04, 1.87207048e+04, 2.94864225e+04, 2.30635700e+04, 4.03739664e+03, 3.64064534e+04, 1.05239389e+04, 4.49418032e+03, 2.46542703e+04, 2.88372647e+04, 1.76145821e+04, -1.20502662e+03, 2.61923258e+04, -7.61830669e+03, -4.50469220e+03, 1.91039876e+04, 2.80859861e+03, 9.02589549e+03, 9.54643401e+03, -7.87782051e+03, 2.77553648e+03, -3.96550627e+03, 2.63663828e+04, -5.16826783e+03, 4.16443749e+04, 2.25535536e+04, 2.04084092e+04, 8.41693418e+03, 2.05492947e+04, -4.22402137e+03, 1.48464457e+04, 3.01046448e+03, 3.28672702e+04, 1.73061251e+04, 4.24882811e+02, 3.43642557e+04, 1.57630928e+04, 5.52028075e+03, 2.43579326e+04, 2.52115136e+04, 6.67538351e+03, -4.19654223e+03, 2.29247760e+04, -7.01731014e+03, 7.45820243e+03, 5.35308176e+03, 2.97321929e+03, 1.56642176e+04, 1.85091797e+04, 1.85282979e+04, 1.03320994e+04, 8.86594198e+03, 1.79533622e+04, -3.54957681e+01, 2.62175580e+04, 2.70939411e+04, 1.06984674e+04, 1.61454223e+04, 1.97848288e+04, 9.28741712e+03, 2.35441972e+03, -2.59330284e+03, 2.06180656e+04, -5.18959023e+03, 1.93485397e+04, -9.71407235e+02, 1.40957274e+04, 5.84901372e+03, 1.73548488e+02, 4.09526861e+04, 1.36427385e+04, 1.16803204e+04, 1.11871214e+04, -5.65502163e+03, 2.31504982e+04, 3.75366145e+03, 1.57885881e+04, -6.34163254e+03, 2.26687996e+03, 3.40943947e+04, 9.97980536e+00, 1.50604487e+04, 2.44857791e+04, -9.35765870e+03, 1.52885189e+04, 2.41150212e+04, 3.73041735e+04, 3.70898660e+04, 2.74715032e+04, 2.87226250e+04, 1.26741468e+04, 1.79040878e+04, 4.60183007e+02, 2.23109943e+04, 1.78538625e+03, 9.02557010e+03, 2.01956331e+04, 1.05164678e+04, 2.69007396e+04, 1.03669063e+04, 3.37218711e+04, 2.89384174e+03, -4.98294404e+03, 1.62006981e+04, -1.68883107e+04, 2.00918359e+04, 2.40758274e+04, 1.38033618e+04, 1.06915559e+04, 1.88552352e+04, 1.85701688e+04, 8.58962927e+03, 2.44692265e+04, 1.44986629e+04, -8.81666640e+02, 1.98617818e+04, 1.72889990e+04, -2.38132445e+03, 1.00666818e+04, 1.61490980e+04, 3.37592177e+04, 2.35518465e+04, 6.69464349e+03, 1.62140691e+04, 1.33486771e+04, -7.37100276e+02, 7.64119851e+03, 3.38947073e+04, 1.90228970e+04, -5.14680564e+03, 1.16282656e+04, 5.24410508e+03, 4.22266653e+04, 7.46513445e+03, 1.92853443e+04, -2.96787795e+03, 2.83142670e+04, -5.67270244e+03, 1.87006172e+04, 1.06228110e+04, 3.40462412e+04, -6.17229472e+03, 2.76193489e+04, 1.46349742e+04, -5.40404007e+03, 3.87923821e+04, 4.26609772e+03, 9.17012026e+03, 1.98957880e+04, 8.87239181e+03, -4.78964782e+03, 8.28162370e+03, 1.00649189e+04, 2.54524277e+04, 1.36966510e+04, 3.83869347e+04, 1.74591469e+04, 5.91810624e+03, -2.20336374e+03, 3.44185013e+04, 1.68850384e+04, 1.63163285e+04, 4.39716151e+03, 9.59128260e+03, -7.16391412e+03, 1.45834243e+04, -9.00687687e+02, 2.64116608e+03, 1.75427015e+04, 1.66920621e+04, 5.42307771e+02, -3.37066104e+03, 2.15759581e+04, 1.15776922e+04, 3.14873465e+04])
valid_y3_pred_df = pd.DataFrame(valid_y3_pred, columns = ["Validation_Prediction"])
valid_y3_pred_df
Validation_Prediction | |
---|---|
0 | 2649.722558 |
1 | 25264.255715 |
2 | 16424.063481 |
3 | 25332.958518 |
4 | 31095.512607 |
... | ... |
641 | 542.307771 |
642 | -3370.661042 |
643 | 21575.958067 |
644 | 11577.692233 |
645 | 31487.346512 |
646 rows × 1 columns
Get the RMSE for the validation set.
mse_valid_3 = sklearn.metrics.mean_squared_error(valid_y3, valid_y3_pred)
mse_valid_3
377856956.3335032
# As before
# import math
rmse_valid_3 = math.sqrt(mse_valid_3)
rmse_valid_3
19438.543060978187
valid_y3.describe()
count 646.000000 mean 13535.160991 std 23624.770667 min 1105.000000 25% 4708.750000 50% 6750.500000 75% 12827.750000 max 301070.000000 Name: Wage, dtype: float64
# As before:
# If using the dmba package:
# pip install dmba
# Done earlier. Just for illustration
# import dmba
# from dmba import regressionSummary
regressionSummary(valid_y3, valid_y3_pred)
Regression statistics Mean Error (ME) : -146.0444 Root Mean Squared Error (RMSE) : 19438.5431 Mean Absolute Error (MAE) : 9585.2762 Mean Percentage Error (MPE) : -42.4877 Mean Absolute Percentage Error (MAPE) : 124.7749
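If the dmba package is not available, the same summary statistics can be reproduced with NumPy alone. The sketch below (regression_summary_np is an illustrative helper name, not part of dmba) assumes valid_y3 and valid_y3_pred as computed above.

import numpy as np

def regression_summary_np(y_true, y_pred):
    # Plain-NumPy versions of the statistics reported by dmba's regressionSummary
    y_true = np.asarray(y_true, dtype = float)
    y_pred = np.asarray(y_pred, dtype = float)
    err = y_true - y_pred
    return {
        "Mean Error (ME)": err.mean(),
        "Root Mean Squared Error (RMSE)": np.sqrt((err ** 2).mean()),
        "Mean Absolute Error (MAE)": np.abs(err).mean(),
        "Mean Percentage Error (MPE)": 100 * (err / y_true).mean(),
        "Mean Absolute Percentage Error (MAPE)": 100 * np.abs(err / y_true).mean(),
    }

regression_summary_np(valid_y3, valid_y3_pred)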
new_players_df
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 27 | 59 | 75 | 68 | 80 | 76 | 1 | 0 | 0 | 1 |
1 | 21 | 42 | 71 | 52 | 60 | 76 | 1 | 1 | 0 | 0 |
2 | 19 | 76 | 80 | 22 | 75 | 56 | 0 | 0 | 0 | 1 |
new_records_players_pred_3 = model_ridge.predict(new_players_df)
new_records_players_pred_3
array([30907.21640178, 21909.60027727, 30847.24918278])
# As before
# import pandas as pd
new_records_players_pred_df_3 = pd.DataFrame(new_records_players_pred_3, columns = ["Prediction"])
new_records_players_pred_df_3
# to export
# new_records_players_pred_df.to_csv("whatever_name.csv")
Prediction | |
---|---|
0 | 30907.216402 |
1 | 21909.600277 |
2 | 30847.249183 |
alpha = 0.05
ci_3 = np.quantile(train_residuals_3, 1 - alpha)
ci_3
17045.478790700115
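Note that ci_3 is the single 95th-percentile of the training residuals and is applied symmetrically below. A two-sided empirical band could instead use the alpha/2 and 1 - alpha/2 quantiles; a minimal sketch (ci_3_lower and ci_3_upper are illustrative names):

# Two-sided empirical band from the training residuals (illustrative alternative)
ci_3_lower, ci_3_upper = np.quantile(train_residuals_3, [alpha / 2, 1 - alpha / 2])
ci_3_lower, ci_3_upper

The helper below keeps the original symmetric band based on ci_3.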
def generate_results_confint_3(preds, ci_3):
    df = pd.DataFrame()
    df["Prediction"] = preds
    if ci_3 >= 0:
        df["upper"] = preds + ci_3
        df["lower"] = preds - ci_3
    else:
        df["upper"] = preds - ci_3
        df["lower"] = preds + ci_3
    return df
new_records_players_pred_confint_df_3 = generate_results_confint_3(new_records_players_pred_3, ci_3)
new_records_players_pred_confint_df_3
Prediction | upper | lower | |
---|---|---|---|
0 | 30907.216402 | 47952.695192 | 13861.737611 |
1 | 21909.600277 | 38955.079068 | 4864.121487 |
2 | 30847.249183 | 47892.727973 | 13801.770392 |
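The next model is a lasso (L1-penalized) regression with the penalty fixed at alpha = 0.5. As an aside, the penalty strength could also be chosen by cross-validation; a minimal sketch with sklearn's LassoCV (illustrative only, reusing the existing train_X and train_y):

from sklearn.linear_model import LassoCV

# Illustrative: pick the L1 penalty by 5-fold cross-validation
# instead of fixing alpha = 0.5 as in the model below.
lasso_cv = LassoCV(cv = 5, random_state = 666)
lasso_cv.fit(train_X, train_y)
print("CV-selected alpha: ", lasso_cv.alpha_)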
train_X4 = train_X.copy()
train_y4 = train_y.copy()
valid_X4 = valid_X.copy()
valid_y4 = valid_y.copy()
from sklearn import linear_model
model_lasso = linear_model.Lasso(alpha = 0.5, tol = 2, max_iter = 10)
model_lasso.fit(train_X4, train_y4)
Lasso(alpha=0.5, max_iter=10, tol=2)
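With max_iter = 10 and a loose tol = 2, the coordinate-descent solver stops very early, so the fit should be treated as approximate. The number of iterations actually run can be checked (a small sketch):

# How many coordinate-descent iterations the solver actually ran
print("iterations used: ", model_lasso.n_iter_)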
train_y4_pred = model_lasso.predict(train_X4)
train_y4_pred
array([23664.75129721, 7309.43845012, 1682.14259148, ..., 25528.99628982, 23094.83450666, 23317.79242772])
train_y4_pred_df = pd.DataFrame(train_y4_pred, columns = ["Training_Prediction"])
train_y4_pred_df
Training_Prediction | |
---|---|
0 | 23664.751297 |
1 | 7309.438450 |
2 | 1682.142591 |
3 | 25612.713391 |
4 | 22900.953559 |
... | ... |
1501 | 28429.116142 |
1502 | 4783.812410 |
1503 | 25528.996290 |
1504 | 23094.834507 |
1505 | 23317.792428 |
1506 rows × 1 columns
print("model intercept: ", model_lasso.intercept_)
print("model coefficients: ", model_lasso.coef_)
print("Model score: ", model_lasso.score(train_X4, train_y4))
model intercept: -86996.1156468611 model coefficients: [ 878.5716549 73.05446438 860.5312622 -13.47353083 257.41231334 32.58020897 -2105.20659376 3767.57721845 -518.43939572 -1578.14026859] Model score: 0.18302583344714407
Coefficients, in an easier-to-read format.
print(pd.DataFrame({"Predictor": train_X4.columns, "Coefficient": model_lasso.coef_}))
Predictor Coefficient 0 Age 878.571655 1 Balance 73.054464 2 ShotPower 860.531262 3 Aggression -13.473531 4 Positioning 257.412313 5 Composure 32.580209 6 Preferred Foot_Right -2105.206594 7 Body Type_Lean 3767.577218 8 Body Type_Normal -518.439396 9 Body Type_Stocky -1578.140269
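A useful property of the lasso is that it can shrink some coefficients exactly to zero, effectively dropping those predictors. A quick sketch to check which (if any) were zeroed out and to rank the rest by magnitude (coef_df_4 is an illustrative name):

# Predictors whose lasso coefficient was shrunk exactly to zero, and the rest ranked by |coefficient|
coef_df_4 = pd.DataFrame({"Predictor": train_X4.columns, "Coefficient": model_lasso.coef_})
print(coef_df_4[coef_df_4["Coefficient"] == 0])
print(coef_df_4.reindex(coef_df_4["Coefficient"].abs().sort_values(ascending = False).index))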
Get the RMSE for the training set
mse_train_4 = sklearn.metrics.mean_squared_error(train_y4, train_y4_pred)
mse_train_4
394480672.6408317
import math
rmse_train_4 = math.sqrt(mse_train_4)
rmse_train_4
19861.537519558544
train_y4.describe()
count 1506.000000 mean 12698.381142 std 21981.278007 min 1290.000000 25% 4692.500000 50% 6544.000000 75% 12364.250000 max 407609.000000 Name: Wage, dtype: float64
If using the dmba package:
pip install dmba
or
conda install -c conda-forge dmba
Then load the library
import dmba
from dmba import regressionSummary
import dmba
from dmba import regressionSummary
regressionSummary(train_y4, train_y4_pred)
Regression statistics Mean Error (ME) : 0.0000 Root Mean Squared Error (RMSE) : 19861.5375 Mean Absolute Error (MAE) : 10701.9033 Mean Percentage Error (MPE) : -31.7885 Mean Absolute Percentage Error (MAPE) : 145.9077
Normality
import numpy as np
from scipy.stats import shapiro
shapiro(train_y4)
ShapiroResult(statistic=0.38412952423095703, pvalue=0.0)
train_residuals_4 = train_y4 - train_y4_pred
train_residuals_4
13946 -1152.751297 7711 -549.438450 8402 3694.857409 13651 -11901.713391 1625 -12379.953559 ... 12759 -6257.116142 17284 1801.187590 1016 -14381.996290 16984 -11432.834507 16744 2983.207572 Name: Wage, Length: 1506, dtype: float64
shapiro(train_residuals_4)
ShapiroResult(statistic=0.5732489824295044, pvalue=0.0)
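The Shapiro-Wilk p-value of 0.0 suggests the residuals are far from normally distributed. A Q-Q plot gives a visual check of the same thing; a minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt
from scipy import stats

# Visual normality check: residual quantiles vs. theoretical normal quantiles
stats.probplot(train_residuals_4, dist = "norm", plot = plt)
plt.title("Q-Q plot of lasso training residuals")
plt.show()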
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_df_4 = pd.DataFrame()
vif_df_4["features"] = train_X4.columns
vif_df_4["VIF"] = [variance_inflation_factor(train_X4.values, i) for i in range(train_X4.shape[1])]
print(vif_df_4)
features VIF 0 Age 48.741902 1 Balance 33.364691 2 ShotPower 155.270485 3 Aggression 18.482052 4 Positioning 184.558829 5 Composure 115.774481 6 Preferred Foot_Right 7.714600 7 Body Type_Lean 32.118906 8 Body Type_Normal 59.412529 9 Body Type_Stocky 11.108926
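The very large VIFs above are likely inflated by computing them on the raw predictor matrix without an intercept column (statsmodels then works with an uncentred R-squared). A common recommendation is to add a constant first; a sketch, assuming statsmodels.api is available:

import statsmodels.api as sm

# VIFs computed with an explicit intercept column; the constant's own row is dropped
X4_const = sm.add_constant(train_X4)
vif_const_df_4 = pd.DataFrame()
vif_const_df_4["features"] = X4_const.columns
vif_const_df_4["VIF"] = [variance_inflation_factor(X4_const.values, i) for i in range(X4_const.shape[1])]
print(vif_const_df_4.iloc[1:])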
valid_y4_pred = model_lasso.predict(valid_X4)
valid_y4_pred
array([ 1.60888749e+04, 2.28658032e+04, 2.29165150e+04, 2.02857565e+04, 2.62424904e+04, 6.67204751e+03, 1.81177179e+04, 6.41889523e+03, 2.69140122e+04, -7.34359467e+02, 5.62294937e+03, 1.95665783e+04, 9.67738388e+03, 9.83296597e+03, 2.64396039e+04, 4.10728682e+04, 2.39865085e+04, 2.42018735e+04, 3.60388275e+03, 2.61979797e+04, -3.97585033e+02, 2.19805171e+04, 1.57147085e+04, 2.50387186e+04, 3.52740625e+03, -1.12435316e+04, 1.47258031e+04, 1.39643707e+04, 2.22710897e+04, 1.26309930e+04, 6.54265365e+03, 1.94652956e+04, 2.87796041e+04, 2.27009351e+04, 8.02049536e+03, 1.78417813e+04, 6.09700927e+03, 2.07105507e+03, -5.74940302e+03, 2.73190544e+04, 3.13813566e+04, -1.75414133e+04, 1.87993583e+04, 9.10826540e+03, 1.60578541e+04, 2.73955332e+04, -4.58333255e+03, 7.19872200e+03, 8.04411300e+03, 2.02694381e+04, 2.27172953e+04, 6.53840561e+03, 2.07694645e+03, 2.72765691e+04, 1.09033157e+04, 1.36896612e+04, 4.27639965e+02, 1.41412401e+04, 2.05011614e+03, 1.40974560e+04, 1.93406089e+04, 2.99152387e+04, 2.31520588e+04, 2.47631286e+04, 3.43348748e+03, 2.36939432e+04, 1.36181766e+04, 2.76957840e+03, -2.03776570e+03, 3.47129067e+04, 1.89990075e+04, -1.75023659e+03, 3.01520525e+02, 1.04627448e+04, 2.62403265e+04, 2.66132919e+04, 1.41224816e+04, 9.59158193e+03, 1.85603403e+04, 2.66950059e+04, 2.84043835e+04, 4.37907666e+04, 2.37083249e+03, 1.77106111e+04, 3.50975309e+04, 1.32232907e+04, 6.92081485e+03, 1.71187991e+04, 1.76364337e+04, 2.49008126e+04, 2.65661978e+04, 1.80388153e+04, 1.04781396e+04, 2.81083442e+04, 2.38637678e+04, 1.87402022e+04, 1.39745881e+04, 2.37620749e+04, 2.04038056e+04, 3.04591318e+04, 1.59192615e+04, 1.39363677e+04, 3.32839738e+04, 5.53410122e+03, 3.21413922e+04, 1.80982290e+04, 1.87306118e+04, 2.29560836e+04, 6.99944165e+03, 4.06076710e+04, 2.33133879e+04, 1.35916222e+04, 8.33711061e+03, 2.26958161e+04, 1.23702955e+04, -1.27709895e+04, 5.75901385e+03, 4.88970659e+03, 2.78235806e+04, 1.96004174e+04, 4.59367928e+02, 3.03766150e+03, 1.88247776e+04, -5.49801173e+03, 2.54392756e+04, 3.50776464e+04, 2.84189046e+04, 3.34495332e+04, -3.74478093e+02, 7.17734196e+03, 2.04475163e+04, -3.90257396e+03, 8.51705656e+03, 1.20419618e+04, 3.12936781e+04, -1.77652605e+03, -8.15480142e+03, 2.55030846e+02, -1.07168959e+04, -3.92066826e+03, 1.78793280e+04, 2.65900919e+04, 1.79857707e+04, -1.88812413e+03, -9.21307823e+01, 2.66310360e+04, 1.42427135e+04, 8.90525482e+03, 5.06681050e+02, 2.36865409e+04, 2.85556930e+04, 6.71464963e+03, 3.38685951e+04, 5.50140701e+03, 1.44291766e+04, 2.44361591e+03, 1.10923943e+04, -4.79461466e+03, 1.73182301e+04, 9.19798343e+03, 1.98289424e+04, -1.67570552e+04, 4.75537493e+03, 2.17511075e+04, 3.07270871e+04, 7.54793161e+03, 2.42095051e+04, 2.55495164e+04, 2.34050760e+04, 1.55194262e+04, -2.44420941e+03, -1.38997798e+04, 1.51027652e+04, -1.57547089e+03, 5.54208650e+03, 7.37557579e+02, 2.25591142e+04, 1.15972177e+04, 2.25169608e+04, 1.48101737e+04, 2.60619906e+04, -7.11529825e+03, 4.41655556e+02, 1.90016427e+04, 2.06300472e+04, 4.45581728e+03, 2.72346581e+04, 2.26692972e+04, 1.66070388e+04, -3.34656643e+03, 2.37028391e+04, -6.99913025e+03, 6.39434938e+03, 4.35908400e+03, -2.06729861e+03, -5.58996134e+02, 3.86144201e+04, 2.78935579e+03, 9.37102135e+03, -6.97122364e+03, 1.37569537e+04, 3.73297127e+04, 2.42269396e+04, 3.06369603e+04, 1.84434748e+04, 1.17156680e+04, 3.35720256e+04, -1.91862641e+03, 2.16452938e+04, -4.88260091e+03, 4.62975862e+03, 4.09837158e+04, 1.79503901e+04, 3.64415514e+02, 1.66633135e+04, 3.32423824e+04, 1.13084337e+04, -1.24435223e+04, 2.96966921e+04, 
1.85483604e+04, 2.14978785e+04, 2.60371649e+04, 1.67770907e+04, 2.17863454e+04, 2.80850844e+04, 2.05488151e+04, 2.84596132e+03, 1.56514499e+04, 2.50457505e+04, 1.53038368e+04, -2.10420470e+03, 1.04771486e+04, 1.08202903e+04, 1.64081414e+04, 5.38762075e+02, 1.94600342e+04, 1.33156332e+04, -6.62094416e+03, 1.69535139e+04, 1.18997505e+04, 1.65717592e+04, 1.82224667e+04, -7.01880524e+03, -8.02125460e+03, 1.12112209e+04, 1.01939541e+04, 1.53144871e+04, -3.68156802e+03, 6.95021475e+03, 1.07493781e+04, -1.55063158e+04, 7.23664071e+03, 4.68077893e+03, 1.14209803e+04, 2.44929255e+04, 1.98843194e+04, 1.74738019e+04, 1.25251769e+04, 1.79250166e+04, 1.97538995e+04, 1.87939890e+04, 9.63343492e+03, 1.09965060e+04, 2.35214579e+04, 2.41829656e+04, 2.43581552e+03, 1.64577593e+04, 1.27118398e+04, 1.63961883e+04, 1.32278591e+04, 2.88379301e+04, 8.91057395e+03, 2.35511830e+04, 1.76595572e+04, 6.99332839e+02, 2.18978660e+04, -1.42876612e+04, 1.58988435e+04, 4.42680677e+03, 3.91184834e+03, 1.22576103e+04, 7.34186242e+03, 1.09420656e+04, 1.76738855e+04, 1.33831337e+04, -6.37160171e+03, 3.69193289e+04, 2.15356035e+04, 2.54768158e+04, 4.52041538e+03, 8.74892408e+03, 9.59495112e+03, 4.62705394e+04, 1.37422401e+04, 1.28107742e+04, 2.11039601e+04, 3.17852399e+03, -7.60302984e+03, 2.68399444e+04, -7.46115997e+02, -8.61116274e+03, 8.46258726e+03, 3.18954628e+03, 2.44497263e+04, -9.00470385e+03, 2.75658141e+03, 1.88641332e+04, 1.48955244e+04, 1.62578564e+04, -1.98085164e+04, 2.51306804e+04, -3.47888573e+02, -3.83135545e+03, 1.40263929e+04, 1.52814340e+04, -6.69429127e+03, 1.29184076e+04, 3.20941758e+04, 4.06613399e+01, 1.25995833e+04, 1.91431660e+04, 2.63589862e+04, 3.81945715e+04, 2.34658540e+04, 5.49099238e+03, -1.40751599e+04, 1.85026469e+04, 1.97590221e+04, 2.31917144e+04, 1.58724974e+04, -7.87105665e+02, -1.45681988e+04, -6.66176802e+03, 3.21785679e+04, 2.45184042e+04, 3.24873271e+04, 2.56065571e+04, 6.87115371e+03, 5.40968391e+03, -9.39165633e+03, 1.41752250e+04, 1.72407072e+04, 1.89420324e+04, 5.72073480e+03, 2.14517630e+04, 7.61933028e+03, 4.86244567e+03, 6.05864416e+03, 6.23672050e+03, 1.35262705e+04, 2.10523162e+04, -2.44342898e+03, 3.05601701e+04, 1.53326377e+04, 2.74806751e+04, 1.44554675e+04, 3.73210805e+03, 2.26703966e+04, 2.43727356e+04, 1.48885122e+04, 1.10605836e+04, 2.51943385e+03, 1.98587479e+04, 1.67562321e+04, 3.32498668e+04, 2.52278618e+04, 6.87409114e+03, 6.15607398e+03, 1.86627647e+04, 1.84280274e+04, 2.28444296e+03, 1.58605452e+04, 6.46186369e+02, -8.97251493e+03, -3.72358569e+03, 2.27231350e+04, 4.88534470e+03, 1.11998861e+04, 1.82165976e+04, 1.32147369e+04, 1.36947600e+04, -5.80173138e+03, 2.69399308e+04, 1.91517860e+04, 1.93484166e+04, 1.54226866e+04, 3.60286200e+04, 4.47914169e+02, 1.37536816e+04, 3.46808670e+04, -6.93837922e+03, 2.30809413e+04, 2.01956160e+04, -1.63418067e+04, 1.01373639e+04, 6.34659526e+03, 2.18460500e+04, 2.07755790e+04, 2.92644227e+04, 1.63887211e+04, 4.28153755e+03, 2.67536540e+04, 9.02755879e+03, 2.38623679e+04, 2.64816130e+04, 2.39299875e+04, 2.00292678e+03, 2.06038174e+03, 1.12373667e+04, 2.36371608e+04, 1.32021048e+04, 1.51747075e+04, 1.99552447e+04, 6.79833775e+03, 2.18167688e+04, 9.01418537e+03, -9.99352663e+02, 1.67641319e+04, 1.16662632e+04, 4.24323517e+04, 1.12985080e+04, 1.60572403e+04, 2.44525141e+04, 2.27511222e+04, 2.08737879e+04, -5.82400428e+03, 2.82174294e+04, 3.11551420e+04, -4.69698911e+03, 1.19888693e+04, 3.73228426e+04, 2.13593987e+04, 2.55976493e+04, 2.24854620e+04, 5.41969148e+03, 2.81381425e+04, 1.15023507e+04, -3.86262187e+03, -1.19446767e+03, 
3.45223441e+04, 2.58357274e+04, 9.23895335e+03, 2.11859025e+04, 2.86370512e+04, 1.97167704e+04, 1.40977758e+04, 8.05747461e+03, 1.57975645e+04, 2.52698900e+04, -2.99220337e+03, 2.27021369e+04, -7.34827734e+03, 9.42477370e+03, 7.44978350e+03, 1.58720826e+04, -1.95381434e+04, 2.68251362e+04, 9.47770452e+03, 2.32776944e+04, 3.04295550e+04, 8.78853438e+03, 7.18708126e+03, 1.62088849e+04, 2.37940551e+04, 8.85774288e+03, 1.25139539e+04, 2.21013450e+04, -1.78983235e+03, 2.39546348e+04, 3.58780279e+04, 1.61462498e+04, 1.79346638e+04, 1.52933406e+04, 3.30664727e+04, 1.53642916e+04, 6.37359615e+03, 2.86878854e+04, 2.37880945e+04, -4.94550903e+03, 3.00543561e+04, 2.90182003e+04, 2.52515789e+04, 4.78106890e+03, 3.20769640e+04, -7.30460403e+03, -1.25622268e+04, 2.29174987e+04, 1.39012875e+04, -3.06930042e+03, 1.11624990e+04, -1.48666371e+04, 3.06589123e+03, 6.72357884e+02, 2.47464704e+04, -6.77626185e+03, 3.14378530e+04, 2.07278930e+04, 2.72607806e+04, 1.04486112e+04, 2.56922422e+04, -1.02766329e+04, 2.35137439e+04, 9.62746024e+02, 2.79983466e+04, 2.21216593e+04, 9.20928565e+03, 3.03490668e+04, 1.99912055e+04, 1.20438670e+04, 3.00985832e+04, 2.28445373e+04, 2.53193368e+03, 1.28598758e+04, 2.14288904e+04, -1.23016386e+04, 1.16536355e+04, 1.51943815e+04, -6.95738781e+03, 1.94049429e+04, 1.82144135e+04, 2.10293876e+04, 2.42393706e+04, 1.88700518e+04, 3.12614580e+03, 6.68480328e+03, 2.11872995e+04, 1.88734202e+04, 8.00493699e+03, 1.93347662e+04, 1.96450720e+04, 1.34064991e+04, 1.64068990e+04, -6.94743474e+03, 2.56211052e+04, -8.41223183e+03, 2.58562216e+04, -6.85420116e+03, 2.64068791e+04, 1.73265562e+04, -4.01980007e+03, 3.76516850e+04, 1.56277151e+04, 2.01996049e+04, 2.65081750e+04, -1.05868509e+04, 2.59287853e+04, -4.96806088e+03, 1.32000523e+04, -7.10351398e+03, 2.88477413e+03, 4.24803109e+04, -6.84345017e+03, 1.98806772e+04, 2.45388299e+04, -8.16594535e+03, 8.36178352e+03, 1.78555867e+04, 2.28804969e+04, 2.42634875e+04, 2.74984963e+04, 2.46513003e+04, 2.17666350e+04, 1.44914746e+04, 3.12145127e+03, 3.04987095e+04, -2.02211535e+03, 1.98677231e+04, 1.96423504e+04, 8.90696985e+03, 3.07026025e+04, 1.48978513e+04, 2.84487445e+04, -4.97523053e+03, -8.66521484e+02, 1.29763444e+04, -7.99536222e+03, 2.75359156e+04, 2.86476834e+04, 7.53286174e+03, 6.11551173e+03, 1.73268105e+04, 1.96363681e+04, 1.78118381e+04, 2.83903425e+04, 4.65014871e+03, 1.02605521e+03, 2.17578611e+04, 5.42991734e+03, -1.24515281e+04, 1.86704096e+04, 1.35216557e+04, 3.17605101e+04, 1.94115740e+04, 9.94811501e+03, 1.81044663e+04, 1.13553424e+04, 1.05495804e+04, 8.14530718e+03, 2.54853869e+04, 3.39594266e+04, 3.59410210e+03, 6.61478785e+03, 1.69307324e+04, 4.05745556e+04, 1.83727980e+04, 2.44659292e+04, -1.38422062e+04, 2.39812839e+04, 6.30206872e+03, 1.70565923e+04, 1.52304237e+04, 3.31359512e+04, -1.06537963e+04, 2.39127251e+04, 1.32528237e+04, 6.41677453e+03, 2.82429089e+04, 1.84240516e+04, 2.39529817e+04, 1.49468820e+04, 1.41015024e+04, -3.23017246e+03, -5.59277267e+03, 1.63212296e+04, 2.42808217e+04, 1.24670307e+04, 3.15458260e+04, 2.17338233e+04, 8.28983452e+03, 1.41617551e+04, 2.69616326e+04, 1.10601573e+04, 1.65777186e+04, 5.50576865e+03, 5.76146552e+03, -9.76252428e+03, 1.17118783e+04, 1.21399049e+03, -1.46113543e+03, 4.59854240e+03, 1.31693546e+04, 4.99593646e+03, -1.02712169e+04, 1.74376599e+04, 1.21050714e+04, 1.22957991e+04])
valid_y4_pred_df = pd.DataFrame(valid_y4_pred, columns = ["Validation_Prediction"])
valid_y4_pred_df
Validation_Prediction | |
---|---|
0 | 16088.874881 |
1 | 22865.803237 |
2 | 22916.515027 |
3 | 20285.756474 |
4 | 26242.490423 |
... | ... |
641 | 4995.936456 |
642 | -10271.216895 |
643 | 17437.659872 |
644 | 12105.071401 |
645 | 12295.799079 |
646 rows × 1 columns
Get the RMSE for the validation set.
mse_valid_4 = sklearn.metrics.mean_squared_error(valid_y4, valid_y4_pred)
mse_valid_4
445185876.78789973
# As before
# import math
rmse_valid_4 = math.sqrt(mse_valid_4)
rmse_valid_4
21099.428352159204
valid_y4.describe()
count 646.000000 mean 13535.160991 std 23624.770667 min 1105.000000 25% 4708.750000 50% 6750.500000 75% 12827.750000 max 301070.000000 Name: Wage, dtype: float64
# As before:
# If using the dmba package:
# pip install dmba
# Done earlier. Just for illustration
# import dmba
# from dmba import regressionSummary
regressionSummary(valid_y4, valid_y4_pred)
Regression statistics Mean Error (ME) : -213.0518 Root Mean Squared Error (RMSE) : 21099.4284 Mean Absolute Error (MAE) : 11072.2334 Mean Percentage Error (MPE) : -48.3380 Mean Absolute Percentage Error (MAPE) : 152.1841
new_players_df
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 27 | 59 | 75 | 68 | 80 | 76 | 1 | 0 | 0 | 1 |
1 | 21 | 42 | 71 | 52 | 60 | 76 | 1 | 1 | 0 | 0 |
2 | 19 | 76 | 80 | 22 | 75 | 56 | 0 | 0 | 0 | 1 |
new_records_players_pred_4 = model_lasso.predict(new_players_df)
new_records_players_pred_4
array([24044.91108935, 14502.47793021, 23347.2433213 ])
# As before
# import pandas as pd
new_records_players_pred_df_4 = pd.DataFrame(new_records_players_pred_4, columns = ["Prediction"])
new_records_players_pred_df_4
# to export
# new_records_players_pred_df.to_csv("whatever_name.csv")
Prediction | |
---|---|
0 | 24044.911089 |
1 | 14502.477930 |
2 | 23347.243321 |
alpha = 0.05
ci_4 = np.quantile(train_residuals_4, 1 - alpha)
ci_4
19771.787699952376
def generate_results_confint_4(preds, ci_4):
    df = pd.DataFrame()
    df["Prediction"] = preds
    if ci_4 >= 0:
        df["upper"] = preds + ci_4
        df["lower"] = preds - ci_4
    else:
        df["upper"] = preds - ci_4
        df["lower"] = preds + ci_4
    return df
new_records_players_pred_confint_df_4 = generate_results_confint_4(new_records_players_pred_4, ci_4)
new_records_players_pred_confint_df_4
Prediction | upper | lower | |
---|---|---|---|
0 | 24044.911089 | 43816.698789 | 4273.123389 |
1 | 14502.477930 | 34274.265630 | -5269.309770 |
2 | 23347.243321 | 43119.031021 | 3575.455621 |
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
train_X5 = train_X.copy()
train_y5 = train_y.copy()
valid_X5 = valid_X.copy()
valid_y5 = valid_y.copy()
train_X5, train_y5 = make_regression(n_features = 10, random_state = 666)
model_elastic = ElasticNet(random_state = 666)
model_elastic.fit(train_X5, train_y5)
ElasticNet(random_state=666)
type(train_X5)
numpy.ndarray
type(train_y5)
numpy.ndarray
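Note that make_regression above has replaced the copied football data with a synthetic dataset of 100 samples and 10 features, which is why train_X5 and train_y5 are now plain NumPy arrays; it also explains the "X has feature names" warning that appears later when predicting on the valid_X5 DataFrame. If the intent was to fit the elastic net on the football data itself, a minimal sketch (model_elastic_fb is an illustrative name) would be:

# Illustrative alternative: fit the elastic net on the football training data
# instead of the synthetic make_regression dataset.
model_elastic_fb = ElasticNet(random_state = 666)
model_elastic_fb.fit(train_X, train_y)
print("Model score: ", model_elastic_fb.score(valid_X, valid_y))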
train_y5_pred = model_elastic.predict(train_X5)
train_y5_pred
array([-239.89702287, 30.20471618, 146.41102948, 62.35336983, 177.08753921, -209.59344487, -159.0879793 , -242.38244857, 54.87544413, -26.34258741, 231.11945687, 309.32913489, 5.68659987, -102.16230273, 1.22670383, -59.4133525 , -85.15133799, -98.99056433, -87.74329659, 44.51773042, 159.3243627 , 117.5443711 , -164.62941892, 125.34439908, 8.90255763, -114.7913873 , 291.75693957, -202.28642913, 171.71712251, -3.92022359, -143.08624391, -89.08740894, -106.63397392, -194.63807216, -244.85791217, 122.01268078, 57.49586543, 175.80148181, -154.49993083, -41.72967953, 239.68753594, 8.39230607, 67.89458582, 203.52636571, -15.51052836, 97.53529516, 56.9366598 , -368.15724976, -26.46591475, -116.38192221, -129.1303425 , 277.99750842, -217.03245809, 110.77806504, -98.47472604, -37.8287712 , 211.12887163, -41.54899521, -114.55327138, 33.64785209, -3.58899371, 34.09284041, 60.20734999, -252.26679574, 154.37071339, -62.95584088, 127.81457702, -220.9094447 , -287.99704049, 42.56795837, -160.11760685, 42.74309451, 42.01724616, 44.95060293, -171.09356266, -143.00605263, -128.79839338, -158.04057611, 23.3411471 , -208.12593485, -60.49186376, -97.30339631, 254.14383238, -93.8974994 , 301.25260918, 9.89642346, -62.83757572, -130.10642815, 89.3330917 , 19.50624336, -2.98545981, -72.02696057, -39.62413746, -93.51527208, 141.07669442, 91.93363588, 51.80867706, -34.02927417, -75.6741745 , 95.76213508])
train_y5_pred_df = pd.DataFrame(train_y5_pred, columns = ["Training_Prediction"])
train_y5_pred_df
Training_Prediction | |
---|---|
0 | -239.897023 |
1 | 30.204716 |
2 | 146.411029 |
3 | 62.353370 |
4 | 177.087539 |
... | ... |
95 | 91.933636 |
96 | 51.808677 |
97 | -34.029274 |
98 | -75.674174 |
99 | 95.762135 |
100 rows × 1 columns
print("model intercept: ", model_elastic.intercept_)
print("model coefficients: ", model_elastic.coef_)
print("Model score: ", model_elastic.score(train_X5, train_y5))
model intercept: -6.133468616184231 model coefficients: [24.3065105 1.72597924 32.79853103 11.29839994 57.71459241 60.51954808 50.97607045 46.07791909 55.01341304 60.31821326] Model score: 0.8825434233397067
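The elastic net combines the ridge and lasso penalties; alpha controls the overall strength and l1_ratio (default 0.5) the L1/L2 mix. Both can be tuned by cross-validation; a minimal sketch with ElasticNetCV on the synthetic data used here (enet_cv is an illustrative name):

from sklearn.linear_model import ElasticNetCV

# Illustrative: tune alpha and the L1/L2 mix by 5-fold cross-validation
enet_cv = ElasticNetCV(l1_ratio = [0.1, 0.5, 0.9], cv = 5, random_state = 666)
enet_cv.fit(train_X5, train_y5)
print("alpha: ", enet_cv.alpha_, " l1_ratio: ", enet_cv.l1_ratio_)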
Get the RMSE for the training set
mse_train_5 = sklearn.metrics.mean_squared_error(train_y5, train_y5_pred)
mse_train_5
5476.117116636182
import math
rmse_train_5 = math.sqrt(mse_train_5)
rmse_train_5
74.00079132439181
np.std(train_y5)
215.9223978114257
If using the dmba package:
pip install dmba
or
conda install -c conda-forge dmba
Then load the library
import dmba
from dmba import regressionSummary
import dmba
from dmba import regressionSummary
regressionSummary(train_y5, train_y5_pred)
Regression statistics Mean Error (ME) : 0.0000 Root Mean Squared Error (RMSE) : 74.0008 Mean Absolute Error (MAE) : 59.4893 Mean Percentage Error (MPE) : -12.6902 Mean Absolute Percentage Error (MAPE) : 82.0362
Normality
import numpy as np
from scipy.stats import shapiro
shapiro(train_y5)
ShapiroResult(statistic=0.9933440089225769, pvalue=0.907919704914093)
train_residuals_5 = train_y5 - train_y5_pred
train_residuals_5
array([-115.87355838, 20.55225498, 105.29541848, 70.33789465, 85.10534445, -119.32838286, -90.17104944, -94.61626251, 49.49931428, -13.57318615, 134.61766684, 121.55259874, 6.61723475, -38.86744747, -8.05409341, 15.02212719, -43.36458565, -2.26900164, -44.19868389, 31.22462816, 83.89791053, 62.4307288 , -81.24968022, 65.54643487, 7.05336542, -61.84098657, 168.23587577, -74.16622347, 116.40139242, 22.32224043, -57.88069139, -14.05995462, -10.96375877, -95.15789269, -151.64687109, 53.23292199, 35.31233236, 90.27628713, -94.06304233, 25.63635158, 112.03495283, 3.70977227, 33.8162091 , 88.91654268, -31.254995 , 39.51105913, 29.7090374 , -167.38446484, 25.89299222, -48.92758299, -84.58876275, 148.84402504, -114.49031964, 76.21986696, -62.0456898 , -20.23421487, 89.04950568, -1.57350087, -56.13256276, 1.78988291, -26.7977081 , 44.23729077, 0.6689099 , -131.99698805, 77.07876543, -25.64571667, 48.77518018, -94.01834119, -157.90006355, 23.29664651, -77.19246555, 55.45884587, 5.61274092, 33.90949488, -80.64159432, -67.74133301, -52.99278782, -43.69628765, 12.47426646, -108.35823508, -17.70919494, -57.74881675, 135.72993831, -51.58580114, 185.07822914, 8.27717559, -31.0038929 , -66.45850356, 27.76413165, 36.8337016 , -6.4878831 , -19.69525644, -9.7356234 , -32.31802903, 81.7506021 , 51.2448407 , 36.8203488 , -16.76202468, 14.50928694, 75.27942723])
shapiro(train_residuals_5)
ShapiroResult(statistic=0.9953042268753052, pvalue=0.9822211861610413)
VIF may not apply here, since the model was fitted on the synthetic make_regression features rather than the football predictors.
valid_y5_pred = model_elastic.predict(valid_X5)
valid_y5_pred
C:\Users\byeo\Anaconda3\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but ElasticNet was fitted without feature names warnings.warn(
array([10352.05910353, 12105.44069022, 12255.07200558, 12134.3868721 , 13147.61617235, 10243.06386701, 11313.32224233, 11032.09564747, 12791.92844967, 8908.74659791, 10387.15719227, 11464.07362857, 10906.60311722, 10608.97750665, 12831.67428293, 14471.71294576, 12320.35842574, 12052.28358989, 9515.58499339, 12535.33345669, 10151.37624276, 12170.51937152, 11334.90749268, 12147.7654329 , 9794.85597288, 8467.82894156, 10686.93646591, 12472.92965584, 11986.96867404, 11006.64046664, 9411.96610806, 11355.46901679, 12196.38602963, 12709.13983713, 8970.73975657, 11031.00275534, 10464.23712765, 10808.36750752, 8957.50677167, 13141.91778899, 13576.62603385, 8444.99670939, 11176.32407544, 9902.04169514, 12083.37100625, 12948.20034059, 8122.52822429, 9119.78689569, 10840.40275917, 11571.32309035, 12138.15395455, 10521.20361808, 9795.8311031 , 12483.43309484, 11418.35024528, 11595.3768299 , 8463.55143622, 11243.80020398, 9690.64626627, 11292.38087084, 10254.85268662, 12809.3242474 , 11854.75627534, 12182.38536325, 9764.61970382, 11595.34411481, 10976.90641622, 10068.74833515, 9401.38304107, 13545.93970676, 10717.11283639, 9233.12389415, 10440.76178304, 11229.62273501, 12312.2758096 , 12193.72842343, 11397.17414624, 10716.94611653, 11713.08899622, 13630.14063311, 13003.82337793, 15368.58879639, 9761.59161119, 12149.79714464, 14078.54221453, 11085.17724553, 10721.85414481, 11104.99277646, 12101.6610874 , 12476.05627164, 13347.31622935, 11723.9698162 , 10614.75795541, 13080.32887597, 11901.4709434 , 11840.69140093, 11466.01742385, 12048.83126325, 11536.50027683, 14044.74049154, 9899.55146143, 11766.23338592, 12373.55297498, 9485.01408948, 12152.60672717, 11011.72078307, 12035.81933079, 12523.09261077, 10044.86099962, 13679.61907932, 12119.19426514, 11555.1113625 , 9195.81690137, 11059.13270076, 11291.64505669, 8129.2754332 , 9453.67669073, 10438.16486942, 13496.484549 , 11713.15135056, 9565.65875894, 10127.63828516, 11211.25847167, 8730.46393144, 11464.48238118, 13992.42592753, 12680.9219006 , 13920.38074777, 9548.5274636 , 8889.91531046, 11755.34749597, 9125.06800726, 9121.30409183, 10794.18810957, 13345.27496844, 8959.37264987, 8194.45885751, 9090.65510799, 8559.99299338, 9895.62335948, 11191.77979577, 12314.54463745, 11448.29771524, 9160.53378102, 9156.10331445, 12909.91829704, 10859.31073101, 10070.28886388, 9204.80176842, 12178.93635149, 12830.08269192, 9876.61311764, 13882.44657459, 10469.0559072 , 10375.89184879, 10857.74933212, 11568.49556747, 8363.05241204, 11656.17277285, 11071.47688265, 11524.56418679, 7851.60146357, 9697.27927879, 12111.67380638, 12002.93630653, 10001.97324197, 11498.83815181, 12221.96067723, 10924.12307689, 11378.00042588, 9283.17597152, 8606.10583163, 9978.55170431, 9736.45350326, 9331.35800275, 10215.8221707 , 12561.04023405, 11107.05957564, 11581.62588877, 10522.64186496, 12798.60222989, 8118.80733405, 9598.93135701, 11417.7759794 , 10414.19744897, 10118.95520032, 12950.08051071, 12077.26462182, 11444.33982293, 8538.05715103, 13014.41184658, 9024.42345157, 10577.58323199, 10824.73004598, 8959.15622165, 9534.11628987, 14733.93676839, 9631.46855787, 10754.48023068, 8142.09307727, 11148.02085278, 13941.64646087, 11565.43884066, 13632.35722122, 10554.12980061, 10130.09110264, 12728.87927866, 10110.23841177, 12104.07799694, 9800.3472128 , 9850.34126804, 14501.1381244 , 11385.27675579, 8917.82199634, 11768.23753761, 13311.39581984, 9936.86455837, 9051.59327363, 12573.05898577, 11060.07161974, 11961.10514464, 11593.11161466, 9832.93259552, 12492.82402189, 12478.95319134, 
11784.66170994, 9689.1675903 , 10556.50773932, 12768.75650148, 11101.65276867, 8914.20026887, 11882.03457769, 9807.98637367, 11884.85828607, 9353.19048249, 11777.48423306, 10074.44938971, 10161.97810891, 12287.35960953, 10605.23222895, 10756.1904017 , 11152.64050382, 8884.06056659, 8762.31273865, 10444.39992637, 10519.90297765, 10906.78681814, 9985.65571416, 11319.41523747, 10209.02622837, 7658.66885234, 9269.95118605, 10448.99818541, 10603.75092298, 13021.66467132, 10954.2872683 , 12007.53026353, 11106.66541944, 12126.05183304, 11685.5686955 , 10998.1895253 , 9623.88504223, 10508.47416779, 11800.44219586, 12881.95959803, 10789.33550312, 11631.22698844, 10747.60354427, 11463.68774696, 11661.68780507, 13924.20457518, 9646.73076185, 13060.49409883, 11382.35314513, 9819.18444803, 12535.12146765, 8264.88828232, 12032.72088507, 9717.07893981, 9153.073164 , 10643.48540969, 11153.4817259 , 10863.82417413, 11734.51629157, 10654.21132183, 9483.72026011, 12745.64816369, 11552.51173636, 12392.31644634, 9622.34634275, 10327.2845096 , 9968.3212924 , 14767.08367069, 10554.27373735, 10679.54042274, 12297.78539478, 10002.89591453, 8926.61180675, 12718.1118005 , 10785.41060887, 8502.21359245, 10366.58046846, 8884.93706053, 12147.61982551, 8540.44564116, 9568.2296874 , 11746.79516293, 11430.09696906, 12074.74703264, 6932.90638676, 11701.53820379, 9766.42011926, 9550.75765528, 11329.59590192, 12041.51349855, 8855.99875912, 10934.37560517, 12682.62925959, 9339.65202606, 11163.52806206, 11360.37063843, 13075.13477091, 15304.40279194, 12964.4931213 , 10251.0757177 , 8353.81395642, 11892.10702604, 12101.18330051, 12283.95447085, 12177.44514571, 10274.42117972, 7787.09898337, 8360.20426381, 13258.87581233, 12259.53250973, 13261.76167787, 13009.05294426, 9592.56570422, 10496.55630757, 8346.13613016, 11649.80655946, 10817.12275202, 11465.9587795 , 10268.55390299, 11161.97842404, 10441.96006572, 9975.27498703, 8804.15908119, 8864.31669839, 11066.63683321, 12374.1176849 , 8449.18106432, 12293.82925208, 11690.10907326, 12722.85412665, 11682.91566374, 10062.56357869, 11832.5551472 , 13531.39222571, 11224.24260704, 9967.5888109 , 10225.76056374, 12254.50939371, 11769.97854703, 13159.58425392, 12658.84164177, 10618.01908829, 9012.0754734 , 11188.27492229, 11814.17928184, 10130.3801664 , 10889.78883617, 9807.04788674, 8530.3089394 , 8433.50363483, 12328.56321367, 9267.9742972 , 11014.41671047, 12236.77969102, 10760.96746641, 11495.89420143, 8713.55480494, 14191.9125366 , 11861.89064184, 10929.37940198, 9852.93814648, 13532.24549513, 8975.74784107, 11950.50064859, 13922.89647561, 9206.669179 , 13480.56012939, 11824.1702701 , 8428.99332366, 10280.62786165, 10761.27083749, 11888.92838727, 11404.13078614, 12723.24363395, 10964.34822915, 9976.53263002, 12442.64820484, 11015.48251927, 11818.12250712, 13363.67455221, 12625.19385989, 9711.13491179, 9743.68379298, 11080.78110568, 11987.06609374, 11063.27187731, 10935.10091144, 11435.63005305, 10717.67824628, 10668.92756333, 9858.02067596, 9759.03261078, 11732.14060299, 10895.69565095, 15186.2229698 , 10398.41151905, 10895.39018223, 11743.04593773, 12011.69427789, 11413.31377267, 8920.81218442, 12100.64853017, 12142.685605 , 8669.94306172, 10863.2632574 , 13842.42029289, 12295.14895607, 12776.83092699, 11786.50872889, 11064.71718986, 13794.76181894, 10717.36420635, 8572.96565498, 9687.30808491, 13931.27732276, 12540.69681497, 11384.08531118, 11878.75612326, 11876.75124707, 11945.06229399, 11271.79918663, 8897.90743476, 10371.35173223, 12415.0924782 , 8863.66537364, 11616.70456178, 
8607.00880655, 10581.63405677, 9719.38670152, 11385.44425347, 7473.04793819, 13022.48744777, 11396.35349915, 11882.66746664, 13018.94540275, 10950.96681061, 10055.39106399, 10465.06848606, 13096.17154433, 10664.92517367, 11846.69946251, 11920.63555508, 8848.83944817, 11676.45409336, 14680.38429109, 11238.51192127, 12484.13530186, 11623.46332236, 13694.25618208, 11908.87620961, 10457.15701494, 12367.81699053, 11605.92559126, 9909.44232993, 12650.86097834, 12503.35314765, 11754.6511254 , 9521.76644665, 12951.99522976, 8300.87555145, 8129.5321238 , 12482.49265728, 10729.17565091, 9704.17957748, 10108.01493561, 8040.68551843, 9459.58537563, 8875.6548931 , 12545.83598965, 8120.71763799, 13381.77370333, 11457.21936677, 12149.08982721, 10114.53871164, 12604.93300858, 8408.49356248, 10492.56132372, 8800.45969315, 12572.50219925, 12052.52606993, 9558.83283931, 13302.16550826, 12140.71994083, 10561.94991113, 13252.63075741, 12361.08207867, 9293.17586843, 10773.29517317, 12575.22191801, 8150.31728555, 10158.31730872, 10498.21211411, 9294.11718509, 11512.22492837, 12086.95191814, 11905.58951226, 12056.21979514, 11783.55043798, 10381.87249236, 9614.51471077, 11982.33598525, 11563.37704148, 10979.23136041, 11899.11945894, 12037.11036324, 11125.77367072, 10586.4840021 , 8706.95283703, 13711.34439512, 8070.54339278, 12049.8503929 , 9027.25424165, 12085.50782813, 10705.38046147, 8717.6062918 , 14370.0903199 , 11037.30558497, 11487.37476506, 11322.71420903, 8033.28239732, 12002.20514018, 9264.05699996, 11466.35549296, 8942.4513159 , 9025.74462322, 13902.44326097, 8893.78707802, 11567.77257344, 12551.09105077, 8402.89606438, 10930.78799594, 11648.91542521, 12917.56854583, 12713.15335777, 11814.65224847, 12958.12124841, 12158.03067397, 11179.37678291, 9278.12924163, 12640.02543852, 8546.49534972, 11348.10962545, 11072.99129439, 11023.97942909, 12601.76271511, 11570.97552015, 13563.39314908, 8976.9451948 , 8887.12914342, 11164.12997477, 7349.59801436, 12297.6377403 , 12003.38087139, 11077.40788284, 10127.2837807 , 10759.46387563, 10649.04178383, 10796.07837745, 13048.04448806, 10464.6499887 , 9455.36973093, 11106.23404978, 10223.40089743, 8676.97838846, 10877.06833973, 10786.72331017, 13232.87131753, 12560.65966486, 10129.57959853, 11762.86563682, 11213.62427294, 10723.83478156, 10323.19110693, 12887.51244029, 12165.12147262, 9067.67034924, 9815.91069961, 10922.34608037, 14307.36191062, 10816.18660139, 12202.81147321, 8499.88212335, 12350.43882535, 9230.93735655, 11370.61611376, 11200.67622136, 13770.79491507, 8330.06371747, 11742.99846727, 11046.85775595, 9297.32581794, 12799.89408687, 10574.64858958, 11423.39836545, 11315.90253714, 10307.65262234, 7893.2361795 , 9870.0870919 , 10938.20075926, 12948.14320036, 10814.20903472, 14047.97609045, 11302.41790933, 9946.63323734, 9500.22293708, 14010.0340521 , 10282.6559869 , 11970.80623827, 9874.12347864, 9971.93112213, 8493.84065595, 10388.7205472 , 8966.89519204, 9195.00539556, 10897.77393112, 11058.80252548, 9508.46221552, 8710.25832147, 12042.05594272, 10221.49974833, 12455.93087572])
valid_y5_pred_df = pd.DataFrame(valid_y5_pred, columns = ["Validation_Prediction"])
valid_y5_pred_df
Validation_Prediction | |
---|---|
0 | 10352.059104 |
1 | 12105.440690 |
2 | 12255.072006 |
3 | 12134.386872 |
4 | 13147.616172 |
... | ... |
641 | 9508.462216 |
642 | 8710.258321 |
643 | 12042.055943 |
644 | 10221.499748 |
645 | 12455.930876 |
646 rows × 1 columns
Get the RMSE for the validation set.
mse_valid_5 = sklearn.metrics.mean_squared_error(valid_y5, valid_y5_pred)
mse_valid_5
529003242.8305753
# As before
# import math
rmse_valid_5 = math.sqrt(mse_valid_5)
rmse_valid_5
23000.070496208817
np.std(valid_y5)
23606.478158841604
# As before:
# If using the dmba package:
# pip install dmba
# Done earlier. Just for illustration
# import dmba
# from dmba import regressionSummary
regressionSummary(valid_y5, valid_y5_pred)
Regression statistics Mean Error (ME) : 2498.2800 Root Mean Squared Error (RMSE) : 23000.0705 Mean Absolute Error (MAE) : 9245.0033 Mean Percentage Error (MPE) : -69.6001 Mean Absolute Percentage Error (MAPE) : 93.7624
new_players_df
Age | Balance | ShotPower | Aggression | Positioning | Composure | Preferred Foot_Right | Body Type_Lean | Body Type_Normal | Body Type_Stocky | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 27 | 59 | 75 | 68 | 80 | 76 | 1 | 0 | 0 | 1 |
1 | 21 | 42 | 71 | 52 | 60 | 76 | 1 | 1 | 0 | 0 |
2 | 19 | 76 | 80 | 22 | 75 | 56 | 0 | 0 | 0 | 1 |
#new_players_df = make_regression(n_features = 1, random_state = 666)
new_records_players_pred_5 = model_elastic.predict(new_players_df)
new_records_players_pred_5
C:\Users\byeo\Anaconda3\lib\site-packages\sklearn\base.py:443: UserWarning: X has feature names, but ElasticNet was fitted without feature names warnings.warn(
array([13308.1034435 , 11652.42206796, 11237.31927071])
new_records_players_pred_5_df = pd.DataFrame(new_records_players_pred_5, columns = ["Prediction"])
new_records_players_pred_5_df
Prediction | |
---|---|
0 | 13308.103443 |
1 | 11652.422068 |
2 | 11237.319271 |
alpha = 0.05
ci_5 = np.quantile(train_residuals_5, 1 - alpha)
ci_5
122.20585214240019
def generate_results_confint_5(preds, ci_5):
    df = pd.DataFrame()
    df["Prediction"] = preds
    if ci_5 >= 0:
        df["upper"] = preds + ci_5
        df["lower"] = preds - ci_5
    else:
        df["upper"] = preds - ci_5
        df["lower"] = preds + ci_5
    return df
new_records_players_pred_confint_df_5 = generate_results_confint_5(new_records_players_pred_5, ci_5)
new_records_players_pred_confint_df_5
Prediction | upper | lower | |
---|---|---|---|
0 | 13308.103443 | 13430.309296 | 13185.897591 |
1 | 11652.422068 | 11774.627920 | 11530.216216 |
2 | 11237.319271 | 11359.525123 | 11115.113419 |
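Putting the validation RMSEs of the three regularized models side by side makes them easier to compare (keeping in mind that the elastic net was fitted on the synthetic make_regression data, so its figure is not directly comparable). A small sketch using the values computed above:

# Validation RMSE comparison across the three models fitted above
rmse_comparison_df = pd.DataFrame({
    "Model": ["Ridge", "Lasso", "ElasticNet (fitted on synthetic data)"],
    "Validation_RMSE": [rmse_valid_3, rmse_valid_4, rmse_valid_5]
})
print(rmse_comparison_df)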