- 数据集描述: sklearn的lfw_people函数在线下载55个外国人图片文件夹数据集来精确实现人脸识别并提取人脸特征向量
 

 
- 数据集地址: sklearn.datasets.fetch_lfw_people — scikit-learn 1.2.1 documentation
 - PCA降维:   pca = PCA(n_components=0.9) 
 - 数据拆分:  X_train, X_test, y_train, y_test = train_test_split(X,  y,  test_size = 0.1)
 - 支持向量机:  svc = SVC()      # svc.fit(X_train_pca, y_train)
 - 网格搜索最佳参数:
 
svc = SVC()
params = {'C':np.logspace(-3,1,20),'kernel':['rbf','poly','sigmoid','linear']}
gc = GridSearchCV(estimator = svc,param_grid = params)
gc.fit(X_train_pca,y_train)
print('网格搜索最佳参数:',gc.best_params_)   #  {'C': 3.79269019073,'kernel':'rbf'}
print('模型得分是:',gc.score(X_test_pca,y_test))   # 0.883720930232
y_pred = gc.predict(X_test_pca)
 
 
2、SVC建模人脸识别
 
2.1、导包
 
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn import datasets
 
2.2、数据加载
 
# 第一次加载,需要联网下载
# 下载路径:C:\Users\likai\scikit_learn_data\lfw_home
faces = datasets.fetch_lfw_people(resize= 1,min_faces_per_person=70)
# 形状是:(125,94)
X  = faces['data']
y = faces['target']
display(X.shape,y.shape)       # (1288, 11750)   (1288,)
 
2.3、数据降维与拆分
 
pca = PCA(n_components=0.9)
X_pca = pca.fit_transform(X)X_train,X_test,X_train_pca,X_test_pca,
y_train,y_test = train_test_split(X, X_pca, y, test_size = 0.1)
display(X_train.shape,X_test.shape)
display(X_train_pca.shape,X_test_pca.shape)
 
2.4、直接使用SVC建模预测 
 
svc = SVC()
svc.fit(X_train_pca,y_train)
svc.score(X_test_pca,y_test)      # 输出:0.7984496124031008
 
2.5、网格搜索确定最佳参数 
 
%%time
svc = SVC()
params = {'C':np.logspace(-3,1,20),'kernel':['rbf','poly','sigmoid','linear']}
gc = GridSearchCV(estimator = svc,param_grid = params)
gc.fit(X_train_pca,y_train)
print('网格搜索最佳参数:',gc.best_params_)   #  {'C': 3.79269019073,'kernel':'rbf'}
print('模型得分是:',gc.score(X_test_pca,y_test))   # 0.883720930232
y_pred = gc.predict(X_test_pca)
 

 
2.6、数据可视化
 
target_names = faces.target_names
print('目标任务名字如下:',target_names)
plt.figure(figsize=(5*2,10*3))
for i in range(50):plt.subplot(10,5,i + 1)plt.imshow(X_test[i].reshape(125,-1),cmap = 'gray')true_name = target_names[y_test[i]].split(' ')[-1]pred_name = target_names[y_pred[i]].split(' ')[-1]plt.title('True:%s\nPred:%s' % (true_name,pred_name))plt.axis('off')
 
