
From Data to Model: A Hands-On Guide to Preprocessing the MPIIFaceGaze and EyeDiap Datasets (Python in Practice)

news 2026/5/25 17:32:27

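I know the feeling all too well: you unpack the MPIIFaceGaze or EyeDiap archive for the first time and find yourself staring at a jumble of folders and mysterious .mat files. As computer vision engineers we are always tempted to jump straight to model training, but the quality of the preprocessing often decides whether the whole project succeeds. This article walks through both of these classic gaze estimation datasets step by step in Python, turning the raw data into a model-friendly format.

1. Environment setup and unpacking the data

Gaze estimation is very sensitive to data precision: even small pixel-level errors in landmarks or calibration can noticeably skew the predicted gaze direction. Start by setting up a dedicated preprocessing environment:

    conda create -n gaze_preprocess python=3.8
    conda activate gaze_preprocess
    pip install numpy opencv-python scipy matplotlib scikit-learn

After unpacking MPIIFaceGaze you will see 15 folders named p00 to p14. Each one contains:

- image folders organized by date
- camera parameter files under the calibration/ directory
- the mysterious Annotation.mat file

The EyeDiap layout is more involved:

    Participant_X/
    ├── Session_X/
    │   ├── ball_tracking.txt
    │   ├── eye_tracking.txt
    │   ├── head_pose.txt
    │   └── screen_coordinates.txt
    └── calibration/
        ├── depth_calibration.txt
        └── rgb_vga_calibration.txt

Tip: use os.walk() to traverse the directories instead of hard-coding paths. Data quality can differ considerably between participants.

2. Parsing the MPIIFaceGaze annotations

All MPIIFaceGaze annotations are stored in Annotation.mat. Loading it with scipy.io leads straight into the first pitfall: MATLAB and Python index arrays differently. MATLAB is 1-based while Python is 0-based, and loadmat wraps MATLAB structs in extra array levels that have to be unpacked.

    import scipy.io

    annot = scipy.io.loadmat('p00/Annotation.mat')
    data = annot['data'][0][0]  # unwrap the nested MATLAB array structure

    # Lookup table of the key fields (0-based column indices)
    field_mapping = {
        'image_path': 0,                # string
        'screen_pos': [1, 2],           # on-screen coordinates (x, y)
        'face_landmarks': range(3, 15), # 6 facial landmarks
        'head_pose': range(15, 21),     # [rx, ry, rz, tx, ty, tz]
        'face_center': range(21, 24),   # 3D face center
        'gaze_target': range(24, 27),   # 3D gaze target
        'eval_eye': 27,                 # eye used for evaluation (L/R)
    }

Pay particular attention to the distortion coefficients when handling the camera parameters. The following code shows how to build the camera model:

    import cv2
    import numpy as np
    import scipy.io

    camera_params = scipy.io.loadmat('p00/calibration/Camera.mat')
    camera_matrix = camera_params['cameraMatrix']
    dist_coeffs = camera_params['distCoeffs'][0]

    def undistort_points(points, camera_matrix, dist_coeffs):
        """Undistort 2D image keypoints given as (x, y) pairs."""
        # cv2.undistortPoints expects an (N, 1, 2) float array
        points = np.array(points, dtype=np.float32).reshape(-1, 1, 2)
        # P=camera_matrix maps the result back to pixel coordinates
        return cv2.undistortPoints(points, camera_matrix, dist_coeffs, P=camera_matrix)

3. Processing EyeDiap's time-series data

The challenge with EyeDiap is that it consists of video sequences, so the time series stored in several .txt files must be processed in sync. We first build a time-aligned data structure:

    from functools import reduce

    import numpy as np

    def load_eyediap_session(session_path):
        """Load all tracking data of a single session."""
        # parse_tracking_file is not shown in this article; the indexing below
        # assumes it returns a table (pandas DataFrame or NumPy structured
        # array) with a 'ts' timestamp column.
        data = {
            'head_pose': parse_tracking_file(f'{session_path}/head_pose.txt'),
            'eye_pos': parse_tracking_file(f'{session_path}/eye_tracking.txt'),
            'ball_pos': parse_tracking_file(f'{session_path}/ball_tracking.txt'),
        }
        # Time alignment: keep only timestamps present in all three streams
        # (intersecting just two of them can leave ball_pos lookups empty)
        timestamps = reduce(np.intersect1d, [d['ts'] for d in data.values()])
        aligned_data = []
        for ts in timestamps:
            frame_data = {
                'head_rot': data['head_pose'][data['head_pose']['ts'] == ts][['rx', 'ry', 'rz']],
                'eye_3d': data['eye_pos'][data['eye_pos']['ts'] == ts][['x', 'y', 'z']],
                'ball_3d': data['ball_pos'][data['ball_pos']['ts'] == ts][['x', 'y', 'z']],
            }
            aligned_data.append(frame_data)
        return aligned_data

For sessions with dynamic head movement (M), smoothing the head pose first is recommended:

    import numpy as np
    from scipy.signal import savgol_filter

    def smooth_head_poses(head_poses, window_size=11, poly_order=2):
        """Smooth head motion with a Savitzky-Golay filter (modifies in place)."""
        # window_size must not exceed the number of frames
        angles = np.array([p['head_rot'] for p in head_poses])
        smoothed = savgol_filter(angles, window_size, poly_order, axis=0)
        for i, p in enumerate(head_poses):
            p['head_rot'] = smoothed[i]
        return head_poses

4. Coordinate system conversion and data normalization

Different datasets use different coordinate systems, and this is the step of gaze estimation where mistakes are easiest to make. MPIIFaceGaze uses the camera coordinate system, while EyeDiap uses the coordinate system of the Kinect depth camera.

Coordinate system comparison:

| Dataset      | Origin                | X axis | Y axis | Z axis  | Unit |
|--------------|-----------------------|--------|--------|---------|------|
| MPIIFaceGaze | Camera optical center | Right  | Down   | Forward | mm   |
| EyeDiap      | Kinect IR camera      | Right  | Up     | Forward | m    |

The key code for unifying the coordinate systems:

    import numpy as np

    def mpii_to_standard(coords):
        """Convert MPIIFaceGaze coordinates to the standard frame (Y up)."""
        # Y points down in the raw data, so flip it
        transform = np.array([[1, 0, 0],
                              [0, -1, 0],
                              [0, 0, 1]])
        return coords @ transform.T

    def eyediap_to_standard(coords):
        """Convert EyeDiap coordinates to the standard frame (meters to millimeters)."""
        return coords * 1000  # m -> mm

Normalization is a key step for improving model generalization. Gaze directions are usually expressed in spherical coordinates:

    import numpy as np

    def cartesian_to_spherical(vec):
        """Convert a 3D Cartesian vector to spherical angles (theta, phi)."""
        # Assumes the standard frame above: X right, Y up, Z forward.
        r = np.linalg.norm(vec)
        theta = np.arctan2(vec[0], vec[2])  # horizontal angle, measured in the X-Z plane
        phi = np.arcsin(vec[1] / r)         # pitch angle, elevation along Y
        return np.array([theta, phi])

5. Building a PyTorch/TensorFlow dataset class

Finally, we wrap the processed data in a standard dataset format. Here is an example implementation for PyTorch:

    import cv2
    import torch
    from torch.utils.data import Dataset

    class GazeDataset(Dataset):
        def __init__(self, images, gaze_labels, transform=None):
            self.images = images        # list of image file paths
            self.labels = gaze_labels   # already preprocessed to [theta, phi]
            self.transform = transform

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            image = cv2.imread(self.images[idx])
            if image is None:
                raise FileNotFoundError(self.images[idx])
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            label = self.labels[idx]
            if self.transform:
                image = self.transform(image)
            return {
                'image': image,
                'gaze': torch.FloatTensor(label),
                'original_image': self.images[idx],  # path of the source image
            }

When head pose information is needed as well, the loading logic can be extended:

    import numpy as np
    import torch
    from scipy.spatial.transform import Rotation

    # Add as a method of GazeDataset; assumes self.head_poses was stored in __init__
    def load_sample_with_pose(self, idx):
        sample = self.__getitem__(idx)
        pose = self.head_poses[idx]  # [rx, ry, rz, tx, ty, tz]
        # Build a 4x4 transform matrix. This treats the rotation as Euler angles
        # in radians; if it is stored as a rotation vector, use
        # Rotation.from_rotvec instead.
        rot_mat = Rotation.from_euler('xyz', pose[:3]).as_matrix()
        pose_mat = np.eye(4)
        pose_mat[:3, :3] = rot_mat
        pose_mat[:3, 3] = pose[3:]
        sample['head_pose'] = torch.FloatTensor(pose_mat)
        return sample

6. Common problems and solutions

During real-world preprocessing you may run into the following typical problems.

Problem 1: data for some MPIIFaceGaze participants cannot be loaded

- Check for MATLAB version differences; some .mat files need mat_dtype=True.
- Use h5py instead of scipy.io for files saved by newer MATLAB versions. MATLAB v7.3 files are HDF5-based, and scipy.io.loadmat cannot read them.

Problem 2: EyeDiap timestamps do not line up

- Check whether the different .txt files share the same frame rate.
- Try aligning by interpolation rather than by strict timestamp matching.

Problem 3: normalized gaze directions contain outliers

- Add a simple physical constraint check:

    import numpy as np

    def is_valid_gaze(theta, phi):
        return -np.pi <= theta <= np.pi and -np.pi / 2 <= phi <= np.pi / 2

Once preprocessing is done, it is a good idea to save copies of the data in more than one format:

    import h5py
    import numpy as np
    import tensorflow as tf

    # Save as HDF5
    with h5py.File('preprocessed.h5', 'w') as f:
        # Paths are stored as variable-length strings
        f.create_dataset('images', data=np.array(image_paths, dtype=object),
                         dtype=h5py.string_dtype())
        f.create_dataset('gaze', data=gaze_vectors)

    # Save as TFRecords (format optimized for TensorFlow)
    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    example = tf.train.Example(features=tf.train.Features(feature={
        'image': _bytes_feature(encoded_image),
        'gaze': _bytes_feature(gaze_vector.tobytes()),
    }))
    # The output file name is only an example
    with tf.io.TFRecordWriter('preprocessed.tfrecord') as writer:
        writer.write(example.SerializeToString())

Remember that a good preprocessing pipeline should work like a carefully tuned production line: every step is verifiable and reproducible, and a backup of the raw data is always kept. When your model behaves erratically, the cause lies in the preprocessing stage far more often than in the model architecture itself.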
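As a follow-up to Problem 2 (EyeDiap timestamps that do not line up), here is a minimal sketch of alignment by interpolation instead of strict timestamp matching. The function name `interpolate_to_timestamps` and its arguments are hypothetical and not part of either dataset's tooling. It assumes the source timestamps are strictly increasing and that the values form a 2D array with one row per timestamp. Linear interpolation is reasonable for 3D positions; for rotation angles that wrap around it may need extra care.

```python
import numpy as np

def interpolate_to_timestamps(src_ts, src_values, target_ts):
    """Linearly interpolate each column of src_values onto target_ts.

    Hypothetical helper: src_ts must be strictly increasing and
    src_values must have shape (len(src_ts), n_columns).
    Returns the target timestamps that were kept and the interpolated values.
    """
    src_ts = np.asarray(src_ts, dtype=float)
    src_values = np.asarray(src_values, dtype=float)
    target_ts = np.asarray(target_ts, dtype=float)

    # Keep only targets inside the source time span. np.interp would
    # otherwise clamp to the first/last value instead of extrapolating.
    inside = (target_ts >= src_ts[0]) & (target_ts <= src_ts[-1])
    kept_ts = target_ts[inside]

    # np.interp works on 1D data, so interpolate column by column
    columns = [
        np.interp(kept_ts, src_ts, src_values[:, k])
        for k in range(src_values.shape[1])
    ]
    return kept_ts, np.stack(columns, axis=1)


# Usage: resample a 3D position stream onto another stream's timestamps
ball_ts = [0.0, 1.0, 2.0]
ball_xyz = [[0, 0, 0], [10, 20, 30], [20, 40, 60]]
head_ts = [0.5, 1.5, 3.0]  # 3.0 lies outside the ball stream and is dropped
ts, xyz = interpolate_to_timestamps(ball_ts, ball_xyz, head_ts)
```

Dropping out-of-range timestamps rather than clamping keeps the edges of a recording from being padded with repeated values, at the cost of losing a few frames.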


http://www.zskr.cn/news/1381796.html