Faster R-CNN with PyTorch 1.2 on a Custom Dataset: A Practical Guide to 80.36% mAP on 20-Class VOC-Format Data
1. Environment Setup and Project Preparation
Before training, make sure the development environment is configured correctly. The recommended steps are:
```bash
# Create a Python virtual environment
conda create -n fasterrcnn python=3.7
conda activate fasterrcnn

# Install PyTorch 1.2 and related dependencies
pip install torch==1.2.0 torchvision==0.4.0
pip install opencv-python pillow matplotlib tqdm
```

The recommended project structure is:
```text
faster-rcnn-pytorch/
├── data/
│   └── VOCdevkit/
│       └── VOC2007/
│           ├── Annotations/
│           ├── JPEGImages/
│           └── ImageSets/
├── model_data/
│   ├── voc_classes.txt
│   └── pretrained_weights/
├── utils/
├── train.py
├── predict.py
└── get_map.py
```

Tip: cloning the repository with Git is the easiest way to get the complete code structure:
```bash
git clone https://github.com/bubbliiiing/faster-rcnn-pytorch
```
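The `model_data/voc_classes.txt` file in the tree above holds the class list that training, prediction and evaluation share. Assuming it stores one class name per line, as VOC-style projects usually do, a minimal sketch of reading it could look like this; `get_classes` here is a hypothetical helper, not necessarily the repository's own function.

```python
import os
import tempfile


def get_classes(classes_path):
    """Read class names, one per line, ignoring blank lines (hypothetical helper)."""
    with open(classes_path, encoding="utf-8") as f:
        class_names = [line.strip() for line in f if line.strip()]
    return class_names, len(class_names)


# Usage: write a small example classes file and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "voc_classes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("aeroplane\nbicycle\nbird\n\n")
    names, num_classes = get_classes(path)

print(names, num_classes)
```

For the 20 VOC classes the file would have 20 lines; in Faster R-CNN the classification head typically predicts `num_classes + 1` outputs, the extra one being background.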
2. Dataset Preparation and VOC Format Conversion
Faster R-CNN projects commonly use datasets in PASCAL VOC format. The key steps for converting a custom dataset are:
Directory layout:
- JPEGImages/: all training images (.jpg)
- Annotations/: XML annotation files, one per image with the same base file name
- ImageSets/Main/: split files such as train.txt and val.txt
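The split files in `ImageSets/Main/` are plain lists of image IDs (file names without extension), one per line. The sketch below shows one way to produce them with a two-level random split, assuming `trainval_percent` and `train_percent` mean the same thing as in the `voc_annotation.py` settings further down; `write_splits` is a hypothetical helper, not the repository's implementation.

```python
import os
import random
import tempfile


def write_splits(annotation_dir, out_dir, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Write trainval/train/val/test ID lists to out_dir (hypothetical helper)."""
    ids = sorted(os.path.splitext(name)[0]
                 for name in os.listdir(annotation_dir) if name.endswith(".xml"))
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible split

    n_trainval = int(len(ids) * trainval_percent)  # train + val
    n_train = int(n_trainval * train_percent)      # train only
    splits = {
        "trainval": ids[:n_trainval],
        "train": ids[:n_train],
        "val": ids[n_train:n_trainval],
        "test": ids[n_trainval:],
    }

    os.makedirs(out_dir, exist_ok=True)
    for split_name, subset in splits.items():
        with open(os.path.join(out_dir, split_name + ".txt"), "w") as f:
            f.writelines(image_id + "\n" for image_id in sorted(subset))
    return {split_name: len(subset) for split_name, subset in splits.items()}


# Usage: 100 dummy annotation files -> 90 trainval (81 train + 9 val), 10 test
with tempfile.TemporaryDirectory() as tmp:
    ann_dir = os.path.join(tmp, "Annotations")
    os.makedirs(ann_dir)
    for i in range(100):
        open(os.path.join(ann_dir, f"{i:06d}.xml"), "w").close()
    counts = write_splits(ann_dir, os.path.join(tmp, "ImageSets", "Main"))

print(counts)
```

Splitting from the XML files rather than the images guarantees that every listed ID has an annotation.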
Annotation file example:
```xml
<annotation>
    <filename>000001.jpg</filename>
    <size>
        <width>800</width>
        <height>600</height>
        <depth>3</depth>
    </size>
    <object>
        <name>cat</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>200</ymin>
            <xmax>300</xmax>
            <ymax>400</ymax>
        </bndbox>
    </object>
</annotation>
```

- Automatically generating the training/validation split:
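```python
# Key parameters in voc_annotation.py
classes_path = 'model_data/voc_classes.txt'  # your class definition file
trainval_percent = 0.9  # fraction of all images used for train+val; the rest form the test set
train_percent = 0.9     # fraction of train+val used for training; the rest form the validation set
```

3. Key Model Architecture Parameters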
The core components of Faster R-CNN are governed by the following configuration parameters:
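| Parameter | Recommended value | Purpose |
|---|---|---|
| anchors_size | [8,16,32] | Base sizes (scales) of the anchor boxes |
| backbone | resnet50 | Feature-extraction network |
| input_shape | [600,600] | Input image size |
| Freeze_Epoch | 50 | Number of epochs trained with the backbone frozen |
| UnFreeze_Epoch | 100 | Total number of training epochs (frozen + unfrozen) |
| Freeze_batch_size | 4 | Batch size in the frozen stage |
| Unfreeze_batch_size | 2 | Batch size in the unfrozen stage |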
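To make `anchors_size` concrete: in the classic Faster R-CNN anchor scheme, each scale is multiplied by a base size equal to the feature stride (16 for a stride-16 ResNet50 feature map) and combined with three aspect ratios, giving 9 anchors per feature-map location. The sketch below assumes `base_size=16` and ratios 0.5/1/2; the repository's own anchor code may order the coordinates or ratios differently.

```python
import math


def generate_anchor_base(base_size=16, ratios=(0.5, 1, 2), anchor_scales=(8, 16, 32)):
    """Return len(ratios) * len(anchor_scales) anchors as (x1, y1, x2, y2) centred on (0, 0)."""
    anchors = []
    for ratio in ratios:                 # ratio = height / width
        for scale in anchor_scales:
            # h * w stays (base_size * scale) ** 2 for every ratio
            h = base_size * scale * math.sqrt(ratio)
            w = base_size * scale * math.sqrt(1.0 / ratio)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


anchors = generate_anchor_base()
for x1, y1, x2, y2 in anchors:
    print(f"w={x2 - x1:7.1f}  h={y2 - y1:7.1f}")
```

With the defaults, the square anchors have side lengths of 128, 256 and 512 pixels; passing `anchor_scales=(4, 8, 16)` halves them to 64, 128 and 256, which is why smaller scales tend to suit small objects. During training these base anchors are shifted to every feature-map position in steps of the stride.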
Feature-extraction network configuration example:
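```python
import torch.nn as nn
from torchvision.ops import RoIPool


class ResNet(nn.Module):
    # block: residual block class (e.g. Bottleneck for ResNet50)
    # layers: blocks per stage, [3, 4, 6, 3] for ResNet50
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.inplanes = 64
        # Stem: 7x7 convolution with stride 2, then max pooling with stride 2
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True)
        # Four residual stages; _make_layer follows the standard torchvision
        # ResNet implementation and is omitted here
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # output stride 16
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # RoI pooling for the detection head: spatial_scale=1/16 matches the
        # stride-16 feature map produced by layer3
        self.roi_pool = RoIPool((7, 7), spatial_scale=1 / 16.0)
```

4. Training Optimization Tips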
4.1 Two-Stage Training Strategy
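The two stages split a single epoch range. Reading `Freeze_Epoch` as the end of the frozen stage and `UnFreeze_Epoch` as the total epoch count, as the parameter table describes, epochs 0 to 49 run with the backbone frozen and epochs 50 to 99 with it unfrozen. A minimal sketch of that bookkeeping, using the values from the settings below; `stage_config` is a hypothetical helper, not part of `train.py`.

```python
def stage_config(epoch, freeze_epoch=50, unfreeze_epoch=100,
                 freeze_batch_size=4, unfreeze_batch_size=2,
                 freeze_lr=1e-4, unfreeze_lr=1e-5):
    """Return (stage, backbone_trainable, batch_size, lr) for a 0-based epoch index."""
    if not 0 <= epoch < unfreeze_epoch:
        raise ValueError("epoch outside the training range")
    if epoch < freeze_epoch:
        # Stage 1: backbone weights are not updated
        return "freeze", False, freeze_batch_size, freeze_lr
    # Stage 2: the whole network is fine-tuned with a smaller learning rate
    return "unfreeze", True, unfreeze_batch_size, unfreeze_lr


for e in (0, 49, 50, 99):
    print(e, stage_config(e))
```

The frozen stage can afford the larger batch size because no gradients are computed for the backbone, which saves GPU memory; once the backbone is unfrozen, a smaller batch and a lower learning rate are used.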
- Frozen training stage:
```python
# Key settings in train.py
Freeze_Train = True
Init_Epoch = 0
Freeze_Epoch = 50
Freeze_batch_size = 4
Freeze_lr = 1e-4
```

- Unfrozen training stage:
```python
UnFreeze_Epoch = 100
Unfreeze_batch_size = 2
Unfreeze_lr = 1e-5
```

4.2 Learning Rate Schedule
A cosine-annealing learning-rate schedule is recommended:
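```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model and optimizer (assumption); substitute the Faster R-CNN model
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-4)


def get_lr(optimizer):
    # Return the learning rate of the first parameter group
    for param_group in optimizer.param_groups:
        return param_group['lr']


# The LR follows a cosine curve from its initial value down to eta_min over
# T_max scheduler steps, then rises back up (period 2 * T_max)
lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=1e-5)
```

4.3 Data Augmentation Scheme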
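For object detection, every geometric augmentation applied to an image must also be applied to its bounding boxes, and torchvision's `Resize` and `RandomHorizontalFlip` only transform the image. A minimal pure-Python sketch of the box side, assuming boxes are `(xmin, ymin, xmax, ymax)` in pixels as in the VOC XML and ignoring VOC's one-pixel coordinate offset; `resize_boxes` and `hflip_boxes` are hypothetical helpers.

```python
def resize_boxes(boxes, orig_size, new_size):
    """Scale (xmin, ymin, xmax, ymax) boxes from orig_size=(w, h) to new_size=(w, h)."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]


def hflip_boxes(boxes, image_width):
    """Mirror boxes horizontally; the old xmax becomes the new xmin and vice versa."""
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]


# The box from the sample annotation (800x600 image), resized to 600x600, then flipped
boxes = [(100, 200, 300, 400)]
resized = resize_boxes(boxes, (800, 600), (600, 600))
flipped = hflip_boxes(resized, 600)
print(resized)
print(flipped)
```

The random flip decision should be drawn once per sample and used for both the image and `hflip_boxes`; an independent `RandomHorizontalFlip` in the image pipeline would flip the picture without its boxes. Color jitter and normalization need no box changes.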
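```python
from torchvision import transforms

# Training-set augmentation.
# Note: Resize and RandomHorizontalFlip change only the image; for detection the
# bounding boxes must be rescaled and flipped consistently in the dataset code.
train_transform = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Validation set: basic transforms only
val_transform = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```

5. Model Evaluation and Result Analysis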
5.1 mAP Calculation Workflow
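Steps for computing mAP with get_map.py (flag names and paths may differ between versions of the scripts; check each script's settings before running):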
- Generate the prediction results:
```bash
python get_dr_txt.py --model-path logs/ep100-loss0.02.pth
```

- Compute mAP:
```bash
python get_map.py --dr-path results/detection-results/ \
    --gt-path data/VOCdevkit/VOC2007/Annotations/
```

5.2 Key Evaluation Metrics
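Both accuracy metrics in the table below rest on the intersection over union (IoU) between a predicted and a ground-truth box. For mAP@0.5, a detection of the right class is a true positive when IoU ≥ 0.5; mAP@0.5:0.95 is the COCO-style average of AP over the ten thresholds 0.50, 0.55, …, 0.95, which is why it is much lower. A sketch in continuous coordinates (the VOC devkit adds 1 to widths and heights); `iou` is an illustrative helper.

```python
def iou(box_a, box_b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes in continuous coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds used by mAP@0.5:0.95
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

gt = (100, 200, 300, 400)
pred = (120, 220, 320, 420)
score = iou(gt, pred)
print(round(score, 3))
print([t for t in thresholds if score >= t])
```

Here the prediction overlaps the ground truth with IoU ≈ 0.68, so it counts as correct at 4 of the 10 thresholds: it contributes fully to mAP@0.5 but only partly to mAP@0.5:0.95.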
| Metric | This model | Reference baseline |
|---|---|---|
| mAP@0.5 | 80.36% | 77.5% (VGG16) |
| mAP@0.5:0.95 | 56.2% | 53.4% (VGG16) |
| Inference speed | 8 FPS | 5 FPS (VGG16) |
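mAP@0.5 is the mean, over the 20 classes, of each class's average precision (AP). A minimal sketch of the per-class step, assuming detections have already been matched to ground truth at IoU ≥ 0.5 and sorted by descending confidence. It uses the all-point interpolation of VOC2010 and later (the original VOC2007 devkit used 11-point interpolation); `voc_ap` and `pr_curve` are hypothetical helpers.

```python
def pr_curve(is_tp_sorted, num_gt):
    """Cumulative recall/precision for detections sorted by descending confidence."""
    tp = fp = 0
    recall, precision = [], []
    for hit in is_tp_sorted:
        tp += hit
        fp += not hit
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))
    return recall, precision


def voc_ap(recall, precision):
    """Area under the PR curve with all-point interpolation (recall ascending)."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Make precision monotonically non-increasing, scanning right to left
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangles where recall increases
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# 5 detections for one class, 4 ground-truth objects
is_tp = [True, True, False, True, False]
rec, prec = pr_curve(is_tp, num_gt=4)
ap = voc_ap(rec, prec)
print(ap)  # 0.6875
```

Averaging this AP over all 20 classes gives the mAP@0.5 figure reported in the table.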
5.3 Troubleshooting Common Issues
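- Low recall: adjust the RPN's NMS threshold (default 0.7); a higher threshold suppresses fewer overlapping proposals, so more candidates reach the detection head.
- Many false positives: raise the IoU threshold for positive samples (default 0.5) so that only well-aligned proposals are trained as foreground.
- Oscillating training loss: lower the learning rate or increase the batch size.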
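The first tip works through non-maximum suppression: NMS discards any box whose IoU with a higher-scoring box exceeds the threshold, so a higher threshold keeps more overlapping proposals and a lower one prunes more. The same mechanism is controlled by `nms_iou` in the prediction settings. A pure-Python greedy NMS sketch for illustration only; real pipelines use `torchvision.ops.nms` on tensors.

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 100, 100), (10, 0, 110, 100), (50, 0, 150, 100), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7, 0.6]
for threshold in (0.3, 0.7, 0.9):
    print(threshold, nms(boxes, scores, threshold))
```

A higher threshold keeps more boxes: 2, 3 and 4 of them respectively in this example.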
6. Model Deployment and Inference Optimization
6.1 Key Parameters of the Prediction Script
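```python
# Example configuration in predict.py
_defaults = {
    "model_path": 'logs/ep100-loss0.02.pth',
    "classes_path": 'model_data/voc_classes.txt',
    "confidence": 0.5,  # confidence threshold: boxes scoring below it are discarded
    "nms_iou": 0.3,     # IoU threshold for non-maximum suppression
    "backbone": "resnet50"
}
```

6.2 TorchScript Export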
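```python
import torch

# Export the model to TorchScript; `model` and `device` are assumed to be defined
model.eval()
example = torch.rand(1, 3, 600, 600).to(device)
with torch.no_grad():
    # Tracing records only the operations executed for this example input, so
    # data-dependent control flow (e.g. a variable number of boxes after NMS)
    # is frozen; compare the traced model's outputs with the eager model
    traced_script = torch.jit.trace(model, example)
traced_script.save("fasterrcnn_res50.pt")
```

6.3 Performance Optimization Tips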
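- TensorRT acceleration (trtexec takes an ONNX model, so the network must first be exported to ONNX, e.g. with torch.onnx.export):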
```bash
trtexec --onnx=fasterrcnn.onnx \
    --saveEngine=fasterrcnn.engine \
    --fp16
```

- Quantization:
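```python
import torch
import torch.nn as nn

# Dynamic quantization is available from PyTorch 1.3 onward (not in 1.2).
# With its default mappings it converts nn.Linear (and recurrent layers) but
# leaves nn.Conv2d in floating point; convolutions need static quantization.
model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

7. Advanced Tuning Directions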
- Custom backbone:
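```python
# mobilenet_v3_large requires torchvision >= 0.9 (not available in 0.4.0);
# newer releases replace pretrained=True with the weights= argument
from torchvision.models import mobilenet_v3_large

backbone = mobilenet_v3_large(pretrained=True).features
# Required: torchvision's FasterRCNN reads the feature channel count from this attribute
backbone.out_channels = 960
```

- Improved RPN structure: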
```python
import torch.nn as nn
import torch.nn.functional as F


class CustomRPN(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, 3, padding=1)
        self.cls_logits = nn.Conv2d(512, 3 * 2, 1)  # 3 anchors per location * 2 classes (object/background)
        self.bbox_pred = nn.Conv2d(512, 3 * 4, 1)   # 3 anchors per location * 4 box offsets

    def forward(self, x):
        # x: list of feature maps; the shared head is applied to each one
        logits = []
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
```

- Loss function improvements:
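```python
import torch
import torch.nn.functional as F


def focal_loss(pred, target, alpha=0.25, gamma=2.0):
    # pred: raw logits; target: float 0/1 labels of the same shape
    BCE_loss = F.binary_cross_entropy_with_logits(pred, target, reduction='none')
    pt = torch.exp(-BCE_loss)  # probability assigned to the true class
    # alpha weights positives and (1 - alpha) weights negatives
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    loss = alpha_t * (1 - pt) ** gamma * BCE_loss
    return loss.mean()
```

In our project, we found that with ResNet50 as the backbone, changing anchors_size to [4,8,16] improved small-object detection by about 5%, and the cosine-annealing schedule raised the final mAP by 2 to 3 percentage points compared with a fixed learning rate.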
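To see what the cosine schedule from section 4.2 does with `T_max=5` and `eta_min=1e-5`, the closed-form expression given in the PyTorch documentation for `CosineAnnealingLR`, lr_t = eta_min + (lr_0 − eta_min)(1 + cos(π·t / T_max)) / 2, can be tabulated. This sketch assumes one `scheduler.step()` per epoch, that nothing else modifies the learning rate, and a starting value of `Freeze_lr = 1e-4`; `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(step, base_lr=1e-4, t_max=5, eta_min=1e-5):
    """Closed-form cosine-annealing learning rate after `step` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


for step in range(11):
    print(step, f"{cosine_lr(step):.2e}")
```

According to this formula, the rate falls to 1e-5 at epoch 5 and returns to 1e-4 at epoch 10, so over 100 epochs it oscillates rather than decaying once; setting `T_max` to the length of a training stage gives a single smooth decay instead. Note also that with `Unfreeze_lr = 1e-5` equal to `eta_min`, the schedule is flat during the unfrozen stage.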