Spring Boot 批量数据导入性能优化实战指南-尧图网络科技

Spring Boot 批量数据导入性能优化实战指南

一、问题背景

在企业级系统中，Excel 批量导入是高频业务场景。当导入数据量从几十条增长到数千条时，未经优化的逐条处理方式会导致接口响应时间呈线性增长，甚至触发 HTTP 超时。

核心矛盾：校验逻辑需要逐条判断，但数据库交互不应逐条执行。

二、性能瓶颈分析

典型的未优化导入流程

解析 Excel → 循环每条数据 { 查询表A验证是否存在（1次DB） 查询表B验证是否重复（1次DB） 插入记录（1次DB） } → 返回结果

N 条数据 = 3N 次数据库交互

耗时分布（5000条数据）

操作	单次耗时	总次数	累计耗时
JDBC 连接获取/释放	~1ms	15000	~15s
SQL 执行	~3-5ms	15000	~45-75s
网络传输开销	~0.5ms	15000	~7.5s
合计	~60-90s

注：

博客：

https://blog.csdn.net/badao_liumang_qizhi

三、优化方案对比

方案 A：逐条查询 + 逐条保存（原始方式）

for(ExcelRowrow:dataList){Entityentity=repository.findByCode(row.getCode());// N次if(exists(entity)){fail;continue;}Existingrecord=recordRepo.findByCode(row.getCode());// N次if(exists(record)){fail;continue;}recordRepo.save(newRecord);// N次}

优点：逻辑简单
缺点：DB 交互 3N 次，5000 条约 60-90s

方案 B：批量预查询 + 内存校验 + 批量保存（推荐）

// 1. 收集所有待校验的 codeList<String>allCodes=extractCodes(dataList);// 2. 批量查询，转 Map/Set（2次DB）Map<String,Entity>entityMap=batchQuery(allCodes);Set<String>existingCodes=batchQueryExisting(allCodes);// 3. 内存中循环校验（0次DB）for(ExcelRowrow:dataList){Entityentity=entityMap.get(row.getCode());// O(1)if(existingCodes.contains(row.getCode())){fail;}// O(1)successList.add(buildRecord(row));}// 4. 批量保存（1次DB）repository.saveAll(successList);

优点：DB 交互仅 3 次，5000 条约 5-10s
缺点：内存占用略高

方案 C：异步 MQ 处理（大数据量场景）

接口层：解析Excel → 发MQ消息 → 立即返回taskId 消费端：校验 + 保存 + 更新导入状态 前端：轮询 taskId 获取进度

优点：不受 HTTP 超时限制，支持数万条
缺点：架构复杂，需 MQ + 状态管理 + 前端轮询

方案对比总结

维度	方案A（逐条）	方案B（批量预查询）	方案C（异步MQ）
DB 交互次数	3N	3	3（异步）
5000条耗时	60-90s	5-10s	接口<1s
实现复杂度	低	中	高
内存占用	低	中	中
适用场景	<100条	100~10000条	>10000条

四、关键技术点

4.1 批量 IN 查询的注意事项

MySQL 的IN子句有长度限制（通常建议不超过 1000 个参数）。当数据量超出时需分批查询：

// 分批查询，每批1000条private<T>List<T>batchQuery(List<String>codes,Function<List<String>,List<T>>queryFn){List<T>result=newArrayList<>();intbatchSize=1000;for(inti=0;i<codes.size();i+=batchSize){List<String>batch=codes.subList(i,Math.min(i+batchSize,codes.size()));result.addAll(queryFn.apply(batch));}returnresult;}

4.2 Map 构建时的去重策略

一个 key 可能对应多条记录时，需指定冲突合并策略：

// (m1, m2) -> m1 表示取第一条Map<String,Entity>map=list.stream().collect(Collectors.toMap(Entity::getCode,e->e,(m1,m2)->m1));

4.3 saveAll 的批量 flush 配置

JPA 默认saveAll仍是逐条persist，需配合 Hibernate 批量配置才能真正发挥批量 INSERT 性能：

spring:jpa:properties:hibernate:jdbc:batch_size:500order_inserts:trueorder_updates:true

4.4 批量保存的容错降级

批量保存可能因某条数据违反约束而整体失败，需降级为逐条保存以精确定位：

try{repository.saveAll(successList);// 尝试批量}catch(Exceptione){// 降级为逐条，精确捕获失败记录for(Recordrecord:successList){try{repository.save(record);}catch(Exceptionex){failList.add(record.getCode(),ex.getMessage());}}}

4.5 导入条数限制

防止内存溢出和接口超时，应在入口处限制单次导入量：

if(dataList.size()>MAX_IMPORT_SIZE){thrownewBusinessException("单次导入不能超过"+MAX_IMPORT_SIZE+"条");}

五、完整示例代码

5.1 Controller 层

@Slf4j@RestController@RequestMapping("/api/employee")publicclassEmployeeImportController{privatestaticfinalintMAX_IMPORT_SIZE=5000;@ResourceprivateEmployeeImportServiceemployeeImportService;@PostMapping("/import")publicRestResult<ImportResultDto>importEmployee(@RequestParam("file")MultipartFilefile){// 1. 文件校验if(file==null||file.isEmpty()){thrownewBizException("导入文件不能为空");}// 2. 解析ExcelList<EmployeeExcelDto>dataList=ExcelUtil.parse(file,EmployeeExcelDto.class);if(dataList==null||dataList.isEmpty()){thrownewBizException("导入数据为空，请重新导入");}// 3. 条数限制if(dataList.size()>MAX_IMPORT_SIZE){thrownewBizException("单次导入不能超过"+MAX_IMPORT_SIZE+"条，当前数据量："+dataList.size()+"条");}// 4. 执行导入ImportResultDtoresult=employeeImportService.batchImport(dataList);returnRestResult.success(result);}}

5.2 Service 层（核心优化逻辑）

@Slf4j@ServicepublicclassEmployeeImportServiceImplimplementsEmployeeImportService{@ResourceprivateDepartmentRepositorydepartmentRepository;@ResourceprivateEmployeeRepositoryemployeeRepository;@ResourceprivateAliOssTemplatealiOssTemplate;@OverridepublicImportResultDtobatchImport(List<EmployeeExcelDto>dataList){ImportResultDtoresultDto=newImportResultDto();List<String[]>failList=newArrayList<>();List<Employee>successList=newArrayList<>();// ========== 第一步：提取所有待校验的编码 ==========List<String>allDeptCodes=dataList.stream().map(EmployeeExcelDto::getDeptCode).filter(Objects::nonNull).map(String::trim).distinct().collect(Collectors.toList());List<String>allEmpNos=dataList.stream().map(EmployeeExcelDto::getEmpNo).filter(Objects::nonNull).map(String::trim).distinct().collect(Collectors.toList());// ========== 第二步：批量预查询（仅2次DB） ==========// 查询所有部门，转为 Map<deptCode, Department>Map<String,Department>deptMap=newHashMap<>();if(!allDeptCodes.isEmpty()){List<Department>deptList=departmentRepository.findByCodeIn(allDeptCodes);if(deptList!=null){deptMap=deptList.stream().collect(Collectors.toMap(Department::getCode,d->d,(d1,d2)->d1));}}// 查询已存在的员工工号Set<String>existingEmpNos=newHashSet<>();if(!allEmpNos.isEmpty()){List<Employee>existingList=employeeRepository.findByEmpNoIn(allEmpNos);if(existingList!=null){existingEmpNos=existingList.stream().map(Employee::getEmpNo).collect(Collectors.toSet());}}// ========== 第三步：内存中逐条校验（0次DB） ==========Set<String>batchDuplicate=newHashSet<>();// 批次内去重for(EmployeeExcelDtodto:dataList){StringempNo=dto.getEmpNo();StringdeptCode=dto.getDeptCode();// 校验工号不能为空if(empNo==null||empNo.trim().isEmpty()){failList.add(newString[]{empNo,"员工工号不能为空"});continue;}StringtrimmedNo=empNo.trim();// 校验批次内重复if(batchDuplicate.contains(trimmedNo)){failList.add(newString[]{empNo,"工号在导入文件中重复"});continue;}// 校验部门是否存在（内存Map查找 O(1)）Departmentdept=deptMap.get(deptCode!=null?deptCode.trim():"");if(dept==null){failList.add(newString[]{empNo,"部门编码不存在"});continue;}// 校验工号是否已存在（内存Set查找 O(1)）if(existingEmpNos.contains(trimmedNo)){failList.add(newString[]{empNo,"员工工号已存在"});continue;}// 校验通过，构建实体Employeeemployee=newEmployee();employee.setEmpNo(trimmedNo);employee.setName(dto.getName());employee.setDeptId(dept.getId());employee.setCreateTime(newDate());successList.add(employee);batchDuplicate.add(trimmedNo);}// ========== 第四步：批量保存（1次DB） ==========intsuccessCount=0;if(!successList.isEmpty()){try{employeeRepository.saveAll(successList);successCount=successList.size();}catch(Exceptione){log.error("批量保存失败，降级为逐条保存",e);for(Employeeemp:successList){try{employeeRepository.save(emp);successCount++;}catch(Exceptionex){failList.add(newString[]{emp.getEmpNo(),"保存失败："+ex.getMessage()});}}}}// ========== 第五步：生成失败报告 ==========resultDto.setSuccessCount(successCount);resultDto.setFailCount(failList.size());if(!failList.isEmpty()){resultDto.setFailFileUrl(generateFailExcel(failList));}returnresultDto;}/** * 生成失败数据Excel并上传OSS. */privateStringgenerateFailExcel(List<String[]>failList){try(XSSFWorkbookworkbook=newXSSFWorkbook()){Sheetsheet=workbook.createSheet("导入失败数据");// 表头样式CellStyleheaderStyle=createHeaderStyle(workbook);RowheaderRow=sheet.createRow(0);createCell(headerRow,0,"员工工号",headerStyle);createCell(headerRow,1,"失败原因",headerStyle);// 失败原因红色字体样式CellStylefailStyle=createFailStyle(workbook);// 填充数据for(inti=0;i<failList.size();i++){Rowrow=sheet.createRow(i+1);row.createCell(0).setCellValue(failList.get(i)[0]!=null?failList.get(i)[0]:"");Cellcell=row.createCell(1);cell.setCellValue(failList.get(i)[1]!=null?failList.get(i)[1]:"");cell.setCellStyle(failStyle);}sheet.setColumnWidth(0,20*256);sheet.setColumnWidth(1,35*256);// 写临时文件 → 上传OSS → 删除临时文件FiletempFile=File.createTempFile("import-fail-",".xlsx");try(FileOutputStreamfos=newFileOutputStream(tempFile)){workbook.write(fos);}Stringurl=aliOssTemplate.uploadFile(tempFile);tempFile.delete();returnurl;}catch(Exceptione){log.error("生成失败报告异常",e);return"";}}privateCellStylecreateHeaderStyle(XSSFWorkbookworkbook){Fontfont=workbook.createFont();font.setBold(true);font.setColor(IndexedColors.WHITE.getIndex());CellStylestyle=workbook.createCellStyle();style.setFont(font);style.setFillForegroundColor(IndexedColors.ROYAL_BLUE.getIndex());style.setFillPattern(FillPatternType.SOLID_FOREGROUND);style.setAlignment(HorizontalAlignment.CENTER);returnstyle;}privateCellStylecreateFailStyle(XSSFWorkbookworkbook){Fontfont=workbook.createFont();font.setColor(IndexedColors.RED.getIndex());CellStylestyle=workbook.createCellStyle();style.setFont(font);returnstyle;}privatevoidcreateCell(Rowrow,intcol,Stringvalue,CellStylestyle){Cellcell=row.createCell(col);cell.setCellValue(value);cell.setCellStyle(style);}}

5.3 DTO 定义

@DatapublicclassImportResultDto{/** 成功数量. */privateintsuccessCount;/** 失败数量. */privateintfailCount;/** 失败文件下载URL. */privateStringfailFileUrl;}@DatapublicclassEmployeeExcelDto{/** 员工工号. */privateStringempNo;/** 员工姓名. */privateStringname;/** 部门编码. */privateStringdeptCode;}