CANN Hunyuan-Video Configuration Guide

YAML Parameter Description

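cann-recipes-infer: this project provides CANN-based optimization samples for typical models and acceleration algorithms used in LLM and multimodal model inference. Project address: https://gitcode.com/cann/cann-recipes-infer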

Hunyuan-Video inference parameters are maintained in config/*.yaml. Select a config by setting YAML_FILE_NAME in infer.sh.

Default configs:

  • Single-card baseline: single.yaml
  • 8-card sequence parallel: sp8.yaml
  • Single-card FP8: single_fp8.yaml
  • Single-card sparse attention: single_sparse.yaml
  • 8-card sparse attention: sp8_sparse.yaml

Parameter reference:

model_args:
  model-base: "ckpts"  # Weight root directory. Relative paths are resolved from models/hunyuan-video/.
  prompt: "A cat walks ..."  # Text prompt for video generation.
  video-size: [720, 1280]  # Output size in [H, W].
  video-length: 129  # Output frame count. Constraint: 4n+1.
  infer-steps: 50  # Number of denoising steps.
  seed: 42  # Random seed.
  embedded-cfg-scale: 6.0  # Embedded CFG guidance scale.
  flow-shift: 7.0  # FlowMatch timestep shift.
  flow-reverse: true  # Whether to use reverse flow scheduling. Options: [false, true].
  use-cpu-offload: true  # Whether to enable CPU offload. Options: [false, true].
  extract_q_k_data: false  # Whether to extract QK data for sparse attention offline profiling. Options: [false, true].
  extract_path: "path/to/qk_dir"  # Output directory for extracted QK data. Required when extract_q_k_data is true.
  ulysses-degree: 8  # Ulysses sequence parallel degree. Multi-card configs use this field.
  ring-degree: 1  # Ring attention degree. Sparse configs currently require 1.
  use-vae-parallel: true  # Whether to enable VAE parallelism. Options: [false, true].
  fa-perblock-fp8: true  # Whether to enable FP8 FA activation quantization. Options: [false, true].
  mm-mxfp8: true  # Whether to enable MXFP8 matmul quantization. Options: [false, true].
  dit-weight: "/abs/path/ckpt.pt"  # Optional DiT checkpoint path.
  model: "HYVideo-T/2-cfgdistill"  # DiT architecture. Options: ["HYVideo-T/2", "HYVideo-T/2-cfgdistill"].
  model-resolution: "720p"  # Model resolution preset. Options: ["540p", "720p"].
  precision: "bf16"  # DiT precision. Options: ["fp32", "fp16", "bf16"].
  seed-type: "auto"  # Seed source. Options: ["file", "random", "fixed", "auto"].
model_name: "hunyuan-video"  # Model name. Options: ["hunyuan-video"].
world_size: 1  # Number of launched processes. Multi-card configs require world_size = ulysses-degree * ring-degree.
master_port: 29600  # torchrun master port.
entry_script: "sample_video.py"  # Entry script. Options: ["sample_video.py"].
dit_cache:
  method: "NoCache"  # DiT cache method. Options: ["NoCache", "FBCache", "TeaCache", "TaylorSeer"].
  params:
    # FBCache / TeaCache
    rel_l1_thresh: 0.05  # Relative L1 threshold. Larger values are faster but may reduce quality.
    # TeaCache
    coefficients: []  # TeaCache polynomial coefficients.
    warmup: 2  # Number of initial full-compute steps.
    # TaylorSeer
    n_derivatives: 3  # Taylor expansion order.
    skip_interval_steps: 4  # Full-compute interval.
    cutoff_steps: 1  # Number of final full-compute steps.
    offload: true  # Whether to offload TaylorSeer history states to CPU. Options: [false, true].
sparse:
  method: "SVG"  # Sparse attention method. Options: ["no_sparse", "TopK", "SVG"].
  block_size_Q: 128  # Q-axis block size.
  block_size_K: 512  # K-axis block size.
  model: "HunyuanVideo"  # Sparse module model type. Options: ["HunyuanVideo"].
  params:
    TopK:
      sparse_time_step: "10-49"  # Active denoising step range. Format: "start-end".
      sparsity_files_path: "./sparsity/720x1280x129/v3"  # Offline profiling sparsity file directory.
      CAC_threshold: 0.66  # TopK threshold.
    SVG:
      sparse_time_step: "14-49"  # Active denoising step range. Format: "start-end".
      sparsity: 0.8  # SVG sparsity ratio.
      sample_mse_max_row: 5000  # Maximum sampled rows for MSE.
      context_length: 256  # SVG context length.
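
Several of the parameters above carry constraints (video-length of the form 4n+1, world_size = ulysses-degree * ring-degree, extract_path required when extract_q_k_data is true). The sketch below shows one way these could be checked before launching. It is not part of cann-recipes-infer: the function name check_model_args is hypothetical, it works on plain dictionaries rather than on a parsed config file, and treating a missing degree as 1 is an assumption.

```python
def check_model_args(model_args, world_size):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper, not part of cann-recipes-infer. `model_args` is the
    mapping under `model_args:` and `world_size` is the launcher field.
    """
    problems = []

    # video-length must have the form 4n+1 (for example 129 = 4*32 + 1).
    if "video-length" in model_args:
        frames = model_args["video-length"]
        if frames < 1 or (frames - 1) % 4 != 0:
            problems.append(f"video-length={frames} is not of the form 4n+1")

    # Multi-card configs require world_size = ulysses-degree * ring-degree.
    # Assumption: a missing degree is treated as 1.
    ulysses = model_args.get("ulysses-degree", 1)
    ring = model_args.get("ring-degree", 1)
    if world_size > 1 and world_size != ulysses * ring:
        problems.append(
            f"world_size={world_size} != ulysses-degree*ring-degree={ulysses * ring}"
        )

    # extract_path is required when extract_q_k_data is true.
    if model_args.get("extract_q_k_data", False) and not model_args.get("extract_path"):
        problems.append("extract_path must be set when extract_q_k_data is true")

    return problems


if __name__ == "__main__":
    sp8 = {"video-length": 129, "ulysses-degree": 8, "ring-degree": 1}
    print(check_model_args(sp8, world_size=8))
```

Returning a list instead of raising on the first failure lets a caller report every violated constraint in one pass.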

Notes:

  • Sparse attention and DiT cache are mutually exclusive. Keep dit_cache.method: "NoCache" in sparse configs.
  • TopK requires sparsity files that match video-size and video-length.
  • extract_q_k_data is used to generate QK data for sparse attention offline profiling. Set extract_path to a writable directory when enabling it.
  • TaylorSeer may require high host memory at large resolutions and long frame counts.
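
The rules in these notes can also be expressed as a small pre-flight check. The sketch below is illustrative only and not taken from the project: all function names are hypothetical, the inputs are plain dictionaries, and the "HxWxF" directory tag is merely inferred from the example path ./sparsity/720x1280x129/v3, so the real naming convention may differ.

```python
def parse_step_range(text):
    """Parse a "start-end" string such as "10-49" into a (start, end) tuple."""
    start_s, end_s = text.split("-")
    start, end = int(start_s), int(end_s)
    if start > end:
        raise ValueError(f"invalid step range: {text!r}")
    return start, end


def sparsity_tag(video_size, video_length):
    """Build an "HxWxF" tag. Assumption: inferred from the example path only."""
    height, width = video_size
    return f"{height}x{width}x{video_length}"


def check_sparse_notes(dit_cache, sparse, model_args):
    """Return a list of violated rules; hypothetical helper for illustration."""
    problems = []
    sparse_on = sparse.get("method", "no_sparse") != "no_sparse"

    # Sparse attention and DiT cache are mutually exclusive.
    if sparse_on and dit_cache.get("method", "NoCache") != "NoCache":
        problems.append("sparse attention and DiT cache are mutually exclusive")

    # Sparse configs currently require ring-degree: 1.
    if sparse_on and model_args.get("ring-degree", 1) != 1:
        problems.append("sparse configs currently require ring-degree: 1")

    return problems


if __name__ == "__main__":
    print(parse_step_range("14-49"))
    print(sparsity_tag([720, 1280], 129))
    print(check_sparse_notes({"method": "NoCache"}, {"method": "SVG"}, {"ring-degree": 1}))
```

Comparing the tag against sparsity_files_path is a cheap way to catch TopK sparsity files that were profiled for a different resolution or frame count, provided the directory naming really follows this pattern.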

Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.
