
CANN/AMCT Large Model Quantization Sample

AMCT Large Model Quantization

amct: AMCT is the model compression tool repository provided by CANN, designed with affinity for Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependencies for this sample are listed in requirements.txt.

Note that the torch_npu version must match the installed Python and torch versions, and the CANN package must also be installed.
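A quick way to see which versions are installed before comparing them against the compatibility requirements is sketched below. It only reads package metadata and does not import torch or torch_npu. The helper name `installed_versions` and the distribution names passed to it are assumptions made for illustration; check requirements.txt for the exact package names and version pins.

```python
import platform
from importlib import metadata


def installed_versions(packages=("torch", "torch_npu")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python:", platform.python_version())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The script reports versions only; whether a given combination is compatible still has to be checked against the torch_npu release notes.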

1.2 Model and Dataset Preparation

This sample uses the Llama2-7b, Qwen2-7b, and Qwen3-8b models with the wikitext2 dataset as examples. Download the models yourself and pass the model path to the script. The dataset is loaded online.

1.3 Simple Quantization Configuration

The quantization configuration used in this sample is built into the tool and can be obtained and used in the following ways:

from amct_pytorch import HIFP8_CAST_CFG

If you need to modify the detailed configuration, please refer to the documentation to construct the required quantization configuration dict.

The cast algorithm supports weight-only quantization and full quantization. The supported quantization types and quantization configurations are:

| Field | Type | Description | Value Range | Notes |
| --- | --- | --- | --- | --- |
| batch_num | uint32 | Number of batches used for quantization | 1 | / |
| skip_layers | str | Layers to skip quantization | / | Fuzzy matching is supported. When the configured string is a substring of a layer name or matches the layer name, quantization is skipped for that layer and no quantization configuration is generated for it. The string must contain numbers or letters |
| weights.type | str | Quantized weight type | 'hifloat8' | / |
| weights.symmetric | bool | Symmetric quantization | TRUE | / |
| weights.strategy | str | Quantization granularity | 'tensor'/'channel' | / |
| inputs.type | str | Quantized activation type | 'hifloat8' | / |
| inputs.symmetric | bool | Symmetric quantization | TRUE | / |
| inputs.strategy | str | Quantization granularity | 'tensor'/'token' | / |
| algorithm | dict | Quantization algorithm configuration used | {'cast'} | / |
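The value ranges and the skip_layers matching rule in the table can be illustrated with a small standalone sketch. It does not call AMCT. The nested dict layout (dotted names such as weights.type mapped to nested keys), the helper names `validate_cast_config` and `should_skip`, and the treatment of a missing inputs section as weight-only quantization are all assumptions made for illustration. Refer to the AMCT documentation for the actual configuration schema.

```python
# Allowed values per field, copied from the table above.
# Assumption: dotted field names map onto nested dict keys.
ALLOWED = {
    ("weights", "type"): {"hifloat8"},
    ("weights", "symmetric"): {True},
    ("weights", "strategy"): {"tensor", "channel"},
    ("inputs", "type"): {"hifloat8"},
    ("inputs", "symmetric"): {True},
    ("inputs", "strategy"): {"tensor", "token"},
}


def validate_cast_config(cfg):
    """Return a list of problems found in a hypothetical cast config dict."""
    errors = []
    if cfg.get("batch_num") != 1:
        errors.append("batch_num must be 1")
    for (section, key), allowed in ALLOWED.items():
        if section not in cfg:
            # Assumption: leaving out "inputs" means weight-only quantization.
            continue
        value = cfg[section].get(key)
        if value not in allowed:
            errors.append(f"{section}.{key}={value!r} not in {sorted(allowed, key=str)}")
    return errors


def should_skip(layer_name, skip_layers):
    """Substring matching as described in the skip_layers note (exact match included)."""
    # Patterns without any letter or digit are ignored, per the table's note.
    patterns = [p for p in skip_layers if any(c.isalnum() for c in p)]
    return any(p in layer_name for p in patterns)


example_cfg = {
    "batch_num": 1,
    "skip_layers": ["lm_head"],
    "weights": {"type": "hifloat8", "symmetric": True, "strategy": "channel"},
    "inputs": {"type": "hifloat8", "symmetric": True, "strategy": "token"},
}

if __name__ == "__main__":
    print(validate_cast_config(example_cfg))
    print(should_skip("model.lm_head", example_cfg["skip_layers"]))
```

The algorithm field is left out of the sketch because the table gives its type as dict but shows the value as {'cast'}, so its exact shape is not clear from this section.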

2 Quantization Example

2.1 Calling via the Interface

Step 1. Run the following commands in the current directory to execute the sample programs. Modify the model path to match your actual environment:

python3 src/run_llama2_samples.py --model_path=/data/Llama2_7b_hf/
python3 src/run_qwen_samples.py --model_path=/data/Qwen2-7b/
python3 src/run_qwen_samples.py --model_path=/data/Qwen3-8B/

If the following information appears, it indicates that quantization is successful:

Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707

Here, Score is the perplexity (PPL) of the quantized model. Reference values are listed in the following table:

| Model | Calibration Set | Dataset | Pre-quantization PPL | Post-quantization PPL |
| --- | --- | --- | --- | --- |
| LLAMA2-7B | pileval | wikitext2 | 5.472 | 5.524 |
| QWEN2-7B | pileval | wikitext2 | 7.137 | 7.188 |
| QWEN3-8B | pileval | wikitext2 | 9.715 | 9.745 |
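To make the Score easier to interpret, the sketch below uses the common definition of perplexity as the exponential of the mean per-token negative log-likelihood, and computes the relative PPL increase from the reference values in the table. The sample scripts' own evaluation code is not shown in this document, so the function names and the assumption that they follow this standard definition are illustrative only.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(pre_ppl, post_ppl):
    """Fractional PPL change caused by quantization (0.01 means +1%)."""
    return (post_ppl - pre_ppl) / pre_ppl


# (pre-quantization PPL, post-quantization PPL) from the reference table.
REFERENCE_PPL = {
    "LLAMA2-7B": (5.472, 5.524),
    "QWEN2-7B": (7.137, 7.188),
    "QWEN3-8B": (9.715, 9.745),
}

if __name__ == "__main__":
    for model, (pre, post) in REFERENCE_PPL.items():
        print(f"{model}: {relative_increase(pre, post):.2%} PPL increase")
```

With the reference values, the increase stays below 1% for all three models, which is a convenient sanity threshold when comparing your own run against the table.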

After inference succeeds, a quantization log file, ./amct_log/amct_pytorch.log, is generated in the current directory.
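A small helper for inspecting the end of that log after a run might look like the following. The log path is taken from the text above; the helper name `tail_log` and the default of 20 lines are assumptions, and nothing is assumed about the log's internal format.

```python
from pathlib import Path


def tail_log(path="./amct_log/amct_pytorch.log", n=20):
    """Return the last n lines of the log, or an empty list if it does not exist."""
    log = Path(path)
    if n <= 0 or not log.is_file():
        return []
    text = log.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-n:]


if __name__ == "__main__":
    lines = tail_log()
    if not lines:
        print("No log found; check that the sample ran in this directory.")
    for line in lines:
        print(line)
```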


Authorship statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
