Llama 4 Scout 17B 16E Instruct
Model Overview
The Llama 4 collection consists of natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
- Model Architecture: The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality. Llama 4 Scout has 17 billion active parameters and 16 experts.
- Model Release Date: April 5, 2025
- Repository: llama-models/models/llama4
- Model Source: meta-llama/Llama-4-Scout-17B-16E-Instruct
- License: llama4
- Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.
Multi Model QPC Configuration # 1
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | ONNX URL | ONNX Download | Generation Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 16 | 1 | 128 | 8192 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Encoder_128pl__vs2488_cl8192_bs1_c16_ts4_sdk1_21_4.tar.gz | 9.9GB | Download | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz | Download | 23-Apr-2026 |
| MXFP6 | 4 | 16 | 1 | 2448 | 8192 | False | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Decoder_128pl__vs2488_cl8192_bs1_c16_ts4_sdk1_21_4.tar.gz | 94GB | Download | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz | Download | 23-Apr-2026 |
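The archive names in the QPC URL column appear to encode the same settings as the table columns (for example `cl8192_bs1_c16_ts4` next to a context length of 8192, batch size 1, 16 cores, and 4 SoCs). The sketch below reads those tags back out of a file name so that a downloaded archive can be checked against the configuration you intended. The tag meanings are an assumption inferred from the tables on this page, not from vendor documentation, and `parse_qpc_name` is a hypothetical helper.

```python
import re

# Assumed meaning of each file-name tag, inferred from the table columns.
TAGS = {
    "cl": "context_length",
    "bs": "batch_size",
    "c": "nsp_cores",
    "ts": "tensor_slices",
}

# A tag is an underscore, a known prefix, digits, then "_" or "." as a boundary.
TAG_PATTERN = re.compile(r"_(cl|bs|c|ts)(\d+)(?=[_.])")


def parse_qpc_name(url: str) -> dict[str, int]:
    """Return the configuration fields encoded in a QPC archive name."""
    file_name = url.rsplit("/", 1)[-1]
    return {TAGS[tag]: int(value) for tag, value in TAG_PATTERN.findall(file_name)}


url = (
    "https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/"
    "meta-llama/Llama-4-Scout-17B-16E-Instruct/"
    "llama4-scout_Decoder_128pl__vs2488_cl8192_bs1_c16_ts4_sdk1_21_4.tar.gz"
)
print(parse_qpc_name(url))
# {'context_length': 8192, 'batch_size': 1, 'nsp_cores': 16, 'tensor_slices': 4}
```

The `pl` and `vs` tags are left unparsed on purpose, because their position and format differ between the multi-model and text-only archive names.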
Multi Model QPC Configuration # 2
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | ONNX URL | ONNX Download | Generation Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 8 | 16 | 1 | 128 | 8192 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Encoder_128pl__vs2488_cl8192_bs1_c16_ts8_sdk1_21_4.tar.gz | 37GB | Download | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz | Download | 23-Apr-2026 |
| MXFP6 | 8 | 16 | 1 | 2448 | 8192 | False | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Decoder_128pl__vs2488_cl8192_bs1_c16_ts8_sdk1_21_4.tar.gz | 110GB | Download | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz | Download | 23-Apr-2026 |
Multi Model QPC Configuration # 3
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | ONNX URL | ONNX Download | Generation Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 8 | 1 | 128 | 8192 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Encoder_128pl__vs2488_cl8192_bs1_c8_ts4_sdk1_21_4.tar.gz | 11GB | Download | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz | Download | 24-Apr-2026 |
| MXFP6 | 4 | 8 | 1 | 2448 | 8192 | False | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Decoder_128pl__vs2488_cl8192_bs1_c8_ts4_sdk1_21_4.tar.gz | 94GB | Download | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz | Download | 24-Apr-2026 |
Multi Model QPC Configuration # 4
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | ONNX URL | ONNX Download | Generation Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 16 | 1 | 128 | 65536 | False | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_encoder_128pl__vs2488_cl65536_bs1_c16_ts4_sdk1_21_4.tar.gz | 9.9GB | Download | https://dc00tk1pxen80.cloudfront.net/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Encoder_ONNX.tar.gz | Download | 4-May-2026 |
| MXFP6 | 4 | 16 | 1 | 2448 | 65536 | False | http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/llama4-scout_Decoder_128pl__vs2488_cl65536_bs1_c16_ts4_sdk1_21_4.tar.gz | 95GB | Download | https://qualcom-qpc-models.s3-accelerate.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama-4-Scout-17B-16E-Instruct_Decoder_ONNX.tar.gz | Download | 4-May-2026 |
Text only Single QPC Configuration # 5
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | ONNX URL | ONNX Download | Generation Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 16 | 1 | 128 | 65536 | False | http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama4_scout_16B_Instruct_text_only_pl128_cl65536_bs1_c16_ts4_sdk1_21_4.tar.gz | 95GB | Download | http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama4_scout_16B_Instruct_single_textonly_ONNX.tar.gz | Download | 13-May-2026 |
Text only Single QPC Configuration # 6
| Precision | SoCs / Tensor slicing | NSP-Cores (per SoC) | Batch Size | Chunking Prompt Length | Context Length (CL) | CCL_Enabled | QPC URL | QPC Size | QPC Download | ONNX URL | ONNX Download | Generation Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MXFP6 | 4 | 16 | 1 | 128 | 65536 | True | http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama4_scout_16B_Instruct_text_only_pl128_cl65536_[2048,4096,8192,12288,16384,24576,32768,65536]ccl_bs1_c16_ts4_sdk1_21_4.tar.gz | 100GB | Download | http://qualcom-qpc-models.s3-website-us-east-1.amazonaws.com/SDK1.21.4.0/meta-llama/Llama-4-Scout-17B-16E-Instruct/Llama4_scout_16B_Instruct_text_only_ccl_ONNX.tar.gz | Download | 18-May-2026 |
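Configuration # 6 is the only entry with CCL_Enabled set to True, and its archive name carries a bracketed list ending in `ccl`. This page does not define CCL, so the sketch below only assumes that the bracketed list enumerates the CCL values the QPC was built with, and extracts them for inspection. `parse_ccl_values` is a hypothetical helper name.

```python
import re

# Matches a bracketed, comma-separated list of integers followed by "ccl".
CCL_PATTERN = re.compile(r"\[(\d+(?:,\d+)*)\]ccl")


def parse_ccl_values(file_name: str) -> list[int]:
    """Return the integers in the '[...]ccl' tag, or an empty list if absent."""
    match = CCL_PATTERN.search(file_name)
    if match is None:
        return []
    return [int(value) for value in match.group(1).split(",")]


ccl_name = (
    "Llama4_scout_16B_Instruct_text_only_pl128_cl65536_"
    "[2048,4096,8192,12288,16384,24576,32768,65536]ccl_bs1_c16_ts4_sdk1_21_4.tar.gz"
)
plain_name = (
    "Llama4_scout_16B_Instruct_text_only_pl128_cl65536_bs1_c16_ts4_sdk1_21_4.tar.gz"
)

print(parse_ccl_values(ccl_name))
print(parse_ccl_values(plain_name))
```

For the Configuration # 6 archive, the largest extracted value equals the context length in the table (65536). The Configuration # 5 archive has no such tag, which is consistent with its CCL_Enabled value of False.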