Example of converting and using a Hugging Face Whisper model with whisper.cpp
For a student project
First, clone whisper.cpp:

```
!git clone https://github.com/ggerganov/whisper.cpp.git
```

```
Cloning into 'whisper.cpp'...
remote: Enumerating objects: 7336, done.
remote: Total 7336 (delta 0), reused 0 (delta 0), pack-reused 7336
Receiving objects: 100% (7336/7336), 10.87 MiB | 20.88 MiB/s, done.
Resolving deltas: 100% (4753/4753), done.
```
Then build it; the default make target also prints the help text of the resulting `./main` binary:

```
%cd whisper.cpp
!make -j 8
```

```
/content/whisper.cpp
I whisper.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3
I LDFLAGS:
I CC:       cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

cc  -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -c ggml.c -o ggml.o
cc  -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -c ggml-alloc.c -o ggml-alloc.o
cc  -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -c ggml-backend.c -o ggml-backend.o
cc  -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -c ggml-quants.c -o ggml-quants.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 examples/bench/bench.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o bench
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o quantize
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 examples/server/server.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o server

./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [2      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -ac N,     --audio-ctx N       [0      ] audio context size (0 - all)
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json      [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -np,       --no-prints         [false  ] do not print anything other than the results
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  -ls,       --log-score         [false  ] log best decoder scores of tokens
  -ng,       --no-gpu            [false  ] disable GPU
```
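The `./main` binary can also be driven from Python. A minimal sketch that assembles an invocation from the flags documented in the help text above (the binary location, model path, and audio path are placeholder assumptions):

```python
import subprocess

# Minimal sketch: build a whisper.cpp `main` invocation from the flags
# shown in its help text. All paths here are placeholders.
def build_cmd(model="models/ggml-model.bin", wav="audio.wav",
              language="auto", threads=4, output_txt=True):
    cmd = ["./main", "-m", model, "-f", wav,
           "-l", language, "-t", str(threads)]
    if output_txt:
        cmd.append("-otxt")  # also write the transcript to a text file
    return cmd

def transcribe(**kwargs):
    # Runs the binary; stdout holds the timestamped segments.
    return subprocess.run(build_cmd(**kwargs), capture_output=True, text=True)
```

Passing `-l auto` lets the model auto-detect the spoken language, which is useful when testing a fine-tuned multilingual checkpoint.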
Next, set up Git LFS so the model weights can be pulled from the Hugging Face Hub:

```
!git lfs install
```

```
Updated git hooks.
Git LFS initialized.
```
Clone the fine-tuned model repository (here, a Whisper large model fine-tuned for Northern Sámi):

```
%cd /content
!git clone https://huggingface.co/NbAiLab/whisper-large-sme
```

```
/content
Cloning into 'whisper-large-sme'...
remote: Enumerating objects: 444, done.
remote: Total 444 (delta 0), reused 0 (delta 0), pack-reused 444
Receiving objects: 100% (444/444), 642.41 KiB | 9.73 MiB/s, done.
Resolving deltas: 100% (146/146), done.
```
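Before converting, it can be worth sanity-checking that the cloned checkpoint really has the dimensions you expect. This is an optional check I'm adding, not part of the tutorial's conversion script; it assumes the Hugging Face `config.json` keys `d_model` and `encoder_layers` (1280 and 32 for Whisper large):

```python
import json

# Optional sanity check (an assumption, not part of the conversion
# script): confirm the cloned checkpoint has whisper-large dimensions.
def check_whisper_config(path):
    with open(path) as f:
        cfg = json.load(f)
    assert cfg["d_model"] == 1280, "expected a large (d_model=1280) checkpoint"
    assert cfg["encoder_layers"] == 32, "whisper-large has 32 encoder layers"
    return cfg
```

For the model above this would be called as `check_whisper_config("whisper-large-sme/config.json")`.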
The conversion script also needs a local copy of the original OpenAI whisper repository, which provides the tokenizer and mel-filter assets:

```
%cd /content
!git clone https://github.com/openai/whisper
```

```
/content
Cloning into 'whisper'...
remote: Enumerating objects: 712, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 712 (delta 1), reused 3 (delta 0), pack-reused 702
Receiving objects: 100% (712/712), 12.43 MiB | 20.53 MiB/s, done.
Resolving deltas: 100% (419/419), done.
```
Now run the converter. It takes three arguments: the Hugging Face model directory, the path to the OpenAI whisper repository, and the output directory for the ggml model:

```
!mkdir whisper-ggml-sme
!python whisper.cpp/models/convert-h5-to-ggml.py ./whisper-large-sme/ ./whisper ./whisper-ggml-sme/
```
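As the conversion log shows, the script walks the checkpoint and renames every Hugging Face tensor to whisper.cpp's layout, e.g. `model.encoder.layers.0.self_attn.k_proj.weight` becomes `encoder.blocks.0.attn.key.weight`. The mapping can be sketched with a few regex rules (an illustration reconstructed from the log, not the converter's actual code):

```python
import re

# Illustrative sketch of the tensor-name mapping visible in the
# conversion log (not the converter's actual implementation).
RULES = [
    (r"^model\.", ""),                       # drop the HF "model." prefix
    (r"\.layers\.", ".blocks."),             # layers -> blocks
    (r"\.self_attn\.k_proj\.", ".attn.key."),
    (r"\.self_attn\.v_proj\.", ".attn.value."),
    (r"\.self_attn\.q_proj\.", ".attn.query."),
    (r"\.self_attn\.out_proj\.", ".attn.out."),
    (r"\.self_attn_layer_norm\.", ".attn_ln."),
    (r"\.final_layer_norm\.", ".mlp_ln."),
    (r"\.fc1\.", ".mlp.0."),                 # MLP is an nn.Sequential in OpenAI's code
    (r"\.fc2\.", ".mlp.2."),
    (r"\.embed_positions\.weight$", ".positional_embedding"),
]

def hf_to_ggml_name(name: str) -> str:
    for pat, rep in RULES:
        name = re.sub(pat, rep, name)
    return name
```

The `mlp.0`/`mlp.2` indices come from OpenAI's Whisper, where the feed-forward block is a `Sequential(Linear, GELU, Linear)`.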
mkdir: cannot create directory ‘whisper-ggml-sme’: File exists /usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() return self.fget.__get__(instance, owner)() model.encoder.conv1.weight -> encoder.conv1.weight encoder.conv1.weight 3 (1280, 80, 3) model.encoder.conv1.bias -> encoder.conv1.bias Reshaped variable: encoder.conv1.bias to shape: (1280, 1) encoder.conv1.bias 2 (1280, 1) Converting to float32 model.encoder.conv2.weight -> encoder.conv2.weight encoder.conv2.weight 3 (1280, 1280, 3) model.encoder.conv2.bias -> encoder.conv2.bias Reshaped variable: encoder.conv2.bias to shape: (1280, 1) encoder.conv2.bias 2 (1280, 1) Converting to float32 model.encoder.embed_positions.weight -> encoder.positional_embedding encoder.positional_embedding 2 (1500, 1280) Converting to float32 model.encoder.layers.0.self_attn.k_proj.weight -> encoder.blocks.0.attn.key.weight encoder.blocks.0.attn.key.weight 2 (1280, 1280) model.encoder.layers.0.self_attn.v_proj.weight -> encoder.blocks.0.attn.value.weight encoder.blocks.0.attn.value.weight 2 (1280, 1280) model.encoder.layers.0.self_attn.v_proj.bias -> encoder.blocks.0.attn.value.bias encoder.blocks.0.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.0.self_attn.q_proj.weight -> encoder.blocks.0.attn.query.weight encoder.blocks.0.attn.query.weight 2 (1280, 1280) model.encoder.layers.0.self_attn.q_proj.bias -> encoder.blocks.0.attn.query.bias encoder.blocks.0.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.0.self_attn.out_proj.weight -> encoder.blocks.0.attn.out.weight encoder.blocks.0.attn.out.weight 2 (1280, 1280) model.encoder.layers.0.self_attn.out_proj.bias -> encoder.blocks.0.attn.out.bias 
encoder.blocks.0.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.0.self_attn_layer_norm.weight -> encoder.blocks.0.attn_ln.weight encoder.blocks.0.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.0.self_attn_layer_norm.bias -> encoder.blocks.0.attn_ln.bias encoder.blocks.0.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.0.fc1.weight -> encoder.blocks.0.mlp.0.weight encoder.blocks.0.mlp.0.weight 2 (5120, 1280) model.encoder.layers.0.fc1.bias -> encoder.blocks.0.mlp.0.bias encoder.blocks.0.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.0.fc2.weight -> encoder.blocks.0.mlp.2.weight encoder.blocks.0.mlp.2.weight 2 (1280, 5120) model.encoder.layers.0.fc2.bias -> encoder.blocks.0.mlp.2.bias encoder.blocks.0.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.0.final_layer_norm.weight -> encoder.blocks.0.mlp_ln.weight encoder.blocks.0.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.0.final_layer_norm.bias -> encoder.blocks.0.mlp_ln.bias encoder.blocks.0.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.1.self_attn.k_proj.weight -> encoder.blocks.1.attn.key.weight encoder.blocks.1.attn.key.weight 2 (1280, 1280) model.encoder.layers.1.self_attn.v_proj.weight -> encoder.blocks.1.attn.value.weight encoder.blocks.1.attn.value.weight 2 (1280, 1280) model.encoder.layers.1.self_attn.v_proj.bias -> encoder.blocks.1.attn.value.bias encoder.blocks.1.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.1.self_attn.q_proj.weight -> encoder.blocks.1.attn.query.weight encoder.blocks.1.attn.query.weight 2 (1280, 1280) model.encoder.layers.1.self_attn.q_proj.bias -> encoder.blocks.1.attn.query.bias encoder.blocks.1.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.1.self_attn.out_proj.weight -> encoder.blocks.1.attn.out.weight encoder.blocks.1.attn.out.weight 2 (1280, 1280) model.encoder.layers.1.self_attn.out_proj.bias -> 
encoder.blocks.1.attn.out.bias encoder.blocks.1.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.1.self_attn_layer_norm.weight -> encoder.blocks.1.attn_ln.weight encoder.blocks.1.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.1.self_attn_layer_norm.bias -> encoder.blocks.1.attn_ln.bias encoder.blocks.1.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.1.fc1.weight -> encoder.blocks.1.mlp.0.weight encoder.blocks.1.mlp.0.weight 2 (5120, 1280) model.encoder.layers.1.fc1.bias -> encoder.blocks.1.mlp.0.bias encoder.blocks.1.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.1.fc2.weight -> encoder.blocks.1.mlp.2.weight encoder.blocks.1.mlp.2.weight 2 (1280, 5120) model.encoder.layers.1.fc2.bias -> encoder.blocks.1.mlp.2.bias encoder.blocks.1.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.1.final_layer_norm.weight -> encoder.blocks.1.mlp_ln.weight encoder.blocks.1.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.1.final_layer_norm.bias -> encoder.blocks.1.mlp_ln.bias encoder.blocks.1.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.2.self_attn.k_proj.weight -> encoder.blocks.2.attn.key.weight encoder.blocks.2.attn.key.weight 2 (1280, 1280) model.encoder.layers.2.self_attn.v_proj.weight -> encoder.blocks.2.attn.value.weight encoder.blocks.2.attn.value.weight 2 (1280, 1280) model.encoder.layers.2.self_attn.v_proj.bias -> encoder.blocks.2.attn.value.bias encoder.blocks.2.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.2.self_attn.q_proj.weight -> encoder.blocks.2.attn.query.weight encoder.blocks.2.attn.query.weight 2 (1280, 1280) model.encoder.layers.2.self_attn.q_proj.bias -> encoder.blocks.2.attn.query.bias encoder.blocks.2.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.2.self_attn.out_proj.weight -> encoder.blocks.2.attn.out.weight encoder.blocks.2.attn.out.weight 2 (1280, 1280) 
model.encoder.layers.2.self_attn.out_proj.bias -> encoder.blocks.2.attn.out.bias encoder.blocks.2.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.2.self_attn_layer_norm.weight -> encoder.blocks.2.attn_ln.weight encoder.blocks.2.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.2.self_attn_layer_norm.bias -> encoder.blocks.2.attn_ln.bias encoder.blocks.2.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.2.fc1.weight -> encoder.blocks.2.mlp.0.weight encoder.blocks.2.mlp.0.weight 2 (5120, 1280) model.encoder.layers.2.fc1.bias -> encoder.blocks.2.mlp.0.bias encoder.blocks.2.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.2.fc2.weight -> encoder.blocks.2.mlp.2.weight encoder.blocks.2.mlp.2.weight 2 (1280, 5120) model.encoder.layers.2.fc2.bias -> encoder.blocks.2.mlp.2.bias encoder.blocks.2.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.2.final_layer_norm.weight -> encoder.blocks.2.mlp_ln.weight encoder.blocks.2.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.2.final_layer_norm.bias -> encoder.blocks.2.mlp_ln.bias encoder.blocks.2.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.3.self_attn.k_proj.weight -> encoder.blocks.3.attn.key.weight encoder.blocks.3.attn.key.weight 2 (1280, 1280) model.encoder.layers.3.self_attn.v_proj.weight -> encoder.blocks.3.attn.value.weight encoder.blocks.3.attn.value.weight 2 (1280, 1280) model.encoder.layers.3.self_attn.v_proj.bias -> encoder.blocks.3.attn.value.bias encoder.blocks.3.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.3.self_attn.q_proj.weight -> encoder.blocks.3.attn.query.weight encoder.blocks.3.attn.query.weight 2 (1280, 1280) model.encoder.layers.3.self_attn.q_proj.bias -> encoder.blocks.3.attn.query.bias encoder.blocks.3.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.3.self_attn.out_proj.weight -> encoder.blocks.3.attn.out.weight 
encoder.blocks.3.attn.out.weight 2 (1280, 1280) model.encoder.layers.3.self_attn.out_proj.bias -> encoder.blocks.3.attn.out.bias encoder.blocks.3.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.3.self_attn_layer_norm.weight -> encoder.blocks.3.attn_ln.weight encoder.blocks.3.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.3.self_attn_layer_norm.bias -> encoder.blocks.3.attn_ln.bias encoder.blocks.3.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.3.fc1.weight -> encoder.blocks.3.mlp.0.weight encoder.blocks.3.mlp.0.weight 2 (5120, 1280) model.encoder.layers.3.fc1.bias -> encoder.blocks.3.mlp.0.bias encoder.blocks.3.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.3.fc2.weight -> encoder.blocks.3.mlp.2.weight encoder.blocks.3.mlp.2.weight 2 (1280, 5120) model.encoder.layers.3.fc2.bias -> encoder.blocks.3.mlp.2.bias encoder.blocks.3.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.3.final_layer_norm.weight -> encoder.blocks.3.mlp_ln.weight encoder.blocks.3.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.3.final_layer_norm.bias -> encoder.blocks.3.mlp_ln.bias encoder.blocks.3.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.4.self_attn.k_proj.weight -> encoder.blocks.4.attn.key.weight encoder.blocks.4.attn.key.weight 2 (1280, 1280) model.encoder.layers.4.self_attn.v_proj.weight -> encoder.blocks.4.attn.value.weight encoder.blocks.4.attn.value.weight 2 (1280, 1280) model.encoder.layers.4.self_attn.v_proj.bias -> encoder.blocks.4.attn.value.bias encoder.blocks.4.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.4.self_attn.q_proj.weight -> encoder.blocks.4.attn.query.weight encoder.blocks.4.attn.query.weight 2 (1280, 1280) model.encoder.layers.4.self_attn.q_proj.bias -> encoder.blocks.4.attn.query.bias encoder.blocks.4.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.4.self_attn.out_proj.weight -> 
encoder.blocks.4.attn.out.weight encoder.blocks.4.attn.out.weight 2 (1280, 1280) model.encoder.layers.4.self_attn.out_proj.bias -> encoder.blocks.4.attn.out.bias encoder.blocks.4.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.4.self_attn_layer_norm.weight -> encoder.blocks.4.attn_ln.weight encoder.blocks.4.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.4.self_attn_layer_norm.bias -> encoder.blocks.4.attn_ln.bias encoder.blocks.4.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.4.fc1.weight -> encoder.blocks.4.mlp.0.weight encoder.blocks.4.mlp.0.weight 2 (5120, 1280) model.encoder.layers.4.fc1.bias -> encoder.blocks.4.mlp.0.bias encoder.blocks.4.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.4.fc2.weight -> encoder.blocks.4.mlp.2.weight encoder.blocks.4.mlp.2.weight 2 (1280, 5120) model.encoder.layers.4.fc2.bias -> encoder.blocks.4.mlp.2.bias encoder.blocks.4.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.4.final_layer_norm.weight -> encoder.blocks.4.mlp_ln.weight encoder.blocks.4.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.4.final_layer_norm.bias -> encoder.blocks.4.mlp_ln.bias encoder.blocks.4.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.5.self_attn.k_proj.weight -> encoder.blocks.5.attn.key.weight encoder.blocks.5.attn.key.weight 2 (1280, 1280) model.encoder.layers.5.self_attn.v_proj.weight -> encoder.blocks.5.attn.value.weight encoder.blocks.5.attn.value.weight 2 (1280, 1280) model.encoder.layers.5.self_attn.v_proj.bias -> encoder.blocks.5.attn.value.bias encoder.blocks.5.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.5.self_attn.q_proj.weight -> encoder.blocks.5.attn.query.weight encoder.blocks.5.attn.query.weight 2 (1280, 1280) model.encoder.layers.5.self_attn.q_proj.bias -> encoder.blocks.5.attn.query.bias encoder.blocks.5.attn.query.bias 1 (1280,) Converting to float32 
model.encoder.layers.5.self_attn.out_proj.weight -> encoder.blocks.5.attn.out.weight encoder.blocks.5.attn.out.weight 2 (1280, 1280) model.encoder.layers.5.self_attn.out_proj.bias -> encoder.blocks.5.attn.out.bias encoder.blocks.5.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.5.self_attn_layer_norm.weight -> encoder.blocks.5.attn_ln.weight encoder.blocks.5.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.5.self_attn_layer_norm.bias -> encoder.blocks.5.attn_ln.bias encoder.blocks.5.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.5.fc1.weight -> encoder.blocks.5.mlp.0.weight encoder.blocks.5.mlp.0.weight 2 (5120, 1280) model.encoder.layers.5.fc1.bias -> encoder.blocks.5.mlp.0.bias encoder.blocks.5.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.5.fc2.weight -> encoder.blocks.5.mlp.2.weight encoder.blocks.5.mlp.2.weight 2 (1280, 5120) model.encoder.layers.5.fc2.bias -> encoder.blocks.5.mlp.2.bias encoder.blocks.5.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.5.final_layer_norm.weight -> encoder.blocks.5.mlp_ln.weight encoder.blocks.5.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.5.final_layer_norm.bias -> encoder.blocks.5.mlp_ln.bias encoder.blocks.5.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.6.self_attn.k_proj.weight -> encoder.blocks.6.attn.key.weight encoder.blocks.6.attn.key.weight 2 (1280, 1280) model.encoder.layers.6.self_attn.v_proj.weight -> encoder.blocks.6.attn.value.weight encoder.blocks.6.attn.value.weight 2 (1280, 1280) model.encoder.layers.6.self_attn.v_proj.bias -> encoder.blocks.6.attn.value.bias encoder.blocks.6.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.6.self_attn.q_proj.weight -> encoder.blocks.6.attn.query.weight encoder.blocks.6.attn.query.weight 2 (1280, 1280) model.encoder.layers.6.self_attn.q_proj.bias -> encoder.blocks.6.attn.query.bias encoder.blocks.6.attn.query.bias 1 (1280,) 
Converting to float32 model.encoder.layers.6.self_attn.out_proj.weight -> encoder.blocks.6.attn.out.weight encoder.blocks.6.attn.out.weight 2 (1280, 1280) model.encoder.layers.6.self_attn.out_proj.bias -> encoder.blocks.6.attn.out.bias encoder.blocks.6.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.6.self_attn_layer_norm.weight -> encoder.blocks.6.attn_ln.weight encoder.blocks.6.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.6.self_attn_layer_norm.bias -> encoder.blocks.6.attn_ln.bias encoder.blocks.6.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.6.fc1.weight -> encoder.blocks.6.mlp.0.weight encoder.blocks.6.mlp.0.weight 2 (5120, 1280) model.encoder.layers.6.fc1.bias -> encoder.blocks.6.mlp.0.bias encoder.blocks.6.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.6.fc2.weight -> encoder.blocks.6.mlp.2.weight encoder.blocks.6.mlp.2.weight 2 (1280, 5120) model.encoder.layers.6.fc2.bias -> encoder.blocks.6.mlp.2.bias encoder.blocks.6.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.6.final_layer_norm.weight -> encoder.blocks.6.mlp_ln.weight encoder.blocks.6.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.6.final_layer_norm.bias -> encoder.blocks.6.mlp_ln.bias encoder.blocks.6.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.7.self_attn.k_proj.weight -> encoder.blocks.7.attn.key.weight encoder.blocks.7.attn.key.weight 2 (1280, 1280) model.encoder.layers.7.self_attn.v_proj.weight -> encoder.blocks.7.attn.value.weight encoder.blocks.7.attn.value.weight 2 (1280, 1280) model.encoder.layers.7.self_attn.v_proj.bias -> encoder.blocks.7.attn.value.bias encoder.blocks.7.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.7.self_attn.q_proj.weight -> encoder.blocks.7.attn.query.weight encoder.blocks.7.attn.query.weight 2 (1280, 1280) model.encoder.layers.7.self_attn.q_proj.bias -> encoder.blocks.7.attn.query.bias 
encoder.blocks.7.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.7.self_attn.out_proj.weight -> encoder.blocks.7.attn.out.weight encoder.blocks.7.attn.out.weight 2 (1280, 1280) model.encoder.layers.7.self_attn.out_proj.bias -> encoder.blocks.7.attn.out.bias encoder.blocks.7.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.7.self_attn_layer_norm.weight -> encoder.blocks.7.attn_ln.weight encoder.blocks.7.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.7.self_attn_layer_norm.bias -> encoder.blocks.7.attn_ln.bias encoder.blocks.7.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.7.fc1.weight -> encoder.blocks.7.mlp.0.weight encoder.blocks.7.mlp.0.weight 2 (5120, 1280) model.encoder.layers.7.fc1.bias -> encoder.blocks.7.mlp.0.bias encoder.blocks.7.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.7.fc2.weight -> encoder.blocks.7.mlp.2.weight encoder.blocks.7.mlp.2.weight 2 (1280, 5120) model.encoder.layers.7.fc2.bias -> encoder.blocks.7.mlp.2.bias encoder.blocks.7.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.7.final_layer_norm.weight -> encoder.blocks.7.mlp_ln.weight encoder.blocks.7.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.7.final_layer_norm.bias -> encoder.blocks.7.mlp_ln.bias encoder.blocks.7.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.8.self_attn.k_proj.weight -> encoder.blocks.8.attn.key.weight encoder.blocks.8.attn.key.weight 2 (1280, 1280) model.encoder.layers.8.self_attn.v_proj.weight -> encoder.blocks.8.attn.value.weight encoder.blocks.8.attn.value.weight 2 (1280, 1280) model.encoder.layers.8.self_attn.v_proj.bias -> encoder.blocks.8.attn.value.bias encoder.blocks.8.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.8.self_attn.q_proj.weight -> encoder.blocks.8.attn.query.weight encoder.blocks.8.attn.query.weight 2 (1280, 1280) model.encoder.layers.8.self_attn.q_proj.bias -> 
encoder.blocks.8.attn.query.bias encoder.blocks.8.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.8.self_attn.out_proj.weight -> encoder.blocks.8.attn.out.weight encoder.blocks.8.attn.out.weight 2 (1280, 1280) model.encoder.layers.8.self_attn.out_proj.bias -> encoder.blocks.8.attn.out.bias encoder.blocks.8.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.8.self_attn_layer_norm.weight -> encoder.blocks.8.attn_ln.weight encoder.blocks.8.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.8.self_attn_layer_norm.bias -> encoder.blocks.8.attn_ln.bias encoder.blocks.8.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.8.fc1.weight -> encoder.blocks.8.mlp.0.weight encoder.blocks.8.mlp.0.weight 2 (5120, 1280) model.encoder.layers.8.fc1.bias -> encoder.blocks.8.mlp.0.bias encoder.blocks.8.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.8.fc2.weight -> encoder.blocks.8.mlp.2.weight encoder.blocks.8.mlp.2.weight 2 (1280, 5120) model.encoder.layers.8.fc2.bias -> encoder.blocks.8.mlp.2.bias encoder.blocks.8.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.8.final_layer_norm.weight -> encoder.blocks.8.mlp_ln.weight encoder.blocks.8.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.8.final_layer_norm.bias -> encoder.blocks.8.mlp_ln.bias encoder.blocks.8.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.9.self_attn.k_proj.weight -> encoder.blocks.9.attn.key.weight encoder.blocks.9.attn.key.weight 2 (1280, 1280) model.encoder.layers.9.self_attn.v_proj.weight -> encoder.blocks.9.attn.value.weight encoder.blocks.9.attn.value.weight 2 (1280, 1280) model.encoder.layers.9.self_attn.v_proj.bias -> encoder.blocks.9.attn.value.bias encoder.blocks.9.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.9.self_attn.q_proj.weight -> encoder.blocks.9.attn.query.weight encoder.blocks.9.attn.query.weight 2 (1280, 1280) 
model.encoder.layers.9.self_attn.q_proj.bias -> encoder.blocks.9.attn.query.bias encoder.blocks.9.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.9.self_attn.out_proj.weight -> encoder.blocks.9.attn.out.weight encoder.blocks.9.attn.out.weight 2 (1280, 1280) model.encoder.layers.9.self_attn.out_proj.bias -> encoder.blocks.9.attn.out.bias encoder.blocks.9.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.9.self_attn_layer_norm.weight -> encoder.blocks.9.attn_ln.weight encoder.blocks.9.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.9.self_attn_layer_norm.bias -> encoder.blocks.9.attn_ln.bias encoder.blocks.9.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.9.fc1.weight -> encoder.blocks.9.mlp.0.weight encoder.blocks.9.mlp.0.weight 2 (5120, 1280) model.encoder.layers.9.fc1.bias -> encoder.blocks.9.mlp.0.bias encoder.blocks.9.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.9.fc2.weight -> encoder.blocks.9.mlp.2.weight encoder.blocks.9.mlp.2.weight 2 (1280, 5120) model.encoder.layers.9.fc2.bias -> encoder.blocks.9.mlp.2.bias encoder.blocks.9.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.9.final_layer_norm.weight -> encoder.blocks.9.mlp_ln.weight encoder.blocks.9.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.9.final_layer_norm.bias -> encoder.blocks.9.mlp_ln.bias encoder.blocks.9.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.10.self_attn.k_proj.weight -> encoder.blocks.10.attn.key.weight encoder.blocks.10.attn.key.weight 2 (1280, 1280) model.encoder.layers.10.self_attn.v_proj.weight -> encoder.blocks.10.attn.value.weight encoder.blocks.10.attn.value.weight 2 (1280, 1280) model.encoder.layers.10.self_attn.v_proj.bias -> encoder.blocks.10.attn.value.bias encoder.blocks.10.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.10.self_attn.q_proj.weight -> encoder.blocks.10.attn.query.weight 
encoder.blocks.10.attn.query.weight 2 (1280, 1280) model.encoder.layers.10.self_attn.q_proj.bias -> encoder.blocks.10.attn.query.bias encoder.blocks.10.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.10.self_attn.out_proj.weight -> encoder.blocks.10.attn.out.weight encoder.blocks.10.attn.out.weight 2 (1280, 1280) model.encoder.layers.10.self_attn.out_proj.bias -> encoder.blocks.10.attn.out.bias encoder.blocks.10.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.10.self_attn_layer_norm.weight -> encoder.blocks.10.attn_ln.weight encoder.blocks.10.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.10.self_attn_layer_norm.bias -> encoder.blocks.10.attn_ln.bias encoder.blocks.10.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.10.fc1.weight -> encoder.blocks.10.mlp.0.weight encoder.blocks.10.mlp.0.weight 2 (5120, 1280) model.encoder.layers.10.fc1.bias -> encoder.blocks.10.mlp.0.bias encoder.blocks.10.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.10.fc2.weight -> encoder.blocks.10.mlp.2.weight encoder.blocks.10.mlp.2.weight 2 (1280, 5120) model.encoder.layers.10.fc2.bias -> encoder.blocks.10.mlp.2.bias encoder.blocks.10.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.10.final_layer_norm.weight -> encoder.blocks.10.mlp_ln.weight encoder.blocks.10.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.10.final_layer_norm.bias -> encoder.blocks.10.mlp_ln.bias encoder.blocks.10.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.11.self_attn.k_proj.weight -> encoder.blocks.11.attn.key.weight encoder.blocks.11.attn.key.weight 2 (1280, 1280) model.encoder.layers.11.self_attn.v_proj.weight -> encoder.blocks.11.attn.value.weight encoder.blocks.11.attn.value.weight 2 (1280, 1280) model.encoder.layers.11.self_attn.v_proj.bias -> encoder.blocks.11.attn.value.bias encoder.blocks.11.attn.value.bias 1 (1280,) Converting to float32 
model.encoder.layers.11.self_attn.q_proj.weight -> encoder.blocks.11.attn.query.weight encoder.blocks.11.attn.query.weight 2 (1280, 1280)

[… the remaining layer-11 tensors and encoder layers 12 through 28 are converted with the identical mappings: self_attn.k_proj/v_proj/q_proj/out_proj -> attn.key/value/query/out, self_attn_layer_norm -> attn_ln, fc1 -> mlp.0, fc2 -> mlp.2, final_layer_norm -> mlp_ln; attention weights are rank-2 (1280, 1280), MLP weights (5120, 1280) and (1280, 5120), and every rank-1 bias and layer-norm tensor is converted to float32 …]

model.encoder.layers.28.self_attn_layer_norm.bias ->
encoder.blocks.28.attn_ln.bias encoder.blocks.28.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.28.fc1.weight -> encoder.blocks.28.mlp.0.weight encoder.blocks.28.mlp.0.weight 2 (5120, 1280) model.encoder.layers.28.fc1.bias -> encoder.blocks.28.mlp.0.bias encoder.blocks.28.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.28.fc2.weight -> encoder.blocks.28.mlp.2.weight encoder.blocks.28.mlp.2.weight 2 (1280, 5120) model.encoder.layers.28.fc2.bias -> encoder.blocks.28.mlp.2.bias encoder.blocks.28.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.28.final_layer_norm.weight -> encoder.blocks.28.mlp_ln.weight encoder.blocks.28.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.28.final_layer_norm.bias -> encoder.blocks.28.mlp_ln.bias encoder.blocks.28.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.29.self_attn.k_proj.weight -> encoder.blocks.29.attn.key.weight encoder.blocks.29.attn.key.weight 2 (1280, 1280) model.encoder.layers.29.self_attn.v_proj.weight -> encoder.blocks.29.attn.value.weight encoder.blocks.29.attn.value.weight 2 (1280, 1280) model.encoder.layers.29.self_attn.v_proj.bias -> encoder.blocks.29.attn.value.bias encoder.blocks.29.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.29.self_attn.q_proj.weight -> encoder.blocks.29.attn.query.weight encoder.blocks.29.attn.query.weight 2 (1280, 1280) model.encoder.layers.29.self_attn.q_proj.bias -> encoder.blocks.29.attn.query.bias encoder.blocks.29.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.29.self_attn.out_proj.weight -> encoder.blocks.29.attn.out.weight encoder.blocks.29.attn.out.weight 2 (1280, 1280) model.encoder.layers.29.self_attn.out_proj.bias -> encoder.blocks.29.attn.out.bias encoder.blocks.29.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.29.self_attn_layer_norm.weight -> encoder.blocks.29.attn_ln.weight encoder.blocks.29.attn_ln.weight 1 (1280,) Converting to 
float32 model.encoder.layers.29.self_attn_layer_norm.bias -> encoder.blocks.29.attn_ln.bias encoder.blocks.29.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.29.fc1.weight -> encoder.blocks.29.mlp.0.weight encoder.blocks.29.mlp.0.weight 2 (5120, 1280) model.encoder.layers.29.fc1.bias -> encoder.blocks.29.mlp.0.bias encoder.blocks.29.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.29.fc2.weight -> encoder.blocks.29.mlp.2.weight encoder.blocks.29.mlp.2.weight 2 (1280, 5120) model.encoder.layers.29.fc2.bias -> encoder.blocks.29.mlp.2.bias encoder.blocks.29.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.29.final_layer_norm.weight -> encoder.blocks.29.mlp_ln.weight encoder.blocks.29.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.29.final_layer_norm.bias -> encoder.blocks.29.mlp_ln.bias encoder.blocks.29.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.30.self_attn.k_proj.weight -> encoder.blocks.30.attn.key.weight encoder.blocks.30.attn.key.weight 2 (1280, 1280) model.encoder.layers.30.self_attn.v_proj.weight -> encoder.blocks.30.attn.value.weight encoder.blocks.30.attn.value.weight 2 (1280, 1280) model.encoder.layers.30.self_attn.v_proj.bias -> encoder.blocks.30.attn.value.bias encoder.blocks.30.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.30.self_attn.q_proj.weight -> encoder.blocks.30.attn.query.weight encoder.blocks.30.attn.query.weight 2 (1280, 1280) model.encoder.layers.30.self_attn.q_proj.bias -> encoder.blocks.30.attn.query.bias encoder.blocks.30.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.30.self_attn.out_proj.weight -> encoder.blocks.30.attn.out.weight encoder.blocks.30.attn.out.weight 2 (1280, 1280) model.encoder.layers.30.self_attn.out_proj.bias -> encoder.blocks.30.attn.out.bias encoder.blocks.30.attn.out.bias 1 (1280,) Converting to float32 model.encoder.layers.30.self_attn_layer_norm.weight -> 
encoder.blocks.30.attn_ln.weight encoder.blocks.30.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.30.self_attn_layer_norm.bias -> encoder.blocks.30.attn_ln.bias encoder.blocks.30.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.30.fc1.weight -> encoder.blocks.30.mlp.0.weight encoder.blocks.30.mlp.0.weight 2 (5120, 1280) model.encoder.layers.30.fc1.bias -> encoder.blocks.30.mlp.0.bias encoder.blocks.30.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.30.fc2.weight -> encoder.blocks.30.mlp.2.weight encoder.blocks.30.mlp.2.weight 2 (1280, 5120) model.encoder.layers.30.fc2.bias -> encoder.blocks.30.mlp.2.bias encoder.blocks.30.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.30.final_layer_norm.weight -> encoder.blocks.30.mlp_ln.weight encoder.blocks.30.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.30.final_layer_norm.bias -> encoder.blocks.30.mlp_ln.bias encoder.blocks.30.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.31.self_attn.k_proj.weight -> encoder.blocks.31.attn.key.weight encoder.blocks.31.attn.key.weight 2 (1280, 1280) model.encoder.layers.31.self_attn.v_proj.weight -> encoder.blocks.31.attn.value.weight encoder.blocks.31.attn.value.weight 2 (1280, 1280) model.encoder.layers.31.self_attn.v_proj.bias -> encoder.blocks.31.attn.value.bias encoder.blocks.31.attn.value.bias 1 (1280,) Converting to float32 model.encoder.layers.31.self_attn.q_proj.weight -> encoder.blocks.31.attn.query.weight encoder.blocks.31.attn.query.weight 2 (1280, 1280) model.encoder.layers.31.self_attn.q_proj.bias -> encoder.blocks.31.attn.query.bias encoder.blocks.31.attn.query.bias 1 (1280,) Converting to float32 model.encoder.layers.31.self_attn.out_proj.weight -> encoder.blocks.31.attn.out.weight encoder.blocks.31.attn.out.weight 2 (1280, 1280) model.encoder.layers.31.self_attn.out_proj.bias -> encoder.blocks.31.attn.out.bias encoder.blocks.31.attn.out.bias 1 (1280,) Converting to 
float32 model.encoder.layers.31.self_attn_layer_norm.weight -> encoder.blocks.31.attn_ln.weight encoder.blocks.31.attn_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.31.self_attn_layer_norm.bias -> encoder.blocks.31.attn_ln.bias encoder.blocks.31.attn_ln.bias 1 (1280,) Converting to float32 model.encoder.layers.31.fc1.weight -> encoder.blocks.31.mlp.0.weight encoder.blocks.31.mlp.0.weight 2 (5120, 1280) model.encoder.layers.31.fc1.bias -> encoder.blocks.31.mlp.0.bias encoder.blocks.31.mlp.0.bias 1 (5120,) Converting to float32 model.encoder.layers.31.fc2.weight -> encoder.blocks.31.mlp.2.weight encoder.blocks.31.mlp.2.weight 2 (1280, 5120) model.encoder.layers.31.fc2.bias -> encoder.blocks.31.mlp.2.bias encoder.blocks.31.mlp.2.bias 1 (1280,) Converting to float32 model.encoder.layers.31.final_layer_norm.weight -> encoder.blocks.31.mlp_ln.weight encoder.blocks.31.mlp_ln.weight 1 (1280,) Converting to float32 model.encoder.layers.31.final_layer_norm.bias -> encoder.blocks.31.mlp_ln.bias encoder.blocks.31.mlp_ln.bias 1 (1280,) Converting to float32 model.encoder.layer_norm.weight -> encoder.ln_post.weight encoder.ln_post.weight 1 (1280,) Converting to float32 model.encoder.layer_norm.bias -> encoder.ln_post.bias encoder.ln_post.bias 1 (1280,) Converting to float32 model.decoder.embed_tokens.weight -> decoder.token_embedding.weight decoder.token_embedding.weight 2 (51865, 1280) model.decoder.embed_positions.weight -> decoder.positional_embedding decoder.positional_embedding 2 (448, 1280) Converting to float32 model.decoder.layers.0.self_attn.k_proj.weight -> decoder.blocks.0.attn.key.weight decoder.blocks.0.attn.key.weight 2 (1280, 1280) model.decoder.layers.0.self_attn.v_proj.weight -> decoder.blocks.0.attn.value.weight decoder.blocks.0.attn.value.weight 2 (1280, 1280) model.decoder.layers.0.self_attn.v_proj.bias -> decoder.blocks.0.attn.value.bias decoder.blocks.0.attn.value.bias 1 (1280,) Converting to float32 
model.decoder.layers.0.self_attn.q_proj.weight -> decoder.blocks.0.attn.query.weight decoder.blocks.0.attn.query.weight 2 (1280, 1280) model.decoder.layers.0.self_attn.q_proj.bias -> decoder.blocks.0.attn.query.bias decoder.blocks.0.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.0.self_attn.out_proj.weight -> decoder.blocks.0.attn.out.weight decoder.blocks.0.attn.out.weight 2 (1280, 1280) model.decoder.layers.0.self_attn.out_proj.bias -> decoder.blocks.0.attn.out.bias decoder.blocks.0.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.0.self_attn_layer_norm.weight -> decoder.blocks.0.attn_ln.weight decoder.blocks.0.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.0.self_attn_layer_norm.bias -> decoder.blocks.0.attn_ln.bias decoder.blocks.0.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.0.encoder_attn.k_proj.weight -> decoder.blocks.0.cross_attn.key.weight decoder.blocks.0.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.0.encoder_attn.v_proj.weight -> decoder.blocks.0.cross_attn.value.weight decoder.blocks.0.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.0.encoder_attn.v_proj.bias -> decoder.blocks.0.cross_attn.value.bias decoder.blocks.0.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.0.encoder_attn.q_proj.weight -> decoder.blocks.0.cross_attn.query.weight decoder.blocks.0.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.0.encoder_attn.q_proj.bias -> decoder.blocks.0.cross_attn.query.bias decoder.blocks.0.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.0.encoder_attn.out_proj.weight -> decoder.blocks.0.cross_attn.out.weight decoder.blocks.0.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.0.encoder_attn.out_proj.bias -> decoder.blocks.0.cross_attn.out.bias decoder.blocks.0.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.0.encoder_attn_layer_norm.weight -> 
decoder.blocks.0.cross_attn_ln.weight decoder.blocks.0.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.0.encoder_attn_layer_norm.bias -> decoder.blocks.0.cross_attn_ln.bias decoder.blocks.0.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.0.fc1.weight -> decoder.blocks.0.mlp.0.weight decoder.blocks.0.mlp.0.weight 2 (5120, 1280) model.decoder.layers.0.fc1.bias -> decoder.blocks.0.mlp.0.bias decoder.blocks.0.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.0.fc2.weight -> decoder.blocks.0.mlp.2.weight decoder.blocks.0.mlp.2.weight 2 (1280, 5120) model.decoder.layers.0.fc2.bias -> decoder.blocks.0.mlp.2.bias decoder.blocks.0.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.0.final_layer_norm.weight -> decoder.blocks.0.mlp_ln.weight decoder.blocks.0.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.0.final_layer_norm.bias -> decoder.blocks.0.mlp_ln.bias decoder.blocks.0.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.1.self_attn.k_proj.weight -> decoder.blocks.1.attn.key.weight decoder.blocks.1.attn.key.weight 2 (1280, 1280) model.decoder.layers.1.self_attn.v_proj.weight -> decoder.blocks.1.attn.value.weight decoder.blocks.1.attn.value.weight 2 (1280, 1280) model.decoder.layers.1.self_attn.v_proj.bias -> decoder.blocks.1.attn.value.bias decoder.blocks.1.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.1.self_attn.q_proj.weight -> decoder.blocks.1.attn.query.weight decoder.blocks.1.attn.query.weight 2 (1280, 1280) model.decoder.layers.1.self_attn.q_proj.bias -> decoder.blocks.1.attn.query.bias decoder.blocks.1.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.1.self_attn.out_proj.weight -> decoder.blocks.1.attn.out.weight decoder.blocks.1.attn.out.weight 2 (1280, 1280) model.decoder.layers.1.self_attn.out_proj.bias -> decoder.blocks.1.attn.out.bias decoder.blocks.1.attn.out.bias 1 (1280,) Converting to float32 
model.decoder.layers.1.self_attn_layer_norm.weight -> decoder.blocks.1.attn_ln.weight decoder.blocks.1.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.1.self_attn_layer_norm.bias -> decoder.blocks.1.attn_ln.bias decoder.blocks.1.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.1.encoder_attn.k_proj.weight -> decoder.blocks.1.cross_attn.key.weight decoder.blocks.1.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.1.encoder_attn.v_proj.weight -> decoder.blocks.1.cross_attn.value.weight decoder.blocks.1.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.1.encoder_attn.v_proj.bias -> decoder.blocks.1.cross_attn.value.bias decoder.blocks.1.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.1.encoder_attn.q_proj.weight -> decoder.blocks.1.cross_attn.query.weight decoder.blocks.1.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.1.encoder_attn.q_proj.bias -> decoder.blocks.1.cross_attn.query.bias decoder.blocks.1.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.1.encoder_attn.out_proj.weight -> decoder.blocks.1.cross_attn.out.weight decoder.blocks.1.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.1.encoder_attn.out_proj.bias -> decoder.blocks.1.cross_attn.out.bias decoder.blocks.1.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.1.encoder_attn_layer_norm.weight -> decoder.blocks.1.cross_attn_ln.weight decoder.blocks.1.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.1.encoder_attn_layer_norm.bias -> decoder.blocks.1.cross_attn_ln.bias decoder.blocks.1.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.1.fc1.weight -> decoder.blocks.1.mlp.0.weight decoder.blocks.1.mlp.0.weight 2 (5120, 1280) model.decoder.layers.1.fc1.bias -> decoder.blocks.1.mlp.0.bias decoder.blocks.1.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.1.fc2.weight -> decoder.blocks.1.mlp.2.weight 
decoder.blocks.1.mlp.2.weight 2 (1280, 5120) model.decoder.layers.1.fc2.bias -> decoder.blocks.1.mlp.2.bias decoder.blocks.1.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.1.final_layer_norm.weight -> decoder.blocks.1.mlp_ln.weight decoder.blocks.1.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.1.final_layer_norm.bias -> decoder.blocks.1.mlp_ln.bias decoder.blocks.1.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.2.self_attn.k_proj.weight -> decoder.blocks.2.attn.key.weight decoder.blocks.2.attn.key.weight 2 (1280, 1280) model.decoder.layers.2.self_attn.v_proj.weight -> decoder.blocks.2.attn.value.weight decoder.blocks.2.attn.value.weight 2 (1280, 1280) model.decoder.layers.2.self_attn.v_proj.bias -> decoder.blocks.2.attn.value.bias decoder.blocks.2.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.2.self_attn.q_proj.weight -> decoder.blocks.2.attn.query.weight decoder.blocks.2.attn.query.weight 2 (1280, 1280) model.decoder.layers.2.self_attn.q_proj.bias -> decoder.blocks.2.attn.query.bias decoder.blocks.2.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.2.self_attn.out_proj.weight -> decoder.blocks.2.attn.out.weight decoder.blocks.2.attn.out.weight 2 (1280, 1280) model.decoder.layers.2.self_attn.out_proj.bias -> decoder.blocks.2.attn.out.bias decoder.blocks.2.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.2.self_attn_layer_norm.weight -> decoder.blocks.2.attn_ln.weight decoder.blocks.2.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.2.self_attn_layer_norm.bias -> decoder.blocks.2.attn_ln.bias decoder.blocks.2.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.2.encoder_attn.k_proj.weight -> decoder.blocks.2.cross_attn.key.weight decoder.blocks.2.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.2.encoder_attn.v_proj.weight -> decoder.blocks.2.cross_attn.value.weight decoder.blocks.2.cross_attn.value.weight 2 
(1280, 1280) model.decoder.layers.2.encoder_attn.v_proj.bias -> decoder.blocks.2.cross_attn.value.bias decoder.blocks.2.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.2.encoder_attn.q_proj.weight -> decoder.blocks.2.cross_attn.query.weight decoder.blocks.2.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.2.encoder_attn.q_proj.bias -> decoder.blocks.2.cross_attn.query.bias decoder.blocks.2.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.2.encoder_attn.out_proj.weight -> decoder.blocks.2.cross_attn.out.weight decoder.blocks.2.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.2.encoder_attn.out_proj.bias -> decoder.blocks.2.cross_attn.out.bias decoder.blocks.2.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.2.encoder_attn_layer_norm.weight -> decoder.blocks.2.cross_attn_ln.weight decoder.blocks.2.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.2.encoder_attn_layer_norm.bias -> decoder.blocks.2.cross_attn_ln.bias decoder.blocks.2.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.2.fc1.weight -> decoder.blocks.2.mlp.0.weight decoder.blocks.2.mlp.0.weight 2 (5120, 1280) model.decoder.layers.2.fc1.bias -> decoder.blocks.2.mlp.0.bias decoder.blocks.2.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.2.fc2.weight -> decoder.blocks.2.mlp.2.weight decoder.blocks.2.mlp.2.weight 2 (1280, 5120) model.decoder.layers.2.fc2.bias -> decoder.blocks.2.mlp.2.bias decoder.blocks.2.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.2.final_layer_norm.weight -> decoder.blocks.2.mlp_ln.weight decoder.blocks.2.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.2.final_layer_norm.bias -> decoder.blocks.2.mlp_ln.bias decoder.blocks.2.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.3.self_attn.k_proj.weight -> decoder.blocks.3.attn.key.weight decoder.blocks.3.attn.key.weight 2 (1280, 1280) 
model.decoder.layers.3.self_attn.v_proj.weight -> decoder.blocks.3.attn.value.weight decoder.blocks.3.attn.value.weight 2 (1280, 1280) model.decoder.layers.3.self_attn.v_proj.bias -> decoder.blocks.3.attn.value.bias decoder.blocks.3.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.3.self_attn.q_proj.weight -> decoder.blocks.3.attn.query.weight decoder.blocks.3.attn.query.weight 2 (1280, 1280) model.decoder.layers.3.self_attn.q_proj.bias -> decoder.blocks.3.attn.query.bias decoder.blocks.3.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.3.self_attn.out_proj.weight -> decoder.blocks.3.attn.out.weight decoder.blocks.3.attn.out.weight 2 (1280, 1280) model.decoder.layers.3.self_attn.out_proj.bias -> decoder.blocks.3.attn.out.bias decoder.blocks.3.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.3.self_attn_layer_norm.weight -> decoder.blocks.3.attn_ln.weight decoder.blocks.3.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.3.self_attn_layer_norm.bias -> decoder.blocks.3.attn_ln.bias decoder.blocks.3.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.3.encoder_attn.k_proj.weight -> decoder.blocks.3.cross_attn.key.weight decoder.blocks.3.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.3.encoder_attn.v_proj.weight -> decoder.blocks.3.cross_attn.value.weight decoder.blocks.3.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.3.encoder_attn.v_proj.bias -> decoder.blocks.3.cross_attn.value.bias decoder.blocks.3.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.3.encoder_attn.q_proj.weight -> decoder.blocks.3.cross_attn.query.weight decoder.blocks.3.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.3.encoder_attn.q_proj.bias -> decoder.blocks.3.cross_attn.query.bias decoder.blocks.3.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.3.encoder_attn.out_proj.weight -> decoder.blocks.3.cross_attn.out.weight 
decoder.blocks.3.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.3.encoder_attn.out_proj.bias -> decoder.blocks.3.cross_attn.out.bias decoder.blocks.3.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.3.encoder_attn_layer_norm.weight -> decoder.blocks.3.cross_attn_ln.weight decoder.blocks.3.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.3.encoder_attn_layer_norm.bias -> decoder.blocks.3.cross_attn_ln.bias decoder.blocks.3.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.3.fc1.weight -> decoder.blocks.3.mlp.0.weight decoder.blocks.3.mlp.0.weight 2 (5120, 1280) model.decoder.layers.3.fc1.bias -> decoder.blocks.3.mlp.0.bias decoder.blocks.3.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.3.fc2.weight -> decoder.blocks.3.mlp.2.weight decoder.blocks.3.mlp.2.weight 2 (1280, 5120) model.decoder.layers.3.fc2.bias -> decoder.blocks.3.mlp.2.bias decoder.blocks.3.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.3.final_layer_norm.weight -> decoder.blocks.3.mlp_ln.weight decoder.blocks.3.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.3.final_layer_norm.bias -> decoder.blocks.3.mlp_ln.bias decoder.blocks.3.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.4.self_attn.k_proj.weight -> decoder.blocks.4.attn.key.weight decoder.blocks.4.attn.key.weight 2 (1280, 1280) model.decoder.layers.4.self_attn.v_proj.weight -> decoder.blocks.4.attn.value.weight decoder.blocks.4.attn.value.weight 2 (1280, 1280) model.decoder.layers.4.self_attn.v_proj.bias -> decoder.blocks.4.attn.value.bias decoder.blocks.4.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.4.self_attn.q_proj.weight -> decoder.blocks.4.attn.query.weight decoder.blocks.4.attn.query.weight 2 (1280, 1280) model.decoder.layers.4.self_attn.q_proj.bias -> decoder.blocks.4.attn.query.bias decoder.blocks.4.attn.query.bias 1 (1280,) Converting to float32 
model.decoder.layers.4.self_attn.out_proj.weight -> decoder.blocks.4.attn.out.weight decoder.blocks.4.attn.out.weight 2 (1280, 1280) model.decoder.layers.4.self_attn.out_proj.bias -> decoder.blocks.4.attn.out.bias decoder.blocks.4.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.4.self_attn_layer_norm.weight -> decoder.blocks.4.attn_ln.weight decoder.blocks.4.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.4.self_attn_layer_norm.bias -> decoder.blocks.4.attn_ln.bias decoder.blocks.4.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.4.encoder_attn.k_proj.weight -> decoder.blocks.4.cross_attn.key.weight decoder.blocks.4.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.4.encoder_attn.v_proj.weight -> decoder.blocks.4.cross_attn.value.weight decoder.blocks.4.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.4.encoder_attn.v_proj.bias -> decoder.blocks.4.cross_attn.value.bias decoder.blocks.4.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.4.encoder_attn.q_proj.weight -> decoder.blocks.4.cross_attn.query.weight decoder.blocks.4.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.4.encoder_attn.q_proj.bias -> decoder.blocks.4.cross_attn.query.bias decoder.blocks.4.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.4.encoder_attn.out_proj.weight -> decoder.blocks.4.cross_attn.out.weight decoder.blocks.4.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.4.encoder_attn.out_proj.bias -> decoder.blocks.4.cross_attn.out.bias decoder.blocks.4.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.4.encoder_attn_layer_norm.weight -> decoder.blocks.4.cross_attn_ln.weight decoder.blocks.4.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.4.encoder_attn_layer_norm.bias -> decoder.blocks.4.cross_attn_ln.bias decoder.blocks.4.cross_attn_ln.bias 1 (1280,) Converting to float32 
model.decoder.layers.4.fc1.weight -> decoder.blocks.4.mlp.0.weight decoder.blocks.4.mlp.0.weight 2 (5120, 1280) model.decoder.layers.4.fc1.bias -> decoder.blocks.4.mlp.0.bias decoder.blocks.4.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.4.fc2.weight -> decoder.blocks.4.mlp.2.weight decoder.blocks.4.mlp.2.weight 2 (1280, 5120) model.decoder.layers.4.fc2.bias -> decoder.blocks.4.mlp.2.bias decoder.blocks.4.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.4.final_layer_norm.weight -> decoder.blocks.4.mlp_ln.weight decoder.blocks.4.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.4.final_layer_norm.bias -> decoder.blocks.4.mlp_ln.bias decoder.blocks.4.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.5.self_attn.k_proj.weight -> decoder.blocks.5.attn.key.weight decoder.blocks.5.attn.key.weight 2 (1280, 1280) model.decoder.layers.5.self_attn.v_proj.weight -> decoder.blocks.5.attn.value.weight decoder.blocks.5.attn.value.weight 2 (1280, 1280) model.decoder.layers.5.self_attn.v_proj.bias -> decoder.blocks.5.attn.value.bias decoder.blocks.5.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.5.self_attn.q_proj.weight -> decoder.blocks.5.attn.query.weight decoder.blocks.5.attn.query.weight 2 (1280, 1280) model.decoder.layers.5.self_attn.q_proj.bias -> decoder.blocks.5.attn.query.bias decoder.blocks.5.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.5.self_attn.out_proj.weight -> decoder.blocks.5.attn.out.weight decoder.blocks.5.attn.out.weight 2 (1280, 1280) model.decoder.layers.5.self_attn.out_proj.bias -> decoder.blocks.5.attn.out.bias decoder.blocks.5.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.5.self_attn_layer_norm.weight -> decoder.blocks.5.attn_ln.weight decoder.blocks.5.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.5.self_attn_layer_norm.bias -> decoder.blocks.5.attn_ln.bias decoder.blocks.5.attn_ln.bias 1 (1280,) 
Converting to float32 model.decoder.layers.5.encoder_attn.k_proj.weight -> decoder.blocks.5.cross_attn.key.weight decoder.blocks.5.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.5.encoder_attn.v_proj.weight -> decoder.blocks.5.cross_attn.value.weight decoder.blocks.5.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.5.encoder_attn.v_proj.bias -> decoder.blocks.5.cross_attn.value.bias decoder.blocks.5.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.5.encoder_attn.q_proj.weight -> decoder.blocks.5.cross_attn.query.weight decoder.blocks.5.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.5.encoder_attn.q_proj.bias -> decoder.blocks.5.cross_attn.query.bias decoder.blocks.5.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.5.encoder_attn.out_proj.weight -> decoder.blocks.5.cross_attn.out.weight decoder.blocks.5.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.5.encoder_attn.out_proj.bias -> decoder.blocks.5.cross_attn.out.bias decoder.blocks.5.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.5.encoder_attn_layer_norm.weight -> decoder.blocks.5.cross_attn_ln.weight decoder.blocks.5.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.5.encoder_attn_layer_norm.bias -> decoder.blocks.5.cross_attn_ln.bias decoder.blocks.5.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.5.fc1.weight -> decoder.blocks.5.mlp.0.weight decoder.blocks.5.mlp.0.weight 2 (5120, 1280) model.decoder.layers.5.fc1.bias -> decoder.blocks.5.mlp.0.bias decoder.blocks.5.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.5.fc2.weight -> decoder.blocks.5.mlp.2.weight decoder.blocks.5.mlp.2.weight 2 (1280, 5120) model.decoder.layers.5.fc2.bias -> decoder.blocks.5.mlp.2.bias decoder.blocks.5.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.5.final_layer_norm.weight -> decoder.blocks.5.mlp_ln.weight 
decoder.blocks.5.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.5.final_layer_norm.bias -> decoder.blocks.5.mlp_ln.bias decoder.blocks.5.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.6.self_attn.k_proj.weight -> decoder.blocks.6.attn.key.weight decoder.blocks.6.attn.key.weight 2 (1280, 1280) model.decoder.layers.6.self_attn.v_proj.weight -> decoder.blocks.6.attn.value.weight decoder.blocks.6.attn.value.weight 2 (1280, 1280) model.decoder.layers.6.self_attn.v_proj.bias -> decoder.blocks.6.attn.value.bias decoder.blocks.6.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.6.self_attn.q_proj.weight -> decoder.blocks.6.attn.query.weight decoder.blocks.6.attn.query.weight 2 (1280, 1280) model.decoder.layers.6.self_attn.q_proj.bias -> decoder.blocks.6.attn.query.bias decoder.blocks.6.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.6.self_attn.out_proj.weight -> decoder.blocks.6.attn.out.weight decoder.blocks.6.attn.out.weight 2 (1280, 1280) model.decoder.layers.6.self_attn.out_proj.bias -> decoder.blocks.6.attn.out.bias decoder.blocks.6.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.6.self_attn_layer_norm.weight -> decoder.blocks.6.attn_ln.weight decoder.blocks.6.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.6.self_attn_layer_norm.bias -> decoder.blocks.6.attn_ln.bias decoder.blocks.6.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.6.encoder_attn.k_proj.weight -> decoder.blocks.6.cross_attn.key.weight decoder.blocks.6.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.6.encoder_attn.v_proj.weight -> decoder.blocks.6.cross_attn.value.weight decoder.blocks.6.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.6.encoder_attn.v_proj.bias -> decoder.blocks.6.cross_attn.value.bias decoder.blocks.6.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.6.encoder_attn.q_proj.weight -> 
decoder.blocks.6.cross_attn.query.weight decoder.blocks.6.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.6.encoder_attn.q_proj.bias -> decoder.blocks.6.cross_attn.query.bias decoder.blocks.6.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.6.encoder_attn.out_proj.weight -> decoder.blocks.6.cross_attn.out.weight decoder.blocks.6.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.6.encoder_attn.out_proj.bias -> decoder.blocks.6.cross_attn.out.bias decoder.blocks.6.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.6.encoder_attn_layer_norm.weight -> decoder.blocks.6.cross_attn_ln.weight decoder.blocks.6.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.6.encoder_attn_layer_norm.bias -> decoder.blocks.6.cross_attn_ln.bias decoder.blocks.6.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.6.fc1.weight -> decoder.blocks.6.mlp.0.weight decoder.blocks.6.mlp.0.weight 2 (5120, 1280) model.decoder.layers.6.fc1.bias -> decoder.blocks.6.mlp.0.bias decoder.blocks.6.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.6.fc2.weight -> decoder.blocks.6.mlp.2.weight decoder.blocks.6.mlp.2.weight 2 (1280, 5120) model.decoder.layers.6.fc2.bias -> decoder.blocks.6.mlp.2.bias decoder.blocks.6.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.6.final_layer_norm.weight -> decoder.blocks.6.mlp_ln.weight decoder.blocks.6.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.6.final_layer_norm.bias -> decoder.blocks.6.mlp_ln.bias decoder.blocks.6.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.7.self_attn.k_proj.weight -> decoder.blocks.7.attn.key.weight decoder.blocks.7.attn.key.weight 2 (1280, 1280) model.decoder.layers.7.self_attn.v_proj.weight -> decoder.blocks.7.attn.value.weight decoder.blocks.7.attn.value.weight 2 (1280, 1280) model.decoder.layers.7.self_attn.v_proj.bias -> decoder.blocks.7.attn.value.bias 
decoder.blocks.7.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.7.self_attn.q_proj.weight -> decoder.blocks.7.attn.query.weight decoder.blocks.7.attn.query.weight 2 (1280, 1280) model.decoder.layers.7.self_attn.q_proj.bias -> decoder.blocks.7.attn.query.bias decoder.blocks.7.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.7.self_attn.out_proj.weight -> decoder.blocks.7.attn.out.weight decoder.blocks.7.attn.out.weight 2 (1280, 1280) model.decoder.layers.7.self_attn.out_proj.bias -> decoder.blocks.7.attn.out.bias decoder.blocks.7.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.7.self_attn_layer_norm.weight -> decoder.blocks.7.attn_ln.weight decoder.blocks.7.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.7.self_attn_layer_norm.bias -> decoder.blocks.7.attn_ln.bias decoder.blocks.7.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.7.encoder_attn.k_proj.weight -> decoder.blocks.7.cross_attn.key.weight decoder.blocks.7.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.7.encoder_attn.v_proj.weight -> decoder.blocks.7.cross_attn.value.weight decoder.blocks.7.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.7.encoder_attn.v_proj.bias -> decoder.blocks.7.cross_attn.value.bias decoder.blocks.7.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.7.encoder_attn.q_proj.weight -> decoder.blocks.7.cross_attn.query.weight decoder.blocks.7.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.7.encoder_attn.q_proj.bias -> decoder.blocks.7.cross_attn.query.bias decoder.blocks.7.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.7.encoder_attn.out_proj.weight -> decoder.blocks.7.cross_attn.out.weight decoder.blocks.7.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.7.encoder_attn.out_proj.bias -> decoder.blocks.7.cross_attn.out.bias decoder.blocks.7.cross_attn.out.bias 1 (1280,) Converting to float32 
model.decoder.layers.7.encoder_attn_layer_norm.weight -> decoder.blocks.7.cross_attn_ln.weight decoder.blocks.7.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.7.encoder_attn_layer_norm.bias -> decoder.blocks.7.cross_attn_ln.bias decoder.blocks.7.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.7.fc1.weight -> decoder.blocks.7.mlp.0.weight decoder.blocks.7.mlp.0.weight 2 (5120, 1280) model.decoder.layers.7.fc1.bias -> decoder.blocks.7.mlp.0.bias decoder.blocks.7.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.7.fc2.weight -> decoder.blocks.7.mlp.2.weight decoder.blocks.7.mlp.2.weight 2 (1280, 5120) model.decoder.layers.7.fc2.bias -> decoder.blocks.7.mlp.2.bias decoder.blocks.7.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.7.final_layer_norm.weight -> decoder.blocks.7.mlp_ln.weight decoder.blocks.7.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.7.final_layer_norm.bias -> decoder.blocks.7.mlp_ln.bias decoder.blocks.7.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.8.self_attn.k_proj.weight -> decoder.blocks.8.attn.key.weight decoder.blocks.8.attn.key.weight 2 (1280, 1280) model.decoder.layers.8.self_attn.v_proj.weight -> decoder.blocks.8.attn.value.weight decoder.blocks.8.attn.value.weight 2 (1280, 1280) model.decoder.layers.8.self_attn.v_proj.bias -> decoder.blocks.8.attn.value.bias decoder.blocks.8.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.8.self_attn.q_proj.weight -> decoder.blocks.8.attn.query.weight decoder.blocks.8.attn.query.weight 2 (1280, 1280) model.decoder.layers.8.self_attn.q_proj.bias -> decoder.blocks.8.attn.query.bias decoder.blocks.8.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.8.self_attn.out_proj.weight -> decoder.blocks.8.attn.out.weight decoder.blocks.8.attn.out.weight 2 (1280, 1280) model.decoder.layers.8.self_attn.out_proj.bias -> decoder.blocks.8.attn.out.bias 
decoder.blocks.8.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.8.self_attn_layer_norm.weight -> decoder.blocks.8.attn_ln.weight decoder.blocks.8.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.8.self_attn_layer_norm.bias -> decoder.blocks.8.attn_ln.bias decoder.blocks.8.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.8.encoder_attn.k_proj.weight -> decoder.blocks.8.cross_attn.key.weight decoder.blocks.8.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.8.encoder_attn.v_proj.weight -> decoder.blocks.8.cross_attn.value.weight decoder.blocks.8.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.8.encoder_attn.v_proj.bias -> decoder.blocks.8.cross_attn.value.bias decoder.blocks.8.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.8.encoder_attn.q_proj.weight -> decoder.blocks.8.cross_attn.query.weight decoder.blocks.8.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.8.encoder_attn.q_proj.bias -> decoder.blocks.8.cross_attn.query.bias decoder.blocks.8.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.8.encoder_attn.out_proj.weight -> decoder.blocks.8.cross_attn.out.weight decoder.blocks.8.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.8.encoder_attn.out_proj.bias -> decoder.blocks.8.cross_attn.out.bias decoder.blocks.8.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.8.encoder_attn_layer_norm.weight -> decoder.blocks.8.cross_attn_ln.weight decoder.blocks.8.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.8.encoder_attn_layer_norm.bias -> decoder.blocks.8.cross_attn_ln.bias decoder.blocks.8.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.8.fc1.weight -> decoder.blocks.8.mlp.0.weight decoder.blocks.8.mlp.0.weight 2 (5120, 1280) model.decoder.layers.8.fc1.bias -> decoder.blocks.8.mlp.0.bias decoder.blocks.8.mlp.0.bias 1 (5120,) Converting to float32 
model.decoder.layers.8.fc2.weight -> decoder.blocks.8.mlp.2.weight decoder.blocks.8.mlp.2.weight 2 (1280, 5120) model.decoder.layers.8.fc2.bias -> decoder.blocks.8.mlp.2.bias decoder.blocks.8.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.8.final_layer_norm.weight -> decoder.blocks.8.mlp_ln.weight decoder.blocks.8.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.8.final_layer_norm.bias -> decoder.blocks.8.mlp_ln.bias decoder.blocks.8.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.9.self_attn.k_proj.weight -> decoder.blocks.9.attn.key.weight decoder.blocks.9.attn.key.weight 2 (1280, 1280) model.decoder.layers.9.self_attn.v_proj.weight -> decoder.blocks.9.attn.value.weight decoder.blocks.9.attn.value.weight 2 (1280, 1280) model.decoder.layers.9.self_attn.v_proj.bias -> decoder.blocks.9.attn.value.bias decoder.blocks.9.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.9.self_attn.q_proj.weight -> decoder.blocks.9.attn.query.weight decoder.blocks.9.attn.query.weight 2 (1280, 1280) model.decoder.layers.9.self_attn.q_proj.bias -> decoder.blocks.9.attn.query.bias decoder.blocks.9.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.9.self_attn.out_proj.weight -> decoder.blocks.9.attn.out.weight decoder.blocks.9.attn.out.weight 2 (1280, 1280) model.decoder.layers.9.self_attn.out_proj.bias -> decoder.blocks.9.attn.out.bias decoder.blocks.9.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.9.self_attn_layer_norm.weight -> decoder.blocks.9.attn_ln.weight decoder.blocks.9.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.9.self_attn_layer_norm.bias -> decoder.blocks.9.attn_ln.bias decoder.blocks.9.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.9.encoder_attn.k_proj.weight -> decoder.blocks.9.cross_attn.key.weight decoder.blocks.9.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.9.encoder_attn.v_proj.weight -> 
decoder.blocks.9.cross_attn.value.weight decoder.blocks.9.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.9.encoder_attn.v_proj.bias -> decoder.blocks.9.cross_attn.value.bias decoder.blocks.9.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.9.encoder_attn.q_proj.weight -> decoder.blocks.9.cross_attn.query.weight decoder.blocks.9.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.9.encoder_attn.q_proj.bias -> decoder.blocks.9.cross_attn.query.bias decoder.blocks.9.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.9.encoder_attn.out_proj.weight -> decoder.blocks.9.cross_attn.out.weight decoder.blocks.9.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.9.encoder_attn.out_proj.bias -> decoder.blocks.9.cross_attn.out.bias decoder.blocks.9.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.9.encoder_attn_layer_norm.weight -> decoder.blocks.9.cross_attn_ln.weight decoder.blocks.9.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.9.encoder_attn_layer_norm.bias -> decoder.blocks.9.cross_attn_ln.bias decoder.blocks.9.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.9.fc1.weight -> decoder.blocks.9.mlp.0.weight decoder.blocks.9.mlp.0.weight 2 (5120, 1280) model.decoder.layers.9.fc1.bias -> decoder.blocks.9.mlp.0.bias decoder.blocks.9.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.9.fc2.weight -> decoder.blocks.9.mlp.2.weight decoder.blocks.9.mlp.2.weight 2 (1280, 5120) model.decoder.layers.9.fc2.bias -> decoder.blocks.9.mlp.2.bias decoder.blocks.9.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.9.final_layer_norm.weight -> decoder.blocks.9.mlp_ln.weight decoder.blocks.9.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.9.final_layer_norm.bias -> decoder.blocks.9.mlp_ln.bias decoder.blocks.9.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.10.self_attn.k_proj.weight 
-> decoder.blocks.10.attn.key.weight decoder.blocks.10.attn.key.weight 2 (1280, 1280) model.decoder.layers.10.self_attn.v_proj.weight -> decoder.blocks.10.attn.value.weight decoder.blocks.10.attn.value.weight 2 (1280, 1280) model.decoder.layers.10.self_attn.v_proj.bias -> decoder.blocks.10.attn.value.bias decoder.blocks.10.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.10.self_attn.q_proj.weight -> decoder.blocks.10.attn.query.weight decoder.blocks.10.attn.query.weight 2 (1280, 1280) model.decoder.layers.10.self_attn.q_proj.bias -> decoder.blocks.10.attn.query.bias decoder.blocks.10.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.10.self_attn.out_proj.weight -> decoder.blocks.10.attn.out.weight decoder.blocks.10.attn.out.weight 2 (1280, 1280) model.decoder.layers.10.self_attn.out_proj.bias -> decoder.blocks.10.attn.out.bias decoder.blocks.10.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.10.self_attn_layer_norm.weight -> decoder.blocks.10.attn_ln.weight decoder.blocks.10.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.10.self_attn_layer_norm.bias -> decoder.blocks.10.attn_ln.bias decoder.blocks.10.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.10.encoder_attn.k_proj.weight -> decoder.blocks.10.cross_attn.key.weight decoder.blocks.10.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.10.encoder_attn.v_proj.weight -> decoder.blocks.10.cross_attn.value.weight decoder.blocks.10.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.10.encoder_attn.v_proj.bias -> decoder.blocks.10.cross_attn.value.bias decoder.blocks.10.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.10.encoder_attn.q_proj.weight -> decoder.blocks.10.cross_attn.query.weight decoder.blocks.10.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.10.encoder_attn.q_proj.bias -> decoder.blocks.10.cross_attn.query.bias decoder.blocks.10.cross_attn.query.bias 1 
(1280,) Converting to float32 model.decoder.layers.10.encoder_attn.out_proj.weight -> decoder.blocks.10.cross_attn.out.weight decoder.blocks.10.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.10.encoder_attn.out_proj.bias -> decoder.blocks.10.cross_attn.out.bias decoder.blocks.10.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.10.encoder_attn_layer_norm.weight -> decoder.blocks.10.cross_attn_ln.weight decoder.blocks.10.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.10.encoder_attn_layer_norm.bias -> decoder.blocks.10.cross_attn_ln.bias decoder.blocks.10.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.10.fc1.weight -> decoder.blocks.10.mlp.0.weight decoder.blocks.10.mlp.0.weight 2 (5120, 1280) model.decoder.layers.10.fc1.bias -> decoder.blocks.10.mlp.0.bias decoder.blocks.10.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.10.fc2.weight -> decoder.blocks.10.mlp.2.weight decoder.blocks.10.mlp.2.weight 2 (1280, 5120) model.decoder.layers.10.fc2.bias -> decoder.blocks.10.mlp.2.bias decoder.blocks.10.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.10.final_layer_norm.weight -> decoder.blocks.10.mlp_ln.weight decoder.blocks.10.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.10.final_layer_norm.bias -> decoder.blocks.10.mlp_ln.bias decoder.blocks.10.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.11.self_attn.k_proj.weight -> decoder.blocks.11.attn.key.weight decoder.blocks.11.attn.key.weight 2 (1280, 1280) model.decoder.layers.11.self_attn.v_proj.weight -> decoder.blocks.11.attn.value.weight decoder.blocks.11.attn.value.weight 2 (1280, 1280) model.decoder.layers.11.self_attn.v_proj.bias -> decoder.blocks.11.attn.value.bias decoder.blocks.11.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.11.self_attn.q_proj.weight -> decoder.blocks.11.attn.query.weight decoder.blocks.11.attn.query.weight 2 (1280, 
1280) model.decoder.layers.11.self_attn.q_proj.bias -> decoder.blocks.11.attn.query.bias decoder.blocks.11.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.11.self_attn.out_proj.weight -> decoder.blocks.11.attn.out.weight decoder.blocks.11.attn.out.weight 2 (1280, 1280) model.decoder.layers.11.self_attn.out_proj.bias -> decoder.blocks.11.attn.out.bias decoder.blocks.11.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.11.self_attn_layer_norm.weight -> decoder.blocks.11.attn_ln.weight decoder.blocks.11.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.11.self_attn_layer_norm.bias -> decoder.blocks.11.attn_ln.bias decoder.blocks.11.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.11.encoder_attn.k_proj.weight -> decoder.blocks.11.cross_attn.key.weight decoder.blocks.11.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.11.encoder_attn.v_proj.weight -> decoder.blocks.11.cross_attn.value.weight decoder.blocks.11.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.11.encoder_attn.v_proj.bias -> decoder.blocks.11.cross_attn.value.bias decoder.blocks.11.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.11.encoder_attn.q_proj.weight -> decoder.blocks.11.cross_attn.query.weight decoder.blocks.11.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.11.encoder_attn.q_proj.bias -> decoder.blocks.11.cross_attn.query.bias decoder.blocks.11.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.11.encoder_attn.out_proj.weight -> decoder.blocks.11.cross_attn.out.weight decoder.blocks.11.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.11.encoder_attn.out_proj.bias -> decoder.blocks.11.cross_attn.out.bias decoder.blocks.11.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.11.encoder_attn_layer_norm.weight -> decoder.blocks.11.cross_attn_ln.weight decoder.blocks.11.cross_attn_ln.weight 1 (1280,) Converting to 
float32 model.decoder.layers.11.encoder_attn_layer_norm.bias -> decoder.blocks.11.cross_attn_ln.bias decoder.blocks.11.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.11.fc1.weight -> decoder.blocks.11.mlp.0.weight decoder.blocks.11.mlp.0.weight 2 (5120, 1280) model.decoder.layers.11.fc1.bias -> decoder.blocks.11.mlp.0.bias decoder.blocks.11.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.11.fc2.weight -> decoder.blocks.11.mlp.2.weight decoder.blocks.11.mlp.2.weight 2 (1280, 5120) model.decoder.layers.11.fc2.bias -> decoder.blocks.11.mlp.2.bias decoder.blocks.11.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.11.final_layer_norm.weight -> decoder.blocks.11.mlp_ln.weight decoder.blocks.11.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.11.final_layer_norm.bias -> decoder.blocks.11.mlp_ln.bias decoder.blocks.11.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.12.self_attn.k_proj.weight -> decoder.blocks.12.attn.key.weight decoder.blocks.12.attn.key.weight 2 (1280, 1280) model.decoder.layers.12.self_attn.v_proj.weight -> decoder.blocks.12.attn.value.weight decoder.blocks.12.attn.value.weight 2 (1280, 1280) model.decoder.layers.12.self_attn.v_proj.bias -> decoder.blocks.12.attn.value.bias decoder.blocks.12.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.12.self_attn.q_proj.weight -> decoder.blocks.12.attn.query.weight decoder.blocks.12.attn.query.weight 2 (1280, 1280) model.decoder.layers.12.self_attn.q_proj.bias -> decoder.blocks.12.attn.query.bias decoder.blocks.12.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.12.self_attn.out_proj.weight -> decoder.blocks.12.attn.out.weight decoder.blocks.12.attn.out.weight 2 (1280, 1280) model.decoder.layers.12.self_attn.out_proj.bias -> decoder.blocks.12.attn.out.bias decoder.blocks.12.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.12.self_attn_layer_norm.weight -> 
decoder.blocks.12.attn_ln.weight decoder.blocks.12.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.12.self_attn_layer_norm.bias -> decoder.blocks.12.attn_ln.bias decoder.blocks.12.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.12.encoder_attn.k_proj.weight -> decoder.blocks.12.cross_attn.key.weight decoder.blocks.12.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.12.encoder_attn.v_proj.weight -> decoder.blocks.12.cross_attn.value.weight decoder.blocks.12.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.12.encoder_attn.v_proj.bias -> decoder.blocks.12.cross_attn.value.bias decoder.blocks.12.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.12.encoder_attn.q_proj.weight -> decoder.blocks.12.cross_attn.query.weight decoder.blocks.12.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.12.encoder_attn.q_proj.bias -> decoder.blocks.12.cross_attn.query.bias decoder.blocks.12.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.12.encoder_attn.out_proj.weight -> decoder.blocks.12.cross_attn.out.weight decoder.blocks.12.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.12.encoder_attn.out_proj.bias -> decoder.blocks.12.cross_attn.out.bias decoder.blocks.12.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.12.encoder_attn_layer_norm.weight -> decoder.blocks.12.cross_attn_ln.weight decoder.blocks.12.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.12.encoder_attn_layer_norm.bias -> decoder.blocks.12.cross_attn_ln.bias decoder.blocks.12.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.12.fc1.weight -> decoder.blocks.12.mlp.0.weight decoder.blocks.12.mlp.0.weight 2 (5120, 1280) model.decoder.layers.12.fc1.bias -> decoder.blocks.12.mlp.0.bias decoder.blocks.12.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.12.fc2.weight -> decoder.blocks.12.mlp.2.weight 
decoder.blocks.12.mlp.2.weight 2 (1280, 5120) model.decoder.layers.12.fc2.bias -> decoder.blocks.12.mlp.2.bias decoder.blocks.12.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.12.final_layer_norm.weight -> decoder.blocks.12.mlp_ln.weight decoder.blocks.12.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.12.final_layer_norm.bias -> decoder.blocks.12.mlp_ln.bias decoder.blocks.12.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.13.self_attn.k_proj.weight -> decoder.blocks.13.attn.key.weight decoder.blocks.13.attn.key.weight 2 (1280, 1280) model.decoder.layers.13.self_attn.v_proj.weight -> decoder.blocks.13.attn.value.weight decoder.blocks.13.attn.value.weight 2 (1280, 1280) model.decoder.layers.13.self_attn.v_proj.bias -> decoder.blocks.13.attn.value.bias decoder.blocks.13.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.13.self_attn.q_proj.weight -> decoder.blocks.13.attn.query.weight decoder.blocks.13.attn.query.weight 2 (1280, 1280) model.decoder.layers.13.self_attn.q_proj.bias -> decoder.blocks.13.attn.query.bias decoder.blocks.13.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.13.self_attn.out_proj.weight -> decoder.blocks.13.attn.out.weight decoder.blocks.13.attn.out.weight 2 (1280, 1280) model.decoder.layers.13.self_attn.out_proj.bias -> decoder.blocks.13.attn.out.bias decoder.blocks.13.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.13.self_attn_layer_norm.weight -> decoder.blocks.13.attn_ln.weight decoder.blocks.13.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.13.self_attn_layer_norm.bias -> decoder.blocks.13.attn_ln.bias decoder.blocks.13.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.13.encoder_attn.k_proj.weight -> decoder.blocks.13.cross_attn.key.weight decoder.blocks.13.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.13.encoder_attn.v_proj.weight -> decoder.blocks.13.cross_attn.value.weight 
decoder.blocks.13.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.13.encoder_attn.v_proj.bias -> decoder.blocks.13.cross_attn.value.bias decoder.blocks.13.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.13.encoder_attn.q_proj.weight -> decoder.blocks.13.cross_attn.query.weight decoder.blocks.13.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.13.encoder_attn.q_proj.bias -> decoder.blocks.13.cross_attn.query.bias decoder.blocks.13.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.13.encoder_attn.out_proj.weight -> decoder.blocks.13.cross_attn.out.weight decoder.blocks.13.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.13.encoder_attn.out_proj.bias -> decoder.blocks.13.cross_attn.out.bias decoder.blocks.13.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.13.encoder_attn_layer_norm.weight -> decoder.blocks.13.cross_attn_ln.weight decoder.blocks.13.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.13.encoder_attn_layer_norm.bias -> decoder.blocks.13.cross_attn_ln.bias decoder.blocks.13.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.13.fc1.weight -> decoder.blocks.13.mlp.0.weight decoder.blocks.13.mlp.0.weight 2 (5120, 1280) model.decoder.layers.13.fc1.bias -> decoder.blocks.13.mlp.0.bias decoder.blocks.13.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.13.fc2.weight -> decoder.blocks.13.mlp.2.weight decoder.blocks.13.mlp.2.weight 2 (1280, 5120) model.decoder.layers.13.fc2.bias -> decoder.blocks.13.mlp.2.bias decoder.blocks.13.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.13.final_layer_norm.weight -> decoder.blocks.13.mlp_ln.weight decoder.blocks.13.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.13.final_layer_norm.bias -> decoder.blocks.13.mlp_ln.bias decoder.blocks.13.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.14.self_attn.k_proj.weight 
-> decoder.blocks.14.attn.key.weight decoder.blocks.14.attn.key.weight 2 (1280, 1280) model.decoder.layers.14.self_attn.v_proj.weight -> decoder.blocks.14.attn.value.weight decoder.blocks.14.attn.value.weight 2 (1280, 1280) model.decoder.layers.14.self_attn.v_proj.bias -> decoder.blocks.14.attn.value.bias decoder.blocks.14.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.14.self_attn.q_proj.weight -> decoder.blocks.14.attn.query.weight decoder.blocks.14.attn.query.weight 2 (1280, 1280) model.decoder.layers.14.self_attn.q_proj.bias -> decoder.blocks.14.attn.query.bias decoder.blocks.14.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.14.self_attn.out_proj.weight -> decoder.blocks.14.attn.out.weight decoder.blocks.14.attn.out.weight 2 (1280, 1280) model.decoder.layers.14.self_attn.out_proj.bias -> decoder.blocks.14.attn.out.bias decoder.blocks.14.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.14.self_attn_layer_norm.weight -> decoder.blocks.14.attn_ln.weight decoder.blocks.14.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.14.self_attn_layer_norm.bias -> decoder.blocks.14.attn_ln.bias decoder.blocks.14.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.14.encoder_attn.k_proj.weight -> decoder.blocks.14.cross_attn.key.weight decoder.blocks.14.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.14.encoder_attn.v_proj.weight -> decoder.blocks.14.cross_attn.value.weight decoder.blocks.14.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.14.encoder_attn.v_proj.bias -> decoder.blocks.14.cross_attn.value.bias decoder.blocks.14.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.14.encoder_attn.q_proj.weight -> decoder.blocks.14.cross_attn.query.weight decoder.blocks.14.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.14.encoder_attn.q_proj.bias -> decoder.blocks.14.cross_attn.query.bias decoder.blocks.14.cross_attn.query.bias 1 
(1280,) Converting to float32 model.decoder.layers.14.encoder_attn.out_proj.weight -> decoder.blocks.14.cross_attn.out.weight decoder.blocks.14.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.14.encoder_attn.out_proj.bias -> decoder.blocks.14.cross_attn.out.bias decoder.blocks.14.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.14.encoder_attn_layer_norm.weight -> decoder.blocks.14.cross_attn_ln.weight decoder.blocks.14.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.14.encoder_attn_layer_norm.bias -> decoder.blocks.14.cross_attn_ln.bias decoder.blocks.14.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.14.fc1.weight -> decoder.blocks.14.mlp.0.weight decoder.blocks.14.mlp.0.weight 2 (5120, 1280) model.decoder.layers.14.fc1.bias -> decoder.blocks.14.mlp.0.bias decoder.blocks.14.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.14.fc2.weight -> decoder.blocks.14.mlp.2.weight decoder.blocks.14.mlp.2.weight 2 (1280, 5120) model.decoder.layers.14.fc2.bias -> decoder.blocks.14.mlp.2.bias decoder.blocks.14.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.14.final_layer_norm.weight -> decoder.blocks.14.mlp_ln.weight decoder.blocks.14.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.14.final_layer_norm.bias -> decoder.blocks.14.mlp_ln.bias decoder.blocks.14.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.15.self_attn.k_proj.weight -> decoder.blocks.15.attn.key.weight decoder.blocks.15.attn.key.weight 2 (1280, 1280) model.decoder.layers.15.self_attn.v_proj.weight -> decoder.blocks.15.attn.value.weight decoder.blocks.15.attn.value.weight 2 (1280, 1280) model.decoder.layers.15.self_attn.v_proj.bias -> decoder.blocks.15.attn.value.bias decoder.blocks.15.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.15.self_attn.q_proj.weight -> decoder.blocks.15.attn.query.weight decoder.blocks.15.attn.query.weight 2 (1280, 
1280) model.decoder.layers.15.self_attn.q_proj.bias -> decoder.blocks.15.attn.query.bias decoder.blocks.15.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.15.self_attn.out_proj.weight -> decoder.blocks.15.attn.out.weight decoder.blocks.15.attn.out.weight 2 (1280, 1280) model.decoder.layers.15.self_attn.out_proj.bias -> decoder.blocks.15.attn.out.bias decoder.blocks.15.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.15.self_attn_layer_norm.weight -> decoder.blocks.15.attn_ln.weight decoder.blocks.15.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.15.self_attn_layer_norm.bias -> decoder.blocks.15.attn_ln.bias decoder.blocks.15.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.15.encoder_attn.k_proj.weight -> decoder.blocks.15.cross_attn.key.weight decoder.blocks.15.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.15.encoder_attn.v_proj.weight -> decoder.blocks.15.cross_attn.value.weight decoder.blocks.15.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.15.encoder_attn.v_proj.bias -> decoder.blocks.15.cross_attn.value.bias decoder.blocks.15.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.15.encoder_attn.q_proj.weight -> decoder.blocks.15.cross_attn.query.weight decoder.blocks.15.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.15.encoder_attn.q_proj.bias -> decoder.blocks.15.cross_attn.query.bias decoder.blocks.15.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.15.encoder_attn.out_proj.weight -> decoder.blocks.15.cross_attn.out.weight decoder.blocks.15.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.15.encoder_attn.out_proj.bias -> decoder.blocks.15.cross_attn.out.bias decoder.blocks.15.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.15.encoder_attn_layer_norm.weight -> decoder.blocks.15.cross_attn_ln.weight decoder.blocks.15.cross_attn_ln.weight 1 (1280,) Converting to 
float32 model.decoder.layers.15.encoder_attn_layer_norm.bias -> decoder.blocks.15.cross_attn_ln.bias decoder.blocks.15.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.15.fc1.weight -> decoder.blocks.15.mlp.0.weight decoder.blocks.15.mlp.0.weight 2 (5120, 1280) model.decoder.layers.15.fc1.bias -> decoder.blocks.15.mlp.0.bias decoder.blocks.15.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.15.fc2.weight -> decoder.blocks.15.mlp.2.weight decoder.blocks.15.mlp.2.weight 2 (1280, 5120) model.decoder.layers.15.fc2.bias -> decoder.blocks.15.mlp.2.bias decoder.blocks.15.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.15.final_layer_norm.weight -> decoder.blocks.15.mlp_ln.weight decoder.blocks.15.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.15.final_layer_norm.bias -> decoder.blocks.15.mlp_ln.bias decoder.blocks.15.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.16.self_attn.k_proj.weight -> decoder.blocks.16.attn.key.weight decoder.blocks.16.attn.key.weight 2 (1280, 1280) model.decoder.layers.16.self_attn.v_proj.weight -> decoder.blocks.16.attn.value.weight decoder.blocks.16.attn.value.weight 2 (1280, 1280) model.decoder.layers.16.self_attn.v_proj.bias -> decoder.blocks.16.attn.value.bias decoder.blocks.16.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.16.self_attn.q_proj.weight -> decoder.blocks.16.attn.query.weight decoder.blocks.16.attn.query.weight 2 (1280, 1280) model.decoder.layers.16.self_attn.q_proj.bias -> decoder.blocks.16.attn.query.bias decoder.blocks.16.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.16.self_attn.out_proj.weight -> decoder.blocks.16.attn.out.weight decoder.blocks.16.attn.out.weight 2 (1280, 1280) model.decoder.layers.16.self_attn.out_proj.bias -> decoder.blocks.16.attn.out.bias decoder.blocks.16.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.16.self_attn_layer_norm.weight -> 
decoder.blocks.16.attn_ln.weight
decoder.blocks.16.attn_ln.weight 1 (1280,) Converting to float32
model.decoder.layers.16.self_attn_layer_norm.bias -> decoder.blocks.16.attn_ln.bias
decoder.blocks.16.attn_ln.bias 1 (1280,) Converting to float32
[... output truncated: the converter prints the same mapping for every remaining decoder layer (16 through 28). Each Hugging Face tensor model.decoder.layers.N.* is renamed to the whisper.cpp name decoder.blocks.N.* — self_attn becomes attn, encoder_attn becomes cross_attn, fc1/fc2 become mlp.0/mlp.2, and the *_layer_norm tensors become *_ln, with q_proj/k_proj/v_proj/out_proj mapped to query/key/value/out. Two-dimensional weight matrices, shapes (1280, 1280) and (5120, 1280), are written at the default precision, while one-dimensional bias and layer-norm tensors, shapes (1280,) and (5120,), are converted to float32. ...]
model.decoder.layers.28.self_attn_layer_norm.weight ->
decoder.blocks.28.attn_ln.weight decoder.blocks.28.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.28.self_attn_layer_norm.bias -> decoder.blocks.28.attn_ln.bias decoder.blocks.28.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.28.encoder_attn.k_proj.weight -> decoder.blocks.28.cross_attn.key.weight decoder.blocks.28.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.28.encoder_attn.v_proj.weight -> decoder.blocks.28.cross_attn.value.weight decoder.blocks.28.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.28.encoder_attn.v_proj.bias -> decoder.blocks.28.cross_attn.value.bias decoder.blocks.28.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.28.encoder_attn.q_proj.weight -> decoder.blocks.28.cross_attn.query.weight decoder.blocks.28.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.28.encoder_attn.q_proj.bias -> decoder.blocks.28.cross_attn.query.bias decoder.blocks.28.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.28.encoder_attn.out_proj.weight -> decoder.blocks.28.cross_attn.out.weight decoder.blocks.28.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.28.encoder_attn.out_proj.bias -> decoder.blocks.28.cross_attn.out.bias decoder.blocks.28.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.28.encoder_attn_layer_norm.weight -> decoder.blocks.28.cross_attn_ln.weight decoder.blocks.28.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.28.encoder_attn_layer_norm.bias -> decoder.blocks.28.cross_attn_ln.bias decoder.blocks.28.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.28.fc1.weight -> decoder.blocks.28.mlp.0.weight decoder.blocks.28.mlp.0.weight 2 (5120, 1280) model.decoder.layers.28.fc1.bias -> decoder.blocks.28.mlp.0.bias decoder.blocks.28.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.28.fc2.weight -> decoder.blocks.28.mlp.2.weight 
decoder.blocks.28.mlp.2.weight 2 (1280, 5120) model.decoder.layers.28.fc2.bias -> decoder.blocks.28.mlp.2.bias decoder.blocks.28.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.28.final_layer_norm.weight -> decoder.blocks.28.mlp_ln.weight decoder.blocks.28.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.28.final_layer_norm.bias -> decoder.blocks.28.mlp_ln.bias decoder.blocks.28.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.29.self_attn.k_proj.weight -> decoder.blocks.29.attn.key.weight decoder.blocks.29.attn.key.weight 2 (1280, 1280) model.decoder.layers.29.self_attn.v_proj.weight -> decoder.blocks.29.attn.value.weight decoder.blocks.29.attn.value.weight 2 (1280, 1280) model.decoder.layers.29.self_attn.v_proj.bias -> decoder.blocks.29.attn.value.bias decoder.blocks.29.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.29.self_attn.q_proj.weight -> decoder.blocks.29.attn.query.weight decoder.blocks.29.attn.query.weight 2 (1280, 1280) model.decoder.layers.29.self_attn.q_proj.bias -> decoder.blocks.29.attn.query.bias decoder.blocks.29.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.29.self_attn.out_proj.weight -> decoder.blocks.29.attn.out.weight decoder.blocks.29.attn.out.weight 2 (1280, 1280) model.decoder.layers.29.self_attn.out_proj.bias -> decoder.blocks.29.attn.out.bias decoder.blocks.29.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.29.self_attn_layer_norm.weight -> decoder.blocks.29.attn_ln.weight decoder.blocks.29.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.29.self_attn_layer_norm.bias -> decoder.blocks.29.attn_ln.bias decoder.blocks.29.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.29.encoder_attn.k_proj.weight -> decoder.blocks.29.cross_attn.key.weight decoder.blocks.29.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.29.encoder_attn.v_proj.weight -> decoder.blocks.29.cross_attn.value.weight 
decoder.blocks.29.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.29.encoder_attn.v_proj.bias -> decoder.blocks.29.cross_attn.value.bias decoder.blocks.29.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.29.encoder_attn.q_proj.weight -> decoder.blocks.29.cross_attn.query.weight decoder.blocks.29.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.29.encoder_attn.q_proj.bias -> decoder.blocks.29.cross_attn.query.bias decoder.blocks.29.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.29.encoder_attn.out_proj.weight -> decoder.blocks.29.cross_attn.out.weight decoder.blocks.29.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.29.encoder_attn.out_proj.bias -> decoder.blocks.29.cross_attn.out.bias decoder.blocks.29.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.29.encoder_attn_layer_norm.weight -> decoder.blocks.29.cross_attn_ln.weight decoder.blocks.29.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.29.encoder_attn_layer_norm.bias -> decoder.blocks.29.cross_attn_ln.bias decoder.blocks.29.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.29.fc1.weight -> decoder.blocks.29.mlp.0.weight decoder.blocks.29.mlp.0.weight 2 (5120, 1280) model.decoder.layers.29.fc1.bias -> decoder.blocks.29.mlp.0.bias decoder.blocks.29.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.29.fc2.weight -> decoder.blocks.29.mlp.2.weight decoder.blocks.29.mlp.2.weight 2 (1280, 5120) model.decoder.layers.29.fc2.bias -> decoder.blocks.29.mlp.2.bias decoder.blocks.29.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.29.final_layer_norm.weight -> decoder.blocks.29.mlp_ln.weight decoder.blocks.29.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.29.final_layer_norm.bias -> decoder.blocks.29.mlp_ln.bias decoder.blocks.29.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.30.self_attn.k_proj.weight 
-> decoder.blocks.30.attn.key.weight decoder.blocks.30.attn.key.weight 2 (1280, 1280) model.decoder.layers.30.self_attn.v_proj.weight -> decoder.blocks.30.attn.value.weight decoder.blocks.30.attn.value.weight 2 (1280, 1280) model.decoder.layers.30.self_attn.v_proj.bias -> decoder.blocks.30.attn.value.bias decoder.blocks.30.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.30.self_attn.q_proj.weight -> decoder.blocks.30.attn.query.weight decoder.blocks.30.attn.query.weight 2 (1280, 1280) model.decoder.layers.30.self_attn.q_proj.bias -> decoder.blocks.30.attn.query.bias decoder.blocks.30.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.30.self_attn.out_proj.weight -> decoder.blocks.30.attn.out.weight decoder.blocks.30.attn.out.weight 2 (1280, 1280) model.decoder.layers.30.self_attn.out_proj.bias -> decoder.blocks.30.attn.out.bias decoder.blocks.30.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.30.self_attn_layer_norm.weight -> decoder.blocks.30.attn_ln.weight decoder.blocks.30.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.30.self_attn_layer_norm.bias -> decoder.blocks.30.attn_ln.bias decoder.blocks.30.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.30.encoder_attn.k_proj.weight -> decoder.blocks.30.cross_attn.key.weight decoder.blocks.30.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.30.encoder_attn.v_proj.weight -> decoder.blocks.30.cross_attn.value.weight decoder.blocks.30.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.30.encoder_attn.v_proj.bias -> decoder.blocks.30.cross_attn.value.bias decoder.blocks.30.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.30.encoder_attn.q_proj.weight -> decoder.blocks.30.cross_attn.query.weight decoder.blocks.30.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.30.encoder_attn.q_proj.bias -> decoder.blocks.30.cross_attn.query.bias decoder.blocks.30.cross_attn.query.bias 1 
(1280,) Converting to float32 model.decoder.layers.30.encoder_attn.out_proj.weight -> decoder.blocks.30.cross_attn.out.weight decoder.blocks.30.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.30.encoder_attn.out_proj.bias -> decoder.blocks.30.cross_attn.out.bias decoder.blocks.30.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.30.encoder_attn_layer_norm.weight -> decoder.blocks.30.cross_attn_ln.weight decoder.blocks.30.cross_attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.30.encoder_attn_layer_norm.bias -> decoder.blocks.30.cross_attn_ln.bias decoder.blocks.30.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.30.fc1.weight -> decoder.blocks.30.mlp.0.weight decoder.blocks.30.mlp.0.weight 2 (5120, 1280) model.decoder.layers.30.fc1.bias -> decoder.blocks.30.mlp.0.bias decoder.blocks.30.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.30.fc2.weight -> decoder.blocks.30.mlp.2.weight decoder.blocks.30.mlp.2.weight 2 (1280, 5120) model.decoder.layers.30.fc2.bias -> decoder.blocks.30.mlp.2.bias decoder.blocks.30.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.30.final_layer_norm.weight -> decoder.blocks.30.mlp_ln.weight decoder.blocks.30.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.30.final_layer_norm.bias -> decoder.blocks.30.mlp_ln.bias decoder.blocks.30.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.31.self_attn.k_proj.weight -> decoder.blocks.31.attn.key.weight decoder.blocks.31.attn.key.weight 2 (1280, 1280) model.decoder.layers.31.self_attn.v_proj.weight -> decoder.blocks.31.attn.value.weight decoder.blocks.31.attn.value.weight 2 (1280, 1280) model.decoder.layers.31.self_attn.v_proj.bias -> decoder.blocks.31.attn.value.bias decoder.blocks.31.attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.31.self_attn.q_proj.weight -> decoder.blocks.31.attn.query.weight decoder.blocks.31.attn.query.weight 2 (1280, 
1280) model.decoder.layers.31.self_attn.q_proj.bias -> decoder.blocks.31.attn.query.bias decoder.blocks.31.attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.31.self_attn.out_proj.weight -> decoder.blocks.31.attn.out.weight decoder.blocks.31.attn.out.weight 2 (1280, 1280) model.decoder.layers.31.self_attn.out_proj.bias -> decoder.blocks.31.attn.out.bias decoder.blocks.31.attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.31.self_attn_layer_norm.weight -> decoder.blocks.31.attn_ln.weight decoder.blocks.31.attn_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.31.self_attn_layer_norm.bias -> decoder.blocks.31.attn_ln.bias decoder.blocks.31.attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.31.encoder_attn.k_proj.weight -> decoder.blocks.31.cross_attn.key.weight decoder.blocks.31.cross_attn.key.weight 2 (1280, 1280) model.decoder.layers.31.encoder_attn.v_proj.weight -> decoder.blocks.31.cross_attn.value.weight decoder.blocks.31.cross_attn.value.weight 2 (1280, 1280) model.decoder.layers.31.encoder_attn.v_proj.bias -> decoder.blocks.31.cross_attn.value.bias decoder.blocks.31.cross_attn.value.bias 1 (1280,) Converting to float32 model.decoder.layers.31.encoder_attn.q_proj.weight -> decoder.blocks.31.cross_attn.query.weight decoder.blocks.31.cross_attn.query.weight 2 (1280, 1280) model.decoder.layers.31.encoder_attn.q_proj.bias -> decoder.blocks.31.cross_attn.query.bias decoder.blocks.31.cross_attn.query.bias 1 (1280,) Converting to float32 model.decoder.layers.31.encoder_attn.out_proj.weight -> decoder.blocks.31.cross_attn.out.weight decoder.blocks.31.cross_attn.out.weight 2 (1280, 1280) model.decoder.layers.31.encoder_attn.out_proj.bias -> decoder.blocks.31.cross_attn.out.bias decoder.blocks.31.cross_attn.out.bias 1 (1280,) Converting to float32 model.decoder.layers.31.encoder_attn_layer_norm.weight -> decoder.blocks.31.cross_attn_ln.weight decoder.blocks.31.cross_attn_ln.weight 1 (1280,) Converting to 
float32 model.decoder.layers.31.encoder_attn_layer_norm.bias -> decoder.blocks.31.cross_attn_ln.bias decoder.blocks.31.cross_attn_ln.bias 1 (1280,) Converting to float32 model.decoder.layers.31.fc1.weight -> decoder.blocks.31.mlp.0.weight decoder.blocks.31.mlp.0.weight 2 (5120, 1280) model.decoder.layers.31.fc1.bias -> decoder.blocks.31.mlp.0.bias decoder.blocks.31.mlp.0.bias 1 (5120,) Converting to float32 model.decoder.layers.31.fc2.weight -> decoder.blocks.31.mlp.2.weight decoder.blocks.31.mlp.2.weight 2 (1280, 5120) model.decoder.layers.31.fc2.bias -> decoder.blocks.31.mlp.2.bias decoder.blocks.31.mlp.2.bias 1 (1280,) Converting to float32 model.decoder.layers.31.final_layer_norm.weight -> decoder.blocks.31.mlp_ln.weight decoder.blocks.31.mlp_ln.weight 1 (1280,) Converting to float32 model.decoder.layers.31.final_layer_norm.bias -> decoder.blocks.31.mlp_ln.bias decoder.blocks.31.mlp_ln.bias 1 (1280,) Converting to float32 model.decoder.layer_norm.weight -> decoder.ln.weight decoder.ln.weight 1 (1280,) Converting to float32 model.decoder.layer_norm.bias -> decoder.ln.bias decoder.ln.bias 1 (1280,) Converting to float32 Skipping proj_out.weight Done. Output file: whisper-ggml-sme/ggml-model.bin
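Before moving on, it can be worth a quick sanity check on the converted file: the ggml container format that the conversion script writes begins with the magic value `0x67676d6c` ("ggml" packed as a little-endian 32-bit integer). A minimal Python sketch (the helper name `has_ggml_magic` is my own; the path is the output file from the run above):

```python
import struct

GGML_MAGIC = 0x67676d6c  # the bytes "ggml" as a little-endian uint32

def has_ggml_magic(path):
    """Return True if the file starts with the ggml container magic."""
    with open(path, "rb") as f:
        header = f.read(4)
    return len(header) == 4 and struct.unpack("<I", header)[0] == GGML_MAGIC

# e.g. has_ggml_magic("whisper-ggml-sme/ggml-model.bin")
```

If this returns False, the most likely cause is an interrupted download or a conversion that exited early.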
!wget https://media.globalrecordings.net/GOKit_MP3s_named/Saami%20North%20-%20The%20Two%20Roads.mp3 -O sample.mp3
!ffmpeg -i sample.mp3 -acodec pcm_s16le -ac 1 -ar 16000 sample.wav
--2024-03-03 18:09:42--  https://media.globalrecordings.net/GOKit_MP3s_named/Saami%20North%20-%20The%20Two%20Roads.mp3
Resolving media.globalrecordings.net (media.globalrecordings.net)... 35.208.248.145
Connecting to media.globalrecordings.net (media.globalrecordings.net)|35.208.248.145|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 493511 (482K) [audio/mpeg]
Saving to: ‘sample.mp3’

sample.mp3          100%[===================>] 481.94K  3.00MB/s    in 0.2s

2024-03-03 18:09:42 (3.00 MB/s) - ‘sample.mp3’ saved [493511/493511]

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mp3, from 'sample.mp3':
  Metadata:
    title           : Saami, North Picture 22: The Two Roads The Two Roads
    artist          : GRN Language Samples
    album           : Saami, North SME
    composer        : Saami, North
    genre           : GRN Language Sample
    encoder         : Lavf58.76.100
    track           : 1
    copyright       : 2010 GRN
    comment         : https://globalrecordings.net/en/language/3475
    grnprog         : A63258
    grnlang         : 3475
    grnprepared     : 20240113
    date            : 2010
  Duration: 00:01:44.67, start: 0.050111, bitrate: 37 kb/s
  Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 32 kb/s
  Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 300x300 [SAR 300:300 DAR 1:1], 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      title           : gn-22
      comment         : Cover (front)
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'sample.wav':
  Metadata:
    INAM            : Saami, North Picture 22: The Two Roads The Two Roads
    IART            : GRN Language Samples
    IPRD            : Saami, North SME
    composer        : Saami, North
    IGNR            : GRN Language Sample
    ICRD            : 2010
    IPRT            : 1
    ICOP            : 2010 GRN
    ICMT            : https://globalrecordings.net/en/language/3475
    grnprog         : A63258
    grnlang         : 3475
    grnprepared     : 20240113
    ISFT            : Lavf58.76.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc58.134.100
size=    3269kB time=00:01:44.61 bitrate= 256.0kbits/s speed= 722x
video:0kB audio:3269kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.009559%
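The ffmpeg flags matter here: whisper.cpp expects 16-bit mono PCM at 16 kHz, which is exactly what `-acodec pcm_s16le -ac 1 -ar 16000` produces from the 22050 Hz source. A small sketch using Python's standard `wave` module can confirm the output matches that format (the function name is my own):

```python
import wave

def is_whisper_ready(path):
    """Check that a WAV file is 16 kHz, mono, 16-bit PCM as whisper.cpp expects."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000   # 16 kHz sample rate
                and w.getnchannels() == 1   # mono
                and w.getsampwidth() == 2)  # 2 bytes per sample = 16-bit

# e.g. is_whisper_ready("sample.wav")
```

Running a file with the wrong sample rate or channel count through whisper.cpp tends to produce garbage transcripts rather than an error, so this check is cheap insurance.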
!./whisper.cpp/main -m whisper-ggml-sme/ggml-model.bin -f sample.wav
whisper_init_from_file_with_params_no_state: loading model from 'whisper-ggml-sme/ggml-model.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load: CPU total size = 3093.99 MB
whisper_model_load: model size    = 3093.99 MB
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: compute buffer (conv)   =   34.82 MB
whisper_init_state: compute buffer (encode) =  926.66 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =  209.26 MB

system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'sample.wav' (1673814 samples, 104.6 sec), 2 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
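Note `lang = en` in the last line: because no language was given, whisper.cpp defaulted to English even though the sample is North Saami. The `main` binary accepts `-l` to set the language (or `-l auto` for auto-detection) and `-t` for the thread count. As a sketch, the invocation could be assembled and launched from Python like this (the paths are those from the run above; the helper name is my own):

```python
import subprocess

def build_whisper_cmd(model, wav, language=None, threads=None):
    """Assemble an argv list for the whisper.cpp main binary."""
    cmd = ["./whisper.cpp/main", "-m", model, "-f", wav]
    if language is not None:
        cmd += ["-l", language]      # e.g. "auto" to let the model detect it
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd

cmd = build_whisper_cmd("whisper-ggml-sme/ggml-model.bin", "sample.wav", language="auto")
# subprocess.run(cmd, check=True)  # uncomment to actually run the transcription
```

Whether auto-detection helps here depends on how well the fine-tuned model's language embedding matches the audio; for a Saami fine-tune it may be safer to pass the language the model was trained on explicitly.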