Original here

%cd /tmp

/tmp

%%capture
!pip install git+https://github.com/pytorch/fairseq/

%%capture
!git clone https://github.com/pytorch/fairseq/

%cd fairseq/examples/wav2vec/unsupervised/scripts

/tmp/fairseq/examples/wav2vec/unsupervised/scripts

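# Rebuild each manifest: line 1 becomes the audio root on this machine, followed by the original entries without their old root line.
!mkdir -p tsv
!for i in train test valid; do echo /kaggle/input/wav2vec-u-cv-swedish-vads/wav/$i/common-voice-swedish-16bit-wav/ > tsv/$i.tsv; sed '1d' /kaggle/input/fork-of-wav2vec-u-cv-swedish-tsv/$i.tsv >> tsv/$i.tsv; done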
!cp /kaggle/input/wav2vec-u-cv-swedish-prep-ltr-phn-wrd/dic* tsv/
!cp /kaggle/input/wav2vec-u-cv-swedish-prep-ltr-phn-wrd/*.wrd tsv/
!cp /kaggle/input/wav2vec-u-cv-swedish-prep-ltr-phn-wrd/*.ltr tsv/
!cp /kaggle/input/wav2vec-u-cv-swedish-prep-ltr-phn-wrd/*.phn tsv/
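
The shell loop that builds `tsv/$i.tsv` swaps the first line of each manifest (the audio root directory) for the path where the WAV files live in this session, and keeps every other line unchanged. A minimal Python sketch of the same step, in case the one-liner is hard to follow; the function name `rewrite_manifest_root` is hypothetical and not part of fairseq:

```python
from pathlib import Path


def rewrite_manifest_root(src, dst, new_root):
    """Copy a manifest from src to dst, replacing its first line with new_root.

    Assumes the fairseq manifest layout: line 1 is the audio root directory,
    and each later line describes one file relative to that root.
    """
    src, dst = Path(src), Path(dst)
    lines = src.read_text(encoding="utf-8").splitlines()
    entries = lines[1:]  # drop the old root line, like `sed '1d'`
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", encoding="utf-8") as out:
        out.write(new_root + "\n")  # new root line, like `echo ... >`
        for entry in entries:
            out.write(entry + "\n")  # remaining entries, like `>>`
    return len(entries)  # number of audio entries copied
```

For the `train` split this would be called as `rewrite_manifest_root("/kaggle/input/fork-of-wav2vec-u-cv-swedish-tsv/train.tsv", "tsv/train.tsv", "/kaggle/input/wav2vec-u-cv-swedish-vads/wav/train/common-voice-swedish-16bit-wav/")`. The return value can be compared against the utterance counts shown by the progress bars of `prepare_audio.sh`.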

%%capture
!pip install npy-append-array

!pip install faiss-gpu

%%capture
!apt-get -y install zsh

!zsh prepare_audio.sh tsv /kaggle/working /kaggle/input/download-xlsr-53-wav2vec2-model/xlsr_53_56k.pt

using 512 dim for PCA
100%|███████████████████████████████████████| 2331/2331 [01:21<00:00, 28.76it/s]
100%|███████████████████████████████████████| 2019/2019 [01:07<00:00, 29.98it/s]
100%|███████████████████████████████████████| 2027/2027 [01:08<00:00, 29.41it/s]
Faiss Specs: [faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')]
100%|███████████████████████████████████████| 2331/2331 [01:10<00:00, 33.09it/s]
(223140, 1024)
Processing spec faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')
Computing kmeans
Clustering 223140 points in 1024D to 128 clusters, redo 3 times, 50 iterations
  Preprocessing in 0.17 s
Outer iteration 0 / 3

Objective improved: keep new clusters
Outer iteration 1 / 3

Objective improved: keep new clusters
Outer iteration 2 / 3

Objective improved: keep new clusters
Faiss Spec: faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')
Loaded centroids (128, 1024)
100%|███████████████████████████████████████| 2331/2331 [00:58<00:00, 40.05it/s]
Faiss Spec: faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')
Loaded centroids (128, 1024)
100%|███████████████████████████████████████| 2019/2019 [00:57<00:00, 35.24it/s]
Faiss Spec: faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')
Loaded centroids (128, 1024)
100%|███████████████████████████████████████| 2027/2027 [00:51<00:00, 39.40it/s]
Reading features
Computing PCA
data path: /kaggle/working/train
  0%|                                                     | 0/1 [00:00<?, ?it/s]apply_pca.py:66: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  x = torch.from_numpy(features[start:end]).cuda()
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00,  1.53s/it]
data path: /kaggle/working/precompute_pca512/train
100%|██████████████████████████████████████| 2331/2331 [00:05<00:00, 402.56it/s]
data path: /kaggle/working/precompute_pca512_cls128_mean/train
  0%|                                                  | 0/2331 [00:00<?, ?it/s]mean_pool.py:69: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  x = torch.from_numpy(feats).cuda()
100%|██████████████████████████████████████| 2331/2331 [00:03<00:00, 692.56it/s]
data path: /kaggle/working/valid
  0%|                                                     | 0/1 [00:00<?, ?it/s]apply_pca.py:66: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  x = torch.from_numpy(features[start:end]).cuda()
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00,  1.60s/it]
data path: /kaggle/working/precompute_pca512/valid
100%|██████████████████████████████████████| 2019/2019 [00:04<00:00, 447.45it/s]
data path: /kaggle/working/precompute_pca512_cls128_mean/valid
  0%|                                                  | 0/2019 [00:00<?, ?it/s]mean_pool.py:69: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  x = torch.from_numpy(feats).cuda()
100%|██████████████████████████████████████| 2019/2019 [00:03<00:00, 592.35it/s]
data path: /kaggle/working/test
  0%|                                                     | 0/1 [00:00<?, ?it/s]apply_pca.py:66: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  x = torch.from_numpy(features[start:end]).cuda()
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00,  1.22s/it]
data path: /kaggle/working/precompute_pca512/test
100%|██████████████████████████████████████| 2027/2027 [00:05<00:00, 379.47it/s]
data path: /kaggle/working/precompute_pca512_cls128_mean/test
  0%|                                                  | 0/2027 [00:00<?, ?it/s]mean_pool.py:69: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/tensor_numpy.cpp:141.)
  x = torch.from_numpy(feats).cuda()
100%|██████████████████████████████████████| 2027/2027 [00:03<00:00, 569.65it/s]
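
The `Faiss Specs` line in the output above shows the spec string `CLUS128` expanded into `faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')`. The sketch below illustrates how such a string could be decoded. Only the `CLUS128` case is confirmed by the log; the `PCA`, `NORM`, and `SPHERICAL` tokens are assumptions inferred from the field names, and `parse_faiss_spec` is a hypothetical helper, not the fairseq function itself.

```python
from collections import namedtuple

# Field names copied from the faiss_spec(...) repr printed in the log.
faiss_spec = namedtuple("faiss_spec", ["pca", "norm", "n_clus", "sphere", "spec_str"])


def parse_faiss_spec(spec_str):
    """Decode one underscore-separated spec string such as 'CLUS128'."""
    pca, norm, n_clus, sphere = 0, False, 0, False
    for comp in spec_str.split("_"):
        if comp.startswith("PCA"):      # assumed token: PCA<dim>
            pca = int(comp[3:])
        elif comp == "NORM":            # assumed token: normalise features
            norm = True
        elif comp.startswith("CLUS"):   # confirmed by the log: CLUS<k>
            n_clus = int(comp[4:])
        elif comp == "SPHERICAL":       # assumed token: spherical k-means
            sphere = True
    if n_clus <= 0:
        raise ValueError(f"spec {spec_str!r} does not set a cluster count")
    return faiss_spec(pca, norm, n_clus, sphere, spec_str)


print(parse_faiss_spec("CLUS128"))
# prints: faiss_spec(pca=0, norm=False, n_clus=128, sphere=False, spec_str='CLUS128')
```

Read this way, the run above clusters the raw 1024-dimensional features into 128 clusters with no PCA or normalisation before k-means; the 512-dimensional PCA reported at the start of the log is a separate step.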