add some code
24
managed_components/78__esp-opus/dnn/LPCNet.yml
Normal file
@@ -0,0 +1,24 @@
#
# install
# conda env create -f=LPCNet.yml
#
# update
# conda env update -f=LPCNet.yml
#
# activate
# conda activate LPCNet
#
# remove
# conda remove --name LPCNet --all
#
name: LPCNet
channels:
  - anaconda
  - conda-forge
dependencies:
  - keras==2.2.4
  - python>=3.6
  - tensorflow-gpu==1.12.0
  - cudatoolkit
  - h5py
  - numpy
1
managed_components/78__esp-opus/dnn/README
Normal file
@@ -0,0 +1 @@
See README.md
126
managed_components/78__esp-opus/dnn/README.md
Normal file
@@ -0,0 +1,126 @@
# LPCNet

Low complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:

- J.-M. Valin, J. Skoglund, [LPCNet: Improving Neural Speech Synthesis Through Linear Prediction](https://jmvalin.ca/papers/lpcnet_icassp2019.pdf), *Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, arXiv:1810.11846, 2019.
- J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet](https://jmvalin.ca/papers/improved_lpcnet.pdf), *Proc. ICASSP*, arXiv:2106.04129, 2022.
- K. Subramani, J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation](https://jmvalin.ca/papers/lpcnet_end2end.pdf), *Proc. INTERSPEECH*, arXiv:2106.04129, 2022.

For coding/PLC applications of LPCNet, see:

- J.-M. Valin, J. Skoglund, [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://jmvalin.ca/papers/lpcnet_codec.pdf), *Proc. INTERSPEECH*, arXiv:1903.12087, 2019.
- J. Skoglund, J.-M. Valin, [Improving Opus Low Bit Rate Quality with Neural Speech Synthesis](https://jmvalin.ca/papers/opusnet.pdf), *Proc. INTERSPEECH*, arXiv:1905.04628, 2020.
- J.-M. Valin, A. Mustafa, C. Montgomery, T.B. Terriberry, M. Klingbeil, P. Smaragdis, A. Krishnaswamy, [Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model](https://jmvalin.ca/papers/lpcnet_plc.pdf), *Proc. INTERSPEECH*, arXiv:2205.05785, 2022.
- J.-M. Valin, J. Büthe, A. Mustafa, [Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder](https://jmvalin.ca/papers/valin_dred.pdf), *Proc. ICASSP*, arXiv:2212.04453, 2023. ([blog post](https://www.amazon.science/blog/neural-encoding-enables-more-efficient-recovery-of-lost-audio-packets))

# Introduction

Work-in-progress software for researching low CPU complexity algorithms for speech synthesis and compression by applying linear prediction techniques to WaveRNN. High quality speech can be synthesised on regular CPUs (around 3 GFLOP) with SIMD support (SSE2, SSSE3, AVX, AVX2/FMA and NEON are currently supported). The code also supports very low bitrate compression at 1.6 kb/s.

The BSD-licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.

This software is an open source starting point for LPCNet/WaveRNN-based speech synthesis and coding.

# Using the existing software

You can build the code using:

```
./autogen.sh
./configure
make
```
Note that the autogen.sh script is used when building from Git and will automatically download the latest model
(models are too large to put in Git). By default, LPCNet will attempt to use 8-bit dot product instructions on AVX\*/Neon to
speed up inference. To disable that (e.g. to avoid quantization effects when retraining), add --disable-dot-product to the
configure script. LPCNet does not yet have a complete implementation of some of the integer operations on the ARMv7
architecture, so for now you will also need --disable-dot-product to compile successfully on 32-bit ARM.

It is highly recommended to set the CFLAGS environment variable to enable AVX or NEON *prior* to running configure, otherwise
no vectorization will take place and the code will be very slow. On a recent x86 CPU, something like
```
export CFLAGS='-Ofast -g -march=native'
```
should work. On ARM, you can enable Neon with:
```
export CFLAGS='-Ofast -g -mfpu=neon'
```
While not strictly required, the -Ofast flag will help with auto-vectorization, especially for dot products that
cannot be optimized without -ffast-math (which -Ofast enables). Additionally, -falign-loops=32 has been shown to
help on x86.

You can test the capabilities of LPCNet using the lpcnet\_demo application. To encode a file:
```
./lpcnet_demo -encode input.pcm compressed.bin
```
where input.pcm is a 16-bit (machine-endian) PCM file sampled at 16 kHz. The raw compressed data (no header)
is written to compressed.bin and consists of 8 bytes per 40-ms packet.
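As a quick sanity check on the numbers above, 8 bytes per 40-ms packet works out to exactly the advertised 1.6 kb/s, a 160x reduction relative to the 16 kHz, 16-bit input:

```python
# Verify the rate implied by "8 bytes per 40-ms packet".
packet_bytes = 8
packet_ms = 40

bitrate_bps = packet_bytes * 8 * (1000 / packet_ms)
input_bytes_per_s = 16000 * 2                         # sample rate * bytes per sample
compressed_bytes_per_s = packet_bytes * (1000 // packet_ms)

print(bitrate_bps)                                    # 1600.0 bits/s, i.e. 1.6 kb/s
print(input_bytes_per_s / compressed_bytes_per_s)     # 160.0
```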

To decode:
```
./lpcnet_demo -decode compressed.bin output.pcm
```
where output.pcm is also 16-bit, 16 kHz PCM.

Alternatively, you can run the uncompressed analysis/synthesis using -features
instead of -encode and -synthesis instead of -decode.
The same functionality is available in the form of a library. See include/lpcnet.h for the API.

To try packet loss concealment (PLC), you first need a PLC model, which you can get with:
```
./download_model.sh plc-3b1eab4
```
or (for the PLC challenge submission):
```
./download_model.sh plc_challenge
```
PLC can be tested with:
```
./lpcnet_demo -plc_file noncausal_dc error_pattern.txt input.pcm output.pcm
```
where error_pattern.txt is a text file with one entry per 20-ms packet, with 1 meaning "packet lost" and 0 meaning "packet not lost".
noncausal_dc is the non-causal (5-ms look-ahead) variant with special handling for DC offsets. It's also possible to use "noncausal", "causal",
or "causal_dc".
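Since the error pattern is just one 0/1 flag per 20-ms packet, it is easy to generate programmatically. A minimal sketch (the 10% loss rate, fixed seed, duration, and file name are arbitrary choices for illustration, not part of the tool):

```python
import random

# One flag per 20-ms packet: 1 = packet lost, 0 = packet received.
random.seed(0)                       # fixed seed so the pattern is reproducible
duration_s = 10                      # assumed length of the speech file being tested
packets = duration_s * 1000 // 20    # 50 packets per second of audio
flags = [1 if random.random() < 0.10 else 0 for _ in range(packets)]

with open("error_pattern.txt", "w") as f:
    f.write("\n".join(str(flag) for flag in flags) + "\n")
```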

# Training a new model

This codebase is also meant for research, and it is possible to train new models. These are the steps to do that:

1. Set up a Keras system with GPU.

1. Generate training data:
```
./dump_data -train input.s16 features.f32 data.s16
```
where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.
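dump_data expects headerless PCM, so WAV recordings need their header stripped first. A standard-library sketch (it synthesizes a one-second test tone instead of assuming a real recording exists; real input must already be 16 kHz, 16-bit mono, and the file names are hypothetical):

```python
import math
import struct
import wave

rate = 16000

# Synthesize 1 s of a 440 Hz tone as a 16 kHz, 16-bit mono WAV (stand-in for real speech).
samples = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
with wave.open("input.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Strip the WAV header: dump_data wants only the raw little-endian 16-bit samples.
with wave.open("input.wav", "rb") as w:
    assert w.getframerate() == rate and w.getsampwidth() == 2 and w.getnchannels() == 1
    pcm = w.readframes(w.getnframes())
with open("input.s16", "wb") as f:
    f.write(pcm)
```

The resulting input.s16 can then be fed to the `./dump_data -train` command above.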

1. Now that you have your files, train with:
```
python3 training_tf2/train_lpcnet.py features.f32 data.s16 model_name
```
and it will generate an h5 file for each iteration, with model\_name as prefix. If it stops with a
"Failed to allocate RNN reserve space" message, try specifying a smaller --batch-size for train\_lpcnet.py.

1. You can synthesise speech with Python and your GPU card (very slow):
```
./dump_data -test test_input.s16 test_features.f32
./training_tf2/test_lpcnet.py lpcnet_model_name.h5 test_features.f32 test.s16
```

1. Or with C on a CPU (C inference is much faster).
First extract the model files nnet\_data.h and nnet\_data.c:
```
./training_tf2/dump_lpcnet.py lpcnet_model_name.h5
```
and move the generated nnet\_data.\* files to the src/ directory.
Then you just need to rebuild the software and use lpcnet\_demo as explained above.

# Speech Material for Training

Suitable training material can be obtained from [Open Speech and Language Resources](https://www.openslr.org/). See the datasets.txt file for details on suitable training data.

# Reading Further

1. [LPCNet: DSP-Boosted Neural Speech Synthesis](https://people.xiph.org/~jm/demo/lpcnet/)
1. [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://people.xiph.org/~jm/demo/lpcnet_codec/)
1. Sample model files (check compatibility): https://media.xiph.org/lpcnet/data/
449
managed_components/78__esp-opus/dnn/adaconvtest.c
Normal file
@@ -0,0 +1,449 @@
#include "lace_data.h"
#include "nolace_data.h"
#include "osce.h"
#include "nndsp.h"

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

extern const WeightArray lacelayers_arrays[];
extern const WeightArray nolacelayers_arrays[];

void adaconv_compare(
    const char *prefix,
    int num_frames,
    AdaConvState *hAdaConv,
    LinearLayer *kernel_layer,
    LinearLayer *gain_layer,
    int feature_dim,
    int frame_size,
    int overlap_size,
    int in_channels,
    int out_channels,
    int kernel_size,
    int left_padding,
    float filter_gain_a,
    float filter_gain_b,
    float shape_gain
)
{
    char feature_file[256];
    char x_in_file[256];
    char x_out_file[256];
    char message[512];
    int i_frame, i_sample;
    float mse;
    float features[512];
    float x_in[512];
    float x_out_ref[512];
    float x_out[512];
    float window[40];

    init_adaconv_state(hAdaConv);
    compute_overlap_window(window, 40);

    FILE *f_features, *f_x_in, *f_x_out;

    strcpy(feature_file, prefix);
    strcat(feature_file, "_features.f32");
    f_features = fopen(feature_file, "r");
    if (f_features == NULL)
    {
        sprintf(message, "could not open file %s", feature_file);
        perror(message);
        exit(1);
    }

    strcpy(x_in_file, prefix);
    strcat(x_in_file, "_x_in.f32");
    f_x_in = fopen(x_in_file, "r");
    if (f_x_in == NULL)
    {
        sprintf(message, "could not open file %s", x_in_file);
        perror(message);
        exit(1);
    }

    strcpy(x_out_file, prefix);
    strcat(x_out_file, "_x_out.f32");
    f_x_out = fopen(x_out_file, "r");
    if (f_x_out == NULL)
    {
        sprintf(message, "could not open file %s", x_out_file);
        perror(message);
        exit(1);
    }

    for (i_frame = 0; i_frame < num_frames; i_frame++)
    {
        if (fread(features, sizeof(float), feature_dim, f_features) != feature_dim)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, feature_file);
            exit(1);
        }

        if (fread(x_in, sizeof(float), frame_size * in_channels, f_x_in) != frame_size * in_channels)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_in_file);
            exit(1);
        }

        if (fread(x_out_ref, sizeof(float), frame_size * out_channels, f_x_out) != frame_size * out_channels)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_out_file);
            exit(1);
        }

        adaconv_process_frame(hAdaConv, x_out, x_in, features, kernel_layer, gain_layer, feature_dim,
            frame_size, overlap_size, in_channels, out_channels, kernel_size, left_padding,
            filter_gain_a, filter_gain_b, shape_gain, window, 0);

        mse = 0;
        for (i_sample = 0; i_sample < frame_size * out_channels; i_sample++)
        {
            mse += pow(x_out_ref[i_sample] - x_out[i_sample], 2);
        }
        mse = sqrt(mse / (frame_size * out_channels));
        printf("rmse[%d] %f\n", i_frame, mse);
    }
}

void adacomb_compare(
    const char *prefix,
    int num_frames,
    AdaCombState *hAdaComb,
    LinearLayer *kernel_layer,
    LinearLayer *gain_layer,
    LinearLayer *global_gain_layer,
    int feature_dim,
    int frame_size,
    int overlap_size,
    int kernel_size,
    int left_padding,
    float filter_gain_a,
    float filter_gain_b,
    float log_gain_limit
)
{
    char feature_file[256];
    char x_in_file[256];
    char p_in_file[256];
    char x_out_file[256];
    char message[512];
    int i_frame, i_sample;
    float mse;
    float features[512];
    float x_in[512];
    float x_out_ref[512];
    float x_out[512];
    int pitch_lag;
    float window[40];

    init_adacomb_state(hAdaComb);
    compute_overlap_window(window, 40);

    FILE *f_features, *f_x_in, *f_p_in, *f_x_out;

    strcpy(feature_file, prefix);
    strcat(feature_file, "_features.f32");
    f_features = fopen(feature_file, "r");
    if (f_features == NULL)
    {
        sprintf(message, "could not open file %s", feature_file);
        perror(message);
        exit(1);
    }

    strcpy(x_in_file, prefix);
    strcat(x_in_file, "_x_in.f32");
    f_x_in = fopen(x_in_file, "r");
    if (f_x_in == NULL)
    {
        sprintf(message, "could not open file %s", x_in_file);
        perror(message);
        exit(1);
    }

    strcpy(p_in_file, prefix);
    strcat(p_in_file, "_p_in.s32");
    f_p_in = fopen(p_in_file, "r");
    if (f_p_in == NULL)
    {
        sprintf(message, "could not open file %s", p_in_file);
        perror(message);
        exit(1);
    }

    strcpy(x_out_file, prefix);
    strcat(x_out_file, "_x_out.f32");
    f_x_out = fopen(x_out_file, "r");
    if (f_x_out == NULL)
    {
        sprintf(message, "could not open file %s", x_out_file);
        perror(message);
        exit(1);
    }

    for (i_frame = 0; i_frame < num_frames; i_frame++)
    {
        if (fread(features, sizeof(float), feature_dim, f_features) != feature_dim)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, feature_file);
            exit(1);
        }

        if (fread(x_in, sizeof(float), frame_size, f_x_in) != frame_size)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_in_file);
            exit(1);
        }

        if (fread(&pitch_lag, sizeof(int), 1, f_p_in) != 1)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, p_in_file);
            exit(1);
        }

        if (fread(x_out_ref, sizeof(float), frame_size, f_x_out) != frame_size)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_out_file);
            exit(1);
        }

        adacomb_process_frame(hAdaComb, x_out, x_in, features, kernel_layer, gain_layer, global_gain_layer,
            pitch_lag, feature_dim, frame_size, overlap_size, kernel_size, left_padding,
            filter_gain_a, filter_gain_b, log_gain_limit, window, 0);

        mse = 0;
        for (i_sample = 0; i_sample < frame_size; i_sample++)
        {
            mse += pow(x_out_ref[i_sample] - x_out[i_sample], 2);
        }
        mse = sqrt(mse / frame_size);
        printf("rmse[%d] %f\n", i_frame, mse);
    }
}

void adashape_compare(
    const char *prefix,
    int num_frames,
    AdaShapeState *hAdaShape,
    LinearLayer *alpha1,
    LinearLayer *alpha2,
    int feature_dim,
    int frame_size,
    int avg_pool_k
)
{
    char feature_file[256];
    char x_in_file[256];
    char x_out_file[256];
    char message[512];
    int i_frame, i_sample;
    float mse;
    float features[512];
    float x_in[512];
    float x_out_ref[512];
    float x_out[512];

    init_adashape_state(hAdaShape);

    FILE *f_features, *f_x_in, *f_x_out;

    strcpy(feature_file, prefix);
    strcat(feature_file, "_features.f32");
    f_features = fopen(feature_file, "r");
    if (f_features == NULL)
    {
        sprintf(message, "could not open file %s", feature_file);
        perror(message);
        exit(1);
    }

    strcpy(x_in_file, prefix);
    strcat(x_in_file, "_x_in.f32");
    f_x_in = fopen(x_in_file, "r");
    if (f_x_in == NULL)
    {
        sprintf(message, "could not open file %s", x_in_file);
        perror(message);
        exit(1);
    }

    strcpy(x_out_file, prefix);
    strcat(x_out_file, "_x_out.f32");
    f_x_out = fopen(x_out_file, "r");
    if (f_x_out == NULL)
    {
        sprintf(message, "could not open file %s", x_out_file);
        perror(message);
        exit(1);
    }

    for (i_frame = 0; i_frame < num_frames; i_frame++)
    {
        if (fread(features, sizeof(float), feature_dim, f_features) != feature_dim)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, feature_file);
            exit(1);
        }

        if (fread(x_in, sizeof(float), frame_size, f_x_in) != frame_size)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_in_file);
            exit(1);
        }

        if (fread(x_out_ref, sizeof(float), frame_size, f_x_out) != frame_size)
        {
            fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_out_file);
            exit(1);
        }

        adashape_process_frame(hAdaShape, x_out, x_in, features, alpha1, alpha2, feature_dim,
            frame_size, avg_pool_k, 0);

        mse = 0;
        for (i_sample = 0; i_sample < frame_size; i_sample++)
        {
            mse += pow(x_out_ref[i_sample] - x_out[i_sample], 2);
        }
        mse = sqrt(mse / frame_size);
        printf("rmse[%d] %f\n", i_frame, mse);
    }
}

int main(void)
{
    LACELayers hLACE;
    NOLACELayers hNoLACE;

    AdaConvState hAdaConv;
    AdaCombState hAdaComb;
    AdaShapeState hAdaShape;

    init_adaconv_state(&hAdaConv);

    init_lacelayers(&hLACE, lacelayers_arrays);
    init_nolacelayers(&hNoLACE, nolacelayers_arrays);

    printf("\ntesting lace.af1 (1 in, 1 out)...\n");
    adaconv_compare(
        "testvectors/lace_af1",
        5,
        &hAdaConv,
        &hLACE.lace_af1_kernel,
        &hLACE.lace_af1_gain,
        LACE_AF1_FEATURE_DIM,
        LACE_AF1_FRAME_SIZE,
        LACE_AF1_OVERLAP_SIZE,
        LACE_AF1_IN_CHANNELS,
        LACE_AF1_OUT_CHANNELS,
        LACE_AF1_KERNEL_SIZE,
        LACE_AF1_LEFT_PADDING,
        LACE_AF1_FILTER_GAIN_A,
        LACE_AF1_FILTER_GAIN_B,
        LACE_AF1_SHAPE_GAIN
    );

    printf("\ntesting nolace.af1 (1 in, 2 out)...\n");
    adaconv_compare(
        "testvectors/nolace_af1",
        5,
        &hAdaConv,
        &hNoLACE.nolace_af1_kernel,
        &hNoLACE.nolace_af1_gain,
        NOLACE_AF1_FEATURE_DIM,
        NOLACE_AF1_FRAME_SIZE,
        NOLACE_AF1_OVERLAP_SIZE,
        NOLACE_AF1_IN_CHANNELS,
        NOLACE_AF1_OUT_CHANNELS,
        NOLACE_AF1_KERNEL_SIZE,
        NOLACE_AF1_LEFT_PADDING,
        NOLACE_AF1_FILTER_GAIN_A,
        NOLACE_AF1_FILTER_GAIN_B,
        NOLACE_AF1_SHAPE_GAIN
    );

    printf("testing nolace.af4 (2 in, 1 out)...\n");
    adaconv_compare(
        "testvectors/nolace_af4",
        5,
        &hAdaConv,
        &hNoLACE.nolace_af4_kernel,
        &hNoLACE.nolace_af4_gain,
        NOLACE_AF4_FEATURE_DIM,
        NOLACE_AF4_FRAME_SIZE,
        NOLACE_AF4_OVERLAP_SIZE,
        NOLACE_AF4_IN_CHANNELS,
        NOLACE_AF4_OUT_CHANNELS,
        NOLACE_AF4_KERNEL_SIZE,
        NOLACE_AF4_LEFT_PADDING,
        NOLACE_AF4_FILTER_GAIN_A,
        NOLACE_AF4_FILTER_GAIN_B,
        NOLACE_AF4_SHAPE_GAIN
    );

    printf("\ntesting nolace.af2 (2 in, 2 out)...\n");
    adaconv_compare(
        "testvectors/nolace_af2",
        5,
        &hAdaConv,
        &hNoLACE.nolace_af2_kernel,
        &hNoLACE.nolace_af2_gain,
        NOLACE_AF2_FEATURE_DIM,
        NOLACE_AF2_FRAME_SIZE,
        NOLACE_AF2_OVERLAP_SIZE,
        NOLACE_AF2_IN_CHANNELS,
        NOLACE_AF2_OUT_CHANNELS,
        NOLACE_AF2_KERNEL_SIZE,
        NOLACE_AF2_LEFT_PADDING,
        NOLACE_AF2_FILTER_GAIN_A,
        NOLACE_AF2_FILTER_GAIN_B,
        NOLACE_AF2_SHAPE_GAIN
    );

    printf("\ntesting lace.cf1...\n");
    adacomb_compare(
        "testvectors/lace_cf1",
        5,
        &hAdaComb,
        &hLACE.lace_cf1_kernel,
        &hLACE.lace_cf1_gain,
        &hLACE.lace_cf1_global_gain,
        LACE_CF1_FEATURE_DIM,
        LACE_CF1_FRAME_SIZE,
        LACE_CF1_OVERLAP_SIZE,
        LACE_CF1_KERNEL_SIZE,
        LACE_CF1_LEFT_PADDING,
        LACE_CF1_FILTER_GAIN_A,
        LACE_CF1_FILTER_GAIN_B,
        LACE_CF1_LOG_GAIN_LIMIT
    );

    printf("\ntesting nolace.tdshape1...\n");
    adashape_compare(
        "testvectors/nolace_tdshape1",
        5,
        &hAdaShape,
        &hNoLACE.nolace_tdshape1_alpha1,
        &hNoLACE.nolace_tdshape1_alpha2,
        NOLACE_TDSHAPE1_FEATURE_DIM,
        NOLACE_TDSHAPE1_FRAME_SIZE,
        NOLACE_TDSHAPE1_AVG_POOL_K
    );

    return 0;
}

/* gcc -DVAR_ARRAYS -DENABLE_OSCE -I ../include -I ../silk -I . -I ../celt adaconvtest.c nndsp.c lace_data.c nolace_data.c nnet.c nnet_default.c ../celt/pitch.c ../celt/celt_lpc.c parse_lpcnet_weights.c -lm -o adaconvtest */
88
managed_components/78__esp-opus/dnn/arm/arm_dnn_map.c
Normal file
@@ -0,0 +1,88 @@
/* Copyright (c) 2018-2019 Mozilla
                 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "arm/armcpu.h"
#include "nnet.h"

#if defined(OPUS_HAVE_RTCD)

#if (defined(OPUS_ARM_MAY_HAVE_DOTPROD) && !defined(OPUS_ARM_PRESUME_DOTPROD))

void (*const DNN_COMPUTE_LINEAR_IMPL[OPUS_ARCHMASK + 1])(
    const LinearLayer *linear,
    float *out,
    const float *in
) = {
    compute_linear_c, /* default */
    compute_linear_c,
    compute_linear_c,
    MAY_HAVE_NEON(compute_linear), /* neon */
    MAY_HAVE_DOTPROD(compute_linear) /* dotprod */
};

#endif

#if (defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON)) && !defined(OPUS_ARM_PRESUME_NEON)

void (*const DNN_COMPUTE_ACTIVATION_IMPL[OPUS_ARCHMASK + 1])(
    float *output,
    const float *input,
    int N,
    int activation
) = {
    compute_activation_c, /* default */
    compute_activation_c,
    compute_activation_c,
    MAY_HAVE_NEON(compute_activation), /* neon */
    MAY_HAVE_DOTPROD(compute_activation) /* dotprod */
};

void (*const DNN_COMPUTE_CONV2D_IMPL[OPUS_ARCHMASK + 1])(
    const Conv2dLayer *conv,
    float *out,
    float *mem,
    const float *in,
    int height,
    int hstride,
    int activation
) = {
    compute_conv2d_c, /* default */
    compute_conv2d_c,
    compute_conv2d_c,
    MAY_HAVE_NEON(compute_conv2d), /* neon */
    MAY_HAVE_DOTPROD(compute_conv2d) /* dotprod */
};

#endif

#endif
104
managed_components/78__esp-opus/dnn/arm/dnn_arm.h
Normal file
@@ -0,0 +1,104 @@
/* Copyright (c) 2011-2019 Mozilla
                 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DNN_ARM_H
#define DNN_ARM_H

#include "cpu_support.h"
#include "opus_types.h"

void compute_linear_dotprod(const LinearLayer *linear, float *out, const float *in);
void compute_linear_neon(const LinearLayer *linear, float *out, const float *in);

void compute_activation_neon(float *output, const float *input, int N, int activation);
void compute_activation_dotprod(float *output, const float *input, int N, int activation);

void compute_conv2d_neon(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation);
void compute_conv2d_dotprod(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation);

#if defined(OPUS_ARM_PRESUME_DOTPROD)

#define OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_dotprod(linear, out, in))

#elif defined(OPUS_ARM_PRESUME_NEON_INTR) && !defined(OPUS_ARM_MAY_HAVE_DOTPROD)

#define OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_neon(linear, out, in))

#elif defined(OPUS_HAVE_RTCD) && (defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON))

extern void (*const DNN_COMPUTE_LINEAR_IMPL[OPUS_ARCHMASK + 1])(
    const LinearLayer *linear,
    float *out,
    const float *in
);
#define OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) \
    ((*DNN_COMPUTE_LINEAR_IMPL[(arch) & OPUS_ARCHMASK])(linear, out, in))

#endif

#if defined(OPUS_ARM_PRESUME_NEON)

#define OVERRIDE_COMPUTE_ACTIVATION
#define compute_activation(output, input, N, activation, arch) ((void)(arch),compute_activation_neon(output, input, N, activation))
#define OVERRIDE_COMPUTE_CONV2D
#define compute_conv2d(conv, out, mem, in, height, hstride, activation, arch) ((void)(arch),compute_conv2d_neon(conv, out, mem, in, height, hstride, activation))

#elif defined(OPUS_HAVE_RTCD) && (defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON))

extern void (*const DNN_COMPUTE_ACTIVATION_IMPL[OPUS_ARCHMASK + 1])(
    float *output,
    const float *input,
    int N,
    int activation
);
#define OVERRIDE_COMPUTE_ACTIVATION
#define compute_activation(output, input, N, activation, arch) \
    ((*DNN_COMPUTE_ACTIVATION_IMPL[(arch) & OPUS_ARCHMASK])(output, input, N, activation))

extern void (*const DNN_COMPUTE_CONV2D_IMPL[OPUS_ARCHMASK + 1])(
    const Conv2dLayer *conv,
    float *out,
    float *mem,
    const float *in,
    int height,
    int hstride,
    int activation
);
#define OVERRIDE_COMPUTE_CONV2D
#define compute_conv2d(conv, out, mem, in, height, hstride, activation, arch) \
    ((*DNN_COMPUTE_CONV2D_IMPL[(arch) & OPUS_ARCHMASK])(conv, out, mem, in, height, hstride, activation))

#endif

#endif /* DNN_ARM_H */
38
managed_components/78__esp-opus/dnn/arm/nnet_dotprod.c
Normal file
@@ -0,0 +1,38 @@
/* Copyright (c) 2018-2019 Mozilla
                 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#ifndef __ARM_FEATURE_DOTPROD
#error nnet_dotprod.c is being compiled without DOTPROD enabled
#endif

#define RTCD_ARCH dotprod

#include "nnet_arch.h"
38
managed_components/78__esp-opus/dnn/arm/nnet_neon.c
Normal file
@@ -0,0 +1,38 @@
/* Copyright (c) 2018-2019 Mozilla
                 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#if !(defined(__ARM_NEON__) || defined(__ARM_NEON))
#error nnet_neon.c is being compiled without Neon enabled
#endif

#define RTCD_ARCH neon

#include "nnet_arch.h"
246
managed_components/78__esp-opus/dnn/burg.c
Normal file
@@ -0,0 +1,246 @@
/***********************************************************************
Copyright (c) 2006-2011, Skype Limited. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of Internet Society, IETF or IETF Trust, nor the
names of specific contributors, may be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
***********************************************************************/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <math.h>
#include <string.h>
#include <assert.h>

#include "arch.h"
#include "burg.h"

#define MAX_FRAME_SIZE 384 /* subfr_length * nb_subfr = ( 0.005 * 16000 + 16 ) * 4 = 384 */
#define SILK_MAX_ORDER_LPC 16
#define FIND_LPC_COND_FAC 1e-5f

/* sum of squares of a silk_float array, with result as double */
static double silk_energy_FLP(
    const float     *data,
    int             dataSize
)
{
    int i;
    double result;

    /* 4x unrolled loop */
    result = 0.0;
    for( i = 0; i < dataSize - 3; i += 4 ) {
        result += data[ i + 0 ] * (double)data[ i + 0 ] +
                  data[ i + 1 ] * (double)data[ i + 1 ] +
                  data[ i + 2 ] * (double)data[ i + 2 ] +
                  data[ i + 3 ] * (double)data[ i + 3 ];
    }

    /* add any remaining products */
    for( ; i < dataSize; i++ ) {
        result += data[ i ] * (double)data[ i ];
    }

    assert( result >= 0.0 );
    return result;
}

/* inner product of two silk_float arrays, with result as double */
static double silk_inner_product_FLP(
    const float     *data1,
    const float     *data2,
    int             dataSize
)
{
    int i;
    double result;

    /* 4x unrolled loop */
    result = 0.0;
    for( i = 0; i < dataSize - 3; i += 4 ) {
        result += data1[ i + 0 ] * (double)data2[ i + 0 ] +
                  data1[ i + 1 ] * (double)data2[ i + 1 ] +
                  data1[ i + 2 ] * (double)data2[ i + 2 ] +
                  data1[ i + 3 ] * (double)data2[ i + 3 ];
    }

    /* add any remaining products */
    for( ; i < dataSize; i++ ) {
        result += data1[ i ] * (double)data2[ i ];
    }

    return result;
}


/* Compute reflection coefficients from input signal */
float silk_burg_analysis(           /* O    returns residual energy */
    float           A[],            /* O    prediction coefficients (length order) */
    const float     x[],            /* I    input signal, length: nb_subfr*(D+L_sub) */
    const float     minInvGain,     /* I    minimum inverse prediction gain */
    const int       subfr_length,   /* I    input signal subframe length (incl. D preceding samples) */
    const int       nb_subfr,       /* I    number of subframes stacked in x */
    const int       D               /* I    order */
)
{
    int k, n, s, reached_max_gain;
    double C0, invGain, num, nrg_f, nrg_b, rc, Atmp, tmp1, tmp2;
    const float *x_ptr;
    double C_first_row[ SILK_MAX_ORDER_LPC ], C_last_row[ SILK_MAX_ORDER_LPC ];
    double CAf[ SILK_MAX_ORDER_LPC + 1 ], CAb[ SILK_MAX_ORDER_LPC + 1 ];
    double Af[ SILK_MAX_ORDER_LPC ];

    assert( subfr_length * nb_subfr <= MAX_FRAME_SIZE );

    /* Compute autocorrelations, added over subframes */
    C0 = silk_energy_FLP( x, nb_subfr * subfr_length );
    memset( C_first_row, 0, SILK_MAX_ORDER_LPC * sizeof( double ) );
    for( s = 0; s < nb_subfr; s++ ) {
        x_ptr = x + s * subfr_length;
        for( n = 1; n < D + 1; n++ ) {
            C_first_row[ n - 1 ] += silk_inner_product_FLP( x_ptr, x_ptr + n, subfr_length - n );
        }
    }
    memcpy( C_last_row, C_first_row, SILK_MAX_ORDER_LPC * sizeof( double ) );

    /* Initialize */
    CAb[ 0 ] = CAf[ 0 ] = C0 + FIND_LPC_COND_FAC * C0 + 1e-9f;
    invGain = 1.0f;
    reached_max_gain = 0;
    for( n = 0; n < D; n++ ) {
        /* Update first row of correlation matrix (without first element) */
        /* Update last row of correlation matrix (without last element, stored in reversed order) */
        /* Update C * Af */
        /* Update C * flipud(Af) (stored in reversed order) */
        for( s = 0; s < nb_subfr; s++ ) {
            x_ptr = x + s * subfr_length;
            tmp1 = x_ptr[ n ];
            tmp2 = x_ptr[ subfr_length - n - 1 ];
            for( k = 0; k < n; k++ ) {
                C_first_row[ k ] -= x_ptr[ n ] * x_ptr[ n - k - 1 ];
                C_last_row[ k ]  -= x_ptr[ subfr_length - n - 1 ] * x_ptr[ subfr_length - n + k ];
                Atmp = Af[ k ];
                tmp1 += x_ptr[ n - k - 1 ] * Atmp;
                tmp2 += x_ptr[ subfr_length - n + k ] * Atmp;
            }
            for( k = 0; k <= n; k++ ) {
                CAf[ k ] -= tmp1 * x_ptr[ n - k ];
                CAb[ k ] -= tmp2 * x_ptr[ subfr_length - n + k - 1 ];
            }
        }
        tmp1 = C_first_row[ n ];
        tmp2 = C_last_row[ n ];
        for( k = 0; k < n; k++ ) {
            Atmp = Af[ k ];
            tmp1 += C_last_row[ n - k - 1 ] * Atmp;
            tmp2 += C_first_row[ n - k - 1 ] * Atmp;
        }
        CAf[ n + 1 ] = tmp1;
        CAb[ n + 1 ] = tmp2;

        /* Calculate nominator and denominator for the next order reflection (parcor) coefficient */
        num = CAb[ n + 1 ];
        nrg_b = CAb[ 0 ];
        nrg_f = CAf[ 0 ];
        for( k = 0; k < n; k++ ) {
            Atmp = Af[ k ];
            num += CAb[ n - k ] * Atmp;
            nrg_b += CAb[ k + 1 ] * Atmp;
            nrg_f += CAf[ k + 1 ] * Atmp;
        }
        assert( nrg_f > 0.0 );
        assert( nrg_b > 0.0 );

        /* Calculate the next order reflection (parcor) coefficient */
        rc = -2.0 * num / ( nrg_f + nrg_b );
        assert( rc > -1.0 && rc < 1.0 );

        /* Update inverse prediction gain */
        tmp1 = invGain * ( 1.0 - rc * rc );
        if( tmp1 <= minInvGain ) {
            /* Max prediction gain exceeded; set reflection coefficient such that max prediction gain is exactly hit */
            rc = sqrt( 1.0 - minInvGain / invGain );
            if( num > 0 ) {
                /* Ensure adjusted reflection coefficients has the original sign */
                rc = -rc;
            }
            invGain = minInvGain;
            reached_max_gain = 1;
        } else {
            invGain = tmp1;
        }

        /* Update the AR coefficients */
        for( k = 0; k < (n + 1) >> 1; k++ ) {
            tmp1 = Af[ k ];
            tmp2 = Af[ n - k - 1 ];
            Af[ k ]         = tmp1 + rc * tmp2;
            Af[ n - k - 1 ] = tmp2 + rc * tmp1;
        }
        Af[ n ] = rc;

        if( reached_max_gain ) {
            /* Reached max prediction gain; set remaining coefficients to zero and exit loop */
            for( k = n + 1; k < D; k++ ) {
                Af[ k ] = 0.0;
            }
            break;
        }

        /* Update C * Af and C * Ab */
        for( k = 0; k <= n + 1; k++ ) {
            tmp1 = CAf[ k ];
            CAf[ k ]         += rc * CAb[ n - k + 1 ];
            CAb[ n - k + 1 ] += rc * tmp1;
        }
    }

    if( reached_max_gain ) {
        /* Convert to float */
        for( k = 0; k < D; k++ ) {
            A[ k ] = (float)( -Af[ k ] );
        }
        /* Subtract energy of preceding samples from C0 */
        for( s = 0; s < nb_subfr; s++ ) {
            C0 -= silk_energy_FLP( x + s * subfr_length, D );
        }
        /* Approximate residual energy */
        nrg_f = C0 * invGain;
    } else {
        /* Compute residual energy and store coefficients as float */
        nrg_f = CAf[ 0 ];
        tmp1 = 1.0;
        for( k = 0; k < D; k++ ) {
            Atmp = Af[ k ];
            nrg_f += CAf[ k + 1 ] * Atmp;
            tmp1 += Atmp * Atmp;
            A[ k ] = (float)(-Atmp);
        }
        nrg_f -= FIND_LPC_COND_FAC * C0 * tmp1;
    }

    /* Return residual energy */
    return MAX32(0, (float)nrg_f);
}
41
managed_components/78__esp-opus/dnn/burg.h
Normal file
@@ -0,0 +1,41 @@
/***********************************************************************
Copyright (c) 2006-2011, Skype Limited. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of Internet Society, IETF or IETF Trust, nor the
names of specific contributors, may be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
***********************************************************************/

#ifndef BURG_H
#define BURG_H


float silk_burg_analysis(           /* O    returns residual energy */
    float           A[],            /* O    prediction coefficients (length order) */
    const float     x[],            /* I    input signal, length: nb_subfr*(D+L_sub) */
    const float     minInvGain,     /* I    minimum inverse prediction gain */
    const int       subfr_length,   /* I    input signal subframe length (incl. D preceding samples) */
    const int       nb_subfr,       /* I    number of subframes stacked in x */
    const int       D               /* I    order */
);

#endif
56
managed_components/78__esp-opus/dnn/common.h
Normal file
@@ -0,0 +1,56 @@
#ifndef COMMON_H
#define COMMON_H

#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "opus_defines.h"

#define LOG256 5.5451774445f
static OPUS_INLINE float log2_approx(float x)
{
    int integer;
    float frac;
    union {
        float f;
        int i;
    } in;
    in.f = x;
    integer = (in.i>>23)-127;
    in.i -= integer<<23;
    frac = in.f - 1.5f;
    frac = -0.41445418f + frac*(0.95909232f
           + frac*(-0.33951290f + frac*0.16541097f));
    return 1+integer+frac;
}

#define log_approx(x) (0.69315f*log2_approx(x))

static OPUS_INLINE float ulaw2lin(float u)
{
    float s;
    float scale_1 = 32768.f/255.f;
    u = u - 128.f;
    s = u >= 0.f ? 1.f : -1.f;
    u = fabs(u);
    return s*scale_1*(exp(u/128.*LOG256)-1);
}

static OPUS_INLINE int lin2ulaw(float x)
{
    float u;
    float scale = 255.f/32768.f;
    int s = x >= 0 ? 1 : -1;
    x = fabs(x);
    u = (s*(128*log_approx(1+scale*x)/LOG256));
    u = 128 + u;
    if (u < 0) u = 0;
    if (u > 255) u = 255;
    return (int)floor(.5 + u);
}



#endif
163
managed_components/78__esp-opus/dnn/datasets.txt
Normal file
@@ -0,0 +1,163 @@
The following datasets can be used to train a language-independent FARGAN model
and a Deep REDundancy (DRED) model. Note that this data typically needs to be
resampled before it can be used.

https://www.openslr.org/resources/30/si_lk.tar.gz
https://www.openslr.org/resources/32/af_za.tar.gz
https://www.openslr.org/resources/32/st_za.tar.gz
https://www.openslr.org/resources/32/tn_za.tar.gz
https://www.openslr.org/resources/32/xh_za.tar.gz
https://www.openslr.org/resources/37/bn_bd.zip
https://www.openslr.org/resources/37/bn_in.zip
https://www.openslr.org/resources/41/jv_id_female.zip
https://www.openslr.org/resources/41/jv_id_male.zip
https://www.openslr.org/resources/42/km_kh_male.zip
https://www.openslr.org/resources/43/ne_np_female.zip
https://www.openslr.org/resources/44/su_id_female.zip
https://www.openslr.org/resources/44/su_id_male.zip
https://www.openslr.org/resources/61/es_ar_female.zip
https://www.openslr.org/resources/61/es_ar_male.zip
https://www.openslr.org/resources/63/ml_in_female.zip
https://www.openslr.org/resources/63/ml_in_male.zip
https://www.openslr.org/resources/64/mr_in_female.zip
https://www.openslr.org/resources/65/ta_in_female.zip
https://www.openslr.org/resources/65/ta_in_male.zip
https://www.openslr.org/resources/66/te_in_female.zip
https://www.openslr.org/resources/66/te_in_male.zip
https://www.openslr.org/resources/69/ca_es_female.zip
https://www.openslr.org/resources/69/ca_es_male.zip
https://www.openslr.org/resources/70/en_ng_female.zip
https://www.openslr.org/resources/70/en_ng_male.zip
https://www.openslr.org/resources/71/es_cl_female.zip
https://www.openslr.org/resources/71/es_cl_male.zip
https://www.openslr.org/resources/72/es_co_female.zip
https://www.openslr.org/resources/72/es_co_male.zip
https://www.openslr.org/resources/73/es_pe_female.zip
https://www.openslr.org/resources/73/es_pe_male.zip
https://www.openslr.org/resources/74/es_pr_female.zip
https://www.openslr.org/resources/75/es_ve_female.zip
https://www.openslr.org/resources/75/es_ve_male.zip
https://www.openslr.org/resources/76/eu_es_female.zip
https://www.openslr.org/resources/76/eu_es_male.zip
https://www.openslr.org/resources/77/gl_es_female.zip
https://www.openslr.org/resources/77/gl_es_male.zip
https://www.openslr.org/resources/78/gu_in_female.zip
https://www.openslr.org/resources/78/gu_in_male.zip
https://www.openslr.org/resources/79/kn_in_female.zip
https://www.openslr.org/resources/79/kn_in_male.zip
https://www.openslr.org/resources/80/my_mm_female.zip
https://www.openslr.org/resources/83/irish_english_male.zip
https://www.openslr.org/resources/83/midlands_english_female.zip
https://www.openslr.org/resources/83/midlands_english_male.zip
https://www.openslr.org/resources/83/northern_english_female.zip
https://www.openslr.org/resources/83/northern_english_male.zip
https://www.openslr.org/resources/83/scottish_english_female.zip
https://www.openslr.org/resources/83/scottish_english_male.zip
https://www.openslr.org/resources/83/southern_english_female.zip
https://www.openslr.org/resources/83/southern_english_male.zip
https://www.openslr.org/resources/83/welsh_english_female.zip
https://www.openslr.org/resources/83/welsh_english_male.zip
https://www.openslr.org/resources/86/yo_ng_female.zip
https://www.openslr.org/resources/86/yo_ng_male.zip

The corresponding citations for all these datasets are:

@inproceedings{demirsahin-etal-2020-open,
    title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
    author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    pages = {6532--6541},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
    ISBN = {979-10-95546-34-4},
}

@inproceedings{kjartansson-etal-2020-open,
    title = {{Open-Source High Quality Speech Datasets for Basque, Catalan and Galician}},
    author = {Kjartansson, Oddur and Gutkin, Alexander and Butryna, Alena and Demirsahin, Isin and Rivera, Clara},
    booktitle = {Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)},
    year = {2020},
    pages = {21--27},
    month = may,
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.sltu-1.3},
    ISBN = {979-10-95546-35-1},
}

@inproceedings{guevara-rukoz-etal-2020-crowdsourcing,
    title = {{Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech}},
    author = {Guevara-Rukoz, Adriana and Demirsahin, Isin and He, Fei and Chu, Shan-Hui Cathy and Sarin, Supheakmungkol and Pipatsrisawat, Knot and Gutkin, Alexander and Butryna, Alena and Kjartansson, Oddur},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    year = {2020},
    month = may,
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.801},
    pages = {6504--6513},
    ISBN = {979-10-95546-34-4},
}

@inproceedings{he-etal-2020-open,
    title = {{Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems}},
    author = {He, Fei and Chu, Shan-Hui Cathy and Kjartansson, Oddur and Rivera, Clara and Katanova, Anna and Gutkin, Alexander and Demirsahin, Isin and Johny, Cibu and Jansche, Martin and Sarin, Supheakmungkol and Pipatsrisawat, Knot},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    pages = {6494--6503},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.800},
    ISBN = {979-10-95546-34-4},
}

@inproceedings{kjartansson-etal-tts-sltu2018,
    title = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
    author = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
    booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
    year = {2018},
    address = {Gurugram, India},
    month = aug,
    pages = {66--70},
    url = {http://dx.doi.org/10.21437/SLTU.2018-14},
}

@inproceedings{oo-etal-2020-burmese,
    title = {{Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech}},
    author = {Oo, Yin May and Wattanavekin, Theeraphol and Li, Chenfang and De Silva, Pasindu and Sarin, Supheakmungkol and Pipatsrisawat, Knot and Jansche, Martin and Kjartansson, Oddur and Gutkin, Alexander},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    pages = {6328--6339},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.777},
    ISBN = {979-10-95546-34-4},
}

@inproceedings{van-niekerk-etal-2017,
    title = {{Rapid development of TTS corpora for four South African languages}},
    author = {Daniel van Niekerk and Charl van Heerden and Marelie Davel and Neil Kleynhans and Oddur Kjartansson and Martin Jansche and Linne Ha},
    booktitle = {Proc. Interspeech 2017},
    pages = {2178--2182},
    address = {Stockholm, Sweden},
    month = aug,
    year = {2017},
    url = {http://dx.doi.org/10.21437/Interspeech.2017-1139},
}

@inproceedings{gutkin-et-al-yoruba2020,
    title = {{Developing an Open-Source Corpus of Yoruba Speech}},
    author = {Alexander Gutkin and I{\c{s}}{\i}n Demir{\c{s}}ahin and Oddur Kjartansson and Clara Rivera and K\d{\'o}lá Túb\d{\`o}sún},
    booktitle = {Proceedings of Interspeech 2020},
    pages = {404--408},
    month = {October},
    year = {2020},
    address = {Shanghai, China},
    publisher = {International Speech and Communication Association (ISCA)},
    doi = {10.21437/Interspeech.2020-1096},
    url = {http://dx.doi.org/10.21437/Interspeech.2020-1096},
}
9
managed_components/78__esp-opus/dnn/download_model.bat
Normal file
@@ -0,0 +1,9 @@
@echo off
set model=opus_data-%1.tar.gz

if not exist %model% (
    echo Downloading latest model
    powershell -Command "(New-Object System.Net.WebClient).DownloadFile('https://media.xiph.org/opus/models/%model%', '%model%')"
)

tar -xvzf %model%
30
managed_components/78__esp-opus/dnn/download_model.sh
Normal file
@@ -0,0 +1,30 @@
#!/bin/sh
set -e

model=opus_data-$1.tar.gz

if [ ! -f $model ]; then
    echo "Downloading latest model"
    wget https://media.xiph.org/opus/models/$model
fi

if command -v sha256sum
then
    echo "Validating checksum"
    checksum="$1"
    checksum2=$(sha256sum $model | awk '{print $1}')
    if [ "$checksum" != "$checksum2" ]
    then
        echo "Aborting due to mismatching checksums. This could be caused by a corrupted download of $model."
        echo "Consider deleting local copy of $model and running this script again."
        exit 1
    else
        echo "checksums match"
    fi
else
    echo "Could not find sha256 sum; skipping verification. Please verify manually that sha256 hash of ${model} matches ${1}."
fi


tar xvomf $model
44
managed_components/78__esp-opus/dnn/dred_coding.c
Normal file
@@ -0,0 +1,44 @@
/* Copyright (c) 2022 Amazon
   Written by Jean-Marc Valin */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <math.h>

#include "celt/entenc.h"
#include "os_support.h"
#include "dred_config.h"
#include "dred_coding.h"

int compute_quantizer(int q0, int dQ, int qmax, int i) {
    int quant;
    static const int dQ_table[8] = {0, 2, 3, 4, 6, 8, 12, 16};
    quant = q0 + (dQ_table[dQ]*i + 8)/16;
    return quant > qmax ? qmax : quant;
}
36
managed_components/78__esp-opus/dnn/dred_coding.h
Normal file
@@ -0,0 +1,36 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_CODING_H
#define DRED_CODING_H

#include "opus_types.h"
#include "entcode.h"

int compute_quantizer(int q0, int dQ, int qmax, int i);

#endif
54
managed_components/78__esp-opus/dnn/dred_config.h
Normal file
@@ -0,0 +1,54 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_CONFIG_H
#define DRED_CONFIG_H

/* Change this once DRED gets an extension number assigned. */
#define DRED_EXTENSION_ID 126

/* Remove these two completely once DRED gets an extension number assigned. */
#define DRED_EXPERIMENTAL_VERSION 10
#define DRED_EXPERIMENTAL_BYTES 2


#define DRED_MIN_BYTES 8

/* these are in part duplicates of the values defined in dred_rdovae_constants.h */
#define DRED_SILK_ENCODER_DELAY (79+12-80)
#define DRED_FRAME_SIZE 160
#define DRED_DFRAME_SIZE (2 * (DRED_FRAME_SIZE))
#define DRED_MAX_DATA_SIZE 1000
#define DRED_ENC_Q0 6
#define DRED_ENC_Q1 15

/* Covers 1.04 second so we can cover one second, after the lookahead. */
#define DRED_MAX_LATENTS 26
#define DRED_NUM_REDUNDANCY_FRAMES (2*DRED_MAX_LATENTS)
#define DRED_MAX_FRAMES (4*DRED_MAX_LATENTS)

#endif
129
managed_components/78__esp-opus/dnn/dred_decoder.c
Normal file
@@ -0,0 +1,129 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include <string.h>

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "os_support.h"
#include "dred_decoder.h"
#include "dred_coding.h"
#include "celt/entdec.h"
#include "celt/laplace.h"
#include "dred_rdovae_stats_data.h"
#include "dred_rdovae_constants.h"

static void dred_decode_latents(ec_dec *dec, float *x, const opus_uint8 *scale, const opus_uint8 *r, const opus_uint8 *p0, int dim) {
    int i;
    for (i=0;i<dim;i++) {
        int q;
        if (r[i] == 0 || p0[i] == 255) q = 0;
        else q = ec_laplace_decode_p0(dec, p0[i]<<7, r[i]<<7);
        x[i] = q*256.f/(scale[i] == 0 ? 1 : scale[i]);
    }
}

int dred_ec_decode(OpusDRED *dec, const opus_uint8 *bytes, int num_bytes, int min_feature_frames, int dred_frame_offset)
{
    ec_dec ec;
    int q_level;
    int i;
    int offset;
    int q0;
    int dQ;
    int qmax;
    int state_qoffset;
    int extra_offset;

    /* since features are decoded in quadruples, it makes no sense to go with an uneven number of redundancy frames */
    celt_assert(DRED_NUM_REDUNDANCY_FRAMES % 2 == 0);

    /* decode initial state and initialize RDOVAE decoder */
    ec_dec_init(&ec, (unsigned char*)bytes, num_bytes);
    q0 = ec_dec_uint(&ec, 16);
    dQ = ec_dec_uint(&ec, 8);
    if (ec_dec_uint(&ec, 2)) extra_offset = 32*ec_dec_uint(&ec, 256);
    else extra_offset = 0;
    /* Compute total offset, including DRED position in a multiframe packet. */
    dec->dred_offset = 16 - ec_dec_uint(&ec, 32) - extra_offset + dred_frame_offset;
    /*printf("%d %d %d\n", dred_offset, q0, dQ);*/
    qmax = 15;
    if (q0 < 14 && dQ > 0) {
        int nvals;
        int ft;
        int s;
        /* The distribution for the dQmax symbol is split evenly between zero
           (which implies qmax == 15) and larger values, with the probability of
           all larger values being uniform.
           This is equivalent to coding 1 bit to decide if the maximum is less than
           15 followed by a uint to decide the actual value if it is less than
           15, but combined into a single symbol. */
        nvals = 15 - (q0 + 1);
        ft = 2*nvals;
        s = ec_decode(&ec, ft);
        if (s >= nvals) {
            qmax = q0 + (s - nvals) + 1;
            ec_dec_update(&ec, s, s + 1, ft);
        }
        else {
            ec_dec_update(&ec, 0, nvals, ft);
        }
    }
    state_qoffset = q0*DRED_STATE_DIM;
    dred_decode_latents(
        &ec,
        dec->state,
        dred_state_quant_scales_q8 + state_qoffset,
        dred_state_r_q8 + state_qoffset,
        dred_state_p0_q8 + state_qoffset,
        DRED_STATE_DIM);

    /* decode newest to oldest and store oldest to newest */
    for (i = 0; i < IMIN(DRED_NUM_REDUNDANCY_FRAMES, (min_feature_frames+1)/2); i += 2)
    {
        /* FIXME: Figure out how to avoid missing a last frame that would take up < 8 bits. */
        if (8*num_bytes - ec_tell(&ec) <= 7)
            break;
        q_level = compute_quantizer(q0, dQ, qmax, i/2);
        offset = q_level*DRED_LATENT_DIM;
        dred_decode_latents(
            &ec,
            &dec->latents[(i/2)*DRED_LATENT_DIM],
            dred_latent_quant_scales_q8 + offset,
            dred_latent_r_q8 + offset,
            dred_latent_p0_q8 + offset,
            DRED_LATENT_DIM
        );

        offset = 2 * i * DRED_NUM_FEATURES;
    }
    dec->process_stage = 1;
    dec->nb_latents = i/2;
    return i/2;
}
49
managed_components/78__esp-opus/dnn/dred_decoder.h
Normal file
@@ -0,0 +1,49 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_DECODER_H
#define DRED_DECODER_H

#include "opus.h"
#include "dred_config.h"
#include "dred_rdovae.h"
#include "entcode.h"
#include "dred_rdovae_constants.h"

struct OpusDRED {
    float fec_features[2*DRED_NUM_REDUNDANCY_FRAMES*DRED_NUM_FEATURES];
    float state[DRED_STATE_DIM];
    float latents[(DRED_NUM_REDUNDANCY_FRAMES/2)*DRED_LATENT_DIM];
    int nb_latents;
    int process_stage;
    int dred_offset;
};


int dred_ec_decode(OpusDRED *dec, const opus_uint8 *bytes, int num_bytes, int min_feature_frames, int dred_frame_offset);

#endif
363
managed_components/78__esp-opus/dnn/dred_encoder.c
Normal file
@@ -0,0 +1,363 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <string.h>

#if 0
#include <stdio.h>
#include <math.h>
#endif

#include "dred_encoder.h"
#include "dred_coding.h"
#include "celt/entenc.h"

#include "dred_decoder.h"
#include "float_cast.h"
#include "os_support.h"
#include "celt/laplace.h"
#include "dred_rdovae_stats_data.h"


static void DRED_rdovae_init_encoder(RDOVAEEncState *enc_state)
{
    memset(enc_state, 0, sizeof(*enc_state));
}

int dred_encoder_load_model(DREDEnc* enc, const void *data, int len)
{
    WeightArray *list;
    int ret;
    parse_weights(&list, data, len);
    ret = init_rdovaeenc(&enc->model, list);
    opus_free(list);
    if (ret == 0) {
        ret = lpcnet_encoder_load_model(&enc->lpcnet_enc_state, data, len);
    }
    if (ret == 0) enc->loaded = 1;
    return (ret == 0) ? OPUS_OK : OPUS_BAD_ARG;
}

void dred_encoder_reset(DREDEnc* enc)
{
    OPUS_CLEAR((char*)&enc->DREDENC_RESET_START,
               sizeof(DREDEnc)-
               ((char*)&enc->DREDENC_RESET_START - (char*)enc));
    enc->input_buffer_fill = DRED_SILK_ENCODER_DELAY;
    lpcnet_encoder_init(&enc->lpcnet_enc_state);
    DRED_rdovae_init_encoder(&enc->rdovae_enc);
}

void dred_encoder_init(DREDEnc* enc, opus_int32 Fs, int channels)
{
    enc->Fs = Fs;
    enc->channels = channels;
    enc->loaded = 0;
#ifndef USE_WEIGHTS_FILE
    if (init_rdovaeenc(&enc->model, rdovaeenc_arrays) == 0) enc->loaded = 1;
#endif
    dred_encoder_reset(enc);
}

static void dred_process_frame(DREDEnc *enc, int arch)
{
    float feature_buffer[2 * 36];
    float input_buffer[2*DRED_NUM_FEATURES] = {0};

    celt_assert(enc->loaded);
    /* shift latents buffer */
    OPUS_MOVE(enc->latents_buffer + DRED_LATENT_DIM, enc->latents_buffer, (DRED_MAX_FRAMES - 1) * DRED_LATENT_DIM);
    OPUS_MOVE(enc->state_buffer + DRED_STATE_DIM, enc->state_buffer, (DRED_MAX_FRAMES - 1) * DRED_STATE_DIM);

    /* calculate LPCNet features */
    lpcnet_compute_single_frame_features_float(&enc->lpcnet_enc_state, enc->input_buffer, feature_buffer, arch);
    lpcnet_compute_single_frame_features_float(&enc->lpcnet_enc_state, enc->input_buffer + DRED_FRAME_SIZE, feature_buffer + 36, arch);

    /* prepare input buffer (discard LPC coefficients) */
    OPUS_COPY(input_buffer, feature_buffer, DRED_NUM_FEATURES);
    OPUS_COPY(input_buffer + DRED_NUM_FEATURES, feature_buffer + 36, DRED_NUM_FEATURES);

    /* run RDOVAE encoder */
    dred_rdovae_encode_dframe(&enc->rdovae_enc, &enc->model, enc->latents_buffer, enc->state_buffer, input_buffer, arch);
    enc->latents_buffer_fill = IMIN(enc->latents_buffer_fill+1, DRED_NUM_REDUNDANCY_FRAMES);
}

void filter_df2t(const float *in, float *out, int len, float b0, const float *b, const float *a, int order, float *mem)
{
    int i;
    for (i=0;i<len;i++) {
        int j;
        float xi, yi, nyi;
        xi = in[i];
        yi = xi*b0 + mem[0];
        nyi = -yi;
        for (j=0;j<order;j++)
        {
            mem[j] = mem[j+1] + b[j]*xi + a[j]*nyi;
        }
        out[i] = yi;
        /*fprintf(stdout, "%f\n", out[i]);*/
    }
}

#define MAX_DOWNMIX_BUFFER (960*2)
static void dred_convert_to_16k(DREDEnc *enc, const float *in, int in_len, float *out, int out_len)
{
    float downmix[MAX_DOWNMIX_BUFFER];
    int i;
    int up;
    celt_assert(enc->channels*in_len <= MAX_DOWNMIX_BUFFER);
    celt_assert(in_len * (opus_int32)16000 == out_len * enc->Fs);
    switch(enc->Fs) {
        case 8000:
            up = 2;
            break;
        case 12000:
            up = 4;
            break;
        case 16000:
            up = 1;
            break;
        case 24000:
            up = 2;
            break;
        case 48000:
            up = 1;
            break;
        default:
            celt_assert(0);
    }
    OPUS_CLEAR(downmix, up*in_len);
    if (enc->channels == 1) {
        for (i=0;i<in_len;i++) downmix[up*i] = FLOAT2INT16(up*in[i]);
    } else {
        for (i=0;i<in_len;i++) downmix[up*i] = FLOAT2INT16(.5*up*(in[2*i]+in[2*i+1]));
    }
    if (enc->Fs == 16000) {
        OPUS_COPY(out, downmix, out_len);
    } else if (enc->Fs == 48000 || enc->Fs == 24000) {
        /* ellip(7, .2, 70, 7750/24000) */

        static const float filter_b[8] = { 0.005873358047f, 0.012980854831f, 0.014531340042f, 0.014531340042f, 0.012980854831f, 0.005873358047f, 0.004523418224f, 0.f};
        static const float filter_a[8] = {-3.878718597768f, 7.748834257468f, -9.653651699533f, 8.007342726666f, -4.379450178552f, 1.463182111810f, -0.231720677804f, 0.f};
        float b0 = 0.004523418224f;
        filter_df2t(downmix, downmix, up*in_len, b0, filter_b, filter_a, RESAMPLING_ORDER, enc->resample_mem);
        for (i=0;i<out_len;i++) out[i] = downmix[3*i];
    } else if (enc->Fs == 12000) {
        /* ellip(7, .2, 70, 7750/24000) */
        static const float filter_b[8] = {-0.001017101081f, 0.003673127243f, 0.001009165267f, 0.001009165267f, 0.003673127243f, -0.001017101081f, 0.002033596776f, 0.f};
        static const float filter_a[8] = {-4.930414411612f, 11.291643096504f, -15.322037343815f, 13.216403930898f, -7.220409219553f, 2.310550142771f, -0.334338618782f, 0.f};
        float b0 = 0.002033596776f;
        filter_df2t(downmix, downmix, up*in_len, b0, filter_b, filter_a, RESAMPLING_ORDER, enc->resample_mem);
        for (i=0;i<out_len;i++) out[i] = downmix[3*i];
    } else if (enc->Fs == 8000) {
        /* ellip(7, .2, 70, 3900/8000) */
        static const float filter_b[8] = { 0.081670120929f, 0.180401598565f, 0.259391051971f, 0.259391051971f, 0.180401598565f, 0.081670120929f, 0.020109185709f, 0.f};
        static const float filter_a[8] = {-1.393651933659f, 2.609789872676f, -2.403541968806f, 2.056814957331f, -1.148908574570f, 0.473001413788f, -0.110359852412f, 0.f};
        float b0 = 0.020109185709f;
        filter_df2t(downmix, out, out_len, b0, filter_b, filter_a, RESAMPLING_ORDER, enc->resample_mem);
    } else {
        celt_assert(0);
    }
}

void dred_compute_latents(DREDEnc *enc, const float *pcm, int frame_size, int extra_delay, int arch)
{
    int curr_offset16k;
    int frame_size16k = frame_size * 16000 / enc->Fs;
    celt_assert(enc->loaded);
    curr_offset16k = 40 + extra_delay*16000/enc->Fs - enc->input_buffer_fill;
    enc->dred_offset = (int)floor((curr_offset16k+20.f)/40.f);
    enc->latent_offset = 0;
    while (frame_size16k > 0) {
        int process_size16k;
        int process_size;
        process_size16k = IMIN(2*DRED_FRAME_SIZE, frame_size16k);
        process_size = process_size16k * enc->Fs / 16000;
        dred_convert_to_16k(enc, pcm, process_size, &enc->input_buffer[enc->input_buffer_fill], process_size16k);
        enc->input_buffer_fill += process_size16k;
        if (enc->input_buffer_fill >= 2*DRED_FRAME_SIZE)
        {
            curr_offset16k += 320;
            dred_process_frame(enc, arch);
            enc->input_buffer_fill -= 2*DRED_FRAME_SIZE;
            OPUS_MOVE(&enc->input_buffer[0], &enc->input_buffer[2*DRED_FRAME_SIZE], enc->input_buffer_fill);
            /* 15 ms (6*2.5 ms) is the ideal offset for DRED because it corresponds to our vocoder look-ahead. */
            if (enc->dred_offset < 6) {
                enc->dred_offset += 8;
            } else {
                enc->latent_offset++;
            }
        }

        pcm += process_size;
        frame_size16k -= process_size16k;
    }
}

static void dred_encode_latents(ec_enc *enc, const float *x, const opus_uint8 *scale, const opus_uint8 *dzone, const opus_uint8 *r, const opus_uint8 *p0, int dim, int arch) {
    int i;
    int q[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
    float xq[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
    float delta[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
    float deadzone[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
    float eps = .1f;
    /* This is split into multiple loops (with temporary arrays) so that the compiler
       can vectorize all of it, and so we can call the vector tanh(). */
    for (i=0;i<dim;i++) {
        delta[i] = dzone[i]*(1.f/256.f);
        xq[i] = x[i]*scale[i]*(1.f/256.f);
        deadzone[i] = xq[i]/(delta[i]+eps);
    }
    compute_activation(deadzone, deadzone, dim, ACTIVATION_TANH, arch);
    for (i=0;i<dim;i++) {
        xq[i] = xq[i] - delta[i]*deadzone[i];
        q[i] = (int)floor(.5f+xq[i]);
    }
    for (i=0;i<dim;i++) {
        /* Make the impossible actually impossible. */
        if (r[i] == 0 || p0[i] == 255) q[i] = 0;
        else ec_laplace_encode_p0(enc, q[i], p0[i]<<7, r[i]<<7);
    }
}

static int dred_voice_active(const unsigned char *activity_mem, int offset) {
    int i;
    for (i=0;i<16;i++) {
        if (activity_mem[8*offset + i] == 1) return 1;
    }
    return 0;
}

int dred_encode_silk_frame(DREDEnc *enc, unsigned char *buf, int max_chunks, int max_bytes, int q0, int dQ, int qmax, unsigned char *activity_mem, int arch) {
    ec_enc ec_encoder;

    int q_level;
    int i;
    int offset;
    int ec_buffer_fill;
    int state_qoffset;
    ec_enc ec_bak;
    int prev_active=0;
    int latent_offset;
    int extra_dred_offset=0;
    int dred_encoded=0;
    int delayed_dred=0;
    int total_offset;

    latent_offset = enc->latent_offset;
    /* Delaying new DRED data when just out of silence because we already have the
       main Opus payload for that frame. */
    if (activity_mem[0] && enc->last_extra_dred_offset>0) {
        latent_offset = enc->last_extra_dred_offset;
        delayed_dred = 1;
        enc->last_extra_dred_offset = 0;
    }
    while (latent_offset < enc->latents_buffer_fill && !dred_voice_active(activity_mem, latent_offset)) {
        latent_offset++;
        extra_dred_offset++;
    }
    if (!delayed_dred) enc->last_extra_dred_offset = extra_dred_offset;

    /* entropy coding of state and latents */
    ec_enc_init(&ec_encoder, buf, max_bytes);
    ec_enc_uint(&ec_encoder, q0, 16);
    ec_enc_uint(&ec_encoder, dQ, 8);
    total_offset = 16 - (enc->dred_offset - extra_dred_offset*8);
    celt_assert(total_offset>=0);
    if (total_offset > 31) {
        ec_enc_uint(&ec_encoder, 1, 2);
        ec_enc_uint(&ec_encoder, total_offset>>5, 256);
        ec_enc_uint(&ec_encoder, total_offset&31, 32);
    } else {
        ec_enc_uint(&ec_encoder, 0, 2);
        ec_enc_uint(&ec_encoder, total_offset, 32);
    }
    celt_assert(qmax >= q0);
    if (q0 < 14 && dQ > 0) {
        int nvals;
        /* If you want to use qmax == q0, you should have set dQ = 0. */
        celt_assert(qmax > q0);
        nvals = 15 - (q0 + 1);
        ec_encode(&ec_encoder, qmax >= 15 ? 0 : nvals + qmax - (q0 + 1),
                  qmax >= 15 ? nvals : nvals + qmax - q0, 2*nvals);
    }
    state_qoffset = q0*DRED_STATE_DIM;
    dred_encode_latents(
        &ec_encoder,
        &enc->state_buffer[latent_offset*DRED_STATE_DIM],
        dred_state_quant_scales_q8 + state_qoffset,
        dred_state_dead_zone_q8 + state_qoffset,
        dred_state_r_q8 + state_qoffset,
        dred_state_p0_q8 + state_qoffset,
        DRED_STATE_DIM,
        arch);
    if (ec_tell(&ec_encoder) > 8*max_bytes) {
        return 0;
    }
    ec_bak = ec_encoder;
    for (i = 0; i < IMIN(2*max_chunks, enc->latents_buffer_fill-latent_offset-1); i += 2)
    {
        int active;
        q_level = compute_quantizer(q0, dQ, qmax, i/2);
        offset = q_level * DRED_LATENT_DIM;

        dred_encode_latents(
            &ec_encoder,
            enc->latents_buffer + (i+latent_offset) * DRED_LATENT_DIM,
            dred_latent_quant_scales_q8 + offset,
            dred_latent_dead_zone_q8 + offset,
            dred_latent_r_q8 + offset,
            dred_latent_p0_q8 + offset,
            DRED_LATENT_DIM,
            arch
        );
        if (ec_tell(&ec_encoder) > 8*max_bytes) {
            /* If we haven't been able to code one chunk, give up on DRED completely. */
            if (i==0) return 0;
            break;
        }
        active = dred_voice_active(activity_mem, i+latent_offset);
        if (active || prev_active) {
            ec_bak = ec_encoder;
            dred_encoded = i+2;
        }
        prev_active = active;
    }
    /* Avoid sending empty DRED packets. */
    if (dred_encoded==0 || (dred_encoded<=2 && extra_dred_offset)) return 0;
    ec_encoder = ec_bak;

    ec_buffer_fill = (ec_tell(&ec_encoder)+7)/8;
    ec_enc_shrink(&ec_encoder, ec_buffer_fill);
    ec_enc_done(&ec_encoder);
    return ec_buffer_fill;
}
71
managed_components/78__esp-opus/dnn/dred_encoder.h
Normal file
@@ -0,0 +1,71 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_ENCODER_H
#define DRED_ENCODER_H

#include "lpcnet.h"
#include "dred_config.h"
#include "dred_rdovae.h"
#include "entcode.h"
#include "lpcnet_private.h"
#include "dred_rdovae_enc.h"
#include "dred_rdovae_enc_data.h"

#define RESAMPLING_ORDER 8

typedef struct {
    RDOVAEEnc model;
    LPCNetEncState lpcnet_enc_state;
    RDOVAEEncState rdovae_enc;
    int loaded;
    opus_int32 Fs;
    int channels;

#define DREDENC_RESET_START input_buffer
    float input_buffer[2*DRED_DFRAME_SIZE];
    int input_buffer_fill;
    int dred_offset;
    int latent_offset;
    int last_extra_dred_offset;
    float latents_buffer[DRED_MAX_FRAMES * DRED_LATENT_DIM];
    int latents_buffer_fill;
    float state_buffer[DRED_MAX_FRAMES * DRED_STATE_DIM];
    float resample_mem[RESAMPLING_ORDER + 1];
} DREDEnc;

int dred_encoder_load_model(DREDEnc* enc, const void *data, int len);
void dred_encoder_init(DREDEnc* enc, opus_int32 Fs, int channels);
void dred_encoder_reset(DREDEnc* enc);

void dred_deinit_encoder(DREDEnc *enc);

void dred_compute_latents(DREDEnc *enc, const float *pcm, int frame_size, int extra_delay, int arch);

int dred_encode_silk_frame(DREDEnc *enc, unsigned char *buf, int max_chunks, int max_bytes, int q0, int dQ, int qmax, unsigned char *activity_mem, int arch);

#endif
42
managed_components/78__esp-opus/dnn/dred_rdovae.h
Normal file
@@ -0,0 +1,42 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_RDOVAE_H
#define DRED_RDOVAE_H

#include <stdlib.h>

#include "opus_types.h"

typedef struct RDOVAEDec RDOVAEDec;
typedef struct RDOVAEEnc RDOVAEEnc;
typedef struct RDOVAEDecStruct RDOVAEDecState;
typedef struct RDOVAEEncStruct RDOVAEEncState;



#endif
139
managed_components/78__esp-opus/dnn/dred_rdovae_dec.c
Normal file
@@ -0,0 +1,139 @@
|
||||
/* Copyright (c) 2022 Amazon
|
||||
Written by Jan Buethe */
|
||||
/*
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*/
|
||||
|
||||
#ifdef HAVE_CONFIG_H
|
||||
#include "config.h"
|
||||
#endif
|
||||
|
||||
#include "dred_rdovae_dec.h"
|
||||
#include "dred_rdovae_constants.h"
|
||||
#include "os_support.h"
|
||||
|
||||
static void conv1_cond_init(float *mem, int len, int dilation, int *init)
|
||||
{
|
||||
if (!*init) {
|
||||
int i;
|
||||
for (i=0;i<dilation;i++) OPUS_CLEAR(&mem[i*len], len);
|
||||
}
|
||||
*init = 1;
|
||||
}
|
||||
|
||||
void DRED_rdovae_decode_all(const RDOVAEDec *model, float *features, const float *state, const float *latents, int nb_latents, int arch)
|
||||
{
|
||||
int i;
|
||||
RDOVAEDecState dec;
|
||||
memset(&dec, 0, sizeof(dec));
|
||||
dred_rdovae_dec_init_states(&dec, model, state, arch);
|
||||
for (i = 0; i < 2*nb_latents; i += 2)
|
||||
{
|
||||
dred_rdovae_decode_qframe(
|
||||
&dec,
|
||||
model,
|
||||
&features[2*i*DRED_NUM_FEATURES],
|
||||
&latents[(i/2)*DRED_LATENT_DIM],
|
||||
arch);
|
||||
}
|
||||
}
|
||||
|
||||
void dred_rdovae_dec_init_states(
|
||||
RDOVAEDecState *h, /* io: state buffer handle */
|
||||
const RDOVAEDec *model,
|
||||
const float *initial_state, /* i: initial state */
|
||||
int arch
|
||||
)
|
||||
{
|
||||
float hidden[DEC_HIDDEN_INIT_OUT_SIZE];
|
||||
float state_init[DEC_GRU1_STATE_SIZE+DEC_GRU2_STATE_SIZE+DEC_GRU3_STATE_SIZE+DEC_GRU4_STATE_SIZE+DEC_GRU5_STATE_SIZE];
|
||||
int counter=0;
|
||||
compute_generic_dense(&model->dec_hidden_init, hidden, initial_state, ACTIVATION_TANH, arch);
|
||||
compute_generic_dense(&model->dec_gru_init, state_init, hidden, ACTIVATION_TANH, arch);
|
||||
OPUS_COPY(h->gru1_state, state_init, DEC_GRU1_STATE_SIZE);
|
||||
counter += DEC_GRU1_STATE_SIZE;
|
||||
OPUS_COPY(h->gru2_state, &state_init[counter], DEC_GRU2_STATE_SIZE);
|
||||
counter += DEC_GRU2_STATE_SIZE;
|
||||
OPUS_COPY(h->gru3_state, &state_init[counter], DEC_GRU3_STATE_SIZE);
|
||||
counter += DEC_GRU3_STATE_SIZE;
|
||||
OPUS_COPY(h->gru4_state, &state_init[counter], DEC_GRU4_STATE_SIZE);
|
||||
counter += DEC_GRU4_STATE_SIZE;
|
||||
OPUS_COPY(h->gru5_state, &state_init[counter], DEC_GRU5_STATE_SIZE);
|
||||
h->initialized = 0;
|
||||
}
|
||||
|
||||
|
||||
void dred_rdovae_decode_qframe(
    RDOVAEDecState *dec_state,  /* io: state buffer handle */
    const RDOVAEDec *model,
    float *qframe,              /* o: quadruple feature frame (four concatenated frames in reverse order) */
    const float *input,         /* i: latent vector */
    int arch
    )
{
    float buffer[DEC_DENSE1_OUT_SIZE + DEC_GRU1_OUT_SIZE + DEC_GRU2_OUT_SIZE + DEC_GRU3_OUT_SIZE + DEC_GRU4_OUT_SIZE + DEC_GRU5_OUT_SIZE
                 + DEC_CONV1_OUT_SIZE + DEC_CONV2_OUT_SIZE + DEC_CONV3_OUT_SIZE + DEC_CONV4_OUT_SIZE + DEC_CONV5_OUT_SIZE];
    int output_index = 0;

    /* run decoder stack and concatenate output in buffer */
    compute_generic_dense(&model->dec_dense1, &buffer[output_index], input, ACTIVATION_TANH, arch);
    output_index += DEC_DENSE1_OUT_SIZE;

    compute_generic_gru(&model->dec_gru1_input, &model->dec_gru1_recurrent, dec_state->gru1_state, buffer, arch);
    compute_glu(&model->dec_glu1, &buffer[output_index], dec_state->gru1_state, arch);
    output_index += DEC_GRU1_OUT_SIZE;
    conv1_cond_init(dec_state->conv1_state, output_index, 1, &dec_state->initialized);
    compute_generic_conv1d(&model->dec_conv1, &buffer[output_index], dec_state->conv1_state, buffer, output_index, ACTIVATION_TANH, arch);
    output_index += DEC_CONV1_OUT_SIZE;

    compute_generic_gru(&model->dec_gru2_input, &model->dec_gru2_recurrent, dec_state->gru2_state, buffer, arch);
    compute_glu(&model->dec_glu2, &buffer[output_index], dec_state->gru2_state, arch);
    output_index += DEC_GRU2_OUT_SIZE;
    conv1_cond_init(dec_state->conv2_state, output_index, 1, &dec_state->initialized);
    compute_generic_conv1d(&model->dec_conv2, &buffer[output_index], dec_state->conv2_state, buffer, output_index, ACTIVATION_TANH, arch);
    output_index += DEC_CONV2_OUT_SIZE;

    compute_generic_gru(&model->dec_gru3_input, &model->dec_gru3_recurrent, dec_state->gru3_state, buffer, arch);
    compute_glu(&model->dec_glu3, &buffer[output_index], dec_state->gru3_state, arch);
    output_index += DEC_GRU3_OUT_SIZE;
    conv1_cond_init(dec_state->conv3_state, output_index, 1, &dec_state->initialized);
    compute_generic_conv1d(&model->dec_conv3, &buffer[output_index], dec_state->conv3_state, buffer, output_index, ACTIVATION_TANH, arch);
    output_index += DEC_CONV3_OUT_SIZE;

    compute_generic_gru(&model->dec_gru4_input, &model->dec_gru4_recurrent, dec_state->gru4_state, buffer, arch);
    compute_glu(&model->dec_glu4, &buffer[output_index], dec_state->gru4_state, arch);
    output_index += DEC_GRU4_OUT_SIZE;
    conv1_cond_init(dec_state->conv4_state, output_index, 1, &dec_state->initialized);
    compute_generic_conv1d(&model->dec_conv4, &buffer[output_index], dec_state->conv4_state, buffer, output_index, ACTIVATION_TANH, arch);
    output_index += DEC_CONV4_OUT_SIZE;

    compute_generic_gru(&model->dec_gru5_input, &model->dec_gru5_recurrent, dec_state->gru5_state, buffer, arch);
    compute_glu(&model->dec_glu5, &buffer[output_index], dec_state->gru5_state, arch);
    output_index += DEC_GRU5_OUT_SIZE;
    conv1_cond_init(dec_state->conv5_state, output_index, 1, &dec_state->initialized);
    compute_generic_conv1d(&model->dec_conv5, &buffer[output_index], dec_state->conv5_state, buffer, output_index, ACTIVATION_TANH, arch);
    output_index += DEC_CONV5_OUT_SIZE;

    compute_generic_dense(&model->dec_output, qframe, buffer, ACTIVATION_LINEAR, arch);
}

53
managed_components/78__esp-opus/dnn/dred_rdovae_dec.h
Normal file
@@ -0,0 +1,53 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_RDOVAE_DEC_H
#define DRED_RDOVAE_DEC_H

#include "dred_rdovae.h"
#include "dred_rdovae_dec_data.h"
#include "dred_rdovae_stats_data.h"

struct RDOVAEDecStruct {
    int initialized;
    float gru1_state[DEC_GRU1_STATE_SIZE];
    float gru2_state[DEC_GRU2_STATE_SIZE];
    float gru3_state[DEC_GRU3_STATE_SIZE];
    float gru4_state[DEC_GRU4_STATE_SIZE];
    float gru5_state[DEC_GRU5_STATE_SIZE];
    float conv1_state[DEC_CONV1_STATE_SIZE];
    float conv2_state[DEC_CONV2_STATE_SIZE];
    float conv3_state[DEC_CONV3_STATE_SIZE];
    float conv4_state[DEC_CONV4_STATE_SIZE];
    float conv5_state[DEC_CONV5_STATE_SIZE];
};

void dred_rdovae_dec_init_states(RDOVAEDecState *h, const RDOVAEDec *model, const float *initial_state, int arch);
void dred_rdovae_decode_qframe(RDOVAEDecState *h, const RDOVAEDec *model, float *qframe, const float *z, int arch);
void DRED_rdovae_decode_all(const RDOVAEDec *model, float *features, const float *state, const float *latents, int nb_latents, int arch);

#endif
110
managed_components/78__esp-opus/dnn/dred_rdovae_enc.c
Normal file
@@ -0,0 +1,110 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include <math.h>

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "dred_rdovae_enc.h"
#include "os_support.h"
#include "dred_rdovae_constants.h"

static void conv1_cond_init(float *mem, int len, int dilation, int *init)
{
    if (!*init) {
        int i;
        for (i=0;i<dilation;i++) OPUS_CLEAR(&mem[i*len], len);
    }
    *init = 1;
}

void dred_rdovae_encode_dframe(
    RDOVAEEncState *enc_state,  /* io: encoder state */
    const RDOVAEEnc *model,
    float *latents,             /* o: latent vector */
    float *initial_state,       /* o: initial state */
    const float *input,         /* i: double feature frame (concatenated) */
    int arch
    )
{
    float padded_latents[DRED_PADDED_LATENT_DIM];
    float padded_state[DRED_PADDED_STATE_DIM];
    float buffer[ENC_DENSE1_OUT_SIZE + ENC_GRU1_OUT_SIZE + ENC_GRU2_OUT_SIZE + ENC_GRU3_OUT_SIZE + ENC_GRU4_OUT_SIZE + ENC_GRU5_OUT_SIZE
                 + ENC_CONV1_OUT_SIZE + ENC_CONV2_OUT_SIZE + ENC_CONV3_OUT_SIZE + ENC_CONV4_OUT_SIZE + ENC_CONV5_OUT_SIZE];
    float state_hidden[GDENSE1_OUT_SIZE];
    int output_index = 0;

    /* run encoder stack and concatenate output in buffer */
    compute_generic_dense(&model->enc_dense1, &buffer[output_index], input, ACTIVATION_TANH, arch);
    output_index += ENC_DENSE1_OUT_SIZE;

    compute_generic_gru(&model->enc_gru1_input, &model->enc_gru1_recurrent, enc_state->gru1_state, buffer, arch);
    OPUS_COPY(&buffer[output_index], enc_state->gru1_state, ENC_GRU1_OUT_SIZE);
    output_index += ENC_GRU1_OUT_SIZE;
    conv1_cond_init(enc_state->conv1_state, output_index, 1, &enc_state->initialized);
    compute_generic_conv1d(&model->enc_conv1, &buffer[output_index], enc_state->conv1_state, buffer, output_index, ACTIVATION_TANH, arch);
    output_index += ENC_CONV1_OUT_SIZE;

    compute_generic_gru(&model->enc_gru2_input, &model->enc_gru2_recurrent, enc_state->gru2_state, buffer, arch);
    OPUS_COPY(&buffer[output_index], enc_state->gru2_state, ENC_GRU2_OUT_SIZE);
    output_index += ENC_GRU2_OUT_SIZE;
    conv1_cond_init(enc_state->conv2_state, output_index, 2, &enc_state->initialized);
    compute_generic_conv1d_dilation(&model->enc_conv2, &buffer[output_index], enc_state->conv2_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
    output_index += ENC_CONV2_OUT_SIZE;

    compute_generic_gru(&model->enc_gru3_input, &model->enc_gru3_recurrent, enc_state->gru3_state, buffer, arch);
    OPUS_COPY(&buffer[output_index], enc_state->gru3_state, ENC_GRU3_OUT_SIZE);
    output_index += ENC_GRU3_OUT_SIZE;
    conv1_cond_init(enc_state->conv3_state, output_index, 2, &enc_state->initialized);
    compute_generic_conv1d_dilation(&model->enc_conv3, &buffer[output_index], enc_state->conv3_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
    output_index += ENC_CONV3_OUT_SIZE;

    compute_generic_gru(&model->enc_gru4_input, &model->enc_gru4_recurrent, enc_state->gru4_state, buffer, arch);
    OPUS_COPY(&buffer[output_index], enc_state->gru4_state, ENC_GRU4_OUT_SIZE);
    output_index += ENC_GRU4_OUT_SIZE;
    conv1_cond_init(enc_state->conv4_state, output_index, 2, &enc_state->initialized);
    compute_generic_conv1d_dilation(&model->enc_conv4, &buffer[output_index], enc_state->conv4_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
    output_index += ENC_CONV4_OUT_SIZE;

    compute_generic_gru(&model->enc_gru5_input, &model->enc_gru5_recurrent, enc_state->gru5_state, buffer, arch);
    OPUS_COPY(&buffer[output_index], enc_state->gru5_state, ENC_GRU5_OUT_SIZE);
    output_index += ENC_GRU5_OUT_SIZE;
    conv1_cond_init(enc_state->conv5_state, output_index, 2, &enc_state->initialized);
    compute_generic_conv1d_dilation(&model->enc_conv5, &buffer[output_index], enc_state->conv5_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
    output_index += ENC_CONV5_OUT_SIZE;

    compute_generic_dense(&model->enc_zdense, padded_latents, buffer, ACTIVATION_LINEAR, arch);
    OPUS_COPY(latents, padded_latents, DRED_LATENT_DIM);

    /* next, calculate initial state */
    compute_generic_dense(&model->gdense1, state_hidden, buffer, ACTIVATION_TANH, arch);
    compute_generic_dense(&model->gdense2, padded_state, state_hidden, ACTIVATION_LINEAR, arch);
    OPUS_COPY(initial_state, padded_state, DRED_STATE_DIM);
}
52
managed_components/78__esp-opus/dnn/dred_rdovae_enc.h
Normal file
@@ -0,0 +1,52 @@
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef DRED_RDOVAE_ENC_H
#define DRED_RDOVAE_ENC_H

#include "dred_rdovae.h"

#include "dred_rdovae_enc_data.h"

struct RDOVAEEncStruct {
    int initialized;
    float gru1_state[ENC_GRU1_STATE_SIZE];
    float gru2_state[ENC_GRU2_STATE_SIZE];
    float gru3_state[ENC_GRU3_STATE_SIZE];
    float gru4_state[ENC_GRU4_STATE_SIZE];
    float gru5_state[ENC_GRU5_STATE_SIZE];
    float conv1_state[ENC_CONV1_STATE_SIZE];
    float conv2_state[2*ENC_CONV2_STATE_SIZE];
    float conv3_state[2*ENC_CONV3_STATE_SIZE];
    float conv4_state[2*ENC_CONV4_STATE_SIZE];
    float conv5_state[2*ENC_CONV5_STATE_SIZE];
};

void dred_rdovae_encode_dframe(RDOVAEEncState *enc_state, const RDOVAEEnc *model, float *latents, float *initial_state, const float *input, int arch);

#endif
238
managed_components/78__esp-opus/dnn/dump_data.c
Normal file
@@ -0,0 +1,238 @@
/* Copyright (c) 2017-2018 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include "kiss_fft.h"
#include "common.h"
#include <math.h>
#include "freq.h"
#include "pitch.h"
#include "arch.h"
#include <assert.h>
#include "lpcnet.h"
#include "lpcnet_private.h"
#include "os_support.h"
#include "cpu_support.h"

static void biquad(float *y, float mem[2], const float *x, const float *b, const float *a, int N) {
    int i;
    for (i=0;i<N;i++) {
        float xi, yi;
        xi = x[i];
        yi = x[i] + mem[0];
        mem[0] = mem[1] + (b[0]*(double)xi - a[0]*(double)yi);
        mem[1] = (b[1]*(double)xi - a[1]*(double)yi);
        y[i] = yi;
    }
}

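biquad() above is a transposed direct-form II second-order section with b0 = 1 implied; only the remaining taps b1, b2 and a1, a2 are passed in as `b[0..1]` and `a[0..1]`. As a sketch of how it behaves, the same recurrence applied to a unit impulse with the high-pass coefficients used later in main() (`b_hp`, `a_hp`) gives y[0] = 1 and y[1] = b1 − a1 = −2 + 1.99599 = −0.00401:

```c
#include <assert.h>
#include <math.h>

/* same recurrence as biquad() above, b0 = 1 implied */
static void biquad_ref(float *y, float mem[2], const float *x,
                       const float *b, const float *a, int N)
{
    int i;
    for (i=0;i<N;i++) {
        float xi = x[i];
        float yi = x[i] + mem[0];
        mem[0] = mem[1] + (float)(b[0]*(double)xi - a[0]*(double)yi);
        mem[1] = (float)(b[1]*(double)xi - a[1]*(double)yi);
        y[i] = yi;
    }
}
```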
static float uni_rand(void) {
    return rand()/(double)RAND_MAX-.5;
}

static void rand_resp(float *a, float *b) {
    a[0] = .75*uni_rand();
    a[1] = .75*uni_rand();
    b[0] = .75*uni_rand();
    b[1] = .75*uni_rand();
}

static opus_int16 float2short(float x)
{
    int i;
    i = (int)floor(.5+x);
    return IMAX(-32767, IMIN(32767, i));
}

int main(int argc, char **argv) {
    int i;
    char *argv0;
    int count=0;
    static const float a_hp[2] = {-1.99599, 0.99600};
    static const float b_hp[2] = {-2, 1};
    float a_sig[2] = {0};
    float b_sig[2] = {0};
    float mem_hp_x[2]={0};
    float mem_resp_x[2]={0};
    float mem_preemph=0;
    float x[FRAME_SIZE];
    int gain_change_count=0;
    FILE *f1;
    FILE *ffeat;
    FILE *fpcm=NULL;
    opus_int16 pcm[FRAME_SIZE]={0};
    opus_int16 tmp[FRAME_SIZE] = {0};
    float speech_gain=1;
    float old_speech_gain = 1;
    int one_pass_completed = 0;
    LPCNetEncState *st;
    int training = -1;
    int burg = 0;
    int pitch = 0;
    FILE *fnoise = NULL;
    float noise_gain = 0;
    long noise_size=0;
    int arch;
    srand(getpid());
    arch = opus_select_arch();
    st = lpcnet_encoder_create();
    argv0=argv[0];
    if (argc == 5 && strcmp(argv[1], "-btrain")==0) {
        burg = 1;
        training = 1;
    }
    else if (argc == 4 && strcmp(argv[1], "-btest")==0) {
        burg = 1;
        training = 0;
    }
    else if (argc == 5 && strcmp(argv[1], "-ptrain")==0) {
        pitch = 1;
        training = 1;
        fnoise = fopen(argv[2], "rb");
        fseek(fnoise, 0, SEEK_END);
        noise_size = ftell(fnoise);
        fseek(fnoise, 0, SEEK_SET);
        argv++;
    }
    else if (argc == 4 && strcmp(argv[1], "-ptest")==0) {
        pitch = 1;
        training = 0;
    }
    else if (argc == 5 && strcmp(argv[1], "-train")==0) training = 1;
    else if (argc == 4 && strcmp(argv[1], "-test")==0) training = 0;
    if (training == -1) {
        fprintf(stderr, "usage: %s -train <speech> <features out> <pcm out>\n", argv0);
        fprintf(stderr, " or %s -test <speech> <features out>\n", argv0);
        return 1;
    }
    f1 = fopen(argv[2], "r");
    if (f1 == NULL) {
        fprintf(stderr,"Error opening input .s16 16kHz speech input file: %s\n", argv[2]);
        exit(1);
    }
    ffeat = fopen(argv[3], "wb");
    if (ffeat == NULL) {
        fprintf(stderr,"Error opening output feature file: %s\n", argv[3]);
        exit(1);
    }
    if (training && !pitch) {
        fpcm = fopen(argv[4], "wb");
        if (fpcm == NULL) {
            fprintf(stderr,"Error opening output PCM file: %s\n", argv[4]);
            exit(1);
        }
    }
    while (1) {
        size_t ret;
        ret = fread(tmp, sizeof(opus_int16), FRAME_SIZE, f1);
        if (feof(f1) || ret != FRAME_SIZE) {
            if (!training) break;
            rewind(f1);
            ret = fread(tmp, sizeof(opus_int16), FRAME_SIZE, f1);
            if (ret != FRAME_SIZE) {
                fprintf(stderr, "error reading\n");
                exit(1);
            }
            one_pass_completed = 1;
        }
        for (i=0;i<FRAME_SIZE;i++) x[i] = tmp[i];
        if (count*FRAME_SIZE_5MS>=10000000 && one_pass_completed) break;
        if (training && ++gain_change_count > 2821) {
            speech_gain = pow(10., (-30+(rand()%40))/20.);
            if (rand()&1) speech_gain = -speech_gain;
            if (rand()%20==0) speech_gain *= .01;
            if (!pitch && rand()%100==0) speech_gain = 0;
            gain_change_count = 0;
            rand_resp(a_sig, b_sig);
            if (fnoise != NULL) {
                long pos;
                /* Randomize the fraction because rand() only gives us 31 bits. */
                float frac_pos = rand()/(float)RAND_MAX;
                pos = (long)(frac_pos*noise_size);
                /* 32-bit alignment. */
                pos = pos/4 * 4;
                if (pos > noise_size-500000) pos = noise_size-500000;
                noise_gain = pow(10., (-15+(rand()%40))/20.);
                if (rand()%10==0) noise_gain = 0;
                fseek(fnoise, pos, SEEK_SET);
            }
        }
        if (fnoise != NULL) {
            opus_int16 noise[FRAME_SIZE];
            ret = fread(noise, sizeof(opus_int16), FRAME_SIZE, fnoise);
            for (i=0;i<FRAME_SIZE;i++) x[i] += noise[i]*noise_gain;
        }
        biquad(x, mem_hp_x, x, b_hp, a_hp, FRAME_SIZE);
        biquad(x, mem_resp_x, x, b_sig, a_sig, FRAME_SIZE);
        for (i=0;i<FRAME_SIZE;i++) {
            float g;
            float f = (float)i/FRAME_SIZE;
            g = f*speech_gain + (1-f)*old_speech_gain;
            x[i] *= g;
        }
        if (burg) {
            float ceps[2*NB_BANDS];
            burg_cepstral_analysis(ceps, x);
            fwrite(ceps, sizeof(float), 2*NB_BANDS, ffeat);
        }
        preemphasis(x, &mem_preemph, x, PREEMPHASIS, FRAME_SIZE);
        /* PCM is delayed by 1/2 frame to make the features centered on the frames. */
        for (i=0;i<FRAME_SIZE-TRAINING_OFFSET;i++) pcm[i+TRAINING_OFFSET] = float2short(x[i]);
        compute_frame_features(st, x, arch);

        if (pitch) {
            signed char pitch_features[PITCH_MAX_PERIOD-PITCH_MIN_PERIOD+PITCH_IF_FEATURES];
            for (i=0;i<PITCH_MAX_PERIOD-PITCH_MIN_PERIOD;i++) {
                pitch_features[i] = (int)floor(.5f + 127.f*st->xcorr_features[i]);
            }
            for (i=0;i<PITCH_IF_FEATURES;i++) {
                pitch_features[i+PITCH_MAX_PERIOD-PITCH_MIN_PERIOD] = (int)floor(.5f + 127.f*st->if_features[i]);
            }
            fwrite(pitch_features, PITCH_MAX_PERIOD-PITCH_MIN_PERIOD+PITCH_IF_FEATURES, 1, ffeat);
        } else {
            fwrite(st->features, sizeof(float), NB_TOTAL_FEATURES, ffeat);
        }
        /*if(pitch) fwrite(pcm, FRAME_SIZE, 2, stdout);*/
        if (fpcm) fwrite(pcm, FRAME_SIZE, 2, fpcm);
        /*if (fpcm) fwrite(pcm, sizeof(opus_int16), FRAME_SIZE, fpcm);*/
        for (i=0;i<TRAINING_OFFSET;i++) pcm[i] = float2short(x[i+FRAME_SIZE-TRAINING_OFFSET]);
        old_speech_gain = speech_gain;
        count++;
    }
    fclose(f1);
    fclose(ffeat);
    if (fpcm) fclose(fpcm);
    lpcnet_encoder_destroy(st);
    return 0;
}
104
managed_components/78__esp-opus/dnn/dump_lpcnet_tables.c
Normal file
@@ -0,0 +1,104 @@
/* Copyright (c) 2017-2018 Mozilla
   Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <math.h>
#include <stdio.h>
#include "freq.h"
#include "kiss_fft.h"

int main(void) {
    int i;
    FILE *file;
    kiss_fft_state *kfft;
    float half_window[OVERLAP_SIZE];
    float dct_table[NB_BANDS*NB_BANDS];

    file=fopen("lpcnet_tables.c", "wb");
    fprintf(file, "/* The contents of this file was automatically generated by dump_lpcnet_tables.c*/\n\n");
    fprintf(file, "#ifdef HAVE_CONFIG_H\n");
    fprintf(file, "#include \"config.h\"\n");
    fprintf(file, "#endif\n");

    fprintf(file, "#include \"kiss_fft.h\"\n\n");

    kfft = opus_fft_alloc_twiddles(WINDOW_SIZE, NULL, NULL, NULL, 0);

    fprintf(file, "static const arch_fft_state arch_fft = {0, NULL};\n\n");

    fprintf(file, "static const opus_int16 fft_bitrev[%d] = {\n", kfft->nfft);
    for (i=0;i<kfft->nfft;i++)
        fprintf(file, "%d,%c", kfft->bitrev[i],(i+16)%15==0?'\n':' ');
    fprintf(file, "};\n\n");

    fprintf(file, "static const kiss_twiddle_cpx fft_twiddles[%d] = {\n", kfft->nfft);
    for (i=0;i<kfft->nfft;i++)
        fprintf(file, "{%#0.9gf, %#0.9gf},%c", kfft->twiddles[i].r, kfft->twiddles[i].i,(i+3)%2==0?'\n':' ');
    fprintf(file, "};\n\n");

    fprintf(file, "const kiss_fft_state kfft = {\n");
    fprintf(file, "%d, /* nfft */\n", kfft->nfft);
    fprintf(file, "%#0.8gf, /* scale */\n", kfft->scale);
    fprintf(file, "%d, /* shift */\n", kfft->shift);
    fprintf(file, "{");
    for (i=0;i<2*MAXFACTORS;i++) {
        fprintf(file, "%d, ", kfft->factors[i]);
    }
    fprintf(file, "}, /* factors */\n");
    fprintf(file, "fft_bitrev, /* bitrev*/\n");
    fprintf(file, "fft_twiddles, /* twiddles*/\n");
    fprintf(file, "(arch_fft_state *)&arch_fft, /* arch_fft*/\n");
    fprintf(file, "};\n\n");

    for (i=0;i<OVERLAP_SIZE;i++)
        half_window[i] = sin(.5*M_PI*sin(.5*M_PI*(i+.5)/OVERLAP_SIZE) * sin(.5*M_PI*(i+.5)/OVERLAP_SIZE));
    fprintf(file, "const float half_window[] = {\n");
    for (i=0;i<OVERLAP_SIZE;i++)
        fprintf(file, "%#0.9gf,%c", half_window[i],(i+6)%5==0?'\n':' ');
    fprintf(file, "};\n\n");

    for (i=0;i<NB_BANDS;i++) {
        int j;
        for (j=0;j<NB_BANDS;j++) {
            dct_table[i*NB_BANDS + j] = cos((i+.5)*j*M_PI/NB_BANDS);
            if (j==0) dct_table[i*NB_BANDS + j] *= sqrt(.5);
        }
    }
    fprintf(file, "const float dct_table[] = {\n");
    for (i=0;i<NB_BANDS*NB_BANDS;i++)
        fprintf(file, "%#0.9gf,%c", dct_table[i],(i+6)%5==0?'\n':' ');
    fprintf(file, "};\n");

    fclose(file);
    return 0;
}
225
managed_components/78__esp-opus/dnn/fargan.c
Normal file
@@ -0,0 +1,225 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "fargan.h"
#include "os_support.h"
#include "freq.h"
#include "fargan_data.h"
#include "lpcnet.h"
#include "pitch.h"
#include "nnet.h"
#include "lpcnet_private.h"
#include "cpu_support.h"

#define FARGAN_FEATURES (NB_FEATURES)

static void compute_fargan_cond(FARGANState *st, float *cond, const float *features, int period)
{
    FARGAN *model;
    float dense_in[NB_FEATURES+COND_NET_PEMBED_OUT_SIZE];
    float conv1_in[COND_NET_FCONV1_IN_SIZE];
    float fdense2_in[COND_NET_FCONV1_OUT_SIZE];
    model = &st->model;
    celt_assert(FARGAN_FEATURES+COND_NET_PEMBED_OUT_SIZE == model->cond_net_fdense1.nb_inputs);
    celt_assert(COND_NET_FCONV1_IN_SIZE == model->cond_net_fdense1.nb_outputs);
    celt_assert(COND_NET_FCONV1_OUT_SIZE == model->cond_net_fconv1.nb_outputs);
    OPUS_COPY(&dense_in[NB_FEATURES], &model->cond_net_pembed.float_weights[IMAX(0,IMIN(period-32, 223))*COND_NET_PEMBED_OUT_SIZE], COND_NET_PEMBED_OUT_SIZE);
    OPUS_COPY(dense_in, features, NB_FEATURES);

    compute_generic_dense(&model->cond_net_fdense1, conv1_in, dense_in, ACTIVATION_TANH, st->arch);
    compute_generic_conv1d(&model->cond_net_fconv1, fdense2_in, st->cond_conv1_state, conv1_in, COND_NET_FCONV1_IN_SIZE, ACTIVATION_TANH, st->arch);
    compute_generic_dense(&model->cond_net_fdense2, cond, fdense2_in, ACTIVATION_TANH, st->arch);
}

static void fargan_deemphasis(float *pcm, float *deemph_mem) {
    int i;
    for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) {
        pcm[i] += FARGAN_DEEMPHASIS * *deemph_mem;
        *deemph_mem = pcm[i];
    }
}

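fargan_deemphasis() is a one-pole filter y[n] = x[n] + α·y[n−1], the exact inverse of the preemphasis e[n] = x[n] − α·x[n−1] applied on the analysis side. A round-trip sketch showing that deemphasis recovers the preemphasized input; α = 0.85 is only an illustrative value standing in for `FARGAN_DEEMPHASIS`, and both helpers are local stand-ins:

```c
#include <assert.h>
#include <math.h>

/* e[n] = x[n] - a*x[n-1] (analysis side) */
static void preemph(float *e, const float *x, float a, int n)
{
    int i; float prev = 0.f;
    for (i=0;i<n;i++) { e[i] = x[i] - a*prev; prev = x[i]; }
}

/* y[n] = e[n] + a*y[n-1] (synthesis side, as in fargan_deemphasis) */
static void deemph(float *y, const float *e, float a, int n)
{
    int i; float mem = 0.f;
    for (i=0;i<n;i++) { y[i] = e[i] + a*mem; mem = y[i]; }
}
```

The deemphasis memory plays the role of `*deemph_mem` above, carried across subframes.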
static void run_fargan_subframe(FARGANState *st, float *pcm, const float *cond, int period)
{
    int i, pos;
    float fwc0_in[SIG_NET_INPUT_SIZE];
    float gru1_in[SIG_NET_FWC0_CONV_OUT_SIZE+2*FARGAN_SUBFRAME_SIZE];
    float gru2_in[SIG_NET_GRU1_OUT_SIZE+2*FARGAN_SUBFRAME_SIZE];
    float gru3_in[SIG_NET_GRU2_OUT_SIZE+2*FARGAN_SUBFRAME_SIZE];
    float pred[FARGAN_SUBFRAME_SIZE+4];
    float prev[FARGAN_SUBFRAME_SIZE];
    float pitch_gate[4];
    float gain;
    float gain_1;
    float skip_cat[10000];
    float skip_out[SIG_NET_SKIP_DENSE_OUT_SIZE];
    FARGAN *model;

    celt_assert(st->cont_initialized);
    model = &st->model;

    compute_generic_dense(&model->sig_net_cond_gain_dense, &gain, cond, ACTIVATION_LINEAR, st->arch);
    gain = exp(gain);
    gain_1 = 1.f/(1e-5f + gain);

    pos = PITCH_MAX_PERIOD-period-2;
    for (i=0;i<FARGAN_SUBFRAME_SIZE+4;i++) {
        pred[i] = MIN32(1.f, MAX32(-1.f, gain_1*st->pitch_buf[IMAX(0, pos)]));
        pos++;
        if (pos == PITCH_MAX_PERIOD) pos -= period;
    }
    for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) prev[i] = MAX32(-1.f, MIN16(1.f, gain_1*st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE+i]));

    OPUS_COPY(&fwc0_in[0], &cond[0], FARGAN_COND_SIZE);
    OPUS_COPY(&fwc0_in[FARGAN_COND_SIZE], pred, FARGAN_SUBFRAME_SIZE+4);
    OPUS_COPY(&fwc0_in[FARGAN_COND_SIZE+FARGAN_SUBFRAME_SIZE+4], prev, FARGAN_SUBFRAME_SIZE);

    compute_generic_conv1d(&model->sig_net_fwc0_conv, gru1_in, st->fwc0_mem, fwc0_in, SIG_NET_INPUT_SIZE, ACTIVATION_TANH, st->arch);
    celt_assert(SIG_NET_FWC0_GLU_GATE_OUT_SIZE == model->sig_net_fwc0_glu_gate.nb_outputs);
    compute_glu(&model->sig_net_fwc0_glu_gate, gru1_in, gru1_in, st->arch);

    compute_generic_dense(&model->sig_net_gain_dense_out, pitch_gate, gru1_in, ACTIVATION_SIGMOID, st->arch);

    for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) gru1_in[SIG_NET_FWC0_GLU_GATE_OUT_SIZE+i] = pitch_gate[0]*pred[i+2];
    OPUS_COPY(&gru1_in[SIG_NET_FWC0_GLU_GATE_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
    compute_generic_gru(&model->sig_net_gru1_input, &model->sig_net_gru1_recurrent, st->gru1_state, gru1_in, st->arch);
    compute_glu(&model->sig_net_gru1_glu_gate, gru2_in, st->gru1_state, st->arch);

    for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) gru2_in[SIG_NET_GRU1_OUT_SIZE+i] = pitch_gate[1]*pred[i+2];
    OPUS_COPY(&gru2_in[SIG_NET_GRU1_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
|
||||
compute_generic_gru(&model->sig_net_gru2_input, &model->sig_net_gru2_recurrent, st->gru2_state, gru2_in, st->arch);
|
||||
compute_glu(&model->sig_net_gru2_glu_gate, gru3_in, st->gru2_state, st->arch);
|
||||
|
||||
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) gru3_in[SIG_NET_GRU2_OUT_SIZE+i] = pitch_gate[2]*pred[i+2];
|
||||
OPUS_COPY(&gru3_in[SIG_NET_GRU2_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
|
||||
compute_generic_gru(&model->sig_net_gru3_input, &model->sig_net_gru3_recurrent, st->gru3_state, gru3_in, st->arch);
|
||||
compute_glu(&model->sig_net_gru3_glu_gate, &skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE], st->gru3_state, st->arch);
|
||||
|
||||
OPUS_COPY(skip_cat, gru2_in, SIG_NET_GRU1_OUT_SIZE);
|
||||
OPUS_COPY(&skip_cat[SIG_NET_GRU1_OUT_SIZE], gru3_in, SIG_NET_GRU2_OUT_SIZE);
|
||||
OPUS_COPY(&skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE+SIG_NET_GRU3_OUT_SIZE], gru1_in, SIG_NET_FWC0_CONV_OUT_SIZE);
|
||||
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE+SIG_NET_GRU3_OUT_SIZE+SIG_NET_FWC0_CONV_OUT_SIZE+i] = pitch_gate[3]*pred[i+2];
|
||||
OPUS_COPY(&skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE+SIG_NET_GRU3_OUT_SIZE+SIG_NET_FWC0_CONV_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
|
||||
|
||||
compute_generic_dense(&model->sig_net_skip_dense, skip_out, skip_cat, ACTIVATION_TANH, st->arch);
|
||||
compute_glu(&model->sig_net_skip_glu_gate, skip_out, skip_out, st->arch);
|
||||
|
||||
compute_generic_dense(&model->sig_net_sig_dense_out, pcm, skip_out, ACTIVATION_TANH, st->arch);
|
||||
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) pcm[i] *= gain;
|
||||
|
||||
OPUS_MOVE(st->pitch_buf, &st->pitch_buf[FARGAN_SUBFRAME_SIZE], PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE);
|
||||
OPUS_COPY(&st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE], pcm, FARGAN_SUBFRAME_SIZE);
|
||||
fargan_deemphasis(pcm, &st->deemph_mem);
|
||||
}
|
||||
|
||||
void fargan_cont(FARGANState *st, const float *pcm0, const float *features0)
|
||||
{
|
||||
int i;
|
||||
float cond[COND_NET_FDENSE2_OUT_SIZE];
|
||||
float x0[FARGAN_CONT_SAMPLES];
|
||||
float dummy[FARGAN_SUBFRAME_SIZE];
|
||||
int period=0;
|
||||
|
||||
/* Pre-load features. */
|
||||
for (i=0;i<5;i++) {
|
||||
const float *features = &features0[i*NB_FEATURES];
|
||||
st->last_period = period;
|
||||
period = (int)floor(.5+256./pow(2.f,((1./60.)*((features[NB_BANDS]+1.5)*60))));
|
||||
compute_fargan_cond(st, cond, features, period);
|
||||
}
|
||||
|
||||
x0[0] = 0;
|
||||
for (i=1;i<FARGAN_CONT_SAMPLES;i++) {
|
||||
x0[i] = pcm0[i] - FARGAN_DEEMPHASIS*pcm0[i-1];
|
||||
}
|
||||
|
||||
OPUS_COPY(&st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_FRAME_SIZE], x0, FARGAN_FRAME_SIZE);
|
||||
st->cont_initialized = 1;
|
||||
|
||||
for (i=0;i<FARGAN_NB_SUBFRAMES;i++) {
|
||||
run_fargan_subframe(st, dummy, &cond[i*FARGAN_COND_SIZE], st->last_period);
|
||||
OPUS_COPY(&st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE], &x0[FARGAN_FRAME_SIZE+i*FARGAN_SUBFRAME_SIZE], FARGAN_SUBFRAME_SIZE);
|
||||
}
|
||||
st->deemph_mem = pcm0[FARGAN_CONT_SAMPLES-1];
|
||||
}
|
||||
|
||||
|
||||
void fargan_init(FARGANState *st)
|
||||
{
|
||||
int ret;
|
||||
OPUS_CLEAR(st, 1);
|
||||
st->arch = opus_select_arch();
|
||||
#ifndef USE_WEIGHTS_FILE
|
||||
ret = init_fargan(&st->model, fargan_arrays);
|
||||
#else
|
||||
ret = 0;
|
||||
#endif
|
||||
celt_assert(ret == 0);
|
||||
}
|
||||
|
||||
int fargan_load_model(FARGANState *st, const void *data, int len) {
|
||||
WeightArray *list;
|
||||
int ret;
|
||||
parse_weights(&list, data, len);
|
||||
ret = init_fargan(&st->model, list);
|
||||
opus_free(list);
|
||||
if (ret == 0) return 0;
|
||||
else return -1;
|
||||
}
|
||||
|
||||
static void fargan_synthesize_impl(FARGANState *st, float *pcm, const float *features)
|
||||
{
|
||||
int subframe;
|
||||
float cond[COND_NET_FDENSE2_OUT_SIZE];
|
||||
int period;
|
||||
celt_assert(st->cont_initialized);
|
||||
|
||||
period = (int)floor(.5+256./pow(2.f,((1./60.)*((features[NB_BANDS]+1.5)*60))));
|
||||
compute_fargan_cond(st, cond, features, period);
|
||||
for (subframe=0;subframe<FARGAN_NB_SUBFRAMES;subframe++) {
|
||||
float *sub_cond;
|
||||
sub_cond = &cond[subframe*FARGAN_COND_SIZE];
|
||||
run_fargan_subframe(st, &pcm[subframe*FARGAN_SUBFRAME_SIZE], sub_cond, st->last_period);
|
||||
}
|
||||
st->last_period = period;
|
||||
}
|
||||
|
||||
void fargan_synthesize(FARGANState *st, float *pcm, const float *features)
|
||||
{
|
||||
fargan_synthesize_impl(st, pcm, features);
|
||||
}
|
||||
|
||||
void fargan_synthesize_int(FARGANState *st, opus_int16 *pcm, const float *features)
|
||||
{
|
||||
int i;
|
||||
float fpcm[FARGAN_FRAME_SIZE];
|
||||
fargan_synthesize(st, fpcm, features);
|
||||
for (i=0;i<LPCNET_FRAME_SIZE;i++) pcm[i] = (int)floor(.5 + MIN32(32767, MAX32(-32767, 32768.f*fpcm[i])));
|
||||
}
|
||||
68
managed_components/78__esp-opus/dnn/fargan.h
Normal file
@@ -0,0 +1,68 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef FARGAN_H
#define FARGAN_H

#include "freq.h"
#include "fargan_data.h"
#include "pitchdnn.h"

#define FARGAN_CONT_SAMPLES 320
#define FARGAN_NB_SUBFRAMES 4
#define FARGAN_SUBFRAME_SIZE 40
#define FARGAN_FRAME_SIZE (FARGAN_NB_SUBFRAMES*FARGAN_SUBFRAME_SIZE)
#define FARGAN_COND_SIZE (COND_NET_FDENSE2_OUT_SIZE/FARGAN_NB_SUBFRAMES)
#define FARGAN_DEEMPHASIS 0.85f

#define SIG_NET_INPUT_SIZE (FARGAN_COND_SIZE+2*FARGAN_SUBFRAME_SIZE+4)
#define SIG_NET_FWC0_STATE_SIZE (2*SIG_NET_INPUT_SIZE)

#define FARGAN_MAX_RNN_NEURONS SIG_NET_GRU1_OUT_SIZE
typedef struct {
   FARGAN model;
   int arch;
   int cont_initialized;
   float deemph_mem;
   float pitch_buf[PITCH_MAX_PERIOD];
   float cond_conv1_state[COND_NET_FCONV1_STATE_SIZE];
   float fwc0_mem[SIG_NET_FWC0_STATE_SIZE];
   float gru1_state[SIG_NET_GRU1_STATE_SIZE];
   float gru2_state[SIG_NET_GRU2_STATE_SIZE];
   float gru3_state[SIG_NET_GRU3_STATE_SIZE];
   int last_period;
} FARGANState;

void fargan_init(FARGANState *st);
int fargan_load_model(FARGANState *st, const void *data, int len);

void fargan_cont(FARGANState *st, const float *pcm0, const float *features0);

void fargan_synthesize(FARGANState *st, float *pcm, const float *features);
void fargan_synthesize_int(FARGANState *st, opus_int16 *pcm, const float *features);


#endif /* FARGAN_H */
217
managed_components/78__esp-opus/dnn/fargan_demo.c
Normal file
@@ -0,0 +1,217 @@
/* Copyright (c) 2018 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "arch.h"
#include "lpcnet.h"
#include "freq.h"
#include "os_support.h"
#include "fargan.h"
#include "cpu_support.h"

#ifdef USE_WEIGHTS_FILE
# if __unix__
#  include <fcntl.h>
#  include <sys/mman.h>
#  include <unistd.h>
#  include <sys/stat.h>
/* When available, mmap() is preferable to reading the file, as it leads to
   better resource utilization, especially if multiple processes are using the same
   file (mapping will be shared in cache). */
void *load_blob(const char *filename, int *len) {
   int fd;
   void *data;
   struct stat st;
   if (stat(filename, &st)) {
      *len = 0;
      return NULL;
   }
   *len = st.st_size;
   fd = open(filename, O_RDONLY);
   if (fd<0) {
      *len = 0;
      return NULL;
   }
   data = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
   if (data == MAP_FAILED) {
      *len = 0;
      data = NULL;
   }
   close(fd);
   return data;
}
void free_blob(void *blob, int len) {
   if (blob) munmap(blob, len);
}
# else
void *load_blob(const char *filename, int *len) {
   FILE *file;
   void *data;
   /* Open in binary mode so the blob is read unmodified on non-POSIX systems. */
   file = fopen(filename, "rb");
   if (file == NULL)
   {
      perror("could not open blob file");
      *len = 0;
      return NULL;
   }
   fseek(file, 0L, SEEK_END);
   *len = ftell(file);
   fseek(file, 0L, SEEK_SET);
   if (*len <= 0) {
      *len = 0;
      fclose(file);
      return NULL;
   }
   data = malloc(*len);
   if (!data) {
      *len = 0;
      fclose(file);
      return NULL;
   }
   *len = fread(data, 1, *len, file);
   fclose(file);
   return data;
}
void free_blob(void *blob, int len) {
   free(blob);
   (void)len;
}
# endif
#endif

#define MODE_FEATURES 2
/*#define MODE_SYNTHESIS 3*/
#define MODE_ADDLPC 5
#define MODE_FWGAN_SYNTHESIS 6
#define MODE_FARGAN_SYNTHESIS 7

void usage(void) {
   fprintf(stderr, "usage: lpcnet_demo -features <input.pcm> <features.f32>\n");
   fprintf(stderr, "       lpcnet_demo -fargan-synthesis <features.f32> <output.pcm>\n");
   fprintf(stderr, "       lpcnet_demo -addlpc <features_without_lpc.f32> <features_with_lpc.lpc>\n\n");
   fprintf(stderr, "  plc_options:\n");
   fprintf(stderr, "       causal: normal (causal) PLC\n");
   fprintf(stderr, "       codec: normal (causal) PLC without cross-fade (will glitch)\n");
   exit(1);
}

int main(int argc, char **argv) {
   int mode=0;
   int arch;
   FILE *fin, *fout;
#ifdef USE_WEIGHTS_FILE
   int len;
   void *data;
   const char *filename = "weights_blob.bin";
#endif
   arch = opus_select_arch();
   if (argc < 4) usage();
   if (strcmp(argv[1], "-features") == 0) mode=MODE_FEATURES;
   else if (strcmp(argv[1], "-fargan-synthesis") == 0) mode=MODE_FARGAN_SYNTHESIS;
   else if (strcmp(argv[1], "-addlpc") == 0){
      mode=MODE_ADDLPC;
   } else {
      usage();
   }
   if (argc != 4) usage();
   fin = fopen(argv[2], "rb");
   if (fin == NULL) {
      fprintf(stderr, "Can't open %s\n", argv[2]);
      exit(1);
   }

   fout = fopen(argv[3], "wb");
   if (fout == NULL) {
      fprintf(stderr, "Can't open %s\n", argv[3]);
      exit(1);
   }
#ifdef USE_WEIGHTS_FILE
   data = load_blob(filename, &len);
#endif
   if (mode == MODE_FEATURES) {
      LPCNetEncState *net;
      net = lpcnet_encoder_create();
      while (1) {
         float features[NB_TOTAL_FEATURES];
         opus_int16 pcm[LPCNET_FRAME_SIZE];
         size_t ret;
         ret = fread(pcm, sizeof(pcm[0]), LPCNET_FRAME_SIZE, fin);
         if (feof(fin) || ret != LPCNET_FRAME_SIZE) break;
         lpcnet_compute_single_frame_features(net, pcm, features, arch);
         fwrite(features, sizeof(float), NB_TOTAL_FEATURES, fout);
      }
      lpcnet_encoder_destroy(net);
   } else if (mode == MODE_FARGAN_SYNTHESIS) {
      FARGANState fargan;
      size_t ret, i;
      float in_features[5*NB_TOTAL_FEATURES];
      float zeros[320] = {0};
      fargan_init(&fargan);
#ifdef USE_WEIGHTS_FILE
      fargan_load_model(&fargan, data, len);
#endif
      /* uncomment the following to align with Python code */
      /*ret = fread(&in_features[0], sizeof(in_features[0]), NB_TOTAL_FEATURES, fin);*/
      for (i=0;i<5;i++) {
         ret = fread(&in_features[i*NB_FEATURES], sizeof(in_features[0]), NB_TOTAL_FEATURES, fin);
      }
      fargan_cont(&fargan, zeros, in_features);
      while (1) {
         float features[NB_FEATURES];
         float fpcm[LPCNET_FRAME_SIZE];
         opus_int16 pcm[LPCNET_FRAME_SIZE];
         ret = fread(in_features, sizeof(features[0]), NB_TOTAL_FEATURES, fin);
         if (feof(fin) || ret != NB_TOTAL_FEATURES) break;
         OPUS_COPY(features, in_features, NB_FEATURES);
         fargan_synthesize(&fargan, fpcm, features);
         for (i=0;i<LPCNET_FRAME_SIZE;i++) pcm[i] = (int)floor(.5 + MIN32(32767, MAX32(-32767, 32768.f*fpcm[i])));
         fwrite(pcm, sizeof(pcm[0]), LPCNET_FRAME_SIZE, fout);
      }
   } else if (mode == MODE_ADDLPC) {
      float features[36];
      size_t ret;

      while (1) {
         ret = fread(features, sizeof(features[0]), 36, fin);
         if (ret != 36 || feof(fin)) break;
         lpc_from_cepstrum(&features[20], &features[0]);
         fwrite(features, sizeof(features[0]), 36, fout);
      }

   } else {
      fprintf(stderr, "unknown action\n");
   }
   fclose(fin);
   fclose(fout);
#ifdef USE_WEIGHTS_FILE
   free_blob(data, len);
#endif
   return 0;
}
328
managed_components/78__esp-opus/dnn/freq.c
Normal file
@@ -0,0 +1,328 @@
/* Copyright (c) 2017-2018 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "kiss_fft.h"
#include <math.h>
#include "freq.h"
#include "pitch.h"
#include "arch.h"
#include "burg.h"
#include <assert.h>
#include "os_support.h"

#define SQUARE(x) ((x)*(x))

static const opus_int16 eband5ms[] = {
/*0  200 400 600 800  1k 1.2 1.4 1.6  2k 2.4 2.8 3.2  4k 4.8 5.6 6.8  8k*/
   0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 12, 14, 16, 20, 24, 28, 34, 40
};

static const float compensation[] = {
   0.8f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 0.666667f, 0.5f, 0.5f, 0.5f, 0.333333f, 0.25f, 0.25f, 0.2f, 0.166667f, 0.173913f
};


extern const kiss_fft_state kfft;
extern const float half_window[OVERLAP_SIZE];
extern const float dct_table[NB_BANDS*NB_BANDS];


static void compute_band_energy_inverse(float *bandE, const kiss_fft_cpx *X) {
   int i;
   float sum[NB_BANDS] = {0};
   for (i=0;i<NB_BANDS-1;i++)
   {
      int j;
      int band_size;
      band_size = (eband5ms[i+1]-eband5ms[i])*WINDOW_SIZE_5MS;
      for (j=0;j<band_size;j++) {
         float tmp;
         float frac = (float)j/band_size;
         tmp = SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].r);
         tmp += SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].i);
         tmp = 1.f/(tmp + 1e-9);
         sum[i] += (1-frac)*tmp;
         sum[i+1] += frac*tmp;
      }
   }
   sum[0] *= 2;
   sum[NB_BANDS-1] *= 2;
   for (i=0;i<NB_BANDS;i++)
   {
      bandE[i] = sum[i];
   }
}

static float lpcn_lpc(
      opus_val16 *lpc,       /* out: [0...p-1] LPC coefficients      */
      opus_val16 *rc,
      const opus_val32 *ac,  /* in:  [0...p] autocorrelation values  */
      int p
)
{
   int i, j;
   opus_val32 r;
   opus_val32 error = ac[0];

   OPUS_CLEAR(lpc, p);
   OPUS_CLEAR(rc, p);
   if (ac[0] != 0)
   {
      for (i = 0; i < p; i++) {
         /* Sum up this iteration's reflection coefficient */
         opus_val32 rr = 0;
         for (j = 0; j < i; j++)
            rr += MULT32_32_Q31(lpc[j],ac[i - j]);
         rr += SHR32(ac[i + 1],3);
         r = -SHL32(rr,3)/error;
         rc[i] = r;
         /* Update LPC coefficients and total error */
         lpc[i] = SHR32(r,3);
         for (j = 0; j < (i+1)>>1; j++)
         {
            opus_val32 tmp1, tmp2;
            tmp1 = lpc[j];
            tmp2 = lpc[i-1-j];
            lpc[j]     = tmp1 + MULT32_32_Q31(r,tmp2);
            lpc[i-1-j] = tmp2 + MULT32_32_Q31(r,tmp1);
         }

         error = error - MULT32_32_Q31(MULT32_32_Q31(r,r),error);
         /* Bail out once we get 30 dB gain */
         if (error<.001f*ac[0])
            break;
      }
   }
   return error;
}



void lpcn_compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
   int i;
   float sum[NB_BANDS] = {0};
   for (i=0;i<NB_BANDS-1;i++)
   {
      int j;
      int band_size;
      band_size = (eband5ms[i+1]-eband5ms[i])*WINDOW_SIZE_5MS;
      for (j=0;j<band_size;j++) {
         float tmp;
         float frac = (float)j/band_size;
         tmp = SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].r);
         tmp += SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].i);
         sum[i] += (1-frac)*tmp;
         sum[i+1] += frac*tmp;
      }
   }
   sum[0] *= 2;
   sum[NB_BANDS-1] *= 2;
   for (i=0;i<NB_BANDS;i++)
   {
      bandE[i] = sum[i];
   }
}

static void compute_burg_cepstrum(const float *pcm, float *burg_cepstrum, int len, int order) {
   int i;
   float burg_in[FRAME_SIZE];
   float burg_lpc[LPC_ORDER];
   float x[WINDOW_SIZE];
   float Eburg[NB_BANDS];
   float g;
   kiss_fft_cpx LPC[FREQ_SIZE];
   float Ly[NB_BANDS];
   float logMax = -2;
   float follow = -2;
   assert(order <= LPC_ORDER);
   assert(len <= FRAME_SIZE);
   for (i=0;i<len-1;i++) burg_in[i] = pcm[i+1] - PREEMPHASIS*pcm[i];
   g = silk_burg_analysis(burg_lpc, burg_in, 1e-3, len-1, 1, order);
   g /= len - 2*(order-1);
   OPUS_CLEAR(x, WINDOW_SIZE);
   x[0] = 1;
   for (i=0;i<order;i++) x[i+1] = -burg_lpc[i]*pow(.995, i+1);
   forward_transform(LPC, x);
   compute_band_energy_inverse(Eburg, LPC);
   for (i=0;i<NB_BANDS;i++) Eburg[i] *= .45*g*(1.f/((float)WINDOW_SIZE*WINDOW_SIZE*WINDOW_SIZE));
   for (i=0;i<NB_BANDS;i++) {
      Ly[i] = log10(1e-2+Eburg[i]);
      Ly[i] = MAX16(logMax-8, MAX16(follow-2.5, Ly[i]));
      logMax = MAX16(logMax, Ly[i]);
      follow = MAX16(follow-2.5, Ly[i]);
   }
   dct(burg_cepstrum, Ly);
   burg_cepstrum[0] -= 4;
}

void burg_cepstral_analysis(float *ceps, const float *x) {
   int i;
   compute_burg_cepstrum(x,               &ceps[0],        FRAME_SIZE/2, LPC_ORDER);
   compute_burg_cepstrum(&x[FRAME_SIZE/2], &ceps[NB_BANDS], FRAME_SIZE/2, LPC_ORDER);
   for (i=0;i<NB_BANDS;i++) {
      float c0, c1;
      c0 = ceps[i];
      c1 = ceps[NB_BANDS+i];
      ceps[i]          = .5*(c0+c1);
      ceps[NB_BANDS+i] = (c0-c1);
   }
}


static void interp_band_gain(float *g, const float *bandE) {
   int i;
   /* Clear all FREQ_SIZE floats (not just FREQ_SIZE bytes) so bins above the
      last band start from zero. */
   memset(g, 0, FREQ_SIZE*sizeof(float));
   for (i=0;i<NB_BANDS-1;i++)
   {
      int j;
      int band_size;
      band_size = (eband5ms[i+1]-eband5ms[i])*WINDOW_SIZE_5MS;
      for (j=0;j<band_size;j++) {
         float frac = (float)j/band_size;
         g[(eband5ms[i]*WINDOW_SIZE_5MS) + j] = (1-frac)*bandE[i] + frac*bandE[i+1];
      }
   }
}


void dct(float *out, const float *in) {
   int i;
   for (i=0;i<NB_BANDS;i++) {
      int j;
      float sum = 0;
      for (j=0;j<NB_BANDS;j++) {
         sum += in[j] * dct_table[j*NB_BANDS + i];
      }
      out[i] = sum*sqrt(2./NB_BANDS);
   }
}

static void idct(float *out, const float *in) {
   int i;
   for (i=0;i<NB_BANDS;i++) {
      int j;
      float sum = 0;
      for (j=0;j<NB_BANDS;j++) {
         sum += in[j] * dct_table[i*NB_BANDS + j];
      }
      out[i] = sum*sqrt(2./NB_BANDS);
   }
}

void forward_transform(kiss_fft_cpx *out, const float *in) {
   int i;
   kiss_fft_cpx x[WINDOW_SIZE];
   kiss_fft_cpx y[WINDOW_SIZE];
   for (i=0;i<WINDOW_SIZE;i++) {
      x[i].r = in[i];
      x[i].i = 0;
   }
   opus_fft(&kfft, x, y, 0);
   for (i=0;i<FREQ_SIZE;i++) {
      out[i] = y[i];
   }
}

static void inverse_transform(float *out, const kiss_fft_cpx *in) {
   int i;
   kiss_fft_cpx x[WINDOW_SIZE];
   kiss_fft_cpx y[WINDOW_SIZE];
   for (i=0;i<FREQ_SIZE;i++) {
      x[i] = in[i];
   }
   for (;i<WINDOW_SIZE;i++) {
      x[i].r = x[WINDOW_SIZE - i].r;
      x[i].i = -x[WINDOW_SIZE - i].i;
   }
   opus_fft(&kfft, x, y, 0);
   /* output in reverse order for IFFT. */
   out[0] = WINDOW_SIZE*y[0].r;
   for (i=1;i<WINDOW_SIZE;i++) {
      out[i] = WINDOW_SIZE*y[WINDOW_SIZE - i].r;
   }
}

static float lpc_from_bands(float *lpc, const float *Ex)
{
   int i;
   float e;
   float ac[LPC_ORDER+1];
   float rc[LPC_ORDER];
   float Xr[FREQ_SIZE];
   kiss_fft_cpx X_auto[FREQ_SIZE];
   float x_auto[WINDOW_SIZE];
   interp_band_gain(Xr, Ex);
   Xr[FREQ_SIZE-1] = 0;
   OPUS_CLEAR(X_auto, FREQ_SIZE);
   for (i=0;i<FREQ_SIZE;i++) X_auto[i].r = Xr[i];
   inverse_transform(x_auto, X_auto);
   for (i=0;i<LPC_ORDER+1;i++) ac[i] = x_auto[i];

   /* -40 dB noise floor. */
   ac[0] += ac[0]*1e-4 + 320/12/38.;
   /* Lag windowing. */
   for (i=1;i<LPC_ORDER+1;i++) ac[i] *= (1 - 6e-5*i*i);
   e = lpcn_lpc(lpc, rc, ac, LPC_ORDER);
   return e;
}

void lpc_weighting(float *lpc, float gamma)
{
   int i;
   float gamma_i = gamma;
   for (i = 0; i < LPC_ORDER; i++)
   {
      lpc[i] *= gamma_i;
      gamma_i *= gamma;
   }
}

float lpc_from_cepstrum(float *lpc, const float *cepstrum)
{
   int i;
   float Ex[NB_BANDS];
   float tmp[NB_BANDS];
   OPUS_COPY(tmp, cepstrum, NB_BANDS);
   tmp[0] += 4;
   idct(Ex, tmp);
   for (i=0;i<NB_BANDS;i++) Ex[i] = pow(10.f, Ex[i])*compensation[i];
   return lpc_from_bands(lpc, Ex);
}

void apply_window(float *x) {
   int i;
   for (i=0;i<OVERLAP_SIZE;i++) {
      x[i] *= half_window[i];
      x[WINDOW_SIZE - 1 - i] *= half_window[i];
   }
}
61
managed_components/78__esp-opus/dnn/freq.h
Normal file
@@ -0,0 +1,61 @@
/* Copyright (c) 2017-2018 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef FREQ_H
#define FREQ_H

#include "kiss_fft.h"

#define LPC_ORDER 16

#define PREEMPHASIS (0.85f)

#define FRAME_SIZE_5MS (2)
#define OVERLAP_SIZE_5MS (2)
#define TRAINING_OFFSET_5MS (1)

#define WINDOW_SIZE_5MS (FRAME_SIZE_5MS + OVERLAP_SIZE_5MS)

#define FRAME_SIZE (80*FRAME_SIZE_5MS)
#define OVERLAP_SIZE (80*OVERLAP_SIZE_5MS)
#define TRAINING_OFFSET (80*TRAINING_OFFSET_5MS)
#define WINDOW_SIZE (FRAME_SIZE + OVERLAP_SIZE)
#define FREQ_SIZE (WINDOW_SIZE/2 + 1)

#define NB_BANDS 18
#define NB_BANDS_1 (NB_BANDS - 1)

void lpcn_compute_band_energy(float *bandE, const kiss_fft_cpx *X);
void burg_cepstral_analysis(float *ceps, const float *x);

void apply_window(float *x);
void dct(float *out, const float *in);
void forward_transform(kiss_fft_cpx *out, const float *in);
float lpc_from_cepstrum(float *lpc, const float *cepstrum);
void lpc_weighting(float *lpc, float gamma);

#endif
322
managed_components/78__esp-opus/dnn/fwgan.c
Normal file
@@ -0,0 +1,322 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "fwgan.h"
#include "os_support.h"
#include "freq.h"
#include "fwgan_data.h"
#include "lpcnet.h"
#include "pitch.h"
#include "nnet.h"
#include "lpcnet_private.h"

#define FEAT_IN_SIZE (BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4 + FWGAN_FRAME_SIZE/2)

#define FWGAN_FEATURES (NB_FEATURES-1)
static void pitch_embeddings(float *pembed, float *phase, double w0) {
  int i;
  float wreal, wimag;
#if 1
  /* This Taylor expansion should be good enough since w0 is always small. */
  float w2 = w0*w0;
  wreal = 1 - .5*w2*(1.f - 0.083333333f*w2);
  wimag = w0*(1 - 0.166666667f*w2*(1.f - 0.05f*w2));
#else
  wreal = cos(w0);
  wimag = sin(w0);
#endif
  /* Speed up the phase computation by keeping phase as a unit-norm complex value
     and rotating it by exp(-i*w0) each sample. */
  for (i=0;i<SUBFRAME_SIZE;i++) {
    float tmp;
    tmp = phase[0]*wreal - phase[1]*wimag;
    phase[1] = phase[0]*wimag + phase[1]*wreal;
    phase[0] = tmp;
    pembed[i] = phase[1];
    pembed[SUBFRAME_SIZE+i] = phase[0];
  }
  /* Renormalize once per sub-frame, though we could probably do it even less frequently. */
  {
    float r = 1.f/sqrt(phase[0]*phase[0] + phase[1]*phase[1]);
    phase[0] *= r;
    phase[1] *= r;
  }
}
static void compute_wlpc(float lpc[LPC_ORDER], const float *features) {
  float lpc_weight;
  int i;
  lpc_from_cepstrum(lpc, features);
  lpc_weight = 1.f;
  for (i=0;i<LPC_ORDER;i++) {
    lpc_weight *= FWGAN_GAMMA;
    lpc[i] *= lpc_weight;
  }
}
static void run_fwgan_upsampler(FWGANState *st, float *cond, const float *features)
{
  FWGAN *model;
  model = &st->model;
  celt_assert(FWGAN_FEATURES == model->bfcc_with_corr_upsampler_fc.nb_inputs);
  celt_assert(BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE == model->bfcc_with_corr_upsampler_fc.nb_outputs);
  compute_generic_dense(&model->bfcc_with_corr_upsampler_fc, cond, features, ACTIVATION_TANH);
}

static void fwgan_synthesize_impl(FWGANState *st, float *pcm, const float *lpc, const float *features);
void fwgan_cont(FWGANState *st, const float *pcm0, const float *features0)
{
  int i;
  float norm2, norm_1;
  float wpcm0[CONT_PCM_INPUTS];
  float cont_inputs[CONT_PCM_INPUTS+1];
  float tmp1[MAX_CONT_SIZE];
  float tmp2[MAX_CONT_SIZE];
  float lpc[LPC_ORDER];
  float new_pcm[FWGAN_FRAME_SIZE];
  FWGAN *model;
  st->embed_phase[0] = 1;
  model = &st->model;
  compute_wlpc(lpc, features0);
  /* Deemphasis memory is just the last continuation sample. */
  st->deemph_mem = pcm0[CONT_PCM_INPUTS-1];

  /* Apply the analysis filter, considering that the preemphasis and deemphasis filters
     cancel each other in this case since the LPC filter is constant across that boundary. */
  for (i=LPC_ORDER;i<CONT_PCM_INPUTS;i++) {
    int j;
    wpcm0[i] = pcm0[i];
    for (j=0;j<LPC_ORDER;j++) wpcm0[i] += lpc[j]*pcm0[i-j-1];
  }
  /* FIXME: Make this less stupid. */
  for (i=0;i<LPC_ORDER;i++) wpcm0[i] = wpcm0[LPC_ORDER];

  /* The memory of the pre-emphasis is the last sample of the weighted signal
     (ignoring the preemphasis+deemphasis combination). */
  st->preemph_mem = wpcm0[CONT_PCM_INPUTS-1];
  /* The memory of the synthesis filter is the pre-emphasized continuation. */
  for (i=0;i<LPC_ORDER;i++) st->syn_mem[i] = pcm0[CONT_PCM_INPUTS-1-i] - FWGAN_DEEMPHASIS*pcm0[CONT_PCM_INPUTS-2-i];

  norm2 = celt_inner_prod(wpcm0, wpcm0, CONT_PCM_INPUTS, st->arch);
  norm_1 = 1.f/sqrt(1e-8f + norm2);
  for (i=0;i<CONT_PCM_INPUTS;i++) cont_inputs[i+1] = norm_1*wpcm0[i];
  cont_inputs[0] = log(sqrt(norm2) + 1e-7f);

  /* Continuation network */
  compute_generic_dense(&model->cont_net_0, tmp1, cont_inputs, ACTIVATION_TANH);
  compute_generic_dense(&model->cont_net_2, tmp2, tmp1, ACTIVATION_TANH);
  compute_generic_dense(&model->cont_net_4, tmp1, tmp2, ACTIVATION_TANH);
  compute_generic_dense(&model->cont_net_6, tmp2, tmp1, ACTIVATION_TANH);
  compute_generic_dense(&model->cont_net_8, tmp1, tmp2, ACTIVATION_TANH);
  celt_assert(CONT_NET_10_OUT_SIZE == model->cont_net_10.nb_outputs);
  compute_generic_dense(&model->cont_net_10, st->cont, tmp1, ACTIVATION_TANH);

  /* Compute the continuation for each layer. */
  celt_assert(RNN_GRU_STATE_SIZE == model->rnn_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->rnn_cont_fc_0, st->rnn_state, st->cont, ACTIVATION_TANH);

  celt_assert(FWC1_STATE_SIZE == model->fwc1_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc1_cont_fc_0, st->fwc1_state, st->cont, ACTIVATION_TANH);
  celt_assert(FWC2_STATE_SIZE == model->fwc2_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc2_cont_fc_0, st->fwc2_state, st->cont, ACTIVATION_TANH);
  celt_assert(FWC3_STATE_SIZE == model->fwc3_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc3_cont_fc_0, st->fwc3_state, st->cont, ACTIVATION_TANH);
  celt_assert(FWC4_STATE_SIZE == model->fwc4_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc4_cont_fc_0, st->fwc4_state, st->cont, ACTIVATION_TANH);
  celt_assert(FWC5_STATE_SIZE == model->fwc5_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc5_cont_fc_0, st->fwc5_state, st->cont, ACTIVATION_TANH);
  celt_assert(FWC6_STATE_SIZE == model->fwc6_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc6_cont_fc_0, st->fwc6_state, st->cont, ACTIVATION_TANH);
  celt_assert(FWC7_STATE_SIZE == model->fwc7_cont_fc_0.nb_outputs);
  compute_generic_dense(&model->fwc7_cont_fc_0, st->fwc7_state, st->cont, ACTIVATION_TANH);

  st->cont_initialized = 1;
  /* Process the first frame, discard the first subframe, and keep the rest for the first
     synthesis call. */
  fwgan_synthesize_impl(st, new_pcm, lpc, features0);
  OPUS_COPY(st->pcm_buf, &new_pcm[SUBFRAME_SIZE], FWGAN_FRAME_SIZE-SUBFRAME_SIZE);
}
static void apply_gain(float *pcm, float c0, float *last_gain) {
  int i;
  float gain = pow(10.f, (0.5f*c0/sqrt(18.f)));
  for (i=0;i<SUBFRAME_SIZE;i++) pcm[i] *= *last_gain;
  *last_gain = gain;
}

static void fwgan_lpc_syn(float *pcm, float *mem, const float *lpc, float last_lpc[LPC_ORDER]) {
  int i;
  for (i=0;i<SUBFRAME_SIZE;i++) {
    int j;
    for (j=0;j<LPC_ORDER;j++) pcm[i] -= mem[j]*last_lpc[j];
    OPUS_MOVE(&mem[1], &mem[0], LPC_ORDER-1);
    mem[0] = pcm[i];
  }
  OPUS_COPY(last_lpc, lpc, LPC_ORDER);
}

static void fwgan_preemphasis(float *pcm, float *preemph_mem) {
  int i;
  for (i=0;i<SUBFRAME_SIZE;i++) {
    float tmp = pcm[i];
    pcm[i] -= FWGAN_DEEMPHASIS * *preemph_mem;
    *preemph_mem = tmp;
  }
}

static void fwgan_deemphasis(float *pcm, float *deemph_mem) {
  int i;
  for (i=0;i<SUBFRAME_SIZE;i++) {
    pcm[i] += FWGAN_DEEMPHASIS * *deemph_mem;
    *deemph_mem = pcm[i];
  }
}
static void run_fwgan_subframe(FWGANState *st, float *pcm, const float *cond, double w0, const float *lpc, float c0)
{
  float tmp1[FWC1_FC_0_OUT_SIZE];
  float tmp2[IMAX(RNN_GRU_STATE_SIZE, FWC2_FC_0_OUT_SIZE)];
  float feat_in[FEAT_IN_SIZE];
  float rnn_in[FEAT_IN_CONV1_CONV_OUT_SIZE];
  float pembed[FWGAN_FRAME_SIZE/2];
  FWGAN *model;
  model = &st->model;

  pitch_embeddings(pembed, st->embed_phase, w0);
  /* Interleave bfcc_cond and pembed for each subframe in feat_in. */
  OPUS_COPY(&feat_in[BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4], &cond[0], BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4);
  OPUS_COPY(&feat_in[0], &pembed[0], FWGAN_FRAME_SIZE/2);

  compute_generic_conv1d(&model->feat_in_conv1_conv, rnn_in, st->cont_conv1_mem, feat_in, FEAT_IN_CONV1_CONV_IN_SIZE, ACTIVATION_LINEAR);
  celt_assert(FEAT_IN_NL1_GATE_OUT_SIZE == model->feat_in_nl1_gate.nb_outputs);
  compute_gated_activation(&model->feat_in_nl1_gate, rnn_in, rnn_in, ACTIVATION_TANH);

  if (st->cont_initialized == 1) {
    /* On the very first subframe we stop here. We only want to run the feat_in layer since the
       others are initialized via the continuation network. */
    OPUS_CLEAR(pcm, SUBFRAME_SIZE);
    st->cont_initialized = 2;
    apply_gain(pcm, c0, &st->last_gain);
    OPUS_COPY(st->last_lpc, lpc, LPC_ORDER);
    return;
  }

  compute_generic_gru(&model->rnn_gru_input, &model->rnn_gru_recurrent, st->rnn_state, rnn_in);
  celt_assert(IMAX(RNN_GRU_STATE_SIZE, FWC2_FC_0_OUT_SIZE) >= model->rnn_nl_gate.nb_outputs);
  compute_gated_activation(&model->rnn_nl_gate, tmp2, st->rnn_state, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc1_fc_0, tmp1, st->fwc1_state, tmp2, RNN_GRU_STATE_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc1_fc_1_gate, tmp1, tmp1, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc2_fc_0, tmp2, st->fwc2_state, tmp1, FWC1_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc2_fc_1_gate, tmp2, tmp2, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc3_fc_0, tmp1, st->fwc3_state, tmp2, FWC2_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc3_fc_1_gate, tmp1, tmp1, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc4_fc_0, tmp2, st->fwc4_state, tmp1, FWC3_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc4_fc_1_gate, tmp2, tmp2, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc5_fc_0, tmp1, st->fwc5_state, tmp2, FWC4_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc5_fc_1_gate, tmp1, tmp1, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc6_fc_0, tmp2, st->fwc6_state, tmp1, FWC5_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc6_fc_1_gate, tmp2, tmp2, ACTIVATION_TANH);

  compute_generic_conv1d(&model->fwc7_fc_0, tmp1, st->fwc7_state, tmp2, FWC6_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
  compute_gated_activation(&model->fwc7_fc_1_gate, pcm, tmp1, ACTIVATION_TANH);

  apply_gain(pcm, c0, &st->last_gain);
  fwgan_preemphasis(pcm, &st->preemph_mem);
  fwgan_lpc_syn(pcm, st->syn_mem, lpc, st->last_lpc);
  fwgan_deemphasis(pcm, &st->deemph_mem);
}

void fwgan_init(FWGANState *st)
{
  int ret;
  OPUS_CLEAR(st, 1);
  ret = init_fwgan(&st->model, fwgan_arrays);
  celt_assert(ret == 0);
  (void)ret;
  /* FIXME: perform arch detection. */
}

int fwgan_load_model(FWGANState *st, const unsigned char *data, int len) {
  WeightArray *list;
  int ret;
  parse_weights(&list, data, len);
  ret = init_fwgan(&st->model, list);
  opus_free(list);
  if (ret == 0) return 0;
  else return -1;
}

static void fwgan_synthesize_impl(FWGANState *st, float *pcm, const float *lpc, const float *features)
{
  int subframe;
  float cond[BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE];
  double w0;
  int period;
  float fwgan_features[NB_FEATURES-1];
  celt_assert(st->cont_initialized);
  OPUS_COPY(fwgan_features, features, NB_FEATURES-2);
  fwgan_features[NB_FEATURES-2] = features[NB_FEATURES-1]+.5;

  period = (int)floor(.1 + 50*features[NB_BANDS]+100);
  w0 = 2*M_PI/period;
  run_fwgan_upsampler(st, cond, fwgan_features);
  for (subframe=0;subframe<NB_SUBFRAMES;subframe++) {
    float *sub_cond;
    sub_cond = &cond[subframe*BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4];
    run_fwgan_subframe(st, &pcm[subframe*SUBFRAME_SIZE], sub_cond, w0, lpc, features[0]);
  }
}

void fwgan_synthesize(FWGANState *st, float *pcm, const float *features)
{
  float lpc[LPC_ORDER];
  float new_pcm[FWGAN_FRAME_SIZE];
  compute_wlpc(lpc, features);
  fwgan_synthesize_impl(st, new_pcm, lpc, features);
  /* Handle buffering. */
  OPUS_COPY(pcm, st->pcm_buf, FWGAN_FRAME_SIZE-SUBFRAME_SIZE);
  OPUS_COPY(&pcm[FWGAN_FRAME_SIZE-SUBFRAME_SIZE], new_pcm, SUBFRAME_SIZE);
  OPUS_COPY(st->pcm_buf, &new_pcm[SUBFRAME_SIZE], FWGAN_FRAME_SIZE-SUBFRAME_SIZE);
}

void fwgan_synthesize_int(FWGANState *st, opus_int16 *pcm, const float *features)
{
  int i;
  float fpcm[FWGAN_FRAME_SIZE];
  fwgan_synthesize(st, fpcm, features);
  for (i=0;i<LPCNET_FRAME_SIZE;i++) pcm[i] = (int)floor(.5 + MIN32(32767, MAX32(-32767, 32768.f*fpcm[i])));
}
83 managed_components/78__esp-opus/dnn/fwgan.h Normal file
@@ -0,0 +1,83 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef FWGAN_H
#define FWGAN_H

#include "freq.h"
#include "fwgan_data.h"

#define FWGAN_CONT_SAMPLES 320
#define NB_SUBFRAMES 4
#define SUBFRAME_SIZE 40
#define FWGAN_FRAME_SIZE (NB_SUBFRAMES*SUBFRAME_SIZE)
#define CONT_PCM_INPUTS 320
#define MAX_CONT_SIZE CONT_NET_0_OUT_SIZE
#define FWGAN_GAMMA 0.92f
#define FWGAN_DEEMPHASIS 0.85f

/* FIXME: Derive those from the model rather than hardcoding. */
#define FWC1_STATE_SIZE 512
#define FWC2_STATE_SIZE 512
#define FWC3_STATE_SIZE 256
#define FWC4_STATE_SIZE 256
#define FWC5_STATE_SIZE 128
#define FWC6_STATE_SIZE 128
#define FWC7_STATE_SIZE 80

typedef struct {
  FWGAN model;
  int arch;
  int cont_initialized;
  float embed_phase[2];
  float last_gain;
  float last_lpc[LPC_ORDER];
  float syn_mem[LPC_ORDER];
  float preemph_mem;
  float deemph_mem;
  float pcm_buf[FWGAN_FRAME_SIZE];
  float cont[CONT_NET_10_OUT_SIZE];
  float cont_conv1_mem[FEAT_IN_CONV1_CONV_STATE_SIZE];
  float rnn_state[RNN_GRU_STATE_SIZE];
  float fwc1_state[FWC1_STATE_SIZE];
  float fwc2_state[FWC2_STATE_SIZE];
  float fwc3_state[FWC3_STATE_SIZE];
  float fwc4_state[FWC4_STATE_SIZE];
  float fwc5_state[FWC5_STATE_SIZE];
  float fwc6_state[FWC6_STATE_SIZE];
  float fwc7_state[FWC7_STATE_SIZE];
} FWGANState;

void fwgan_init(FWGANState *st);
int fwgan_load_model(FWGANState *st, const unsigned char *data, int len);

void fwgan_cont(FWGANState *st, const float *pcm0, const float *features0);

void fwgan_synthesize(FWGANState *st, float *pcm, const float *features);
void fwgan_synthesize_int(FWGANState *st, opus_int16 *pcm, const float *features);

#endif /* FWGAN_H */
81 managed_components/78__esp-opus/dnn/kiss99.c Normal file
@@ -0,0 +1,81 @@
/*Daala video codec
Copyright (c) 2012 Daala project contributors. All rights reserved.
Author: Timothy B. Terriberry

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS”
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "kiss99.h"

void kiss99_srand(kiss99_ctx *_this,const unsigned char *_data,int _ndata){
  int i;
  _this->z=362436069;
  _this->w=521288629;
  _this->jsr=123456789;
  _this->jcong=380116160;
  for(i=3;i<_ndata;i+=4){
    _this->z^=_data[i-3];
    _this->w^=_data[i-2];
    _this->jsr^=_data[i-1];
    _this->jcong^=_data[i];
    kiss99_rand(_this);
  }
  if(i-3<_ndata)_this->z^=_data[i-3];
  if(i-2<_ndata)_this->w^=_data[i-2];
  if(i-1<_ndata)_this->jsr^=_data[i-1];
  /*Fix any potential short cycles that show up.
    These are not too likely, given the way we initialize the state, but they
    are technically possible, so let us go ahead and eliminate that
    possibility.
    See Gregory G. Rose: "KISS: A Bit Too Simple", Cryptographic Communications
    No. 10, pp. 123---137, 2018.*/
  if(_this->z==0||_this->z==0x9068FFFF)_this->z++;
  if(_this->w==0||_this->w==0x464FFFFF)_this->w++;
  if(_this->jsr==0)_this->jsr++;
}

uint32_t kiss99_rand(kiss99_ctx *_this){
  uint32_t znew;
  uint32_t wnew;
  uint32_t mwc;
  uint32_t shr3;
  uint32_t cong;
  znew=36969*(_this->z&0xFFFF)+(_this->z>>16);
  wnew=18000*(_this->w&0xFFFF)+(_this->w>>16);
  mwc=(znew<<16)+wnew;
  /*We swap the 13 and 17 from the original 1999 algorithm to produce a single
    cycle of maximal length, matching KISS11.
    We are not actually using KISS11 because of the impractically large (16 MB)
    internal state of the full algorithm.*/
  shr3=_this->jsr^(_this->jsr<<13);
  shr3^=shr3>>17;
  shr3^=shr3<<5;
  cong=69069*_this->jcong+1234567;
  _this->z=znew;
  _this->w=wnew;
  _this->jsr=shr3;
  _this->jcong=cong;
  return (mwc^cong)+shr3;
}
46 managed_components/78__esp-opus/dnn/kiss99.h Normal file
@@ -0,0 +1,46 @@
/*Daala video codec
Copyright (c) 2012 Daala project contributors. All rights reserved.
Author: Timothy B. Terriberry

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS”
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.*/

#if !defined(_kiss99_H)
# define _kiss99_H (1)
# include <stdint.h>

/*KISS PRNG from George Marsaglia (1999 version).
  See https://en.wikipedia.org/wiki/KISS_(algorithm) for details.
  This is suitable for simulations, but not for use in cryptographic contexts.*/

typedef struct kiss99_ctx kiss99_ctx;

struct kiss99_ctx{
  uint32_t z;
  uint32_t w;
  uint32_t jsr;
  uint32_t jcong;
};

void kiss99_srand(kiss99_ctx *_this,const unsigned char *_data,int _ndata);
uint32_t kiss99_rand(kiss99_ctx *_this);

#endif
192 managed_components/78__esp-opus/dnn/lossgen.c Normal file
@@ -0,0 +1,192 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

/* This packet loss simulator can be used independently of the Opus codebase.
   To do that, you need to compile the following files:
     dnn/lossgen.c
     dnn/lossgen_data.c

   with the following files needed as #include:
     dnn/lossgen_data.h
     dnn/lossgen.h
     dnn/nnet_arch.h
     dnn/nnet.h
     dnn/parse_lpcnet_weights.c (included despite being a C file)
     dnn/vec_avx.h
     dnn/vec.h
     celt/os_support.h
     celt/arch.h
     celt/x86/x86_arch_macros.h
     include/opus_defines.h
     include/opus_types.h

   Additionally, the code in dnn/lossgen_demo.c can be used to generate losses from
   the command line.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "arch.h"

#include <math.h>
#include <stdlib.h>
#include "lossgen.h"
#include "os_support.h"
#include "nnet.h"
#include <assert.h>

/* Disable RTCD for this. */
#define RTCD_ARCH c

/* Override assert to avoid undefined/redefined symbols. */
#undef celt_assert
#define celt_assert assert

/* Directly include the C files we need since the symbols won't be exposed if we link in a shared object. */
#include "parse_lpcnet_weights.c"
#include "nnet_arch.h"

#undef compute_linear
#undef compute_activation

/* Force the C version since the SIMD versions may be hidden. */
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_c(linear, out, in))
#define compute_activation(output, input, N, activation, arch) ((void)(arch),compute_activation_c(output, input, N, activation))

#define MAX_RNN_NEURONS_ALL IMAX(LOSSGEN_GRU1_STATE_SIZE, LOSSGEN_GRU2_STATE_SIZE)

/* These two functions are copied from nnet.c to make sure we don't have linking issues. */
void compute_generic_gru_lossgen(const LinearLayer *input_weights, const LinearLayer *recurrent_weights, float *state, const float *in, int arch)
{
  int i;
  int N;
  float zrh[3*MAX_RNN_NEURONS_ALL];
  float recur[3*MAX_RNN_NEURONS_ALL];
  float *z;
  float *r;
  float *h;
  celt_assert(3*recurrent_weights->nb_inputs == recurrent_weights->nb_outputs);
  celt_assert(input_weights->nb_outputs == recurrent_weights->nb_outputs);
  N = recurrent_weights->nb_inputs;
  z = zrh;
  r = &zrh[N];
  h = &zrh[2*N];
  celt_assert(recurrent_weights->nb_outputs <= 3*MAX_RNN_NEURONS_ALL);
  celt_assert(in != state);
  compute_linear(input_weights, zrh, in, arch);
  compute_linear(recurrent_weights, recur, state, arch);
  for (i=0;i<2*N;i++)
    zrh[i] += recur[i];
  compute_activation(zrh, zrh, 2*N, ACTIVATION_SIGMOID, arch);
  for (i=0;i<N;i++)
    h[i] += recur[2*N+i]*r[i];
  compute_activation(h, h, N, ACTIVATION_TANH, arch);
  for (i=0;i<N;i++)
    h[i] = z[i]*state[i] + (1-z[i])*h[i];
  for (i=0;i<N;i++)
    state[i] = h[i];
}

void compute_generic_dense_lossgen(const LinearLayer *layer, float *output, const float *input, int activation, int arch)
{
  compute_linear(layer, output, input, arch);
  compute_activation(output, output, layer->nb_outputs, activation, arch);
}

static int sample_loss_impl(
    LossGenState *st,
    float percent_loss)
{
  float input[2];
  float tmp[LOSSGEN_DENSE_IN_OUT_SIZE];
  float out;
  int loss;
  LossGen *model = &st->model;
  input[0] = st->last_loss;
  input[1] = percent_loss;
  compute_generic_dense_lossgen(&model->lossgen_dense_in, tmp, input, ACTIVATION_TANH, 0);
  compute_generic_gru_lossgen(&model->lossgen_gru1_input, &model->lossgen_gru1_recurrent, st->gru1_state, tmp, 0);
  compute_generic_gru_lossgen(&model->lossgen_gru2_input, &model->lossgen_gru2_recurrent, st->gru2_state, st->gru1_state, 0);
  compute_generic_dense_lossgen(&model->lossgen_dense_out, &out, st->gru2_state, ACTIVATION_SIGMOID, 0);
  loss = (float)rand()/RAND_MAX < out;
  st->last_loss = loss;
  return loss;
}

int sample_loss(
    LossGenState *st,
    float percent_loss)
{
  /* Because the GRU states are initialized to zero, the first packets aren't quite random,
     so we skip them. */
  if (!st->used) {
    int i;
    for (i=0;i<1000;i++) sample_loss_impl(st, percent_loss);
    st->used = 1;
  }
  return sample_loss_impl(st, percent_loss);
}
void lossgen_init(LossGenState *st)
{
  int ret;
  OPUS_CLEAR(st, 1);
  ret = init_lossgen(&st->model, lossgen_arrays);
  celt_assert(ret == 0);
  (void)ret;
}

int lossgen_load_model(LossGenState *st, const void *data, int len) {
  WeightArray *list;
  int ret;
  parse_weights(&list, data, len);
  ret = init_lossgen(&st->model, list);
  opus_free(list);
  if (ret == 0) return 0;
  else return -1;
}

#if 0
#include <stdio.h>
int main(int argc, char **argv) {
  int i, N;
  float p;
  LossGenState st;
  if (argc!=3) {
    fprintf(stderr, "usage: lossgen <percentage> <length>\n");
    return 1;
  }
  lossgen_init(&st);
  p = atof(argv[1]);
  N = atoi(argv[2]);
  for (i=0;i<N;i++) {
    printf("%d\n", sample_loss(&st, p));
  }
  return 0;
}
#endif
55 managed_components/78__esp-opus/dnn/lossgen.h Normal file
@@ -0,0 +1,55 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef LOSSGEN_H
#define LOSSGEN_H

#include "lossgen_data.h"

#define PITCH_MIN_PERIOD 32
#define PITCH_MAX_PERIOD 256

#define NB_XCORR_FEATURES (PITCH_MAX_PERIOD-PITCH_MIN_PERIOD)

typedef struct {
  LossGen model;
  float gru1_state[LOSSGEN_GRU1_STATE_SIZE];
  float gru2_state[LOSSGEN_GRU2_STATE_SIZE];
  int last_loss;
  int used;
} LossGenState;

void lossgen_init(LossGenState *st);
int lossgen_load_model(LossGenState *st, const void *data, int len);

int sample_loss(
    LossGenState *st,
    float percent_loss);

#endif
22
managed_components/78__esp-opus/dnn/lossgen_demo.c
Normal file
@@ -0,0 +1,22 @@
#include <stdio.h>
#include <stdlib.h>
#include "lossgen.h"

int main(int argc, char **argv)
{
   LossGenState st;
   long num_packets;
   long i;
   float percent;
   if (argc != 3) {
      fprintf(stderr, "usage: %s <percent_loss> <nb packets>\n", argv[0]);
      return 1;
   }
   lossgen_init(&st);
   percent = atof(argv[1]);
   num_packets = atol(argv[2]);
   /*printf("loss: %f %d\n", percent, num_packets);*/
   for (i=0;i<num_packets;i++) {
      printf("%d\n", sample_loss(&st, percent*0.01f));
   }
   return 0;
}
283
managed_components/78__esp-opus/dnn/lpcnet.c
Normal file
@@ -0,0 +1,283 @@
/* Copyright (c) 2018 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <math.h>
#include <stdio.h>
#include "nnet_data.h"
#include "nnet.h"
#include "common.h"
#include "arch.h"
#include "lpcnet.h"
#include "lpcnet_private.h"
#include "os_support.h"

#define PREEMPH 0.85f

#define PDF_FLOOR 0.002

#define FRAME_INPUT_SIZE (NB_FEATURES + EMBED_PITCH_OUT_SIZE)


#if 0
static void print_vector(float *x, int N)
{
   int i;
   for (i=0;i<N;i++) printf("%f ", x[i]);
   printf("\n");
}
#endif

#ifdef END2END
void rc2lpc(float *lpc, const float *rc)
{
   int i, j, k;
   float tmp[LPC_ORDER];
   float ntmp[LPC_ORDER] = {0.0};
   OPUS_COPY(tmp, rc, LPC_ORDER);
   for (i = 0; i < LPC_ORDER; i++)
   {
      for (j = 0; j <= i-1; j++)
      {
         ntmp[j] = tmp[j] + tmp[i]*tmp[i - j - 1];
      }
      for (k = 0; k <= i-1; k++)
      {
         tmp[k] = ntmp[k];
      }
   }
   for (i = 0; i < LPC_ORDER; i++)
   {
      lpc[i] = tmp[i];
   }
}

#endif
void run_frame_network(LPCNetState *lpcnet, float *gru_a_condition, float *gru_b_condition, float *lpc, const float *features)
{
   NNetState *net;
   float condition[FEATURE_DENSE2_OUT_SIZE];
   float in[FRAME_INPUT_SIZE];
   float conv1_out[FEATURE_CONV1_OUT_SIZE];
   float conv2_out[FEATURE_CONV2_OUT_SIZE];
   float dense1_out[FEATURE_DENSE1_OUT_SIZE];
   int pitch;
   float rc[LPC_ORDER];
   /* Matches the Python code -- the 0.1 avoids rounding issues. */
   pitch = (int)floor(.1 + 50*features[NB_BANDS]+100);
   pitch = IMIN(255, IMAX(33, pitch));
   net = &lpcnet->nnet;
   OPUS_COPY(in, features, NB_FEATURES);
   compute_embedding(&lpcnet->model.embed_pitch, &in[NB_FEATURES], pitch);
   compute_conv1d(&lpcnet->model.feature_conv1, conv1_out, net->feature_conv1_state, in);
   if (lpcnet->frame_count < FEATURE_CONV1_DELAY) OPUS_CLEAR(conv1_out, FEATURE_CONV1_OUT_SIZE);
   compute_conv1d(&lpcnet->model.feature_conv2, conv2_out, net->feature_conv2_state, conv1_out);
   if (lpcnet->frame_count < FEATURES_DELAY) OPUS_CLEAR(conv2_out, FEATURE_CONV2_OUT_SIZE);
   _lpcnet_compute_dense(&lpcnet->model.feature_dense1, dense1_out, conv2_out);
   _lpcnet_compute_dense(&lpcnet->model.feature_dense2, condition, dense1_out);
   OPUS_COPY(rc, condition, LPC_ORDER);
   _lpcnet_compute_dense(&lpcnet->model.gru_a_dense_feature, gru_a_condition, condition);
   _lpcnet_compute_dense(&lpcnet->model.gru_b_dense_feature, gru_b_condition, condition);
#ifdef END2END
   rc2lpc(lpc, rc);
#elif FEATURES_DELAY>0
   memcpy(lpc, lpcnet->old_lpc[FEATURES_DELAY-1], LPC_ORDER*sizeof(lpc[0]));
   memmove(lpcnet->old_lpc[1], lpcnet->old_lpc[0], (FEATURES_DELAY-1)*LPC_ORDER*sizeof(lpc[0]));
   lpc_from_cepstrum(lpcnet->old_lpc[0], features);
#else
   lpc_from_cepstrum(lpc, features);
#endif
#ifdef LPC_GAMMA
   lpc_weighting(lpc, LPC_GAMMA);
#endif
   if (lpcnet->frame_count < 1000) lpcnet->frame_count++;
}

void run_frame_network_deferred(LPCNetState *lpcnet, const float *features)
{
   int max_buffer_size = lpcnet->model.feature_conv1.kernel_size + lpcnet->model.feature_conv2.kernel_size - 2;
   celt_assert(max_buffer_size <= MAX_FEATURE_BUFFER_SIZE);
   if (lpcnet->feature_buffer_fill == max_buffer_size) {
      OPUS_MOVE(lpcnet->feature_buffer, &lpcnet->feature_buffer[NB_FEATURES], (max_buffer_size-1)*NB_FEATURES);
   } else {
      lpcnet->feature_buffer_fill++;
   }
   OPUS_COPY(&lpcnet->feature_buffer[(lpcnet->feature_buffer_fill-1)*NB_FEATURES], features, NB_FEATURES);
}

void run_frame_network_flush(LPCNetState *lpcnet)
{
   int i;
   for (i=0;i<lpcnet->feature_buffer_fill;i++) {
      float lpc[LPC_ORDER];
      float gru_a_condition[3*GRU_A_STATE_SIZE];
      float gru_b_condition[3*GRU_B_STATE_SIZE];
      run_frame_network(lpcnet, gru_a_condition, gru_b_condition, lpc, &lpcnet->feature_buffer[i*NB_FEATURES]);
   }
   lpcnet->feature_buffer_fill = 0;
}

int run_sample_network(LPCNetState *lpcnet, const float *gru_a_condition, const float *gru_b_condition, int last_exc, int last_sig, int pred, const float *sampling_logit_table, kiss99_ctx *rng)
{
   NNetState *net;
   float gru_a_input[3*GRU_A_STATE_SIZE];
   float in_b[GRU_A_STATE_SIZE+FEATURE_DENSE2_OUT_SIZE];
   float gru_b_input[3*GRU_B_STATE_SIZE];
   net = &lpcnet->nnet;
#if 1
   compute_gru_a_input(gru_a_input, gru_a_condition, GRU_A_STATE_SIZE, &lpcnet->model.gru_a_embed_sig, last_sig, &lpcnet->model.gru_a_embed_pred, pred, &lpcnet->model.gru_a_embed_exc, last_exc);
#else
   OPUS_COPY(gru_a_input, gru_a_condition, 3*GRU_A_STATE_SIZE);
   accum_embedding(&lpcnet->model.gru_a_embed_sig, gru_a_input, last_sig);
   accum_embedding(&lpcnet->model.gru_a_embed_pred, gru_a_input, pred);
   accum_embedding(&lpcnet->model.gru_a_embed_exc, gru_a_input, last_exc);
#endif
   /*compute_gru3(&gru_a, net->gru_a_state, gru_a_input);*/
   compute_sparse_gru(&lpcnet->model.sparse_gru_a, net->gru_a_state, gru_a_input);
   OPUS_COPY(in_b, net->gru_a_state, GRU_A_STATE_SIZE);
   OPUS_COPY(gru_b_input, gru_b_condition, 3*GRU_B_STATE_SIZE);
   compute_gruB(&lpcnet->model.gru_b, gru_b_input, net->gru_b_state, in_b);
   return sample_mdense(&lpcnet->model.dual_fc, net->gru_b_state, sampling_logit_table, rng);
}

int lpcnet_get_size(void)
{
   return sizeof(LPCNetState);
}

void lpcnet_reset(LPCNetState *lpcnet)
{
   const char* rng_string="LPCNet";
   OPUS_CLEAR((char*)&lpcnet->LPCNET_RESET_START,
          sizeof(LPCNetState)-
          ((char*)&lpcnet->LPCNET_RESET_START - (char*)lpcnet));
   lpcnet->last_exc = lin2ulaw(0.f);
   kiss99_srand(&lpcnet->rng, (const unsigned char *)rng_string, strlen(rng_string));
}

int lpcnet_init(LPCNetState *lpcnet)
{
   int i;
   int ret;
   for (i=0;i<256;i++) {
      float prob = .025f+.95f*i/255.f;
      lpcnet->sampling_logit_table[i] = -log((1-prob)/prob);
   }
#ifndef USE_WEIGHTS_FILE
   ret = init_lpcnet_model(&lpcnet->model, lpcnet_arrays);
#else
   ret = 0;
#endif
   lpcnet_reset(lpcnet);
   celt_assert(ret == 0);
   return ret;
}

int lpcnet_load_model(LPCNetState *st, const unsigned char *data, int len) {
   WeightArray *list;
   int ret;
   parse_weights(&list, data, len);
   ret = init_lpcnet_model(&st->model, list);
   opus_free(list);
   if (ret == 0) return 0;
   else return -1;
}


LPCNetState *lpcnet_create(void)
{
   LPCNetState *lpcnet;
   lpcnet = (LPCNetState *)opus_alloc(lpcnet_get_size());
   OPUS_CLEAR(lpcnet, 1);
   lpcnet_init(lpcnet);
   return lpcnet;
}

void lpcnet_destroy(LPCNetState *lpcnet)
{
   opus_free(lpcnet);
}

void lpcnet_reset_signal(LPCNetState *lpcnet)
{
   lpcnet->deemph_mem = 0;
   lpcnet->last_exc = lin2ulaw(0.f);
   OPUS_CLEAR(lpcnet->last_sig, LPC_ORDER);
   OPUS_CLEAR(lpcnet->nnet.gru_a_state, GRU_A_STATE_SIZE);
   OPUS_CLEAR(lpcnet->nnet.gru_b_state, GRU_B_STATE_SIZE);
}

void lpcnet_synthesize_tail_impl(LPCNetState *lpcnet, opus_int16 *output, int N, int preload)
{
   int i;

   if (lpcnet->frame_count <= FEATURES_DELAY)
   {
      OPUS_CLEAR(output, N);
      return;
   }
   for (i=0;i<N;i++)
   {
      int j;
      float pcm;
      int exc;
      int last_sig_ulaw;
      int pred_ulaw;
      float pred = 0;
      for (j=0;j<LPC_ORDER;j++) pred -= lpcnet->last_sig[j]*lpcnet->lpc[j];
      last_sig_ulaw = lin2ulaw(lpcnet->last_sig[0]);
      pred_ulaw = lin2ulaw(pred);
      exc = run_sample_network(lpcnet, lpcnet->gru_a_condition, lpcnet->gru_b_condition, lpcnet->last_exc, last_sig_ulaw, pred_ulaw, lpcnet->sampling_logit_table, &lpcnet->rng);
      if (i < preload) {
         exc = lin2ulaw(output[i]-PREEMPH*lpcnet->deemph_mem - pred);
         pcm = output[i]-PREEMPH*lpcnet->deemph_mem;
      } else {
         pcm = pred + ulaw2lin(exc);
      }
      OPUS_MOVE(&lpcnet->last_sig[1], &lpcnet->last_sig[0], LPC_ORDER-1);
      lpcnet->last_sig[0] = pcm;
      lpcnet->last_exc = exc;
      pcm += PREEMPH*lpcnet->deemph_mem;
      lpcnet->deemph_mem = pcm;
      if (pcm<-32767) pcm = -32767;
      if (pcm>32767) pcm = 32767;
      if (i >= preload) output[i] = (int)floor(.5 + pcm);
   }
}

void lpcnet_synthesize_impl(LPCNetState *lpcnet, const float *features, opus_int16 *output, int N, int preload)
{
   run_frame_network(lpcnet, lpcnet->gru_a_condition, lpcnet->gru_b_condition, lpcnet->lpc, features);
   lpcnet_synthesize_tail_impl(lpcnet, output, N, preload);
}

void lpcnet_synthesize(LPCNetState *lpcnet, const float *features, opus_int16 *output, int N) {
   lpcnet_synthesize_impl(lpcnet, features, output, N, 0);
}
183
managed_components/78__esp-opus/dnn/lpcnet.h
Normal file
@@ -0,0 +1,183 @@
/* Copyright (c) 2018 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef LPCNET_H_
#define LPCNET_H_

#include "opus_types.h"

#define NB_FEATURES 20
#define NB_TOTAL_FEATURES 36

/** Number of audio samples in a feature frame (not for encoding/decoding). */
#define LPCNET_FRAME_SIZE (160)

typedef struct LPCNetState LPCNetState;

typedef struct LPCNetDecState LPCNetDecState;

typedef struct LPCNetEncState LPCNetEncState;

typedef struct LPCNetPLCState LPCNetPLCState;


/** Gets the size of an <code>LPCNetDecState</code> structure.
  * @returns The size in bytes.
  */
int lpcnet_decoder_get_size(void);

/** Initializes a previously allocated decoder state.
  * The memory pointed to by st must be at least the size returned by lpcnet_decoder_get_size().
  * This is intended for applications which use their own allocator instead of malloc.
  * @see lpcnet_decoder_create(),lpcnet_decoder_get_size()
  * @param [in] st <tt>LPCNetDecState*</tt>: Decoder state
  * @retval 0 Success
  */
int lpcnet_decoder_init(LPCNetDecState *st);

void lpcnet_reset(LPCNetState *lpcnet);

/** Allocates and initializes a decoder state.
  * @returns The newly created state
  */
LPCNetDecState *lpcnet_decoder_create(void);

/** Frees an <code>LPCNetDecState</code> allocated by lpcnet_decoder_create().
  * @param[in] st <tt>LPCNetDecState*</tt>: State to be freed.
  */
void lpcnet_decoder_destroy(LPCNetDecState *st);

/** Decodes a packet of LPCNET_COMPRESSED_SIZE bytes (currently 8) into LPCNET_PACKET_SAMPLES samples (currently 640).
  * @param [in] st <tt>LPCNetDecState*</tt>: Decoder state
  * @param [in] buf <tt>const unsigned char *</tt>: Compressed packet
  * @param [out] pcm <tt>opus_int16 *</tt>: Decoded audio
  * @retval 0 Success
  */
int lpcnet_decode(LPCNetDecState *st, const unsigned char *buf, opus_int16 *pcm);



/** Gets the size of an <code>LPCNetEncState</code> structure.
  * @returns The size in bytes.
  */
int lpcnet_encoder_get_size(void);

/** Initializes a previously allocated encoder state.
  * The memory pointed to by st must be at least the size returned by lpcnet_encoder_get_size().
  * This is intended for applications which use their own allocator instead of malloc.
  * @see lpcnet_encoder_create(),lpcnet_encoder_get_size()
  * @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
  * @retval 0 Success
  */
int lpcnet_encoder_init(LPCNetEncState *st);

int lpcnet_encoder_load_model(LPCNetEncState *st, const void *data, int len);

/** Allocates and initializes an encoder state.
  * @returns The newly created state
  */
LPCNetEncState *lpcnet_encoder_create(void);

/** Frees an <code>LPCNetEncState</code> allocated by lpcnet_encoder_create().
  * @param[in] st <tt>LPCNetEncState*</tt>: State to be freed.
  */
void lpcnet_encoder_destroy(LPCNetEncState *st);

/** Encodes LPCNET_PACKET_SAMPLES speech samples (currently 640) into a packet of LPCNET_COMPRESSED_SIZE bytes (currently 8).
  * @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
  * @param [in] pcm <tt>const opus_int16 *</tt>: Input speech to be encoded
  * @param [out] buf <tt>unsigned char *</tt>: Compressed packet
  * @retval 0 Success
  */
int lpcnet_encode(LPCNetEncState *st, const opus_int16 *pcm, unsigned char *buf);

/** Computes features on LPCNET_FRAME_SIZE speech samples (currently 160) and outputs the features for one 10-ms frame.
  * @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
  * @param [in] pcm <tt>const opus_int16 *</tt>: Input speech to be analyzed
  * @param [out] features <tt>float[NB_TOTAL_FEATURES]</tt>: Output feature vector
  * @retval 0 Success
  */
int lpcnet_compute_single_frame_features(LPCNetEncState *st, const opus_int16 *pcm, float features[NB_TOTAL_FEATURES], int arch);


/** Computes features on LPCNET_FRAME_SIZE speech samples (currently 160) and outputs the features for one 10-ms frame.
  * @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
  * @param [in] pcm <tt>const float *</tt>: Input speech to be analyzed
  * @param [out] features <tt>float[NB_TOTAL_FEATURES]</tt>: Output feature vector
  * @retval 0 Success
  */
int lpcnet_compute_single_frame_features_float(LPCNetEncState *st, const float *pcm, float features[NB_TOTAL_FEATURES], int arch);

/** Gets the size of an <code>LPCNetState</code> structure.
  * @returns The size in bytes.
  */
int lpcnet_get_size(void);

/** Initializes a previously allocated synthesis state.
  * The memory pointed to by st must be at least the size returned by lpcnet_get_size().
  * This is intended for applications which use their own allocator instead of malloc.
  * @see lpcnet_create(),lpcnet_get_size()
  * @param [in] st <tt>LPCNetState*</tt>: Synthesis state
  * @retval 0 Success
  */
int lpcnet_init(LPCNetState *st);

/** Allocates and initializes a synthesis state.
  * @returns The newly created state
  */
LPCNetState *lpcnet_create(void);

/** Frees an <code>LPCNetState</code> allocated by lpcnet_create().
  * @param[in] st <tt>LPCNetState*</tt>: State to be freed.
  */
void lpcnet_destroy(LPCNetState *st);

/** Synthesizes speech from an LPCNet feature vector.
  * @param [in] st <tt>LPCNetState*</tt>: Synthesis state
  * @param [in] features <tt>const float *</tt>: Feature vector
  * @param [out] output <tt>opus_int16 *</tt>: Synthesized speech
  * @param [in] N <tt>int</tt>: Number of samples to generate
  */
void lpcnet_synthesize(LPCNetState *st, const float *features, opus_int16 *output, int N);



int lpcnet_plc_init(LPCNetPLCState *st);
void lpcnet_plc_reset(LPCNetPLCState *st);

int lpcnet_plc_update(LPCNetPLCState *st, opus_int16 *pcm);

int lpcnet_plc_conceal(LPCNetPLCState *st, opus_int16 *pcm);

void lpcnet_plc_fec_add(LPCNetPLCState *st, const float *features);

void lpcnet_plc_fec_clear(LPCNetPLCState *st);

int lpcnet_load_model(LPCNetState *st, const void *data, int len);
int lpcnet_plc_load_model(LPCNetPLCState *st, const void *data, int len);

#endif
230
managed_components/78__esp-opus/dnn/lpcnet_enc.c
Normal file
@@ -0,0 +1,230 @@
/* Copyright (c) 2017-2019 Mozilla */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "kiss_fft.h"
#include "common.h"
#include <math.h>
#include "freq.h"
#include "pitch.h"
#include "arch.h"
#include <assert.h>
#include "lpcnet_private.h"
#include "lpcnet.h"
#include "os_support.h"
#include "_kiss_fft_guts.h"
#include "celt_lpc.h"
#include "mathops.h"


int lpcnet_encoder_get_size(void) {
   return sizeof(LPCNetEncState);
}

int lpcnet_encoder_init(LPCNetEncState *st) {
   memset(st, 0, sizeof(*st));
   pitchdnn_init(&st->pitchdnn);
   return 0;
}

int lpcnet_encoder_load_model(LPCNetEncState *st, const void *data, int len) {
   return pitchdnn_load_model(&st->pitchdnn, data, len);
}

LPCNetEncState *lpcnet_encoder_create(void) {
   LPCNetEncState *st;
   st = opus_alloc(lpcnet_encoder_get_size());
   lpcnet_encoder_init(st);
   return st;
}

void lpcnet_encoder_destroy(LPCNetEncState *st) {
   opus_free(st);
}

static void frame_analysis(LPCNetEncState *st, kiss_fft_cpx *X, float *Ex, const float *in) {
   float x[WINDOW_SIZE];
   OPUS_COPY(x, st->analysis_mem, OVERLAP_SIZE);
   OPUS_COPY(&x[OVERLAP_SIZE], in, FRAME_SIZE);
   OPUS_COPY(st->analysis_mem, &in[FRAME_SIZE-OVERLAP_SIZE], OVERLAP_SIZE);
   apply_window(x);
   forward_transform(X, x);
   lpcn_compute_band_energy(Ex, X);
}

static void biquad(float *y, float mem[2], const float *x, const float *b, const float *a, int N) {
   int i;
   float mem0, mem1;
   mem0 = mem[0];
   mem1 = mem[1];
   for (i=0;i<N;i++) {
      float xi, yi, mem00;
      xi = x[i];
      yi = x[i] + mem0;
      mem00 = mem0;
      /* Original code:
         mem0 = mem1 + (b[0]*xi - a[0]*yi);
         mem1 = (b[1]*xi - a[1]*yi);
         Modified to reduce dependency chains: (the +1e-30f forces the ordering and has no effect on the output)
      */
      mem0 = (b[0]-a[0])*xi + mem1 - a[0]*mem0;
      mem1 = (b[1]-a[1])*xi + 1e-30f - a[1]*mem00;
      y[i] = yi;
   }
   mem[0] = mem0;
   mem[1] = mem1;
}

#define celt_log10(x) (0.3010299957f*celt_log2(x))

void compute_frame_features(LPCNetEncState *st, const float *in, int arch) {
   float aligned_in[FRAME_SIZE];
   int i;
   float Ly[NB_BANDS];
   float follow, logMax;
   kiss_fft_cpx X[FREQ_SIZE];
   float Ex[NB_BANDS];
   float xcorr[PITCH_MAX_PERIOD];
   float ener0;
   float ener;
   float x[FRAME_SIZE+LPC_ORDER];
   float frame_corr;
   float xy, xx, yy;
   int pitch;
   float ener_norm[PITCH_MAX_PERIOD - PITCH_MIN_PERIOD];
   /* [b,a]=ellip(2, 2, 20, 1200/8000); */
   static const float lp_b[2] = {-0.84946f, 1.f};
   static const float lp_a[2] = {-1.54220f, 0.70781f};
   OPUS_COPY(aligned_in, &st->analysis_mem[OVERLAP_SIZE-TRAINING_OFFSET], TRAINING_OFFSET);
   frame_analysis(st, X, Ex, in);
   st->if_features[0] = MAX16(-1.f, MIN16(1.f, (1.f/64)*(10.f*celt_log10(1e-15f + X[0].r*X[0].r)-6.f)));
   for (i=1;i<PITCH_IF_MAX_FREQ;i++) {
      kiss_fft_cpx prod;
      float norm_1;
      C_MULC(prod, X[i], st->prev_if[i]);
      norm_1 = 1.f/sqrt(1e-15f + prod.r*prod.r + prod.i*prod.i);
      C_MULBYSCALAR(prod, norm_1);
      st->if_features[3*i-2] = prod.r;
      st->if_features[3*i-1] = prod.i;
      st->if_features[3*i] = MAX16(-1.f, MIN16(1.f, (1.f/64)*(10.f*celt_log10(1e-15f + X[i].r*X[i].r + X[i].i*X[i].i)-6.f)));
   }
   OPUS_COPY(st->prev_if, X, PITCH_IF_MAX_FREQ);
   /*for (i=0;i<88;i++) printf("%f ", st->if_features[i]);printf("\n");*/
   logMax = -2;
   follow = -2;
   for (i=0;i<NB_BANDS;i++) {
      Ly[i] = celt_log10(1e-2f+Ex[i]);
      Ly[i] = MAX16(logMax-8, MAX16(follow-2.5f, Ly[i]));
      logMax = MAX16(logMax, Ly[i]);
      follow = MAX16(follow-2.5f, Ly[i]);
   }
   dct(st->features, Ly);
   st->features[0] -= 4;
   lpc_from_cepstrum(st->lpc, st->features);
   for (i=0;i<LPC_ORDER;i++) st->features[NB_BANDS+2+i] = st->lpc[i];
   OPUS_MOVE(st->exc_buf, &st->exc_buf[FRAME_SIZE], PITCH_MAX_PERIOD);
   OPUS_MOVE(st->lp_buf, &st->lp_buf[FRAME_SIZE], PITCH_MAX_PERIOD);
   OPUS_COPY(&aligned_in[TRAINING_OFFSET], in, FRAME_SIZE-TRAINING_OFFSET);
   OPUS_COPY(&x[0], st->pitch_mem, LPC_ORDER);
   OPUS_COPY(&x[LPC_ORDER], aligned_in, FRAME_SIZE);
   OPUS_COPY(st->pitch_mem, &aligned_in[FRAME_SIZE-LPC_ORDER], LPC_ORDER);
   celt_fir(&x[LPC_ORDER], st->lpc, &st->lp_buf[PITCH_MAX_PERIOD], FRAME_SIZE, LPC_ORDER, arch);
   for (i=0;i<FRAME_SIZE;i++) {
      st->exc_buf[PITCH_MAX_PERIOD+i] = st->lp_buf[PITCH_MAX_PERIOD+i] + .7f*st->pitch_filt;
      st->pitch_filt = st->lp_buf[PITCH_MAX_PERIOD+i];
      /*printf("%f\n", st->exc_buf[PITCH_MAX_PERIOD+i]);*/
   }
   biquad(&st->lp_buf[PITCH_MAX_PERIOD], st->lp_mem, &st->lp_buf[PITCH_MAX_PERIOD], lp_b, lp_a, FRAME_SIZE);
   {
      double ener1;
      float *buf = st->exc_buf;
      celt_pitch_xcorr(&buf[PITCH_MAX_PERIOD], buf, xcorr, FRAME_SIZE, PITCH_MAX_PERIOD-PITCH_MIN_PERIOD, arch);
      ener0 = celt_inner_prod(&buf[PITCH_MAX_PERIOD], &buf[PITCH_MAX_PERIOD], FRAME_SIZE, arch);
      ener1 = celt_inner_prod(&buf[0], &buf[0], FRAME_SIZE, arch);
      /*printf("%f\n", st->frame_weight[sub]);*/
      for (i=0;i<PITCH_MAX_PERIOD-PITCH_MIN_PERIOD;i++) {
         ener = 1 + ener0 + ener1;
         st->xcorr_features[i] = 2*xcorr[i];
         ener_norm[i] = ener;
         ener1 += buf[i+FRAME_SIZE]*(double)buf[i+FRAME_SIZE] - buf[i]*(double)buf[i];
         /*printf("%f ", st->xcorr_features[i]);*/
      }
      /* Split in a separate loop so the compiler can vectorize it */
      for (i=0;i<PITCH_MAX_PERIOD-PITCH_MIN_PERIOD;i++) {
         st->xcorr_features[i] /= ener_norm[i];
      }
      /*printf("\n");*/
   }
   st->dnn_pitch = compute_pitchdnn(&st->pitchdnn, st->if_features, st->xcorr_features, arch);
   pitch = (int)floor(.5+256./pow(2.f,((1./60.)*((st->dnn_pitch+1.5)*60))));
   xx = celt_inner_prod(&st->lp_buf[PITCH_MAX_PERIOD], &st->lp_buf[PITCH_MAX_PERIOD], FRAME_SIZE, arch);
   yy = celt_inner_prod(&st->lp_buf[PITCH_MAX_PERIOD-pitch], &st->lp_buf[PITCH_MAX_PERIOD-pitch], FRAME_SIZE, arch);
   xy = celt_inner_prod(&st->lp_buf[PITCH_MAX_PERIOD], &st->lp_buf[PITCH_MAX_PERIOD-pitch], FRAME_SIZE, arch);
   /*printf("%f %f\n", frame_corr, xy/sqrt(1e-15+xx*yy));*/
   frame_corr = xy/sqrt(1+xx*yy);
   frame_corr = log(1.f+exp(5.f*frame_corr))/log(1+exp(5.f));
   st->features[NB_BANDS] = st->dnn_pitch;
   st->features[NB_BANDS + 1] = frame_corr-.5f;
}

void preemphasis(float *y, float *mem, const float *x, float coef, int N) {
   int i;
   for (i=0;i<N;i++) {
      float yi;
      yi = x[i] + *mem;
      *mem = -coef*x[i];
      y[i] = yi;
   }
}

static int lpcnet_compute_single_frame_features_impl(LPCNetEncState *st, float *x, float features[NB_TOTAL_FEATURES], int arch) {
   preemphasis(x, &st->mem_preemph, x, PREEMPHASIS, FRAME_SIZE);
   compute_frame_features(st, x, arch);
   OPUS_COPY(features, &st->features[0], NB_TOTAL_FEATURES);
   return 0;
}

int lpcnet_compute_single_frame_features(LPCNetEncState *st, const opus_int16 *pcm, float features[NB_TOTAL_FEATURES], int arch) {
   int i;
   float x[FRAME_SIZE];
   for (i=0;i<FRAME_SIZE;i++) x[i] = pcm[i];
   lpcnet_compute_single_frame_features_impl(st, x, features, arch);
   return 0;
}

int lpcnet_compute_single_frame_features_float(LPCNetEncState *st, const float *pcm, float features[NB_TOTAL_FEATURES], int arch) {
   int i;
   float x[FRAME_SIZE];
   for (i=0;i<FRAME_SIZE;i++) x[i] = pcm[i];
   lpcnet_compute_single_frame_features_impl(st, x, features, arch);
   return 0;
}
211
managed_components/78__esp-opus/dnn/lpcnet_plc.c
Normal file
@@ -0,0 +1,211 @@
/* Copyright (c) 2021 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "lpcnet_private.h"
#include "lpcnet.h"
#include "plc_data.h"
#include "os_support.h"
#include "common.h"
#include "cpu_support.h"

#ifndef M_PI
#define M_PI 3.141592653
#endif

/* Comment this out to have LPCNet update its state on every good packet (slow). */
#define PLC_SKIP_UPDATES

void lpcnet_plc_reset(LPCNetPLCState *st) {
  OPUS_CLEAR((char*)&st->LPCNET_PLC_RESET_START,
          sizeof(LPCNetPLCState)-
          ((char*)&st->LPCNET_PLC_RESET_START - (char*)st));
  lpcnet_encoder_init(&st->enc);
  OPUS_CLEAR(st->pcm, PLC_BUF_SIZE);
  st->blend = 0;
  st->loss_count = 0;
  st->analysis_gap = 1;
  st->analysis_pos = PLC_BUF_SIZE;
  st->predict_pos = PLC_BUF_SIZE;
}

int lpcnet_plc_init(LPCNetPLCState *st) {
  int ret;
  st->arch = opus_select_arch();
  fargan_init(&st->fargan);
  lpcnet_encoder_init(&st->enc);
  st->loaded = 0;
#ifndef USE_WEIGHTS_FILE
  ret = init_plcmodel(&st->model, plcmodel_arrays);
  if (ret == 0) st->loaded = 1;
#else
  ret = 0;
#endif
  celt_assert(ret == 0);
  lpcnet_plc_reset(st);
  return ret;
}

int lpcnet_plc_load_model(LPCNetPLCState *st, const void *data, int len) {
  WeightArray *list;
  int ret;
  parse_weights(&list, data, len);
  ret = init_plcmodel(&st->model, list);
  opus_free(list);
  if (ret == 0) {
    ret = lpcnet_encoder_load_model(&st->enc, data, len);
  }
  if (ret == 0) {
    ret = fargan_load_model(&st->fargan, data, len);
  }
  if (ret == 0) st->loaded = 1;
  return ret;
}

void lpcnet_plc_fec_add(LPCNetPLCState *st, const float *features) {
  if (features == NULL) {
    st->fec_skip++;
    return;
  }
  if (st->fec_fill_pos == PLC_MAX_FEC) {
    OPUS_MOVE(&st->fec[0][0], &st->fec[st->fec_read_pos][0], (st->fec_fill_pos-st->fec_read_pos)*NB_FEATURES);
    st->fec_fill_pos = st->fec_fill_pos-st->fec_read_pos;
    st->fec_read_pos -= st->fec_read_pos;
  }
  OPUS_COPY(&st->fec[st->fec_fill_pos][0], features, NB_FEATURES);
  st->fec_fill_pos++;
}

void lpcnet_plc_fec_clear(LPCNetPLCState *st) {
  st->fec_read_pos = st->fec_fill_pos = st->fec_skip = 0;
}


static void compute_plc_pred(LPCNetPLCState *st, float *out, const float *in) {
  float tmp[PLC_DENSE_IN_OUT_SIZE];
  PLCModel *model = &st->model;
  PLCNetState *net = &st->plc_net;
  celt_assert(st->loaded);
  compute_generic_dense(&model->plc_dense_in, tmp, in, ACTIVATION_TANH, st->arch);
  compute_generic_gru(&model->plc_gru1_input, &model->plc_gru1_recurrent, net->gru1_state, tmp, st->arch);
  compute_generic_gru(&model->plc_gru2_input, &model->plc_gru2_recurrent, net->gru2_state, net->gru1_state, st->arch);
  compute_generic_dense(&model->plc_dense_out, out, net->gru2_state, ACTIVATION_LINEAR, st->arch);
}

static int get_fec_or_pred(LPCNetPLCState *st, float *out) {
  if (st->fec_read_pos != st->fec_fill_pos && st->fec_skip==0) {
    float plc_features[2*NB_BANDS+NB_FEATURES+1] = {0};
    float discard[NB_FEATURES];
    OPUS_COPY(out, &st->fec[st->fec_read_pos][0], NB_FEATURES);
    st->fec_read_pos++;
    /* Update PLC state using FEC, so without Burg features. */
    OPUS_COPY(&plc_features[2*NB_BANDS], out, NB_FEATURES);
    plc_features[2*NB_BANDS+NB_FEATURES] = -1;
    compute_plc_pred(st, discard, plc_features);
    return 1;
  } else {
    float zeros[2*NB_BANDS+NB_FEATURES+1] = {0};
    compute_plc_pred(st, out, zeros);
    if (st->fec_skip > 0) st->fec_skip--;
    return 0;
  }
}

static void queue_features(LPCNetPLCState *st, const float *features) {
  OPUS_MOVE(&st->cont_features[0], &st->cont_features[NB_FEATURES], (CONT_VECTORS-1)*NB_FEATURES);
  OPUS_COPY(&st->cont_features[(CONT_VECTORS-1)*NB_FEATURES], features, NB_FEATURES);
}

/* In this causal version of the code, the DNN model implemented by compute_plc_pred()
   needs to generate two feature vectors to conceal the first lost packet. */

int lpcnet_plc_update(LPCNetPLCState *st, opus_int16 *pcm) {
  int i;
  if (st->analysis_pos - FRAME_SIZE >= 0) st->analysis_pos -= FRAME_SIZE;
  else st->analysis_gap = 1;
  if (st->predict_pos - FRAME_SIZE >= 0) st->predict_pos -= FRAME_SIZE;
  OPUS_MOVE(st->pcm, &st->pcm[FRAME_SIZE], PLC_BUF_SIZE-FRAME_SIZE);
  for (i=0;i<FRAME_SIZE;i++) st->pcm[PLC_BUF_SIZE-FRAME_SIZE+i] = (1.f/32768.f)*pcm[i];
  st->loss_count = 0;
  st->blend = 0;
  return 0;
}

static const float att_table[10] = {0, 0, -.2, -.2, -.4, -.4, -.8, -.8, -1.6, -1.6};
int lpcnet_plc_conceal(LPCNetPLCState *st, opus_int16 *pcm) {
  int i;
  celt_assert(st->loaded);
  if (st->blend == 0) {
    int count = 0;
    st->plc_net = st->plc_bak[0];
    while (st->analysis_pos + FRAME_SIZE <= PLC_BUF_SIZE) {
      float x[FRAME_SIZE];
      float plc_features[2*NB_BANDS+NB_FEATURES+1];
      celt_assert(st->analysis_pos >= 0);
      for (i=0;i<FRAME_SIZE;i++) x[i] = 32768.f*st->pcm[st->analysis_pos+i];
      burg_cepstral_analysis(plc_features, x);
      lpcnet_compute_single_frame_features_float(&st->enc, x, st->features, st->arch);
      if ((!st->analysis_gap || count>0) && st->analysis_pos >= st->predict_pos) {
        queue_features(st, st->features);
        OPUS_COPY(&plc_features[2*NB_BANDS], st->features, NB_FEATURES);
        plc_features[2*NB_BANDS+NB_FEATURES] = 1;
        st->plc_bak[0] = st->plc_bak[1];
        st->plc_bak[1] = st->plc_net;
        compute_plc_pred(st, st->features, plc_features);
      }
      st->analysis_pos += FRAME_SIZE;
      count++;
    }
    st->plc_bak[0] = st->plc_bak[1];
    st->plc_bak[1] = st->plc_net;
    get_fec_or_pred(st, st->features);
    queue_features(st, st->features);
    st->plc_bak[0] = st->plc_bak[1];
    st->plc_bak[1] = st->plc_net;
    get_fec_or_pred(st, st->features);
    queue_features(st, st->features);
    fargan_cont(&st->fargan, &st->pcm[PLC_BUF_SIZE-FARGAN_CONT_SAMPLES], st->cont_features);
    st->analysis_gap = 0;
  }
  st->plc_bak[0] = st->plc_bak[1];
  st->plc_bak[1] = st->plc_net;
  if (get_fec_or_pred(st, st->features)) st->loss_count = 0;
  else st->loss_count++;
  if (st->loss_count >= 10) st->features[0] = MAX16(-10, st->features[0]+att_table[9] - 2*(st->loss_count-9));
  else st->features[0] = MAX16(-10, st->features[0]+att_table[st->loss_count]);
  fargan_synthesize_int(&st->fargan, pcm, &st->features[0]);
  queue_features(st, st->features);
  if (st->analysis_pos - FRAME_SIZE >= 0) st->analysis_pos -= FRAME_SIZE;
  else st->analysis_gap = 1;
  st->predict_pos = PLC_BUF_SIZE;
  OPUS_MOVE(st->pcm, &st->pcm[FRAME_SIZE], PLC_BUF_SIZE-FRAME_SIZE);
  for (i=0;i<FRAME_SIZE;i++) st->pcm[PLC_BUF_SIZE-FRAME_SIZE+i] = (1.f/32768.f)*pcm[i];
  st->blend = 1;
  return 0;
}
90
managed_components/78__esp-opus/dnn/lpcnet_private.h
Normal file
@@ -0,0 +1,90 @@
#ifndef LPCNET_PRIVATE_H
#define LPCNET_PRIVATE_H

#include <stdio.h>
#include "freq.h"
#include "lpcnet.h"
#include "plc_data.h"
#include "pitchdnn.h"
#include "fargan.h"


#define PITCH_FRAME_SIZE 320
#define PITCH_BUF_SIZE (PITCH_MAX_PERIOD+PITCH_FRAME_SIZE)

#define PLC_MAX_FEC 100
#define MAX_FEATURE_BUFFER_SIZE 4

#define PITCH_IF_MAX_FREQ 30
#define PITCH_IF_FEATURES (3*PITCH_IF_MAX_FREQ - 2)

#define CONT_VECTORS 5

#define FEATURES_DELAY 1

struct LPCNetEncState{
  PitchDNNState pitchdnn;
  float analysis_mem[OVERLAP_SIZE];
  float mem_preemph;
  kiss_fft_cpx prev_if[PITCH_IF_MAX_FREQ];
  float if_features[PITCH_IF_FEATURES];
  float xcorr_features[PITCH_MAX_PERIOD - PITCH_MIN_PERIOD];
  float dnn_pitch;
  float pitch_mem[LPC_ORDER];
  float pitch_filt;
  float exc_buf[PITCH_BUF_SIZE];
  float lp_buf[PITCH_BUF_SIZE];
  float lp_mem[4];
  float lpc[LPC_ORDER];
  float features[NB_TOTAL_FEATURES];
  float sig_mem[LPC_ORDER];
  float burg_cepstrum[2*NB_BANDS];
};

typedef struct {
  float gru1_state[PLC_GRU1_STATE_SIZE];
  float gru2_state[PLC_GRU2_STATE_SIZE];
} PLCNetState;

#define PLC_BUF_SIZE ((CONT_VECTORS+10)*FRAME_SIZE)
struct LPCNetPLCState {
  PLCModel model;
  FARGANState fargan;
  LPCNetEncState enc;
  int loaded;
  int arch;

#define LPCNET_PLC_RESET_START fec
  float fec[PLC_MAX_FEC][NB_FEATURES];
  int analysis_gap;
  int fec_read_pos;
  int fec_fill_pos;
  int fec_skip;
  int analysis_pos;
  int predict_pos;
  float pcm[PLC_BUF_SIZE];
  int blend;
  float features[NB_TOTAL_FEATURES];
  float cont_features[CONT_VECTORS*NB_FEATURES];
  int loss_count;
  PLCNetState plc_net;
  PLCNetState plc_bak[2];
};

void preemphasis(float *y, float *mem, const float *x, float coef, int N);

void compute_frame_features(LPCNetEncState *st, const float *in, int arch);

void lpcnet_reset_signal(LPCNetState *lpcnet);
void run_frame_network(LPCNetState *lpcnet, float *gru_a_condition, float *gru_b_condition, float *lpc, const float *features);
void run_frame_network_deferred(LPCNetState *lpcnet, const float *features);
void run_frame_network_flush(LPCNetState *lpcnet);


void lpcnet_synthesize_tail_impl(LPCNetState *lpcnet, opus_int16 *output, int N, int preload);
void lpcnet_synthesize_impl(LPCNetState *lpcnet, const float *features, opus_int16 *output, int N, int preload);
void lpcnet_synthesize_blend_impl(LPCNetState *lpcnet, const opus_int16 *pcm_in, opus_int16 *output, int N);

#endif
307
managed_components/78__esp-opus/dnn/lpcnet_tables.c
Normal file
@@ -0,0 +1,307 @@
/* The contents of this file was automatically generated by dump_lpcnet_tables.c */

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "kiss_fft.h"

static const arch_fft_state arch_fft = {0, NULL};

static const opus_int16 fft_bitrev[320] = {
0, 64, 128, 192, 256, 16, 80, 144, 208, 272, 32, 96, 160, 224, 288,
48, 112, 176, 240, 304, 4, 68, 132, 196, 260, 20, 84, 148, 212, 276,
36, 100, 164, 228, 292, 52, 116, 180, 244, 308, 8, 72, 136, 200, 264,
24, 88, 152, 216, 280, 40, 104, 168, 232, 296, 56, 120, 184, 248, 312,
12, 76, 140, 204, 268, 28, 92, 156, 220, 284, 44, 108, 172, 236, 300,
60, 124, 188, 252, 316, 1, 65, 129, 193, 257, 17, 81, 145, 209, 273,
33, 97, 161, 225, 289, 49, 113, 177, 241, 305, 5, 69, 133, 197, 261,
21, 85, 149, 213, 277, 37, 101, 165, 229, 293, 53, 117, 181, 245, 309,
9, 73, 137, 201, 265, 25, 89, 153, 217, 281, 41, 105, 169, 233, 297,
57, 121, 185, 249, 313, 13, 77, 141, 205, 269, 29, 93, 157, 221, 285,
45, 109, 173, 237, 301, 61, 125, 189, 253, 317, 2, 66, 130, 194, 258,
18, 82, 146, 210, 274, 34, 98, 162, 226, 290, 50, 114, 178, 242, 306,
6, 70, 134, 198, 262, 22, 86, 150, 214, 278, 38, 102, 166, 230, 294,
54, 118, 182, 246, 310, 10, 74, 138, 202, 266, 26, 90, 154, 218, 282,
42, 106, 170, 234, 298, 58, 122, 186, 250, 314, 14, 78, 142, 206, 270,
30, 94, 158, 222, 286, 46, 110, 174, 238, 302, 62, 126, 190, 254, 318,
3, 67, 131, 195, 259, 19, 83, 147, 211, 275, 35, 99, 163, 227, 291,
51, 115, 179, 243, 307, 7, 71, 135, 199, 263, 23, 87, 151, 215, 279,
39, 103, 167, 231, 295, 55, 119, 183, 247, 311, 11, 75, 139, 203, 267,
27, 91, 155, 219, 283, 43, 107, 171, 235, 299, 59, 123, 187, 251, 315,
15, 79, 143, 207, 271, 31, 95, 159, 223, 287, 47, 111, 175, 239, 303,
63, 127, 191, 255, 319, };

static const kiss_twiddle_cpx fft_twiddles[320] = {
{1.00000000f, -0.00000000f}, {0.999807239f, -0.0196336918f},
{0.999229014f, -0.0392598175f}, {0.998265624f, -0.0588708036f},
{0.996917307f, -0.0784590989f}, {0.995184720f, -0.0980171412f},
{0.993068457f, -0.117537394f}, {0.990569353f, -0.137012348f},
{0.987688363f, -0.156434461f}, {0.984426558f, -0.175796285f},
{0.980785251f, -0.195090324f}, {0.976765871f, -0.214309156f},
{0.972369909f, -0.233445361f}, {0.967599094f, -0.252491564f},
{0.962455213f, -0.271440446f}, {0.956940353f, -0.290284663f},
{0.951056540f, -0.309017003f}, {0.944806039f, -0.327630192f},
{0.938191354f, -0.346117049f}, {0.931214929f, -0.364470512f},
{0.923879504f, -0.382683426f}, {0.916187942f, -0.400748819f},
{0.908143163f, -0.418659747f}, {0.899748266f, -0.436409235f},
{0.891006529f, -0.453990489f}, {0.881921291f, -0.471396744f},
{0.872496009f, -0.488621235f}, {0.862734377f, -0.505657375f},
{0.852640152f, -0.522498548f}, {0.842217207f, -0.539138317f},
{0.831469595f, -0.555570245f}, {0.820401430f, -0.571787953f},
{0.809017003f, -0.587785244f}, {0.797320664f, -0.603555918f},
{0.785316944f, -0.619093955f}, {0.773010433f, -0.634393275f},
{0.760405958f, -0.649448037f}, {0.747508347f, -0.664252460f},
{0.734322488f, -0.678800762f}, {0.720853567f, -0.693087339f},
{0.707106769f, -0.707106769f}, {0.693087339f, -0.720853567f},
{0.678800762f, -0.734322488f}, {0.664252460f, -0.747508347f},
{0.649448037f, -0.760405958f}, {0.634393275f, -0.773010433f},
{0.619093955f, -0.785316944f}, {0.603555918f, -0.797320664f},
{0.587785244f, -0.809017003f}, {0.571787953f, -0.820401430f},
{0.555570245f, -0.831469595f}, {0.539138317f, -0.842217207f},
{0.522498548f, -0.852640152f}, {0.505657375f, -0.862734377f},
{0.488621235f, -0.872496009f}, {0.471396744f, -0.881921291f},
{0.453990489f, -0.891006529f}, {0.436409235f, -0.899748266f},
{0.418659747f, -0.908143163f}, {0.400748819f, -0.916187942f},
{0.382683426f, -0.923879504f}, {0.364470512f, -0.931214929f},
{0.346117049f, -0.938191354f}, {0.327630192f, -0.944806039f},
{0.309017003f, -0.951056540f}, {0.290284663f, -0.956940353f},
{0.271440446f, -0.962455213f}, {0.252491564f, -0.967599094f},
{0.233445361f, -0.972369909f}, {0.214309156f, -0.976765871f},
{0.195090324f, -0.980785251f}, {0.175796285f, -0.984426558f},
{0.156434461f, -0.987688363f}, {0.137012348f, -0.990569353f},
{0.117537394f, -0.993068457f}, {0.0980171412f, -0.995184720f},
{0.0784590989f, -0.996917307f}, {0.0588708036f, -0.998265624f},
{0.0392598175f, -0.999229014f}, {0.0196336918f, -0.999807239f},
{6.12323426e-17f, -1.00000000f}, {-0.0196336918f, -0.999807239f},
{-0.0392598175f, -0.999229014f}, {-0.0588708036f, -0.998265624f},
{-0.0784590989f, -0.996917307f}, {-0.0980171412f, -0.995184720f},
{-0.117537394f, -0.993068457f}, {-0.137012348f, -0.990569353f},
{-0.156434461f, -0.987688363f}, {-0.175796285f, -0.984426558f},
{-0.195090324f, -0.980785251f}, {-0.214309156f, -0.976765871f},
{-0.233445361f, -0.972369909f}, {-0.252491564f, -0.967599094f},
{-0.271440446f, -0.962455213f}, {-0.290284663f, -0.956940353f},
{-0.309017003f, -0.951056540f}, {-0.327630192f, -0.944806039f},
{-0.346117049f, -0.938191354f}, {-0.364470512f, -0.931214929f},
{-0.382683426f, -0.923879504f}, {-0.400748819f, -0.916187942f},
{-0.418659747f, -0.908143163f}, {-0.436409235f, -0.899748266f},
{-0.453990489f, -0.891006529f}, {-0.471396744f, -0.881921291f},
{-0.488621235f, -0.872496009f}, {-0.505657375f, -0.862734377f},
{-0.522498548f, -0.852640152f}, {-0.539138317f, -0.842217207f},
{-0.555570245f, -0.831469595f}, {-0.571787953f, -0.820401430f},
{-0.587785244f, -0.809017003f}, {-0.603555918f, -0.797320664f},
{-0.619093955f, -0.785316944f}, {-0.634393275f, -0.773010433f},
{-0.649448037f, -0.760405958f}, {-0.664252460f, -0.747508347f},
{-0.678800762f, -0.734322488f}, {-0.693087339f, -0.720853567f},
{-0.707106769f, -0.707106769f}, {-0.720853567f, -0.693087339f},
{-0.734322488f, -0.678800762f}, {-0.747508347f, -0.664252460f},
{-0.760405958f, -0.649448037f}, {-0.773010433f, -0.634393275f},
{-0.785316944f, -0.619093955f}, {-0.797320664f, -0.603555918f},
{-0.809017003f, -0.587785244f}, {-0.820401430f, -0.571787953f},
{-0.831469595f, -0.555570245f}, {-0.842217207f, -0.539138317f},
{-0.852640152f, -0.522498548f}, {-0.862734377f, -0.505657375f},
{-0.872496009f, -0.488621235f}, {-0.881921291f, -0.471396744f},
{-0.891006529f, -0.453990489f}, {-0.899748266f, -0.436409235f},
{-0.908143163f, -0.418659747f}, {-0.916187942f, -0.400748819f},
{-0.923879504f, -0.382683426f}, {-0.931214929f, -0.364470512f},
{-0.938191354f, -0.346117049f}, {-0.944806039f, -0.327630192f},
{-0.951056540f, -0.309017003f}, {-0.956940353f, -0.290284663f},
{-0.962455213f, -0.271440446f}, {-0.967599094f, -0.252491564f},
{-0.972369909f, -0.233445361f}, {-0.976765871f, -0.214309156f},
{-0.980785251f, -0.195090324f}, {-0.984426558f, -0.175796285f},
{-0.987688363f, -0.156434461f}, {-0.990569353f, -0.137012348f},
{-0.993068457f, -0.117537394f}, {-0.995184720f, -0.0980171412f},
{-0.996917307f, -0.0784590989f}, {-0.998265624f, -0.0588708036f},
{-0.999229014f, -0.0392598175f}, {-0.999807239f, -0.0196336918f},
{-1.00000000f, -1.22464685e-16f}, {-0.999807239f, 0.0196336918f},
{-0.999229014f, 0.0392598175f}, {-0.998265624f, 0.0588708036f},
{-0.996917307f, 0.0784590989f}, {-0.995184720f, 0.0980171412f},
{-0.993068457f, 0.117537394f}, {-0.990569353f, 0.137012348f},
{-0.987688363f, 0.156434461f}, {-0.984426558f, 0.175796285f},
{-0.980785251f, 0.195090324f}, {-0.976765871f, 0.214309156f},
{-0.972369909f, 0.233445361f}, {-0.967599094f, 0.252491564f},
{-0.962455213f, 0.271440446f}, {-0.956940353f, 0.290284663f},
{-0.951056540f, 0.309017003f}, {-0.944806039f, 0.327630192f},
{-0.938191354f, 0.346117049f}, {-0.931214929f, 0.364470512f},
{-0.923879504f, 0.382683426f}, {-0.916187942f, 0.400748819f},
{-0.908143163f, 0.418659747f}, {-0.899748266f, 0.436409235f},
{-0.891006529f, 0.453990489f}, {-0.881921291f, 0.471396744f},
{-0.872496009f, 0.488621235f}, {-0.862734377f, 0.505657375f},
{-0.852640152f, 0.522498548f}, {-0.842217207f, 0.539138317f},
{-0.831469595f, 0.555570245f}, {-0.820401430f, 0.571787953f},
{-0.809017003f, 0.587785244f}, {-0.797320664f, 0.603555918f},
{-0.785316944f, 0.619093955f}, {-0.773010433f, 0.634393275f},
{-0.760405958f, 0.649448037f}, {-0.747508347f, 0.664252460f},
{-0.734322488f, 0.678800762f}, {-0.720853567f, 0.693087339f},
{-0.707106769f, 0.707106769f}, {-0.693087339f, 0.720853567f},
{-0.678800762f, 0.734322488f}, {-0.664252460f, 0.747508347f},
{-0.649448037f, 0.760405958f}, {-0.634393275f, 0.773010433f},
{-0.619093955f, 0.785316944f}, {-0.603555918f, 0.797320664f},
{-0.587785244f, 0.809017003f}, {-0.571787953f, 0.820401430f},
{-0.555570245f, 0.831469595f}, {-0.539138317f, 0.842217207f},
{-0.522498548f, 0.852640152f}, {-0.505657375f, 0.862734377f},
{-0.488621235f, 0.872496009f}, {-0.471396744f, 0.881921291f},
{-0.453990489f, 0.891006529f}, {-0.436409235f, 0.899748266f},
{-0.418659747f, 0.908143163f}, {-0.400748819f, 0.916187942f},
{-0.382683426f, 0.923879504f}, {-0.364470512f, 0.931214929f},
{-0.346117049f, 0.938191354f}, {-0.327630192f, 0.944806039f},
{-0.309017003f, 0.951056540f}, {-0.290284663f, 0.956940353f},
{-0.271440446f, 0.962455213f}, {-0.252491564f, 0.967599094f},
{-0.233445361f, 0.972369909f}, {-0.214309156f, 0.976765871f},
{-0.195090324f, 0.980785251f}, {-0.175796285f, 0.984426558f},
{-0.156434461f, 0.987688363f}, {-0.137012348f, 0.990569353f},
{-0.117537394f, 0.993068457f}, {-0.0980171412f, 0.995184720f},
{-0.0784590989f, 0.996917307f}, {-0.0588708036f, 0.998265624f},
{-0.0392598175f, 0.999229014f}, {-0.0196336918f, 0.999807239f},
{-1.83697015e-16f, 1.00000000f}, {0.0196336918f, 0.999807239f},
{0.0392598175f, 0.999229014f}, {0.0588708036f, 0.998265624f},
{0.0784590989f, 0.996917307f}, {0.0980171412f, 0.995184720f},
{0.117537394f, 0.993068457f}, {0.137012348f, 0.990569353f},
{0.156434461f, 0.987688363f}, {0.175796285f, 0.984426558f},
{0.195090324f, 0.980785251f}, {0.214309156f, 0.976765871f},
{0.233445361f, 0.972369909f}, {0.252491564f, 0.967599094f},
{0.271440446f, 0.962455213f}, {0.290284663f, 0.956940353f},
{0.309017003f, 0.951056540f}, {0.327630192f, 0.944806039f},
{0.346117049f, 0.938191354f}, {0.364470512f, 0.931214929f},
{0.382683426f, 0.923879504f}, {0.400748819f, 0.916187942f},
{0.418659747f, 0.908143163f}, {0.436409235f, 0.899748266f},
{0.453990489f, 0.891006529f}, {0.471396744f, 0.881921291f},
{0.488621235f, 0.872496009f}, {0.505657375f, 0.862734377f},
{0.522498548f, 0.852640152f}, {0.539138317f, 0.842217207f},
{0.555570245f, 0.831469595f}, {0.571787953f, 0.820401430f},
{0.587785244f, 0.809017003f}, {0.603555918f, 0.797320664f},
{0.619093955f, 0.785316944f}, {0.634393275f, 0.773010433f},
{0.649448037f, 0.760405958f}, {0.664252460f, 0.747508347f},
{0.678800762f, 0.734322488f}, {0.693087339f, 0.720853567f},
{0.707106769f, 0.707106769f}, {0.720853567f, 0.693087339f},
{0.734322488f, 0.678800762f}, {0.747508347f, 0.664252460f},
{0.760405958f, 0.649448037f}, {0.773010433f, 0.634393275f},
{0.785316944f, 0.619093955f}, {0.797320664f, 0.603555918f},
{0.809017003f, 0.587785244f}, {0.820401430f, 0.571787953f},
{0.831469595f, 0.555570245f}, {0.842217207f, 0.539138317f},
{0.852640152f, 0.522498548f}, {0.862734377f, 0.505657375f},
{0.872496009f, 0.488621235f}, {0.881921291f, 0.471396744f},
{0.891006529f, 0.453990489f}, {0.899748266f, 0.436409235f},
{0.908143163f, 0.418659747f}, {0.916187942f, 0.400748819f},
{0.923879504f, 0.382683426f}, {0.931214929f, 0.364470512f},
{0.938191354f, 0.346117049f}, {0.944806039f, 0.327630192f},
{0.951056540f, 0.309017003f}, {0.956940353f, 0.290284663f},
{0.962455213f, 0.271440446f}, {0.967599094f, 0.252491564f},
{0.972369909f, 0.233445361f}, {0.976765871f, 0.214309156f},
{0.980785251f, 0.195090324f}, {0.984426558f, 0.175796285f},
{0.987688363f, 0.156434461f}, {0.990569353f, 0.137012348f},
{0.993068457f, 0.117537394f}, {0.995184720f, 0.0980171412f},
{0.996917307f, 0.0784590989f}, {0.998265624f, 0.0588708036f},
{0.999229014f, 0.0392598175f}, {0.999807239f, 0.0196336918f},
};

const kiss_fft_state kfft = {
320, /* nfft */
0.0031250000f, /* scale */
-1, /* shift */
{5, 64, 4, 16, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, }, /* factors */
fft_bitrev, /* bitrev */
fft_twiddles, /* twiddles */
(arch_fft_state *)&arch_fft, /* arch_fft */
};

const float half_window[] = {
3.78491532e-05f, 0.000340620492f, 0.000946046319f, 0.00185389258f, 0.00306380726f,
0.00457531959f, 0.00638783723f, 0.00850064680f, 0.0109129101f, 0.0136236614f,
0.0166318044f, 0.0199361145f, 0.0235352255f, 0.0274276342f, 0.0316116922f,
0.0360856056f, 0.0408474281f, 0.0458950549f, 0.0512262285f, 0.0568385124f,
0.0627293140f, 0.0688958541f, 0.0753351897f, 0.0820441842f, 0.0890194997f,
0.0962576419f, 0.103754878f, 0.111507311f, 0.119510807f, 0.127761051f,
0.136253506f, 0.144983411f, 0.153945804f, 0.163135484f, 0.172547072f,
0.182174906f, 0.192013159f, 0.202055752f, 0.212296382f, 0.222728521f,
0.233345464f, 0.244140238f, 0.255105674f, 0.266234398f, 0.277518868f,
0.288951218f, 0.300523549f, 0.312227666f, 0.324055225f, 0.335997701f,
0.348046392f, 0.360192508f, 0.372427016f, 0.384740859f, 0.397124738f,
0.409569323f, 0.422065198f, 0.434602767f, 0.447172493f, 0.459764689f,
0.472369671f, 0.484977663f, 0.497579008f, 0.510163903f, 0.522722721f,
0.535245717f, 0.547723293f, 0.560145974f, 0.572504222f, 0.584788740f,
0.596990347f, 0.609099925f, 0.621108532f, 0.633007407f, 0.644788086f,
0.656442165f, 0.667961538f, 0.679338276f, 0.690564752f, 0.701633692f,
0.712537885f, 0.723270535f, 0.733825266f, 0.744195819f, 0.754376352f,
0.764361382f, 0.774145722f, 0.783724606f, 0.793093503f, 0.802248418f,
0.811185598f, 0.819901764f, 0.828393936f, 0.836659551f, 0.844696403f,
0.852502763f, 0.860077202f, 0.867418647f, 0.874526560f, 0.881400526f,
0.888040781f, 0.894447744f, 0.900622249f, 0.906565487f, 0.912279010f,
0.917764664f, 0.923024654f, 0.928061485f, 0.932878017f, 0.937477291f,
0.941862822f, 0.946038187f, 0.950007319f, 0.953774393f, 0.957343817f,
0.960720181f, 0.963908315f, 0.966913164f, 0.969739914f, 0.972393870f,
0.974880517f, 0.977205336f, 0.979374051f, 0.981392324f, 0.983266115f,
0.985001266f, 0.986603677f, 0.988079309f, 0.989434063f, 0.990674019f,
0.991804957f, 0.992832899f, 0.993763626f, 0.994602919f, 0.995356441f,
0.996029854f, 0.996628702f, 0.997158289f, 0.997623861f, 0.998030603f,
0.998383403f, 0.998687088f, 0.998946249f, 0.999165416f, 0.999348700f,
0.999500215f, 0.999623775f, 0.999723017f, 0.999801278f, 0.999861658f,
0.999907196f, 0.999940455f, 0.999963880f, 0.999979615f, 0.999989510f,
0.999995291f, 0.999998271f, 0.999999523f, 0.999999940f, 1.00000000f,
};

const float dct_table[] = {
0.707106769f, 0.996194720f, 0.984807730f, 0.965925813f, 0.939692616f,
0.906307817f, 0.866025388f, 0.819152057f, 0.766044438f, 0.707106769f,
0.642787635f, 0.573576450f, 0.500000000f, 0.422618270f, 0.342020154f,
0.258819044f, 0.173648179f, 0.0871557444f, 0.707106769f, 0.965925813f,
0.866025388f, 0.707106769f, 0.500000000f, 0.258819044f, 6.12323426e-17f,
-0.258819044f, -0.500000000f, -0.707106769f, -0.866025388f, -0.965925813f,
-1.00000000f, -0.965925813f, -0.866025388f, -0.707106769f, -0.500000000f,
-0.258819044f, 0.707106769f, 0.906307817f, 0.642787635f, 0.258819044f,
-0.173648179f, -0.573576450f, -0.866025388f, -0.996194720f, -0.939692616f,
-0.707106769f, -0.342020154f, 0.0871557444f, 0.500000000f, 0.819152057f,
0.984807730f, 0.965925813f, 0.766044438f, 0.422618270f, 0.707106769f,
0.819152057f, 0.342020154f, -0.258819044f, -0.766044438f, -0.996194720f,
-0.866025388f, -0.422618270f, 0.173648179f, 0.707106769f, 0.984807730f,
0.906307817f, 0.500000000f, -0.0871557444f, -0.642787635f, -0.965925813f,
-0.939692616f, -0.573576450f, 0.707106769f, 0.707106769f, 6.12323426e-17f,
-0.707106769f, -1.00000000f, -0.707106769f, -1.83697015e-16f, 0.707106769f,
1.00000000f, 0.707106769f, 3.06161700e-16f, -0.707106769f, -1.00000000f,
-0.707106769f, -4.28626385e-16f, 0.707106769f, 1.00000000f, 0.707106769f,
0.707106769f, 0.573576450f, -0.342020154f, -0.965925813f, -0.766044438f,
0.0871557444f, 0.866025388f, 0.906307817f, 0.173648179f, -0.707106769f,
-0.984807730f, -0.422618270f, 0.500000000f, 0.996194720f, 0.642787635f,
-0.258819044f, -0.939692616f, -0.819152057f, 0.707106769f, 0.422618270f,
-0.642787635f, -0.965925813f, -0.173648179f, 0.819152057f, 0.866025388f,
-0.0871557444f, -0.939692616f, -0.707106769f, 0.342020154f, 0.996194720f,
0.500000000f, -0.573576450f, -0.984807730f, -0.258819044f, 0.766044438f,
0.906307817f, 0.707106769f, 0.258819044f, -0.866025388f, -0.707106769f,
0.500000000f, 0.965925813f, 3.06161700e-16f, -0.965925813f, -0.500000000f,
0.707106769f, 0.866025388f, -0.258819044f, -1.00000000f, -0.258819044f,
0.866025388f, 0.707106769f, -0.500000000f, -0.965925813f, 0.707106769f,
0.0871557444f, -0.984807730f, -0.258819044f, 0.939692616f, 0.422618270f,
-0.866025388f, -0.573576450f, 0.766044438f, 0.707106769f, -0.642787635f,
-0.819152057f, 0.500000000f, 0.906307817f, -0.342020154f, -0.965925813f,
0.173648179f, 0.996194720f, 0.707106769f, -0.0871557444f, -0.984807730f,
0.258819044f, 0.939692616f, -0.422618270f, -0.866025388f, 0.573576450f,
0.766044438f, -0.707106769f, -0.642787635f, 0.819152057f, 0.500000000f,
-0.906307817f, -0.342020154f, 0.965925813f, 0.173648179f, -0.996194720f,
0.707106769f, -0.258819044f, -0.866025388f, 0.707106769f, 0.500000000f,
-0.965925813f, -4.28626385e-16f, 0.965925813f, -0.500000000f, -0.707106769f,
0.866025388f, 0.258819044f, -1.00000000f, 0.258819044f, 0.866025388f,
-0.707106769f, -0.500000000f, 0.965925813f, 0.707106769f, -0.422618270f,
-0.642787635f, 0.965925813f, -0.173648179f, -0.819152057f, 0.866025388f,
0.0871557444f, -0.939692616f, 0.707106769f, 0.342020154f, -0.996194720f,
0.500000000f, 0.573576450f, -0.984807730f, 0.258819044f, 0.766044438f,
-0.906307817f, 0.707106769f, -0.573576450f, -0.342020154f, 0.965925813f,
-0.766044438f, -0.0871557444f, 0.866025388f, -0.906307817f, 0.173648179f,
0.707106769f, -0.984807730f, 0.422618270f, 0.500000000f, -0.996194720f,
0.642787635f, 0.258819044f, -0.939692616f, 0.819152057f, 0.707106769f,
-0.707106769f, -1.83697015e-16f, 0.707106769f, -1.00000000f, 0.707106769f,
5.51091070e-16f, -0.707106769f, 1.00000000f, -0.707106769f, -2.69484189e-15f,
0.707106769f, -1.00000000f, 0.707106769f, -4.90477710e-16f, -0.707106769f,
1.00000000f, -0.707106769f, 0.707106769f, -0.819152057f, 0.342020154f,
0.258819044f, -0.766044438f, 0.996194720f, -0.866025388f, 0.422618270f,
0.173648179f, -0.707106769f, 0.984807730f, -0.906307817f, 0.500000000f,
0.0871557444f, -0.642787635f, 0.965925813f, -0.939692616f, 0.573576450f,
0.707106769f, -0.906307817f, 0.642787635f, -0.258819044f, -0.173648179f,
0.573576450f, -0.866025388f, 0.996194720f, -0.939692616f, 0.707106769f,
-0.342020154f, -0.0871557444f, 0.500000000f, -0.819152057f, 0.984807730f,
-0.965925813f, 0.766044438f, -0.422618270f, 0.707106769f, -0.965925813f,
0.866025388f, -0.707106769f, 0.500000000f, -0.258819044f, 1.10280111e-15f,
0.258819044f, -0.500000000f, 0.707106769f, -0.866025388f, 0.965925813f,
-1.00000000f, 0.965925813f, -0.866025388f, 0.707106769f, -0.500000000f,
0.258819044f, 0.707106769f, -0.996194720f, 0.984807730f, -0.965925813f,
0.939692616f, -0.906307817f, 0.866025388f, -0.819152057f, 0.766044438f,
-0.707106769f, 0.642787635f, -0.573576450f, 0.500000000f, -0.422618270f,
0.342020154f, -0.258819044f, 0.173648179f, -0.0871557444f, };
64
managed_components/78__esp-opus/dnn/meson.build
Normal file
@@ -0,0 +1,64 @@
dnn_sources = sources['DEEP_PLC_SOURCES']

dred_sources = sources['DRED_SOURCES']
if opt_dred.enabled()
  dnn_sources += dred_sources
endif

osce_sources = sources['OSCE_SOURCES']
if opt_osce.enabled()
  dnn_sources += osce_sources
endif

dnn_sources_sse2 = sources['DNN_SOURCES_SSE2']
dnn_sources_sse4_1 = sources['DNN_SOURCES_SSE4_1']
dnn_sources_avx2 = sources['DNN_SOURCES_AVX2']

dnn_sources_neon_intr = sources['DNN_SOURCES_NEON']
dnn_sources_dotprod_intr = sources['DNN_SOURCES_DOTPROD']

dnn_includes = [opus_includes]
dnn_static_libs = []

if host_cpu_family in ['x86', 'x86_64'] and opus_conf.has('OPUS_HAVE_RTCD')
  dnn_sources += sources['DNN_SOURCES_X86_RTCD']
endif

if host_cpu_family in ['arm', 'aarch64'] and have_arm_intrinsics_or_asm
  if opus_conf.has('OPUS_HAVE_RTCD')
    dnn_sources += sources['DNN_SOURCES_ARM_RTCD']
  endif
endif

foreach intr_name : ['sse2', 'sse4_1', 'avx2', 'neon_intr', 'dotprod_intr']
  have_intr = get_variable('have_' + intr_name)
  if not have_intr
    continue
  endif

  intr_sources = get_variable('dnn_sources_' + intr_name)

  intr_args = get_variable('opus_@0@_args'.format(intr_name), [])
  dnn_static_libs += static_library('dnn_' + intr_name, intr_sources,
    c_args: intr_args,
    include_directories: dnn_includes,
    install: false)
endforeach

dnn_c_args = []
if host_machine.system() == 'windows'
  dnn_c_args += ['-DDLL_EXPORT']
endif


if opt_deep_plc.enabled()
  dnn_lib = static_library('opus-dnn',
    dnn_sources,
    c_args: dnn_c_args,
    include_directories: dnn_includes,
    link_whole: [dnn_static_libs],
    dependencies: libm,
    install: false)
else
  dnn_lib = []
endif
416
managed_components/78__esp-opus/dnn/nndsp.c
Normal file
@@ -0,0 +1,416 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif


#include "nndsp.h"
#include "arch.h"
#include "nnet.h"
#include "os_support.h"
#include "pitch.h"

#include <math.h>

#ifndef M_PI
#define M_PI 3.141592653589793f
#endif

#define KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel) ((((i_out_channels) * in_channels) + (i_in_channels)) * kernel_size + (i_kernel))

void init_adaconv_state(AdaConvState *hAdaConv)
{
    OPUS_CLEAR(hAdaConv, 1);
}

void init_adacomb_state(AdaCombState *hAdaComb)
{
    OPUS_CLEAR(hAdaComb, 1);
}

void init_adashape_state(AdaShapeState *hAdaShape)
{
    OPUS_CLEAR(hAdaShape, 1);
}

void compute_overlap_window(float *window, int overlap_size)
{
    int i_sample;
    for (i_sample=0; i_sample < overlap_size; i_sample++)
    {
        window[i_sample] = 0.5f + 0.5f * cos(M_PI * (i_sample + 0.5f) / overlap_size);
    }
}

#ifdef DEBUG_NNDSP
void print_float_vector(const char* name, const float *vec, int length)
{
    for (int i = 0; i < length; i ++)
    {
        printf("%s[%d]: %f\n", name, i, vec[i]);
    }
}
#endif

static void scale_kernel(
    float *kernel,
    int in_channels,
    int out_channels,
    int kernel_size,
    float *gain
)
/* normalizes (p-norm) kernel over input channel and kernel dimension */
{
    float norm;
    int i_in_channels, i_out_channels, i_kernel;

    for (i_out_channels = 0; i_out_channels < out_channels; i_out_channels++)
    {
        norm = 0;
        for (i_in_channels = 0; i_in_channels < in_channels; i_in_channels ++)
        {
            for (i_kernel = 0; i_kernel < kernel_size; i_kernel++)
            {
                norm += kernel[KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel)] * kernel[KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel)];
            }
        }
#ifdef DEBUG_NNDSP
        printf("kernel norm: %f, %f\n", norm, sqrt(norm));
#endif
        norm = 1.f / (1e-6f + sqrt(norm));
        for (i_in_channels = 0; i_in_channels < in_channels; i_in_channels++)
        {
            for (i_kernel = 0; i_kernel < kernel_size; i_kernel++)
            {

                kernel[KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel)] *= norm * gain[i_out_channels];
            }
        }
    }
}

static void transform_gains(
    float *gains,
    int num_gains,
    float filter_gain_a,
    float filter_gain_b
)
{
    int i;
    for (i = 0; i < num_gains; i++)
    {
        gains[i] = exp(filter_gain_a * gains[i] + filter_gain_b);
    }
}

void adaconv_process_frame(
    AdaConvState* hAdaConv,
    float *x_out,
    const float *x_in,
    const float *features,
    const LinearLayer *kernel_layer,
    const LinearLayer *gain_layer,
    int feature_dim,
    int frame_size,
    int overlap_size,
    int in_channels,
    int out_channels,
    int kernel_size,
    int left_padding,
    float filter_gain_a,
    float filter_gain_b,
    float shape_gain,
    float *window,
    int arch
)
{
    float output_buffer[ADACONV_MAX_FRAME_SIZE * ADACONV_MAX_OUTPUT_CHANNELS];
    float kernel_buffer[ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS * ADACONV_MAX_OUTPUT_CHANNELS];
    float input_buffer[ADACONV_MAX_INPUT_CHANNELS * (ADACONV_MAX_FRAME_SIZE + ADACONV_MAX_KERNEL_SIZE)];
    float kernel0[ADACONV_MAX_KERNEL_SIZE];
    float kernel1[ADACONV_MAX_KERNEL_SIZE];
    float channel_buffer0[ADACONV_MAX_OVERLAP_SIZE];
    float channel_buffer1[ADACONV_MAX_FRAME_SIZE];
    float gain_buffer[ADACONV_MAX_OUTPUT_CHANNELS];
    float *p_input;
    int i_in_channels, i_out_channels, i_sample;

    (void) feature_dim; /* ToDo: figure out whether we might need this information */

    celt_assert(shape_gain == 1);
    celt_assert(left_padding == kernel_size - 1); /* currently only supports causal version. Non-causal version not difficult to implement but will require third loop */
    celt_assert(kernel_size < frame_size);

    OPUS_CLEAR(output_buffer, ADACONV_MAX_FRAME_SIZE * ADACONV_MAX_OUTPUT_CHANNELS);
    OPUS_CLEAR(kernel_buffer, ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS * ADACONV_MAX_OUTPUT_CHANNELS);
    OPUS_CLEAR(input_buffer, ADACONV_MAX_INPUT_CHANNELS * (ADACONV_MAX_FRAME_SIZE + ADACONV_MAX_KERNEL_SIZE));

#ifdef DEBUG_NNDSP
    print_float_vector("x_in", x_in, in_channels * frame_size);
#endif

    /* prepare input */
    for (i_in_channels=0; i_in_channels < in_channels; i_in_channels ++)
    {
        OPUS_COPY(input_buffer + i_in_channels * (kernel_size + frame_size), hAdaConv->history + i_in_channels * kernel_size, kernel_size);
        OPUS_COPY(input_buffer + kernel_size + i_in_channels * (kernel_size + frame_size), x_in + frame_size * i_in_channels, frame_size);
    }
    p_input = input_buffer + kernel_size;


    /* calculate new kernel and new gain */
    compute_generic_dense(kernel_layer, kernel_buffer, features, ACTIVATION_LINEAR, arch);
    compute_generic_dense(gain_layer, gain_buffer, features, ACTIVATION_TANH, arch);
#ifdef DEBUG_NNDSP
    print_float_vector("features", features, feature_dim);
    print_float_vector("adaconv_kernel_raw", kernel_buffer, in_channels * out_channels * kernel_size);
    print_float_vector("adaconv_gain_raw", gain_buffer, out_channels);
#endif
    transform_gains(gain_buffer, out_channels, filter_gain_a, filter_gain_b);
    scale_kernel(kernel_buffer, in_channels, out_channels, kernel_size, gain_buffer);

#ifdef DEBUG_NNDSP
    print_float_vector("adaconv_kernel", kernel_buffer, in_channels * out_channels * kernel_size);
    print_float_vector("adaconv_gain", gain_buffer, out_channels);
#endif

    /* calculate overlapping part using kernel from last frame */

    for (i_out_channels = 0; i_out_channels < out_channels; i_out_channels++)
    {
        for (i_in_channels = 0; i_in_channels < in_channels; i_in_channels++)
        {
            OPUS_CLEAR(kernel0, ADACONV_MAX_KERNEL_SIZE);
            OPUS_CLEAR(kernel1, ADACONV_MAX_KERNEL_SIZE);

            OPUS_COPY(kernel0, hAdaConv->last_kernel + KERNEL_INDEX(i_out_channels, i_in_channels, 0), kernel_size);
            OPUS_COPY(kernel1, kernel_buffer + KERNEL_INDEX(i_out_channels, i_in_channels, 0), kernel_size);
            celt_pitch_xcorr(kernel0, p_input + i_in_channels * (frame_size + kernel_size) - left_padding, channel_buffer0, ADACONV_MAX_KERNEL_SIZE, overlap_size, arch);
            celt_pitch_xcorr(kernel1, p_input + i_in_channels * (frame_size + kernel_size) - left_padding, channel_buffer1, ADACONV_MAX_KERNEL_SIZE, frame_size, arch);
            for (i_sample = 0; i_sample < overlap_size; i_sample++)
            {
                output_buffer[i_sample + i_out_channels * frame_size] += window[i_sample] * channel_buffer0[i_sample];
                output_buffer[i_sample + i_out_channels * frame_size] += (1.f - window[i_sample]) * channel_buffer1[i_sample];
            }
            for (i_sample = overlap_size; i_sample < frame_size; i_sample++)
            {
                output_buffer[i_sample + i_out_channels * frame_size] += channel_buffer1[i_sample];
            }
        }
    }

    OPUS_COPY(x_out, output_buffer, out_channels * frame_size);

#ifdef DEBUG_NNDSP
    print_float_vector("x_out", x_out, out_channels * frame_size);
#endif

    /* buffer update */
    for (i_in_channels=0; i_in_channels < in_channels; i_in_channels ++)
    {
        OPUS_COPY(hAdaConv->history + i_in_channels * kernel_size, p_input + i_in_channels * (frame_size + kernel_size) + frame_size - kernel_size, kernel_size);
    }
    OPUS_COPY(hAdaConv->last_kernel, kernel_buffer, kernel_size * in_channels * out_channels);
}

void adacomb_process_frame(
    AdaCombState* hAdaComb,
    float *x_out,
    const float *x_in,
    const float *features,
    const LinearLayer *kernel_layer,
    const LinearLayer *gain_layer,
    const LinearLayer *global_gain_layer,
    int pitch_lag,
    int feature_dim,
    int frame_size,
    int overlap_size,
    int kernel_size,
    int left_padding,
    float filter_gain_a,
    float filter_gain_b,
    float log_gain_limit,
    float *window,
    int arch
)
{
    float output_buffer[ADACOMB_MAX_FRAME_SIZE];
    float output_buffer_last[ADACOMB_MAX_FRAME_SIZE];
    float kernel_buffer[ADACOMB_MAX_KERNEL_SIZE];
    float input_buffer[ADACOMB_MAX_FRAME_SIZE + ADACOMB_MAX_LAG + ADACOMB_MAX_KERNEL_SIZE];
    float gain, global_gain;
    float *p_input;
    int i_sample;
    float kernel[16];
    float last_kernel[16];

    (void) feature_dim; /* ToDo: figure out whether we might need this information */

    OPUS_CLEAR(output_buffer, ADACOMB_MAX_FRAME_SIZE);
    OPUS_CLEAR(kernel_buffer, ADACOMB_MAX_KERNEL_SIZE);
    OPUS_CLEAR(input_buffer, ADACOMB_MAX_FRAME_SIZE + ADACOMB_MAX_LAG + ADACOMB_MAX_KERNEL_SIZE);

    OPUS_COPY(input_buffer, hAdaComb->history, kernel_size + ADACOMB_MAX_LAG);
    OPUS_COPY(input_buffer + kernel_size + ADACOMB_MAX_LAG, x_in, frame_size);
    p_input = input_buffer + kernel_size + ADACOMB_MAX_LAG;

    /* calculate new kernel and new gain */
    compute_generic_dense(kernel_layer, kernel_buffer, features, ACTIVATION_LINEAR, arch);
    compute_generic_dense(gain_layer, &gain, features, ACTIVATION_RELU, arch);
    compute_generic_dense(global_gain_layer, &global_gain, features, ACTIVATION_TANH, arch);
#ifdef DEBUG_NNDSP
    print_float_vector("features", features, feature_dim);
    print_float_vector("adacomb_kernel_raw", kernel_buffer, kernel_size);
    print_float_vector("adacomb_gain_raw", &gain, 1);
    print_float_vector("adacomb_global_gain_raw", &global_gain, 1);
#endif
    gain = exp(log_gain_limit - gain);
    global_gain = exp(filter_gain_a * global_gain + filter_gain_b);
    scale_kernel(kernel_buffer, 1, 1, kernel_size, &gain);

#ifdef DEBUG_NNDSP
    print_float_vector("adacomb_kernel", kernel_buffer, kernel_size);
    print_float_vector("adacomb_gain", &gain, 1);
#endif

    OPUS_CLEAR(kernel, ADACOMB_MAX_KERNEL_SIZE);
    OPUS_CLEAR(last_kernel, ADACOMB_MAX_KERNEL_SIZE);
    OPUS_COPY(kernel, kernel_buffer, kernel_size);
    OPUS_COPY(last_kernel, hAdaComb->last_kernel, kernel_size);

    celt_pitch_xcorr(last_kernel, &p_input[- left_padding - hAdaComb->last_pitch_lag], output_buffer_last, ADACOMB_MAX_KERNEL_SIZE, overlap_size, arch);

    celt_pitch_xcorr(kernel, &p_input[- left_padding - pitch_lag], output_buffer, ADACOMB_MAX_KERNEL_SIZE, frame_size, arch);
    for (i_sample = 0; i_sample < overlap_size; i_sample++)
    {
        output_buffer[i_sample] = hAdaComb->last_global_gain * window[i_sample] * output_buffer_last[i_sample] + global_gain * (1.f - window[i_sample]) * output_buffer[i_sample];
    }

    for (i_sample = 0; i_sample < overlap_size; i_sample++)
    {
        output_buffer[i_sample] += (window[i_sample] * hAdaComb->last_global_gain + (1.f - window[i_sample]) * global_gain) * p_input[i_sample];
    }

    for (i_sample = overlap_size; i_sample < frame_size; i_sample++)
    {
        output_buffer[i_sample] = global_gain * (output_buffer[i_sample] + p_input[i_sample]);
    }
    OPUS_COPY(x_out, output_buffer, frame_size);

#ifdef DEBUG_NNDSP
    print_float_vector("x_out", x_out, frame_size);
#endif

    /* buffer update */
    OPUS_COPY(hAdaComb->last_kernel, kernel_buffer, kernel_size);
    OPUS_COPY(hAdaComb->history, p_input + frame_size - kernel_size - ADACOMB_MAX_LAG, kernel_size + ADACOMB_MAX_LAG);
    hAdaComb->last_pitch_lag = pitch_lag;
    hAdaComb->last_global_gain = global_gain;
}


void adashape_process_frame(
    AdaShapeState *hAdaShape,
    float *x_out,
    const float *x_in,
    const float *features,
    const LinearLayer *alpha1f,
    const LinearLayer *alpha1t,
    const LinearLayer *alpha2,
    int feature_dim,
    int frame_size,
    int avg_pool_k,
    int arch
)
{
    float in_buffer[ADASHAPE_MAX_INPUT_DIM + ADASHAPE_MAX_FRAME_SIZE];
    float out_buffer[ADASHAPE_MAX_FRAME_SIZE];
    float tmp_buffer[ADASHAPE_MAX_FRAME_SIZE];
    int i, k;
    int tenv_size;
    float mean;
    float *tenv;

    celt_assert(frame_size % avg_pool_k == 0);
    celt_assert(feature_dim + frame_size / avg_pool_k + 1 < ADASHAPE_MAX_INPUT_DIM);

    tenv_size = frame_size / avg_pool_k;
    tenv = in_buffer + feature_dim;
    OPUS_CLEAR(tenv, tenv_size + 1);

    OPUS_COPY(in_buffer, features, feature_dim);

    /* calculate temporal envelope */
    mean = 0;
    for (i = 0; i < tenv_size; i++)
    {
        for (k = 0; k < avg_pool_k; k++)
        {
            tenv[i] += fabs(x_in[i * avg_pool_k + k]);
        }
        tenv[i] = log(tenv[i] / avg_pool_k + 1.52587890625e-05f);
        mean += tenv[i];
    }
    mean /= tenv_size;
    for (i = 0; i < tenv_size; i++)
    {
        tenv[i] -= mean;
    }
    tenv[tenv_size] = mean;
#ifdef DEBUG_NNDSP
    print_float_vector("tenv", tenv, tenv_size + 1);
#endif

    /* calculate temporal weights */
#ifdef DEBUG_NNDSP
    print_float_vector("alpha1_in", in_buffer, feature_dim + tenv_size + 1);
#endif
    compute_generic_conv1d(alpha1f, out_buffer, hAdaShape->conv_alpha1f_state, in_buffer, feature_dim, ACTIVATION_LINEAR, arch);
    compute_generic_conv1d(alpha1t, tmp_buffer, hAdaShape->conv_alpha1t_state, tenv, tenv_size + 1, ACTIVATION_LINEAR, arch);
#ifdef DEBUG_NNDSP
    print_float_vector("alpha1_out", out_buffer, frame_size);
#endif
    /* compute leaky ReLU by hand. ToDo: try tanh activation */
    for (i = 0; i < frame_size; i ++)
    {
        float tmp = out_buffer[i] + tmp_buffer[i];
        in_buffer[i] = tmp >= 0 ? tmp : 0.2 * tmp;
    }
#ifdef DEBUG_NNDSP
    print_float_vector("post_alpha1", in_buffer, frame_size);
#endif
    compute_generic_conv1d(alpha2, out_buffer, hAdaShape->conv_alpha2_state, in_buffer, frame_size, ACTIVATION_LINEAR, arch);

    /* shape signal */
    for (i = 0; i < frame_size; i ++)
    {
        x_out[i] = exp(out_buffer[i]) * x_in[i];
    }

}
143
managed_components/78__esp-opus/dnn/nndsp.h
Normal file
@@ -0,0 +1,143 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef NNDSP_H
#define NNDSP_H

#include "opus_types.h"
#include "nnet.h"
#include <string.h>


#define ADACONV_MAX_KERNEL_SIZE 16
#define ADACONV_MAX_INPUT_CHANNELS 2
#define ADACONV_MAX_OUTPUT_CHANNELS 2
#define ADACONV_MAX_FRAME_SIZE 80
#define ADACONV_MAX_OVERLAP_SIZE 40

#define ADACOMB_MAX_LAG 300
#define ADACOMB_MAX_KERNEL_SIZE 16
#define ADACOMB_MAX_FRAME_SIZE 80
#define ADACOMB_MAX_OVERLAP_SIZE 40

#define ADASHAPE_MAX_INPUT_DIM 512
#define ADASHAPE_MAX_FRAME_SIZE 160

/*#define DEBUG_NNDSP*/
#ifdef DEBUG_NNDSP
#include <stdio.h>
#endif


void print_float_vector(const char* name, const float *vec, int length);

typedef struct {
    float history[ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS];
    float last_kernel[ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS * ADACONV_MAX_OUTPUT_CHANNELS];
    float last_gain;
} AdaConvState;


typedef struct {
    float history[ADACOMB_MAX_KERNEL_SIZE + ADACOMB_MAX_LAG];
    float last_kernel[ADACOMB_MAX_KERNEL_SIZE];
    float last_global_gain;
    int last_pitch_lag;
} AdaCombState;


typedef struct {
    float conv_alpha1f_state[ADASHAPE_MAX_INPUT_DIM];
    float conv_alpha1t_state[ADASHAPE_MAX_INPUT_DIM];
    float conv_alpha2_state[ADASHAPE_MAX_FRAME_SIZE];
} AdaShapeState;

void init_adaconv_state(AdaConvState *hAdaConv);

void init_adacomb_state(AdaCombState *hAdaComb);

void init_adashape_state(AdaShapeState *hAdaShape);

void compute_overlap_window(float *window, int overlap_size);

void adaconv_process_frame(
    AdaConvState* hAdaConv,
    float *x_out,
    const float *x_in,
    const float *features,
    const LinearLayer *kernel_layer,
    const LinearLayer *gain_layer,
    int feature_dim, /* not strictly necessary */
    int frame_size,
    int overlap_size,
    int in_channels,
    int out_channels,
    int kernel_size,
    int left_padding,
    float filter_gain_a,
    float filter_gain_b,
    float shape_gain,
    float *window,
    int arch
);

void adacomb_process_frame(
    AdaCombState* hAdaComb,
    float *x_out,
    const float *x_in,
    const float *features,
    const LinearLayer *kernel_layer,
    const LinearLayer *gain_layer,
    const LinearLayer *global_gain_layer,
    int pitch_lag,
    int feature_dim,
    int frame_size,
    int overlap_size,
    int kernel_size,
    int left_padding,
    float filter_gain_a,
    float filter_gain_b,
    float log_gain_limit,
    float *window,
    int arch
);

void adashape_process_frame(
    AdaShapeState *hAdaShape,
    float *x_out,
    const float *x_in,
    const float *features,
    const LinearLayer *alpha1f,
    const LinearLayer *alpha1t,
    const LinearLayer *alpha2,
    int feature_dim,
    int frame_size,
    int avg_pool_k,
    int arch
);

#endif
149
managed_components/78__esp-opus/dnn/nnet.c
Normal file
@@ -0,0 +1,149 @@
/* Copyright (c) 2018 Mozilla
                 2008-2011 Octasic Inc.
                 2012-2017 Jean-Marc Valin */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <stdlib.h>
#include <math.h>
#include "opus_types.h"
#include "arch.h"
#include "nnet.h"
#include "dred_rdovae_constants.h"
#include "plc_data.h"
#include "fargan.h"
#include "os_support.h"
#include "vec.h"

#ifdef ENABLE_OSCE
#include "osce.h"
#endif

#ifdef NO_OPTIMIZATIONS
#if defined(_MSC_VER)
#pragma message ("Compiling without any vectorization. This code will be very slow")
#else
#warning Compiling without any vectorization. This code will be very slow
#endif
#endif


#define SOFTMAX_HACK


void compute_generic_dense(const LinearLayer *layer, float *output, const float *input, int activation, int arch)
{
    compute_linear(layer, output, input, arch);
    compute_activation(output, output, layer->nb_outputs, activation, arch);
}

#ifdef ENABLE_OSCE
#define MAX_RNN_NEURONS_ALL IMAX(IMAX(IMAX(FARGAN_MAX_RNN_NEURONS, PLC_MAX_RNN_UNITS), DRED_MAX_RNN_NEURONS), OSCE_MAX_RNN_NEURONS)
#else
#define MAX_RNN_NEURONS_ALL IMAX(IMAX(FARGAN_MAX_RNN_NEURONS, PLC_MAX_RNN_UNITS), DRED_MAX_RNN_NEURONS)
#endif

void compute_generic_gru(const LinearLayer *input_weights, const LinearLayer *recurrent_weights, float *state, const float *in, int arch)
{
    int i;
    int N;
    float zrh[3*MAX_RNN_NEURONS_ALL];
    float recur[3*MAX_RNN_NEURONS_ALL];
    float *z;
    float *r;
    float *h;
    celt_assert(3*recurrent_weights->nb_inputs == recurrent_weights->nb_outputs);
    celt_assert(input_weights->nb_outputs == recurrent_weights->nb_outputs);
    N = recurrent_weights->nb_inputs;
    z = zrh;
    r = &zrh[N];
    h = &zrh[2*N];
    celt_assert(recurrent_weights->nb_outputs <= 3*MAX_RNN_NEURONS_ALL);
    celt_assert(in != state);
    compute_linear(input_weights, zrh, in, arch);
    compute_linear(recurrent_weights, recur, state, arch);
    for (i=0;i<2*N;i++)
        zrh[i] += recur[i];
    compute_activation(zrh, zrh, 2*N, ACTIVATION_SIGMOID, arch);
    for (i=0;i<N;i++)
        h[i] += recur[2*N+i]*r[i];
    compute_activation(h, h, N, ACTIVATION_TANH, arch);
    for (i=0;i<N;i++)
        h[i] = z[i]*state[i] + (1-z[i])*h[i];
    for (i=0;i<N;i++)
        state[i] = h[i];
}

void compute_glu(const LinearLayer *layer, float *output, const float *input, int arch)
{
    int i;
    float act2[MAX_INPUTS];
    celt_assert(layer->nb_inputs == layer->nb_outputs);
    compute_linear(layer, act2, input, arch);
    compute_activation(act2, act2, layer->nb_outputs, ACTIVATION_SIGMOID, arch);
    if (input == output) {
        /* Give a vectorization hint to the compiler for the in-place case. */
        for (i=0;i<layer->nb_outputs;i++) output[i] = output[i]*act2[i];
    } else {
        for (i=0;i<layer->nb_outputs;i++) output[i] = input[i]*act2[i];
    }
}

#define MAX_CONV_INPUTS_ALL DRED_MAX_CONV_INPUTS

void compute_generic_conv1d(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int activation, int arch)
{
    float tmp[MAX_CONV_INPUTS_ALL];
    celt_assert(input != output);
    celt_assert(layer->nb_inputs <= MAX_CONV_INPUTS_ALL);
    if (layer->nb_inputs!=input_size) OPUS_COPY(tmp, mem, layer->nb_inputs-input_size);
    OPUS_COPY(&tmp[layer->nb_inputs-input_size], input, input_size);
    compute_linear(layer, output, tmp, arch);
    compute_activation(output, output, layer->nb_outputs, activation, arch);
    if (layer->nb_inputs!=input_size) OPUS_COPY(mem, &tmp[input_size], layer->nb_inputs-input_size);
}

void compute_generic_conv1d_dilation(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int dilation, int activation, int arch)
{
    float tmp[MAX_CONV_INPUTS_ALL];
    int ksize = layer->nb_inputs/input_size;
    int i;
    celt_assert(input != output);
    celt_assert(layer->nb_inputs <= MAX_CONV_INPUTS_ALL);
    if (dilation==1) OPUS_COPY(tmp, mem, layer->nb_inputs-input_size);
    else for (i=0;i<ksize-1;i++) OPUS_COPY(&tmp[i*input_size], &mem[i*input_size*dilation], input_size);
    OPUS_COPY(&tmp[layer->nb_inputs-input_size], input, input_size);
    compute_linear(layer, output, tmp, arch);
    compute_activation(output, output, layer->nb_outputs, activation, arch);
    if (dilation==1) OPUS_COPY(mem, &tmp[input_size], layer->nb_inputs-input_size);
    else {
        OPUS_COPY(mem, &mem[input_size], input_size*dilation*(ksize-1)-input_size);
        OPUS_COPY(&mem[input_size*dilation*(ksize-1)-input_size], input, input_size);
    }
}
163
managed_components/78__esp-opus/dnn/nnet.h
Normal file
@@ -0,0 +1,163 @@
/* Copyright (c) 2018 Mozilla
|
||||
Copyright (c) 2017 Jean-Marc Valin */
|
||||
/*
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
|
||||
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*/
|
||||
|
||||
#ifndef NNET_H_
|
||||
#define NNET_H_
|
||||
|
||||
#include <stddef.h>
|
||||
#include "opus_types.h"
|
||||
|
||||
#define ACTIVATION_LINEAR 0
|
||||
#define ACTIVATION_SIGMOID 1
|
||||
#define ACTIVATION_TANH 2
|
||||
#define ACTIVATION_RELU 3
|
||||
#define ACTIVATION_SOFTMAX 4
|
||||
#define ACTIVATION_SWISH 5
|
||||
|
||||
#define WEIGHT_BLOB_VERSION 0
|
||||
#define WEIGHT_BLOCK_SIZE 64
|
||||
typedef struct {
|
||||
const char *name;
|
||||
int type;
|
||||
int size;
|
||||
const void *data;
|
||||
} WeightArray;
|
||||
|
||||
#define WEIGHT_TYPE_float 0
|
||||
#define WEIGHT_TYPE_int 1
|
||||
#define WEIGHT_TYPE_qweight 2
|
||||
#define WEIGHT_TYPE_int8 3
|
||||
|
||||
typedef struct {
|
||||
char head[4];
|
||||
int version;
|
||||
int type;
|
||||
int size;
|
||||
int block_size;
|
||||
char name[44];
|
||||
} WeightHead;
|
||||
/* Generic sparse affine transformation. */
typedef struct {
  const float *bias;
  const float *subias;
  const opus_int8 *weights;
  const float *float_weights;
  const int *weights_idx;
  const float *diag;
  const float *scale;
  int nb_inputs;
  int nb_outputs;
} LinearLayer;

/* Generic 2-D convolution layer. */
typedef struct {
  const float *bias;
  const float *float_weights;
  int in_channels;
  int out_channels;
  int ktime;
  int kheight;
} Conv2dLayer;


void compute_generic_dense(const LinearLayer *layer, float *output, const float *input, int activation, int arch);
void compute_generic_gru(const LinearLayer *input_weights, const LinearLayer *recurrent_weights, float *state, const float *in, int arch);
void compute_generic_conv1d(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int activation, int arch);
void compute_generic_conv1d_dilation(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int dilation, int activation, int arch);
void compute_glu(const LinearLayer *layer, float *output, const float *input, int arch);
void compute_gated_activation(const LinearLayer *layer, float *output, const float *input, int activation, int arch);


int parse_weights(WeightArray **list, const void *data, int len);


extern const WeightArray lpcnet_arrays[];
extern const WeightArray plcmodel_arrays[];
extern const WeightArray rdovaeenc_arrays[];
extern const WeightArray rdovaedec_arrays[];
extern const WeightArray fwgan_arrays[];
extern const WeightArray fargan_arrays[];
extern const WeightArray pitchdnn_arrays[];
extern const WeightArray lossgen_arrays[];

int linear_init(LinearLayer *layer, const WeightArray *arrays,
  const char *bias,
  const char *subias,
  const char *weights,
  const char *float_weights,
  const char *weights_idx,
  const char *diag,
  const char *scale,
  int nb_inputs,
  int nb_outputs);

int conv2d_init(Conv2dLayer *layer, const WeightArray *arrays,
  const char *bias,
  const char *float_weights,
  int in_channels,
  int out_channels,
  int ktime,
  int kheight);


void compute_linear_c(const LinearLayer *linear, float *out, const float *in);
void compute_activation_c(float *output, const float *input, int N, int activation);
void compute_conv2d_c(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation);


#if defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON_INTR)
#include "arm/dnn_arm.h"
#endif

#if defined(OPUS_X86_MAY_HAVE_SSE2)
#include "x86/dnn_x86.h"
#endif

#ifndef OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_c(linear, out, in))
#endif

#ifndef OVERRIDE_COMPUTE_ACTIVATION
#define compute_activation(output, input, N, activation, arch) ((void)(arch),compute_activation_c(output, input, N, activation))
#endif

#ifndef OVERRIDE_COMPUTE_CONV2D
#define compute_conv2d(conv, out, mem, in, height, hstride, activation, arch) ((void)(arch),compute_conv2d_c(conv, out, mem, in, height, hstride, activation))
#endif

#if defined(__x86_64__) && !defined(OPUS_X86_MAY_HAVE_SSE4_1) && !defined(OPUS_X86_MAY_HAVE_AVX2)
#if defined(_MSC_VER)
#pragma message ("Only SSE and SSE2 are available. On newer machines, enable SSSE3/AVX/AVX2 to get better performance")
#else
#warning "Only SSE and SSE2 are available. On newer machines, enable SSSE3/AVX/AVX2 using -march= to get better performance"
#endif
#endif

#endif /* NNET_H_ */
247
managed_components/78__esp-opus/dnn/nnet_arch.h
Normal file
@@ -0,0 +1,247 @@
/* Copyright (c) 2018-2019 Mozilla
                 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef NNET_ARCH_H
#define NNET_ARCH_H

#include "nnet.h"
#include "arch.h"
#include "os_support.h"
#include "vec.h"

#define CAT_SUFFIX2(a,b) a ## b
#define CAT_SUFFIX(a,b) CAT_SUFFIX2(a, b)

#define RTCD_SUF(name) CAT_SUFFIX(name, RTCD_ARCH)

/* Force vectorization on for DNN code because some of the loops rely on
   compiler vectorization rather than explicitly using intrinsics. */
#if OPUS_GNUC_PREREQ(5,1)
#define GCC_POP_OPTIONS
#pragma GCC push_options
#pragma GCC optimize("tree-vectorize")
#endif


#define MAX_ACTIVATIONS (4096)

static OPUS_INLINE void vec_swish(float *y, const float *x, int N)
{
   int i;
   float tmp[MAX_ACTIVATIONS];
   celt_assert(N <= MAX_ACTIVATIONS);
   vec_sigmoid(tmp, x, N);
   for (i=0;i<N;i++)
      y[i] = x[i]*tmp[i];
}
static OPUS_INLINE float relu(float x)
{
   return x < 0 ? 0 : x;
}

/*#define HIGH_ACCURACY */

void RTCD_SUF(compute_activation_)(float *output, const float *input, int N, int activation)
{
   int i;
   if (activation == ACTIVATION_SIGMOID) {
#ifdef HIGH_ACCURACY
      for (int n=0; n<N; n++)
      {
         output[n] = 1.f / (1 + exp(-input[n]));
      }
#else
      vec_sigmoid(output, input, N);
#endif
   } else if (activation == ACTIVATION_TANH) {
#ifdef HIGH_ACCURACY
      for (int n=0; n<N; n++)
      {
         output[n] = tanh(input[n]);
      }
#else
      vec_tanh(output, input, N);
#endif
   } else if (activation == ACTIVATION_SWISH) {
      vec_swish(output, input, N);
   } else if (activation == ACTIVATION_RELU) {
      for (i=0;i<N;i++)
         output[i] = relu(input[i]);
   } else if (activation == ACTIVATION_SOFTMAX) {
#ifdef SOFTMAX_HACK
      OPUS_COPY(output, input, N);
      /*for (i=0;i<N;i++)
         output[i] = input[i];*/
#else
      float sum = 0;
      softmax(output, input, N);
      for (i=0;i<N;i++) {
         sum += output[i];
      }
      sum = 1.f/(sum+1e-30);
      for (i=0;i<N;i++)
         output[i] = sum*output[i];
#endif
   } else {
      celt_assert(activation == ACTIVATION_LINEAR);
      if (input != output) {
         for (i=0;i<N;i++)
            output[i] = input[i];
      }
   }
}
void RTCD_SUF(compute_linear_) (const LinearLayer *linear, float *out, const float *in)
{
   int i, M, N;
   const float *bias;
   celt_assert(in != out);
   bias = linear->bias;
   M = linear->nb_inputs;
   N = linear->nb_outputs;
   if (linear->float_weights != NULL) {
      if (linear->weights_idx != NULL) sparse_sgemv8x4(out, linear->float_weights, linear->weights_idx, N, in);
      else sgemv(out, linear->float_weights, N, M, N, in);
   } else if (linear->weights != NULL) {
      if (linear->weights_idx != NULL) sparse_cgemv8x4(out, linear->weights, linear->weights_idx, linear->scale, N, M, in);
      else cgemv8x4(out, linear->weights, linear->scale, N, M, in);
      /* Only use SU biases for integer matrices on SU archs. */
#ifdef USE_SU_BIAS
      bias = linear->subias;
#endif
   }
   else OPUS_CLEAR(out, N);
   if (bias != NULL) {
      for (i=0;i<N;i++) out[i] += bias[i];
   }
   if (linear->diag) {
      /* Diag is only used for GRU recurrent weights. */
      celt_assert(3*M == N);
      for (i=0;i<M;i++) {
         out[i] += linear->diag[i]*in[i];
         out[i+M] += linear->diag[i+M]*in[i];
         out[i+2*M] += linear->diag[i+2*M]*in[i];
      }
   }
}
/* Computes non-padded convolution for input [ ksize1 x in_channels x (len2+ksize2) ],
   kernel [ out_channels x in_channels x ksize1 x ksize2 ],
   storing the output as [ out_channels x len2 ].
   We assume that the output dimension along the ksize1 axis is 1,
   i.e. processing one frame at a time. */
static void conv2d_float(float *out, const float *weights, int in_channels, int out_channels, int ktime, int kheight, const float *in, int height, int hstride)
{
   int i;
   int in_stride;
   in_stride = height+kheight-1;
   for (i=0;i<out_channels;i++) {
      int m;
      OPUS_CLEAR(&out[i*hstride], height);
      for (m=0;m<in_channels;m++) {
         int t;
         for (t=0;t<ktime;t++) {
            int h;
            for (h=0;h<kheight;h++) {
               int j;
               for (j=0;j<height;j++) {
                  out[i*hstride + j] += weights[i*in_channels*ktime*kheight + m*ktime*kheight + t*kheight + h] *
                                        in[t*in_channels*in_stride + m*in_stride + j + h];
               }
            }
         }
      }
   }
}
/* There's no intrinsics in this function (or the one above) because the gcc (and hopefully other compiler) auto-vectorizer is smart enough to
   produce the right code by itself based on the compile flags. */
static void conv2d_3x3_float(float *out, const float *weights, int in_channels, int out_channels, const float *in, int height, int hstride)
{
   int i;
   int in_stride;
   int kheight, ktime;
   kheight = ktime = 3;
   in_stride = height+kheight-1;
   for (i=0;i<out_channels;i++) {
      int m;
      OPUS_CLEAR(&out[i*hstride], height);
      for (m=0;m<in_channels;m++) {
         int j;
         for (j=0;j<height;j++) {
            /* Unrolled version of previous function -- compiler will figure out the indexing simplifications. */
            out[i*hstride + j] += weights[i*in_channels*ktime*kheight + m*ktime*kheight + 0*kheight + 0]*in[0*in_channels*in_stride + m*in_stride + j + 0]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 0*kheight + 1]*in[0*in_channels*in_stride + m*in_stride + j + 1]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 0*kheight + 2]*in[0*in_channels*in_stride + m*in_stride + j + 2]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 1*kheight + 0]*in[1*in_channels*in_stride + m*in_stride + j + 0]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 1*kheight + 1]*in[1*in_channels*in_stride + m*in_stride + j + 1]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 1*kheight + 2]*in[1*in_channels*in_stride + m*in_stride + j + 2]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 2*kheight + 0]*in[2*in_channels*in_stride + m*in_stride + j + 0]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 2*kheight + 1]*in[2*in_channels*in_stride + m*in_stride + j + 1]
                                + weights[i*in_channels*ktime*kheight + m*ktime*kheight + 2*kheight + 2]*in[2*in_channels*in_stride + m*in_stride + j + 2];
         }
      }
   }
}

#define MAX_CONV2D_INPUTS 8192

void RTCD_SUF(compute_conv2d_)(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation)
{
   int i;
   const float *bias;
   float in_buf[MAX_CONV2D_INPUTS];
   int time_stride;
   celt_assert(in != out);
   time_stride = conv->in_channels*(height+conv->kheight-1);
   celt_assert(conv->ktime*time_stride <= MAX_CONV2D_INPUTS);
   OPUS_COPY(in_buf, mem, (conv->ktime-1)*time_stride);
   OPUS_COPY(&in_buf[(conv->ktime-1)*time_stride], in, time_stride);
   OPUS_COPY(mem, &in_buf[time_stride], (conv->ktime-1)*time_stride);
   bias = conv->bias;
   if (conv->kheight == 3 && conv->ktime == 3)
      conv2d_3x3_float(out, conv->float_weights, conv->in_channels, conv->out_channels, in_buf, height, hstride);
   else
      conv2d_float(out, conv->float_weights, conv->in_channels, conv->out_channels, conv->ktime, conv->kheight, in_buf, height, hstride);
   if (bias != NULL) {
      for (i=0;i<conv->out_channels;i++) {
         int j;
         for (j=0;j<height;j++) out[i*hstride+j] += bias[i];
      }
   }
   for (i=0;i<conv->out_channels;i++) {
      RTCD_SUF(compute_activation_)(&out[i*hstride], &out[i*hstride], height, activation);
   }
}

#ifdef GCC_POP_OPTIONS
#pragma GCC pop_options
#endif

#endif
35
managed_components/78__esp-opus/dnn/nnet_default.c
Normal file
@@ -0,0 +1,35 @@
/* Copyright (c) 2018-2019 Mozilla
                 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif


#define RTCD_ARCH c

#include "nnet_arch.h"
1419
managed_components/78__esp-opus/dnn/osce.c
Normal file
File diff suppressed because it is too large
84
managed_components/78__esp-opus/dnn/osce.h
Normal file
@@ -0,0 +1,84 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef OSCE_H
#define OSCE_H


#include "opus_types.h"
/*#include "osce_config.h"*/
#ifndef DISABLE_LACE
#include "lace_data.h"
#endif
#ifndef DISABLE_NOLACE
#include "nolace_data.h"
#endif
#include "nndsp.h"
#include "nnet.h"
#include "osce_structs.h"
#include "structs.h"

#define OSCE_METHOD_NONE 0
#ifndef DISABLE_LACE
#define OSCE_METHOD_LACE 1
#endif
#ifndef DISABLE_NOLACE
#define OSCE_METHOD_NOLACE 2
#endif

#if !defined(DISABLE_NOLACE)
#define OSCE_DEFAULT_METHOD OSCE_METHOD_NOLACE
#define OSCE_MAX_RNN_NEURONS NOLACE_FNET_GRU_STATE_SIZE
#elif !defined(DISABLE_LACE)
#define OSCE_DEFAULT_METHOD OSCE_METHOD_LACE
#define OSCE_MAX_RNN_NEURONS LACE_FNET_GRU_STATE_SIZE
#else
#define OSCE_DEFAULT_METHOD OSCE_METHOD_NONE
#define OSCE_MAX_RNN_NEURONS 0
#endif


/* API */

void osce_enhance_frame(
    OSCEModel *model,                   /* I    OSCE model struct            */
    silk_decoder_state *psDec,          /* I/O  Decoder state                */
    silk_decoder_control *psDecCtrl,    /* I    Decoder control              */
    opus_int16 xq[],                    /* I/O  Decoded speech               */
    opus_int32 num_bits,                /* I    Size of SILK payload in bits */
    int arch                            /* I    Run-time architecture        */
);

int osce_load_models(OSCEModel *hModel, const void *data, int len);
void osce_reset(silk_OSCE_struct *hOSCE, int method);

#endif
60
managed_components/78__esp-opus/dnn/osce_config.h
Normal file
@@ -0,0 +1,60 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef OSCE_CONFIG
#define OSCE_CONFIG

#define OSCE_FEATURES_MAX_HISTORY 350
#define OSCE_FEATURE_DIM 93
#define OSCE_MAX_FEATURE_FRAMES 4

#define OSCE_CLEAN_SPEC_NUM_BANDS 64
#define OSCE_NOISY_SPEC_NUM_BANDS 18

#define OSCE_NO_PITCH_VALUE 7

#define OSCE_PREEMPH 0.85f

#define OSCE_PITCH_HANGOVER 0

#define OSCE_CLEAN_SPEC_START 0
#define OSCE_CLEAN_SPEC_LENGTH 64

#define OSCE_NOISY_CEPSTRUM_START 64
#define OSCE_NOISY_CEPSTRUM_LENGTH 18

#define OSCE_ACORR_START 82
#define OSCE_ACORR_LENGTH 5

#define OSCE_LTP_START 87
#define OSCE_LTP_LENGTH 5

#define OSCE_LOG_GAIN_START 92
#define OSCE_LOG_GAIN_LENGTH 1


#endif
454
managed_components/78__esp-opus/dnn/osce_features.c
Normal file
@@ -0,0 +1,454 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#define OSCE_SPEC_WINDOW_SIZE 320
#define OSCE_SPEC_NUM_FREQS 161


/*DEBUG*/
/*#define WRITE_FEATURES*/
/*#define DEBUG_PRING*/
/*******/

#include "stack_alloc.h"
#include "osce_features.h"
#include "kiss_fft.h"
#include "os_support.h"
#include "osce.h"
#include "freq.h"


#if defined(WRITE_FEATURES) || defined(DEBUG_PRING)
#include <stdio.h>
#include <stdlib.h>
#endif

static const int center_bins_clean[64] = {
      0,   2,   5,   8,  10,  12,  15,  18,
     20,  22,  25,  28,  30,  33,  35,  38,
     40,  42,  45,  48,  50,  52,  55,  58,
     60,  62,  65,  68,  70,  73,  75,  78,
     80,  82,  85,  88,  90,  92,  95,  98,
    100, 102, 105, 108, 110, 112, 115, 118,
    120, 122, 125, 128, 130, 132, 135, 138,
    140, 142, 145, 148, 150, 152, 155, 160
};

static const int center_bins_noisy[18] = {
      0,   4,   8,  12,  16,  20,  24,  28,
     32,  40,  48,  56,  64,  80,  96, 112,
    136, 160
};

static const float band_weights_clean[64] = {
    0.666666666667f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.400000000000f, 0.400000000000f, 0.400000000000f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.400000000000f, 0.400000000000f, 0.400000000000f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
    0.500000000000f, 0.400000000000f, 0.250000000000f, 0.333333333333f
};

static const float band_weights_noisy[18] = {
    0.400000000000f, 0.250000000000f, 0.250000000000f, 0.250000000000f,
    0.250000000000f, 0.250000000000f, 0.250000000000f, 0.250000000000f,
    0.166666666667f, 0.125000000000f, 0.125000000000f, 0.125000000000f,
    0.083333333333f, 0.062500000000f, 0.062500000000f, 0.050000000000f,
    0.041666666667f, 0.080000000000f
};

static float osce_window[OSCE_SPEC_WINDOW_SIZE] = {
    0.004908718808f, 0.014725683311f, 0.024541228523f, 0.034354408400f, 0.044164277127f,
    0.053969889210f, 0.063770299562f, 0.073564563600f, 0.083351737332f, 0.093130877450f,
    0.102901041421f, 0.112661287575f, 0.122410675199f, 0.132148264628f, 0.141873117332f,
    0.151584296010f, 0.161280864678f, 0.170961888760f, 0.180626435180f, 0.190273572448f,
    0.199902370753f, 0.209511902052f, 0.219101240157f, 0.228669460829f, 0.238215641862f,
    0.247738863176f, 0.257238206902f, 0.266712757475f, 0.276161601717f, 0.285583828929f,
    0.294978530977f, 0.304344802381f, 0.313681740399f, 0.322988445118f, 0.332264019538f,
    0.341507569661f, 0.350718204573f, 0.359895036535f, 0.369037181064f, 0.378143757022f,
    0.387213886697f, 0.396246695891f, 0.405241314005f, 0.414196874117f, 0.423112513073f,
    0.431987371563f, 0.440820594212f, 0.449611329655f, 0.458358730621f, 0.467061954019f,
    0.475720161014f, 0.484332517110f, 0.492898192230f, 0.501416360796f, 0.509886201809f,
    0.518306898929f, 0.526677640552f, 0.534997619887f, 0.543266035038f, 0.551482089078f,
    0.559644990127f, 0.567753951426f, 0.575808191418f, 0.583806933818f, 0.591749407690f,
    0.599634847523f, 0.607462493302f, 0.615231590581f, 0.622941390558f, 0.630591150148f,
    0.638180132051f, 0.645707604824f, 0.653172842954f, 0.660575126926f, 0.667913743292f,
    0.675187984742f, 0.682397150168f, 0.689540544737f, 0.696617479953f, 0.703627273726f,
    0.710569250438f, 0.717442741007f, 0.724247082951f, 0.730981620454f, 0.737645704427f,
    0.744238692572f, 0.750759949443f, 0.757208846506f, 0.763584762206f, 0.769887082016f,
    0.776115198508f, 0.782268511401f, 0.788346427627f, 0.794348361383f, 0.800273734191f,
    0.806121974951f, 0.811892519997f, 0.817584813152f, 0.823198305781f, 0.828732456844f,
    0.834186732948f, 0.839560608398f, 0.844853565250f, 0.850065093356f, 0.855194690420f,
    0.860241862039f, 0.865206121757f, 0.870086991109f, 0.874883999665f, 0.879596685080f,
    0.884224593137f, 0.888767277786f, 0.893224301196f, 0.897595233788f, 0.901879654283f,
    0.906077149740f, 0.910187315596f, 0.914209755704f, 0.918144082372f, 0.921989916403f,
    0.925746887127f, 0.929414632439f, 0.932992798835f, 0.936481041442f, 0.939879024058f,
    0.943186419177f, 0.946402908026f, 0.949528180593f, 0.952561935658f, 0.955503880820f,
    0.958353732530f, 0.961111216112f, 0.963776065795f, 0.966348024735f, 0.968826845041f,
    0.971212287799f, 0.973504123096f, 0.975702130039f, 0.977806096779f, 0.979815820533f,
    0.981731107599f, 0.983551773378f, 0.985277642389f, 0.986908548290f, 0.988444333892f,
    0.989884851171f, 0.991229961288f, 0.992479534599f, 0.993633450666f, 0.994691598273f,
    0.995653875433f, 0.996520189401f, 0.997290456679f, 0.997964603026f, 0.998542563469f,
    0.999024282300f, 0.999409713092f, 0.999698818696f, 0.999891571247f, 0.999987952167f,
    0.999987952167f, 0.999891571247f, 0.999698818696f, 0.999409713092f, 0.999024282300f,
    0.998542563469f, 0.997964603026f, 0.997290456679f, 0.996520189401f, 0.995653875433f,
    0.994691598273f, 0.993633450666f, 0.992479534599f, 0.991229961288f, 0.989884851171f,
    0.988444333892f, 0.986908548290f, 0.985277642389f, 0.983551773378f, 0.981731107599f,
    0.979815820533f, 0.977806096779f, 0.975702130039f, 0.973504123096f, 0.971212287799f,
    0.968826845041f, 0.966348024735f, 0.963776065795f, 0.961111216112f, 0.958353732530f,
    0.955503880820f, 0.952561935658f, 0.949528180593f, 0.946402908026f, 0.943186419177f,
    0.939879024058f, 0.936481041442f, 0.932992798835f, 0.929414632439f, 0.925746887127f,
    0.921989916403f, 0.918144082372f, 0.914209755704f, 0.910187315596f, 0.906077149740f,
    0.901879654283f, 0.897595233788f, 0.893224301196f, 0.888767277786f, 0.884224593137f,
    0.879596685080f, 0.874883999665f, 0.870086991109f, 0.865206121757f, 0.860241862039f,
    0.855194690420f, 0.850065093356f, 0.844853565250f, 0.839560608398f, 0.834186732948f,
    0.828732456844f, 0.823198305781f, 0.817584813152f, 0.811892519997f, 0.806121974951f,
    0.800273734191f, 0.794348361383f, 0.788346427627f, 0.782268511401f, 0.776115198508f,
    0.769887082016f, 0.763584762206f, 0.757208846506f, 0.750759949443f, 0.744238692572f,
    0.737645704427f, 0.730981620454f, 0.724247082951f, 0.717442741007f, 0.710569250438f,
    0.703627273726f, 0.696617479953f, 0.689540544737f, 0.682397150168f, 0.675187984742f,
    0.667913743292f, 0.660575126926f, 0.653172842954f, 0.645707604824f, 0.638180132051f,
    0.630591150148f, 0.622941390558f, 0.615231590581f, 0.607462493302f, 0.599634847523f,
    0.591749407690f, 0.583806933818f, 0.575808191418f, 0.567753951426f, 0.559644990127f,
    0.551482089078f, 0.543266035038f, 0.534997619887f, 0.526677640552f, 0.518306898929f,
    0.509886201809f, 0.501416360796f, 0.492898192230f, 0.484332517110f, 0.475720161014f,
    0.467061954019f, 0.458358730621f, 0.449611329655f, 0.440820594212f, 0.431987371563f,
    0.423112513073f, 0.414196874117f, 0.405241314005f, 0.396246695891f, 0.387213886697f,
    0.378143757022f, 0.369037181064f, 0.359895036535f, 0.350718204573f, 0.341507569661f,
    0.332264019538f, 0.322988445118f, 0.313681740399f, 0.304344802381f, 0.294978530977f,
    0.285583828929f, 0.276161601717f, 0.266712757475f, 0.257238206902f, 0.247738863176f,
    0.238215641862f, 0.228669460829f, 0.219101240157f, 0.209511902052f, 0.199902370753f,
    0.190273572448f, 0.180626435180f, 0.170961888760f, 0.161280864678f, 0.151584296010f,
    0.141873117332f, 0.132148264628f, 0.122410675199f, 0.112661287575f, 0.102901041421f,
    0.093130877450f, 0.083351737332f, 0.073564563600f, 0.063770299562f, 0.053969889210f,
    0.044164277127f, 0.034354408400f, 0.024541228523f, 0.014725683311f, 0.004908718808f
};
static void apply_filterbank(float *x_out, float *x_in, const int *center_bins, const float* band_weights, int num_bands)
{
    int b, i;
    float frac;

    celt_assert(x_in != x_out)

    x_out[0] = 0;
    for (b = 0; b < num_bands - 1; b++)
    {
        x_out[b+1] = 0;
        for (i = center_bins[b]; i < center_bins[b+1]; i++)
        {
            frac = (float) (center_bins[b+1] - i) / (center_bins[b+1] - center_bins[b]);
            x_out[b]   += band_weights[b] * frac * x_in[i];
            x_out[b+1] += band_weights[b+1] * (1 - frac) * x_in[i];
        }
    }
    x_out[num_bands - 1] += band_weights[num_bands - 1] * x_in[center_bins[num_bands - 1]];
#ifdef DEBUG_PRINT
    for (b = 0; b < num_bands; b++)
    {
        printf("band[%d]: %f\n", b, x_out[b]);
    }
#endif
}

static void mag_spec_320_onesided(float *out, float *in)
{
    celt_assert(OSCE_SPEC_WINDOW_SIZE == 320);
    kiss_fft_cpx buffer[OSCE_SPEC_WINDOW_SIZE];
    int k;
    forward_transform(buffer, in);

    for (k = 0; k < OSCE_SPEC_NUM_FREQS; k++)
    {
        out[k] = OSCE_SPEC_WINDOW_SIZE * sqrt(buffer[k].r * buffer[k].r + buffer[k].i * buffer[k].i);
#ifdef DEBUG_PRINT
        printf("magspec[%d]: %f\n", k, out[k]);
#endif
    }
}

static void calculate_log_spectrum_from_lpc(float *spec, opus_int16 *a_q12, int lpc_order)
{
    float buffer[OSCE_SPEC_WINDOW_SIZE] = {0};
    int i;

    /* zero expansion */
    buffer[0] = 1;
    for (i = 0; i < lpc_order; i++)
    {
        buffer[i+1] = - (float)a_q12[i] / (1U << 12);
    }

    /* calculate and invert magnitude spectrum */
    mag_spec_320_onesided(buffer, buffer);

    for (i = 0; i < OSCE_SPEC_NUM_FREQS; i++)
    {
        buffer[i] = 1.f / (buffer[i] + 1e-9f);
    }

    /* apply filterbank */
    apply_filterbank(spec, buffer, center_bins_clean, band_weights_clean, OSCE_CLEAN_SPEC_NUM_BANDS);

    /* log and scaling */
    for (i = 0; i < OSCE_CLEAN_SPEC_NUM_BANDS; i++)
    {
        spec[i] = 0.3f * log(spec[i] + 1e-9f);
    }
}

static void calculate_cepstrum(float *cepstrum, float *signal)
{
    float buffer[OSCE_SPEC_WINDOW_SIZE];
    float *spec = &buffer[OSCE_SPEC_NUM_FREQS + 3];
    int n;

    celt_assert(cepstrum != signal)

    for (n = 0; n < OSCE_SPEC_WINDOW_SIZE; n++)
    {
        buffer[n] = osce_window[n] * signal[n];
    }

    /* calculate magnitude spectrum */
    mag_spec_320_onesided(buffer, buffer);

    /* accumulate bands */
    apply_filterbank(spec, buffer, center_bins_noisy, band_weights_noisy, OSCE_NOISY_SPEC_NUM_BANDS);

    /* log domain conversion */
    for (n = 0; n < OSCE_NOISY_SPEC_NUM_BANDS; n++)
    {
        spec[n] = log(spec[n] + 1e-9f);
#ifdef DEBUG_PRINT
        printf("logspec[%d]: %f\n", n, spec[n]);
#endif
    }

    /* DCT-II (orthonormal) */
    celt_assert(OSCE_NOISY_SPEC_NUM_BANDS == NB_BANDS);
    dct(cepstrum, spec);
}

static void calculate_acorr(float *acorr, float *signal, int lag)
{
    int n, k;
    celt_assert(acorr != signal)

    for (k = -2; k <= 2; k++)
    {
        acorr[k+2] = 0;
        float xx = 0;
        float xy = 0;
        float yy = 0;
        for (n = 0; n < 80; n++)
        {
            /* obviously wasteful -> fix later */
            xx += signal[n] * signal[n];
            yy += signal[n - lag + k] * signal[n - lag + k];
            xy += signal[n] * signal[n - lag + k];
        }
        acorr[k+2] = xy / sqrt(xx * yy + 1e-9f);
    }
}

static int pitch_postprocessing(OSCEFeatureState *psFeatures, int lag, int type)
{
    int new_lag;
    int modulus;

#ifdef OSCE_HANGOVER_BUGFIX
#define TESTBIT 1
#else
#define TESTBIT 0
#endif

    modulus = OSCE_PITCH_HANGOVER;
    if (modulus == 0) modulus++;

    /* hangover is currently disabled to reflect a bug in the python code. ToDo: re-evaluate hangover */
    if (type != TYPE_VOICED && psFeatures->last_type == TYPE_VOICED && TESTBIT)
    /* enter hangover */
    {
        new_lag = OSCE_NO_PITCH_VALUE;
        if (psFeatures->pitch_hangover_count < OSCE_PITCH_HANGOVER)
        {
            new_lag = psFeatures->last_lag;
            psFeatures->pitch_hangover_count = (psFeatures->pitch_hangover_count + 1) % modulus;
        }
    }
    else if (type != TYPE_VOICED && psFeatures->pitch_hangover_count && TESTBIT)
    /* continue hangover */
    {
        new_lag = psFeatures->last_lag;
        psFeatures->pitch_hangover_count = (psFeatures->pitch_hangover_count + 1) % modulus;
    }
    else if (type != TYPE_VOICED)
    /* unvoiced frame after hangover */
    {
        new_lag = OSCE_NO_PITCH_VALUE;
        psFeatures->pitch_hangover_count = 0;
    }
    else
    /* voiced frame: update last_lag */
    {
        new_lag = lag;
        psFeatures->last_lag = lag;
        psFeatures->pitch_hangover_count = 0;
    }

    /* buffer update */
    psFeatures->last_type = type;

    /* with the current setup this should never happen (but who knows...) */
    celt_assert(new_lag)

    return new_lag;
}

void osce_calculate_features(
    silk_decoder_state          *psDec,                         /* I/O  Decoder state                               */
    silk_decoder_control        *psDecCtrl,                     /* I    Decoder control                             */
    float                       *features,                      /* O    input features                              */
    float                       *numbits,                       /* O    numbits and smoothed numbits                */
    int                         *periods,                       /* O    pitch lags on subframe basis                */
    const opus_int16            xq[],                           /* I    Decoded speech                              */
    opus_int32                  num_bits                        /* I    Size of SILK payload in bits                */
)
{
    int num_subframes, num_samples;
    float buffer[OSCE_FEATURES_MAX_HISTORY + OSCE_MAX_FEATURE_FRAMES * 80];
    float *frame, *pfeatures;
    OSCEFeatureState *psFeatures;
    int i, n, k;
#ifdef WRITE_FEATURES
    static FILE *f_feat = NULL;
    if (f_feat == NULL)
    {
        f_feat = fopen("assembled_features.f32", "wb");
    }
#endif

    /*OPUS_CLEAR(buffer, 1);*/
    memset(buffer, 0, sizeof(buffer));

    num_subframes = psDec->nb_subfr;
    num_samples = num_subframes * 80;
    psFeatures = &psDec->osce.features;

    /* smooth bit count */
    psFeatures->numbits_smooth = 0.9f * psFeatures->numbits_smooth + 0.1f * num_bits;
    numbits[0] = num_bits;
    numbits[1] = psFeatures->numbits_smooth;

    for (n = 0; n < num_samples; n++)
    {
        buffer[OSCE_FEATURES_MAX_HISTORY + n] = (float) xq[n] / (1U<<15);
    }
    OPUS_COPY(buffer, psFeatures->signal_history, OSCE_FEATURES_MAX_HISTORY);

    for (k = 0; k < num_subframes; k++)
    {
        pfeatures = features + k * OSCE_FEATURE_DIM;
        frame = &buffer[OSCE_FEATURES_MAX_HISTORY + k * 80];
        memset(pfeatures, 0, OSCE_FEATURE_DIM * sizeof(*pfeatures)); /* precaution; size in bytes, not elements */

        /* clean spectrum from lpcs (update every other frame) */
        if (k % 2 == 0)
        {
            calculate_log_spectrum_from_lpc(pfeatures + OSCE_CLEAN_SPEC_START, psDecCtrl->PredCoef_Q12[k >> 1], psDec->LPC_order);
        }
        else
        {
            OPUS_COPY(pfeatures + OSCE_CLEAN_SPEC_START, pfeatures + OSCE_CLEAN_SPEC_START - OSCE_FEATURE_DIM, OSCE_CLEAN_SPEC_LENGTH);
        }

        /* noisy cepstrum from signal (update every other frame) */
        if (k % 2 == 0)
        {
            calculate_cepstrum(pfeatures + OSCE_NOISY_CEPSTRUM_START, frame - 160);
        }
        else
        {
            OPUS_COPY(pfeatures + OSCE_NOISY_CEPSTRUM_START, pfeatures + OSCE_NOISY_CEPSTRUM_START - OSCE_FEATURE_DIM, OSCE_NOISY_CEPSTRUM_LENGTH);
        }

        /* pitch hangover and zero value replacement */
        periods[k] = pitch_postprocessing(psFeatures, psDecCtrl->pitchL[k], psDec->indices.signalType);

        /* auto-correlation around pitch lag */
        calculate_acorr(pfeatures + OSCE_ACORR_START, frame, periods[k]);

        /* ltp */
        celt_assert(OSCE_LTP_LENGTH == LTP_ORDER)
        for (i = 0; i < OSCE_LTP_LENGTH; i++)
        {
            pfeatures[OSCE_LTP_START + i] = (float) psDecCtrl->LTPCoef_Q14[k * LTP_ORDER + i] / (1U << 14);
        }

        /* frame gain */
        pfeatures[OSCE_LOG_GAIN_START] = log((float) psDecCtrl->Gains_Q16[k] / (1UL << 16) + 1e-9f);

#ifdef WRITE_FEATURES
        fwrite(pfeatures, sizeof(*pfeatures), 93, f_feat);
#endif
    }

    /* buffer update */
    OPUS_COPY(psFeatures->signal_history, &buffer[num_samples], OSCE_FEATURES_MAX_HISTORY);
}


void osce_cross_fade_10ms(float *x_enhanced, float *x_in, int length)
{
    int i;
    celt_assert(length >= 160);

    for (i = 0; i < 160; i++)
    {
        x_enhanced[i] = osce_window[i] * x_enhanced[i] + (1.f - osce_window[i]) * x_in[i];
    }
}
50
managed_components/78__esp-opus/dnn/osce_features.h
Normal file
@@ -0,0 +1,50 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef OSCE_FEATURES_H
#define OSCE_FEATURES_H


#include "structs.h"
#include "opus_types.h"

#define OSCE_NUMBITS_BUGFIX

void osce_calculate_features(
    silk_decoder_state          *psDec,                         /* I/O  Decoder state                               */
    silk_decoder_control        *psDecCtrl,                     /* I    Decoder control                             */
    float                       *features,                      /* O    input features                              */
    float                       *numbits,                       /* O    numbits and smoothed numbits                */
    int                         *periods,                       /* O    pitch lags on subframe basis                */
    const opus_int16            xq[],                           /* I    Decoded speech                              */
    opus_int32                  num_bits                        /* I    Size of SILK payload in bits                */
);


void osce_cross_fade_10ms(float *x_enhanced, float *x_in, int length);

#endif
125
managed_components/78__esp-opus/dnn/osce_structs.h
Normal file
@@ -0,0 +1,125 @@
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef OSCE_STRUCTS_H
#define OSCE_STRUCTS_H

#include "opus_types.h"
#include "osce_config.h"
#ifndef DISABLE_LACE
#include "lace_data.h"
#endif
#ifndef DISABLE_NOLACE
#include "nolace_data.h"
#endif
#include "nndsp.h"
#include "nnet.h"

/* feature calculation */

typedef struct {
    float               numbits_smooth;
    int                 pitch_hangover_count;
    int                 last_lag;
    int                 last_type;
    float               signal_history[OSCE_FEATURES_MAX_HISTORY];
    int                 reset;
} OSCEFeatureState;


#ifndef DISABLE_LACE
/* LACE */
typedef struct {
    float feature_net_conv2_state[LACE_FNET_CONV2_STATE_SIZE];
    float feature_net_gru_state[LACE_COND_DIM];
    AdaCombState cf1_state;
    AdaCombState cf2_state;
    AdaConvState af1_state;
    float preemph_mem;
    float deemph_mem;
} LACEState;

typedef struct
{
    LACELayers layers;
    float window[LACE_OVERLAP_SIZE];
} LACE;

#endif /* #ifndef DISABLE_LACE */


#ifndef DISABLE_NOLACE
/* NoLACE */
typedef struct {
    float feature_net_conv2_state[NOLACE_FNET_CONV2_STATE_SIZE];
    float feature_net_gru_state[NOLACE_COND_DIM];
    float post_cf1_state[NOLACE_COND_DIM];
    float post_cf2_state[NOLACE_COND_DIM];
    float post_af1_state[NOLACE_COND_DIM];
    float post_af2_state[NOLACE_COND_DIM];
    float post_af3_state[NOLACE_COND_DIM];
    AdaCombState cf1_state;
    AdaCombState cf2_state;
    AdaConvState af1_state;
    AdaConvState af2_state;
    AdaConvState af3_state;
    AdaConvState af4_state;
    AdaShapeState tdshape1_state;
    AdaShapeState tdshape2_state;
    AdaShapeState tdshape3_state;
    float preemph_mem;
    float deemph_mem;
} NoLACEState;

typedef struct {
    NOLACELayers layers;
    float window[LACE_OVERLAP_SIZE];
} NoLACE;

#endif /* #ifndef DISABLE_NOLACE */

/* OSCEModel */
typedef struct {
    int loaded;
#ifndef DISABLE_LACE
    LACE lace;
#endif
#ifndef DISABLE_NOLACE
    NoLACE nolace;
#endif
} OSCEModel;

typedef union {
#ifndef DISABLE_LACE
    LACEState lace;
#endif
#ifndef DISABLE_NOLACE
    NoLACEState nolace;
#endif
} OSCEState;

#endif
238
managed_components/78__esp-opus/dnn/parse_lpcnet_weights.c
Normal file
@@ -0,0 +1,238 @@
/* Copyright (c) 2023 Amazon */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <string.h>
#include <stdlib.h>
#include "nnet.h"
#include "os_support.h"

#define SPARSE_BLOCK_SIZE 32

int parse_record(const void **data, int *len, WeightArray *array) {
   WeightHead *h = (WeightHead *)*data;
   if (*len < WEIGHT_BLOCK_SIZE) return -1;
   if (h->block_size < h->size) return -1;
   if (h->block_size > *len-WEIGHT_BLOCK_SIZE) return -1;
   if (h->name[sizeof(h->name)-1] != 0) return -1;
   if (h->size < 0) return -1;
   array->name = h->name;
   array->type = h->type;
   array->size = h->size;
   array->data = (void*)((unsigned char*)(*data)+WEIGHT_BLOCK_SIZE);

   *data = (void*)((unsigned char*)*data + h->block_size+WEIGHT_BLOCK_SIZE);
   *len -= h->block_size+WEIGHT_BLOCK_SIZE;
   return array->size;
}

int parse_weights(WeightArray **list, const void *data, int len)
{
   int nb_arrays=0;
   int capacity=20;
   *list = opus_alloc(capacity*sizeof(WeightArray));
   while (len > 0) {
      int ret;
      WeightArray array = {NULL, 0, 0, 0};
      ret = parse_record(&data, &len, &array);
      if (ret > 0) {
         if (nb_arrays+1 >= capacity) {
            /* Make sure there's room for the ending NULL element too. */
            capacity = capacity*3/2;
            *list = opus_realloc(*list, capacity*sizeof(WeightArray));
         }
         (*list)[nb_arrays++] = array;
      } else {
         opus_free(*list);
         *list = NULL;
         return -1;
      }
   }
   (*list)[nb_arrays].name=NULL;
   return nb_arrays;
}

static const void *find_array_entry(const WeightArray *arrays, const char *name) {
   while (arrays->name && strcmp(arrays->name, name) != 0) arrays++;
   return arrays;
}

static const void *find_array_check(const WeightArray *arrays, const char *name, int size) {
   const WeightArray *a = find_array_entry(arrays, name);
   if (a->name && a->size == size) return a->data;
   else return NULL;
}

static const void *opt_array_check(const WeightArray *arrays, const char *name, int size, int *error) {
   const WeightArray *a = find_array_entry(arrays, name);
   *error = (a->name != NULL && a->size != size);
   if (a->name && a->size == size) return a->data;
   else return NULL;
}

static const void *find_idx_check(const WeightArray *arrays, const char *name, int nb_in, int nb_out, int *total_blocks) {
   int remain;
   const int *idx;
   const WeightArray *a = find_array_entry(arrays, name);
   *total_blocks = 0;
   /* find_array_entry() never returns NULL; it returns the NULL-name sentinel */
   if (a->name == NULL) return NULL;
   idx = a->data;
   remain = a->size/sizeof(int);
   while (remain > 0) {
      int nb_blocks;
      int i;
      nb_blocks = *idx++;
      if (remain < nb_blocks+1) return NULL;
      for (i=0;i<nb_blocks;i++) {
         int pos = *idx++;
         if (pos+3 >= nb_in || (pos&0x3)) return NULL;
      }
      nb_out -= 8;
      remain -= nb_blocks+1;
      *total_blocks += nb_blocks;
   }
   if (nb_out != 0) return NULL;
   return a->data;
}

int linear_init(LinearLayer *layer, const WeightArray *arrays,
  const char *bias,
  const char *subias,
  const char *weights,
  const char *float_weights,
  const char *weights_idx,
  const char *diag,
  const char *scale,
  int nb_inputs,
  int nb_outputs)
{
   int err;
   layer->bias = NULL;
   layer->subias = NULL;
   layer->weights = NULL;
   layer->float_weights = NULL;
   layer->weights_idx = NULL;
   layer->diag = NULL;
   layer->scale = NULL;
   if (bias != NULL) {
      if ((layer->bias = find_array_check(arrays, bias, nb_outputs*sizeof(layer->bias[0]))) == NULL) return 1;
   }
   if (subias != NULL) {
      if ((layer->subias = find_array_check(arrays, subias, nb_outputs*sizeof(layer->subias[0]))) == NULL) return 1;
   }
   if (weights_idx != NULL) {
      int total_blocks;
      if ((layer->weights_idx = find_idx_check(arrays, weights_idx, nb_inputs, nb_outputs, &total_blocks)) == NULL) return 1;
      if (weights != NULL) {
         if ((layer->weights = find_array_check(arrays, weights, SPARSE_BLOCK_SIZE*total_blocks*sizeof(layer->weights[0]))) == NULL) return 1;
      }
      if (float_weights != NULL) {
         layer->float_weights = opt_array_check(arrays, float_weights, SPARSE_BLOCK_SIZE*total_blocks*sizeof(layer->float_weights[0]), &err);
         if (err) return 1;
      }
   } else {
      if (weights != NULL) {
         if ((layer->weights = find_array_check(arrays, weights, nb_inputs*nb_outputs*sizeof(layer->weights[0]))) == NULL) return 1;
      }
      if (float_weights != NULL) {
         layer->float_weights = opt_array_check(arrays, float_weights, nb_inputs*nb_outputs*sizeof(layer->float_weights[0]), &err);
         if (err) return 1;
      }
   }
   if (diag != NULL) {
      if ((layer->diag = find_array_check(arrays, diag, nb_outputs*sizeof(layer->diag[0]))) == NULL) return 1;
   }
   if (weights != NULL) {
      if ((layer->scale = find_array_check(arrays, scale, nb_outputs*sizeof(layer->scale[0]))) == NULL) return 1;
   }
   layer->nb_inputs = nb_inputs;
   layer->nb_outputs = nb_outputs;
   return 0;
}

int conv2d_init(Conv2dLayer *layer, const WeightArray *arrays,
  const char *bias,
  const char *float_weights,
  int in_channels,
  int out_channels,
  int ktime,
  int kheight)
{
   int err;
   layer->bias = NULL;
   layer->float_weights = NULL;
   if (bias != NULL) {
      if ((layer->bias = find_array_check(arrays, bias, out_channels*sizeof(layer->bias[0]))) == NULL) return 1;
   }
   if (float_weights != NULL) {
      layer->float_weights = opt_array_check(arrays, float_weights, in_channels*out_channels*ktime*kheight*sizeof(layer->float_weights[0]), &err);
      if (err) return 1;
   }
   layer->in_channels = in_channels;
   layer->out_channels = out_channels;
   layer->ktime = ktime;
   layer->kheight = kheight;
   return 0;
}


#if 0
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/stat.h>
#include <stdio.h>

int main()
{
   int fd;
   void *data;
   int len;
   int nb_arrays;
   int i;
   WeightArray *list;
   struct stat st;
   const char *filename = "weights_blob.bin";
   stat(filename, &st);
   len = st.st_size;
   fd = open(filename, O_RDONLY);
   data = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
   printf("size is %d\n", len);
   nb_arrays = parse_weights(&list, data, len);
   for (i=0;i<nb_arrays;i++) {
      printf("found %s: size %d\n", list[i].name, list[i].size);
   }
   printf("%p\n", list[i].name);
   opus_free(list);
   munmap(data, len);
   close(fd);
   return 0;
}
#endif
79
managed_components/78__esp-opus/dnn/pitchdnn.c
Normal file
@@ -0,0 +1,79 @@
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <math.h>
#include "pitchdnn.h"
#include "os_support.h"
#include "nnet.h"
#include "lpcnet_private.h"


float compute_pitchdnn(
    PitchDNNState *st,
    const float *if_features,
    const float *xcorr_features,
    int arch
)
{
    float if1_out[DENSE_IF_UPSAMPLER_1_OUT_SIZE];
    float downsampler_in[NB_XCORR_FEATURES + DENSE_IF_UPSAMPLER_2_OUT_SIZE];
    float downsampler_out[DENSE_DOWNSAMPLER_OUT_SIZE];
    float conv1_tmp1[(NB_XCORR_FEATURES + 2)*8] = {0};
    float conv1_tmp2[(NB_XCORR_FEATURES + 2)*8] = {0};
    float output[DENSE_FINAL_UPSAMPLER_OUT_SIZE];
    int i;
    int pos=0;
    float maxval=-1;
    float sum=0;
    float count=0;
    PitchDNN *model = &st->model;
    /* IF */
    compute_generic_dense(&model->dense_if_upsampler_1, if1_out, if_features, ACTIVATION_TANH, arch);
    compute_generic_dense(&model->dense_if_upsampler_2, &downsampler_in[NB_XCORR_FEATURES], if1_out, ACTIVATION_TANH, arch);
    /* xcorr */
    OPUS_COPY(&conv1_tmp1[1], xcorr_features, NB_XCORR_FEATURES);
    compute_conv2d(&model->conv2d_1, &conv1_tmp2[1], st->xcorr_mem1, conv1_tmp1, NB_XCORR_FEATURES, NB_XCORR_FEATURES+2, ACTIVATION_TANH, arch);
    compute_conv2d(&model->conv2d_2, downsampler_in, st->xcorr_mem2, conv1_tmp2, NB_XCORR_FEATURES, NB_XCORR_FEATURES, ACTIVATION_TANH, arch);

    compute_generic_dense(&model->dense_downsampler, downsampler_out, downsampler_in, ACTIVATION_TANH, arch);
    compute_generic_gru(&model->gru_1_input, &model->gru_1_recurrent, st->gru_state, downsampler_out, arch);
    compute_generic_dense(&model->dense_final_upsampler, output, st->gru_state, ACTIVATION_LINEAR, arch);
    for (i=0;i<180;i++) {
        if (output[i] > maxval) {
            pos = i;
            maxval = output[i];
        }
    }
    for (i=IMAX(0, pos-2); i<=IMIN(179, pos+2); i++) {
        float p = exp(output[i]);
        sum += p*i;
        count += p;
    }
    /*printf("%d %f\n", pos, sum/count);*/
    return (1.f/60.f)*(sum/count) - 1.5;
    /*return 256.f/pow(2.f, (1.f/60.f)*i);*/
}


void pitchdnn_init(PitchDNNState *st)
{
    int ret;
    OPUS_CLEAR(st, 1);
#ifndef USE_WEIGHTS_FILE
    ret = init_pitchdnn(&st->model, pitchdnn_arrays);
#else
    ret = 0;
#endif
    celt_assert(ret == 0);
}

int pitchdnn_load_model(PitchDNNState *st, const void *data, int len) {
    WeightArray *list;
    int ret;
    parse_weights(&list, data, len);
    ret = init_pitchdnn(&st->model, list);
    opus_free(list);
    if (ret == 0) return 0;
    else return -1;
}
34
managed_components/78__esp-opus/dnn/pitchdnn.h
Normal file
@@ -0,0 +1,34 @@
#ifndef PITCHDNN_H
#define PITCHDNN_H


typedef struct PitchDNN PitchDNN;

#include "pitchdnn_data.h"

#define PITCH_MIN_PERIOD 32
#define PITCH_MAX_PERIOD 256

#define NB_XCORR_FEATURES (PITCH_MAX_PERIOD-PITCH_MIN_PERIOD)


typedef struct {
    PitchDNN model;
    float gru_state[GRU_1_STATE_SIZE];
    float xcorr_mem1[(NB_XCORR_FEATURES + 2)*2];
    float xcorr_mem2[(NB_XCORR_FEATURES + 2)*2*8];
    float xcorr_mem3[(NB_XCORR_FEATURES + 2)*2*8];
} PitchDNNState;


void pitchdnn_init(PitchDNNState *st);
int pitchdnn_load_model(PitchDNNState *st, const void *data, int len);

float compute_pitchdnn(
    PitchDNNState *st,
    const float *if_features,
    const float *xcorr_features,
    int arch
);

#endif
50
managed_components/78__esp-opus/dnn/tansig_table.h
Normal file
@@ -0,0 +1,50 @@
/* This file is auto-generated by gen_tables */

#ifndef TANSIG_TABLE_H
#define TANSIG_TABLE_H

static const float tansig_table[201] = {
0.000000f, 0.039979f, 0.079830f, 0.119427f, 0.158649f,
0.197375f, 0.235496f, 0.272905f, 0.309507f, 0.345214f,
0.379949f, 0.413644f, 0.446244f, 0.477700f, 0.507977f,
0.537050f, 0.564900f, 0.591519f, 0.616909f, 0.641077f,
0.664037f, 0.685809f, 0.706419f, 0.725897f, 0.744277f,
0.761594f, 0.777888f, 0.793199f, 0.807569f, 0.821040f,
0.833655f, 0.845456f, 0.856485f, 0.866784f, 0.876393f,
0.885352f, 0.893698f, 0.901468f, 0.908698f, 0.915420f,
0.921669f, 0.927473f, 0.932862f, 0.937863f, 0.942503f,
0.946806f, 0.950795f, 0.954492f, 0.957917f, 0.961090f,
0.964028f, 0.966747f, 0.969265f, 0.971594f, 0.973749f,
0.975743f, 0.977587f, 0.979293f, 0.980869f, 0.982327f,
0.983675f, 0.984921f, 0.986072f, 0.987136f, 0.988119f,
0.989027f, 0.989867f, 0.990642f, 0.991359f, 0.992020f,
0.992631f, 0.993196f, 0.993718f, 0.994199f, 0.994644f,
0.995055f, 0.995434f, 0.995784f, 0.996108f, 0.996407f,
0.996682f, 0.996937f, 0.997172f, 0.997389f, 0.997590f,
0.997775f, 0.997946f, 0.998104f, 0.998249f, 0.998384f,
0.998508f, 0.998623f, 0.998728f, 0.998826f, 0.998916f,
0.999000f, 0.999076f, 0.999147f, 0.999213f, 0.999273f,
0.999329f, 0.999381f, 0.999428f, 0.999472f, 0.999513f,
0.999550f, 0.999585f, 0.999617f, 0.999646f, 0.999673f,
0.999699f, 0.999722f, 0.999743f, 0.999763f, 0.999781f,
0.999798f, 0.999813f, 0.999828f, 0.999841f, 0.999853f,
0.999865f, 0.999875f, 0.999885f, 0.999893f, 0.999902f,
0.999909f, 0.999916f, 0.999923f, 0.999929f, 0.999934f,
0.999939f, 0.999944f, 0.999948f, 0.999952f, 0.999956f,
0.999959f, 0.999962f, 0.999965f, 0.999968f, 0.999970f,
0.999973f, 0.999975f, 0.999977f, 0.999978f, 0.999980f,
0.999982f, 0.999983f, 0.999984f, 0.999986f, 0.999987f,
0.999988f, 0.999989f, 0.999990f, 0.999990f, 0.999991f,
0.999992f, 0.999992f, 0.999993f, 0.999994f, 0.999994f,
0.999994f, 0.999995f, 0.999995f, 0.999996f, 0.999996f,
0.999996f, 0.999997f, 0.999997f, 0.999997f, 0.999997f,
0.999997f, 0.999998f, 0.999998f, 0.999998f, 0.999998f,
0.999998f, 0.999998f, 0.999999f, 0.999999f, 0.999999f,
0.999999f, 0.999999f, 0.999999f, 0.999999f, 0.999999f,
0.999999f, 0.999999f, 0.999999f, 0.999999f, 0.999999f,
1.000000f, 1.000000f, 1.000000f, 1.000000f, 1.000000f,
1.000000f, 1.000000f, 1.000000f, 1.000000f, 1.000000f,
1.000000f,
};

#endif /*TANSIG_TABLE_H*/
|
||||
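The table above is tanh sampled on the uniform grid x = 0.00, 0.04, ..., 8.00 (201 entries, printed with six fractional digits). A quick stand-alone check in plain Python, independent of the C sources, regenerates it:

```python
import math

# Regenerate the tansig (tanh) lookup table: 201 samples of tanh(x)
# on the uniform grid x = 0.00, 0.04, ..., 8.00.
table = [math.tanh(0.04 * i) for i in range(201)]

# Spot-check against the hard-coded values in tansig_table.h.
assert abs(table[1] - 0.039979) < 1e-6
assert abs(table[5] - 0.197375) < 1e-6
assert round(table[200], 6) == 1.0
```

The grid step and range are inferred from the values themselves (tanh saturates to 1.000000 at six digits near x = 8), not stated in the header.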
128
managed_components/78__esp-opus/dnn/test_vec.c
Normal file
@@ -0,0 +1,128 @@
#include <stdio.h>
#include <math.h>
#include "opus_types.h"
#include "arch.h"
#include "common.h"
#include "tansig_table.h"

#define LPCNET_TEST

// we need to call two versions of each function that share the same
// name, so use #defines to temporarily rename them

#define lpcnet_exp2 lpcnet_exp2_fast
#define tansig_approx tansig_approx_fast
#define sigmoid_approx sigmoid_approx_fast
#define softmax softmax_fast
#define vec_tanh vec_tanh_fast
#define vec_sigmoid vec_sigmoid_fast
#define sgemv_accum16 sgemv_accum16_fast
#define sparse_sgemv_accum16 sparse_sgemv_accum16_fast

#ifdef __AVX__
#include "vec_avx.h"
#ifdef __AVX2__
const char simd[]="AVX2";
#else
const char simd[]="AVX";
#endif
#elif __ARM_NEON__
#include "vec_neon.h"
const char simd[]="NEON";
#else
const char simd[]="none";
#endif

#undef lpcnet_exp2
#undef tansig_approx
#undef sigmoid_approx
#undef softmax
#undef vec_tanh
#undef vec_sigmoid
#undef sgemv_accum16
#undef sparse_sgemv_accum16
#include "vec.h"

#define ROW_STEP 16
#define ROWS ROW_STEP*10
#define COLS 2
#define ENTRIES 2

int test_sgemv_accum16() {
    float weights[ROWS*COLS];
    float x[COLS];
    float out[ROWS], out_fast[ROWS];
    int i;

    printf("sgemv_accum16.....................: ");
    for(i=0; i<ROWS*COLS; i++) {
        weights[i] = i;
    }
    for(i=0; i<ROWS; i++) {
        out[i] = 0;
        out_fast[i] = 0;
    }

    for(i=0; i<COLS; i++) {
        x[i] = i+1;
    }

    sgemv_accum16(out, weights, ROWS, COLS, 1, x);
    sgemv_accum16_fast(out_fast, weights, ROWS, COLS, 1, x);

    for(i=0; i<ROWS; i++) {
        if (out[i] != out_fast[i]) {
            printf("fail\n");
            for(i=0; i<ROWS; i++) {
                printf("%d %f %f\n", i, out[i], out_fast[i]);
                if (out[i] != out_fast[i])
                    return 1;
            }
        }
    }

    printf("pass\n");
    return 0;
}


int test_sparse_sgemv_accum16() {
    int rows = ROW_STEP*ENTRIES;
    int indx[] = {1,0,2,0,1};
    float w[ROW_STEP*(1+2)];
    float x[ENTRIES] = {1,2};
    float out[ROW_STEP*(1+2)], out_fast[ROW_STEP*(1+2)];
    int i;

    printf("sparse_sgemv_accum16..............: ");
    for(i=0; i<ROW_STEP*(1+2); i++) {
        w[i] = i;
        out[i] = 0;
        out_fast[i] = 0;
    }

    sparse_sgemv_accum16(out, w, rows, indx, x);
    sparse_sgemv_accum16_fast(out_fast, w, rows, indx, x);

    for(i=0; i<ROW_STEP*ENTRIES; i++) {
        if (out[i] != out_fast[i]) {
            printf("fail\n");
            for(i=0; i<ROW_STEP*ENTRIES; i++) {
                printf("%d %f %f\n", i, out[i], out_fast[i]);
                if (out[i] != out_fast[i])
                    return 1;
            }
        }
    }

    printf("pass\n");
    return 0;
}

int main() {
    printf("testing vector routines on SIMD: %s\n", simd);
    int test1 = test_sgemv_accum16();
    int test2 = test_sparse_sgemv_accum16();
    return test1 || test2;
}
@@ -0,0 +1,2 @@
from . import quantization
from . import sparsification
@@ -0,0 +1 @@
from .softquant import soft_quant, remove_soft_quant
@@ -0,0 +1,113 @@
import torch

@torch.no_grad()
def compute_optimal_scale(weight):
    with torch.no_grad():
        n_out, n_in = weight.shape
        assert n_in % 4 == 0
        if n_out % 8:
            # add padding
            pad = n_out - n_out % 8
            weight = torch.cat((weight, torch.zeros((pad, n_in), dtype=weight.dtype, device=weight.device)), dim=0)

        weight_max_abs, _ = torch.max(torch.abs(weight), dim=1)
        weight_max_sum, _ = torch.max(torch.abs(weight[:, : n_in : 2] + weight[:, 1 : n_in : 2]), dim=1)
        scale_max = weight_max_abs / 127
        scale_sum = weight_max_sum / 129

        scale = torch.maximum(scale_max, scale_sum)

    return scale[:n_out]

@torch.no_grad()
def q_scaled_noise(module, weight):
    if isinstance(module, torch.nn.Conv1d):
        w = weight.permute(0, 2, 1).flatten(1)
        noise = torch.rand_like(w) - 0.5
        noise[w == 0] = 0 # ignore zero entries from sparsification
        scale = compute_optimal_scale(w)
        noise = noise * scale.unsqueeze(-1)
        noise = noise.reshape(weight.size(0), weight.size(2), weight.size(1)).permute(0, 2, 1)
    elif isinstance(module, torch.nn.ConvTranspose1d):
        i, o, k = weight.shape
        w = weight.permute(2, 1, 0).reshape(k * o, i)
        noise = torch.rand_like(w) - 0.5
        noise[w == 0] = 0 # ignore zero entries from sparsification
        scale = compute_optimal_scale(w)
        noise = noise * scale.unsqueeze(-1)
        noise = noise.reshape(k, o, i).permute(2, 1, 0)
    elif len(weight.shape) == 2:
        noise = torch.rand_like(weight) - 0.5
        noise[weight == 0] = 0 # ignore zero entries from sparsification
        scale = compute_optimal_scale(weight)
        noise = noise * scale.unsqueeze(-1)
    else:
        raise ValueError('unknown quantization setting')

    return noise

class SoftQuant:
    name: str

    def __init__(self, names: list, scale: float) -> None:
        self.names = names
        self.quantization_noise = None
        self.scale = scale

    def __call__(self, module, inputs, *args, before=True):
        if not module.training: return

        if before:
            self.quantization_noise = dict()
            for name in self.names:
                weight = getattr(module, name)
                if self.scale is None:
                    self.quantization_noise[name] = q_scaled_noise(module, weight)
                else:
                    self.quantization_noise[name] = \
                        self.scale * (torch.rand_like(weight) - 0.5)
                with torch.no_grad():
                    weight.data[:] = weight + self.quantization_noise[name]
        else:
            for name in self.names:
                weight = getattr(module, name)
                with torch.no_grad():
                    weight.data[:] = weight - self.quantization_noise[name]
            self.quantization_noise = None

    def apply(module, names=['weight'], scale=None):
        fn = SoftQuant(names, scale)

        for name in names:
            if not hasattr(module, name):
                raise ValueError(f"module has no attribute named {name}")

        fn_before = lambda *x : fn(*x, before=True)
        fn_after = lambda *x : fn(*x, before=False)
        setattr(fn_before, 'sqm', fn)
        setattr(fn_after, 'sqm', fn)

        module.register_forward_pre_hook(fn_before)
        module.register_forward_hook(fn_after)

        return fn


def soft_quant(module, names=['weight'], scale=None):
    fn = SoftQuant.apply(module, names, scale)
    return module

def remove_soft_quant(module, names=['weight']):
    for k, hook in module._forward_pre_hooks.items():
        if hasattr(hook, 'sqm'):
            if isinstance(hook.sqm, SoftQuant) and hook.sqm.names == names:
                del module._forward_pre_hooks[k]
    for k, hook in module._forward_hooks.items():
        if hasattr(hook, 'sqm'):
            if isinstance(hook.sqm, SoftQuant) and hook.sqm.names == names:
                del module._forward_hooks[k]

    return module
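The per-row scale logic above (scale_max = max|w| / 127) and the noise injection can be illustrated without PyTorch. This is a hypothetical pure-Python sketch of the same idea, not the module's API:

```python
import random

def optimal_scale(row, q_max=127):
    # per-row quantization scale: largest magnitude maps to q_max
    return max(abs(w) for w in row) / q_max

def add_quantization_noise(row, rng=random.random):
    # uniform noise in [-scale/2, scale/2); zeros are left untouched,
    # mirroring the "ignore zero entries from sparsification" rule
    s = optimal_scale(row)
    return [w if w == 0 else w + s * (rng() - 0.5) for w in row]

row = [0.0, -1.27, 0.5, 0.635]
noisy = add_quantization_noise(row)
s = optimal_scale(row)                              # 1.27 / 127 = 0.01
assert noisy[0] == 0.0                              # sparsified zero preserved
assert all(abs(a - b) <= s / 2 for a, b in zip(noisy, row))
```

The real SoftQuant hook additionally subtracts the same noise after the forward pass, so the stored weights are unchanged while the forward computation sees quantization-like perturbations.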
@@ -0,0 +1,2 @@
from .relegance import relegance_gradient_weighting, relegance_create_tconv_kernel, relegance_map_relevance_to_input_domain, relegance_resize_relevance_to_input_size
from .meta_critic import MetaCritic
@@ -0,0 +1,85 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch

class MetaCritic():
    def __init__(self, normalize=False, gamma=0.9, beta=0.0, joint_stats=False):
        """ Class for assessing relevance of discriminator scores

        Args:
            gamma (float, optional): update rate for tracking discriminator stats. Defaults to 0.9.
            beta (float, optional): minimum-confidence-related threshold. Defaults to 0.0.
        """
        self.normalize = normalize
        self.gamma = gamma
        self.beta = beta
        self.joint_stats = joint_stats

        self.disc_stats = dict()

    def __call__(self, disc_id, real_scores, generated_scores):
        """ calculates relevance from normalized scores

        Args:
            disc_id (any valid key): id for tracking discriminator statistics
            real_scores (torch.tensor): scores for real data
            generated_scores (torch.tensor): scores for generated data; expecting device to match real_scores.device

        Returns:
            torch.tensor: output-domain relevance
        """

        if self.normalize:
            real_std = torch.std(real_scores.detach()).cpu().item()
            gen_std = torch.std(generated_scores.detach()).cpu().item()
            std = (real_std**2 + gen_std**2) ** .5
            mean = torch.mean(real_scores.detach()).cpu().item() - torch.mean(generated_scores.detach()).cpu().item()

            key = 0 if self.joint_stats else disc_id

            if key in self.disc_stats:
                self.disc_stats[key]['std'] = self.gamma * self.disc_stats[key]['std'] + (1 - self.gamma) * std
                self.disc_stats[key]['mean'] = self.gamma * self.disc_stats[key]['mean'] + (1 - self.gamma) * mean
            else:
                self.disc_stats[key] = {
                    'std': std + 1e-5,
                    'mean': mean
                }

            std = self.disc_stats[key]['std']
            mean = self.disc_stats[key]['mean']
        else:
            mean, std = 0, 1

        relevance = torch.relu((real_scores - generated_scores - mean) / std + mean - self.beta)

        if False: print(f"relevance({disc_id}): {relevance.min()=} {relevance.max()=} {relevance.mean()=}")

        return relevance
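The statistics tracking above is an exponential moving average with rate gamma. A minimal sketch of the update rule (hypothetical helper, stdlib only, not the module's API):

```python
def ema_update(stat, value, gamma=0.9):
    # exponential moving average, as used for the discriminator stats:
    # new = gamma * old + (1 - gamma) * observation
    return gamma * stat + (1 - gamma) * value

mean = 0.0
for v in [1.0, 1.0, 1.0]:
    mean = ema_update(mean, v)

# after three updates toward 1.0 with gamma=0.9: 1 - 0.9**3 = 0.271
assert abs(mean - 0.271) < 1e-9
```

The tracked mean and std then normalize the score gap before the relu thresholding at beta, so discriminators whose scores barely separate real from generated data contribute little relevance.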
@@ -0,0 +1,449 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch
import torch.nn.functional as F


def view_one_hot(index, length):
    vec = length * [1]
    vec[index] = -1
    return vec

def create_smoothing_kernel(widths, gamma=1.5):
    """ creates a truncated gaussian smoothing kernel for the given widths

    Parameters:
    -----------
    widths: list[Int] or torch.LongTensor
        specifies the shape of the smoothing kernel, entries must be > 0.

    gamma: float, optional
        decay factor for gaussian relative to kernel size

    Returns:
    --------
    kernel: torch.FloatTensor
    """

    widths = torch.LongTensor(widths)
    num_dims = len(widths)

    assert(widths.min() > 0)

    centers = widths.float() / 2 - 0.5
    sigmas = gamma * (centers + 1)

    vals = [((torch.arange(widths[i]) - centers[i]) / sigmas[i]) ** 2 for i in range(num_dims)]
    vals = sum([vals[i].view(view_one_hot(i, num_dims)) for i in range(num_dims)])

    kernel = torch.exp(- vals)
    kernel = kernel / kernel.sum()

    return kernel


def create_partition_kernel(widths, strides):
    """ creates a partition kernel for mapping a convolutional network output back to the input domain

    Given a fully convolutional network with receptive field of shape widths and the given strides, this
    function constructs an interpolation kernel whose translations by multiples of the given strides form
    a partition of one on the input domain.

    Parameters:
    -----------
    widths: list[Int] or torch.LongTensor
        shape of receptive field

    strides: list[Int] or torch.LongTensor
        total strides of convolutional network

    Returns:
    --------
    kernel: torch.FloatTensor
    """

    num_dims = len(widths)
    assert num_dims == len(strides) and num_dims in {1, 2, 3}

    convs = {1 : F.conv1d, 2 : F.conv2d, 3 : F.conv3d}

    widths = torch.LongTensor(widths)
    strides = torch.LongTensor(strides)

    proto_kernel = torch.ones(torch.minimum(strides, widths).tolist())

    # create interpolation kernel eta
    eta_widths = widths - strides + 1
    if eta_widths.min() <= 0:
        print("[create_partition_kernel] warning: receptive field does not cover input domain")
        eta_widths = torch.maximum(eta_widths, torch.ones_like(eta_widths))

    eta = create_smoothing_kernel(eta_widths).view(1, 1, *eta_widths.tolist())

    padding = torch.repeat_interleave(eta_widths - 1, 2, 0).tolist()[::-1] # ordering of dimensions for padding and convolution functions is reversed in torch
    padded_proto_kernel = F.pad(proto_kernel, padding)
    padded_proto_kernel = padded_proto_kernel.view(1, 1, *padded_proto_kernel.shape)
    kernel = convs[num_dims](padded_proto_kernel, eta)

    return kernel


def receptive_field(conv_model, input_shape, output_position):
    """ estimates boundaries of receptive field connected to output_position via autograd

    Parameters:
    -----------
    conv_model: nn.Module or autograd function
        function or model implementing fully convolutional model

    input_shape: List[Int]
        input shape ignoring batch dimension, i.e. [num_channels, dim1, dim2, ...]

    output_position: List[Int]
        output position for which the receptive field is determined; the function raises an exception
        if output_position is out of bounds for the given input_shape.

    Returns:
    --------
    low: List[Int]
        start indices of receptive field

    high: List[Int]
        stop indices of receptive field
    """

    x = torch.randn((1,) + tuple(input_shape), requires_grad=True)
    y = conv_model(x)

    # collapse channels and remove batch dimension
    y = torch.sum(y, 1)[0]

    # create mask
    mask = torch.zeros_like(y)
    index = [torch.tensor(i) for i in output_position]
    try:
        mask.index_put_(index, torch.tensor(1, dtype=mask.dtype))
    except IndexError:
        raise ValueError('output_position out of bounds')

    (mask * y).sum().backward()

    # sum over channels and remove batch dimension
    grad = torch.sum(x.grad, dim=1)[0]
    tmp = torch.nonzero(grad, as_tuple=True)
    low = [t.min().item() for t in tmp]
    high = [t.max().item() for t in tmp]

    return low, high

def estimate_conv_parameters(model, num_channels, num_dims, width, max_stride=10):
    """ attempts to estimate receptive field size, strides and left paddings for given model

    Parameters:
    -----------
    model: nn.Module or autograd function
        fully convolutional model for which parameters are estimated

    num_channels: Int
        number of input channels for model

    num_dims: Int
        number of input dimensions for model (without channel dimension)

    width: Int
        width of the input tensor (a hyper-square) on which the receptive fields are derived via autograd

    max_stride: Int, optional
        assumed maximal stride of the model for any dimension; when set too low the function may fail for
        any value of width

    Returns:
    --------
    receptive_field_size: List[Int]
        receptive field size in all dimensions

    strides: List[Int]
        stride in all dimensions

    left_paddings: List[Int]
        left padding in all dimensions; this is relevant for aligning the receptive field on the input plane

    Raises:
    -------
    ValueError, KeyError
    """

    input_shape = [num_channels] + num_dims * [width]
    output_position1 = num_dims * [width // (2 * max_stride)]
    output_position2 = num_dims * [width // (2 * max_stride) + 1]

    low1, high1 = receptive_field(model, input_shape, output_position1)
    low2, high2 = receptive_field(model, input_shape, output_position2)

    widths1 = [h - l + 1 for l, h in zip(low1, high1)]
    widths2 = [h - l + 1 for l, h in zip(low2, high2)]

    if not all([w1 - w2 == 0 for w1, w2 in zip(widths1, widths2)]) or not all([l1 != l2 for l1, l2 in zip(low1, low2)]):
        raise ValueError("[estimate_strides]: widths too small to determine strides")

    receptive_field_size = widths1
    strides = [l2 - l1 for l1, l2 in zip(low1, low2)]
    left_paddings = [s * p - l for l, s, p in zip(low1, strides, output_position1)]

    return receptive_field_size, strides, left_paddings

def inspect_conv_model(model, num_channels, num_dims, max_width=10000, width_hint=None, stride_hint=None, verbose=False):
    """ determines size of receptive field, strides and padding probabilistically

    Parameters:
    -----------
    model: nn.Module or autograd function
        fully convolutional model for which parameters are estimated

    num_channels: Int
        number of input channels for model

    num_dims: Int
        number of input dimensions for model (without channel dimension)

    max_width: Int
        maximum width of the input tensor (a hyper-square) on which the receptive fields are derived via autograd

    width_hint: Int, optional
        initial guess for the input width

    stride_hint: Int, optional
        initial guess for the maximal stride

    verbose: bool, optional
        if true, the function prints parameters for individual trials

    Returns:
    --------
    receptive_field_size: List[Int]
        receptive field size in all dimensions

    strides: List[Int]
        stride in all dimensions

    left_paddings: List[Int]
        left padding in all dimensions; this is relevant for aligning the receptive field on the input plane

    Raises:
    -------
    ValueError
    """

    max_stride = max_width // 2
    stride = max_stride // 100
    width = max_width // 100

    if width_hint is not None: width = 2 * width_hint
    if stride_hint is not None: stride = stride_hint

    did_it = False
    while width < max_width and stride < max_stride:
        try:
            if verbose: print(f"[inspect_conv_model] trying parameters {width=}, {stride=}")
            receptive_field_size, strides, left_paddings = estimate_conv_parameters(model, num_channels, num_dims, width, stride)
            did_it = True
        except:
            pass

        if did_it: break

        width *= 2
        if width >= max_width and stride < max_stride:
            stride *= 2
            width = 2 * stride

    if not did_it:
        raise ValueError(f'could not determine conv parameter with given max_width={max_width}')

    return receptive_field_size, strides, left_paddings


class GradWeight(torch.autograd.Function):
    def __init__(self):
        super().__init__()

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(weight)
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        weight, = ctx.saved_tensors

        grad_input = grad_output * weight

        return grad_input, None


# API

def relegance_gradient_weighting(x, weight):
    """
    Args:
        x (torch.tensor): input tensor
        weight (torch.tensor or None): weight tensor for gradients of x; if None, no gradient weighting will be applied in backward pass

    Returns:
        torch.tensor: the unmodified input tensor x
    """
    if weight is None:
        return x
    else:
        return GradWeight.apply(x, weight)


def relegance_create_tconv_kernel(model, num_channels, num_dims, width_hint=None, stride_hint=None, verbose=False):
    """ creates parameters for mapping back output-domain relevance to input domain

    Args:
        model (nn.Module or autograd.Function): fully convolutional model
        num_channels (int): number of input channels to model
        num_dims (int): number of input dimensions of model (without channel and batch dimension)
        width_hint (int or None): optional hint at maximal width of receptive field
        stride_hint (int or None): optional hint at maximal stride

    Returns:
        dict: contains kernel, kernel dimensions, strides and left paddings for transposed convolution

    Raises:
        RuntimeError: if estimation of parameters fails due to exceeded compute budget
    """

    max_width = int(100000 / (10 ** num_dims))

    did_it = False
    try:
        receptive_field_size, strides, left_paddings = inspect_conv_model(model, num_channels, num_dims, max_width=max_width, width_hint=width_hint, stride_hint=stride_hint, verbose=verbose)
        did_it = True
    except:
        # try once again with larger max_width
        max_width *= 10

    # crash if exception is raised
    try:
        if not did_it: receptive_field_size, strides, left_paddings = inspect_conv_model(model, num_channels, num_dims, max_width=max_width, width_hint=width_hint, stride_hint=stride_hint, verbose=verbose)
    except:
        raise RuntimeError("could not determine parameters within given compute budget")

    partition_kernel = create_partition_kernel(receptive_field_size, strides)
    partition_kernel = torch.repeat_interleave(partition_kernel, num_channels, 1)

    tconv_parameters = {
        'kernel': partition_kernel,
        'receptive_field_shape': receptive_field_size,
        'stride': strides,
        'left_padding': left_paddings,
        'num_dims': num_dims
    }

    return tconv_parameters


def relegance_map_relevance_to_input_domain(od_relevance, tconv_parameters):
    """ maps output-domain relevance to input-domain relevance via transposed convolution

    Args:
        od_relevance (torch.tensor): output-domain relevance
        tconv_parameters (dict): parameter dict as created by relegance_create_tconv_kernel

    Returns:
        torch.tensor: input-domain relevance. The tensor is left aligned, i.e. the all-zero index of the output corresponds to the all-zero index of the discriminator input.
        Otherwise, the size of the output tensor does not need to match the size of the discriminator input. Use relegance_resize_relevance_to_input_size for a
        convenient way to adjust the output to the correct size.

    Raises:
        ValueError: if number of dimensions is not supported
    """

    kernel = tconv_parameters['kernel'].to(od_relevance.device)
    rf_shape = tconv_parameters['receptive_field_shape']
    stride = tconv_parameters['stride']
    left_padding = tconv_parameters['left_padding']

    num_dims = len(kernel.shape) - 2

    # repeat boundary values
    od_padding = [rf_shape[i//2] // stride[i//2] + 1 for i in range(2 * num_dims)]
    padded_od_relevance = F.pad(od_relevance, od_padding[::-1], mode='replicate')
    od_padding = od_padding[::2]

    # apply mapping and left trimming
    if num_dims == 1:
        id_relevance = F.conv_transpose1d(padded_od_relevance, kernel, stride=stride)
        id_relevance = id_relevance[..., left_padding[0] + stride[0] * od_padding[0] :]
    elif num_dims == 2:
        id_relevance = F.conv_transpose2d(padded_od_relevance, kernel, stride=stride)
        id_relevance = id_relevance[..., left_padding[0] + stride[0] * od_padding[0] :, left_padding[1] + stride[1] * od_padding[1] :]
    elif num_dims == 3:
        id_relevance = F.conv_transpose3d(padded_od_relevance, kernel, stride=stride)
        id_relevance = id_relevance[..., left_padding[0] + stride[0] * od_padding[0] :, left_padding[1] + stride[1] * od_padding[1] :, left_padding[2] + stride[2] * od_padding[2] :]
    else:
        raise ValueError(f'[relegance_map_to_input_domain] error: num_dims = {num_dims} not supported')

    return id_relevance


def relegance_resize_relevance_to_input_size(reference_input, relevance):
    """ adjusts size of relevance tensor to reference input size

    Args:
        reference_input (torch.tensor): discriminator input tensor for reference
        relevance (torch.tensor): input-domain relevance corresponding to input tensor reference_input

    Returns:
        torch.tensor: resized relevance

    Raises:
        ValueError: if number of dimensions is not supported
    """
    resized_relevance = torch.zeros_like(reference_input)

    num_dims = len(reference_input.shape) - 2
    with torch.no_grad():
        if num_dims == 1:
            resized_relevance[:] = relevance[..., : min(reference_input.size(-1), relevance.size(-1))]
        elif num_dims == 2:
            resized_relevance[:] = relevance[..., : min(reference_input.size(-2), relevance.size(-2)), : min(reference_input.size(-1), relevance.size(-1))]
        elif num_dims == 3:
            resized_relevance[:] = relevance[..., : min(reference_input.size(-3), relevance.size(-3)), : min(reference_input.size(-2), relevance.size(-2)), : min(reference_input.size(-1), relevance.size(-1))]
        else:
            raise ValueError(f'[relegance_map_to_input_domain] error: num_dims = {num_dims} not supported')

    return resized_relevance
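receptive_field and estimate_conv_parameters above recover stride and field size from the receptive fields of two adjacent output positions. For an ideal 1-D convolution with kernel size k, stride s and left padding p, output index o sees input indices [o*s - p, o*s - p + k - 1]; a stdlib sketch of the recovery arithmetic (hypothetical parameters, torch-free):

```python
def field(o, k, s, p):
    # input-index range seen by output position o of an ideal 1-D conv
    low = o * s - p
    return low, low + k - 1

k, s, p = 7, 3, 2                       # assumed conv parameters
low1, high1 = field(10, k, s, p)        # receptive field of output 10
low2, high2 = field(11, k, s, p)        # receptive field of output 11

stride = low2 - low1                    # strides = [l2 - l1 ...]
rf_size = high1 - low1 + 1              # widths1 = [h - l + 1 ...]
left_pad = stride * 10 - low1           # left_paddings = [s * p - l ...]

assert (stride, rf_size, left_pad) == (3, 7, 2)
```

This is exactly why the estimation fails when the two fields overlap the tensor boundary or have unequal widths: the subtraction low2 - low1 then no longer equals the true stride, which is what the ValueError in estimate_conv_parameters guards against.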
@@ -0,0 +1,6 @@
from .gru_sparsifier import GRUSparsifier
from .conv1d_sparsifier import Conv1dSparsifier
from .conv_transpose1d_sparsifier import ConvTranspose1dSparsifier
from .linear_sparsifier import LinearSparsifier
from .common import sparsify_matrix, calculate_gru_flops_per_step
from .utils import mark_for_sparsification, create_sparsifier
@@ -0,0 +1,58 @@
|
||||
"""
|
||||
/* Copyright (c) 2023 Amazon
|
||||
Written by Jan Buethe */
|
||||
/*
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*/
|
||||
"""
|
||||
|
||||
class BaseSparsifier:
|
||||
def __init__(self, task_list, start, stop, interval, exponent=3):
|
||||
|
||||
# just copying parameters...
|
||||
self.start = start
|
||||
self.stop = stop
|
||||
self.interval = interval
|
||||
self.exponent = exponent
|
||||
self.task_list = task_list
|
||||
|
||||
# ... and setting counter to 0
|
||||
self.step_counter = 0
|
||||
|
||||
def step(self, verbose=False):
|
||||
# compute current interpolation factor
|
||||
self.step_counter += 1
|
||||
|
||||
if self.step_counter < self.start:
|
||||
return
|
||||
elif self.step_counter < self.stop:
|
||||
# update only every self.interval-th interval
|
||||
if self.step_counter % self.interval:
|
||||
return
|
||||
|
||||
alpha = ((self.stop - self.step_counter) / (self.stop - self.start)) ** self.exponent
|
||||
else:
|
||||
alpha = 0
|
||||
|
||||
self.sparsify(alpha, verbose=verbose)
|
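The start/stop/interval/exponent schedule above can be sketched in plain Python. This is an illustrative stand-alone helper (the name `density_schedule` is not part of the package) showing how the interpolation factor `alpha` decays the applied density from fully dense toward the target:

```python
def density_schedule(step, start, stop, exponent, target_density):
    """Density applied at a given training step, mirroring BaseSparsifier.step().

    Before `start` nothing is sparsified; between `start` and `stop` the
    density decays polynomially toward the target; after `stop` the target
    density is enforced (alpha = 0).
    """
    if step < start:
        return None  # no sparsification yet
    if step < stop:
        alpha = ((stop - step) / (stop - start)) ** exponent
    else:
        alpha = 0
    # interpolate between fully dense (alpha=1) and the target density (alpha=0)
    return alpha + (1 - alpha) * target_density

# halfway through a 100..200 schedule with exponent 3:
# alpha = 0.5**3 = 0.125, so density ~= 0.125 + 0.875 * 0.2 ~= 0.3
print(density_schedule(150, 100, 200, 3, 0.2))
```

Note that the interval check in `step()` means the density is only re-applied every `interval` steps while the schedule is active, which keeps the sparsification overhead negligible during training.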
@@ -0,0 +1,123 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch

debug = True

def sparsify_matrix(matrix : torch.Tensor, density : float, block_size, keep_diagonal : bool=False, return_mask : bool=False):
    """ sparsifies matrix with specified block size

    Parameters:
    -----------
    matrix : torch.Tensor
        matrix to sparsify
    density : float
        target density
    block_size : [int, int]
        block size dimensions
    keep_diagonal : bool
        If True, the diagonal will be kept. This option requires block_size[0] == block_size[1] and defaults to False
    """

    m, n = matrix.shape
    m1, n1 = block_size

    if m % m1 or n % n1:
        raise ValueError(f"block size {(m1, n1)} does not divide matrix size {(m, n)}")

    # extract diagonal if keep_diagonal = True
    if keep_diagonal:
        if m != n:
            raise ValueError("Attempting to sparsify non-square matrix with keep_diagonal=True")

        to_spare = torch.diag(torch.diag(matrix))
        matrix = matrix - to_spare
    else:
        to_spare = torch.zeros_like(matrix)

    # calculate energy in sub-blocks
    x = torch.reshape(matrix, (m // m1, m1, n // n1, n1))
    x = x ** 2
    block_energies = torch.sum(torch.sum(x, dim=3), dim=1)

    number_of_blocks = (m * n) // (m1 * n1)
    number_of_survivors = round(number_of_blocks * density)

    # masking threshold
    if number_of_survivors == 0:
        threshold = 0
    else:
        threshold = torch.sort(torch.flatten(block_energies)).values[-number_of_survivors]

    # create mask
    mask = torch.ones_like(block_energies)
    mask[block_energies < threshold] = 0
    mask = torch.repeat_interleave(mask, m1, dim=0)
    mask = torch.repeat_interleave(mask, n1, dim=1)

    # perform masking
    masked_matrix = mask * matrix + to_spare

    if return_mask:
        return masked_matrix, mask
    else:
        return masked_matrix

def calculate_gru_flops_per_step(gru, sparsification_dict=dict(), drop_input=False):
    input_size = gru.input_size
    hidden_size = gru.hidden_size
    flops = 0

    input_density = (
        sparsification_dict.get('W_ir', [1])[0]
        + sparsification_dict.get('W_in', [1])[0]
        + sparsification_dict.get('W_iz', [1])[0]
    ) / 3

    recurrent_density = (
        sparsification_dict.get('W_hr', [1])[0]
        + sparsification_dict.get('W_hn', [1])[0]
        + sparsification_dict.get('W_hz', [1])[0]
    ) / 3

    # input matrix-vector multiplications
    if not drop_input:
        flops += 2 * 3 * input_size * hidden_size * input_density

    # recurrent matrix-vector multiplications
    flops += 2 * 3 * hidden_size * hidden_size * recurrent_density

    # biases
    flops += 6 * hidden_size

    # activations estimated at 10 flops per activation
    flops += 30 * hidden_size

    return flops
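The block-pruning logic of `sparsify_matrix` can be illustrated without torch. This stdlib-only sketch (the helper name `block_sparsify` is hypothetical, not part of the package) ranks the sub-blocks by their energy and zeroes the weakest ones until the target density is reached:

```python
def block_sparsify(matrix, density, block_size):
    """Zero out the lowest-energy (m1 x n1) blocks, keeping ~density of them."""
    m, n = len(matrix), len(matrix[0])
    m1, n1 = block_size
    assert m % m1 == 0 and n % n1 == 0, "block size must divide matrix size"

    # energy (sum of squares) of each block
    energies = {}
    for bi in range(m // m1):
        for bj in range(n // n1):
            energies[(bi, bj)] = sum(
                matrix[bi * m1 + r][bj * n1 + c] ** 2
                for r in range(m1) for c in range(n1))

    # keep the highest-energy blocks, zero the rest
    survivors = round(len(energies) * density)
    keep = set(sorted(energies, key=energies.get, reverse=True)[:survivors])

    return [[matrix[i][j] if (i // m1, j // n1) in keep else 0.0
             for j in range(n)] for i in range(m)]

A = [[4, 4, 1, 1],
     [4, 4, 1, 1],
     [2, 2, 3, 3],
     [2, 2, 3, 3]]
# density 0.5 keeps the 2 highest-energy 2x2 blocks (the 4s and the 3s)
B = block_sparsify(A, 0.5, (2, 2))
```

The torch version does the same thing vectorized: it reshapes into blocks, sums squared entries per block, derives a threshold from the sorted energies, and expands the block mask back to element granularity with `repeat_interleave`.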
@@ -0,0 +1,133 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch

from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug


class Conv1dSparsifier(BaseSparsifier):
    def __init__(self, task_list, start, stop, interval, exponent=3):
        """ Sparsifier for torch.nn.Conv1d layers

        Parameters:
        -----------
        task_list : list
            task_list contains a list of tuples (conv1d, params), where conv1d is an instance
            of torch.nn.Conv1d and params is a tuple (density, [m, n]),
            where density is the target density in [0, 1] and [m, n] is the shape of the
            sub-blocks to which sparsification is applied.

        start : int
            training step after which sparsification will be started.

        stop : int
            training step after which sparsification will be completed.

        interval : int
            sparsification interval for steps between start and stop. After stop, sparsification
            will be carried out after every call to Conv1dSparsifier.step()

        exponent : float
            Interpolation exponent for the sparsification interval. In step i sparsification
            will be carried out with density (alpha + target_density * (1 - alpha)), where
            alpha = ((stop - i) / (stop - start)) ** exponent

        Example:
        --------
        >>> import torch
        >>> conv = torch.nn.Conv1d(8, 16, 8)
        >>> params = (0.2, [8, 4])
        >>> sparsifier = Conv1dSparsifier([(conv, params)], 0, 100, 50)
        >>> for i in range(100):
        ...     sparsifier.step()
        """
        super().__init__(task_list, start, stop, interval, exponent=exponent)

        self.last_mask = None

    def sparsify(self, alpha, verbose=False):
        """ carries out sparsification step

        Call this function after optimizer.step in your
        training loop.

        Parameters:
        ----------
        alpha : float
            density interpolation parameter (1: dense, 0: target density)
        verbose : bool
            if True, densities are printed out

        Returns:
        --------
        None

        """

        with torch.no_grad():
            for conv, params in self.task_list:
                # reshape weight
                if hasattr(conv, 'weight_v'):
                    weight = conv.weight_v
                else:
                    weight = conv.weight
                o, i, k = weight.shape  # out_channels, in_channels, kernel_size
                w = weight.permute(0, 2, 1).flatten(1)
                target_density, block_size = params
                density = alpha + (1 - alpha) * target_density
                w, new_mask = sparsify_matrix(w, density, block_size, return_mask=True)
                w = w.reshape(o, k, i).permute(0, 2, 1)
                weight[:] = w

                if self.last_mask is not None:
                    if not torch.all(self.last_mask * new_mask == new_mask) and debug:
                        print("weight resurrection in conv.weight")

                self.last_mask = new_mask

                if verbose:
                    print(f"conv1d_sparsifier[{self.step_counter}]: {density=}")


if __name__ == "__main__":
    print("Testing sparsifier")

    import torch
    conv = torch.nn.Conv1d(8, 16, 8)
    params = (0.2, [8, 4])

    sparsifier = Conv1dSparsifier([(conv, params)], 0, 100, 5)

    for i in range(100):
        sparsifier.step(verbose=True)

    print(conv.weight)
@@ -0,0 +1,134 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch

from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug


class ConvTranspose1dSparsifier(BaseSparsifier):
    def __init__(self, task_list, start, stop, interval, exponent=3):
        """ Sparsifier for torch.nn.ConvTranspose1d layers

        Parameters:
        -----------
        task_list : list
            task_list contains a list of tuples (conv1d, params), where conv1d is an instance
            of torch.nn.ConvTranspose1d and params is a tuple (density, [m, n]),
            where density is the target density in [0, 1] and [m, n] is the shape of the
            sub-blocks to which sparsification is applied.

        start : int
            training step after which sparsification will be started.

        stop : int
            training step after which sparsification will be completed.

        interval : int
            sparsification interval for steps between start and stop. After stop, sparsification
            will be carried out after every call to ConvTranspose1dSparsifier.step()

        exponent : float
            Interpolation exponent for the sparsification interval. In step i sparsification
            will be carried out with density (alpha + target_density * (1 - alpha)), where
            alpha = ((stop - i) / (stop - start)) ** exponent

        Example:
        --------
        >>> import torch
        >>> conv = torch.nn.ConvTranspose1d(8, 16, 8)
        >>> params = (0.2, [8, 4])
        >>> sparsifier = ConvTranspose1dSparsifier([(conv, params)], 0, 100, 50)
        >>> for i in range(100):
        ...     sparsifier.step()
        """

        super().__init__(task_list, start, stop, interval, exponent=exponent)

        self.last_mask = None

    def sparsify(self, alpha, verbose=False):
        """ carries out sparsification step

        Call this function after optimizer.step in your
        training loop.

        Parameters:
        ----------
        alpha : float
            density interpolation parameter (1: dense, 0: target density)
        verbose : bool
            if True, densities are printed out

        Returns:
        --------
        None

        """

        with torch.no_grad():
            for conv, params in self.task_list:
                # reshape weight
                if hasattr(conv, 'weight_v'):
                    weight = conv.weight_v
                else:
                    weight = conv.weight
                i, o, k = weight.shape  # in_channels, out_channels, kernel_size
                w = weight.permute(2, 1, 0).reshape(k * o, i)
                target_density, block_size = params
                density = alpha + (1 - alpha) * target_density
                w, new_mask = sparsify_matrix(w, density, block_size, return_mask=True)
                w = w.reshape(k, o, i).permute(2, 1, 0)
                weight[:] = w

                if self.last_mask is not None:
                    if not torch.all(self.last_mask * new_mask == new_mask) and debug:
                        print("weight resurrection in conv.weight")

                self.last_mask = new_mask

                if verbose:
                    print(f"convtranspose1d_sparsifier[{self.step_counter}]: {density=}")


if __name__ == "__main__":
    print("Testing sparsifier")

    import torch
    conv = torch.nn.ConvTranspose1d(8, 16, 4, 4)
    params = (0.2, [8, 4])

    sparsifier = ConvTranspose1dSparsifier([(conv, params)], 0, 100, 5)

    for i in range(100):
        sparsifier.step(verbose=True)

    print(conv.weight)
@@ -0,0 +1,178 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch

from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug


class GRUSparsifier(BaseSparsifier):
    def __init__(self, task_list, start, stop, interval, exponent=3):
        """ Sparsifier for torch.nn.GRUs

        Parameters:
        -----------
        task_list : list
            task_list contains a list of tuples (gru, sparsify_dict), where gru is an instance
            of torch.nn.GRU and sparsify_dict is a dictionary with keys in {'W_ir', 'W_iz', 'W_in',
            'W_hr', 'W_hz', 'W_hn'} corresponding to the input and recurrent weights for the reset,
            update, and new gates. The values of sparsify_dict are tuples (density, [m, n], keep_diagonal),
            where density is the target density in [0, 1], [m, n] is the shape of the sub-blocks to
            which sparsification is applied and keep_diagonal is a bool indicating whether the
            diagonal should be kept.

        start : int
            training step after which sparsification will be started.

        stop : int
            training step after which sparsification will be completed.

        interval : int
            sparsification interval for steps between start and stop. After stop, sparsification
            will be carried out after every call to GRUSparsifier.step()

        exponent : float
            Interpolation exponent for the sparsification interval. In step i sparsification
            will be carried out with density (alpha + target_density * (1 - alpha)), where
            alpha = ((stop - i) / (stop - start)) ** exponent

        Example:
        --------
        >>> import torch
        >>> gru = torch.nn.GRU(10, 20)
        >>> sparsify_dict = {
        ...     'W_ir' : (0.5, [2, 2], False),
        ...     'W_iz' : (0.6, [2, 2], False),
        ...     'W_in' : (0.7, [2, 2], False),
        ...     'W_hr' : (0.1, [4, 4], True),
        ...     'W_hz' : (0.2, [4, 4], True),
        ...     'W_hn' : (0.3, [4, 4], True),
        ... }
        >>> sparsifier = GRUSparsifier([(gru, sparsify_dict)], 0, 100, 50)
        >>> for i in range(100):
        ...     sparsifier.step()
        """
        super().__init__(task_list, start, stop, interval, exponent=exponent)

        self.last_masks = {key : None for key in ['W_ir', 'W_in', 'W_iz', 'W_hr', 'W_hn', 'W_hz']}

    def sparsify(self, alpha, verbose=False):
        """ carries out sparsification step

        Call this function after optimizer.step in your
        training loop.

        Parameters:
        ----------
        alpha : float
            density interpolation parameter (1: dense, 0: target density)
        verbose : bool
            if True, densities are printed out

        Returns:
        --------
        None

        """

        with torch.no_grad():
            for gru, params in self.task_list:
                hidden_size = gru.hidden_size

                # input weights
                for i, key in enumerate(['W_ir', 'W_iz', 'W_in']):
                    if key in params:
                        if hasattr(gru, 'weight_ih_l0_v'):
                            weight = gru.weight_ih_l0_v
                        else:
                            weight = gru.weight_ih_l0
                        density = alpha + (1 - alpha) * params[key][0]
                        if verbose:
                            print(f"[{self.step_counter}]: {key} density: {density}")

                        weight[i * hidden_size : (i + 1) * hidden_size, :], new_mask = sparsify_matrix(
                            weight[i * hidden_size : (i + 1) * hidden_size, :],
                            density,         # density
                            params[key][1],  # block_size
                            params[key][2],  # keep_diagonal (might want to set this to False)
                            return_mask=True
                        )

                        if self.last_masks[key] is not None:
                            if not torch.all(self.last_masks[key] * new_mask == new_mask) and debug:
                                print("weight resurrection in weight_ih_l0_v")

                        self.last_masks[key] = new_mask

                # recurrent weights
                for i, key in enumerate(['W_hr', 'W_hz', 'W_hn']):
                    if key in params:
                        if hasattr(gru, 'weight_hh_l0_v'):
                            weight = gru.weight_hh_l0_v
                        else:
                            weight = gru.weight_hh_l0
                        density = alpha + (1 - alpha) * params[key][0]
                        if verbose:
                            print(f"[{self.step_counter}]: {key} density: {density}")
                        weight[i * hidden_size : (i + 1) * hidden_size, :], new_mask = sparsify_matrix(
                            weight[i * hidden_size : (i + 1) * hidden_size, :],
                            density,
                            params[key][1],  # block_size
                            params[key][2],  # keep_diagonal (might want to set this to False)
                            return_mask=True
                        )

                        if self.last_masks[key] is not None:
                            if not torch.all(self.last_masks[key] * new_mask == new_mask) and debug:
                                print("weight resurrection in weight_hh_l0_v")

                        self.last_masks[key] = new_mask


if __name__ == "__main__":
    print("Testing sparsifier")

    gru = torch.nn.GRU(10, 20)
    sparsify_dict = {
        'W_ir' : (0.5, [2, 2], False),
        'W_iz' : (0.6, [2, 2], False),
        'W_in' : (0.7, [2, 2], False),
        'W_hr' : (0.1, [4, 4], True),
        'W_hz' : (0.2, [4, 4], True),
        'W_hn' : (0.3, [4, 4], True),
    }

    sparsifier = GRUSparsifier([(gru, sparsify_dict)], 0, 100, 10)

    for i in range(100):
        sparsifier.step(verbose=True)

    print(gru.weight_hh_l0)
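The cost model behind `calculate_gru_flops_per_step` in common.py can be sketched without torch. This stand-alone version (the name `gru_flops_per_step` is illustrative) uses the same assumptions as the code: 2 flops per multiply-add, three gates each with an input and a recurrent matrix-vector product, and 10 flops per activation:

```python
def gru_flops_per_step(input_size, hidden_size,
                       input_density=1.0, recurrent_density=1.0,
                       drop_input=False):
    """Rough per-step flop count for a GRU, counting a multiply-add as 2 flops."""
    flops = 0
    # three input matrix-vector products (reset, update, new gate)
    if not drop_input:
        flops += 2 * 3 * input_size * hidden_size * input_density
    # three recurrent matrix-vector products
    flops += 2 * 3 * hidden_size * hidden_size * recurrent_density
    # input and recurrent biases
    flops += 6 * hidden_size
    # activations estimated at 10 flops each, 3 per hidden unit
    flops += 30 * hidden_size
    return flops

# dense GRU(10, 20): 1200 + 2400 + 120 + 600 flops per step
print(gru_flops_per_step(10, 20))  # prints 4320.0
```

This makes explicit why sparsifying the recurrent weights pays off most for large hidden sizes: the recurrent term grows quadratically in `hidden_size` while everything else grows linearly.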
@@ -0,0 +1,128 @@
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import torch

from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug


class LinearSparsifier(BaseSparsifier):
    def __init__(self, task_list, start, stop, interval, exponent=3):
        """ Sparsifier for torch.nn.Linear layers

        Parameters:
        -----------
        task_list : list
            task_list contains a list of tuples (linear, params), where linear is an instance
            of torch.nn.Linear and params is a tuple (density, [m, n]),
            where density is the target density in [0, 1] and [m, n] is the shape of the
            sub-blocks to which sparsification is applied.

        start : int
            training step after which sparsification will be started.

        stop : int
            training step after which sparsification will be completed.

        interval : int
            sparsification interval for steps between start and stop. After stop, sparsification
            will be carried out after every call to LinearSparsifier.step()

        exponent : float
            Interpolation exponent for the sparsification interval. In step i sparsification
            will be carried out with density (alpha + target_density * (1 - alpha)), where
            alpha = ((stop - i) / (stop - start)) ** exponent

        Example:
        --------
        >>> import torch
        >>> linear = torch.nn.Linear(8, 16)
        >>> params = (0.2, [8, 4])
        >>> sparsifier = LinearSparsifier([(linear, params)], 0, 100, 50)
        >>> for i in range(100):
        ...     sparsifier.step()
        """

        super().__init__(task_list, start, stop, interval, exponent=exponent)

        self.last_mask = None

    def sparsify(self, alpha, verbose=False):
        """ carries out sparsification step

        Call this function after optimizer.step in your
        training loop.

        Parameters:
        ----------
        alpha : float
            density interpolation parameter (1: dense, 0: target density)
        verbose : bool
            if True, densities are printed out

        Returns:
        --------
        None

        """

        with torch.no_grad():
            for linear, params in self.task_list:
                if hasattr(linear, 'weight_v'):
                    weight = linear.weight_v
                else:
                    weight = linear.weight
                target_density, block_size = params
                density = alpha + (1 - alpha) * target_density
                weight[:], new_mask = sparsify_matrix(weight, density, block_size, return_mask=True)

                if self.last_mask is not None:
                    if not torch.all(self.last_mask * new_mask == new_mask) and debug:
                        print("weight resurrection in linear.weight")

                self.last_mask = new_mask

                if verbose:
                    print(f"linear_sparsifier[{self.step_counter}]: {density=}")


if __name__ == "__main__":
    print("Testing sparsifier")

    import torch
    linear = torch.nn.Linear(8, 16)
    params = (0.2, [4, 2])

    sparsifier = LinearSparsifier([(linear, params)], 0, 100, 5)

    for i in range(100):
        sparsifier.step(verbose=True)

    print(linear.weight)
@@ -0,0 +1,64 @@
import torch

from dnntools.sparsification import GRUSparsifier, LinearSparsifier, Conv1dSparsifier, ConvTranspose1dSparsifier

def mark_for_sparsification(module, params):
    setattr(module, 'sparsify', True)
    setattr(module, 'sparsification_params', params)
    return module

def create_sparsifier(module, start, stop, interval):
    sparsifier_list = []
    for m in module.modules():
        if hasattr(m, 'sparsify'):
            if isinstance(m, torch.nn.GRU):
                sparsifier_list.append(
                    GRUSparsifier([(m, m.sparsification_params)], start, stop, interval)
                )
            elif isinstance(m, torch.nn.Linear):
                sparsifier_list.append(
                    LinearSparsifier([(m, m.sparsification_params)], start, stop, interval)
                )
            elif isinstance(m, torch.nn.Conv1d):
                sparsifier_list.append(
                    Conv1dSparsifier([(m, m.sparsification_params)], start, stop, interval)
                )
            elif isinstance(m, torch.nn.ConvTranspose1d):
                sparsifier_list.append(
                    ConvTranspose1dSparsifier([(m, m.sparsification_params)], start, stop, interval)
                )
            else:
                print(f"[create_sparsifier] warning: module {m} marked for sparsification but no suitable sparsifier exists.")

    def sparsify(verbose=False):
        for sparsifier in sparsifier_list:
            sparsifier.step(verbose)

    return sparsify


def count_parameters(model, verbose=False):
    total = 0
    for name, p in model.named_parameters():
        count = torch.ones_like(p).sum().item()

        if verbose:
            print(f"{name}: {count} parameters")

        total += count

    return total

def estimate_nonzero_parameters(module):
    num_zero_parameters = 0
    if hasattr(module, 'sparsify'):
        params = module.sparsification_params
        if isinstance(module, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
            num_zero_parameters = torch.ones_like(module.weight).sum().item() * (1 - params[0])
        elif isinstance(module, torch.nn.GRU):
            num_zero_parameters = module.input_size * module.hidden_size * (3 - params['W_ir'][0] - params['W_iz'][0] - params['W_in'][0])
            num_zero_parameters += module.hidden_size * module.hidden_size * (3 - params['W_hr'][0] - params['W_hz'][0] - params['W_hn'][0])
        elif isinstance(module, torch.nn.Linear):
            num_zero_parameters = module.in_features * module.out_features * (1 - params[0])
        else:
            raise ValueError(f'unknown sparsification method for module of type {type(module)}')

    return num_zero_parameters
@@ -0,0 +1 @@
torch
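The mark-then-dispatch pattern used by `mark_for_sparsification` and `create_sparsifier` can be sketched without torch. This stdlib-only sketch (all names here are illustrative, not the package API) tags modules with an attribute and then selects a handler by module type:

```python
class GRU: pass
class Linear: pass
class BatchNorm: pass

# type -> sparsifier name, mirroring the isinstance chain in create_sparsifier
SPARSIFIER_FOR = {GRU: "GRUSparsifier", Linear: "LinearSparsifier"}

def mark(module, params):
    # tag the module in place, as mark_for_sparsification does
    module.sparsify = True
    module.sparsification_params = params
    return module

def collect(modules):
    """Dispatch each marked module to the sparsifier registered for its type."""
    chosen = []
    for m in modules:
        if getattr(m, 'sparsify', False):
            name = SPARSIFIER_FOR.get(type(m))
            if name is None:
                print(f"warning: no sparsifier for {type(m).__name__}")
            else:
                chosen.append(name)
    return chosen

mods = [mark(GRU(), {}), Linear(), mark(Linear(), (0.2, [4, 2]))]
print(collect(mods))  # prints ['GRUSparsifier', 'LinearSparsifier']
```

Note that unmarked modules (the bare `Linear()` above) are skipped entirely, so a model can mix sparsified and dense layers freely.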
48
managed_components/78__esp-opus/dnn/torch/dnntools/setup.py
Normal file
@@ -0,0 +1,48 @@
#!/usr/bin/env python
"""
/* Copyright (c) 2023 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import os
from setuptools import setup

lib_folder = os.path.dirname(os.path.realpath(__file__))

with open(os.path.join(lib_folder, 'requirements.txt'), 'r') as f:
    install_requires = list(f.read().splitlines())

print(install_requires)

setup(name='dnntools',
      version='1.0',
      author='Jan Buethe',
      author_email='jbuethe@amazon.de',
      description='Non-standard tools for deep neural network training with PyTorch',
      packages=['dnntools', 'dnntools.sparsification', 'dnntools.quantization'],
      install_requires=install_requires
      )
54
managed_components/78__esp-opus/dnn/torch/fargan/README.md
Normal file
@@ -0,0 +1,54 @@
# Framewise Auto-Regressive GAN (FARGAN)

Implementation of FARGAN, a low-complexity neural vocoder. Pre-trained models
are provided as C code in the dnn/ directory, with the corresponding models in
the dnn/models/ directory (names starting with fargan_). If you don't want to train
a new FARGAN model, you can skip straight to the Inference section.

## Data preparation

For data preparation, you need to build Opus as detailed in the top-level README,
using the --enable-deep-plc configure option.
The build will produce an executable named "dump_data".
To prepare the training data, run:
```
./dump_data -train in_speech.pcm out_features.f32 out_speech.pcm
```
where in_speech.pcm is a raw 16-bit PCM file sampled at 16 kHz.
The speech data used for training the model can be found at:
https://media.xiph.org/lpcnet/speech/tts_speech_negative_16k.sw
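The feature file produced by dump_data is raw float32 data with no header. A minimal sketch for inspecting it with NumPy, assuming 36 values per frame (the layout used by the training dataset code; the synthetic buffer below stands in for a real file):

```python
import numpy as np

NB_FEATURES = 36  # assumed layout: 20 acoustic features + 16 LPC coefficients per frame

# simulate a short dump_data output; with a real file use:
#   raw = np.fromfile("out_features.f32", dtype=np.float32)
raw = np.zeros(5 * NB_FEATURES, dtype=np.float32)

# reshape the flat buffer into (frames, features)
nb_frames = len(raw) // NB_FEATURES
frames = raw[:nb_frames * NB_FEATURES].reshape(nb_frames, NB_FEATURES)
print(frames.shape)
```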

## Training

To perform pre-training, run the following command:
```
python ./train_fargan.py out_features.f32 out_speech.pcm output_dir --epochs 400 --batch-size 4096 --lr 0.002 --cuda-visible-devices 0
```
Once pre-training is complete, run adversarial training using:
```
python adv_train_fargan.py out_features.f32 out_speech.pcm output_dir --lr 0.000002 --reg-weight 5 --batch-size 160 --cuda-visible-devices 0 --initial-checkpoint output_dir/checkpoints/fargan_400.pth
```
The final model will be in output_dir/checkpoints/fargan_adv_50.pth.

The model can optionally be converted to C using:
```
python dump_fargan_weights.py output_dir/checkpoints/fargan_adv_50.pth fargan_c_dir
```
which will create a fargan_data.c and a fargan_data.h file in the fargan_c_dir directory.
Copy these files to the opus/dnn/ directory (replacing the existing ones) and recompile Opus.
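Both training scripts decay the learning rate per optimizer step as lr/(1 + lr_decay*step), via a LambdaLR scheduler. A minimal sketch of that schedule in plain Python (the numeric values are just illustrations):

```python
def decayed_lr(base_lr, lr_decay, step):
    """Learning rate after `step` optimizer steps, matching the
    LambdaLR(lr_lambda=lambda x: 1 / (1 + lr_decay * x)) schedule
    used by train_fargan.py and adv_train_fargan.py."""
    return base_lr / (1.0 + lr_decay * step)

print(decayed_lr(0.002, 1e-4, 0))      # initial rate, unchanged at step 0
print(decayed_lr(0.002, 1e-4, 10000))  # rate is halved once lr_decay*step == 1
```

Note that the default --lr-decay is 0.0, in which case the rate stays constant.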
## Inference

To run inference, start by generating the features from the audio using:
```
./fargan_demo -features test_speech.pcm test_features.f32
```
Synthesis can be performed using either the PyTorch code or the C code.
To synthesize from PyTorch, run:
```
python test_fargan.py output_dir/checkpoints/fargan_adv_50.pth test_features.f32 output_speech.pcm
```
To synthesize from the C code, run:
```
./fargan_demo -fargan-synthesis test_features.f32 output_speech.pcm
```
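The synthesized output is raw 16-bit, 16 kHz mono PCM with no header. A minimal sketch, using only the standard library, for wrapping it in a WAV container so ordinary players can open it (file names are examples; the demo writes a short silent signal in place of real synthesis output):

```python
import struct
import wave

def pcm_to_wav(pcm_path, wav_path, rate=16000):
    """Wrap raw 16-bit mono PCM (the fargan_demo output format) in a WAV header."""
    with open(pcm_path, "rb") as f:
        data = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 16-bit samples
        w.setframerate(rate)  # 16 kHz
        w.writeframes(data)

# demo on a short synthetic signal; with real data use "output_speech.pcm"
with open("demo.pcm", "wb") as f:
    f.write(struct.pack("<160h", *([0] * 160)))
pcm_to_wav("demo.pcm", "demo.wav")
```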
@@ -0,0 +1,278 @@
import os
import argparse
import random
import numpy as np
import sys
import math as m

import torch
from torch import nn
import torch.nn.functional as F
import tqdm

import fargan
from dataset import FARGANDataset
from stft_loss import *

source_dir = os.path.split(os.path.abspath(__file__))[0]
sys.path.append(os.path.join(source_dir, "../osce/"))

import models as osce_models


def fmap_loss(scores_real, scores_gen):
    num_discs = len(scores_real)
    loss_feat = 0
    for k in range(num_discs):
        num_layers = len(scores_gen[k]) - 1
        f = 4 / num_discs / num_layers
        for l in range(num_layers):
            loss_feat += f * F.l1_loss(scores_gen[k][l], scores_real[k][l].detach())

    return loss_feat

parser = argparse.ArgumentParser()

parser.add_argument('features', type=str, help='path to feature file in .f32 format')
parser.add_argument('signal', type=str, help='path to signal file in .s16 format')
parser.add_argument('output', type=str, help='path to output folder')

parser.add_argument('--suffix', type=str, help="model name suffix", default="")
parser.add_argument('--cuda-visible-devices', type=str, help="comma-separated list of cuda visible device indices, default: CUDA_VISIBLE_DEVICES", default=None)


model_group = parser.add_argument_group(title="model parameters")
model_group.add_argument('--cond-size', type=int, help="first conditioning size, default: 256", default=256)
model_group.add_argument('--gamma', type=float, help="use A(z/gamma), default: 0.9", default=0.9)
model_group.add_argument('--softquant', action="store_true", help="enables soft quantization during training")

training_group = parser.add_argument_group(title="training parameters")
training_group.add_argument('--batch-size', type=int, help="batch size, default: 128", default=128)
training_group.add_argument('--lr', type=float, help='learning rate, default: 5e-4', default=5e-4)
training_group.add_argument('--epochs', type=int, help='number of training epochs, default: 50', default=50)
training_group.add_argument('--sequence-length', type=int, help='sequence length, default: 60', default=60)
training_group.add_argument('--lr-decay', type=float, help='learning rate decay factor, default: 0.0', default=0.0)
training_group.add_argument('--initial-checkpoint', type=str, help='initial checkpoint to start training from, default: None', default=None)
training_group.add_argument('--reg-weight', type=float, help='regression loss weight, default: 1.0', default=1.0)
training_group.add_argument('--fmap-weight', type=float, help='feature matching loss weight, default: 1.0', default=1.)

args = parser.parse_args()

if args.cuda_visible_devices is not None:
    os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible_devices

# checkpoints
checkpoint_dir = os.path.join(args.output, 'checkpoints')
checkpoint = dict()
os.makedirs(checkpoint_dir, exist_ok=True)


# training parameters
batch_size = args.batch_size
lr = args.lr
epochs = args.epochs
sequence_length = args.sequence_length
lr_decay = args.lr_decay

adam_betas = [0.8, 0.99]
adam_eps = 1e-8
features_file = args.features
signal_file = args.signal

# model parameters
cond_size = args.cond_size


checkpoint['batch_size'] = batch_size
checkpoint['lr'] = lr
checkpoint['lr_decay'] = lr_decay
checkpoint['epochs'] = epochs
checkpoint['sequence_length'] = sequence_length
checkpoint['adam_betas'] = adam_betas


device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

checkpoint['model_args'] = ()
checkpoint['model_kwargs'] = {'cond_size': cond_size, 'gamma': args.gamma, 'softquant': args.softquant}
print(checkpoint['model_kwargs'])
model = fargan.FARGAN(*checkpoint['model_args'], **checkpoint['model_kwargs'])


# discriminator
disc_name = 'fdmresdisc'
disc = osce_models.model_dict[disc_name](
    architecture='free',
    design='f_down',
    fft_sizes_16k=[2**n for n in range(6, 12)],
    freq_roi=[0, 7400],
    max_channels=256,
    noise_gain=0.0
)

if args.initial_checkpoint is not None:
    checkpoint = torch.load(args.initial_checkpoint, map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'], strict=False)

checkpoint['state_dict'] = model.state_dict()


dataset = FARGANDataset(features_file, signal_file, sequence_length=sequence_length)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)


optimizer = torch.optim.AdamW(model.parameters(), lr=lr, betas=adam_betas, eps=adam_eps)
optimizer_disc = torch.optim.AdamW([p for p in disc.parameters() if p.requires_grad], lr=lr, betas=adam_betas, eps=adam_eps)


# learning rate scheduler
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lambda x : 1 / (1 + lr_decay * x))
scheduler_disc = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer_disc, lr_lambda=lambda x : 1 / (1 + lr_decay * x))

states = None

spect_loss = MultiResolutionSTFTLoss(device).to(device)

# freeze the generator initially so the discriminator gets a head start
for param in model.parameters():
    param.requires_grad = False

batch_count = 0
if __name__ == '__main__':
    model.to(device)
    disc.to(device)

    for epoch in range(1, epochs + 1):

        m_r = 0
        m_f = 0
        s_r = 1
        s_f = 1

        running_cont_loss = 0
        running_disc_loss = 0
        running_gen_loss = 0
        running_fmap_loss = 0
        running_reg_loss = 0
        running_wc = 0

        print(f"training epoch {epoch}...")
        with tqdm.tqdm(dataloader, unit='batch') as tepoch:
            for i, (features, periods, target, lpc) in enumerate(tepoch):
                if epoch == 1 and i == 400:
                    # unfreeze the generator, keeping the conditioning
                    # network and the gain layer frozen
                    for param in model.parameters():
                        param.requires_grad = True
                    for param in model.cond_net.parameters():
                        param.requires_grad = False
                    for param in model.sig_net.cond_gain_dense.parameters():
                        param.requires_grad = False

                optimizer.zero_grad()
                features = features.to(device)
                #lpc = lpc.to(device)
                #lpc = lpc*(args.gamma**torch.arange(1,17, device=device))
                #lpc = fargan.interp_lpc(lpc, 4)
                periods = periods.to(device)
                if True:
                    target = target[:, :sequence_length*160]
                    #lpc = lpc[:,:sequence_length*4,:]
                    features = features[:,:sequence_length+4,:]
                    periods = periods[:,:sequence_length+4]
                else:
                    target = target[::2, :]
                    #lpc = lpc[::2,:]
                    features = features[::2,:]
                    periods = periods[::2,:]
                target = target.to(device)
                #target = fargan.analysis_filter(target, lpc[:,:,:], nb_subframes=1, gamma=args.gamma)

                #nb_pre = random.randrange(1, 6)
                nb_pre = 2
                pre = target[:, :nb_pre*160]
                output, _ = model(features, periods, target.size(1)//160 - nb_pre, pre=pre, states=None)
                output = torch.cat([pre, output], -1)


                # discriminator update
                scores_gen = disc(output.detach().unsqueeze(1))
                scores_real = disc(target.unsqueeze(1))

                disc_loss = 0
                for scale in scores_gen:
                    disc_loss += ((scale[-1]) ** 2).mean()
                    m_f = 0.9 * m_f + 0.1 * scale[-1].detach().mean().cpu().item()
                    s_f = 0.9 * s_f + 0.1 * scale[-1].detach().std().cpu().item()

                for scale in scores_real:
                    disc_loss += ((1 - scale[-1]) ** 2).mean()
                    m_r = 0.9 * m_r + 0.1 * scale[-1].detach().mean().cpu().item()
                    s_r = 0.9 * s_r + 0.1 * scale[-1].detach().std().cpu().item()

                disc_loss = 0.5 * disc_loss / len(scores_gen)
                winning_chance = 0.5 * m.erfc( (m_r - m_f) / m.sqrt(2 * (s_f**2 + s_r**2)) )
                running_wc += winning_chance

                disc.zero_grad()
                disc_loss.backward()
                optimizer_disc.step()

                # model update
                scores_gen = disc(output.unsqueeze(1))
                if False: # todo: check whether that makes a difference
                    with torch.no_grad():
                        scores_real = disc(target.unsqueeze(1))

                cont_loss = fargan.sig_loss(target[:, nb_pre*160:nb_pre*160+80], output[:, nb_pre*160:nb_pre*160+80])
                specc_loss = spect_loss(output, target.detach())
                reg_loss = (.00*cont_loss + specc_loss)

                loss_gen = 0
                for scale in scores_gen:
                    loss_gen += ((1 - scale[-1]) ** 2).mean() / len(scores_gen)

                feat_loss = args.fmap_weight * fmap_loss(scores_real, scores_gen)

                reg_weight = args.reg_weight# + 15./(1 + (batch_count/7600.))
                gen_loss = reg_weight * reg_loss + feat_loss + loss_gen

                model.zero_grad()

                gen_loss.backward()
                optimizer.step()

                #model.clip_weights()

                scheduler.step()
                scheduler_disc.step()

                running_cont_loss += cont_loss.detach().cpu().item()
                running_gen_loss += loss_gen.detach().cpu().item()
                running_disc_loss += disc_loss.detach().cpu().item()
                running_fmap_loss += feat_loss.detach().cpu().item()
                running_reg_loss += reg_loss.detach().cpu().item()


                tepoch.set_postfix(cont_loss=f"{running_cont_loss/(i+1):8.5f}",
                                   reg_weight=f"{reg_weight:8.5f}",
                                   gen_loss=f"{running_gen_loss/(i+1):8.5f}",
                                   disc_loss=f"{running_disc_loss/(i+1):8.5f}",
                                   fmap_loss=f"{running_fmap_loss/(i+1):8.5f}",
                                   reg_loss=f"{running_reg_loss/(i+1):8.5f}",
                                   wc=f"{running_wc/(i+1):8.5f}",
                                   )
                batch_count = batch_count + 1

        # save checkpoint
        checkpoint_path = os.path.join(checkpoint_dir, f'fargan{args.suffix}_adv_{epoch}.pth')
        checkpoint['state_dict'] = model.state_dict()
        checkpoint['disc_state_dict'] = disc.state_dict()
        checkpoint['loss'] = {
            'cont': running_cont_loss / len(dataloader),
            'gen': running_gen_loss / len(dataloader),
            'disc': running_disc_loss / len(dataloader),
            'fmap': running_fmap_loss / len(dataloader),
            'reg': running_reg_loss / len(dataloader)
        }
        checkpoint['epoch'] = epoch
        torch.save(checkpoint, checkpoint_path)
61
managed_components/78__esp-opus/dnn/torch/fargan/dataset.py
Normal file
@@ -0,0 +1,61 @@
import torch
import numpy as np
import fargan

class FARGANDataset(torch.utils.data.Dataset):
    def __init__(self,
                 feature_file,
                 signal_file,
                 frame_size=160,
                 sequence_length=15,
                 lookahead=1,
                 nb_used_features=20,
                 nb_features=36):

        self.frame_size = frame_size
        self.sequence_length = sequence_length
        self.lookahead = lookahead
        self.nb_features = nb_features
        self.nb_used_features = nb_used_features
        pcm_chunk_size = self.frame_size*self.sequence_length

        self.data = np.memmap(signal_file, dtype='int16', mode='r')
        #self.data = self.data[1::2]
        self.nb_sequences = len(self.data)//(pcm_chunk_size)-4
        self.data = self.data[(4-self.lookahead)*self.frame_size:]
        self.data = self.data[:self.nb_sequences*pcm_chunk_size]


        #self.data = np.reshape(self.data, (self.nb_sequences, pcm_chunk_size))
        sizeof = self.data.strides[-1]
        self.data = np.lib.stride_tricks.as_strided(self.data, shape=(self.nb_sequences, pcm_chunk_size*2),
                                                    strides=(pcm_chunk_size*sizeof, sizeof))

        self.features = np.reshape(np.memmap(feature_file, dtype='float32', mode='r'), (-1, nb_features))
        sizeof = self.features.strides[-1]
        self.features = np.lib.stride_tricks.as_strided(self.features, shape=(self.nb_sequences, self.sequence_length*2+4, nb_features),
                                                        strides=(self.sequence_length*self.nb_features*sizeof, self.nb_features*sizeof, sizeof))
        #self.periods = np.round(50*self.features[:,:,self.nb_used_features-2]+100).astype('int')
        self.periods = np.round(np.clip(256./2**(self.features[:,:,self.nb_used_features-2]+1.5), 32, 255)).astype('int')

        self.lpc = self.features[:, :, self.nb_used_features:]
        self.features = self.features[:, :, :self.nb_used_features]
        print("lpc_size:", self.lpc.shape)

    def __len__(self):
        return self.nb_sequences

    def __getitem__(self, index):
        features = self.features[index, :, :].copy()
        if self.lookahead != 0:
            lpc = self.lpc[index, 4-self.lookahead:-self.lookahead, :].copy()
        else:
            lpc = self.lpc[index, 4:, :].copy()
        data = self.data[index, :].copy().astype(np.float32) / 2**15
        periods = self.periods[index, :].copy()
        #lpc = lpc*(self.gamma**np.arange(1,17))
        #lpc=lpc[None,:,:]
        #lpc = fargan.interp_lpc(lpc, 4)
        #lpc=lpc[0,:,:]

        return features, periods, data, lpc
@@ -0,0 +1,112 @@
import os
import sys
import argparse

import torch
from torch import nn


sys.path.append(os.path.join(os.path.split(__file__)[0], '../weight-exchange'))
import wexchange.torch

import fargan
#from models import model_dict

unquantized = [ 'cond_net.pembed', 'cond_net.fdense1', 'sig_net.cond_gain_dense', 'sig_net.gain_dense_out' ]

unquantized2 = [
    'cond_net.pembed',
    'cond_net.fdense1',
    'cond_net.fconv1',
    'cond_net.fconv2',
    'cont_net.0',
    'sig_net.cond_gain_dense',
    'sig_net.fwc0.conv',
    'sig_net.fwc0.glu.gate',
    'sig_net.dense1_glu.gate',
    'sig_net.gru1_glu.gate',
    'sig_net.gru2_glu.gate',
    'sig_net.gru3_glu.gate',
    'sig_net.skip_glu.gate',
    'sig_net.skip_dense',
    'sig_net.sig_dense_out',
    'sig_net.gain_dense_out'
]

description=f"""
This is an unsafe dumping script for FARGAN models. It assumes that all weights are included in Linear, Conv1d or GRU layers
and will fail to export any other weights.

Furthermore, the quantize option relies on the following explicit list of layers to be excluded:
{unquantized}.

Modify this script manually if adjustments are needed.
"""

parser = argparse.ArgumentParser(description=description)
parser.add_argument('weightfile', type=str, help='weight file path')
parser.add_argument('export_folder', type=str)
parser.add_argument('--export-filename', type=str, default='fargan_data', help='filename for source and header file (.c and .h will be added), defaults to fargan_data')
parser.add_argument('--struct-name', type=str, default='FARGAN', help='name for C struct, defaults to FARGAN')
parser.add_argument('--quantize', action='store_true', help='apply quantization')

if __name__ == "__main__":
    args = parser.parse_args()

    print(f"loading weights from {args.weightfile}...")
    saved_gen = torch.load(args.weightfile, map_location='cpu')
    saved_gen['model_args'] = ()
    saved_gen['model_kwargs'] = {'cond_size': 256, 'gamma': 0.9}

    model = fargan.FARGAN(*saved_gen['model_args'], **saved_gen['model_kwargs'])
    model.load_state_dict(saved_gen['state_dict'], strict=False)
    def _remove_weight_norm(m):
        try:
            torch.nn.utils.remove_weight_norm(m)
        except ValueError:  # this module didn't have weight norm
            return
    model.apply(_remove_weight_norm)


    print("dumping model...")
    quantize_model = args.quantize

    output_folder = args.export_folder
    os.makedirs(output_folder, exist_ok=True)

    writer = wexchange.c_export.c_writer.CWriter(os.path.join(output_folder, args.export_filename), model_struct_name=args.struct_name, add_typedef=True)

    for name, module in model.named_modules():

        if quantize_model:
            quantize = name not in unquantized
            scale = None if quantize else 1/128
        else:
            quantize = False
            scale = 1/128

        if isinstance(module, nn.Linear):
            print(f"dumping linear layer {name}...")
            wexchange.torch.dump_torch_dense_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)

        elif isinstance(module, nn.Conv1d):
            print(f"dumping conv1d layer {name}...")
            wexchange.torch.dump_torch_conv1d_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)

        elif isinstance(module, nn.GRU):
            print(f"dumping GRU layer {name}...")
            wexchange.torch.dump_torch_gru_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale, recurrent_scale=scale)

        elif isinstance(module, nn.GRUCell):
            print(f"dumping GRUCell layer {name}...")
            wexchange.torch.dump_torch_grucell_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale, recurrent_scale=scale)

        elif isinstance(module, nn.Embedding):
            print(f"dumping Embedding layer {name}...")
            wexchange.torch.dump_torch_embedding_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)
            #wexchange.torch.dump_torch_embedding_weights(writer, module)

        else:
            print(f"Ignoring layer {name}...")

    writer.close()
346
managed_components/78__esp-opus/dnn/torch/fargan/fargan.py
Normal file
@@ -0,0 +1,346 @@
import os
import sys

import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import filters
from torch.nn.utils import weight_norm
#from convert_lsp import lpc_to_lsp, lsp_to_lpc
from rc import lpc2rc, rc2lpc

source_dir = os.path.split(os.path.abspath(__file__))[0]
sys.path.append(os.path.join(source_dir, "../dnntools"))
from dnntools.quantization import soft_quant


Fs = 16000

fid_dict = {}
def dump_signal(x, filename):
    return  # debugging disabled; remove this return to dump signals to file
    if filename in fid_dict:
        fid = fid_dict[filename]
    else:
        fid = open(filename, "w")
        fid_dict[filename] = fid
    x = x.detach().numpy().astype('float32')
    x.tofile(fid)


def sig_l1(y_true, y_pred):
    return torch.mean(abs(y_true-y_pred))/torch.mean(abs(y_true))

def sig_loss(y_true, y_pred):
    t = y_true/(1e-15+torch.norm(y_true, dim=-1, p=2, keepdim=True))
    p = y_pred/(1e-15+torch.norm(y_pred, dim=-1, p=2, keepdim=True))
    return torch.mean(1.-torch.sum(p*t, dim=-1))

def interp_lpc(lpc, factor):
    #print(lpc.shape)
    #f = (np.arange(factor)+.5*((factor+1)%2))/factor
    lsp = torch.atanh(lpc2rc(lpc))
    #print("lsp0:")
    #print(lsp)
    shape = lsp.shape
    #print("shape is", shape)
    shape = (shape[0], shape[1]*factor, shape[2])
    interp_lsp = torch.zeros(shape, device=lpc.device)
    for k in range(factor):
        f = (k+.5*((factor+1)%2))/factor
        interp = (1-f)*lsp[:,:-1,:] + f*lsp[:,1:,:]
        interp_lsp[:,factor//2+k:-(factor//2):factor,:] = interp
    for k in range(factor//2):
        interp_lsp[:,k,:] = interp_lsp[:,factor//2,:]
    for k in range((factor+1)//2):
        interp_lsp[:,-k-1,:] = interp_lsp[:,-(factor+3)//2,:]
    #print("lsp:")
    #print(interp_lsp)
    return rc2lpc(torch.tanh(interp_lsp))

def analysis_filter(x, lpc, nb_subframes=4, subframe_size=40, gamma=.9):
    device = x.device
    batch_size = lpc.size(0)

    nb_frames = lpc.shape[1]


    sig = torch.zeros(batch_size, subframe_size+16, device=device)
    x = torch.reshape(x, (batch_size, nb_frames*nb_subframes, subframe_size))
    out = torch.zeros((batch_size, 0), device=device)

    #if gamma is not None:
    #    bw = gamma**(torch.arange(1, 17, device=device))
    #    lpc = lpc*bw[None,None,:]
    ones = torch.ones((*(lpc.shape[:-1]), 1), device=device)
    zeros = torch.zeros((*(lpc.shape[:-1]), subframe_size-1), device=device)
    a = torch.cat([ones, lpc], -1)
    a_big = torch.cat([a, zeros], -1)
    fir_mat_big = filters.toeplitz_from_filter(a_big)

    #print(a_big[:,0,:])
    for n in range(nb_frames):
        for k in range(nb_subframes):

            sig = torch.cat([sig[:,subframe_size:], x[:,n*nb_subframes + k, :]], 1)
            exc = torch.bmm(fir_mat_big[:,n,:,:], sig[:,:,None])
            out = torch.cat([out, exc[:,-subframe_size:,0]], 1)

    return out


# weight initialization and clipping
def init_weights(module):
    if isinstance(module, nn.GRU):
        for p in module.named_parameters():
            if p[0].startswith('weight_hh_'):
                nn.init.orthogonal_(p[1])

def gen_phase_embedding(periods, frame_size):
    device = periods.device
    batch_size = periods.size(0)
    nb_frames = periods.size(1)
    w0 = 2*torch.pi/periods
    w0_shift = torch.cat([2*torch.pi*torch.rand((batch_size, 1), device=device)/frame_size, w0[:,:-1]], 1)
    cum_phase = frame_size*torch.cumsum(w0_shift, 1)
    fine_phase = w0[:,:,None]*torch.broadcast_to(torch.arange(frame_size, device=device), (batch_size, nb_frames, frame_size))
    embed = torch.unsqueeze(cum_phase, 2) + fine_phase
    embed = torch.reshape(embed, (batch_size, -1))
    return torch.cos(embed), torch.sin(embed)

class GLU(nn.Module):
    def __init__(self, feat_size, softquant=False):
        super(GLU, self).__init__()

        torch.manual_seed(5)

        self.gate = weight_norm(nn.Linear(feat_size, feat_size, bias=False))

        if softquant:
            self.gate = soft_quant(self.gate)

        self.init_weights()

    def init_weights(self):

        for m in self.modules():
            if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d)\
            or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x):

        out = x * torch.sigmoid(self.gate(x))

        return out

class FWConv(nn.Module):
    def __init__(self, in_size, out_size, kernel_size=2, softquant=False):
        super(FWConv, self).__init__()

        torch.manual_seed(5)

        self.in_size = in_size
        self.kernel_size = kernel_size
        self.conv = weight_norm(nn.Linear(in_size*self.kernel_size, out_size, bias=False))
        self.glu = GLU(out_size, softquant=softquant)

        if softquant:
            self.conv = soft_quant(self.conv)

        self.init_weights()

    def init_weights(self):

        for m in self.modules():
            if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d)\
            or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x, state):
        xcat = torch.cat((state, x), -1)
        #print(x.shape, state.shape, xcat.shape, self.in_size, self.kernel_size)
        out = self.glu(torch.tanh(self.conv(xcat)))
        return out, xcat[:,self.in_size:]

def n(x):
    return torch.clamp(x + (1./127.)*(torch.rand_like(x)-.5), min=-1., max=1.)

class FARGANCond(nn.Module):
    def __init__(self, feature_dim=20, cond_size=256, pembed_dims=12, softquant=False):
        super(FARGANCond, self).__init__()

        self.feature_dim = feature_dim
        self.cond_size = cond_size

        self.pembed = nn.Embedding(224, pembed_dims)
        self.fdense1 = nn.Linear(self.feature_dim + pembed_dims, 64, bias=False)
        self.fconv1 = nn.Conv1d(64, 128, kernel_size=3, padding='valid', bias=False)
        self.fdense2 = nn.Linear(128, 80*4, bias=False)

        if softquant:
            self.fconv1 = soft_quant(self.fconv1)
            self.fdense2 = soft_quant(self.fdense2)

        self.apply(init_weights)
        nb_params = sum(p.numel() for p in self.parameters())
        print(f"cond model: {nb_params} weights")

    def forward(self, features, period):
        features = features[:,2:,:]
        period = period[:,2:]
        p = self.pembed(period-32)
        features = torch.cat((features, p), -1)
        tmp = torch.tanh(self.fdense1(features))
        tmp = tmp.permute(0, 2, 1)
        tmp = torch.tanh(self.fconv1(tmp))
        tmp = tmp.permute(0, 2, 1)
        tmp = torch.tanh(self.fdense2(tmp))
        #tmp = torch.tanh(self.fdense2(tmp))
        return tmp

class FARGANSub(nn.Module):
    def __init__(self, subframe_size=40, nb_subframes=4, cond_size=256, softquant=False):
        super(FARGANSub, self).__init__()

        self.subframe_size = subframe_size
        self.nb_subframes = nb_subframes
        self.cond_size = cond_size
        self.cond_gain_dense = nn.Linear(80, 1)

        #self.sig_dense1 = nn.Linear(4*self.subframe_size+self.passthrough_size+self.cond_size, self.cond_size, bias=False)
        self.fwc0 = FWConv(2*self.subframe_size+80+4, 192, softquant=softquant)
        self.gru1 = nn.GRUCell(192+2*self.subframe_size, 160, bias=False)
        self.gru2 = nn.GRUCell(160+2*self.subframe_size, 128, bias=False)
        self.gru3 = nn.GRUCell(128+2*self.subframe_size, 128, bias=False)

        self.gru1_glu = GLU(160, softquant=softquant)
        self.gru2_glu = GLU(128, softquant=softquant)
        self.gru3_glu = GLU(128, softquant=softquant)
        self.skip_glu = GLU(128, softquant=softquant)
        #self.ptaps_dense = nn.Linear(4*self.cond_size, 5)

        self.skip_dense = nn.Linear(192+160+2*128+2*self.subframe_size, 128, bias=False)
        self.sig_dense_out = nn.Linear(128, self.subframe_size, bias=False)
        self.gain_dense_out = nn.Linear(192, 4)

        if softquant:
            self.gru1 = soft_quant(self.gru1, names=['weight_hh', 'weight_ih'])
            self.gru2 = soft_quant(self.gru2, names=['weight_hh', 'weight_ih'])
            self.gru3 = soft_quant(self.gru3, names=['weight_hh', 'weight_ih'])
            self.skip_dense = soft_quant(self.skip_dense)
            self.sig_dense_out = soft_quant(self.sig_dense_out)

        self.apply(init_weights)
        nb_params = sum(p.numel() for p in self.parameters())
        print(f"subframe model: {nb_params} weights")

    def forward(self, cond, prev_pred, exc_mem, period, states, gain=None):
        device = exc_mem.device
        #print(cond.shape, prev.shape)

        cond = n(cond)
        dump_signal(gain, 'gain0.f32')
        gain = torch.exp(self.cond_gain_dense(cond))
        dump_signal(gain, 'gain1.f32')
        idx = 256-period[:,None]
        rng = torch.arange(self.subframe_size+4, device=device)
        idx = idx + rng[None,:] - 2
        mask = idx >= 256
        idx = idx - mask*period[:,None]
        pred = torch.gather(exc_mem, 1, idx)
        pred = n(pred/(1e-5+gain))

        prev = exc_mem[:,-self.subframe_size:]
        dump_signal(prev, 'prev_in.f32')
        prev = n(prev/(1e-5+gain))
        dump_signal(prev, 'pitch_exc.f32')
        dump_signal(exc_mem, 'exc_mem.f32')

        tmp = torch.cat((cond, pred, prev), 1)
        #fpitch = taps[:,0:1]*pred[:,:-4] + taps[:,1:2]*pred[:,1:-3] + taps[:,2:3]*pred[:,2:-2] + taps[:,3:4]*pred[:,3:-1] + taps[:,4:]*pred[:,4:]
        fpitch = pred[:,2:-2]

        #tmp = self.dense1_glu(torch.tanh(self.sig_dense1(tmp)))
        fwc0_out, fwc0_state = self.fwc0(tmp, states[3])
        fwc0_out = n(fwc0_out)
        pitch_gain = torch.sigmoid(self.gain_dense_out(fwc0_out))

        gru1_state = self.gru1(torch.cat([fwc0_out, pitch_gain[:,0:1]*fpitch, prev], 1), states[0])
        gru1_out = self.gru1_glu(n(gru1_state))
        gru1_out = n(gru1_out)
        gru2_state = self.gru2(torch.cat([gru1_out, pitch_gain[:,1:2]*fpitch, prev], 1), states[1])
        gru2_out = self.gru2_glu(n(gru2_state))
        gru2_out = n(gru2_out)
        gru3_state = self.gru3(torch.cat([gru2_out, pitch_gain[:,2:3]*fpitch, prev], 1), states[2])
        gru3_out = self.gru3_glu(n(gru3_state))
        gru3_out = n(gru3_out)
        gru3_out = torch.cat([gru1_out, gru2_out, gru3_out, fwc0_out], 1)
        skip_out = torch.tanh(self.skip_dense(torch.cat([gru3_out, pitch_gain[:,3:4]*fpitch, prev], 1)))
        skip_out = self.skip_glu(n(skip_out))
        sig_out = torch.tanh(self.sig_dense_out(skip_out))
        dump_signal(sig_out, 'exc_out.f32')
        #taps = self.ptaps_dense(gru3_out)
        #taps = .2*taps + torch.exp(taps)
        #taps = taps / (1e-2 + torch.sum(torch.abs(taps), dim=-1, keepdim=True))
        #dump_signal(taps, 'taps.f32')

        dump_signal(pitch_gain, 'pgain.f32')
        #sig_out = (sig_out + pitch_gain*fpitch) * gain
        sig_out = sig_out * gain
        exc_mem = torch.cat([exc_mem[:,self.subframe_size:], sig_out], 1)
        prev_pred = torch.cat([prev_pred[:,self.subframe_size:], fpitch], 1)
        dump_signal(sig_out, 'sig_out.f32')
        return sig_out, exc_mem, prev_pred, (gru1_state, gru2_state, gru3_state, fwc0_state)

class FARGAN(nn.Module):
    def __init__(self, subframe_size=40, nb_subframes=4, feature_dim=20, cond_size=256, passthrough_size=0, has_gain=False, gamma=None, softquant=False):
        super(FARGAN, self).__init__()

        self.subframe_size = subframe_size
        self.nb_subframes = nb_subframes
        self.frame_size = self.subframe_size*self.nb_subframes
        self.feature_dim = feature_dim
        self.cond_size = cond_size

        self.cond_net = FARGANCond(feature_dim=feature_dim, cond_size=cond_size, softquant=softquant)
        self.sig_net = FARGANSub(subframe_size=subframe_size, nb_subframes=nb_subframes, cond_size=cond_size, softquant=softquant)

    def forward(self, features, period, nb_frames, pre=None, states=None):
        device = features.device
        batch_size = features.size(0)

        prev = torch.zeros(batch_size, 256, device=device)
        exc_mem = torch.zeros(batch_size, 256, device=device)
        nb_pre_frames = pre.size(1)//self.frame_size if pre is not None else 0

        states = (
            torch.zeros(batch_size, 160, device=device),
            torch.zeros(batch_size, 128, device=device),
            torch.zeros(batch_size, 128, device=device),
            torch.zeros(batch_size, (2*self.subframe_size+80+4)*1, device=device)
        )

        sig = torch.zeros((batch_size, 0), device=device)
        cond = self.cond_net(features, period)
        if pre is not None:
            exc_mem[:,-self.frame_size:] = pre[:, :self.frame_size]
        start = 1 if nb_pre_frames>0 else 0
|
||||
for n in range(start, nb_frames+nb_pre_frames):
|
||||
for k in range(self.nb_subframes):
|
||||
pos = n*self.frame_size + k*self.subframe_size
|
||||
#print("now: ", preal.shape, prev.shape, sig_in.shape)
|
||||
pitch = period[:, 3+n]
|
||||
gain = .03*10**(0.5*features[:, 3+n, 0:1]/np.sqrt(18.0))
|
||||
#gain = gain[:,:,None]
|
||||
out, exc_mem, prev, states = self.sig_net(cond[:, n, k*80:(k+1)*80], prev, exc_mem, pitch, states, gain=gain)
|
||||
|
||||
if n < nb_pre_frames:
|
||||
out = pre[:, pos:pos+self.subframe_size]
|
||||
exc_mem[:,-self.subframe_size:] = out
|
||||
else:
|
||||
sig = torch.cat([sig, out], 1)
|
||||
|
||||
states = [s.detach() for s in states]
|
||||
return sig, states
|
||||
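The pitch-prediction gather above starts one period back in the excitation buffer (`idx = 256 - period`), adds a ±2-sample context around the subframe, and wraps any index that runs past the buffer end by subtracting one extra period. A minimal NumPy sketch of the same indexing; `pitch_window` is an illustrative name, not part of the model:

```python
import numpy as np

def pitch_window(exc_mem, period, subframe_size=40):
    # Gather subframe_size+4 samples starting one pitch period back
    # (the extra 4 samples are the +/-2 context); indices that run past
    # the 256-sample buffer wrap back by one more period, as in FARGANSub.
    idx = 256 - period + np.arange(subframe_size + 4) - 2
    idx = np.where(idx >= 256, idx - period, idx)
    return exc_mem[idx]

# With exc_mem[i] == i the gathered values equal the indices themselves:
w = pitch_window(np.arange(256.0), period=30)
print(len(w), w[0], w[-1])
```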
46
managed_components/78__esp-opus/dnn/torch/fargan/filters.py
Normal file
@@ -0,0 +1,46 @@
import torch
from torch import nn
import torch.nn.functional as F
import math

def toeplitz_from_filter(a):
    device = a.device
    L = a.size(-1)
    size0 = (*(a.shape[:-1]), L, L+1)
    size = (*(a.shape[:-1]), L, L)
    rnge = torch.arange(0, L, dtype=torch.int64, device=device)
    z = torch.tensor(0, device=device)
    idx = torch.maximum(rnge[:,None] - rnge[None,:] + 1, z)
    a = torch.cat([a[...,:1]*0, a], -1)
    #print(a)
    a = a[...,None,:]
    #print(idx)
    a = torch.broadcast_to(a, size0)
    idx = torch.broadcast_to(idx, size)
    #print(idx)
    return torch.gather(a, -1, idx)

def filter_iir_response(a, N):
    device = a.device
    L = a.size(-1)
    ar = a.flip(dims=(2,))
    size = (*(a.shape[:-1]), N)
    R = torch.zeros(size, device=device)
    R[:,:,0] = torch.ones((a.shape[:-1]), device=device)
    for i in range(1, L):
        R[:,:,i] = - torch.sum(ar[:,:,L-i-1:-1] * R[:,:,:i], axis=-1)
        #R[:,:,i] = - torch.einsum('ijk,ijk->ij', ar[:,:,L-i-1:-1], R[:,:,:i])
    for i in range(L, N):
        R[:,:,i] = - torch.sum(ar[:,:,:-1] * R[:,:,i-L+1:i], axis=-1)
        #R[:,:,i] = - torch.einsum('ijk,ijk->ij', ar[:,:,:-1], R[:,:,i-L+1:i])
    return R

if __name__ == '__main__':
    #a = torch.tensor([ [[1, -.9, 0.02], [1, -.8, .01]], [[1, .9, 0], [1, .8, 0]]])
    a = torch.tensor([ [[1, -.9, 0.02], [1, -.8, .01]]])
    A = toeplitz_from_filter(a)
    #print(A)
    R = filter_iir_response(a, 5)

    RA = toeplitz_from_filter(R)
    print(RA)
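`filter_iir_response` computes the truncated impulse response of the all-pole filter 1/A(z) by the standard recursion h[n] = δ[n] − Σₖ a[k]·h[n−k]. The same recursion can be sanity-checked in plain Python without torch; `iir_impulse_response` is an illustrative helper, not part of the file above:

```python
def iir_impulse_response(a, N):
    # a = [1, a1, ..., aL]; returns the first N samples of the impulse
    # response of 1/A(z): h[n] = delta[n] - sum_k a[k] * h[n-k]
    h = [0.0] * N
    h[0] = 1.0
    for n in range(1, N):
        acc = 0.0
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * h[n - k]
        h[n] = acc
    return h

h = iir_impulse_response([1.0, -0.9], 5)
print(h)  # approximately [1.0, 0.9, 0.81, 0.729, 0.6561]
```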
29
managed_components/78__esp-opus/dnn/torch/fargan/rc.py
Normal file
@@ -0,0 +1,29 @@
import torch


def rc2lpc(rc):
    order = rc.shape[-1]
    lpc = rc[...,0:1]
    for i in range(1, order):
        lpc = torch.cat([lpc + rc[...,i:i+1]*torch.flip(lpc,dims=(-1,)), rc[...,i:i+1]], -1)
        #print("to:", lpc)
    return lpc

def lpc2rc(lpc):
    order = lpc.shape[-1]
    rc = lpc[...,-1:]
    for i in range(order-1, 0, -1):
        ki = lpc[...,-1:]
        lpc = lpc[...,:-1]
        lpc = (lpc - ki*torch.flip(lpc,dims=(-1,)))/(1 - ki*ki)
        rc = torch.cat([lpc[...,-1:], rc], -1)
    return rc

if __name__ == "__main__":
    rc = torch.tensor([[.5, -.5, .6, -.6]])
    print(rc)
    lpc = rc2lpc(rc)
    print(lpc)
    rc2 = lpc2rc(lpc)
    print(rc2)
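`rc2lpc` and `lpc2rc` implement the Levinson step-up and step-down recursions, which are exact inverses for reflection coefficients with |k| < 1. A plain-Python sketch of the same recursions, useful for checking the round trip without torch (the names mirror the file above, but this is an illustration, not the shipped code):

```python
def rc2lpc(rc):
    # Levinson step-up: reflection coefficients -> LPC coefficients
    lpc = [rc[0]]
    for k in rc[1:]:
        lpc = [a + k * b for a, b in zip(lpc, reversed(lpc))] + [k]
    return lpc

def lpc2rc(lpc):
    # Levinson step-down: the last LPC coefficient at each order is the
    # reflection coefficient; peel it off and downdate.
    lpc = list(lpc)
    rc = [lpc[-1]]
    while len(lpc) > 1:
        k = lpc[-1]
        lpc = lpc[:-1]
        lpc = [(a - k * b) / (1 - k * k) for a, b in zip(lpc, reversed(lpc))]
        rc.insert(0, lpc[-1])
    return rc

rc = [0.5, -0.5, 0.6, -0.6]
print(lpc2rc(rc2lpc(rc)))  # round-trips back to rc (up to float rounding)
```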
186
managed_components/78__esp-opus/dnn/torch/fargan/stft_loss.py
Normal file
@@ -0,0 +1,186 @@
"""STFT-based Loss modules."""

import torch
import torch.nn.functional as F
import numpy as np
import torchaudio


def stft(x, fft_size, hop_size, win_length, window):
    """Perform STFT and convert to magnitude spectrogram.
    Args:
        x (Tensor): Input signal tensor (B, T).
        fft_size (int): FFT size.
        hop_size (int): Hop size.
        win_length (int): Window length.
        window (str): Window function type.
    Returns:
        Tensor: Magnitude spectrogram (B, #frames, fft_size // 2 + 1).
    """
    #x_stft = torch.stft(x, fft_size, hop_size, win_length, window, return_complex=False)
    #real = x_stft[..., 0]
    #imag = x_stft[..., 1]

    # (kan-bayashi): clamp is needed to avoid nan or inf
    #return torchaudio.functional.amplitude_to_DB(torch.abs(x_stft),db_multiplier=0.0, multiplier=20,amin=1e-05,top_db=80)
    #return torch.clamp(torch.abs(x_stft), min=1e-7)

    x_stft = torch.stft(x, fft_size, hop_size, win_length, window, return_complex=True)
    return torch.clamp(torch.abs(x_stft), min=1e-7)

class SpectralConvergenceLoss(torch.nn.Module):
    """Spectral convergence loss module."""

    def __init__(self):
        """Initialize spectral convergence loss module."""
        super(SpectralConvergenceLoss, self).__init__()

    def forward(self, x_mag, y_mag):
        """Calculate forward propagation.
        Args:
            x_mag (Tensor): Magnitude spectrogram of predicted signal (B, #frames, #freq_bins).
            y_mag (Tensor): Magnitude spectrogram of groundtruth signal (B, #frames, #freq_bins).
        Returns:
            Tensor: Spectral convergence loss value.
        """
        x_mag = torch.sqrt(x_mag)
        y_mag = torch.sqrt(y_mag)
        return torch.norm(y_mag - x_mag, p=1) / torch.norm(y_mag, p=1)

class LogSTFTMagnitudeLoss(torch.nn.Module):
    """Log STFT magnitude loss module."""

    def __init__(self):
        """Initialize log STFT magnitude loss module."""
        super(LogSTFTMagnitudeLoss, self).__init__()

    def forward(self, x, y):
        """Calculate forward propagation.
        Args:
            x (Tensor): Magnitude spectrogram of predicted signal (B, #frames, #freq_bins).
            y (Tensor): Magnitude spectrogram of groundtruth signal (B, #frames, #freq_bins).
        Returns:
            Tensor: Log STFT magnitude loss value.
        """
        #F.l1_loss(torch.sqrt(y_mag), torch.sqrt(x_mag)) +
        #F.l1_loss(torchaudio.functional.amplitude_to_DB(y_mag,db_multiplier=0.0, multiplier=20,amin=1e-05,top_db=80),\
        #torchaudio.functional.amplitude_to_DB(x_mag,db_multiplier=0.0, multiplier=20,amin=1e-05,top_db=80))

        #y_mag[:,:y_mag.size(1)//2,:] = y_mag[:,:y_mag.size(1)//2,:] *0.0

        #return F.l1_loss(torch.log(y_mag) + torch.sqrt(y_mag), torch.log(x_mag) + torch.sqrt(x_mag))

        #return F.l1_loss(y_mag, x_mag)

        error_loss = F.l1_loss(y, x) #+ F.l1_loss(torch.sqrt(y), torch.sqrt(x))#F.l1_loss(torch.log(y), torch.log(x))#

        #x = torch.log(x)
        #y = torch.log(y)
        #x = x.permute(0,2,1).contiguous()
        #y = y.permute(0,2,1).contiguous()

        '''mean_x = torch.mean(x, dim=1, keepdim=True)
        mean_y = torch.mean(y, dim=1, keepdim=True)

        var_x = torch.var(x, dim=1, keepdim=True)
        var_y = torch.var(y, dim=1, keepdim=True)

        std_x = torch.std(x, dim=1, keepdim=True)
        std_y = torch.std(y, dim=1, keepdim=True)

        x_minus_mean = x - mean_x
        y_minus_mean = y - mean_y

        pearson_corr = torch.sum(x_minus_mean * y_minus_mean, dim=1, keepdim=True) / \
            (torch.sqrt(torch.sum(x_minus_mean ** 2, dim=1, keepdim=True) + 1e-7) * \
             torch.sqrt(torch.sum(y_minus_mean ** 2, dim=1, keepdim=True) + 1e-7))

        numerator = 2.0 * pearson_corr * std_x * std_y
        denominator = var_x + var_y + (mean_y - mean_x)**2

        ccc = numerator/denominator

        ccc_loss = F.l1_loss(1.0 - ccc, torch.zeros_like(ccc))'''

        return error_loss #+ ccc_loss#+ ccc_loss


class STFTLoss(torch.nn.Module):
    """STFT loss module."""

    def __init__(self, device, fft_size=1024, shift_size=120, win_length=600, window="hann_window"):
        """Initialize STFT loss module."""
        super(STFTLoss, self).__init__()
        self.fft_size = fft_size
        self.shift_size = shift_size
        self.win_length = win_length
        self.window = getattr(torch, window)(win_length).to(device)
        self.spectral_convergence_loss = SpectralConvergenceLoss()
        self.log_stft_magnitude_loss = LogSTFTMagnitudeLoss()

    def forward(self, x, y):
        """Calculate forward propagation.
        Args:
            x (Tensor): Predicted signal (B, T).
            y (Tensor): Groundtruth signal (B, T).
        Returns:
            Tensor: Spectral convergence loss value.
            Tensor: Log STFT magnitude loss value.
        """
        x_mag = stft(x, self.fft_size, self.shift_size, self.win_length, self.window)
        y_mag = stft(y, self.fft_size, self.shift_size, self.win_length, self.window)
        sc_loss = self.spectral_convergence_loss(x_mag, y_mag)
        mag_loss = self.log_stft_magnitude_loss(x_mag, y_mag)

        return sc_loss, mag_loss


class MultiResolutionSTFTLoss(torch.nn.Module):

    '''def __init__(self,
                 device,
                 fft_sizes=[2048, 1024, 512, 256, 128, 64],
                 hop_sizes=[512, 256, 128, 64, 32, 16],
                 win_lengths=[2048, 1024, 512, 256, 128, 64],
                 window="hann_window"):'''

    '''def __init__(self,
                 device,
                 fft_sizes=[2048, 1024, 512, 256, 128, 64],
                 hop_sizes=[256, 128, 64, 32, 16, 8],
                 win_lengths=[1024, 512, 256, 128, 64, 32],
                 window="hann_window"):'''

    def __init__(self,
                 device,
                 fft_sizes=[2560, 1280, 640, 320, 160, 80],
                 hop_sizes=[640, 320, 160, 80, 40, 20],
                 win_lengths=[2560, 1280, 640, 320, 160, 80],
                 window="hann_window"):

        super(MultiResolutionSTFTLoss, self).__init__()
        assert len(fft_sizes) == len(hop_sizes) == len(win_lengths)
        self.stft_losses = torch.nn.ModuleList()
        for fs, ss, wl in zip(fft_sizes, hop_sizes, win_lengths):
            self.stft_losses += [STFTLoss(device, fs, ss, wl, window)]

    def forward(self, x, y):
        """Calculate forward propagation.
        Args:
            x (Tensor): Predicted signal (B, T).
            y (Tensor): Groundtruth signal (B, T).
        Returns:
            Tensor: Multi resolution spectral convergence loss value.
            Tensor: Multi resolution log STFT magnitude loss value.
        """
        sc_loss = 0.0
        mag_loss = 0.0
        for f in self.stft_losses:
            sc_l, mag_l = f(x, y)
            sc_loss += sc_l
            #mag_loss += mag_l
        sc_loss /= len(self.stft_losses)
        mag_loss /= len(self.stft_losses)

        return sc_loss #mag_loss #+
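`SpectralConvergenceLoss` above first square-root-compresses both magnitude spectrograms, then takes the L1 norm of the difference relative to the L1 norm of the target. The same quantity in a NumPy sketch (assuming NumPy in place of torch; `spectral_convergence` is an illustrative name, not part of the module):

```python
import numpy as np

def spectral_convergence(x_mag, y_mag):
    # L1 ratio on square-root-compressed magnitudes, mirroring
    # SpectralConvergenceLoss.forward above.
    x = np.sqrt(x_mag)
    y = np.sqrt(y_mag)
    return np.abs(y - x).sum() / np.abs(y).sum()

# Target magnitudes 4x the prediction: sqrt gives 2 vs 1, ratio |2-1|/2 = 0.5
print(spectral_convergence(np.ones((2, 3)), 4 * np.ones((2, 3))))  # 0.5
```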
128
managed_components/78__esp-opus/dnn/torch/fargan/test_fargan.py
Normal file
@@ -0,0 +1,128 @@
import os
import argparse
import numpy as np

import torch
from torch import nn
import torch.nn.functional as F
import tqdm

import fargan
from dataset import FARGANDataset

nb_features = 36
nb_used_features = 20

parser = argparse.ArgumentParser()

parser.add_argument('model', type=str, help='FARGAN model')
parser.add_argument('features', type=str, help='path to feature file in .f32 format')
parser.add_argument('output', type=str, help='path to output file (16-bit PCM)')

parser.add_argument('--cuda-visible-devices', type=str, help="comma-separated list of cuda visible device indices, default: CUDA_VISIBLE_DEVICES", default=None)


model_group = parser.add_argument_group(title="model parameters")
model_group.add_argument('--cond-size', type=int, help="first conditioning size, default: 256", default=256)

args = parser.parse_args()

if args.cuda_visible_devices is not None:
    os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible_devices


features_file = args.features
signal_file = args.output


device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

checkpoint = torch.load(args.model, map_location='cpu')

model = fargan.FARGAN(*checkpoint['model_args'], **checkpoint['model_kwargs'])

model.load_state_dict(checkpoint['state_dict'], strict=False)

features = np.reshape(np.memmap(features_file, dtype='float32', mode='r'), (1, -1, nb_features))
lpc = features[:,4-1:-1,nb_used_features:]
features = features[:, :, :nb_used_features]
#periods = np.round(50*features[:,:,nb_used_features-2]+100).astype('int')
periods = np.round(np.clip(256./2**(features[:,:,nb_used_features-2]+1.5), 32, 255)).astype('int')


nb_frames = features.shape[1]
#nb_frames = 1000
gamma = checkpoint['model_kwargs']['gamma']

def lpc_synthesis_one_frame(frame, filt, buffer, weighting_vector=np.ones(16)):
    out = np.zeros_like(frame)
    filt = np.flip(filt)
    inp = frame[:]
    for i in range(0, inp.shape[0]):
        s = inp[i] - np.dot(buffer*weighting_vector, filt)
        buffer[0] = s
        buffer = np.roll(buffer, -1)
        out[i] = s
    return out

def inverse_perceptual_weighting(pw_signal, filters, weighting_vector):
    # inverse perceptual weighting = H_preemph / W(z/gamma)
    signal = np.zeros_like(pw_signal)
    buffer = np.zeros(16)
    num_frames = pw_signal.shape[0] // 160
    assert num_frames == filters.shape[0]
    for frame_idx in range(0, num_frames):
        in_frame = pw_signal[frame_idx*160: (frame_idx+1)*160][:]
        out_sig_frame = lpc_synthesis_one_frame(in_frame, filters[frame_idx, :], buffer, weighting_vector)
        signal[frame_idx*160: (frame_idx+1)*160] = out_sig_frame[:]
        buffer[:] = out_sig_frame[-16:]
    return signal

def inverse_perceptual_weighting40(pw_signal, filters):
    # inverse perceptual weighting = H_preemph / W(z/gamma)
    signal = np.zeros_like(pw_signal)
    buffer = np.zeros(16)
    num_frames = pw_signal.shape[0] // 40
    assert num_frames == filters.shape[0]
    for frame_idx in range(0, num_frames):
        in_frame = pw_signal[frame_idx*40: (frame_idx+1)*40][:]
        out_sig_frame = lpc_synthesis_one_frame(in_frame, filters[frame_idx, :], buffer)
        signal[frame_idx*40: (frame_idx+1)*40] = out_sig_frame[:]
        buffer[:] = out_sig_frame[-16:]
    return signal

from scipy.signal import lfilter

if __name__ == '__main__':
    model.to(device)
    features = torch.tensor(features).to(device)
    #lpc = torch.tensor(lpc).to(device)
    periods = torch.tensor(periods).to(device)
    weighting = gamma**np.arange(1, 17)
    lpc = lpc*weighting
    lpc = fargan.interp_lpc(torch.tensor(lpc), 4).numpy()

    sig, _ = model(features, periods, nb_frames - 4)
    #weighting_vector = np.array([gamma**i for i in range(16,0,-1)])
    sig = sig.detach().numpy().flatten()
    sig = lfilter(np.array([1.]), np.array([1., -.85]), sig)
    #sig = inverse_perceptual_weighting40(sig, lpc[0,:,:])

    pcm = np.round(32768*np.clip(sig, a_max=.99, a_min=-.99)).astype('int16')
    pcm.tofile(signal_file)
169
managed_components/78__esp-opus/dnn/torch/fargan/train_fargan.py
Normal file
@@ -0,0 +1,169 @@
import os
import argparse
import random
import numpy as np

import torch
from torch import nn
import torch.nn.functional as F
import tqdm

import fargan
from dataset import FARGANDataset
from stft_loss import *

parser = argparse.ArgumentParser()

parser.add_argument('features', type=str, help='path to feature file in .f32 format')
parser.add_argument('signal', type=str, help='path to signal file in .s16 format')
parser.add_argument('output', type=str, help='path to output folder')

parser.add_argument('--suffix', type=str, help="model name suffix", default="")
parser.add_argument('--cuda-visible-devices', type=str, help="comma-separated list of cuda visible device indices, default: CUDA_VISIBLE_DEVICES", default=None)


model_group = parser.add_argument_group(title="model parameters")
model_group.add_argument('--cond-size', type=int, help="first conditioning size, default: 256", default=256)
model_group.add_argument('--gamma', type=float, help="Use A(z/gamma), default: 0.9", default=0.9)
model_group.add_argument('--softquant', action="store_true", help="enables soft quantization during training")

training_group = parser.add_argument_group(title="training parameters")
training_group.add_argument('--batch-size', type=int, help="batch size, default: 512", default=512)
training_group.add_argument('--lr', type=float, help='learning rate, default: 1e-3', default=1e-3)
training_group.add_argument('--epochs', type=int, help='number of training epochs, default: 20', default=20)
training_group.add_argument('--sequence-length', type=int, help='sequence length, default: 15', default=15)
training_group.add_argument('--lr-decay', type=float, help='learning rate decay factor, default: 1e-4', default=1e-4)
training_group.add_argument('--initial-checkpoint', type=str, help='initial checkpoint to start training from, default: None', default=None)

args = parser.parse_args()

if args.cuda_visible_devices is not None:
    os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible_devices

# checkpoints
checkpoint_dir = os.path.join(args.output, 'checkpoints')
checkpoint = dict()
os.makedirs(checkpoint_dir, exist_ok=True)


# training parameters
batch_size = args.batch_size
lr = args.lr
epochs = args.epochs
sequence_length = args.sequence_length
lr_decay = args.lr_decay

adam_betas = [0.8, 0.95]
adam_eps = 1e-8
features_file = args.features
signal_file = args.signal

# model parameters
cond_size = args.cond_size


checkpoint['batch_size'] = batch_size
checkpoint['lr'] = lr
checkpoint['lr_decay'] = lr_decay
checkpoint['epochs'] = epochs
checkpoint['sequence_length'] = sequence_length
checkpoint['adam_betas'] = adam_betas


device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

checkpoint['model_args'] = ()
checkpoint['model_kwargs'] = {'cond_size': cond_size, 'gamma': args.gamma, 'softquant': args.softquant}
print(checkpoint['model_kwargs'])
model = fargan.FARGAN(*checkpoint['model_args'], **checkpoint['model_kwargs'])

#model = fargan.FARGAN()
#model = nn.DataParallel(model)

if args.initial_checkpoint is not None:
    checkpoint = torch.load(args.initial_checkpoint, map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'], strict=False)

checkpoint['state_dict'] = model.state_dict()


dataset = FARGANDataset(features_file, signal_file, sequence_length=sequence_length)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)


optimizer = torch.optim.AdamW(model.parameters(), lr=lr, betas=adam_betas, eps=adam_eps)


# learning rate scheduler
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lambda x : 1 / (1 + lr_decay * x))

states = None

spect_loss = MultiResolutionSTFTLoss(device).to(device)

if __name__ == '__main__':
    model.to(device)

    for epoch in range(1, epochs + 1):

        running_specc = 0
        running_cont_loss = 0
        running_loss = 0

        print(f"training epoch {epoch}...")
        with tqdm.tqdm(dataloader, unit='batch') as tepoch:
            for i, (features, periods, target, lpc) in enumerate(tepoch):
                optimizer.zero_grad()
                features = features.to(device)
                #lpc = torch.tensor(fargan.interp_lpc(lpc.numpy(), 4))
                #print("interp size", lpc.shape)
                #lpc = lpc.to(device)
                #lpc = lpc*(args.gamma**torch.arange(1,17, device=device))
                #lpc = fargan.interp_lpc(lpc, 4)
                periods = periods.to(device)
                if (np.random.rand() > 0.1):
                    target = target[:, :sequence_length*160]
                    #lpc = lpc[:,:sequence_length*4,:]
                    features = features[:,:sequence_length+4,:]
                    periods = periods[:,:sequence_length+4]
                else:
                    target = target[::2, :]
                    #lpc = lpc[::2,:]
                    features = features[::2,:]
                    periods = periods[::2,:]
                target = target.to(device)
                #print(target.shape, lpc.shape)
                #target = fargan.analysis_filter(target, lpc[:,:,:], nb_subframes=1, gamma=args.gamma)

                #nb_pre = random.randrange(1, 6)
                nb_pre = 2
                pre = target[:, :nb_pre*160]
                sig, states = model(features, periods, target.size(1)//160 - nb_pre, pre=pre, states=None)
                sig = torch.cat([pre, sig], -1)

                cont_loss = fargan.sig_loss(target[:, nb_pre*160:nb_pre*160+160], sig[:, nb_pre*160:nb_pre*160+160])
                specc_loss = spect_loss(sig, target.detach())
                loss = .03*cont_loss + specc_loss

                loss.backward()
                optimizer.step()

                #model.clip_weights()

                scheduler.step()

                running_specc += specc_loss.detach().cpu().item()
                running_cont_loss += cont_loss.detach().cpu().item()

                running_loss += loss.detach().cpu().item()
                tepoch.set_postfix(loss=f"{running_loss/(i+1):8.5f}",
                                   cont_loss=f"{running_cont_loss/(i+1):8.5f}",
                                   specc=f"{running_specc/(i+1):8.5f}")

        # save checkpoint
        checkpoint_path = os.path.join(checkpoint_dir, f'fargan{args.suffix}_{epoch}.pth')
        checkpoint['state_dict'] = model.state_dict()
        checkpoint['loss'] = running_loss / len(dataloader)
        checkpoint['epoch'] = epoch
        torch.save(checkpoint, checkpoint_path)
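The `LambdaLR` scheduler above scales the base learning rate by 1/(1 + lr_decay·step), applied once per optimizer step. A sketch of the resulting schedule (`lr_at_step` is a hypothetical helper for illustration):

```python
def lr_at_step(base_lr, lr_decay, step):
    # Matches the LambdaLR lambda above: lr = base_lr / (1 + lr_decay * step)
    return base_lr / (1.0 + lr_decay * step)

# With the script defaults (lr=1e-3, lr_decay=1e-4), the learning rate
# halves after 10000 steps and falls to a third after 20000.
print(lr_at_step(1e-3, 1e-4, 0))      # base rate at step 0
print(lr_at_step(1e-3, 1e-4, 10000))  # roughly half the base rate
```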
@@ -0,0 +1,88 @@
import os
import sys
import argparse

import torch
from torch import nn


sys.path.append(os.path.join(os.path.split(__file__)[0], '../weight-exchange'))
import wexchange.torch

from models import model_dict

unquantized = [
    'bfcc_with_corr_upsampler.fc',
    'cont_net.0',
    'fwc6.cont_fc.0',
    'fwc6.fc.0',
    'fwc6.fc.1.gate',
    'fwc7.cont_fc.0',
    'fwc7.fc.0',
    'fwc7.fc.1.gate'
]

description=f"""
This is an unsafe dumping script for FWGAN models. It assumes that all weights are included in Linear, Conv1d or GRU layers
and will fail to export any other weights.

Furthermore, the quantize option relies on the following explicit list of layers to be excluded:
{unquantized}.

Modify this script manually if adjustments are needed.
"""

parser = argparse.ArgumentParser(description=description)
parser.add_argument('model', choices=['fwgan400', 'fwgan500'], help='model name')
parser.add_argument('weightfile', type=str, help='weight file path')
parser.add_argument('export_folder', type=str)
parser.add_argument('--export-filename', type=str, default='fwgan_data', help='filename for source and header file (.c and .h will be added), defaults to fwgan_data')
parser.add_argument('--struct-name', type=str, default='FWGAN', help='name for C struct, defaults to FWGAN')
parser.add_argument('--quantize', action='store_true', help='apply quantization')

if __name__ == "__main__":
    args = parser.parse_args()

    model = model_dict[args.model]()

    print(f"loading weights from {args.weightfile}...")
    saved_gen = torch.load(args.weightfile, map_location='cpu')
    model.load_state_dict(saved_gen)

    def _remove_weight_norm(m):
        try:
            torch.nn.utils.remove_weight_norm(m)
        except ValueError:  # this module didn't have weight norm
            return
    model.apply(_remove_weight_norm)


    print("dumping model...")
    quantize_model = args.quantize

    output_folder = args.export_folder
    os.makedirs(output_folder, exist_ok=True)

    writer = wexchange.c_export.c_writer.CWriter(os.path.join(output_folder, args.export_filename), model_struct_name=args.struct_name)

    for name, module in model.named_modules():

        if quantize_model:
            quantize = name not in unquantized
            scale = None if quantize else 1/128
        else:
            quantize = False
            scale = 1/128

        if isinstance(module, nn.Linear):
            print(f"dumping linear layer {name}...")
            wexchange.torch.dump_torch_dense_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)

        if isinstance(module, nn.Conv1d):
            print(f"dumping conv1d layer {name}...")
            wexchange.torch.dump_torch_conv1d_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)

        if isinstance(module, nn.GRU):
            print(f"dumping GRU layer {name}...")
            wexchange.torch.dump_torch_gru_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale, recurrent_scale=scale)

    writer.close()
141
managed_components/78__esp-opus/dnn/torch/fwgan/inference.py
Normal file
@@ -0,0 +1,141 @@
import os
import time
import torch
import numpy as np
from scipy import signal as si
from scipy.io import wavfile
import argparse

from models import model_dict

parser = argparse.ArgumentParser()
parser.add_argument('model', choices=['fwgan400', 'fwgan500'], help='model name')
parser.add_argument('weightfile', type=str, help='weight file')
parser.add_argument('input', type=str, help='input: feature file or folder with feature files')
parser.add_argument('output', type=str, help='output: wav file name or folder name, depending on input')


########################### Signal Processing Layers ###########################

def preemphasis(x, coef=-0.85):
    return si.lfilter(np.array([1.0, coef]), np.array([1.0]), x).astype('float32')

def deemphasis(x, coef=-0.85):
    return si.lfilter(np.array([1.0]), np.array([1.0, coef]), x).astype('float32')

gamma = 0.92
weighting_vector = np.array([gamma**i for i in range(16,0,-1)])


def lpc_synthesis_one_frame(frame, filt, buffer, weighting_vector=np.ones(16)):
    out = np.zeros_like(frame)
    filt = np.flip(filt)
    inp = frame[:]
    for i in range(0, inp.shape[0]):
        s = inp[i] - np.dot(buffer*weighting_vector, filt)
        buffer[0] = s
        buffer = np.roll(buffer, -1)
        out[i] = s
    return out

def inverse_perceptual_weighting(pw_signal, filters, weighting_vector):
    # inverse perceptual weighting = H_preemph / W(z/gamma)
    pw_signal = preemphasis(pw_signal)

    signal = np.zeros_like(pw_signal)
    buffer = np.zeros(16)
    num_frames = pw_signal.shape[0] // 160
    assert num_frames == filters.shape[0]

    for frame_idx in range(0, num_frames):
        in_frame = pw_signal[frame_idx*160: (frame_idx+1)*160][:]
        out_sig_frame = lpc_synthesis_one_frame(in_frame, filters[frame_idx, :], buffer, weighting_vector)
        signal[frame_idx*160: (frame_idx+1)*160] = out_sig_frame[:]
        buffer[:] = out_sig_frame[-16:]

    return signal


def process_item(generator, feature_filename, output_filename, verbose=False):
    feat = np.memmap(feature_filename, dtype='float32', mode='r')

    num_feat_frames = len(feat) // 36
    feat = np.reshape(feat, (num_feat_frames, 36))

    bfcc = np.copy(feat[:, :18])
    corr = np.copy(feat[:, 19:20]) + 0.5
    bfcc_with_corr = torch.from_numpy(np.hstack((bfcc, corr))).type(torch.FloatTensor).unsqueeze(0)#.to(device)

    period = torch.from_numpy((0.1 + 50 * np.copy(feat[:, 18:19]) + 100)\
        .astype('int32')).type(torch.long).view(1,-1)#.to(device)

    lpc_filters = np.copy(feat[:, -16:])

    start_time = time.time()
    x1 = generator(period, bfcc_with_corr, torch.zeros(1,320)) # the vocoder runs in complete synthesis mode with zero history audio frames
    end_time = time.time()
    total_time = end_time - start_time
    x1 = x1.squeeze(1).squeeze(0).detach().cpu().numpy()
    gen_seconds = len(x1)/16000
    out = deemphasis(inverse_perceptual_weighting(x1, lpc_filters, weighting_vector))
    if verbose:
        print(f"Took {total_time:.3f}s to generate {len(x1)} samples ({gen_seconds}s) -> {gen_seconds/total_time:.2f}x real time")

    out = np.clip(np.round(2**15 * out), -2**15, 2**15 - 1).astype(np.int16)
    wavfile.write(output_filename, 16000, out)


########################### The inference loop over folder containing lpcnet feature files #################################
if __name__ == "__main__":

    args = parser.parse_args()

    generator = model_dict[args.model]()


    # load the FWGAN checkpoint
    saved_gen = torch.load(args.weightfile, map_location='cpu')
    generator.load_state_dict(saved_gen)

    # remove weight_norm from the model layers as it's no longer needed
    def _remove_weight_norm(m):
        try:
            torch.nn.utils.remove_weight_norm(m)
        except ValueError:  # this module didn't have weight norm
            return
    generator.apply(_remove_weight_norm)

    # enable inference mode
    generator = generator.eval()

    print('Successfully loaded the generator model ... start generation:')

    if os.path.isdir(args.input):
        os.makedirs(args.output, exist_ok=True)

        for fn in os.listdir(args.input):
            print(f"processing input {fn}...")
            feature_filename = os.path.join(args.input, fn)
            output_filename = os.path.join(args.output, os.path.splitext(fn)[0] + f"_{args.model}.wav")
            process_item(generator, feature_filename, output_filename)
    else:
        process_item(generator, args.input, args.output)

    print("Finished!")
@@ -0,0 +1,7 @@
from .fwgan400 import FWGAN400ContLarge
from .fwgan500 import FWGAN500Cont

model_dict = {
    'fwgan400': FWGAN400ContLarge,
    'fwgan500': FWGAN500Cont
}
@@ -0,0 +1,308 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm
import numpy as np

which_norm = weight_norm

#################### Definition of basic model components ####################

# Convolutional layer with 1 frame look-ahead (used for the feature PreCondNet)
class ConvLookahead(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1, groups=1, bias=False):
        super(ConvLookahead, self).__init__()
        torch.manual_seed(5)

        self.padding_left = (kernel_size - 2) * dilation
        self.padding_right = 1 * dilation

        self.conv = which_norm(nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation, groups=groups, bias=bias))

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x):
        x = F.pad(x, (self.padding_left, self.padding_right))
        conv_out = self.conv(x)
        return conv_out


# (modified) GLU activation layer definition
class GLU(nn.Module):
    def __init__(self, feat_size):
        super(GLU, self).__init__()
        torch.manual_seed(5)

        self.gate = which_norm(nn.Linear(feat_size, feat_size, bias=False))

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x):
        out = torch.tanh(x) * torch.sigmoid(self.gate(x))
        return out


# GRU layer definition
class ContForwardGRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super(ContForwardGRU, self).__init__()
        torch.manual_seed(5)

        self.hidden_size = hidden_size

        self.cont_fc = nn.Sequential(which_norm(nn.Linear(64, self.hidden_size, bias=False)),
                                     nn.Tanh())

        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers,
                          batch_first=True, bias=False)

        self.nl = GLU(self.hidden_size)

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x, x0):
        self.gru.flatten_parameters()

        h0 = self.cont_fc(x0).unsqueeze(0)

        output, h0 = self.gru(x, h0)

        return self.nl(output)


# Frame-wise convolution layer definition
class ContFramewiseConv(torch.nn.Module):
    def __init__(self, frame_len, out_dim, frame_kernel_size=3, act='glu', causal=True):
        super(ContFramewiseConv, self).__init__()
        torch.manual_seed(5)

        self.frame_kernel_size = frame_kernel_size
        self.frame_len = frame_len

        if causal or (self.frame_kernel_size == 2):
            self.required_pad_left = (self.frame_kernel_size - 1) * self.frame_len
            self.required_pad_right = 0

            self.cont_fc = nn.Sequential(which_norm(nn.Linear(64, self.required_pad_left, bias=False)),
                                         nn.Tanh())
        else:
            self.required_pad_left = (self.frame_kernel_size - 1)//2 * self.frame_len
            self.required_pad_right = (self.frame_kernel_size - 1)//2 * self.frame_len

        self.fc_input_dim = self.frame_kernel_size * self.frame_len
        self.fc_out_dim = out_dim

        if act == 'glu':
            self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
                                    GLU(self.fc_out_dim))
        if act == 'tanh':
            self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
                                    nn.Tanh())

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x, x0):
        if self.frame_kernel_size == 1:
            return self.fc(x)

        x_flat = x.reshape(x.size(0), 1, -1)
        pad = self.cont_fc(x0).view(x0.size(0), 1, -1)
        x_flat_padded = torch.cat((pad, x_flat), dim=-1).unsqueeze(2)

        x_flat_padded_unfolded = F.unfold(x_flat_padded, kernel_size=(1, self.fc_input_dim),
                                          stride=self.frame_len).permute(0, 2, 1).contiguous()

        out = self.fc(x_flat_padded_unfolded)
        return out


# A fully-connected upsampling layer definition
class UpsampleFC(nn.Module):
    def __init__(self, in_ch, out_ch, upsample_factor):
        super(UpsampleFC, self).__init__()
        torch.manual_seed(5)

        self.in_ch = in_ch
        self.out_ch = out_ch
        self.upsample_factor = upsample_factor
        self.fc = nn.Linear(in_ch, out_ch * upsample_factor, bias=False)
        self.nl = nn.Tanh()

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x):
        batch_size = x.size(0)
        x = x.permute(0, 2, 1)
        x = self.nl(self.fc(x))
        x = x.reshape((batch_size, -1, self.out_ch))
        x = x.permute(0, 2, 1)
        return x


########################### The complete model definition #################################

class FWGAN400ContLarge(nn.Module):
    def __init__(self):
        super().__init__()
        torch.manual_seed(5)

        self.bfcc_with_corr_upsampler = UpsampleFC(19, 80, 4)

        self.feat_in_conv1 = ConvLookahead(160, 256, kernel_size=5)
        self.feat_in_nl1 = GLU(256)

        self.cont_net = nn.Sequential(which_norm(nn.Linear(321, 160, bias=False)),
                                      nn.Tanh(),
                                      which_norm(nn.Linear(160, 160, bias=False)),
                                      nn.Tanh(),
                                      which_norm(nn.Linear(160, 80, bias=False)),
                                      nn.Tanh(),
                                      which_norm(nn.Linear(80, 80, bias=False)),
                                      nn.Tanh(),
                                      which_norm(nn.Linear(80, 64, bias=False)),
                                      nn.Tanh(),
                                      which_norm(nn.Linear(64, 64, bias=False)),
                                      nn.Tanh())

        self.rnn = ContForwardGRU(256, 256)

        self.fwc1 = ContFramewiseConv(256, 256)
        self.fwc2 = ContFramewiseConv(256, 128)
        self.fwc3 = ContFramewiseConv(128, 128)
        self.fwc4 = ContFramewiseConv(128, 64)
        self.fwc5 = ContFramewiseConv(64, 64)
        self.fwc6 = ContFramewiseConv(64, 40)
        self.fwc7 = ContFramewiseConv(40, 40)

        self.init_weights()
        self.count_parameters()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def count_parameters(self):
        num_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        print(f"Total number of {self.__class__.__name__} network parameters = {num_params}\n")

    def create_phase_signals(self, periods):
        batch_size = periods.size(0)
        progression = torch.arange(1, 160 + 1, dtype=periods.dtype, device=periods.device).view((1, -1))
        progression = torch.repeat_interleave(progression, batch_size, 0)

        phase0 = torch.zeros(batch_size, dtype=periods.dtype, device=periods.device).unsqueeze(-1)
        chunks = []
        for sframe in range(periods.size(1)):
            f = (2.0 * torch.pi / periods[:, sframe]).unsqueeze(-1)

            chunk_sin = torch.sin(f * progression + phase0)
            chunk_sin = chunk_sin.reshape(chunk_sin.size(0), -1, 40)

            chunk_cos = torch.cos(f * progression + phase0)
            chunk_cos = chunk_cos.reshape(chunk_cos.size(0), -1, 40)

            chunk = torch.cat((chunk_sin, chunk_cos), dim=-1)

            phase0 = phase0 + 160 * f

            chunks.append(chunk)

        phase_signals = torch.cat(chunks, dim=1)

        return phase_signals

    def gain_multiply(self, x, c0):
        gain = 10**(0.5 * c0 / np.sqrt(18.0))
        gain = torch.repeat_interleave(gain, 160, dim=-1)
        gain = gain.reshape(gain.size(0), 1, -1).squeeze(1)

        return x * gain

    def forward(self, pitch_period, bfcc_with_corr, x0):
        norm_x0 = torch.norm(x0, 2, dim=-1, keepdim=True)
        x0 = x0 / torch.sqrt(1e-8 + norm_x0**2)
        x0 = torch.cat((torch.log(norm_x0 + 1e-7), x0), dim=-1)

        p_embed = self.create_phase_signals(pitch_period).permute(0, 2, 1).contiguous()

        envelope = self.bfcc_with_corr_upsampler(bfcc_with_corr.permute(0, 2, 1).contiguous())

        feat_in = torch.cat((p_embed, envelope), dim=1)

        wav_latent1 = self.feat_in_nl1(self.feat_in_conv1(feat_in).permute(0, 2, 1).contiguous())

        cont_latent = self.cont_net(x0)

        rnn_out = self.rnn(wav_latent1, cont_latent)

        fwc1_out = self.fwc1(rnn_out, cont_latent)
        fwc2_out = self.fwc2(fwc1_out, cont_latent)
        fwc3_out = self.fwc3(fwc2_out, cont_latent)
        fwc4_out = self.fwc4(fwc3_out, cont_latent)
        fwc5_out = self.fwc5(fwc4_out, cont_latent)
        fwc6_out = self.fwc6(fwc5_out, cont_latent)
        fwc7_out = self.fwc7(fwc6_out, cont_latent)

        waveform = fwc7_out.reshape(fwc7_out.size(0), 1, -1).squeeze(1)

        waveform = self.gain_multiply(waveform, bfcc_with_corr[:, :, :1])

        return waveform
@@ -0,0 +1,260 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm
import numpy as np


which_norm = weight_norm

#################### Definition of basic model components ####################

# Convolutional layer with 1 frame look-ahead (used for the feature PreCondNet)
class ConvLookahead(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1, groups=1, bias=False):
        super(ConvLookahead, self).__init__()
        torch.manual_seed(5)

        self.padding_left = (kernel_size - 2) * dilation
        self.padding_right = 1 * dilation

        self.conv = which_norm(nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation, groups=groups, bias=bias))

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x):
        x = F.pad(x, (self.padding_left, self.padding_right))
        conv_out = self.conv(x)
        return conv_out


# (modified) GLU activation layer definition
class GLU(nn.Module):
    def __init__(self, feat_size):
        super(GLU, self).__init__()
        torch.manual_seed(5)

        self.gate = which_norm(nn.Linear(feat_size, feat_size, bias=False))

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x):
        out = torch.tanh(x) * torch.sigmoid(self.gate(x))
        return out


# GRU layer definition
class ContForwardGRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super(ContForwardGRU, self).__init__()
        torch.manual_seed(5)

        self.hidden_size = hidden_size

        # initializes the layer with history audio samples for continuation
        self.cont_fc = nn.Sequential(which_norm(nn.Linear(320, self.hidden_size, bias=False)),
                                     nn.Tanh())

        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers,
                          batch_first=True, bias=False)

        self.nl = GLU(self.hidden_size)

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x, x0):
        self.gru.flatten_parameters()

        h0 = self.cont_fc(x0).unsqueeze(0)

        output, h0 = self.gru(x, h0)

        return self.nl(output)


# Frame-wise convolution layer definition
class ContFramewiseConv(torch.nn.Module):
    def __init__(self, frame_len, out_dim, frame_kernel_size=3, act='glu', causal=True):
        super(ContFramewiseConv, self).__init__()
        torch.manual_seed(5)

        self.frame_kernel_size = frame_kernel_size
        self.frame_len = frame_len

        if causal or (self.frame_kernel_size == 2):
            self.required_pad_left = (self.frame_kernel_size - 1) * self.frame_len
            self.required_pad_right = 0

            # initializes the layer with history audio samples for continuation
            self.cont_fc = nn.Sequential(which_norm(nn.Linear(320, self.required_pad_left, bias=False)),
                                         nn.Tanh())
        else:
            # non-causal frame-wise convolution; not used at the moment
            self.required_pad_left = (self.frame_kernel_size - 1)//2 * self.frame_len
            self.required_pad_right = (self.frame_kernel_size - 1)//2 * self.frame_len

        self.fc_input_dim = self.frame_kernel_size * self.frame_len
        self.fc_out_dim = out_dim

        if act == 'glu':
            self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
                                    GLU(self.fc_out_dim))
        if act == 'tanh':
            self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
                                    nn.Tanh())

        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def forward(self, x, x0):
        if self.frame_kernel_size == 1:
            return self.fc(x)

        x_flat = x.reshape(x.size(0), 1, -1)
        pad = self.cont_fc(x0).view(x0.size(0), 1, -1)
        x_flat_padded = torch.cat((pad, x_flat), dim=-1).unsqueeze(2)

        x_flat_padded_unfolded = F.unfold(x_flat_padded, kernel_size=(1, self.fc_input_dim),
                                          stride=self.frame_len).permute(0, 2, 1).contiguous()

        out = self.fc(x_flat_padded_unfolded)
        return out


########################### The complete model definition #################################

class FWGAN500Cont(nn.Module):
    def __init__(self):
        super().__init__()
        torch.manual_seed(5)

        # PreCondNet:
        self.bfcc_with_corr_upsampler = nn.Sequential(nn.ConvTranspose1d(19, 64, kernel_size=5, stride=5,
                                                                         padding=0, bias=False),
                                                      nn.Tanh())

        self.feat_in_conv = ConvLookahead(128, 256, kernel_size=5)
        self.feat_in_nl = GLU(256)

        # GRU:
        self.rnn = ContForwardGRU(256, 256)

        # Frame-wise convolution stack:
        self.fwc1 = ContFramewiseConv(256, 256)
        self.fwc2 = ContFramewiseConv(256, 128)
        self.fwc3 = ContFramewiseConv(128, 128)
        self.fwc4 = ContFramewiseConv(128, 64)
        self.fwc5 = ContFramewiseConv(64, 64)
        self.fwc6 = ContFramewiseConv(64, 32)
        self.fwc7 = ContFramewiseConv(32, 32, act='tanh')

        self.init_weights()
        self.count_parameters()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear, nn.Embedding)):
                nn.init.orthogonal_(m.weight.data)

    def count_parameters(self):
        num_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        print(f"Total number of {self.__class__.__name__} network parameters = {num_params}\n")

    def create_phase_signals(self, periods):
        batch_size = periods.size(0)
        progression = torch.arange(1, 160 + 1, dtype=periods.dtype, device=periods.device).view((1, -1))
        progression = torch.repeat_interleave(progression, batch_size, 0)

        phase0 = torch.zeros(batch_size, dtype=periods.dtype, device=periods.device).unsqueeze(-1)
        chunks = []
        for sframe in range(periods.size(1)):
            f = (2.0 * torch.pi / periods[:, sframe]).unsqueeze(-1)

            chunk_sin = torch.sin(f * progression + phase0)
            chunk_sin = chunk_sin.reshape(chunk_sin.size(0), -1, 32)

            chunk_cos = torch.cos(f * progression + phase0)
            chunk_cos = chunk_cos.reshape(chunk_cos.size(0), -1, 32)

            chunk = torch.cat((chunk_sin, chunk_cos), dim=-1)

            phase0 = phase0 + 160 * f

            chunks.append(chunk)

        phase_signals = torch.cat(chunks, dim=1)

        return phase_signals

    def gain_multiply(self, x, c0):
        gain = 10**(0.5 * c0 / np.sqrt(18.0))
        gain = torch.repeat_interleave(gain, 160, dim=-1)
        gain = gain.reshape(gain.size(0), 1, -1).squeeze(1)

        return x * gain

    def forward(self, pitch_period, bfcc_with_corr, x0):
        # creates a latent representation of shape [batch_dim, 500 frames, 256 elements per frame]
        p_embed = self.create_phase_signals(pitch_period).permute(0, 2, 1).contiguous()
        envelope = self.bfcc_with_corr_upsampler(bfcc_with_corr.permute(0, 2, 1).contiguous())
        feat_in = torch.cat((p_embed, envelope), dim=1)
        wav_latent = self.feat_in_nl(self.feat_in_conv(feat_in).permute(0, 2, 1).contiguous())

        # generation with continuation from the history samples x0 starts here:
        rnn_out = self.rnn(wav_latent, x0)

        fwc1_out = self.fwc1(rnn_out, x0)
        fwc2_out = self.fwc2(fwc1_out, x0)
        fwc3_out = self.fwc3(fwc2_out, x0)
        fwc4_out = self.fwc4(fwc3_out, x0)
        fwc5_out = self.fwc5(fwc4_out, x0)
        fwc6_out = self.fwc6(fwc5_out, x0)
        fwc7_out = self.fwc7(fwc6_out, x0)

        waveform_unscaled = fwc7_out.reshape(fwc7_out.size(0), 1, -1).squeeze(1)
        waveform = self.gain_multiply(waveform_unscaled, bfcc_with_corr[:, :, :1])

        return waveform
27
managed_components/78__esp-opus/dnn/torch/lossgen/README.md
Normal file
@@ -0,0 +1,27 @@
# Packet loss simulator

This code is an attempt at simulating better packet loss scenarios. The most common way of simulating
packet loss is to use a random sequence where each packet loss event is uncorrelated with previous events.
That is a simplistic model, since we know that losses often occur in bursts. This model uses real data
to build a generative model for packet loss.

We use the training data provided for the Audio Deep Packet Loss Concealment Challenge, which is available at:

http://plcchallenge2022pub.blob.core.windows.net/plcchallengearchive/test_train.tar.gz

To create the training data, run:

`./process_data.sh /<path>/test_train/train/lossy_signals/`

That will create an ascii `loss_sorted.txt` file with all loss data sorted in increasing packet loss
percentage. Then just run:

`python ./train_lossgen.py`

to train a model.

To generate a sequence, run:

`python3 ./test_lossgen.py <checkpoint> <percentage> output.txt --length 10000`

where `<checkpoint>` is the .pth model file and `<percentage>` is the amount of loss (e.g. 0.2 for 20% loss).
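The bursty behaviour this generative model is meant to capture can be contrasted with the classical two-state (Gilbert) Markov baseline. The sketch below is purely illustrative and is not the trained generator in this directory; the state names and transition probabilities are made-up example values:

```python
import random

def gilbert_losses(n, p_good_to_bad=0.05, p_bad_to_good=0.4, seed=0):
    """Generate a 0/1 loss sequence from a two-state Markov (Gilbert) model.

    Unlike an uncorrelated Bernoulli sequence, losses cluster into bursts:
    once in the 'bad' state, the next packet is likely to be lost as well.
    """
    rng = random.Random(seed)
    bad = False
    seq = []
    for _ in range(n):
        if bad:
            # stay in the bad (lossy) state with probability 1 - p_bad_to_good
            bad = rng.random() < (1.0 - p_bad_to_good)
        else:
            # enter a loss burst with probability p_good_to_bad
            bad = rng.random() < p_good_to_bad
        seq.append(1 if bad else 0)
    return seq

seq = gilbert_losses(10000)
print(sum(seq) / len(seq))  # overall loss rate
```

The steady-state loss rate of such a chain is `p_good_to_bad / (p_good_to_bad + p_bad_to_good)`; the lossgen model replaces these two fixed probabilities with a learned, history-conditioned loss probability.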
@@ -0,0 +1,101 @@
"""
/* Copyright (c) 2022 Amazon
   Written by Jan Buethe */
/*
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   - Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

   - Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""

import os
import argparse
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), '../weight-exchange'))


parser = argparse.ArgumentParser()

parser.add_argument('checkpoint', type=str, help='model checkpoint')
parser.add_argument('output_dir', type=str, help='output folder')

args = parser.parse_args()

import torch
import numpy as np

import lossgen
from wexchange.torch import dump_torch_weights
from wexchange.c_export import CWriter, print_vector


def c_export(args, model):

    message = f"Auto generated from checkpoint {os.path.basename(args.checkpoint)}"

    writer = CWriter(os.path.join(args.output_dir, "lossgen_data"), message=message,
                     model_struct_name='LossGen', enable_binary_blob=False, add_typedef=True)
    writer.header.write(
f"""
#include "opus_types.h"
"""
    )

    dense_layers = [
        ('dense_in', "lossgen_dense_in"),
        ('dense_out', "lossgen_dense_out")
    ]

    for name, export_name in dense_layers:
        layer = model.get_submodule(name)
        dump_torch_weights(writer, layer, name=export_name, verbose=True, quantize=False, scale=None)

    gru_layers = [
        ("gru1", "lossgen_gru1"),
        ("gru2", "lossgen_gru2"),
    ]

    max_rnn_units = max([dump_torch_weights(writer, model.get_submodule(name), export_name, verbose=True,
                                            input_sparse=False, quantize=True, scale=None, recurrent_scale=None)
                         for name, export_name in gru_layers])

    writer.header.write(
f"""

#define LOSSGEN_MAX_RNN_UNITS {max_rnn_units}

"""
    )

    writer.close()


if __name__ == "__main__":

    os.makedirs(args.output_dir, exist_ok=True)
    checkpoint = torch.load(args.checkpoint, map_location='cpu')
    model = lossgen.LossGen(*checkpoint['model_args'], **checkpoint['model_kwargs'])
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    c_export(args, model)
29
managed_components/78__esp-opus/dnn/torch/lossgen/lossgen.py
Normal file
@@ -0,0 +1,29 @@
import torch
from torch import nn
import torch.nn.functional as F


class LossGen(nn.Module):
    def __init__(self, gru1_size=16, gru2_size=16):
        super(LossGen, self).__init__()

        self.gru1_size = gru1_size
        self.gru2_size = gru2_size
        self.dense_in = nn.Linear(2, 8)
        self.gru1 = nn.GRU(8, self.gru1_size, batch_first=True)
        self.gru2 = nn.GRU(self.gru1_size, self.gru2_size, batch_first=True)
        self.dense_out = nn.Linear(self.gru2_size, 1)

    def forward(self, loss, perc, states=None):
        device = loss.device
        batch_size = loss.size(0)
        if states is None:
            gru1_state = torch.zeros((1, batch_size, self.gru1_size), device=device)
            gru2_state = torch.zeros((1, batch_size, self.gru2_size), device=device)
        else:
            gru1_state = states[0]
            gru2_state = states[1]
        x = torch.tanh(self.dense_in(torch.cat([loss, perc], dim=-1)))
        gru1_out, gru1_state = self.gru1(x, gru1_state)
        gru2_out, gru2_state = self.gru2(gru1_out, gru2_state)
        return self.dense_out(gru2_out), [gru1_state, gru2_state]
@@ -0,0 +1,17 @@
#!/bin/sh

# directory containing the loss files
datadir=$1

for i in $datadir/*_is_lost.txt
do
    perc=`cat $i | awk '{a+=$1}END{print a/NR}'`
    echo $perc $i
done > percentage_list.txt

sort -n percentage_list.txt | awk '{print $2}' > percentage_sorted.txt

for i in `cat percentage_sorted.txt`
do
    cat $i
done > loss_sorted.txt
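For readers less comfortable with the awk pipeline above, the same sorting step can be sketched in pure Python. This is an illustrative equivalent, not part of the repository, and it assumes each `*_is_lost.txt` file holds one 0/1 flag per line as in the challenge data:

```python
import glob
import os

def sort_loss_files(datadir):
    """Concatenate *_is_lost.txt files in order of increasing loss percentage.

    Mirrors process_data.sh: compute the mean of each file's 0/1 lines,
    sort the files by that percentage, and return the merged sequence.
    """
    entries = []
    for path in glob.glob(os.path.join(datadir, '*_is_lost.txt')):
        with open(path) as f:
            values = [int(line.strip()) for line in f if line.strip()]
        if values:
            entries.append((sum(values) / len(values), values))
    entries.sort(key=lambda e: e[0])  # increasing loss percentage
    merged = []
    for _, values in entries:
        merged.extend(values)
    return merged
```

Writing `merged` out one value per line reproduces `loss_sorted.txt`.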
@@ -0,0 +1,42 @@
import lossgen
import os
import argparse
import torch
import numpy as np


parser = argparse.ArgumentParser()

parser.add_argument('model', type=str, help='lossgen model checkpoint')
parser.add_argument('percentage', type=float, help='percentage loss')
parser.add_argument('output', type=str, help='path to output file (ascii)')

parser.add_argument('--length', type=int, help="length of sequence to generate", default=500)

args = parser.parse_args()


checkpoint = torch.load(args.model, map_location='cpu')
model = lossgen.LossGen(*checkpoint['model_args'], **checkpoint['model_kwargs'])
model.load_state_dict(checkpoint['state_dict'], strict=False)

states = None
last = torch.zeros((1, 1, 1))
perc = torch.tensor((args.percentage,))[None, None, :]
seq = torch.zeros((0, 1, 1))

one = torch.ones((1, 1, 1))
zero = torch.zeros((1, 1, 1))

if __name__ == '__main__':
    for i in range(args.length):
        prob, states = model(last, perc, states=states)
        prob = torch.sigmoid(prob)
        states[0] = states[0].detach()
        states[1] = states[1].detach()
        loss = one if np.random.rand() < prob else zero
        last = loss
        seq = torch.cat([seq, loss])

    np.savetxt(args.output, seq[:, :, 0].numpy().astype('int'), fmt='%d')