add some code

This commit is contained in:
2025-09-05 13:25:11 +08:00
parent 9ff0a99e7a
commit 3cf1229a85
8911 changed files with 2535396 additions and 0 deletions


@@ -0,0 +1,24 @@
#
# install
# conda env create -f LPCNet.yml
#
# update
# conda env update -f LPCNet.yml
#
# activate
# conda activate LPCNet
#
# remove
# conda remove --name LPCNet --all
#
name: LPCNet
channels:
- anaconda
- conda-forge
dependencies:
- keras==2.2.4
- python>=3.6
- tensorflow-gpu==1.12.0
- cudatoolkit
- h5py
- numpy


@@ -0,0 +1 @@
See README.md


@@ -0,0 +1,126 @@
# LPCNet
Low complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:
- J.-M. Valin, J. Skoglund, [LPCNet: Improving Neural Speech Synthesis Through Linear Prediction](https://jmvalin.ca/papers/lpcnet_icassp2019.pdf), *Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, arXiv:1810.11846, 2019.
- J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet](https://jmvalin.ca/papers/improved_lpcnet.pdf), *Proc. ICASSP*, arXiv:2106.04129, 2022.
- K. Subramani, J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation](https://jmvalin.ca/papers/lpcnet_end2end.pdf), *Proc. INTERSPEECH*, arXiv:2106.04129, 2022.
For coding/PLC applications of LPCNet, see:
- J.-M. Valin, J. Skoglund, [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://jmvalin.ca/papers/lpcnet_codec.pdf), *Proc. INTERSPEECH*, arXiv:1903.12087, 2019.
- J. Skoglund, J.-M. Valin, [Improving Opus Low Bit Rate Quality with Neural Speech Synthesis](https://jmvalin.ca/papers/opusnet.pdf), *Proc. INTERSPEECH*, arXiv:1905.04628, 2020.
- J.-M. Valin, A. Mustafa, C. Montgomery, T.B. Terriberry, M. Klingbeil, P. Smaragdis, A. Krishnaswamy, [Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model](https://jmvalin.ca/papers/lpcnet_plc.pdf), *Proc. INTERSPEECH*, arXiv:2205.05785, 2022.
- J.-M. Valin, J. Büthe, A. Mustafa, [Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder](https://jmvalin.ca/papers/valin_dred.pdf), *Proc. ICASSP*, arXiv:2212.04453, 2023. ([blog post](https://www.amazon.science/blog/neural-encoding-enables-more-efficient-recovery-of-lost-audio-packets))
# Introduction
Work-in-progress software for researching low-complexity algorithms for speech synthesis and compression by applying linear prediction techniques to WaveRNN. High-quality speech can be synthesised on regular CPUs (around 3 GFLOP) with SIMD support (SSE2, SSSE3, AVX, AVX2/FMA, and NEON are currently supported). The code also supports very low-bitrate compression at 1.6 kb/s.
The BSD licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.
This software is an open source starting point for LPCNet/WaveRNN-based speech synthesis and coding.
# Using the existing software
You can build the code using:
```
./autogen.sh
./configure
make
```
Note that the autogen.sh script is used when building from Git and will automatically download the latest model
(models are too large to put in Git). By default, LPCNet will attempt to use 8-bit dot product instructions on AVX\*/Neon to
speed up inference. To disable that (e.g. to avoid quantization effects when retraining), add --disable-dot-product to the
configure script. LPCNet does not yet have a complete implementation of some of the integer operations on the ARMv7
architecture, so for now you will also need --disable-dot-product to compile successfully on 32-bit ARM.
It is highly recommended to set the CFLAGS environment variable to enable AVX or NEON *prior* to running configure, otherwise
no vectorization will take place and the code will be very slow. On a recent x86 CPU, something like
```
export CFLAGS='-Ofast -g -march=native'
```
should work. On ARM, you can enable Neon with:
```
export CFLAGS='-Ofast -g -mfpu=neon'
```
While not strictly required, the -Ofast flag will help with auto-vectorization, especially for dot products that
cannot be optimized without -ffast-math (which -Ofast enables). Additionally, -falign-loops=32 has been shown to
help on x86.
You can test the capabilities of LPCNet using the lpcnet\_demo application. To encode a file:
```
./lpcnet_demo -encode input.pcm compressed.bin
```
where input.pcm is a 16-bit (machine endian) PCM file sampled at 16 kHz. The raw compressed data (no header)
is written to compressed.bin and consists of 8 bytes per 40-ms packet.
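As a sanity check on the numbers above, 8 bytes per 40-ms packet works out to exactly 1.6 kb/s:

```python
# 8 bytes per 40-ms packet, as produced by lpcnet_demo -encode
bytes_per_packet = 8
packet_ms = 40

packets_per_second = 1000 // packet_ms                  # 25 packets/s
bits_per_second = bytes_per_packet * 8 * packets_per_second
print(bits_per_second)  # 1600, i.e. 1.6 kb/s
```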
To decode:
```
./lpcnet_demo -decode compressed.bin output.pcm
```
where output.pcm is also 16-bit, 16 kHz PCM.
Alternatively, you can run the uncompressed analysis/synthesis using -features
instead of -encode and -synthesis instead of -decode.
The same functionality is available in the form of a library. See include/lpcnet.h for the API.
To try packet loss concealment (PLC), you first need a PLC model, which you can get with:
```
./download_model.sh plc-3b1eab4
```
or (for the PLC challenge submission):
```
./download_model.sh plc_challenge
```
PLC can be tested with:
```
./lpcnet_demo -plc_file noncausal_dc error_pattern.txt input.pcm output.pcm
```
where error_pattern.txt is a text file with one entry per 20-ms packet, with 1 meaning "packet lost" and 0 meaning "packet not lost".
noncausal_dc is the non-causal concealment (5-ms look-ahead) with special handling for DC offsets. It's also possible to use "noncausal", "causal",
or "causal_dc".
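For quick experiments, a loss pattern in the format described above (one 0/1 entry per 20-ms packet, 1 meaning "packet lost") can be generated with a short script. The packet count and loss rate below are illustrative, not values prescribed by LPCNet:

```python
import random

random.seed(0)        # reproducible pattern
NUM_PACKETS = 500     # one entry per 20-ms packet (here: 10 s of audio)
LOSS_RATE = 0.1       # 10% i.i.d. packet loss, purely illustrative

with open("error_pattern.txt", "w") as f:
    for _ in range(NUM_PACKETS):
        # 1 means "packet lost", 0 means "packet not lost"
        f.write("1\n" if random.random() < LOSS_RATE else "0\n")
```

Real network traces tend to have bursty losses, so an i.i.d. pattern is only a first approximation.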
# Training a new model
This codebase is also meant for research and it is possible to train new models. These are the steps to do that:
1. Set up a Keras system with GPU.
1. Generate training data:
```
./dump_data -train input.s16 features.f32 data.s16
```
where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.
1. Now that you have your files, train with:
```
python3 training_tf2/train_lpcnet.py features.f32 data.s16 model_name
```
and it will generate an h5 file for each iteration, with model\_name as prefix. If it stops with a
"Failed to allocate RNN reserve space" message, try specifying a smaller --batch-size for train\_lpcnet.py.
1. You can synthesise speech with Python and your GPU card (very slow):
```
./dump_data -test test_input.s16 test_features.f32
./training_tf2/test_lpcnet.py lpcnet_model_name.h5 test_features.f32 test.s16
```
1. Or with C on a CPU (C inference is much faster):
First extract the model files nnet\_data.h and nnet\_data.c
```
./training_tf2/dump_lpcnet.py lpcnet_model_name.h5
```
and move the generated nnet\_data.\* files to the src/ directory.
Then you just need to rebuild the software and use lpcnet\_demo as explained above.
# Speech Material for Training
Suitable training material can be obtained from [Open Speech and Language Resources](https://www.openslr.org/). See the datasets.txt file for details on suitable training data.
# Reading Further
1. [LPCNet: DSP-Boosted Neural Speech Synthesis](https://people.xiph.org/~jm/demo/lpcnet/)
1. [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://people.xiph.org/~jm/demo/lpcnet_codec/)
1. Sample model files (check compatibility): https://media.xiph.org/lpcnet/data/


@@ -0,0 +1,449 @@
#include "lace_data.h"
#include "nolace_data.h"
#include "osce.h"
#include "nndsp.h"
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
extern const WeightArray lacelayers_arrays[];
extern const WeightArray nolacelayers_arrays[];
void adaconv_compare(
const char * prefix,
int num_frames,
AdaConvState* hAdaConv,
LinearLayer *kernel_layer,
LinearLayer *gain_layer,
int feature_dim,
int frame_size,
int overlap_size,
int in_channels,
int out_channels,
int kernel_size,
int left_padding,
float filter_gain_a,
float filter_gain_b,
float shape_gain
)
{
char feature_file[256];
char x_in_file[256];
char x_out_file[256];
char message[512];
int i_frame, i_sample;
float mse;
float features[512];
float x_in[512];
float x_out_ref[512];
float x_out[512];
float window[40];
init_adaconv_state(hAdaConv);
compute_overlap_window(window, 40);
FILE *f_features, *f_x_in, *f_x_out;
strcpy(feature_file, prefix);
strcat(feature_file, "_features.f32");
f_features = fopen(feature_file, "rb");
if (f_features == NULL)
{
sprintf(message, "could not open file %s", feature_file);
perror(message);
exit(1);
}
strcpy(x_in_file, prefix);
strcat(x_in_file, "_x_in.f32");
f_x_in = fopen(x_in_file, "rb");
if (f_x_in == NULL)
{
sprintf(message, "could not open file %s", x_in_file);
perror(message);
exit(1);
}
strcpy(x_out_file, prefix);
strcat(x_out_file, "_x_out.f32");
f_x_out = fopen(x_out_file, "rb");
if (f_x_out == NULL)
{
sprintf(message, "could not open file %s", x_out_file);
perror(message);
exit(1);
}
for (i_frame = 0; i_frame < num_frames; i_frame ++)
{
if (fread(features, sizeof(float), feature_dim, f_features) != feature_dim)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, feature_file);
exit(1);
}
if (fread(x_in, sizeof(float), frame_size * in_channels, f_x_in) != frame_size * in_channels)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_in_file);
exit(1);
}
if (fread(x_out_ref, sizeof(float), frame_size * out_channels, f_x_out) != frame_size * out_channels)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_out_file);
exit(1);
}
adaconv_process_frame(hAdaConv, x_out, x_in, features, kernel_layer, gain_layer, feature_dim,
frame_size, overlap_size, in_channels, out_channels, kernel_size, left_padding,
filter_gain_a, filter_gain_b, shape_gain, window, 0);
mse = 0;
for (i_sample = 0; i_sample < frame_size * out_channels; i_sample ++)
{
mse += pow(x_out_ref[i_sample] - x_out[i_sample], 2);
}
mse = sqrt(mse / (frame_size * out_channels));
printf("rmse[%d] %f\n", i_frame, mse);
}
}
void adacomb_compare(
const char * prefix,
int num_frames,
AdaCombState* hAdaComb,
LinearLayer *kernel_layer,
LinearLayer *gain_layer,
LinearLayer *global_gain_layer,
int feature_dim,
int frame_size,
int overlap_size,
int kernel_size,
int left_padding,
float filter_gain_a,
float filter_gain_b,
float log_gain_limit
)
{
char feature_file[256];
char x_in_file[256];
char p_in_file[256];
char x_out_file[256];
char message[512];
int i_frame, i_sample;
float mse;
float features[512];
float x_in[512];
float x_out_ref[512];
float x_out[512];
int pitch_lag;
float window[40];
init_adacomb_state(hAdaComb);
compute_overlap_window(window, 40);
FILE *f_features, *f_x_in, *f_p_in, *f_x_out;
strcpy(feature_file, prefix);
strcat(feature_file, "_features.f32");
f_features = fopen(feature_file, "rb");
if (f_features == NULL)
{
sprintf(message, "could not open file %s", feature_file);
perror(message);
exit(1);
}
strcpy(x_in_file, prefix);
strcat(x_in_file, "_x_in.f32");
f_x_in = fopen(x_in_file, "rb");
if (f_x_in == NULL)
{
sprintf(message, "could not open file %s", x_in_file);
perror(message);
exit(1);
}
strcpy(p_in_file, prefix);
strcat(p_in_file, "_p_in.s32");
f_p_in = fopen(p_in_file, "rb");
if (f_p_in == NULL)
{
sprintf(message, "could not open file %s", p_in_file);
perror(message);
exit(1);
}
strcpy(x_out_file, prefix);
strcat(x_out_file, "_x_out.f32");
f_x_out = fopen(x_out_file, "rb");
if (f_x_out == NULL)
{
sprintf(message, "could not open file %s", x_out_file);
perror(message);
exit(1);
}
for (i_frame = 0; i_frame < num_frames; i_frame ++)
{
if (fread(features, sizeof(float), feature_dim, f_features) != feature_dim)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, feature_file);
exit(1);
}
if (fread(x_in, sizeof(float), frame_size, f_x_in) != frame_size)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_in_file);
exit(1);
}
if (fread(&pitch_lag, sizeof(int), 1, f_p_in) != 1)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, p_in_file);
exit(1);
}
if (fread(x_out_ref, sizeof(float), frame_size, f_x_out) != frame_size)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_out_file);
exit(1);
}
adacomb_process_frame(hAdaComb, x_out, x_in, features, kernel_layer, gain_layer, global_gain_layer,
pitch_lag, feature_dim, frame_size, overlap_size, kernel_size, left_padding, filter_gain_a, filter_gain_b, log_gain_limit, window, 0);
mse = 0;
for (i_sample = 0; i_sample < frame_size; i_sample ++)
{
mse += pow(x_out_ref[i_sample] - x_out[i_sample], 2);
}
mse = sqrt(mse / (frame_size));
printf("rmse[%d] %f\n", i_frame, mse);
}
}
void adashape_compare(
const char * prefix,
int num_frames,
AdaShapeState* hAdaShape,
LinearLayer *alpha1,
LinearLayer *alpha2,
int feature_dim,
int frame_size,
int avg_pool_k
)
{
char feature_file[256];
char x_in_file[256];
char x_out_file[256];
char message[512];
int i_frame, i_sample;
float mse;
float features[512];
float x_in[512];
float x_out_ref[512];
float x_out[512];
init_adashape_state(hAdaShape);
FILE *f_features, *f_x_in, *f_x_out;
strcpy(feature_file, prefix);
strcat(feature_file, "_features.f32");
f_features = fopen(feature_file, "rb");
if (f_features == NULL)
{
sprintf(message, "could not open file %s", feature_file);
perror(message);
exit(1);
}
strcpy(x_in_file, prefix);
strcat(x_in_file, "_x_in.f32");
f_x_in = fopen(x_in_file, "rb");
if (f_x_in == NULL)
{
sprintf(message, "could not open file %s", x_in_file);
perror(message);
exit(1);
}
strcpy(x_out_file, prefix);
strcat(x_out_file, "_x_out.f32");
f_x_out = fopen(x_out_file, "rb");
if (f_x_out == NULL)
{
sprintf(message, "could not open file %s", x_out_file);
perror(message);
exit(1);
}
for (i_frame = 0; i_frame < num_frames; i_frame ++)
{
if (fread(features, sizeof(float), feature_dim, f_features) != feature_dim)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, feature_file);
exit(1);
}
if (fread(x_in, sizeof(float), frame_size, f_x_in) != frame_size)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_in_file);
exit(1);
}
if (fread(x_out_ref, sizeof(float), frame_size, f_x_out) != frame_size)
{
fprintf(stderr, "could not read frame %d from %s\n", i_frame, x_out_file);
exit(1);
}
adashape_process_frame(hAdaShape, x_out, x_in, features, alpha1, alpha2, feature_dim,
frame_size, avg_pool_k, 0);
mse = 0;
for (i_sample = 0; i_sample < frame_size; i_sample ++)
{
mse += pow(x_out_ref[i_sample] - x_out[i_sample], 2);
}
mse = sqrt(mse / (frame_size));
printf("rmse[%d] %f\n", i_frame, mse);
}
}
int main()
{
LACELayers hLACE;
NOLACELayers hNoLACE;
AdaConvState hAdaConv;
AdaCombState hAdaComb;
AdaShapeState hAdaShape;
init_adaconv_state(&hAdaConv);
init_lacelayers(&hLACE, lacelayers_arrays);
init_nolacelayers(&hNoLACE, nolacelayers_arrays);
printf("\ntesting lace.af1 (1 in, 1 out)...\n");
adaconv_compare(
"testvectors/lace_af1",
5,
&hAdaConv,
&hLACE.lace_af1_kernel,
&hLACE.lace_af1_gain,
LACE_AF1_FEATURE_DIM,
LACE_AF1_FRAME_SIZE,
LACE_AF1_OVERLAP_SIZE,
LACE_AF1_IN_CHANNELS,
LACE_AF1_OUT_CHANNELS,
LACE_AF1_KERNEL_SIZE,
LACE_AF1_LEFT_PADDING,
LACE_AF1_FILTER_GAIN_A,
LACE_AF1_FILTER_GAIN_B,
LACE_AF1_SHAPE_GAIN
);
printf("\ntesting nolace.af1 (1 in, 2 out)...\n");
adaconv_compare(
"testvectors/nolace_af1",
5,
&hAdaConv,
&hNoLACE.nolace_af1_kernel,
&hNoLACE.nolace_af1_gain,
NOLACE_AF1_FEATURE_DIM,
NOLACE_AF1_FRAME_SIZE,
NOLACE_AF1_OVERLAP_SIZE,
NOLACE_AF1_IN_CHANNELS,
NOLACE_AF1_OUT_CHANNELS,
NOLACE_AF1_KERNEL_SIZE,
NOLACE_AF1_LEFT_PADDING,
NOLACE_AF1_FILTER_GAIN_A,
NOLACE_AF1_FILTER_GAIN_B,
NOLACE_AF1_SHAPE_GAIN
);
printf("\ntesting nolace.af4 (2 in, 1 out)...\n");
adaconv_compare(
"testvectors/nolace_af4",
5,
&hAdaConv,
&hNoLACE.nolace_af4_kernel,
&hNoLACE.nolace_af4_gain,
NOLACE_AF4_FEATURE_DIM,
NOLACE_AF4_FRAME_SIZE,
NOLACE_AF4_OVERLAP_SIZE,
NOLACE_AF4_IN_CHANNELS,
NOLACE_AF4_OUT_CHANNELS,
NOLACE_AF4_KERNEL_SIZE,
NOLACE_AF4_LEFT_PADDING,
NOLACE_AF4_FILTER_GAIN_A,
NOLACE_AF4_FILTER_GAIN_B,
NOLACE_AF4_SHAPE_GAIN
);
printf("\ntesting nolace.af2 (2 in, 2 out)...\n");
adaconv_compare(
"testvectors/nolace_af2",
5,
&hAdaConv,
&hNoLACE.nolace_af2_kernel,
&hNoLACE.nolace_af2_gain,
NOLACE_AF2_FEATURE_DIM,
NOLACE_AF2_FRAME_SIZE,
NOLACE_AF2_OVERLAP_SIZE,
NOLACE_AF2_IN_CHANNELS,
NOLACE_AF2_OUT_CHANNELS,
NOLACE_AF2_KERNEL_SIZE,
NOLACE_AF2_LEFT_PADDING,
NOLACE_AF2_FILTER_GAIN_A,
NOLACE_AF2_FILTER_GAIN_B,
NOLACE_AF2_SHAPE_GAIN
);
printf("\ntesting lace.cf1...\n");
adacomb_compare(
"testvectors/lace_cf1",
5,
&hAdaComb,
&hLACE.lace_cf1_kernel,
&hLACE.lace_cf1_gain,
&hLACE.lace_cf1_global_gain,
LACE_CF1_FEATURE_DIM,
LACE_CF1_FRAME_SIZE,
LACE_CF1_OVERLAP_SIZE,
LACE_CF1_KERNEL_SIZE,
LACE_CF1_LEFT_PADDING,
LACE_CF1_FILTER_GAIN_A,
LACE_CF1_FILTER_GAIN_B,
LACE_CF1_LOG_GAIN_LIMIT
);
printf("\ntesting nolace.tdshape1...\n");
adashape_compare(
"testvectors/nolace_tdshape1",
5,
&hAdaShape,
&hNoLACE.nolace_tdshape1_alpha1,
&hNoLACE.nolace_tdshape1_alpha2,
NOLACE_TDSHAPE1_FEATURE_DIM,
NOLACE_TDSHAPE1_FRAME_SIZE,
NOLACE_TDSHAPE1_AVG_POOL_K
);
return 0;
}
/* gcc -DVAR_ARRAYS -DENABLE_OSCE -I ../include -I ../silk -I . -I ../celt adaconvtest.c nndsp.c lace_data.c nolace_data.c nnet.c nnet_default.c ../celt/pitch.c ../celt/celt_lpc.c parse_lpcnet_weights.c -lm -o adaconvtest */


@@ -0,0 +1,88 @@
/* Copyright (c) 2018-2019 Mozilla
2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "arm/armcpu.h"
#include "nnet.h"
#if defined(OPUS_HAVE_RTCD)
#if (defined(OPUS_ARM_MAY_HAVE_DOTPROD) && !defined(OPUS_ARM_PRESUME_DOTPROD))
void (*const DNN_COMPUTE_LINEAR_IMPL[OPUS_ARCHMASK + 1])(
const LinearLayer *linear,
float *out,
const float *in
) = {
compute_linear_c, /* default */
compute_linear_c,
compute_linear_c,
MAY_HAVE_NEON(compute_linear), /* neon */
MAY_HAVE_DOTPROD(compute_linear) /* dotprod */
};
#endif
#if (defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON)) && !defined(OPUS_ARM_PRESUME_NEON)
void (*const DNN_COMPUTE_ACTIVATION_IMPL[OPUS_ARCHMASK + 1])(
float *output,
const float *input,
int N,
int activation
) = {
compute_activation_c, /* default */
compute_activation_c,
compute_activation_c,
MAY_HAVE_NEON(compute_activation), /* neon */
MAY_HAVE_DOTPROD(compute_activation) /* dotprod */
};
void (*const DNN_COMPUTE_CONV2D_IMPL[OPUS_ARCHMASK + 1])(
const Conv2dLayer *conv,
float *out,
float *mem,
const float *in,
int height,
int hstride,
int activation
) = {
compute_conv2d_c, /* default */
compute_conv2d_c,
compute_conv2d_c,
MAY_HAVE_NEON(compute_conv2d), /* neon */
MAY_HAVE_DOTPROD(compute_conv2d) /* dotprod */
};
#endif
#endif


@@ -0,0 +1,104 @@
/* Copyright (c) 2011-2019 Mozilla
2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DNN_ARM_H
#define DNN_ARM_H
#include "cpu_support.h"
#include "opus_types.h"
void compute_linear_dotprod(const LinearLayer *linear, float *out, const float *in);
void compute_linear_neon(const LinearLayer *linear, float *out, const float *in);
void compute_activation_neon(float *output, const float *input, int N, int activation);
void compute_activation_dotprod(float *output, const float *input, int N, int activation);
void compute_conv2d_neon(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation);
void compute_conv2d_dotprod(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation);
#if defined(OPUS_ARM_PRESUME_DOTPROD)
#define OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_dotprod(linear, out, in))
#elif defined(OPUS_ARM_PRESUME_NEON_INTR) && !defined(OPUS_ARM_MAY_HAVE_DOTPROD)
#define OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_neon(linear, out, in))
#elif defined(OPUS_HAVE_RTCD) && (defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON))
extern void (*const DNN_COMPUTE_LINEAR_IMPL[OPUS_ARCHMASK + 1])(
const LinearLayer *linear,
float *out,
const float *in
);
#define OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) \
((*DNN_COMPUTE_LINEAR_IMPL[(arch) & OPUS_ARCHMASK])(linear, out, in))
#endif
#if defined(OPUS_ARM_PRESUME_NEON)
#define OVERRIDE_COMPUTE_ACTIVATION
#define compute_activation(output, input, N, activation, arch) ((void)(arch),compute_activation_neon(output, input, N, activation))
#define OVERRIDE_COMPUTE_CONV2D
#define compute_conv2d(conv, out, mem, in, height, hstride, activation, arch) ((void)(arch),compute_conv2d_neon(conv, out, mem, in, height, hstride, activation))
#elif defined(OPUS_HAVE_RTCD) && (defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON))
extern void (*const DNN_COMPUTE_ACTIVATION_IMPL[OPUS_ARCHMASK + 1])(
float *output,
const float *input,
int N,
int activation
);
#define OVERRIDE_COMPUTE_ACTIVATION
#define compute_activation(output, input, N, activation, arch) \
((*DNN_COMPUTE_ACTIVATION_IMPL[(arch) & OPUS_ARCHMASK])(output, input, N, activation))
extern void (*const DNN_COMPUTE_CONV2D_IMPL[OPUS_ARCHMASK + 1])(
const Conv2dLayer *conv,
float *out,
float *mem,
const float *in,
int height,
int hstride,
int activation
);
#define OVERRIDE_COMPUTE_CONV2D
#define compute_conv2d(conv, out, mem, in, height, hstride, activation, arch) \
((*DNN_COMPUTE_CONV2D_IMPL[(arch) & OPUS_ARCHMASK])(conv, out, mem, in, height, hstride, activation))
#endif
#endif /* DNN_ARM_H */


@@ -0,0 +1,38 @@
/* Copyright (c) 2018-2019 Mozilla
2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#ifndef __ARM_FEATURE_DOTPROD
#error nnet_dotprod.c is being compiled without DOTPROD enabled
#endif
#define RTCD_ARCH dotprod
#include "nnet_arch.h"


@@ -0,0 +1,38 @@
/* Copyright (c) 2018-2019 Mozilla
2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#if !(defined(__ARM_NEON__) || defined(__ARM_NEON))
#error nnet_neon.c is being compiled without Neon enabled
#endif
#define RTCD_ARCH neon
#include "nnet_arch.h"


@@ -0,0 +1,246 @@
/***********************************************************************
Copyright (c) 2006-2011, Skype Limited. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of Internet Society, IETF or IETF Trust, nor the
names of specific contributors, may be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
***********************************************************************/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <math.h>
#include <string.h>
#include <assert.h>
#include "arch.h"
#include "burg.h"
#define MAX_FRAME_SIZE 384 /* subfr_length * nb_subfr = ( 0.005 * 16000 + 16 ) * 4 = 384*/
#define SILK_MAX_ORDER_LPC 16
#define FIND_LPC_COND_FAC 1e-5f
/* sum of squares of a silk_float array, with result as double */
static double silk_energy_FLP(
const float *data,
int dataSize
)
{
int i;
double result;
/* 4x unrolled loop */
result = 0.0;
for( i = 0; i < dataSize - 3; i += 4 ) {
result += data[ i + 0 ] * (double)data[ i + 0 ] +
data[ i + 1 ] * (double)data[ i + 1 ] +
data[ i + 2 ] * (double)data[ i + 2 ] +
data[ i + 3 ] * (double)data[ i + 3 ];
}
/* add any remaining products */
for( ; i < dataSize; i++ ) {
result += data[ i ] * (double)data[ i ];
}
assert( result >= 0.0 );
return result;
}
/* inner product of two silk_float arrays, with result as double */
static double silk_inner_product_FLP(
const float *data1,
const float *data2,
int dataSize
)
{
int i;
double result;
/* 4x unrolled loop */
result = 0.0;
for( i = 0; i < dataSize - 3; i += 4 ) {
result += data1[ i + 0 ] * (double)data2[ i + 0 ] +
data1[ i + 1 ] * (double)data2[ i + 1 ] +
data1[ i + 2 ] * (double)data2[ i + 2 ] +
data1[ i + 3 ] * (double)data2[ i + 3 ];
}
/* add any remaining products */
for( ; i < dataSize; i++ ) {
result += data1[ i ] * (double)data2[ i ];
}
return result;
}
/* Compute reflection coefficients from input signal */
float silk_burg_analysis( /* O returns residual energy */
float A[], /* O prediction coefficients (length order) */
const float x[], /* I input signal, length: nb_subfr*(D+L_sub) */
const float minInvGain, /* I minimum inverse prediction gain */
const int subfr_length, /* I input signal subframe length (incl. D preceding samples) */
const int nb_subfr, /* I number of subframes stacked in x */
const int D /* I order */
)
{
int k, n, s, reached_max_gain;
double C0, invGain, num, nrg_f, nrg_b, rc, Atmp, tmp1, tmp2;
const float *x_ptr;
double C_first_row[ SILK_MAX_ORDER_LPC ], C_last_row[ SILK_MAX_ORDER_LPC ];
double CAf[ SILK_MAX_ORDER_LPC + 1 ], CAb[ SILK_MAX_ORDER_LPC + 1 ];
double Af[ SILK_MAX_ORDER_LPC ];
assert( subfr_length * nb_subfr <= MAX_FRAME_SIZE );
/* Compute autocorrelations, added over subframes */
C0 = silk_energy_FLP( x, nb_subfr * subfr_length );
memset( C_first_row, 0, SILK_MAX_ORDER_LPC * sizeof( double ) );
for( s = 0; s < nb_subfr; s++ ) {
x_ptr = x + s * subfr_length;
for( n = 1; n < D + 1; n++ ) {
C_first_row[ n - 1 ] += silk_inner_product_FLP( x_ptr, x_ptr + n, subfr_length - n );
}
}
memcpy( C_last_row, C_first_row, SILK_MAX_ORDER_LPC * sizeof( double ) );
/* Initialize */
CAb[ 0 ] = CAf[ 0 ] = C0 + FIND_LPC_COND_FAC * C0 + 1e-9f;
invGain = 1.0f;
reached_max_gain = 0;
for( n = 0; n < D; n++ ) {
/* Update first row of correlation matrix (without first element) */
/* Update last row of correlation matrix (without last element, stored in reversed order) */
/* Update C * Af */
/* Update C * flipud(Af) (stored in reversed order) */
for( s = 0; s < nb_subfr; s++ ) {
x_ptr = x + s * subfr_length;
tmp1 = x_ptr[ n ];
tmp2 = x_ptr[ subfr_length - n - 1 ];
for( k = 0; k < n; k++ ) {
C_first_row[ k ] -= x_ptr[ n ] * x_ptr[ n - k - 1 ];
C_last_row[ k ] -= x_ptr[ subfr_length - n - 1 ] * x_ptr[ subfr_length - n + k ];
Atmp = Af[ k ];
tmp1 += x_ptr[ n - k - 1 ] * Atmp;
tmp2 += x_ptr[ subfr_length - n + k ] * Atmp;
}
for( k = 0; k <= n; k++ ) {
CAf[ k ] -= tmp1 * x_ptr[ n - k ];
CAb[ k ] -= tmp2 * x_ptr[ subfr_length - n + k - 1 ];
}
}
tmp1 = C_first_row[ n ];
tmp2 = C_last_row[ n ];
for( k = 0; k < n; k++ ) {
Atmp = Af[ k ];
tmp1 += C_last_row[ n - k - 1 ] * Atmp;
tmp2 += C_first_row[ n - k - 1 ] * Atmp;
}
CAf[ n + 1 ] = tmp1;
CAb[ n + 1 ] = tmp2;
/* Calculate numerator and denominator for the next order reflection (parcor) coefficient */
num = CAb[ n + 1 ];
nrg_b = CAb[ 0 ];
nrg_f = CAf[ 0 ];
for( k = 0; k < n; k++ ) {
Atmp = Af[ k ];
num += CAb[ n - k ] * Atmp;
nrg_b += CAb[ k + 1 ] * Atmp;
nrg_f += CAf[ k + 1 ] * Atmp;
}
assert( nrg_f > 0.0 );
assert( nrg_b > 0.0 );
/* Calculate the next order reflection (parcor) coefficient */
rc = -2.0 * num / ( nrg_f + nrg_b );
assert( rc > -1.0 && rc < 1.0 );
/* Update inverse prediction gain */
tmp1 = invGain * ( 1.0 - rc * rc );
if( tmp1 <= minInvGain ) {
/* Max prediction gain exceeded; set reflection coefficient such that max prediction gain is exactly hit */
rc = sqrt( 1.0 - minInvGain / invGain );
if( num > 0 ) {
/* Ensure the adjusted reflection coefficient has the original sign */
rc = -rc;
}
invGain = minInvGain;
reached_max_gain = 1;
} else {
invGain = tmp1;
}
/* Update the AR coefficients */
for( k = 0; k < (n + 1) >> 1; k++ ) {
tmp1 = Af[ k ];
tmp2 = Af[ n - k - 1 ];
Af[ k ] = tmp1 + rc * tmp2;
Af[ n - k - 1 ] = tmp2 + rc * tmp1;
}
Af[ n ] = rc;
if( reached_max_gain ) {
/* Reached max prediction gain; set remaining coefficients to zero and exit loop */
for( k = n + 1; k < D; k++ ) {
Af[ k ] = 0.0;
}
break;
}
/* Update C * Af and C * Ab */
for( k = 0; k <= n + 1; k++ ) {
tmp1 = CAf[ k ];
CAf[ k ] += rc * CAb[ n - k + 1 ];
CAb[ n - k + 1 ] += rc * tmp1;
}
}
if( reached_max_gain ) {
/* Convert to float */
for( k = 0; k < D; k++ ) {
A[ k ] = (float)( -Af[ k ] );
}
/* Subtract energy of preceding samples from C0 */
for( s = 0; s < nb_subfr; s++ ) {
C0 -= silk_energy_FLP( x + s * subfr_length, D );
}
/* Approximate residual energy */
nrg_f = C0 * invGain;
} else {
/* Compute residual energy and store coefficients as float */
nrg_f = CAf[ 0 ];
tmp1 = 1.0;
for( k = 0; k < D; k++ ) {
Atmp = Af[ k ];
nrg_f += CAf[ k + 1 ] * Atmp;
tmp1 += Atmp * Atmp;
A[ k ] = (float)(-Atmp);
}
nrg_f -= FIND_LPC_COND_FAC * C0 * tmp1;
}
/* Return residual energy */
return MAX32(0, (float)nrg_f);
}


@@ -0,0 +1,41 @@
/***********************************************************************
Copyright (c) 2006-2011, Skype Limited. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of Internet Society, IETF or IETF Trust, nor the
names of specific contributors, may be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
***********************************************************************/
#ifndef BURG_H
#define BURG_H
float silk_burg_analysis( /* O returns residual energy */
float A[], /* O prediction coefficients (length order) */
const float x[], /* I input signal, length: nb_subfr*(D+L_sub) */
const float minInvGain, /* I minimum inverse prediction gain */
const int subfr_length, /* I input signal subframe length (incl. D preceding samples) */
const int nb_subfr, /* I number of subframes stacked in x */
const int D /* I order */
);
#endif


@@ -0,0 +1,56 @@
#ifndef COMMON_H
#define COMMON_H
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "opus_defines.h"
#define LOG256 5.5451774445f
static OPUS_INLINE float log2_approx(float x)
{
int integer;
float frac;
union {
float f;
int i;
} in;
in.f = x;
integer = (in.i>>23)-127;
in.i -= integer<<23;
frac = in.f - 1.5f;
frac = -0.41445418f + frac*(0.95909232f
+ frac*(-0.33951290f + frac*0.16541097f));
return 1+integer+frac;
}
#define log_approx(x) (0.69315f*log2_approx(x))
static OPUS_INLINE float ulaw2lin(float u)
{
float s;
float scale_1 = 32768.f/255.f;
u = u - 128.f;
s = u >= 0.f ? 1.f : -1.f;
u = fabs(u);
return s*scale_1*(exp(u/128.*LOG256)-1);
}
static OPUS_INLINE int lin2ulaw(float x)
{
float u;
float scale = 255.f/32768.f;
int s = x >= 0 ? 1 : -1;
x = fabs(x);
u = (s*(128*log_approx(1+scale*x)/LOG256));
u = 128 + u;
if (u < 0) u = 0;
if (u > 255) u = 255;
return (int)floor(.5 + u);
}
#endif


@@ -0,0 +1,163 @@
The following datasets can be used to train a language-independent FARGAN model
and a Deep REDundancy (DRED) model. Note that this data typically needs to be
resampled before it can be used.
https://www.openslr.org/resources/30/si_lk.tar.gz
https://www.openslr.org/resources/32/af_za.tar.gz
https://www.openslr.org/resources/32/st_za.tar.gz
https://www.openslr.org/resources/32/tn_za.tar.gz
https://www.openslr.org/resources/32/xh_za.tar.gz
https://www.openslr.org/resources/37/bn_bd.zip
https://www.openslr.org/resources/37/bn_in.zip
https://www.openslr.org/resources/41/jv_id_female.zip
https://www.openslr.org/resources/41/jv_id_male.zip
https://www.openslr.org/resources/42/km_kh_male.zip
https://www.openslr.org/resources/43/ne_np_female.zip
https://www.openslr.org/resources/44/su_id_female.zip
https://www.openslr.org/resources/44/su_id_male.zip
https://www.openslr.org/resources/61/es_ar_female.zip
https://www.openslr.org/resources/61/es_ar_male.zip
https://www.openslr.org/resources/63/ml_in_female.zip
https://www.openslr.org/resources/63/ml_in_male.zip
https://www.openslr.org/resources/64/mr_in_female.zip
https://www.openslr.org/resources/65/ta_in_female.zip
https://www.openslr.org/resources/65/ta_in_male.zip
https://www.openslr.org/resources/66/te_in_female.zip
https://www.openslr.org/resources/66/te_in_male.zip
https://www.openslr.org/resources/69/ca_es_female.zip
https://www.openslr.org/resources/69/ca_es_male.zip
https://www.openslr.org/resources/70/en_ng_female.zip
https://www.openslr.org/resources/70/en_ng_male.zip
https://www.openslr.org/resources/71/es_cl_female.zip
https://www.openslr.org/resources/71/es_cl_male.zip
https://www.openslr.org/resources/72/es_co_female.zip
https://www.openslr.org/resources/72/es_co_male.zip
https://www.openslr.org/resources/73/es_pe_female.zip
https://www.openslr.org/resources/73/es_pe_male.zip
https://www.openslr.org/resources/74/es_pr_female.zip
https://www.openslr.org/resources/75/es_ve_female.zip
https://www.openslr.org/resources/75/es_ve_male.zip
https://www.openslr.org/resources/76/eu_es_female.zip
https://www.openslr.org/resources/76/eu_es_male.zip
https://www.openslr.org/resources/77/gl_es_female.zip
https://www.openslr.org/resources/77/gl_es_male.zip
https://www.openslr.org/resources/78/gu_in_female.zip
https://www.openslr.org/resources/78/gu_in_male.zip
https://www.openslr.org/resources/79/kn_in_female.zip
https://www.openslr.org/resources/79/kn_in_male.zip
https://www.openslr.org/resources/80/my_mm_female.zip
https://www.openslr.org/resources/83/irish_english_male.zip
https://www.openslr.org/resources/83/midlands_english_female.zip
https://www.openslr.org/resources/83/midlands_english_male.zip
https://www.openslr.org/resources/83/northern_english_female.zip
https://www.openslr.org/resources/83/northern_english_male.zip
https://www.openslr.org/resources/83/scottish_english_female.zip
https://www.openslr.org/resources/83/scottish_english_male.zip
https://www.openslr.org/resources/83/southern_english_female.zip
https://www.openslr.org/resources/83/southern_english_male.zip
https://www.openslr.org/resources/83/welsh_english_female.zip
https://www.openslr.org/resources/83/welsh_english_male.zip
https://www.openslr.org/resources/86/yo_ng_female.zip
https://www.openslr.org/resources/86/yo_ng_male.zip
The corresponding citations for all these datasets are:
@inproceedings{demirsahin-etal-2020-open,
title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
month = may,
year = {2020},
pages = {6532--6541},
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
ISBN = {979-10-95546-34-4},
}
@inproceedings{kjartansson-etal-2020-open,
title = {{Open-Source High Quality Speech Datasets for Basque, Catalan and Galician}},
author = {Kjartansson, Oddur and Gutkin, Alexander and Butryna, Alena and Demirsahin, Isin and Rivera, Clara},
booktitle = {Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)},
year = {2020},
pages = {21--27},
month = may,
address = {Marseille, France},
publisher = {European Language Resources association (ELRA)},
url = {https://www.aclweb.org/anthology/2020.sltu-1.3},
ISBN = {979-10-95546-35-1},
}
@inproceedings{guevara-rukoz-etal-2020-crowdsourcing,
title = {{Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech}},
author = {Guevara-Rukoz, Adriana and Demirsahin, Isin and He, Fei and Chu, Shan-Hui Cathy and Sarin, Supheakmungkol and Pipatsrisawat, Knot and Gutkin, Alexander and Butryna, Alena and Kjartansson, Oddur},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
year = {2020},
month = may,
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
url = {https://www.aclweb.org/anthology/2020.lrec-1.801},
pages = {6504--6513},
ISBN = {979-10-95546-34-4},
}
@inproceedings{he-etal-2020-open,
title = {{Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems}},
author = {He, Fei and Chu, Shan-Hui Cathy and Kjartansson, Oddur and Rivera, Clara and Katanova, Anna and Gutkin, Alexander and Demirsahin, Isin and Johny, Cibu and Jansche, Martin and Sarin, Supheakmungkol and Pipatsrisawat, Knot},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
month = may,
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
pages = {6494--6503},
url = {https://www.aclweb.org/anthology/2020.lrec-1.800},
ISBN = "{979-10-95546-34-4}",
}
@inproceedings{kjartansson-etal-tts-sltu2018,
title = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
author = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
year = {2018},
address = {Gurugram, India},
month = aug,
pages = {66--70},
URL = {http://dx.doi.org/10.21437/SLTU.2018-14}
}
@inproceedings{oo-etal-2020-burmese,
title = {{Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech}},
author = {Oo, Yin May and Wattanavekin, Theeraphol and Li, Chenfang and De Silva, Pasindu and Sarin, Supheakmungkol and Pipatsrisawat, Knot and Jansche, Martin and Kjartansson, Oddur and Gutkin, Alexander},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
month = may,
year = {2020},
pages = "6328--6339",
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
url = {https://www.aclweb.org/anthology/2020.lrec-1.777},
ISBN = {979-10-95546-34-4},
}
@inproceedings{van-niekerk-etal-2017,
title = {{Rapid development of TTS corpora for four South African languages}},
author = {Daniel van Niekerk and Charl van Heerden and Marelie Davel and Neil Kleynhans and Oddur Kjartansson and Martin Jansche and Linne Ha},
booktitle = {Proc. Interspeech 2017},
pages = {2178--2182},
address = {Stockholm, Sweden},
month = aug,
year = {2017},
URL = {http://dx.doi.org/10.21437/Interspeech.2017-1139}
}
@inproceedings{gutkin-et-al-yoruba2020,
title = {{Developing an Open-Source Corpus of Yoruba Speech}},
author = {Alexander Gutkin and I{\c{s}}{\i}n Demir{\c{s}}ahin and Oddur Kjartansson and Clara Rivera and K\d{\'o}lá Túb\d{\`o}sún},
booktitle = {Proceedings of Interspeech 2020},
pages = {404--408},
month = {October},
year = {2020},
address = {Shanghai, China},
publisher = {International Speech and Communication Association (ISCA)},
doi = {10.21437/Interspeech.2020-1096},
url = {http://dx.doi.org/10.21437/Interspeech.2020-1096},
}


@@ -0,0 +1,9 @@
@echo off
set model=opus_data-%1.tar.gz
if not exist %model% (
echo Downloading latest model
powershell -Command "(New-Object System.Net.WebClient).DownloadFile('https://media.xiph.org/opus/models/%model%', '%model%')"
)
tar -xvzf %model%


@@ -0,0 +1,30 @@
#!/bin/sh
set -e
model=opus_data-$1.tar.gz
if [ ! -f $model ]; then
echo "Downloading latest model"
wget https://media.xiph.org/opus/models/$model
fi
if command -v sha256sum
then
echo "Validating checksum"
checksum="$1"
checksum2=$(sha256sum $model | awk '{print $1}')
if [ "$checksum" != "$checksum2" ]
then
echo "Aborting due to mismatching checksums. This could be caused by a corrupted download of $model."
echo "Consider deleting local copy of $model and running this script again."
exit 1
else
echo "checksums match"
fi
else
echo "Could not find sha256 sum; skipping verification. Please verify manually that sha256 hash of ${model} matches ${1}."
fi
tar xvomf $model


@@ -0,0 +1,44 @@
/* Copyright (c) 2022 Amazon
Written by Jean-Marc Valin */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <math.h>
#include "celt/entenc.h"
#include "os_support.h"
#include "dred_config.h"
#include "dred_coding.h"
int compute_quantizer(int q0, int dQ, int qmax, int i) {
int quant;
static const int dQ_table[8] = {0, 2, 3, 4, 6, 8, 12, 16};
quant = q0 + (dQ_table[dQ]*i + 8)/16;
return quant > qmax ? qmax : quant;
}


@@ -0,0 +1,36 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_CODING_H
#define DRED_CODING_H
#include "opus_types.h"
#include "entcode.h"
int compute_quantizer(int q0, int dQ, int qmax, int i);
#endif


@@ -0,0 +1,54 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_CONFIG_H
#define DRED_CONFIG_H
/* Change this once DRED gets an extension number assigned. */
#define DRED_EXTENSION_ID 126
/* Remove these two completely once DRED gets an extension number assigned. */
#define DRED_EXPERIMENTAL_VERSION 10
#define DRED_EXPERIMENTAL_BYTES 2
#define DRED_MIN_BYTES 8
/* These are in part duplicates of the values defined in dred_rdovae_constants.h */
#define DRED_SILK_ENCODER_DELAY (79+12-80)
#define DRED_FRAME_SIZE 160
#define DRED_DFRAME_SIZE (2 * (DRED_FRAME_SIZE))
#define DRED_MAX_DATA_SIZE 1000
#define DRED_ENC_Q0 6
#define DRED_ENC_Q1 15
/* Covers 1.04 seconds so that we can still cover one full second after the lookahead. */
#define DRED_MAX_LATENTS 26
#define DRED_NUM_REDUNDANCY_FRAMES (2*DRED_MAX_LATENTS)
#define DRED_MAX_FRAMES (4*DRED_MAX_LATENTS)
#endif


@@ -0,0 +1,129 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <string.h>
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "os_support.h"
#include "dred_decoder.h"
#include "dred_coding.h"
#include "celt/entdec.h"
#include "celt/laplace.h"
#include "dred_rdovae_stats_data.h"
#include "dred_rdovae_constants.h"
static void dred_decode_latents(ec_dec *dec, float *x, const opus_uint8 *scale, const opus_uint8 *r, const opus_uint8 *p0, int dim) {
int i;
for (i=0;i<dim;i++) {
int q;
if (r[i] == 0 || p0[i] == 255) q = 0;
else q = ec_laplace_decode_p0(dec, p0[i]<<7, r[i]<<7);
x[i] = q*256.f/(scale[i] == 0 ? 1 : scale[i]);
}
}
int dred_ec_decode(OpusDRED *dec, const opus_uint8 *bytes, int num_bytes, int min_feature_frames, int dred_frame_offset)
{
ec_dec ec;
int q_level;
int i;
int offset;
int q0;
int dQ;
int qmax;
int state_qoffset;
int extra_offset;
/* since features are decoded in quadruples, it makes no sense to go with an uneven number of redundancy frames */
celt_assert(DRED_NUM_REDUNDANCY_FRAMES % 2 == 0);
/* decode initial state and initialize RDOVAE decoder */
ec_dec_init(&ec, (unsigned char*)bytes, num_bytes);
q0 = ec_dec_uint(&ec, 16);
dQ = ec_dec_uint(&ec, 8);
if (ec_dec_uint(&ec, 2)) extra_offset = 32*ec_dec_uint(&ec, 256);
else extra_offset = 0;
/* Compute total offset, including DRED position in a multiframe packet. */
dec->dred_offset = 16 - ec_dec_uint(&ec, 32) - extra_offset + dred_frame_offset;
/*printf("%d %d %d\n", dred_offset, q0, dQ);*/
qmax = 15;
if (q0 < 14 && dQ > 0) {
int nvals;
int ft;
int s;
/* The distribution for the dQmax symbol is split evenly between zero
(which implies qmax == 15) and larger values, with the probability of
all larger values being uniform.
This is equivalent to coding 1 bit to decide if the maximum is less than
15 followed by a uint to decide the actual value if it is less than
15, but combined into a single symbol. */
nvals = 15 - (q0 + 1);
ft = 2*nvals;
s = ec_decode(&ec, ft);
if (s >= nvals) {
qmax = q0 + (s - nvals) + 1;
ec_dec_update(&ec, s, s + 1, ft);
}
else {
ec_dec_update(&ec, 0, nvals, ft);
}
}
state_qoffset = q0*DRED_STATE_DIM;
dred_decode_latents(
&ec,
dec->state,
dred_state_quant_scales_q8 + state_qoffset,
dred_state_r_q8 + state_qoffset,
dred_state_p0_q8 + state_qoffset,
DRED_STATE_DIM);
/* decode newest to oldest and store oldest to newest */
for (i = 0; i < IMIN(DRED_NUM_REDUNDANCY_FRAMES, (min_feature_frames+1)/2); i += 2)
{
/* FIXME: Figure out how to avoid missing a last frame that would take up < 8 bits. */
if (8*num_bytes - ec_tell(&ec) <= 7)
break;
q_level = compute_quantizer(q0, dQ, qmax, i/2);
offset = q_level*DRED_LATENT_DIM;
dred_decode_latents(
&ec,
&dec->latents[(i/2)*DRED_LATENT_DIM],
dred_latent_quant_scales_q8 + offset,
dred_latent_r_q8 + offset,
dred_latent_p0_q8 + offset,
DRED_LATENT_DIM
);
offset = 2 * i * DRED_NUM_FEATURES;
}
dec->process_stage = 1;
dec->nb_latents = i/2;
return i/2;
}


@@ -0,0 +1,49 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_DECODER_H
#define DRED_DECODER_H
#include "opus.h"
#include "dred_config.h"
#include "dred_rdovae.h"
#include "entcode.h"
#include "dred_rdovae_constants.h"
struct OpusDRED {
float fec_features[2*DRED_NUM_REDUNDANCY_FRAMES*DRED_NUM_FEATURES];
float state[DRED_STATE_DIM];
float latents[(DRED_NUM_REDUNDANCY_FRAMES/2)*DRED_LATENT_DIM];
int nb_latents;
int process_stage;
int dred_offset;
};
int dred_ec_decode(OpusDRED *dec, const opus_uint8 *bytes, int num_bytes, int min_feature_frames, int dred_frame_offset);
#endif


@@ -0,0 +1,363 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <string.h>
#if 0
#include <stdio.h>
#include <math.h>
#endif
#include "dred_encoder.h"
#include "dred_coding.h"
#include "celt/entenc.h"
#include "dred_decoder.h"
#include "float_cast.h"
#include "os_support.h"
#include "celt/laplace.h"
#include "dred_rdovae_stats_data.h"
static void DRED_rdovae_init_encoder(RDOVAEEncState *enc_state)
{
memset(enc_state, 0, sizeof(*enc_state));
}
int dred_encoder_load_model(DREDEnc* enc, const void *data, int len)
{
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_rdovaeenc(&enc->model, list);
opus_free(list);
if (ret == 0) {
ret = lpcnet_encoder_load_model(&enc->lpcnet_enc_state, data, len);
}
if (ret == 0) enc->loaded = 1;
return (ret == 0) ? OPUS_OK : OPUS_BAD_ARG;
}
void dred_encoder_reset(DREDEnc* enc)
{
OPUS_CLEAR((char*)&enc->DREDENC_RESET_START,
sizeof(DREDEnc)-
((char*)&enc->DREDENC_RESET_START - (char*)enc));
enc->input_buffer_fill = DRED_SILK_ENCODER_DELAY;
lpcnet_encoder_init(&enc->lpcnet_enc_state);
DRED_rdovae_init_encoder(&enc->rdovae_enc);
}
void dred_encoder_init(DREDEnc* enc, opus_int32 Fs, int channels)
{
enc->Fs = Fs;
enc->channels = channels;
enc->loaded = 0;
#ifndef USE_WEIGHTS_FILE
if (init_rdovaeenc(&enc->model, rdovaeenc_arrays) == 0) enc->loaded = 1;
#endif
dred_encoder_reset(enc);
}
static void dred_process_frame(DREDEnc *enc, int arch)
{
float feature_buffer[2 * 36];
float input_buffer[2*DRED_NUM_FEATURES] = {0};
celt_assert(enc->loaded);
/* shift latents buffer */
OPUS_MOVE(enc->latents_buffer + DRED_LATENT_DIM, enc->latents_buffer, (DRED_MAX_FRAMES - 1) * DRED_LATENT_DIM);
OPUS_MOVE(enc->state_buffer + DRED_STATE_DIM, enc->state_buffer, (DRED_MAX_FRAMES - 1) * DRED_STATE_DIM);
/* calculate LPCNet features */
lpcnet_compute_single_frame_features_float(&enc->lpcnet_enc_state, enc->input_buffer, feature_buffer, arch);
lpcnet_compute_single_frame_features_float(&enc->lpcnet_enc_state, enc->input_buffer + DRED_FRAME_SIZE, feature_buffer + 36, arch);
/* prepare input buffer (discard LPC coefficients) */
OPUS_COPY(input_buffer, feature_buffer, DRED_NUM_FEATURES);
OPUS_COPY(input_buffer + DRED_NUM_FEATURES, feature_buffer + 36, DRED_NUM_FEATURES);
/* run RDOVAE encoder */
dred_rdovae_encode_dframe(&enc->rdovae_enc, &enc->model, enc->latents_buffer, enc->state_buffer, input_buffer, arch);
enc->latents_buffer_fill = IMIN(enc->latents_buffer_fill+1, DRED_NUM_REDUNDANCY_FRAMES);
}
void filter_df2t(const float *in, float *out, int len, float b0, const float *b, const float *a, int order, float *mem)
{
int i;
for (i=0;i<len;i++) {
int j;
float xi, yi, nyi;
xi = in[i];
yi = xi*b0 + mem[0];
nyi = -yi;
for (j=0;j<order;j++)
{
mem[j] = mem[j+1] + b[j]*xi + a[j]*nyi;
}
out[i] = yi;
/*fprintf(stdout, "%f\n", out[i]);*/
}
}
#define MAX_DOWNMIX_BUFFER (960*2)
static void dred_convert_to_16k(DREDEnc *enc, const float *in, int in_len, float *out, int out_len)
{
float downmix[MAX_DOWNMIX_BUFFER];
int i;
int up;
celt_assert(enc->channels*in_len <= MAX_DOWNMIX_BUFFER);
celt_assert(in_len * (opus_int32)16000 == out_len * enc->Fs);
switch(enc->Fs) {
case 8000:
up = 2;
break;
case 12000:
up = 4;
break;
case 16000:
up = 1;
break;
case 24000:
up = 2;
break;
case 48000:
up = 1;
break;
default:
celt_assert(0);
}
OPUS_CLEAR(downmix, up*in_len);
if (enc->channels == 1) {
for (i=0;i<in_len;i++) downmix[up*i] = FLOAT2INT16(up*in[i]);
} else {
for (i=0;i<in_len;i++) downmix[up*i] = FLOAT2INT16(.5*up*(in[2*i]+in[2*i+1]));
}
if (enc->Fs == 16000) {
OPUS_COPY(out, downmix, out_len);
} else if (enc->Fs == 48000 || enc->Fs == 24000) {
/* ellip(7, .2, 70, 7750/24000) */
static const float filter_b[8] = { 0.005873358047f, 0.012980854831f, 0.014531340042f, 0.014531340042f, 0.012980854831f, 0.005873358047f, 0.004523418224f, 0.f};
static const float filter_a[8] = {-3.878718597768f, 7.748834257468f, -9.653651699533f, 8.007342726666f, -4.379450178552f, 1.463182111810f, -0.231720677804f, 0.f};
float b0 = 0.004523418224f;
filter_df2t(downmix, downmix, up*in_len, b0, filter_b, filter_a, RESAMPLING_ORDER, enc->resample_mem);
for (i=0;i<out_len;i++) out[i] = downmix[3*i];
} else if (enc->Fs == 12000) {
/* ellip(7, .2, 70, 7750/24000) */
static const float filter_b[8] = {-0.001017101081f, 0.003673127243f, 0.001009165267f, 0.001009165267f, 0.003673127243f, -0.001017101081f, 0.002033596776f, 0.f};
static const float filter_a[8] = {-4.930414411612f, 11.291643096504f, -15.322037343815f, 13.216403930898f, -7.220409219553f, 2.310550142771f, -0.334338618782f, 0.f};
float b0 = 0.002033596776f;
filter_df2t(downmix, downmix, up*in_len, b0, filter_b, filter_a, RESAMPLING_ORDER, enc->resample_mem);
for (i=0;i<out_len;i++) out[i] = downmix[3*i];
} else if (enc->Fs == 8000) {
/* ellip(7, .2, 70, 3900/8000) */
static const float filter_b[8] = { 0.081670120929f, 0.180401598565f, 0.259391051971f, 0.259391051971f, 0.180401598565f, 0.081670120929f, 0.020109185709f, 0.f};
static const float filter_a[8] = {-1.393651933659f, 2.609789872676f, -2.403541968806f, 2.056814957331f, -1.148908574570f, 0.473001413788f, -0.110359852412f, 0.f};
float b0 = 0.020109185709f;
filter_df2t(downmix, out, out_len, b0, filter_b, filter_a, RESAMPLING_ORDER, enc->resample_mem);
} else {
celt_assert(0);
}
}
void dred_compute_latents(DREDEnc *enc, const float *pcm, int frame_size, int extra_delay, int arch)
{
int curr_offset16k;
int frame_size16k = frame_size * 16000 / enc->Fs;
celt_assert(enc->loaded);
curr_offset16k = 40 + extra_delay*16000/enc->Fs - enc->input_buffer_fill;
enc->dred_offset = (int)floor((curr_offset16k+20.f)/40.f);
enc->latent_offset = 0;
while (frame_size16k > 0) {
int process_size16k;
int process_size;
process_size16k = IMIN(2*DRED_FRAME_SIZE, frame_size16k);
process_size = process_size16k * enc->Fs / 16000;
dred_convert_to_16k(enc, pcm, process_size, &enc->input_buffer[enc->input_buffer_fill], process_size16k);
enc->input_buffer_fill += process_size16k;
if (enc->input_buffer_fill >= 2*DRED_FRAME_SIZE)
{
curr_offset16k += 320;
dred_process_frame(enc, arch);
enc->input_buffer_fill -= 2*DRED_FRAME_SIZE;
OPUS_MOVE(&enc->input_buffer[0], &enc->input_buffer[2*DRED_FRAME_SIZE], enc->input_buffer_fill);
/* 15 ms (6*2.5 ms) is the ideal offset for DRED because it corresponds to our vocoder look-ahead. */
if (enc->dred_offset < 6) {
enc->dred_offset += 8;
} else {
enc->latent_offset++;
}
}
pcm += process_size;
frame_size16k -= process_size16k;
}
}
static void dred_encode_latents(ec_enc *enc, const float *x, const opus_uint8 *scale, const opus_uint8 *dzone, const opus_uint8 *r, const opus_uint8 *p0, int dim, int arch) {
int i;
int q[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
float xq[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
float delta[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
float deadzone[IMAX(DRED_LATENT_DIM,DRED_STATE_DIM)];
float eps = .1f;
/* This is split into multiple loops (with temporary arrays) so that the compiler
can vectorize all of it, and so we can call the vector tanh(). */
for (i=0;i<dim;i++) {
delta[i] = dzone[i]*(1.f/256.f);
xq[i] = x[i]*scale[i]*(1.f/256.f);
deadzone[i] = xq[i]/(delta[i]+eps);
}
compute_activation(deadzone, deadzone, dim, ACTIVATION_TANH, arch);
for (i=0;i<dim;i++) {
xq[i] = xq[i] - delta[i]*deadzone[i];
q[i] = (int)floor(.5f+xq[i]);
}
for (i=0;i<dim;i++) {
/* Make the impossible actually impossible. */
if (r[i] == 0 || p0[i] == 255) q[i] = 0;
else ec_laplace_encode_p0(enc, q[i], p0[i]<<7, r[i]<<7);
}
}
static int dred_voice_active(const unsigned char *activity_mem, int offset) {
int i;
for (i=0;i<16;i++) {
if (activity_mem[8*offset + i] == 1) return 1;
}
return 0;
}
int dred_encode_silk_frame(DREDEnc *enc, unsigned char *buf, int max_chunks, int max_bytes, int q0, int dQ, int qmax, unsigned char *activity_mem, int arch) {
ec_enc ec_encoder;
int q_level;
int i;
int offset;
int ec_buffer_fill;
int state_qoffset;
ec_enc ec_bak;
int prev_active=0;
int latent_offset;
int extra_dred_offset=0;
int dred_encoded=0;
int delayed_dred=0;
int total_offset;
latent_offset = enc->latent_offset;
/* Delaying new DRED data when just out of silence because we already have the
main Opus payload for that frame. */
if (activity_mem[0] && enc->last_extra_dred_offset>0) {
latent_offset = enc->last_extra_dred_offset;
delayed_dred = 1;
enc->last_extra_dred_offset = 0;
}
while (latent_offset < enc->latents_buffer_fill && !dred_voice_active(activity_mem, latent_offset)) {
latent_offset++;
extra_dred_offset++;
}
if (!delayed_dred) enc->last_extra_dred_offset = extra_dred_offset;
/* entropy coding of state and latents */
ec_enc_init(&ec_encoder, buf, max_bytes);
ec_enc_uint(&ec_encoder, q0, 16);
ec_enc_uint(&ec_encoder, dQ, 8);
total_offset = 16 - (enc->dred_offset - extra_dred_offset*8);
celt_assert(total_offset>=0);
if (total_offset > 31) {
ec_enc_uint(&ec_encoder, 1, 2);
ec_enc_uint(&ec_encoder, total_offset>>5, 256);
ec_enc_uint(&ec_encoder, total_offset&31, 32);
} else {
ec_enc_uint(&ec_encoder, 0, 2);
ec_enc_uint(&ec_encoder, total_offset, 32);
}
celt_assert(qmax >= q0);
if (q0 < 14 && dQ > 0) {
int nvals;
/* If you want to use qmax == q0, you should have set dQ = 0. */
celt_assert(qmax > q0);
nvals = 15 - (q0 + 1);
ec_encode(&ec_encoder, qmax >= 15 ? 0 : nvals + qmax - (q0 + 1),
qmax >= 15 ? nvals : nvals + qmax - q0, 2*nvals);
}
state_qoffset = q0*DRED_STATE_DIM;
dred_encode_latents(
&ec_encoder,
&enc->state_buffer[latent_offset*DRED_STATE_DIM],
dred_state_quant_scales_q8 + state_qoffset,
dred_state_dead_zone_q8 + state_qoffset,
dred_state_r_q8 + state_qoffset,
dred_state_p0_q8 + state_qoffset,
DRED_STATE_DIM,
arch);
if (ec_tell(&ec_encoder) > 8*max_bytes) {
return 0;
}
ec_bak = ec_encoder;
for (i = 0; i < IMIN(2*max_chunks, enc->latents_buffer_fill-latent_offset-1); i += 2)
{
int active;
q_level = compute_quantizer(q0, dQ, qmax, i/2);
offset = q_level * DRED_LATENT_DIM;
dred_encode_latents(
&ec_encoder,
enc->latents_buffer + (i+latent_offset) * DRED_LATENT_DIM,
dred_latent_quant_scales_q8 + offset,
dred_latent_dead_zone_q8 + offset,
dred_latent_r_q8 + offset,
dred_latent_p0_q8 + offset,
DRED_LATENT_DIM,
arch
);
if (ec_tell(&ec_encoder) > 8*max_bytes) {
/* If we haven't been able to code one chunk, give up on DRED completely. */
if (i==0) return 0;
break;
}
active = dred_voice_active(activity_mem, i+latent_offset);
if (active || prev_active) {
ec_bak = ec_encoder;
dred_encoded = i+2;
}
prev_active = active;
}
/* Avoid sending empty DRED packets. */
if (dred_encoded==0 || (dred_encoded<=2 && extra_dred_offset)) return 0;
ec_encoder = ec_bak;
ec_buffer_fill = (ec_tell(&ec_encoder)+7)/8;
ec_enc_shrink(&ec_encoder, ec_buffer_fill);
ec_enc_done(&ec_encoder);
return ec_buffer_fill;
}


@@ -0,0 +1,71 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_ENCODER_H
#define DRED_ENCODER_H
#include "lpcnet.h"
#include "dred_config.h"
#include "dred_rdovae.h"
#include "entcode.h"
#include "lpcnet_private.h"
#include "dred_rdovae_enc.h"
#include "dred_rdovae_enc_data.h"
#define RESAMPLING_ORDER 8
typedef struct {
RDOVAEEnc model;
LPCNetEncState lpcnet_enc_state;
RDOVAEEncState rdovae_enc;
int loaded;
opus_int32 Fs;
int channels;
#define DREDENC_RESET_START input_buffer
float input_buffer[2*DRED_DFRAME_SIZE];
int input_buffer_fill;
int dred_offset;
int latent_offset;
int last_extra_dred_offset;
float latents_buffer[DRED_MAX_FRAMES * DRED_LATENT_DIM];
int latents_buffer_fill;
float state_buffer[DRED_MAX_FRAMES * DRED_STATE_DIM];
float resample_mem[RESAMPLING_ORDER + 1];
} DREDEnc;
int dred_encoder_load_model(DREDEnc* enc, const void *data, int len);
void dred_encoder_init(DREDEnc* enc, opus_int32 Fs, int channels);
void dred_encoder_reset(DREDEnc* enc);
void dred_deinit_encoder(DREDEnc *enc);
void dred_compute_latents(DREDEnc *enc, const float *pcm, int frame_size, int extra_delay, int arch);
int dred_encode_silk_frame(DREDEnc *enc, unsigned char *buf, int max_chunks, int max_bytes, int q0, int dQ, int qmax, unsigned char *activity_mem, int arch);
#endif


@@ -0,0 +1,42 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_RDOVAE_H
#define DRED_RDOVAE_H
#include <stdlib.h>
#include "opus_types.h"
typedef struct RDOVAEDec RDOVAEDec;
typedef struct RDOVAEEnc RDOVAEEnc;
typedef struct RDOVAEDecStruct RDOVAEDecState;
typedef struct RDOVAEEncStruct RDOVAEEncState;
#endif


@@ -0,0 +1,139 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "dred_rdovae_dec.h"
#include "dred_rdovae_constants.h"
#include "os_support.h"
static void conv1_cond_init(float *mem, int len, int dilation, int *init)
{
if (!*init) {
int i;
for (i=0;i<dilation;i++) OPUS_CLEAR(&mem[i*len], len);
}
*init = 1;
}
void DRED_rdovae_decode_all(const RDOVAEDec *model, float *features, const float *state, const float *latents, int nb_latents, int arch)
{
int i;
RDOVAEDecState dec;
memset(&dec, 0, sizeof(dec));
dred_rdovae_dec_init_states(&dec, model, state, arch);
for (i = 0; i < 2*nb_latents; i += 2)
{
dred_rdovae_decode_qframe(
&dec,
model,
&features[2*i*DRED_NUM_FEATURES],
&latents[(i/2)*DRED_LATENT_DIM],
arch);
}
}
void dred_rdovae_dec_init_states(
RDOVAEDecState *h, /* io: state buffer handle */
const RDOVAEDec *model,
const float *initial_state, /* i: initial state */
int arch
)
{
float hidden[DEC_HIDDEN_INIT_OUT_SIZE];
float state_init[DEC_GRU1_STATE_SIZE+DEC_GRU2_STATE_SIZE+DEC_GRU3_STATE_SIZE+DEC_GRU4_STATE_SIZE+DEC_GRU5_STATE_SIZE];
int counter=0;
compute_generic_dense(&model->dec_hidden_init, hidden, initial_state, ACTIVATION_TANH, arch);
compute_generic_dense(&model->dec_gru_init, state_init, hidden, ACTIVATION_TANH, arch);
OPUS_COPY(h->gru1_state, state_init, DEC_GRU1_STATE_SIZE);
counter += DEC_GRU1_STATE_SIZE;
OPUS_COPY(h->gru2_state, &state_init[counter], DEC_GRU2_STATE_SIZE);
counter += DEC_GRU2_STATE_SIZE;
OPUS_COPY(h->gru3_state, &state_init[counter], DEC_GRU3_STATE_SIZE);
counter += DEC_GRU3_STATE_SIZE;
OPUS_COPY(h->gru4_state, &state_init[counter], DEC_GRU4_STATE_SIZE);
counter += DEC_GRU4_STATE_SIZE;
OPUS_COPY(h->gru5_state, &state_init[counter], DEC_GRU5_STATE_SIZE);
h->initialized = 0;
}
void dred_rdovae_decode_qframe(
RDOVAEDecState *dec_state, /* io: state buffer handle */
const RDOVAEDec *model,
float *qframe, /* o: quadruple feature frame (four concatenated frames in reverse order) */
const float *input, /* i: latent vector */
int arch
)
{
float buffer[DEC_DENSE1_OUT_SIZE + DEC_GRU1_OUT_SIZE + DEC_GRU2_OUT_SIZE + DEC_GRU3_OUT_SIZE + DEC_GRU4_OUT_SIZE + DEC_GRU5_OUT_SIZE
+ DEC_CONV1_OUT_SIZE + DEC_CONV2_OUT_SIZE + DEC_CONV3_OUT_SIZE + DEC_CONV4_OUT_SIZE + DEC_CONV5_OUT_SIZE];
int output_index = 0;
/* run decoder stack and concatenate its output in buffer */
compute_generic_dense(&model->dec_dense1, &buffer[output_index], input, ACTIVATION_TANH, arch);
output_index += DEC_DENSE1_OUT_SIZE;
compute_generic_gru(&model->dec_gru1_input, &model->dec_gru1_recurrent, dec_state->gru1_state, buffer, arch);
compute_glu(&model->dec_glu1, &buffer[output_index], dec_state->gru1_state, arch);
output_index += DEC_GRU1_OUT_SIZE;
conv1_cond_init(dec_state->conv1_state, output_index, 1, &dec_state->initialized);
compute_generic_conv1d(&model->dec_conv1, &buffer[output_index], dec_state->conv1_state, buffer, output_index, ACTIVATION_TANH, arch);
output_index += DEC_CONV1_OUT_SIZE;
compute_generic_gru(&model->dec_gru2_input, &model->dec_gru2_recurrent, dec_state->gru2_state, buffer, arch);
compute_glu(&model->dec_glu2, &buffer[output_index], dec_state->gru2_state, arch);
output_index += DEC_GRU2_OUT_SIZE;
conv1_cond_init(dec_state->conv2_state, output_index, 1, &dec_state->initialized);
compute_generic_conv1d(&model->dec_conv2, &buffer[output_index], dec_state->conv2_state, buffer, output_index, ACTIVATION_TANH, arch);
output_index += DEC_CONV2_OUT_SIZE;
compute_generic_gru(&model->dec_gru3_input, &model->dec_gru3_recurrent, dec_state->gru3_state, buffer, arch);
compute_glu(&model->dec_glu3, &buffer[output_index], dec_state->gru3_state, arch);
output_index += DEC_GRU3_OUT_SIZE;
conv1_cond_init(dec_state->conv3_state, output_index, 1, &dec_state->initialized);
compute_generic_conv1d(&model->dec_conv3, &buffer[output_index], dec_state->conv3_state, buffer, output_index, ACTIVATION_TANH, arch);
output_index += DEC_CONV3_OUT_SIZE;
compute_generic_gru(&model->dec_gru4_input, &model->dec_gru4_recurrent, dec_state->gru4_state, buffer, arch);
compute_glu(&model->dec_glu4, &buffer[output_index], dec_state->gru4_state, arch);
output_index += DEC_GRU4_OUT_SIZE;
conv1_cond_init(dec_state->conv4_state, output_index, 1, &dec_state->initialized);
compute_generic_conv1d(&model->dec_conv4, &buffer[output_index], dec_state->conv4_state, buffer, output_index, ACTIVATION_TANH, arch);
output_index += DEC_CONV4_OUT_SIZE;
compute_generic_gru(&model->dec_gru5_input, &model->dec_gru5_recurrent, dec_state->gru5_state, buffer, arch);
compute_glu(&model->dec_glu5, &buffer[output_index], dec_state->gru5_state, arch);
output_index += DEC_GRU5_OUT_SIZE;
conv1_cond_init(dec_state->conv5_state, output_index, 1, &dec_state->initialized);
compute_generic_conv1d(&model->dec_conv5, &buffer[output_index], dec_state->conv5_state, buffer, output_index, ACTIVATION_TANH, arch);
output_index += DEC_CONV5_OUT_SIZE;
compute_generic_dense(&model->dec_output, qframe, buffer, ACTIVATION_LINEAR, arch);
}


@@ -0,0 +1,53 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_RDOVAE_DEC_H
#define DRED_RDOVAE_DEC_H
#include "dred_rdovae.h"
#include "dred_rdovae_dec_data.h"
#include "dred_rdovae_stats_data.h"
struct RDOVAEDecStruct {
int initialized;
float gru1_state[DEC_GRU1_STATE_SIZE];
float gru2_state[DEC_GRU2_STATE_SIZE];
float gru3_state[DEC_GRU3_STATE_SIZE];
float gru4_state[DEC_GRU4_STATE_SIZE];
float gru5_state[DEC_GRU5_STATE_SIZE];
float conv1_state[DEC_CONV1_STATE_SIZE];
float conv2_state[DEC_CONV2_STATE_SIZE];
float conv3_state[DEC_CONV3_STATE_SIZE];
float conv4_state[DEC_CONV4_STATE_SIZE];
float conv5_state[DEC_CONV5_STATE_SIZE];
};
void dred_rdovae_dec_init_states(RDOVAEDecState *h, const RDOVAEDec *model, const float * initial_state, int arch);
void dred_rdovae_decode_qframe(RDOVAEDecState *h, const RDOVAEDec *model, float *qframe, const float * z, int arch);
void DRED_rdovae_decode_all(const RDOVAEDec *model, float *features, const float *state, const float *latents, int nb_latents, int arch);
#endif


@@ -0,0 +1,110 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <math.h>
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "dred_rdovae_enc.h"
#include "os_support.h"
#include "dred_rdovae_constants.h"
static void conv1_cond_init(float *mem, int len, int dilation, int *init)
{
if (!*init) {
int i;
for (i=0;i<dilation;i++) OPUS_CLEAR(&mem[i*len], len);
}
*init = 1;
}
void dred_rdovae_encode_dframe(
RDOVAEEncState *enc_state, /* io: encoder state */
const RDOVAEEnc *model,
float *latents, /* o: latent vector */
float *initial_state, /* o: initial state */
const float *input, /* i: double feature frame (concatenated) */
int arch
)
{
float padded_latents[DRED_PADDED_LATENT_DIM];
float padded_state[DRED_PADDED_STATE_DIM];
float buffer[ENC_DENSE1_OUT_SIZE + ENC_GRU1_OUT_SIZE + ENC_GRU2_OUT_SIZE + ENC_GRU3_OUT_SIZE + ENC_GRU4_OUT_SIZE + ENC_GRU5_OUT_SIZE
+ ENC_CONV1_OUT_SIZE + ENC_CONV2_OUT_SIZE + ENC_CONV3_OUT_SIZE + ENC_CONV4_OUT_SIZE + ENC_CONV5_OUT_SIZE];
float state_hidden[GDENSE1_OUT_SIZE];
int output_index = 0;
/* run encoder stack and concatenate output in buffer */
compute_generic_dense(&model->enc_dense1, &buffer[output_index], input, ACTIVATION_TANH, arch);
output_index += ENC_DENSE1_OUT_SIZE;
compute_generic_gru(&model->enc_gru1_input, &model->enc_gru1_recurrent, enc_state->gru1_state, buffer, arch);
OPUS_COPY(&buffer[output_index], enc_state->gru1_state, ENC_GRU1_OUT_SIZE);
output_index += ENC_GRU1_OUT_SIZE;
conv1_cond_init(enc_state->conv1_state, output_index, 1, &enc_state->initialized);
compute_generic_conv1d(&model->enc_conv1, &buffer[output_index], enc_state->conv1_state, buffer, output_index, ACTIVATION_TANH, arch);
output_index += ENC_CONV1_OUT_SIZE;
compute_generic_gru(&model->enc_gru2_input, &model->enc_gru2_recurrent, enc_state->gru2_state, buffer, arch);
OPUS_COPY(&buffer[output_index], enc_state->gru2_state, ENC_GRU2_OUT_SIZE);
output_index += ENC_GRU2_OUT_SIZE;
conv1_cond_init(enc_state->conv2_state, output_index, 2, &enc_state->initialized);
compute_generic_conv1d_dilation(&model->enc_conv2, &buffer[output_index], enc_state->conv2_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
output_index += ENC_CONV2_OUT_SIZE;
compute_generic_gru(&model->enc_gru3_input, &model->enc_gru3_recurrent, enc_state->gru3_state, buffer, arch);
OPUS_COPY(&buffer[output_index], enc_state->gru3_state, ENC_GRU3_OUT_SIZE);
output_index += ENC_GRU3_OUT_SIZE;
conv1_cond_init(enc_state->conv3_state, output_index, 2, &enc_state->initialized);
compute_generic_conv1d_dilation(&model->enc_conv3, &buffer[output_index], enc_state->conv3_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
output_index += ENC_CONV3_OUT_SIZE;
compute_generic_gru(&model->enc_gru4_input, &model->enc_gru4_recurrent, enc_state->gru4_state, buffer, arch);
OPUS_COPY(&buffer[output_index], enc_state->gru4_state, ENC_GRU4_OUT_SIZE);
output_index += ENC_GRU4_OUT_SIZE;
conv1_cond_init(enc_state->conv4_state, output_index, 2, &enc_state->initialized);
compute_generic_conv1d_dilation(&model->enc_conv4, &buffer[output_index], enc_state->conv4_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
output_index += ENC_CONV4_OUT_SIZE;
compute_generic_gru(&model->enc_gru5_input, &model->enc_gru5_recurrent, enc_state->gru5_state, buffer, arch);
OPUS_COPY(&buffer[output_index], enc_state->gru5_state, ENC_GRU5_OUT_SIZE);
output_index += ENC_GRU5_OUT_SIZE;
conv1_cond_init(enc_state->conv5_state, output_index, 2, &enc_state->initialized);
compute_generic_conv1d_dilation(&model->enc_conv5, &buffer[output_index], enc_state->conv5_state, buffer, output_index, 2, ACTIVATION_TANH, arch);
output_index += ENC_CONV5_OUT_SIZE;
compute_generic_dense(&model->enc_zdense, padded_latents, buffer, ACTIVATION_LINEAR, arch);
OPUS_COPY(latents, padded_latents, DRED_LATENT_DIM);
/* next, calculate initial state */
compute_generic_dense(&model->gdense1, state_hidden, buffer, ACTIVATION_TANH, arch);
compute_generic_dense(&model->gdense2, padded_state, state_hidden, ACTIVATION_LINEAR, arch);
OPUS_COPY(initial_state, padded_state, DRED_STATE_DIM);
}


@@ -0,0 +1,52 @@
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DRED_RDOVAE_ENC_H
#define DRED_RDOVAE_ENC_H
#include "dred_rdovae.h"
#include "dred_rdovae_enc_data.h"
struct RDOVAEEncStruct {
int initialized;
float gru1_state[ENC_GRU1_STATE_SIZE];
float gru2_state[ENC_GRU2_STATE_SIZE];
float gru3_state[ENC_GRU3_STATE_SIZE];
float gru4_state[ENC_GRU4_STATE_SIZE];
float gru5_state[ENC_GRU5_STATE_SIZE];
float conv1_state[ENC_CONV1_STATE_SIZE];
float conv2_state[2*ENC_CONV2_STATE_SIZE];
float conv3_state[2*ENC_CONV3_STATE_SIZE];
float conv4_state[2*ENC_CONV4_STATE_SIZE];
float conv5_state[2*ENC_CONV5_STATE_SIZE];
};
void dred_rdovae_encode_dframe(RDOVAEEncState *enc_state, const RDOVAEEnc *model, float *latents, float *initial_state, const float *input, int arch);
#endif


@@ -0,0 +1,238 @@
/* Copyright (c) 2017-2018 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include "kiss_fft.h"
#include "common.h"
#include <math.h>
#include "freq.h"
#include "pitch.h"
#include "arch.h"
#include <assert.h>
#include "lpcnet.h"
#include "lpcnet_private.h"
#include "os_support.h"
#include "cpu_support.h"
static void biquad(float *y, float mem[2], const float *x, const float *b, const float *a, int N) {
int i;
for (i=0;i<N;i++) {
float xi, yi;
xi = x[i];
yi = x[i] + mem[0];
mem[0] = mem[1] + (b[0]*(double)xi - a[0]*(double)yi);
mem[1] = (b[1]*(double)xi - a[1]*(double)yi);
y[i] = yi;
}
}
static float uni_rand(void) {
return rand()/(double)RAND_MAX-.5;
}
static void rand_resp(float *a, float *b) {
a[0] = .75*uni_rand();
a[1] = .75*uni_rand();
b[0] = .75*uni_rand();
b[1] = .75*uni_rand();
}
static opus_int16 float2short(float x)
{
int i;
i = (int)floor(.5+x);
return IMAX(-32767, IMIN(32767, i));
}
int main(int argc, char **argv) {
int i;
char *argv0;
int count=0;
static const float a_hp[2] = {-1.99599, 0.99600};
static const float b_hp[2] = {-2, 1};
float a_sig[2] = {0};
float b_sig[2] = {0};
float mem_hp_x[2]={0};
float mem_resp_x[2]={0};
float mem_preemph=0;
float x[FRAME_SIZE];
int gain_change_count=0;
FILE *f1;
FILE *ffeat;
FILE *fpcm=NULL;
opus_int16 pcm[FRAME_SIZE]={0};
opus_int16 tmp[FRAME_SIZE] = {0};
float speech_gain=1;
float old_speech_gain = 1;
int one_pass_completed = 0;
LPCNetEncState *st;
int training = -1;
int burg = 0;
int pitch = 0;
FILE *fnoise = NULL;
float noise_gain = 0;
long noise_size=0;
int arch;
srand(getpid());
arch = opus_select_arch();
st = lpcnet_encoder_create();
argv0=argv[0];
if (argc == 5 && strcmp(argv[1], "-btrain")==0) {
burg = 1;
training = 1;
}
else if (argc == 4 && strcmp(argv[1], "-btest")==0) {
burg = 1;
training = 0;
}
else if (argc == 5 && strcmp(argv[1], "-ptrain")==0) {
pitch = 1;
training = 1;
fnoise = fopen(argv[2], "rb");
fseek(fnoise, 0, SEEK_END);
noise_size = ftell(fnoise);
fseek(fnoise, 0, SEEK_SET);
argv++;
}
else if (argc == 4 && strcmp(argv[1], "-ptest")==0) {
pitch = 1;
training = 0;
}
else if (argc == 5 && strcmp(argv[1], "-train")==0) training = 1;
else if (argc == 4 && strcmp(argv[1], "-test")==0) training = 0;
if (training == -1) {
fprintf(stderr, "usage: %s -train <speech> <features out> <pcm out>\n", argv0);
fprintf(stderr, " or %s -test <speech> <features out>\n", argv0);
return 1;
}
f1 = fopen(argv[2], "r");
if (f1 == NULL) {
fprintf(stderr,"Error opening input .s16 16kHz speech input file: %s\n", argv[2]);
exit(1);
}
ffeat = fopen(argv[3], "wb");
if (ffeat == NULL) {
fprintf(stderr,"Error opening output feature file: %s\n", argv[3]);
exit(1);
}
if (training && !pitch) {
fpcm = fopen(argv[4], "wb");
if (fpcm == NULL) {
fprintf(stderr,"Error opening output PCM file: %s\n", argv[4]);
exit(1);
}
}
while (1) {
size_t ret;
ret = fread(tmp, sizeof(opus_int16), FRAME_SIZE, f1);
if (feof(f1) || ret != FRAME_SIZE) {
if (!training) break;
rewind(f1);
ret = fread(tmp, sizeof(opus_int16), FRAME_SIZE, f1);
if (ret != FRAME_SIZE) {
fprintf(stderr, "error reading\n");
exit(1);
}
one_pass_completed = 1;
}
for (i=0;i<FRAME_SIZE;i++) x[i] = tmp[i];
if (count*FRAME_SIZE_5MS>=10000000 && one_pass_completed) break;
if (training && ++gain_change_count > 2821) {
speech_gain = pow(10., (-30+(rand()%40))/20.);
if (rand()&1) speech_gain = -speech_gain;
if (rand()%20==0) speech_gain *= .01;
if (!pitch && rand()%100==0) speech_gain = 0;
gain_change_count = 0;
rand_resp(a_sig, b_sig);
if (fnoise != NULL) {
long pos;
/* Randomize the fraction because rand() only gives us 31 bits. */
float frac_pos = rand()/(float)RAND_MAX;
pos = (long)(frac_pos*noise_size);
/* 32-bit alignment. */
pos = pos/4 * 4;
if (pos > noise_size-500000) pos = noise_size-500000;
noise_gain = pow(10., (-15+(rand()%40))/20.);
if (rand()%10==0) noise_gain = 0;
fseek(fnoise, pos, SEEK_SET);
}
}
if (fnoise != NULL) {
opus_int16 noise[FRAME_SIZE];
ret = fread(noise, sizeof(opus_int16), FRAME_SIZE, fnoise);
for (i=0;i<FRAME_SIZE;i++) x[i] += noise[i]*noise_gain;
}
biquad(x, mem_hp_x, x, b_hp, a_hp, FRAME_SIZE);
biquad(x, mem_resp_x, x, b_sig, a_sig, FRAME_SIZE);
for (i=0;i<FRAME_SIZE;i++) {
float g;
float f = (float)i/FRAME_SIZE;
g = f*speech_gain + (1-f)*old_speech_gain;
x[i] *= g;
}
if (burg) {
float ceps[2*NB_BANDS];
burg_cepstral_analysis(ceps, x);
fwrite(ceps, sizeof(float), 2*NB_BANDS, ffeat);
}
preemphasis(x, &mem_preemph, x, PREEMPHASIS, FRAME_SIZE);
/* PCM is delayed by 1/2 frame to make the features centered on the frames. */
for (i=0;i<FRAME_SIZE-TRAINING_OFFSET;i++) pcm[i+TRAINING_OFFSET] = float2short(x[i]);
compute_frame_features(st, x, arch);
if (pitch) {
signed char pitch_features[PITCH_MAX_PERIOD-PITCH_MIN_PERIOD+PITCH_IF_FEATURES];
for (i=0;i<PITCH_MAX_PERIOD-PITCH_MIN_PERIOD;i++) {
pitch_features[i] = (int)floor(.5f + 127.f*st->xcorr_features[i]);
}
for (i=0;i<PITCH_IF_FEATURES;i++) {
pitch_features[i+PITCH_MAX_PERIOD-PITCH_MIN_PERIOD] = (int)floor(.5f + 127.f*st->if_features[i]);
}
fwrite(pitch_features, PITCH_MAX_PERIOD-PITCH_MIN_PERIOD+PITCH_IF_FEATURES, 1, ffeat);
} else {
fwrite(st->features, sizeof(float), NB_TOTAL_FEATURES, ffeat);
}
/*if(pitch) fwrite(pcm, FRAME_SIZE, 2, stdout);*/
if (fpcm) fwrite(pcm, FRAME_SIZE, 2, fpcm);
/*if (fpcm) fwrite(pcm, sizeof(opus_int16), FRAME_SIZE, fpcm);*/
for (i=0;i<TRAINING_OFFSET;i++) pcm[i] = float2short(x[i+FRAME_SIZE-TRAINING_OFFSET]);
old_speech_gain = speech_gain;
count++;
}
fclose(f1);
fclose(ffeat);
if (fpcm) fclose(fpcm);
lpcnet_encoder_destroy(st);
return 0;
}
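Two of the conditioning steps in the loop above are easy to isolate: the per-sample linear crossfade between old_speech_gain and speech_gain, and the first-order preemphasis filter (PREEMPHASIS is 0.85f in this codebase). A minimal standalone sketch, with illustrative helper names:

```c
#include <assert.h>
#include <math.h>

/* Linearly crossfade the gain across the frame, exactly like the
 * f*speech_gain + (1-f)*old_speech_gain loop above. */
static void gain_crossfade(float *x, int n, float old_gain, float new_gain) {
   int i;
   for (i=0;i<n;i++) {
      float f = (float)i/n;
      x[i] *= f*new_gain + (1-f)*old_gain;
   }
}

/* First-order preemphasis y[n] = x[n] - a*x[n-1], with the filter
 * state carried across frames in *mem, as preemphasis() does above. */
static void preemph(float *y, float *mem, const float *x, float a, int n) {
   int i;
   for (i=0;i<n;i++) {
      float xi = x[i];
      y[i] = xi - a*(*mem);
      *mem = xi;
   }
}
```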


@@ -0,0 +1,104 @@
/* Copyright (c) 2017-2018 Mozilla
Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <math.h>
#include <stdio.h>
#include "freq.h"
#include "kiss_fft.h"
int main(void) {
int i;
FILE *file;
kiss_fft_state *kfft;
float half_window[OVERLAP_SIZE];
float dct_table[NB_BANDS*NB_BANDS];
file=fopen("lpcnet_tables.c", "wb");
fprintf(file, "/* The contents of this file were automatically generated by dump_lpcnet_tables.c */\n\n");
fprintf(file, "#ifdef HAVE_CONFIG_H\n");
fprintf(file, "#include \"config.h\"\n");
fprintf(file, "#endif\n");
fprintf(file, "#include \"kiss_fft.h\"\n\n");
kfft = opus_fft_alloc_twiddles(WINDOW_SIZE, NULL, NULL, NULL, 0);
fprintf(file, "static const arch_fft_state arch_fft = {0, NULL};\n\n");
fprintf (file, "static const opus_int16 fft_bitrev[%d] = {\n", kfft->nfft);
for (i=0;i<kfft->nfft;i++)
fprintf (file, "%d,%c", kfft->bitrev[i],(i+16)%15==0?'\n':' ');
fprintf (file, "};\n\n");
fprintf (file, "static const kiss_twiddle_cpx fft_twiddles[%d] = {\n", kfft->nfft);
for (i=0;i<kfft->nfft;i++)
fprintf (file, "{%#0.9gf, %#0.9gf},%c", kfft->twiddles[i].r, kfft->twiddles[i].i,(i+3)%2==0?'\n':' ');
fprintf (file, "};\n\n");
fprintf(file, "const kiss_fft_state kfft = {\n");
fprintf(file, "%d, /* nfft */\n", kfft->nfft);
fprintf(file, "%#0.8gf, /* scale */\n", kfft->scale);
fprintf(file, "%d, /* shift */\n", kfft->shift);
fprintf(file, "{");
for (i=0;i<2*MAXFACTORS;i++) {
fprintf(file, "%d, ", kfft->factors[i]);
}
fprintf(file, "}, /* factors */\n");
fprintf(file, "fft_bitrev, /* bitrev*/\n");
fprintf(file, "fft_twiddles, /* twiddles*/\n");
fprintf(file, "(arch_fft_state *)&arch_fft, /* arch_fft*/\n");
fprintf(file, "};\n\n");
for (i=0;i<OVERLAP_SIZE;i++)
half_window[i] = sin(.5*M_PI*sin(.5*M_PI*(i+.5)/OVERLAP_SIZE) * sin(.5*M_PI*(i+.5)/OVERLAP_SIZE));
fprintf(file, "const float half_window[] = {\n");
for (i=0;i<OVERLAP_SIZE;i++)
fprintf (file, "%#0.9gf,%c", half_window[i],(i+6)%5==0?'\n':' ');
fprintf(file, "};\n\n");
for (i=0;i<NB_BANDS;i++) {
int j;
for (j=0;j<NB_BANDS;j++) {
dct_table[i*NB_BANDS + j] = cos((i+.5)*j*M_PI/NB_BANDS);
if (j==0) dct_table[i*NB_BANDS + j] *= sqrt(.5);
}
}
fprintf(file, "const float dct_table[] = {\n");
for (i=0;i<NB_BANDS*NB_BANDS;i++)
fprintf (file, "%#0.9gf,%c", dct_table[i],(i+6)%5==0?'\n':' ');
fprintf(file, "};\n");
fclose(file);
return 0;
}


@@ -0,0 +1,225 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "fargan.h"
#include "os_support.h"
#include "freq.h"
#include "fargan_data.h"
#include "lpcnet.h"
#include "pitch.h"
#include "nnet.h"
#include "lpcnet_private.h"
#include "cpu_support.h"
#define FARGAN_FEATURES (NB_FEATURES)
static void compute_fargan_cond(FARGANState *st, float *cond, const float *features, int period)
{
FARGAN *model;
float dense_in[NB_FEATURES+COND_NET_PEMBED_OUT_SIZE];
float conv1_in[COND_NET_FCONV1_IN_SIZE];
float fdense2_in[COND_NET_FCONV1_OUT_SIZE];
model = &st->model;
celt_assert(FARGAN_FEATURES+COND_NET_PEMBED_OUT_SIZE == model->cond_net_fdense1.nb_inputs);
celt_assert(COND_NET_FCONV1_IN_SIZE == model->cond_net_fdense1.nb_outputs);
celt_assert(COND_NET_FCONV1_OUT_SIZE == model->cond_net_fconv1.nb_outputs);
OPUS_COPY(&dense_in[NB_FEATURES], &model->cond_net_pembed.float_weights[IMAX(0,IMIN(period-32, 223))*COND_NET_PEMBED_OUT_SIZE], COND_NET_PEMBED_OUT_SIZE);
OPUS_COPY(dense_in, features, NB_FEATURES);
compute_generic_dense(&model->cond_net_fdense1, conv1_in, dense_in, ACTIVATION_TANH, st->arch);
compute_generic_conv1d(&model->cond_net_fconv1, fdense2_in, st->cond_conv1_state, conv1_in, COND_NET_FCONV1_IN_SIZE, ACTIVATION_TANH, st->arch);
compute_generic_dense(&model->cond_net_fdense2, cond, fdense2_in, ACTIVATION_TANH, st->arch);
}
static void fargan_deemphasis(float *pcm, float *deemph_mem) {
int i;
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) {
pcm[i] += FARGAN_DEEMPHASIS * *deemph_mem;
*deemph_mem = pcm[i];
}
}
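fargan_deemphasis above is the one-pole inverse of the preemphasis applied at analysis time; running the two filters back to back recovers the input exactly. A sketch assuming the same 0.85 coefficient (FARGAN_DEEMPHASIS), with illustrative names:

```c
#include <assert.h>
#include <math.h>

/* Analysis side: y[n] = x[n] - a*x[n-1], in place. */
static void sk_preemph(float *x, float *mem, float a, int n) {
   int i;
   for (i=0;i<n;i++) {
      float t = x[i];
      x[i] -= a*(*mem);
      *mem = t;
   }
}

/* Synthesis side, as in fargan_deemphasis: x[n] = y[n] + a*x[n-1]. */
static void sk_deemph(float *x, float *mem, float a, int n) {
   int i;
   for (i=0;i<n;i++) {
      x[i] += a*(*mem);
      *mem = x[i];
   }
}
```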
static void run_fargan_subframe(FARGANState *st, float *pcm, const float *cond, int period)
{
int i, pos;
float fwc0_in[SIG_NET_INPUT_SIZE];
float gru1_in[SIG_NET_FWC0_CONV_OUT_SIZE+2*FARGAN_SUBFRAME_SIZE];
float gru2_in[SIG_NET_GRU1_OUT_SIZE+2*FARGAN_SUBFRAME_SIZE];
float gru3_in[SIG_NET_GRU2_OUT_SIZE+2*FARGAN_SUBFRAME_SIZE];
float pred[FARGAN_SUBFRAME_SIZE+4];
float prev[FARGAN_SUBFRAME_SIZE];
float pitch_gate[4];
float gain;
float gain_1;
float skip_cat[10000];
float skip_out[SIG_NET_SKIP_DENSE_OUT_SIZE];
FARGAN *model;
celt_assert(st->cont_initialized);
model = &st->model;
compute_generic_dense(&model->sig_net_cond_gain_dense, &gain, cond, ACTIVATION_LINEAR, st->arch);
gain = exp(gain);
gain_1 = 1.f/(1e-5f + gain);
pos = PITCH_MAX_PERIOD-period-2;
for (i=0;i<FARGAN_SUBFRAME_SIZE+4;i++) {
pred[i] = MIN32(1.f, MAX32(-1.f, gain_1*st->pitch_buf[IMAX(0, pos)]));
pos++;
if (pos == PITCH_MAX_PERIOD) pos -= period;
}
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) prev[i] = MAX32(-1.f, MIN16(1.f, gain_1*st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE+i]));
OPUS_COPY(&fwc0_in[0], &cond[0], FARGAN_COND_SIZE);
OPUS_COPY(&fwc0_in[FARGAN_COND_SIZE], pred, FARGAN_SUBFRAME_SIZE+4);
OPUS_COPY(&fwc0_in[FARGAN_COND_SIZE+FARGAN_SUBFRAME_SIZE+4], prev, FARGAN_SUBFRAME_SIZE);
compute_generic_conv1d(&model->sig_net_fwc0_conv, gru1_in, st->fwc0_mem, fwc0_in, SIG_NET_INPUT_SIZE, ACTIVATION_TANH, st->arch);
celt_assert(SIG_NET_FWC0_GLU_GATE_OUT_SIZE == model->sig_net_fwc0_glu_gate.nb_outputs);
compute_glu(&model->sig_net_fwc0_glu_gate, gru1_in, gru1_in, st->arch);
compute_generic_dense(&model->sig_net_gain_dense_out, pitch_gate, gru1_in, ACTIVATION_SIGMOID, st->arch);
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) gru1_in[SIG_NET_FWC0_GLU_GATE_OUT_SIZE+i] = pitch_gate[0]*pred[i+2];
OPUS_COPY(&gru1_in[SIG_NET_FWC0_GLU_GATE_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
compute_generic_gru(&model->sig_net_gru1_input, &model->sig_net_gru1_recurrent, st->gru1_state, gru1_in, st->arch);
compute_glu(&model->sig_net_gru1_glu_gate, gru2_in, st->gru1_state, st->arch);
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) gru2_in[SIG_NET_GRU1_OUT_SIZE+i] = pitch_gate[1]*pred[i+2];
OPUS_COPY(&gru2_in[SIG_NET_GRU1_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
compute_generic_gru(&model->sig_net_gru2_input, &model->sig_net_gru2_recurrent, st->gru2_state, gru2_in, st->arch);
compute_glu(&model->sig_net_gru2_glu_gate, gru3_in, st->gru2_state, st->arch);
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) gru3_in[SIG_NET_GRU2_OUT_SIZE+i] = pitch_gate[2]*pred[i+2];
OPUS_COPY(&gru3_in[SIG_NET_GRU2_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
compute_generic_gru(&model->sig_net_gru3_input, &model->sig_net_gru3_recurrent, st->gru3_state, gru3_in, st->arch);
compute_glu(&model->sig_net_gru3_glu_gate, &skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE], st->gru3_state, st->arch);
OPUS_COPY(skip_cat, gru2_in, SIG_NET_GRU1_OUT_SIZE);
OPUS_COPY(&skip_cat[SIG_NET_GRU1_OUT_SIZE], gru3_in, SIG_NET_GRU2_OUT_SIZE);
OPUS_COPY(&skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE+SIG_NET_GRU3_OUT_SIZE], gru1_in, SIG_NET_FWC0_CONV_OUT_SIZE);
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE+SIG_NET_GRU3_OUT_SIZE+SIG_NET_FWC0_CONV_OUT_SIZE+i] = pitch_gate[3]*pred[i+2];
OPUS_COPY(&skip_cat[SIG_NET_GRU1_OUT_SIZE+SIG_NET_GRU2_OUT_SIZE+SIG_NET_GRU3_OUT_SIZE+SIG_NET_FWC0_CONV_OUT_SIZE+FARGAN_SUBFRAME_SIZE], prev, FARGAN_SUBFRAME_SIZE);
compute_generic_dense(&model->sig_net_skip_dense, skip_out, skip_cat, ACTIVATION_TANH, st->arch);
compute_glu(&model->sig_net_skip_glu_gate, skip_out, skip_out, st->arch);
compute_generic_dense(&model->sig_net_sig_dense_out, pcm, skip_out, ACTIVATION_TANH, st->arch);
for (i=0;i<FARGAN_SUBFRAME_SIZE;i++) pcm[i] *= gain;
OPUS_MOVE(st->pitch_buf, &st->pitch_buf[FARGAN_SUBFRAME_SIZE], PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE);
OPUS_COPY(&st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE], pcm, FARGAN_SUBFRAME_SIZE);
fargan_deemphasis(pcm, &st->deemph_mem);
}
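The pred[] loop above reads samples starting one period back from the end of pitch_buf and wraps the index back by one period when it runs off the end, so pitch periods shorter than the subframe repeat within it. That addressing scheme, isolated (buffer length and period are illustrative):

```c
#include <assert.h>

/* Read n samples starting one `period` back from the end of a history
 * buffer of length buflen; when the read index reaches the end of the
 * buffer, wrap it back by one period, as the pred[] loop does above. */
static void read_pitch_pred(float *out, const float *buf, int buflen,
                            int period, int n) {
   int i;
   int pos = buflen - period;
   for (i=0;i<n;i++) {
      out[i] = buf[pos < 0 ? 0 : pos];
      pos++;
      if (pos == buflen) pos -= period;
   }
}
```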
void fargan_cont(FARGANState *st, const float *pcm0, const float *features0)
{
int i;
float cond[COND_NET_FDENSE2_OUT_SIZE];
float x0[FARGAN_CONT_SAMPLES];
float dummy[FARGAN_SUBFRAME_SIZE];
int period=0;
/* Pre-load features. */
for (i=0;i<5;i++) {
const float *features = &features0[i*NB_FEATURES];
st->last_period = period;
period = (int)floor(.5+256./pow(2.f,((1./60.)*((features[NB_BANDS]+1.5)*60))));
compute_fargan_cond(st, cond, features, period);
}
x0[0] = 0;
for (i=1;i<FARGAN_CONT_SAMPLES;i++) {
x0[i] = pcm0[i] - FARGAN_DEEMPHASIS*pcm0[i-1];
}
OPUS_COPY(&st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_FRAME_SIZE], x0, FARGAN_FRAME_SIZE);
st->cont_initialized = 1;
for (i=0;i<FARGAN_NB_SUBFRAMES;i++) {
run_fargan_subframe(st, dummy, &cond[i*FARGAN_COND_SIZE], st->last_period);
OPUS_COPY(&st->pitch_buf[PITCH_MAX_PERIOD-FARGAN_SUBFRAME_SIZE], &x0[FARGAN_FRAME_SIZE+i*FARGAN_SUBFRAME_SIZE], FARGAN_SUBFRAME_SIZE);
}
st->deemph_mem = pcm0[FARGAN_CONT_SAMPLES-1];
}
void fargan_init(FARGANState *st)
{
int ret;
OPUS_CLEAR(st, 1);
st->arch = opus_select_arch();
#ifndef USE_WEIGHTS_FILE
ret = init_fargan(&st->model, fargan_arrays);
#else
ret = 0;
#endif
celt_assert(ret == 0);
}
int fargan_load_model(FARGANState *st, const void *data, int len) {
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_fargan(&st->model, list);
opus_free(list);
if (ret == 0) return 0;
else return -1;
}
static void fargan_synthesize_impl(FARGANState *st, float *pcm, const float *features)
{
int subframe;
float cond[COND_NET_FDENSE2_OUT_SIZE];
int period;
celt_assert(st->cont_initialized);
period = (int)floor(.5+256./pow(2.f,((1./60.)*((features[NB_BANDS]+1.5)*60))));
compute_fargan_cond(st, cond, features, period);
for (subframe=0;subframe<FARGAN_NB_SUBFRAMES;subframe++) {
float *sub_cond;
sub_cond = &cond[subframe*FARGAN_COND_SIZE];
run_fargan_subframe(st, &pcm[subframe*FARGAN_SUBFRAME_SIZE], sub_cond, st->last_period);
}
st->last_period = period;
}
void fargan_synthesize(FARGANState *st, float *pcm, const float *features)
{
fargan_synthesize_impl(st, pcm, features);
}
void fargan_synthesize_int(FARGANState *st, opus_int16 *pcm, const float *features)
{
int i;
float fpcm[FARGAN_FRAME_SIZE];
fargan_synthesize(st, fpcm, features);
for (i=0;i<LPCNET_FRAME_SIZE;i++) pcm[i] = (int)floor(.5 + MIN32(32767, MAX32(-32767, 32768.f*fpcm[i])));
}


@@ -0,0 +1,68 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef FARGAN_H
#define FARGAN_H
#include "freq.h"
#include "fargan_data.h"
#include "pitchdnn.h"
#define FARGAN_CONT_SAMPLES 320
#define FARGAN_NB_SUBFRAMES 4
#define FARGAN_SUBFRAME_SIZE 40
#define FARGAN_FRAME_SIZE (FARGAN_NB_SUBFRAMES*FARGAN_SUBFRAME_SIZE)
#define FARGAN_COND_SIZE (COND_NET_FDENSE2_OUT_SIZE/FARGAN_NB_SUBFRAMES)
#define FARGAN_DEEMPHASIS 0.85f
#define SIG_NET_INPUT_SIZE (FARGAN_COND_SIZE+2*FARGAN_SUBFRAME_SIZE+4)
#define SIG_NET_FWC0_STATE_SIZE (2*SIG_NET_INPUT_SIZE)
#define FARGAN_MAX_RNN_NEURONS SIG_NET_GRU1_OUT_SIZE
typedef struct {
FARGAN model;
int arch;
int cont_initialized;
float deemph_mem;
float pitch_buf[PITCH_MAX_PERIOD];
float cond_conv1_state[COND_NET_FCONV1_STATE_SIZE];
float fwc0_mem[SIG_NET_FWC0_STATE_SIZE];
float gru1_state[SIG_NET_GRU1_STATE_SIZE];
float gru2_state[SIG_NET_GRU2_STATE_SIZE];
float gru3_state[SIG_NET_GRU3_STATE_SIZE];
int last_period;
} FARGANState;
void fargan_init(FARGANState *st);
int fargan_load_model(FARGANState *st, const void *data, int len);
void fargan_cont(FARGANState *st, const float *pcm0, const float *features0);
void fargan_synthesize(FARGANState *st, float *pcm, const float *features);
void fargan_synthesize_int(FARGANState *st, opus_int16 *pcm, const float *features);
#endif /* FARGAN_H */


@@ -0,0 +1,217 @@
/* Copyright (c) 2018 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "arch.h"
#include "lpcnet.h"
#include "freq.h"
#include "os_support.h"
#include "fargan.h"
#include "cpu_support.h"
#ifdef USE_WEIGHTS_FILE
# if __unix__
# include <fcntl.h>
# include <sys/mman.h>
# include <unistd.h>
# include <sys/stat.h>
/* When available, mmap() is preferable to reading the file, as it leads to
better resource utilization, especially if multiple processes are using the same
file (mapping will be shared in cache). */
void *load_blob(const char *filename, int *len) {
int fd;
void *data;
struct stat st;
if (stat(filename, &st)) {
*len = 0;
return NULL;
}
*len = st.st_size;
fd = open(filename, O_RDONLY);
if (fd<0) {
*len = 0;
return NULL;
}
data = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
if (data == MAP_FAILED) {
*len = 0;
data = NULL;
}
close(fd);
return data;
}
void free_blob(void *blob, int len) {
if (blob) munmap(blob, len);
}
# else
void *load_blob(const char *filename, int *len) {
FILE *file;
void *data;
/* Open in binary mode: the blob is binary data and "r" would mangle it
   on platforms that translate line endings. */
file = fopen(filename, "rb");
if (file == NULL)
{
perror("could not open blob file");
*len = 0;
return NULL;
}
fseek(file, 0L, SEEK_END);
*len = ftell(file);
fseek(file, 0L, SEEK_SET);
if (*len <= 0) {
*len = 0;
fclose(file);
return NULL;
}
data = malloc(*len);
if (!data) {
*len = 0;
fclose(file);
return NULL;
}
*len = fread(data, 1, *len, file);
fclose(file);
return data;
}
void free_blob(void *blob, int len) {
free(blob);
(void)len;
}
# endif
#endif
#define MODE_FEATURES 2
/*#define MODE_SYNTHESIS 3*/
#define MODE_ADDLPC 5
#define MODE_FWGAN_SYNTHESIS 6
#define MODE_FARGAN_SYNTHESIS 7
void usage(void) {
fprintf(stderr, "usage: lpcnet_demo -features <input.pcm> <features.f32>\n");
fprintf(stderr, " lpcnet_demo -fargan-synthesis <features.f32> <output.pcm>\n");
fprintf(stderr, " lpcnet_demo -addlpc <features_without_lpc.f32> <features_with_lpc.lpc>\n\n");
fprintf(stderr, " plc_options:\n");
fprintf(stderr, " causal: normal (causal) PLC\n");
fprintf(stderr, " codec: normal (causal) PLC without cross-fade (will glitch)\n");
exit(1);
}
int main(int argc, char **argv) {
int mode=0;
int arch;
FILE *fin, *fout;
#ifdef USE_WEIGHTS_FILE
int len;
void *data;
const char *filename = "weights_blob.bin";
#endif
arch = opus_select_arch();
if (argc < 4) usage();
if (strcmp(argv[1], "-features") == 0) mode=MODE_FEATURES;
else if (strcmp(argv[1], "-fargan-synthesis") == 0) mode=MODE_FARGAN_SYNTHESIS;
else if (strcmp(argv[1], "-addlpc") == 0){
mode=MODE_ADDLPC;
} else {
usage();
}
if (argc != 4) usage();
fin = fopen(argv[2], "rb");
if (fin == NULL) {
fprintf(stderr, "Can't open %s\n", argv[2]);
exit(1);
}
fout = fopen(argv[3], "wb");
if (fout == NULL) {
fprintf(stderr, "Can't open %s\n", argv[3]);
exit(1);
}
#ifdef USE_WEIGHTS_FILE
data = load_blob(filename, &len);
#endif
if (mode == MODE_FEATURES) {
LPCNetEncState *net;
net = lpcnet_encoder_create();
while (1) {
float features[NB_TOTAL_FEATURES];
opus_int16 pcm[LPCNET_FRAME_SIZE];
size_t ret;
ret = fread(pcm, sizeof(pcm[0]), LPCNET_FRAME_SIZE, fin);
if (feof(fin) || ret != LPCNET_FRAME_SIZE) break;
lpcnet_compute_single_frame_features(net, pcm, features, arch);
fwrite(features, sizeof(float), NB_TOTAL_FEATURES, fout);
}
lpcnet_encoder_destroy(net);
} else if (mode == MODE_FARGAN_SYNTHESIS) {
FARGANState fargan;
size_t ret, i;
float in_features[5*NB_TOTAL_FEATURES];
float zeros[320] = {0};
fargan_init(&fargan);
#ifdef USE_WEIGHTS_FILE
fargan_load_model(&fargan, data, len);
#endif
/* uncomment the following to align with Python code */
/*ret = fread(&in_features[0], sizeof(in_features[0]), NB_TOTAL_FEATURES, fin);*/
for (i=0;i<5;i++) {
ret = fread(&in_features[i*NB_FEATURES], sizeof(in_features[0]), NB_TOTAL_FEATURES, fin);
}
fargan_cont(&fargan, zeros, in_features);
while (1) {
float features[NB_FEATURES];
float fpcm[LPCNET_FRAME_SIZE];
opus_int16 pcm[LPCNET_FRAME_SIZE];
ret = fread(in_features, sizeof(features[0]), NB_TOTAL_FEATURES, fin);
if (feof(fin) || ret != NB_TOTAL_FEATURES) break;
OPUS_COPY(features, in_features, NB_FEATURES);
fargan_synthesize(&fargan, fpcm, features);
for (i=0;i<LPCNET_FRAME_SIZE;i++) pcm[i] = (int)floor(.5 + MIN32(32767, MAX32(-32767, 32768.f*fpcm[i])));
fwrite(pcm, sizeof(pcm[0]), LPCNET_FRAME_SIZE, fout);
}
} else if (mode == MODE_ADDLPC) {
float features[36];
size_t ret;
while (1) {
ret = fread(features, sizeof(features[0]), 36, fin);
if (ret != 36 || feof(fin)) break;
lpc_from_cepstrum(&features[20], &features[0]);
fwrite(features, sizeof(features[0]), 36, fout);
}
} else {
fprintf(stderr, "unknown action\n");
}
fclose(fin);
fclose(fout);
#ifdef USE_WEIGHTS_FILE
free_blob(data, len);
#endif
return 0;
}


@@ -0,0 +1,328 @@
/* Copyright (c) 2017-2018 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "kiss_fft.h"
#include <math.h>
#include "freq.h"
#include "pitch.h"
#include "arch.h"
#include "burg.h"
#include <assert.h>
#include "os_support.h"
#define SQUARE(x) ((x)*(x))
static const opus_int16 eband5ms[] = {
/*0 200 400 600 800 1k 1.2 1.4 1.6 2k 2.4 2.8 3.2 4k 4.8 5.6 6.8 8k*/
0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40
};
static const float compensation[] = {
0.8f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 0.666667f, 0.5f, 0.5f, 0.5f, 0.333333f, 0.25f, 0.25f, 0.2f, 0.166667f, 0.173913f
};
extern const kiss_fft_state kfft;
extern const float half_window[OVERLAP_SIZE];
extern const float dct_table[NB_BANDS*NB_BANDS];
static void compute_band_energy_inverse(float *bandE, const kiss_fft_cpx *X) {
int i;
float sum[NB_BANDS] = {0};
for (i=0;i<NB_BANDS-1;i++)
{
int j;
int band_size;
band_size = (eband5ms[i+1]-eband5ms[i])*WINDOW_SIZE_5MS;
for (j=0;j<band_size;j++) {
float tmp;
float frac = (float)j/band_size;
tmp = SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].r);
tmp += SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].i);
tmp = 1.f/(tmp + 1e-9);
sum[i] += (1-frac)*tmp;
sum[i+1] += frac*tmp;
}
}
sum[0] *= 2;
sum[NB_BANDS-1] *= 2;
for (i=0;i<NB_BANDS;i++)
{
bandE[i] = sum[i];
}
}
static float lpcn_lpc(
opus_val16 *lpc, /* out: [0...p-1] LPC coefficients */
opus_val16 *rc,
const opus_val32 *ac, /* in: [0...p] autocorrelation values */
int p
)
{
int i, j;
opus_val32 r;
opus_val32 error = ac[0];
OPUS_CLEAR(lpc, p);
OPUS_CLEAR(rc, p);
if (ac[0] != 0)
{
for (i = 0; i < p; i++) {
/* Sum up this iteration's reflection coefficient */
opus_val32 rr = 0;
for (j = 0; j < i; j++)
rr += MULT32_32_Q31(lpc[j],ac[i - j]);
rr += SHR32(ac[i + 1],3);
r = -SHL32(rr,3)/error;
rc[i] = r;
/* Update LPC coefficients and total error */
lpc[i] = SHR32(r,3);
for (j = 0; j < (i+1)>>1; j++)
{
opus_val32 tmp1, tmp2;
tmp1 = lpc[j];
tmp2 = lpc[i-1-j];
lpc[j] = tmp1 + MULT32_32_Q31(r,tmp2);
lpc[i-1-j] = tmp2 + MULT32_32_Q31(r,tmp1);
}
error = error - MULT32_32_Q31(MULT32_32_Q31(r,r),error);
/* Bail out once we get 30 dB gain */
if (error<.001f*ac[0])
break;
}
}
return error;
}
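lpcn_lpc above is the Levinson-Durbin recursion; in a float build the fixed-point macros (MULT32_32_Q31, SHR32, SHL32) collapse to plain multiplies and the shifts cancel, so the algorithm can be sketched in plain float as:

```c
#include <assert.h>
#include <math.h>

/* Plain-float Levinson-Durbin: solve for p LPC coefficients from the
 * autocorrelation ac[0..p]; returns the final prediction error. Same
 * sign convention as lpcn_lpc above. */
static float levinson(float *lpc, const float *ac, int p) {
   int i, j;
   float error = ac[0];
   for (i=0;i<p;i++) lpc[i] = 0;
   if (ac[0] == 0) return 0;
   for (i=0;i<p;i++) {
      /* Reflection coefficient for this order. */
      float r = -ac[i+1];
      for (j=0;j<i;j++) r -= lpc[j]*ac[i-j];
      r /= error;
      lpc[i] = r;
      /* Update the lower-order coefficients symmetrically. */
      for (j=0;j<(i+1)>>1;j++) {
         float tmp1 = lpc[j], tmp2 = lpc[i-1-j];
         lpc[j] = tmp1 + r*tmp2;
         lpc[i-1-j] = tmp2 + r*tmp1;
      }
      error -= r*r*error;
      if (error < .001f*ac[0]) break;  /* bail out at 30 dB gain */
   }
   return error;
}
```

For an AR(1) autocorrelation {1, .5, .25} this yields lpc = {-.5, 0} under this sign convention, with residual error .75.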
void lpcn_compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
int i;
float sum[NB_BANDS] = {0};
for (i=0;i<NB_BANDS-1;i++)
{
int j;
int band_size;
band_size = (eband5ms[i+1]-eband5ms[i])*WINDOW_SIZE_5MS;
for (j=0;j<band_size;j++) {
float tmp;
float frac = (float)j/band_size;
tmp = SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].r);
tmp += SQUARE(X[(eband5ms[i]*WINDOW_SIZE_5MS) + j].i);
sum[i] += (1-frac)*tmp;
sum[i+1] += frac*tmp;
}
}
sum[0] *= 2;
sum[NB_BANDS-1] *= 2;
for (i=0;i<NB_BANDS;i++)
{
bandE[i] = sum[i];
}
}
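lpcn_compute_band_energy splits each bin's energy between its two neighboring bands with triangular weights, then doubles the two edge bands because they only receive one-sided contributions. The same scheme on a toy band layout (the edge table is illustrative, not eband5ms):

```c
#include <assert.h>
#include <math.h>

/* Spread per-bin energies e[] across bands with triangular weights:
 * a bin at fraction frac of band i contributes (1-frac) to band i and
 * frac to band i+1; the two edge bands are then doubled. */
static void band_energy(float *bandE, const float *e,
                        const int *edges, int nb_bands) {
   int i;
   for (i=0;i<nb_bands;i++) bandE[i] = 0;
   for (i=0;i<nb_bands-1;i++) {
      int j;
      int size = edges[i+1] - edges[i];
      for (j=0;j<size;j++) {
         float frac = (float)j/size;
         bandE[i] += (1-frac)*e[edges[i]+j];
         bandE[i+1] += frac*e[edges[i]+j];
      }
   }
   bandE[0] *= 2;
   bandE[nb_bands-1] *= 2;
}
```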
static void compute_burg_cepstrum(const float *pcm, float *burg_cepstrum, int len, int order) {
int i;
float burg_in[FRAME_SIZE];
float burg_lpc[LPC_ORDER];
float x[WINDOW_SIZE];
float Eburg[NB_BANDS];
float g;
kiss_fft_cpx LPC[FREQ_SIZE];
float Ly[NB_BANDS];
float logMax = -2;
float follow = -2;
assert(order <= LPC_ORDER);
assert(len <= FRAME_SIZE);
for (i=0;i<len-1;i++) burg_in[i] = pcm[i+1] - PREEMPHASIS*pcm[i];
g = silk_burg_analysis(burg_lpc, burg_in, 1e-3, len-1, 1, order);
g /= len - 2*(order-1);
OPUS_CLEAR(x, WINDOW_SIZE);
x[0] = 1;
for (i=0;i<order;i++) x[i+1] = -burg_lpc[i]*pow(.995, i+1);
forward_transform(LPC, x);
compute_band_energy_inverse(Eburg, LPC);
for (i=0;i<NB_BANDS;i++) Eburg[i] *= .45*g*(1.f/((float)WINDOW_SIZE*WINDOW_SIZE*WINDOW_SIZE));
for (i=0;i<NB_BANDS;i++) {
Ly[i] = log10(1e-2+Eburg[i]);
Ly[i] = MAX16(logMax-8, MAX16(follow-2.5, Ly[i]));
logMax = MAX16(logMax, Ly[i]);
follow = MAX16(follow-2.5, Ly[i]);
}
dct(burg_cepstrum, Ly);
burg_cepstrum[0] += - 4;
}
void burg_cepstral_analysis(float *ceps, const float *x) {
int i;
compute_burg_cepstrum(x, &ceps[0 ], FRAME_SIZE/2, LPC_ORDER);
compute_burg_cepstrum(&x[FRAME_SIZE/2], &ceps[NB_BANDS], FRAME_SIZE/2, LPC_ORDER);
for (i=0;i<NB_BANDS;i++) {
float c0, c1;
c0 = ceps[i];
c1 = ceps[NB_BANDS+i];
ceps[i ] = .5*(c0+c1);
ceps[NB_BANDS+i] = (c0-c1);
}
}
static void interp_band_gain(float *g, const float *bandE) {
int i;
/* Clear the full float array; FREQ_SIZE alone would only zero that many bytes. */
memset(g, 0, FREQ_SIZE*sizeof(*g));
for (i=0;i<NB_BANDS-1;i++)
{
int j;
int band_size;
band_size = (eband5ms[i+1]-eband5ms[i])*WINDOW_SIZE_5MS;
for (j=0;j<band_size;j++) {
float frac = (float)j/band_size;
g[(eband5ms[i]*WINDOW_SIZE_5MS) + j] = (1-frac)*bandE[i] + frac*bandE[i+1];
}
}
}
void dct(float *out, const float *in) {
int i;
for (i=0;i<NB_BANDS;i++) {
int j;
float sum = 0;
for (j=0;j<NB_BANDS;j++) {
sum += in[j] * dct_table[j*NB_BANDS + i];
}
out[i] = sum*sqrt(2./NB_BANDS);
}
}
static void idct(float *out, const float *in) {
int i;
for (i=0;i<NB_BANDS;i++) {
int j;
float sum = 0;
for (j=0;j<NB_BANDS;j++) {
sum += in[j] * dct_table[i*NB_BANDS + j];
}
out[i] = sum*sqrt(2./NB_BANDS);
}
}
void forward_transform(kiss_fft_cpx *out, const float *in) {
int i;
kiss_fft_cpx x[WINDOW_SIZE];
kiss_fft_cpx y[WINDOW_SIZE];
for (i=0;i<WINDOW_SIZE;i++) {
x[i].r = in[i];
x[i].i = 0;
}
opus_fft(&kfft, x, y, 0);
for (i=0;i<FREQ_SIZE;i++) {
out[i] = y[i];
}
}
static void inverse_transform(float *out, const kiss_fft_cpx *in) {
int i;
kiss_fft_cpx x[WINDOW_SIZE];
kiss_fft_cpx y[WINDOW_SIZE];
for (i=0;i<FREQ_SIZE;i++) {
x[i] = in[i];
}
for (;i<WINDOW_SIZE;i++) {
x[i].r = x[WINDOW_SIZE - i].r;
x[i].i = -x[WINDOW_SIZE - i].i;
}
opus_fft(&kfft, x, y, 0);
/* output in reverse order for IFFT. */
out[0] = WINDOW_SIZE*y[0].r;
for (i=1;i<WINDOW_SIZE;i++) {
out[i] = WINDOW_SIZE*y[WINDOW_SIZE - i].r;
}
}
static float lpc_from_bands(float *lpc, const float *Ex)
{
int i;
float e;
float ac[LPC_ORDER+1];
float rc[LPC_ORDER];
float Xr[FREQ_SIZE];
kiss_fft_cpx X_auto[FREQ_SIZE];
float x_auto[WINDOW_SIZE];
interp_band_gain(Xr, Ex);
Xr[FREQ_SIZE-1] = 0;
OPUS_CLEAR(X_auto, FREQ_SIZE);
for (i=0;i<FREQ_SIZE;i++) X_auto[i].r = Xr[i];
inverse_transform(x_auto, X_auto);
for (i=0;i<LPC_ORDER+1;i++) ac[i] = x_auto[i];
/* -40 dB noise floor. */
ac[0] += ac[0]*1e-4 + 320/12/38.;
/* Lag windowing. */
for (i=1;i<LPC_ORDER+1;i++) ac[i] *= (1 - 6e-5*i*i);
e = lpcn_lpc(lpc, rc, ac, LPC_ORDER);
return e;
}
void lpc_weighting(float *lpc, float gamma)
{
int i;
float gamma_i = gamma;
for (i = 0; i < LPC_ORDER; i++)
{
lpc[i] *= gamma_i;
gamma_i *= gamma;
}
}
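lpc_weighting scales coefficient i by gamma^(i+1), i.e. it evaluates A(z/gamma): every pole radius shrinks by gamma, widening formant bandwidths. A sketch with an illustrative name:

```c
#include <assert.h>
#include <math.h>

/* Multiply lpc[i] by gamma^(i+1): evaluates A(z/gamma), shrinking every
 * pole radius by gamma (bandwidth expansion), as lpc_weighting does. */
static void bw_expand(float *lpc, int order, float gamma) {
   int i;
   float g = gamma;
   for (i=0;i<order;i++) {
      lpc[i] *= g;
      g *= gamma;
   }
}
```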
float lpc_from_cepstrum(float *lpc, const float *cepstrum)
{
int i;
float Ex[NB_BANDS];
float tmp[NB_BANDS];
OPUS_COPY(tmp, cepstrum, NB_BANDS);
tmp[0] += 4;
idct(Ex, tmp);
for (i=0;i<NB_BANDS;i++) Ex[i] = pow(10.f, Ex[i])*compensation[i];
return lpc_from_bands(lpc, Ex);
}
void apply_window(float *x) {
int i;
for (i=0;i<OVERLAP_SIZE;i++) {
x[i] *= half_window[i];
x[WINDOW_SIZE - 1 - i] *= half_window[i];
}
}


@@ -0,0 +1,61 @@
/* Copyright (c) 2017-2018 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef FREQ_H
#define FREQ_H
#include "kiss_fft.h"
#define LPC_ORDER 16
#define PREEMPHASIS (0.85f)
#define FRAME_SIZE_5MS (2)
#define OVERLAP_SIZE_5MS (2)
#define TRAINING_OFFSET_5MS (1)
#define WINDOW_SIZE_5MS (FRAME_SIZE_5MS + OVERLAP_SIZE_5MS)
#define FRAME_SIZE (80*FRAME_SIZE_5MS)
#define OVERLAP_SIZE (80*OVERLAP_SIZE_5MS)
#define TRAINING_OFFSET (80*TRAINING_OFFSET_5MS)
#define WINDOW_SIZE (FRAME_SIZE + OVERLAP_SIZE)
#define FREQ_SIZE (WINDOW_SIZE/2 + 1)
#define NB_BANDS 18
#define NB_BANDS_1 (NB_BANDS - 1)
void lpcn_compute_band_energy(float *bandE, const kiss_fft_cpx *X);
void burg_cepstral_analysis(float *ceps, const float *x);
void apply_window(float *x);
void dct(float *out, const float *in);
void forward_transform(kiss_fft_cpx *out, const float *in);
float lpc_from_cepstrum(float *lpc, const float *cepstrum);
void lpc_weighting(float *lpc, float gamma);
#endif


@@ -0,0 +1,322 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "fwgan.h"
#include "os_support.h"
#include "freq.h"
#include "fwgan_data.h"
#include "lpcnet.h"
#include "pitch.h"
#include "nnet.h"
#include "lpcnet_private.h"
#define FEAT_IN_SIZE (BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4 + FWGAN_FRAME_SIZE/2)
#define FWGAN_FEATURES (NB_FEATURES-1)
static void pitch_embeddings(float *pembed, float *phase, double w0) {
int i;
float wreal, wimag;
#if 1
/* This Taylor expansion should be good enough since w0 is always small. */
float w2 = w0*w0;
wreal = 1 - .5*w2*(1.f - 0.083333333f*w2);
wimag = w0*(1 - 0.166666667f*w2*(1.f - 0.05f*w2));
#else
wreal = cos(w0);
wimag = sin(w0);
#endif
/* Speed-up phase reference by making phase a unit-norm complex value and rotating it
by exp(-i*w0) each sample. */
for (i=0;i<SUBFRAME_SIZE;i++) {
float tmp;
tmp = phase[0]*wreal - phase[1]*wimag;
phase[1] = phase[0]*wimag + phase[1]*wreal;
phase[0] = tmp;
pembed[i] = phase[1];
pembed[SUBFRAME_SIZE+i] = phase[0];
}
/* Renormalize once per sub-frame, though we could probably do it even less frequently. */
{
float r = 1.f/sqrt(phase[0]*phase[0] + phase[1]*phase[1]);
phase[0] *= r;
phase[1] *= r;
}
}
static void compute_wlpc(float lpc[LPC_ORDER], const float *features) {
float lpc_weight;
int i;
lpc_from_cepstrum(lpc, features);
lpc_weight = 1.f;
for (i=0;i<LPC_ORDER;i++) {
lpc_weight *= FWGAN_GAMMA;
lpc[i] *= lpc_weight;
}
}
static void run_fwgan_upsampler(FWGANState *st, float *cond, const float *features)
{
FWGAN *model;
model = &st->model;
celt_assert(FWGAN_FEATURES == model->bfcc_with_corr_upsampler_fc.nb_inputs);
celt_assert(BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE == model->bfcc_with_corr_upsampler_fc.nb_outputs);
compute_generic_dense(&model->bfcc_with_corr_upsampler_fc, cond, features, ACTIVATION_TANH);
}
static void fwgan_synthesize_impl(FWGANState *st, float *pcm, const float *lpc, const float *features);
void fwgan_cont(FWGANState *st, const float *pcm0, const float *features0)
{
int i;
float norm2, norm_1;
float wpcm0[CONT_PCM_INPUTS];
float cont_inputs[CONT_PCM_INPUTS+1];
float tmp1[MAX_CONT_SIZE];
float tmp2[MAX_CONT_SIZE];
float lpc[LPC_ORDER];
float new_pcm[FWGAN_FRAME_SIZE];
FWGAN *model;
st->embed_phase[0] = 1;
model = &st->model;
compute_wlpc(lpc, features0);
/* Deemphasis memory is just the last continuation sample. */
st->deemph_mem = pcm0[CONT_PCM_INPUTS-1];
/* Apply analysis filter, considering that the preemphasis and deemphasis filter
cancel each other in this case since the LPC filter is constant across that boundary.
*/
for (i=LPC_ORDER;i<CONT_PCM_INPUTS;i++) {
int j;
wpcm0[i] = pcm0[i];
for (j=0;j<LPC_ORDER;j++) wpcm0[i] += lpc[j]*pcm0[i-j-1];
}
/* FIXME: Make this less stupid. */
for (i=0;i<LPC_ORDER;i++) wpcm0[i] = wpcm0[LPC_ORDER];
/* The memory of the pre-emphasis is the last sample of the weighted signal
(ignoring preemphasis+deemphasis combination). */
st->preemph_mem = wpcm0[CONT_PCM_INPUTS-1];
/* The memory of the synthesis filter is the pre-emphasized continuation. */
for (i=0;i<LPC_ORDER;i++) st->syn_mem[i] = pcm0[CONT_PCM_INPUTS-1-i] - FWGAN_DEEMPHASIS*pcm0[CONT_PCM_INPUTS-2-i];
norm2 = celt_inner_prod(wpcm0, wpcm0, CONT_PCM_INPUTS, st->arch);
norm_1 = 1.f/sqrt(1e-8f + norm2);
for (i=0;i<CONT_PCM_INPUTS;i++) cont_inputs[i+1] = norm_1*wpcm0[i];
cont_inputs[0] = log(sqrt(norm2) + 1e-7f);
/* Continuation network */
compute_generic_dense(&model->cont_net_0, tmp1, cont_inputs, ACTIVATION_TANH);
compute_generic_dense(&model->cont_net_2, tmp2, tmp1, ACTIVATION_TANH);
compute_generic_dense(&model->cont_net_4, tmp1, tmp2, ACTIVATION_TANH);
compute_generic_dense(&model->cont_net_6, tmp2, tmp1, ACTIVATION_TANH);
compute_generic_dense(&model->cont_net_8, tmp1, tmp2, ACTIVATION_TANH);
celt_assert(CONT_NET_10_OUT_SIZE == model->cont_net_10.nb_outputs);
compute_generic_dense(&model->cont_net_10, st->cont, tmp1, ACTIVATION_TANH);
/* Computing continuation for each layer. */
celt_assert(RNN_GRU_STATE_SIZE == model->rnn_cont_fc_0.nb_outputs);
compute_generic_dense(&model->rnn_cont_fc_0, st->rnn_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC1_STATE_SIZE == model->fwc1_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc1_cont_fc_0, st->fwc1_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC2_STATE_SIZE == model->fwc2_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc2_cont_fc_0, st->fwc2_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC3_STATE_SIZE == model->fwc3_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc3_cont_fc_0, st->fwc3_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC4_STATE_SIZE == model->fwc4_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc4_cont_fc_0, st->fwc4_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC5_STATE_SIZE == model->fwc5_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc5_cont_fc_0, st->fwc5_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC6_STATE_SIZE == model->fwc6_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc6_cont_fc_0, st->fwc6_state, st->cont, ACTIVATION_TANH);
celt_assert(FWC7_STATE_SIZE == model->fwc7_cont_fc_0.nb_outputs);
compute_generic_dense(&model->fwc7_cont_fc_0, st->fwc7_state, st->cont, ACTIVATION_TANH);
st->cont_initialized = 1;
/* Process the first frame, discard the first subframe, and keep the rest for the first
synthesis call. */
fwgan_synthesize_impl(st, new_pcm, lpc, features0);
OPUS_COPY(st->pcm_buf, &new_pcm[SUBFRAME_SIZE], FWGAN_FRAME_SIZE-SUBFRAME_SIZE);
}
static void apply_gain(float *pcm, float c0, float *last_gain) {
int i;
float gain = pow(10.f, (0.5f*c0/sqrt(18.f)));
for (i=0;i<SUBFRAME_SIZE;i++) pcm[i] *= *last_gain;
*last_gain = gain;
}
static void fwgan_lpc_syn(float *pcm, float *mem, const float *lpc, float last_lpc[LPC_ORDER]) {
int i;
for (i=0;i<SUBFRAME_SIZE;i++) {
int j;
for (j=0;j<LPC_ORDER;j++) pcm[i] -= mem[j]*last_lpc[j];
OPUS_MOVE(&mem[1], &mem[0], LPC_ORDER-1);
mem[0] = pcm[i];
}
OPUS_COPY(last_lpc, lpc, LPC_ORDER);
}
static void fwgan_preemphasis(float *pcm, float *preemph_mem) {
int i;
for (i=0;i<SUBFRAME_SIZE;i++) {
float tmp = pcm[i];
pcm[i] -= FWGAN_DEEMPHASIS * *preemph_mem;
*preemph_mem = tmp;
}
}
static void fwgan_deemphasis(float *pcm, float *deemph_mem) {
int i;
for (i=0;i<SUBFRAME_SIZE;i++) {
pcm[i] += FWGAN_DEEMPHASIS * *deemph_mem;
*deemph_mem = pcm[i];
}
}
static void run_fwgan_subframe(FWGANState *st, float *pcm, const float *cond, double w0, const float *lpc, float c0)
{
float tmp1[FWC1_FC_0_OUT_SIZE];
float tmp2[IMAX(RNN_GRU_STATE_SIZE, FWC2_FC_0_OUT_SIZE)];
float feat_in[FEAT_IN_SIZE];
float rnn_in[FEAT_IN_CONV1_CONV_OUT_SIZE];
float pembed[FWGAN_FRAME_SIZE/2];
FWGAN *model;
model = &st->model;
pitch_embeddings(pembed, st->embed_phase, w0);
/* Interleave bfcc_cond and pembed for each subframe in feat_in. */
OPUS_COPY(&feat_in[BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4], &cond[0], BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4);
OPUS_COPY(&feat_in[0], &pembed[0], FWGAN_FRAME_SIZE/2);
compute_generic_conv1d(&model->feat_in_conv1_conv, rnn_in, st->cont_conv1_mem, feat_in, FEAT_IN_CONV1_CONV_IN_SIZE, ACTIVATION_LINEAR);
celt_assert(FEAT_IN_NL1_GATE_OUT_SIZE == model->feat_in_nl1_gate.nb_outputs);
compute_gated_activation(&model->feat_in_nl1_gate, rnn_in, rnn_in, ACTIVATION_TANH);
if (st->cont_initialized == 1) {
/* On the very first subframe we stop here. We only want to run the feat_in layer since the
others are initialized via the continuation network. */
OPUS_CLEAR(pcm, SUBFRAME_SIZE);
st->cont_initialized = 2;
apply_gain(pcm, c0, &st->last_gain);
OPUS_COPY(st->last_lpc, lpc, LPC_ORDER);
return;
}
compute_generic_gru(&model->rnn_gru_input, &model->rnn_gru_recurrent, st->rnn_state, rnn_in);
celt_assert(IMAX(RNN_GRU_STATE_SIZE, FWC2_FC_0_OUT_SIZE) >= model->rnn_nl_gate.nb_outputs);
compute_gated_activation(&model->rnn_nl_gate, tmp2, st->rnn_state, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc1_fc_0, tmp1, st->fwc1_state, tmp2, RNN_GRU_STATE_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc1_fc_1_gate, tmp1, tmp1, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc2_fc_0, tmp2, st->fwc2_state, tmp1, FWC1_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc2_fc_1_gate, tmp2, tmp2, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc3_fc_0, tmp1, st->fwc3_state, tmp2, FWC2_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc3_fc_1_gate, tmp1, tmp1, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc4_fc_0, tmp2, st->fwc4_state, tmp1, FWC3_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc4_fc_1_gate, tmp2, tmp2, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc5_fc_0, tmp1, st->fwc5_state, tmp2, FWC4_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc5_fc_1_gate, tmp1, tmp1, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc6_fc_0, tmp2, st->fwc6_state, tmp1, FWC5_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc6_fc_1_gate, tmp2, tmp2, ACTIVATION_TANH);
compute_generic_conv1d(&model->fwc7_fc_0, tmp1, st->fwc7_state, tmp2, FWC6_FC_0_OUT_SIZE, ACTIVATION_LINEAR);
compute_gated_activation(&model->fwc7_fc_1_gate, pcm, tmp1, ACTIVATION_TANH);
apply_gain(pcm, c0, &st->last_gain);
fwgan_preemphasis(pcm, &st->preemph_mem);
fwgan_lpc_syn(pcm, st->syn_mem, lpc, st->last_lpc);
fwgan_deemphasis(pcm, &st->deemph_mem);
}
void fwgan_init(FWGANState *st)
{
int ret;
OPUS_CLEAR(st, 1);
ret = init_fwgan(&st->model, fwgan_arrays);
celt_assert(ret == 0);
/* FIXME: perform arch detection. */
}
int fwgan_load_model(FWGANState *st, const unsigned char *data, int len) {
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_fwgan(&st->model, list);
opus_free(list);
if (ret == 0) return 0;
else return -1;
}
static void fwgan_synthesize_impl(FWGANState *st, float *pcm, const float *lpc, const float *features)
{
int subframe;
float cond[BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE];
double w0;
int period;
float fwgan_features[NB_FEATURES-1];
celt_assert(st->cont_initialized);
OPUS_COPY(fwgan_features, features, NB_FEATURES-2);
fwgan_features[NB_FEATURES-2] = features[NB_FEATURES-1]+.5;
period = (int)floor(.1 + 50*features[NB_BANDS]+100);
w0 = 2*M_PI/period;
run_fwgan_upsampler(st, cond, fwgan_features);
for (subframe=0;subframe<NB_SUBFRAMES;subframe++) {
float *sub_cond;
sub_cond = &cond[subframe*BFCC_WITH_CORR_UPSAMPLER_FC_OUT_SIZE/4];
run_fwgan_subframe(st, &pcm[subframe*SUBFRAME_SIZE], sub_cond, w0, lpc, features[0]);
}
}
void fwgan_synthesize(FWGANState *st, float *pcm, const float *features)
{
float lpc[LPC_ORDER];
float new_pcm[FWGAN_FRAME_SIZE];
compute_wlpc(lpc, features);
fwgan_synthesize_impl(st, new_pcm, lpc, features);
/* Handle buffering. */
OPUS_COPY(pcm, st->pcm_buf, FWGAN_FRAME_SIZE-SUBFRAME_SIZE);
OPUS_COPY(&pcm[FWGAN_FRAME_SIZE-SUBFRAME_SIZE], new_pcm, SUBFRAME_SIZE);
OPUS_COPY(st->pcm_buf, &new_pcm[SUBFRAME_SIZE], FWGAN_FRAME_SIZE-SUBFRAME_SIZE);
}
void fwgan_synthesize_int(FWGANState *st, opus_int16 *pcm, const float *features)
{
int i;
float fpcm[FWGAN_FRAME_SIZE];
fwgan_synthesize(st, fpcm, features);
for (i=0;i<LPCNET_FRAME_SIZE;i++) pcm[i] = (int)floor(.5 + MIN32(32767, MAX32(-32767, 32768.f*fpcm[i])));
}


@@ -0,0 +1,83 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef FWGAN_H
#define FWGAN_H
#include "freq.h"
#include "fwgan_data.h"
#define FWGAN_CONT_SAMPLES 320
#define NB_SUBFRAMES 4
#define SUBFRAME_SIZE 40
#define FWGAN_FRAME_SIZE (NB_SUBFRAMES*SUBFRAME_SIZE)
#define CONT_PCM_INPUTS 320
#define MAX_CONT_SIZE CONT_NET_0_OUT_SIZE
#define FWGAN_GAMMA 0.92f
#define FWGAN_DEEMPHASIS 0.85f
/* FIXME: Derive those from the model rather than hardcoding. */
#define FWC1_STATE_SIZE 512
#define FWC2_STATE_SIZE 512
#define FWC3_STATE_SIZE 256
#define FWC4_STATE_SIZE 256
#define FWC5_STATE_SIZE 128
#define FWC6_STATE_SIZE 128
#define FWC7_STATE_SIZE 80
typedef struct {
FWGAN model;
int arch;
int cont_initialized;
float embed_phase[2];
float last_gain;
float last_lpc[LPC_ORDER];
float syn_mem[LPC_ORDER];
float preemph_mem;
float deemph_mem;
float pcm_buf[FWGAN_FRAME_SIZE];
float cont[CONT_NET_10_OUT_SIZE];
float cont_conv1_mem[FEAT_IN_CONV1_CONV_STATE_SIZE];
float rnn_state[RNN_GRU_STATE_SIZE];
float fwc1_state[FWC1_STATE_SIZE];
float fwc2_state[FWC2_STATE_SIZE];
float fwc3_state[FWC3_STATE_SIZE];
float fwc4_state[FWC4_STATE_SIZE];
float fwc5_state[FWC5_STATE_SIZE];
float fwc6_state[FWC6_STATE_SIZE];
float fwc7_state[FWC7_STATE_SIZE];
} FWGANState;
void fwgan_init(FWGANState *st);
int fwgan_load_model(FWGANState *st, const unsigned char *data, int len);
void fwgan_cont(FWGANState *st, const float *pcm0, const float *features0);
void fwgan_synthesize(FWGANState *st, float *pcm, const float *features);
void fwgan_synthesize_int(FWGANState *st, opus_int16 *pcm, const float *features);
#endif /* FWGAN_H */


@@ -0,0 +1,81 @@
/*Daala video codec
Copyright (c) 2012 Daala project contributors. All rights reserved.
Author: Timothy B. Terriberry
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS”
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "kiss99.h"
void kiss99_srand(kiss99_ctx *_this,const unsigned char *_data,int _ndata){
int i;
_this->z=362436069;
_this->w=521288629;
_this->jsr=123456789;
_this->jcong=380116160;
for(i=3;i<_ndata;i+=4){
_this->z^=_data[i-3];
_this->w^=_data[i-2];
_this->jsr^=_data[i-1];
_this->jcong^=_data[i];
kiss99_rand(_this);
}
if(i-3<_ndata)_this->z^=_data[i-3];
if(i-2<_ndata)_this->w^=_data[i-2];
if(i-1<_ndata)_this->jsr^=_data[i-1];
/*Fix any potential short cycles that show up.
These are not too likely, given the way we initialize the state, but they
are technically possible, so let us go ahead and eliminate that
possibility.
See Gregory G. Rose: "KISS: A Bit Too Simple", Cryptography and Communications
No. 10, pp. 123-137, 2018.*/
if(_this->z==0||_this->z==0x9068FFFF)_this->z++;
if(_this->w==0||_this->w==0x464FFFFF)_this->w++;
if(_this->jsr==0)_this->jsr++;
}
uint32_t kiss99_rand(kiss99_ctx *_this){
uint32_t znew;
uint32_t wnew;
uint32_t mwc;
uint32_t shr3;
uint32_t cong;
znew=36969*(_this->z&0xFFFF)+(_this->z>>16);
wnew=18000*(_this->w&0xFFFF)+(_this->w>>16);
mwc=(znew<<16)+wnew;
/*We swap the 13 and 17 from the original 1999 algorithm to produce a single
cycle of maximal length, matching KISS11.
We are not actually using KISS11 because of the impractically large (16 MB)
internal state of the full algorithm.*/
shr3=_this->jsr^(_this->jsr<<13);
shr3^=shr3>>17;
shr3^=shr3<<5;
cong=69069*_this->jcong+1234567;
_this->z=znew;
_this->w=wnew;
_this->jsr=shr3;
_this->jcong=cong;
return (mwc^cong)+shr3;
}


@@ -0,0 +1,46 @@
/*Daala video codec
Copyright (c) 2012 Daala project contributors. All rights reserved.
Author: Timothy B. Terriberry
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS”
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.*/
#if !defined(_kiss99_H)
# define _kiss99_H (1)
# include <stdint.h>
/*KISS PRNG from George Marsaglia (1999 version).
See https://en.wikipedia.org/wiki/KISS_(algorithm) for details.
This is suitable for simulations, but not for use in cryptographic contexts.*/
typedef struct kiss99_ctx kiss99_ctx;
struct kiss99_ctx{
uint32_t z;
uint32_t w;
uint32_t jsr;
uint32_t jcong;
};
void kiss99_srand(kiss99_ctx *_this,const unsigned char *_data,int _ndata);
uint32_t kiss99_rand(kiss99_ctx *_this);
#endif


@@ -0,0 +1,192 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/* This packet loss simulator can be used independently of the Opus codebase.
To do that, you need to compile the following files:
dnn/lossgen.c
dnn/lossgen_data.c
with the following files needed as #include
dnn/lossgen_data.h
dnn/lossgen.h
dnn/nnet_arch.h
dnn/nnet.h
dnn/parse_lpcnet_weights.c (included despite being a C file)
dnn/vec_avx.h
dnn/vec.h
celt/os_support.h
celt/arch.h
celt/x86/x86_arch_macros.h
include/opus_defines.h
include/opus_types.h
Additionally, the code in dnn/lossgen_demo.c can be used to generate losses from
the command line.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "arch.h"
#include <math.h>
#include "lossgen.h"
#include "os_support.h"
#include "nnet.h"
#include "assert.h"
/* Disable RTCD for this. */
#define RTCD_ARCH c
/* Override assert to avoid undefined/redefined symbols. */
#undef celt_assert
#define celt_assert assert
/* Directly include the C files we need since the symbols won't be exposed if we link in a shared object. */
#include "parse_lpcnet_weights.c"
#include "nnet_arch.h"
#undef compute_linear
#undef compute_activation
/* Force the C version since the SIMD versions may be hidden. */
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_c(linear, out, in))
#define compute_activation(output, input, N, activation, arch) ((void)(arch),compute_activation_c(output, input, N, activation))
#define MAX_RNN_NEURONS_ALL IMAX(LOSSGEN_GRU1_STATE_SIZE, LOSSGEN_GRU2_STATE_SIZE)
/* These two functions are copied from nnet.c to make sure we don't have linking issues. */
void compute_generic_gru_lossgen(const LinearLayer *input_weights, const LinearLayer *recurrent_weights, float *state, const float *in, int arch)
{
int i;
int N;
float zrh[3*MAX_RNN_NEURONS_ALL];
float recur[3*MAX_RNN_NEURONS_ALL];
float *z;
float *r;
float *h;
celt_assert(3*recurrent_weights->nb_inputs == recurrent_weights->nb_outputs);
celt_assert(input_weights->nb_outputs == recurrent_weights->nb_outputs);
N = recurrent_weights->nb_inputs;
z = zrh;
r = &zrh[N];
h = &zrh[2*N];
celt_assert(recurrent_weights->nb_outputs <= 3*MAX_RNN_NEURONS_ALL);
celt_assert(in != state);
compute_linear(input_weights, zrh, in, arch);
compute_linear(recurrent_weights, recur, state, arch);
for (i=0;i<2*N;i++)
zrh[i] += recur[i];
compute_activation(zrh, zrh, 2*N, ACTIVATION_SIGMOID, arch);
for (i=0;i<N;i++)
h[i] += recur[2*N+i]*r[i];
compute_activation(h, h, N, ACTIVATION_TANH, arch);
for (i=0;i<N;i++)
h[i] = z[i]*state[i] + (1-z[i])*h[i];
for (i=0;i<N;i++)
state[i] = h[i];
}
void compute_generic_dense_lossgen(const LinearLayer *layer, float *output, const float *input, int activation, int arch)
{
compute_linear(layer, output, input, arch);
compute_activation(output, output, layer->nb_outputs, activation, arch);
}
static int sample_loss_impl(
LossGenState *st,
float percent_loss)
{
float input[2];
float tmp[LOSSGEN_DENSE_IN_OUT_SIZE];
float out;
int loss;
LossGen *model = &st->model;
input[0] = st->last_loss;
input[1] = percent_loss;
compute_generic_dense_lossgen(&model->lossgen_dense_in, tmp, input, ACTIVATION_TANH, 0);
compute_generic_gru_lossgen(&model->lossgen_gru1_input, &model->lossgen_gru1_recurrent, st->gru1_state, tmp, 0);
compute_generic_gru_lossgen(&model->lossgen_gru2_input, &model->lossgen_gru2_recurrent, st->gru2_state, st->gru1_state, 0);
compute_generic_dense_lossgen(&model->lossgen_dense_out, &out, st->gru2_state, ACTIVATION_SIGMOID, 0);
loss = (float)rand()/RAND_MAX < out;
st->last_loss = loss;
return loss;
}
int sample_loss(
LossGenState *st,
float percent_loss)
{
/* Due to GRU being initialized with zeros, the first packets aren't quite random,
so we skip them. */
if (!st->used) {
int i;
for (i=0;i<1000;i++) sample_loss_impl(st, percent_loss);
st->used = 1;
}
return sample_loss_impl(st, percent_loss);
}
void lossgen_init(LossGenState *st)
{
int ret;
OPUS_CLEAR(st, 1);
ret = init_lossgen(&st->model, lossgen_arrays);
celt_assert(ret == 0);
(void)ret;
}
int lossgen_load_model(LossGenState *st, const void *data, int len) {
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_lossgen(&st->model, list);
opus_free(list);
if (ret == 0) return 0;
else return -1;
}
#if 0
#include <stdio.h>
int main(int argc, char **argv) {
int i, N;
float p;
LossGenState st;
if (argc!=3) {
fprintf(stderr, "usage: lossgen <percentage> <length>\n");
return 1;
}
lossgen_init(&st);
p = atof(argv[1]);
N = atoi(argv[2]);
for (i=0;i<N;i++) {
printf("%d\n", sample_loss(&st, p));
}
}
#endif


@@ -0,0 +1,55 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef LOSSGEN_H
#define LOSSGEN_H
#include "lossgen_data.h"
#define PITCH_MIN_PERIOD 32
#define PITCH_MAX_PERIOD 256
#define NB_XCORR_FEATURES (PITCH_MAX_PERIOD-PITCH_MIN_PERIOD)
typedef struct {
LossGen model;
float gru1_state[LOSSGEN_GRU1_STATE_SIZE];
float gru2_state[LOSSGEN_GRU2_STATE_SIZE];
int last_loss;
int used;
} LossGenState;
void lossgen_init(LossGenState *st);
int lossgen_load_model(LossGenState *st, const void *data, int len);
int sample_loss(
LossGenState *st,
float percent_loss);
#endif


@@ -0,0 +1,22 @@
#include <stdio.h>
#include <stdlib.h>
#include "lossgen.h"
int main(int argc, char **argv)
{
LossGenState st;
long num_packets;
long i;
float percent;
if (argc != 3) {
fprintf(stderr, "usage: %s <percent_loss> <nb packets>\n", argv[0]);
return 1;
}
lossgen_init(&st);
percent = atof(argv[1]);
num_packets = atol(argv[2]);
/*printf("loss: %f %d\n", percent, num_packets);*/
for (i=0;i<num_packets;i++) {
printf("%d\n", sample_loss(&st, percent*0.01f));
}
return 0;
}


@@ -0,0 +1,283 @@
/* Copyright (c) 2018 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <math.h>
#include <stdio.h>
#include "nnet_data.h"
#include "nnet.h"
#include "common.h"
#include "arch.h"
#include "lpcnet.h"
#include "lpcnet_private.h"
#include "os_support.h"
#define PREEMPH 0.85f
#define PDF_FLOOR 0.002
#define FRAME_INPUT_SIZE (NB_FEATURES + EMBED_PITCH_OUT_SIZE)
#if 0
static void print_vector(float *x, int N)
{
int i;
for (i=0;i<N;i++) printf("%f ", x[i]);
printf("\n");
}
#endif
#ifdef END2END
void rc2lpc(float *lpc, const float *rc)
{
int i, j, k;
float tmp[LPC_ORDER];
float ntmp[LPC_ORDER] = {0.0};
OPUS_COPY(tmp, rc, LPC_ORDER);
for(i = 0; i < LPC_ORDER ; i++)
{
for(j = 0; j <= i-1; j++)
{
ntmp[j] = tmp[j] + tmp[i]*tmp[i - j - 1];
}
for(k = 0; k <= i-1; k++)
{
tmp[k] = ntmp[k];
}
}
for(i = 0; i < LPC_ORDER ; i++)
{
lpc[i] = tmp[i];
}
}
#endif
void run_frame_network(LPCNetState *lpcnet, float *gru_a_condition, float *gru_b_condition, float *lpc, const float *features)
{
NNetState *net;
float condition[FEATURE_DENSE2_OUT_SIZE];
float in[FRAME_INPUT_SIZE];
float conv1_out[FEATURE_CONV1_OUT_SIZE];
float conv2_out[FEATURE_CONV2_OUT_SIZE];
float dense1_out[FEATURE_DENSE1_OUT_SIZE];
int pitch;
float rc[LPC_ORDER];
/* Matches the Python code -- the 0.1 avoids rounding issues. */
pitch = (int)floor(.1 + 50*features[NB_BANDS]+100);
pitch = IMIN(255, IMAX(33, pitch));
net = &lpcnet->nnet;
OPUS_COPY(in, features, NB_FEATURES);
compute_embedding(&lpcnet->model.embed_pitch, &in[NB_FEATURES], pitch);
compute_conv1d(&lpcnet->model.feature_conv1, conv1_out, net->feature_conv1_state, in);
if (lpcnet->frame_count < FEATURE_CONV1_DELAY) OPUS_CLEAR(conv1_out, FEATURE_CONV1_OUT_SIZE);
compute_conv1d(&lpcnet->model.feature_conv2, conv2_out, net->feature_conv2_state, conv1_out);
if (lpcnet->frame_count < FEATURES_DELAY) OPUS_CLEAR(conv2_out, FEATURE_CONV2_OUT_SIZE);
_lpcnet_compute_dense(&lpcnet->model.feature_dense1, dense1_out, conv2_out);
_lpcnet_compute_dense(&lpcnet->model.feature_dense2, condition, dense1_out);
OPUS_COPY(rc, condition, LPC_ORDER);
_lpcnet_compute_dense(&lpcnet->model.gru_a_dense_feature, gru_a_condition, condition);
_lpcnet_compute_dense(&lpcnet->model.gru_b_dense_feature, gru_b_condition, condition);
#ifdef END2END
rc2lpc(lpc, rc);
#elif FEATURES_DELAY>0
memcpy(lpc, lpcnet->old_lpc[FEATURES_DELAY-1], LPC_ORDER*sizeof(lpc[0]));
memmove(lpcnet->old_lpc[1], lpcnet->old_lpc[0], (FEATURES_DELAY-1)*LPC_ORDER*sizeof(lpc[0]));
lpc_from_cepstrum(lpcnet->old_lpc[0], features);
#else
lpc_from_cepstrum(lpc, features);
#endif
#ifdef LPC_GAMMA
lpc_weighting(lpc, LPC_GAMMA);
#endif
if (lpcnet->frame_count < 1000) lpcnet->frame_count++;
}
void run_frame_network_deferred(LPCNetState *lpcnet, const float *features)
{
int max_buffer_size = lpcnet->model.feature_conv1.kernel_size + lpcnet->model.feature_conv2.kernel_size - 2;
celt_assert(max_buffer_size <= MAX_FEATURE_BUFFER_SIZE);
if (lpcnet->feature_buffer_fill == max_buffer_size) {
OPUS_MOVE(lpcnet->feature_buffer, &lpcnet->feature_buffer[NB_FEATURES], (max_buffer_size-1)*NB_FEATURES);
} else {
lpcnet->feature_buffer_fill++;
}
OPUS_COPY(&lpcnet->feature_buffer[(lpcnet->feature_buffer_fill-1)*NB_FEATURES], features, NB_FEATURES);
}
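run_frame_network_deferred() implements a small bounded FIFO: once the buffer is full, the oldest feature vector is shifted out before the newest is appended. The same pattern on a toy vector size (push_vector and VEC are hypothetical names standing in for the NB_FEATURES logic above):

```c
#include <assert.h>
#include <string.h>

#define VEC 3  /* toy vector size standing in for NB_FEATURES */

/* Append v to a FIFO holding at most maxv vectors; returns the new fill
 * count. When full, the oldest vector is shifted out first. */
static int push_vector(float *buf, int fill, int maxv, const float *v) {
    if (fill == maxv) memmove(buf, buf + VEC, (maxv-1)*VEC*sizeof(float));
    else fill++;
    memcpy(buf + (fill-1)*VEC, v, VEC*sizeof(float));
    return fill;
}
```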
void run_frame_network_flush(LPCNetState *lpcnet)
{
int i;
for (i=0;i<lpcnet->feature_buffer_fill;i++) {
float lpc[LPC_ORDER];
float gru_a_condition[3*GRU_A_STATE_SIZE];
float gru_b_condition[3*GRU_B_STATE_SIZE];
run_frame_network(lpcnet, gru_a_condition, gru_b_condition, lpc, &lpcnet->feature_buffer[i*NB_FEATURES]);
}
lpcnet->feature_buffer_fill = 0;
}
int run_sample_network(LPCNetState *lpcnet, const float *gru_a_condition, const float *gru_b_condition, int last_exc, int last_sig, int pred, const float *sampling_logit_table, kiss99_ctx *rng)
{
NNetState *net;
float gru_a_input[3*GRU_A_STATE_SIZE];
float in_b[GRU_A_STATE_SIZE+FEATURE_DENSE2_OUT_SIZE];
float gru_b_input[3*GRU_B_STATE_SIZE];
net = &lpcnet->nnet;
#if 1
compute_gru_a_input(gru_a_input, gru_a_condition, GRU_A_STATE_SIZE, &lpcnet->model.gru_a_embed_sig, last_sig, &lpcnet->model.gru_a_embed_pred, pred, &lpcnet->model.gru_a_embed_exc, last_exc);
#else
OPUS_COPY(gru_a_input, gru_a_condition, 3*GRU_A_STATE_SIZE);
accum_embedding(&lpcnet->model.gru_a_embed_sig, gru_a_input, last_sig);
accum_embedding(&lpcnet->model.gru_a_embed_pred, gru_a_input, pred);
accum_embedding(&lpcnet->model.gru_a_embed_exc, gru_a_input, last_exc);
#endif
/*compute_gru3(&gru_a, net->gru_a_state, gru_a_input);*/
compute_sparse_gru(&lpcnet->model.sparse_gru_a, net->gru_a_state, gru_a_input);
OPUS_COPY(in_b, net->gru_a_state, GRU_A_STATE_SIZE);
OPUS_COPY(gru_b_input, gru_b_condition, 3*GRU_B_STATE_SIZE);
compute_gruB(&lpcnet->model.gru_b, gru_b_input, net->gru_b_state, in_b);
return sample_mdense(&lpcnet->model.dual_fc, net->gru_b_state, sampling_logit_table, rng);
}
int lpcnet_get_size(void)
{
return sizeof(LPCNetState);
}
void lpcnet_reset(LPCNetState *lpcnet)
{
const char* rng_string="LPCNet";
OPUS_CLEAR((char*)&lpcnet->LPCNET_RESET_START,
sizeof(LPCNetState)-
((char*)&lpcnet->LPCNET_RESET_START - (char*)lpcnet));
lpcnet->last_exc = lin2ulaw(0.f);
kiss99_srand(&lpcnet->rng, (const unsigned char *)rng_string, strlen(rng_string));
}
int lpcnet_init(LPCNetState *lpcnet)
{
int i;
int ret;
for (i=0;i<256;i++) {
float prob = .025f+.95f*i/255.f;
lpcnet->sampling_logit_table[i] = -log((1-prob)/prob);
}
#ifndef USE_WEIGHTS_FILE
ret = init_lpcnet_model(&lpcnet->model, lpcnet_arrays);
#else
ret = 0;
#endif
lpcnet_reset(lpcnet);
celt_assert(ret == 0);
return ret;
}
int lpcnet_load_model(LPCNetState *st, const void *data, int len) {
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_lpcnet_model(&st->model, list);
opus_free(list);
if (ret == 0) return 0;
else return -1;
}
LPCNetState *lpcnet_create(void)
{
LPCNetState *lpcnet;
lpcnet = (LPCNetState *)opus_alloc(lpcnet_get_size());
OPUS_CLEAR(lpcnet, 1);
lpcnet_init(lpcnet);
return lpcnet;
}
void lpcnet_destroy(LPCNetState *lpcnet)
{
opus_free(lpcnet);
}
void lpcnet_reset_signal(LPCNetState *lpcnet)
{
lpcnet->deemph_mem = 0;
lpcnet->last_exc = lin2ulaw(0.f);
OPUS_CLEAR(lpcnet->last_sig, LPC_ORDER);
OPUS_CLEAR(lpcnet->nnet.gru_a_state, GRU_A_STATE_SIZE);
OPUS_CLEAR(lpcnet->nnet.gru_b_state, GRU_B_STATE_SIZE);
}
void lpcnet_synthesize_tail_impl(LPCNetState *lpcnet, opus_int16 *output, int N, int preload)
{
int i;
if (lpcnet->frame_count <= FEATURES_DELAY)
{
OPUS_CLEAR(output, N);
return;
}
for (i=0;i<N;i++)
{
int j;
float pcm;
int exc;
int last_sig_ulaw;
int pred_ulaw;
float pred = 0;
for (j=0;j<LPC_ORDER;j++) pred -= lpcnet->last_sig[j]*lpcnet->lpc[j];
last_sig_ulaw = lin2ulaw(lpcnet->last_sig[0]);
pred_ulaw = lin2ulaw(pred);
exc = run_sample_network(lpcnet, lpcnet->gru_a_condition, lpcnet->gru_b_condition, lpcnet->last_exc, last_sig_ulaw, pred_ulaw, lpcnet->sampling_logit_table, &lpcnet->rng);
if (i < preload) {
exc = lin2ulaw(output[i]-PREEMPH*lpcnet->deemph_mem - pred);
pcm = output[i]-PREEMPH*lpcnet->deemph_mem;
} else {
pcm = pred + ulaw2lin(exc);
}
OPUS_MOVE(&lpcnet->last_sig[1], &lpcnet->last_sig[0], LPC_ORDER-1);
lpcnet->last_sig[0] = pcm;
lpcnet->last_exc = exc;
pcm += PREEMPH*lpcnet->deemph_mem;
lpcnet->deemph_mem = pcm;
if (pcm<-32767) pcm = -32767;
if (pcm>32767) pcm = 32767;
if (i >= preload) output[i] = (int)floor(.5 + pcm);
}
}
void lpcnet_synthesize_impl(LPCNetState *lpcnet, const float *features, opus_int16 *output, int N, int preload)
{
run_frame_network(lpcnet, lpcnet->gru_a_condition, lpcnet->gru_b_condition, lpcnet->lpc, features);
lpcnet_synthesize_tail_impl(lpcnet, output, N, preload);
}
void lpcnet_synthesize(LPCNetState *lpcnet, const float *features, opus_int16 *output, int N) {
lpcnet_synthesize_impl(lpcnet, features, output, N, 0);
}

/* ---- lpcnet.h ---- */
/* Copyright (c) 2018 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef LPCNET_H_
#define LPCNET_H_
#include "opus_types.h"
#define NB_FEATURES 20
#define NB_TOTAL_FEATURES 36
/** Number of audio samples in a feature frame (not for encoding/decoding). */
#define LPCNET_FRAME_SIZE (160)
typedef struct LPCNetState LPCNetState;
typedef struct LPCNetDecState LPCNetDecState;
typedef struct LPCNetEncState LPCNetEncState;
typedef struct LPCNetPLCState LPCNetPLCState;
/** Gets the size of an <code>LPCNetDecState</code> structure.
* @returns The size in bytes.
*/
int lpcnet_decoder_get_size(void);
/** Initializes a previously allocated decoder state
* The memory pointed to by st must be at least the size returned by lpcnet_decoder_get_size().
* This is intended for applications which use their own allocator instead of malloc.
* @see lpcnet_decoder_create(),lpcnet_decoder_get_size()
* @param [in] st <tt>LPCNetDecState*</tt>: Decoder state
* @retval 0 Success
*/
int lpcnet_decoder_init(LPCNetDecState *st);
void lpcnet_reset(LPCNetState *lpcnet);
/** Allocates and initializes a decoder state.
* @returns The newly created state
*/
LPCNetDecState *lpcnet_decoder_create(void);
/** Frees an <code>LPCNetDecState</code> allocated by lpcnet_decoder_create().
* @param[in] st <tt>LPCNetDecState*</tt>: State to be freed.
*/
void lpcnet_decoder_destroy(LPCNetDecState *st);
/** Decodes a packet of LPCNET_COMPRESSED_SIZE bytes (currently 8) into LPCNET_PACKET_SAMPLES samples (currently 640).
* @param [in] st <tt>LPCNetDecState*</tt>: Decoder state
* @param [in] buf <tt>const unsigned char *</tt>: Compressed packet
* @param [out] pcm <tt>opus_int16 *</tt>: Decoded audio
* @retval 0 Success
*/
int lpcnet_decode(LPCNetDecState *st, const unsigned char *buf, opus_int16 *pcm);
/** Gets the size of an <code>LPCNetEncState</code> structure.
* @returns The size in bytes.
*/
int lpcnet_encoder_get_size(void);
/** Initializes a previously allocated encoder state
* The memory pointed to by st must be at least the size returned by lpcnet_encoder_get_size().
* This is intended for applications which use their own allocator instead of malloc.
* @see lpcnet_encoder_create(),lpcnet_encoder_get_size()
* @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
* @retval 0 Success
*/
int lpcnet_encoder_init(LPCNetEncState *st);
int lpcnet_encoder_load_model(LPCNetEncState *st, const void *data, int len);
/** Allocates and initializes an encoder state.
* @returns The newly created state
*/
LPCNetEncState *lpcnet_encoder_create(void);
/** Frees an <code>LPCNetEncState</code> allocated by lpcnet_encoder_create().
* @param[in] st <tt>LPCNetEncState*</tt>: State to be freed.
*/
void lpcnet_encoder_destroy(LPCNetEncState *st);
/** Encodes LPCNET_PACKET_SAMPLES speech samples (currently 640) into a packet of LPCNET_COMPRESSED_SIZE bytes (currently 8).
* @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
* @param [in] pcm <tt>const opus_int16 *</tt>: Input speech to be encoded
* @param [out] buf <tt>unsigned char *</tt>: Compressed packet
* @retval 0 Success
*/
int lpcnet_encode(LPCNetEncState *st, const opus_int16 *pcm, unsigned char *buf);
/** Compute features on LPCNET_FRAME_SIZE speech samples (currently 160) and output features for one 10-ms frame.
* @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
* @param [in] pcm <tt>const opus_int16 *</tt>: Input speech to be analyzed
* @param [out] features <tt>float[NB_TOTAL_FEATURES]</tt>: Features for one frame
* @retval 0 Success
*/
int lpcnet_compute_single_frame_features(LPCNetEncState *st, const opus_int16 *pcm, float features[NB_TOTAL_FEATURES], int arch);
/** Compute features on LPCNET_FRAME_SIZE speech samples (currently 160) and output features for one 10-ms frame.
* @param [in] st <tt>LPCNetEncState*</tt>: Encoder state
* @param [in] pcm <tt>const float *</tt>: Input speech to be analyzed
* @param [out] features <tt>float[NB_TOTAL_FEATURES]</tt>: Features for one frame
* @retval 0 Success
*/
int lpcnet_compute_single_frame_features_float(LPCNetEncState *st, const float *pcm, float features[NB_TOTAL_FEATURES], int arch);
/** Gets the size of an <code>LPCNetState</code> structure.
* @returns The size in bytes.
*/
int lpcnet_get_size(void);
/** Initializes a previously allocated synthesis state
* The memory pointed to by st must be at least the size returned by lpcnet_get_size().
* This is intended for applications which use their own allocator instead of malloc.
* @see lpcnet_create(),lpcnet_get_size()
* @param [in] st <tt>LPCNetState*</tt>: Synthesis state
* @retval 0 Success
*/
int lpcnet_init(LPCNetState *st);
/** Allocates and initializes a synthesis state.
* @returns The newly created state
*/
LPCNetState *lpcnet_create(void);
/** Frees an <code>LPCNetState</code> allocated by lpcnet_create().
* @param[in] st <tt>LPCNetState*</tt>: State to be freed.
*/
void lpcnet_destroy(LPCNetState *st);
/** Synthesizes speech from an LPCNet feature vector.
* @param [in] st <tt>LPCNetState*</tt>: Synthesis state
* @param [in] features <tt>const float *</tt>: Feature vector
* @param [out] output <tt>opus_int16 *</tt>: Synthesized speech
* @param [in] N <tt>int</tt>: Number of samples to generate
*/
void lpcnet_synthesize(LPCNetState *st, const float *features, opus_int16 *output, int N);
int lpcnet_plc_init(LPCNetPLCState *st);
void lpcnet_plc_reset(LPCNetPLCState *st);
int lpcnet_plc_update(LPCNetPLCState *st, opus_int16 *pcm);
int lpcnet_plc_conceal(LPCNetPLCState *st, opus_int16 *pcm);
void lpcnet_plc_fec_add(LPCNetPLCState *st, const float *features);
void lpcnet_plc_fec_clear(LPCNetPLCState *st);
int lpcnet_load_model(LPCNetState *st, const void *data, int len);
int lpcnet_plc_load_model(LPCNetPLCState *st, const void *data, int len);
#endif

/* ---- lpcnet_enc.c ---- */
/* Copyright (c) 2017-2019 Mozilla */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "kiss_fft.h"
#include "common.h"
#include <math.h>
#include "freq.h"
#include "pitch.h"
#include "arch.h"
#include <assert.h>
#include "lpcnet_private.h"
#include "lpcnet.h"
#include "os_support.h"
#include "_kiss_fft_guts.h"
#include "celt_lpc.h"
#include "mathops.h"
int lpcnet_encoder_get_size(void) {
return sizeof(LPCNetEncState);
}
int lpcnet_encoder_init(LPCNetEncState *st) {
memset(st, 0, sizeof(*st));
pitchdnn_init(&st->pitchdnn);
return 0;
}
int lpcnet_encoder_load_model(LPCNetEncState *st, const void *data, int len) {
return pitchdnn_load_model(&st->pitchdnn, data, len);
}
LPCNetEncState *lpcnet_encoder_create(void) {
LPCNetEncState *st;
st = opus_alloc(lpcnet_encoder_get_size());
lpcnet_encoder_init(st);
return st;
}
void lpcnet_encoder_destroy(LPCNetEncState *st) {
opus_free(st);
}
static void frame_analysis(LPCNetEncState *st, kiss_fft_cpx *X, float *Ex, const float *in) {
float x[WINDOW_SIZE];
OPUS_COPY(x, st->analysis_mem, OVERLAP_SIZE);
OPUS_COPY(&x[OVERLAP_SIZE], in, FRAME_SIZE);
OPUS_COPY(st->analysis_mem, &in[FRAME_SIZE-OVERLAP_SIZE], OVERLAP_SIZE);
apply_window(x);
forward_transform(X, x);
lpcn_compute_band_energy(Ex, X);
}
static void biquad(float *y, float mem[2], const float *x, const float *b, const float *a, int N) {
int i;
float mem0, mem1;
mem0 = mem[0];
mem1 = mem[1];
for (i=0;i<N;i++) {
float xi, yi, mem00;
xi = x[i];
yi = x[i] + mem0;
mem00 = mem0;
/* Original code:
mem0 = mem1 + (b[0]*xi - a[0]*yi);
mem1 = (b[1]*xi - a[1]*yi);
Modified to reduce dependency chains: (the +1e-30f forces the ordering and has no effect on the output)
*/
mem0 = (b[0]-a[0])*xi + mem1 - a[0]*mem0;
mem1 = (b[1]-a[1])*xi + 1e-30f - a[1]*mem00;
y[i] = yi;
}
mem[0] = mem0;
mem[1] = mem1;
}
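The rewritten update in biquad() is algebraically identical to the transposed direct-form II it replaced: substituting yi = xi + mem0 into the original recurrences yields the shortened form. A sketch checking both against each other (illustrative names; coefficients follow the {1, b0, b1}/{1, a0, a1} convention implied by yi = xi + mem0):

```c
#include <assert.h>
#include <math.h>

/* Reference transposed direct-form II biquad, matching the "original
 * code" quoted in the comment inside biquad() above. */
static void biquad_ref(float *y, float mem[2], const float *x,
                       const float *b, const float *a, int N) {
    int i;
    for (i = 0; i < N; i++) {
        float yi = x[i] + mem[0];
        float m0 = mem[1] + (b[0]*x[i] - a[0]*yi);
        mem[1] = b[1]*x[i] - a[1]*yi;
        mem[0] = m0;
        y[i] = yi;
    }
}

/* Dependency-chain-reduced form, as in biquad() above. */
static void biquad_fast(float *y, float mem[2], const float *x,
                        const float *b, const float *a, int N) {
    int i;
    float mem0 = mem[0], mem1 = mem[1];
    for (i = 0; i < N; i++) {
        float xi = x[i], yi = x[i] + mem0, mem00 = mem0;
        mem0 = (b[0]-a[0])*xi + mem1 - a[0]*mem0;
        mem1 = (b[1]-a[1])*xi + 1e-30f - a[1]*mem00;
        y[i] = yi;
    }
    mem[0] = mem0; mem[1] = mem1;
}
```

The test below feeds both forms the same low-pass coefficients used above and requires the outputs to agree.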
#define celt_log10(x) (0.3010299957f*celt_log2(x))
void compute_frame_features(LPCNetEncState *st, const float *in, int arch) {
float aligned_in[FRAME_SIZE];
int i;
float Ly[NB_BANDS];
float follow, logMax;
kiss_fft_cpx X[FREQ_SIZE];
float Ex[NB_BANDS];
float xcorr[PITCH_MAX_PERIOD];
float ener0;
float ener;
float x[FRAME_SIZE+LPC_ORDER];
float frame_corr;
float xy, xx, yy;
int pitch;
float ener_norm[PITCH_MAX_PERIOD - PITCH_MIN_PERIOD];
/* [b,a]=ellip(2, 2, 20, 1200/8000); */
static const float lp_b[2] = {-0.84946f, 1.f};
static const float lp_a[2] = {-1.54220f, 0.70781f};
OPUS_COPY(aligned_in, &st->analysis_mem[OVERLAP_SIZE-TRAINING_OFFSET], TRAINING_OFFSET);
frame_analysis(st, X, Ex, in);
st->if_features[0] = MAX16(-1.f, MIN16(1.f, (1.f/64)*(10.f*celt_log10(1e-15f + X[0].r*X[0].r)-6.f)));
for (i=1;i<PITCH_IF_MAX_FREQ;i++) {
kiss_fft_cpx prod;
float norm_1;
C_MULC(prod, X[i], st->prev_if[i]);
norm_1 = 1.f/sqrt(1e-15f + prod.r*prod.r + prod.i*prod.i);
C_MULBYSCALAR(prod, norm_1);
st->if_features[3*i-2] = prod.r;
st->if_features[3*i-1] = prod.i;
st->if_features[3*i] = MAX16(-1.f, MIN16(1.f, (1.f/64)*(10.f*celt_log10(1e-15f + X[i].r*X[i].r + X[i].i*X[i].i)-6.f)));
}
OPUS_COPY(st->prev_if, X, PITCH_IF_MAX_FREQ);
/*for (i=0;i<88;i++) printf("%f ", st->if_features[i]);printf("\n");*/
logMax = -2;
follow = -2;
for (i=0;i<NB_BANDS;i++) {
Ly[i] = celt_log10(1e-2f+Ex[i]);
Ly[i] = MAX16(logMax-8, MAX16(follow-2.5f, Ly[i]));
logMax = MAX16(logMax, Ly[i]);
follow = MAX16(follow-2.5f, Ly[i]);
}
dct(st->features, Ly);
st->features[0] -= 4;
lpc_from_cepstrum(st->lpc, st->features);
for (i=0;i<LPC_ORDER;i++) st->features[NB_BANDS+2+i] = st->lpc[i];
OPUS_MOVE(st->exc_buf, &st->exc_buf[FRAME_SIZE], PITCH_MAX_PERIOD);
OPUS_MOVE(st->lp_buf, &st->lp_buf[FRAME_SIZE], PITCH_MAX_PERIOD);
OPUS_COPY(&aligned_in[TRAINING_OFFSET], in, FRAME_SIZE-TRAINING_OFFSET);
OPUS_COPY(&x[0], st->pitch_mem, LPC_ORDER);
OPUS_COPY(&x[LPC_ORDER], aligned_in, FRAME_SIZE);
OPUS_COPY(st->pitch_mem, &aligned_in[FRAME_SIZE-LPC_ORDER], LPC_ORDER);
celt_fir(&x[LPC_ORDER], st->lpc, &st->lp_buf[PITCH_MAX_PERIOD], FRAME_SIZE, LPC_ORDER, arch);
for (i=0;i<FRAME_SIZE;i++) {
st->exc_buf[PITCH_MAX_PERIOD+i] = st->lp_buf[PITCH_MAX_PERIOD+i] + .7f*st->pitch_filt;
st->pitch_filt = st->lp_buf[PITCH_MAX_PERIOD+i];
/*printf("%f\n", st->exc_buf[PITCH_MAX_PERIOD+i]);*/
}
biquad(&st->lp_buf[PITCH_MAX_PERIOD], st->lp_mem, &st->lp_buf[PITCH_MAX_PERIOD], lp_b, lp_a, FRAME_SIZE);
{
double ener1;
float *buf = st->exc_buf;
celt_pitch_xcorr(&buf[PITCH_MAX_PERIOD], buf, xcorr, FRAME_SIZE, PITCH_MAX_PERIOD-PITCH_MIN_PERIOD, arch);
ener0 = celt_inner_prod(&buf[PITCH_MAX_PERIOD], &buf[PITCH_MAX_PERIOD], FRAME_SIZE, arch);
ener1 = celt_inner_prod(&buf[0], &buf[0], FRAME_SIZE, arch);
/*printf("%f\n", st->frame_weight[sub]);*/
for (i=0;i<PITCH_MAX_PERIOD-PITCH_MIN_PERIOD;i++) {
ener = 1 + ener0 + ener1;
st->xcorr_features[i] = 2*xcorr[i];
ener_norm[i] = ener;
ener1 += buf[i+FRAME_SIZE]*(double)buf[i+FRAME_SIZE] - buf[i]*(double)buf[i];
/*printf("%f ", st->xcorr_features[i]);*/
}
/* Split in a separate loop so the compiler can vectorize it */
for (i=0;i<PITCH_MAX_PERIOD-PITCH_MIN_PERIOD;i++) {
st->xcorr_features[i] /= ener_norm[i];
}
/*printf("\n");*/
}
st->dnn_pitch = compute_pitchdnn(&st->pitchdnn, st->if_features, st->xcorr_features, arch);
pitch = (int)floor(.5+256./pow(2.f,((1./60.)*((st->dnn_pitch+1.5)*60))));
xx = celt_inner_prod(&st->lp_buf[PITCH_MAX_PERIOD], &st->lp_buf[PITCH_MAX_PERIOD], FRAME_SIZE, arch);
yy = celt_inner_prod(&st->lp_buf[PITCH_MAX_PERIOD-pitch], &st->lp_buf[PITCH_MAX_PERIOD-pitch], FRAME_SIZE, arch);
xy = celt_inner_prod(&st->lp_buf[PITCH_MAX_PERIOD], &st->lp_buf[PITCH_MAX_PERIOD-pitch], FRAME_SIZE, arch);
/*printf("%f %f\n", frame_corr, xy/sqrt(1e-15+xx*yy));*/
frame_corr = xy/sqrt(1+xx*yy);
frame_corr = log(1.f+exp(5.f*frame_corr))/log(1+exp(5.f));
st->features[NB_BANDS] = st->dnn_pitch;
st->features[NB_BANDS + 1] = frame_corr-.5f;
}
void preemphasis(float *y, float *mem, const float *x, float coef, int N) {
int i;
for (i=0;i<N;i++) {
float yi;
yi = x[i] + *mem;
*mem = -coef*x[i];
y[i] = yi;
}
}
static int lpcnet_compute_single_frame_features_impl(LPCNetEncState *st, float *x, float features[NB_TOTAL_FEATURES], int arch) {
preemphasis(x, &st->mem_preemph, x, PREEMPHASIS, FRAME_SIZE);
compute_frame_features(st, x, arch);
OPUS_COPY(features, &st->features[0], NB_TOTAL_FEATURES);
return 0;
}
int lpcnet_compute_single_frame_features(LPCNetEncState *st, const opus_int16 *pcm, float features[NB_TOTAL_FEATURES], int arch) {
int i;
float x[FRAME_SIZE];
for (i=0;i<FRAME_SIZE;i++) x[i] = pcm[i];
lpcnet_compute_single_frame_features_impl(st, x, features, arch);
return 0;
}
int lpcnet_compute_single_frame_features_float(LPCNetEncState *st, const float *pcm, float features[NB_TOTAL_FEATURES], int arch) {
int i;
float x[FRAME_SIZE];
for (i=0;i<FRAME_SIZE;i++) x[i] = pcm[i];
lpcnet_compute_single_frame_features_impl(st, x, features, arch);
return 0;
}

/* ---- lpcnet_plc.c ---- */
/* Copyright (c) 2021 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "lpcnet_private.h"
#include "lpcnet.h"
#include "plc_data.h"
#include "os_support.h"
#include "common.h"
#include "cpu_support.h"
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
/* Comment this out to have LPCNet update its state on every good packet (slow). */
#define PLC_SKIP_UPDATES
void lpcnet_plc_reset(LPCNetPLCState *st) {
OPUS_CLEAR((char*)&st->LPCNET_PLC_RESET_START,
sizeof(LPCNetPLCState)-
((char*)&st->LPCNET_PLC_RESET_START - (char*)st));
lpcnet_encoder_init(&st->enc);
OPUS_CLEAR(st->pcm, PLC_BUF_SIZE);
st->blend = 0;
st->loss_count = 0;
st->analysis_gap = 1;
st->analysis_pos = PLC_BUF_SIZE;
st->predict_pos = PLC_BUF_SIZE;
}
int lpcnet_plc_init(LPCNetPLCState *st) {
int ret;
st->arch = opus_select_arch();
fargan_init(&st->fargan);
lpcnet_encoder_init(&st->enc);
st->loaded = 0;
#ifndef USE_WEIGHTS_FILE
ret = init_plcmodel(&st->model, plcmodel_arrays);
if (ret == 0) st->loaded = 1;
#else
ret = 0;
#endif
celt_assert(ret == 0);
lpcnet_plc_reset(st);
return ret;
}
int lpcnet_plc_load_model(LPCNetPLCState *st, const void *data, int len) {
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_plcmodel(&st->model, list);
opus_free(list);
if (ret == 0) {
ret = lpcnet_encoder_load_model(&st->enc, data, len);
}
if (ret == 0) {
ret = fargan_load_model(&st->fargan, data, len);
}
if (ret == 0) st->loaded = 1;
return ret;
}
void lpcnet_plc_fec_add(LPCNetPLCState *st, const float *features) {
if (features == NULL) {
st->fec_skip++;
return;
}
if (st->fec_fill_pos == PLC_MAX_FEC) {
OPUS_MOVE(&st->fec[0][0], &st->fec[st->fec_read_pos][0], (st->fec_fill_pos-st->fec_read_pos)*NB_FEATURES);
st->fec_fill_pos = st->fec_fill_pos - st->fec_read_pos;
st->fec_read_pos = 0;
}
OPUS_COPY(&st->fec[st->fec_fill_pos][0], features, NB_FEATURES);
st->fec_fill_pos++;
}
void lpcnet_plc_fec_clear(LPCNetPLCState *st) {
st->fec_read_pos = st->fec_fill_pos = st->fec_skip = 0;
}
static void compute_plc_pred(LPCNetPLCState *st, float *out, const float *in) {
float tmp[PLC_DENSE_IN_OUT_SIZE];
PLCModel *model = &st->model;
PLCNetState *net = &st->plc_net;
celt_assert(st->loaded);
compute_generic_dense(&model->plc_dense_in, tmp, in, ACTIVATION_TANH, st->arch);
compute_generic_gru(&model->plc_gru1_input, &model->plc_gru1_recurrent, net->gru1_state, tmp, st->arch);
compute_generic_gru(&model->plc_gru2_input, &model->plc_gru2_recurrent, net->gru2_state, net->gru1_state, st->arch);
compute_generic_dense(&model->plc_dense_out, out, net->gru2_state, ACTIVATION_LINEAR, st->arch);
}
static int get_fec_or_pred(LPCNetPLCState *st, float *out) {
if (st->fec_read_pos != st->fec_fill_pos && st->fec_skip==0) {
float plc_features[2*NB_BANDS+NB_FEATURES+1] = {0};
float discard[NB_FEATURES];
OPUS_COPY(out, &st->fec[st->fec_read_pos][0], NB_FEATURES);
st->fec_read_pos++;
/* Update PLC state using FEC, so without Burg features. */
OPUS_COPY(&plc_features[2*NB_BANDS], out, NB_FEATURES);
plc_features[2*NB_BANDS+NB_FEATURES] = -1;
compute_plc_pred(st, discard, plc_features);
return 1;
} else {
float zeros[2*NB_BANDS+NB_FEATURES+1] = {0};
compute_plc_pred(st, out, zeros);
if (st->fec_skip > 0) st->fec_skip--;
return 0;
}
}
static void queue_features(LPCNetPLCState *st, const float *features) {
OPUS_MOVE(&st->cont_features[0], &st->cont_features[NB_FEATURES], (CONT_VECTORS-1)*NB_FEATURES);
OPUS_COPY(&st->cont_features[(CONT_VECTORS-1)*NB_FEATURES], features, NB_FEATURES);
}
/* In this causal version of the code, the DNN model implemented by compute_plc_pred()
needs to generate two feature vectors to conceal the first lost packet.*/
int lpcnet_plc_update(LPCNetPLCState *st, opus_int16 *pcm) {
int i;
if (st->analysis_pos - FRAME_SIZE >= 0) st->analysis_pos -= FRAME_SIZE;
else st->analysis_gap = 1;
if (st->predict_pos - FRAME_SIZE >= 0) st->predict_pos -= FRAME_SIZE;
OPUS_MOVE(st->pcm, &st->pcm[FRAME_SIZE], PLC_BUF_SIZE-FRAME_SIZE);
for (i=0;i<FRAME_SIZE;i++) st->pcm[PLC_BUF_SIZE-FRAME_SIZE+i] = (1.f/32768.f)*pcm[i];
st->loss_count = 0;
st->blend = 0;
return 0;
}
static const float att_table[10] = {0, 0, -.2, -.2, -.4, -.4, -.8, -.8, -1.6, -1.6};
int lpcnet_plc_conceal(LPCNetPLCState *st, opus_int16 *pcm) {
int i;
celt_assert(st->loaded);
if (st->blend == 0) {
int count = 0;
st->plc_net = st->plc_bak[0];
while (st->analysis_pos + FRAME_SIZE <= PLC_BUF_SIZE) {
float x[FRAME_SIZE];
float plc_features[2*NB_BANDS+NB_FEATURES+1];
celt_assert(st->analysis_pos >= 0);
for (i=0;i<FRAME_SIZE;i++) x[i] = 32768.f*st->pcm[st->analysis_pos+i];
burg_cepstral_analysis(plc_features, x);
lpcnet_compute_single_frame_features_float(&st->enc, x, st->features, st->arch);
if ((!st->analysis_gap || count>0) && st->analysis_pos >= st->predict_pos) {
queue_features(st, st->features);
OPUS_COPY(&plc_features[2*NB_BANDS], st->features, NB_FEATURES);
plc_features[2*NB_BANDS+NB_FEATURES] = 1;
st->plc_bak[0] = st->plc_bak[1];
st->plc_bak[1] = st->plc_net;
compute_plc_pred(st, st->features, plc_features);
}
st->analysis_pos += FRAME_SIZE;
count++;
}
st->plc_bak[0] = st->plc_bak[1];
st->plc_bak[1] = st->plc_net;
get_fec_or_pred(st, st->features);
queue_features(st, st->features);
st->plc_bak[0] = st->plc_bak[1];
st->plc_bak[1] = st->plc_net;
get_fec_or_pred(st, st->features);
queue_features(st, st->features);
fargan_cont(&st->fargan, &st->pcm[PLC_BUF_SIZE-FARGAN_CONT_SAMPLES], st->cont_features);
st->analysis_gap = 0;
}
st->plc_bak[0] = st->plc_bak[1];
st->plc_bak[1] = st->plc_net;
if (get_fec_or_pred(st, st->features)) st->loss_count = 0;
else st->loss_count++;
if (st->loss_count >= 10) st->features[0] = MAX16(-10, st->features[0]+att_table[9] - 2*(st->loss_count-9));
else st->features[0] = MAX16(-10, st->features[0]+att_table[st->loss_count]);
fargan_synthesize_int(&st->fargan, pcm, &st->features[0]);
queue_features(st, st->features);
if (st->analysis_pos - FRAME_SIZE >= 0) st->analysis_pos -= FRAME_SIZE;
else st->analysis_gap = 1;
st->predict_pos = PLC_BUF_SIZE;
OPUS_MOVE(st->pcm, &st->pcm[FRAME_SIZE], PLC_BUF_SIZE-FRAME_SIZE);
for (i=0;i<FRAME_SIZE;i++) st->pcm[PLC_BUF_SIZE-FRAME_SIZE+i] = (1.f/32768.f)*pcm[i];
st->blend = 1;
return 0;
}
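The fade applied to the energy cepstral coefficient during concealment follows att_table for the first ten lost frames, then drops a further 2 per frame, floored at -10, as in the two MAX16 lines above. The schedule in isolation (plc_attenuate is an illustrative name):

```c
#include <assert.h>
#include <math.h>

/* Energy-feature attenuation schedule from lpcnet_plc_conceal(). */
static float plc_attenuate(float c0, int loss_count) {
    static const float att_table[10] = {0, 0, -.2f, -.2f, -.4f, -.4f, -.8f, -.8f, -1.6f, -1.6f};
    if (loss_count >= 10) c0 += att_table[9] - 2*(loss_count-9);
    else c0 += att_table[loss_count];
    return c0 < -10 ? -10 : c0;
}
```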

/* ---- lpcnet_private.h ---- */
#ifndef LPCNET_PRIVATE_H
#define LPCNET_PRIVATE_H
#include <stdio.h>
#include "freq.h"
#include "lpcnet.h"
#include "plc_data.h"
#include "pitchdnn.h"
#include "fargan.h"
#define PITCH_FRAME_SIZE 320
#define PITCH_BUF_SIZE (PITCH_MAX_PERIOD+PITCH_FRAME_SIZE)
#define PLC_MAX_FEC 100
#define MAX_FEATURE_BUFFER_SIZE 4
#define PITCH_IF_MAX_FREQ 30
#define PITCH_IF_FEATURES (3*PITCH_IF_MAX_FREQ - 2)
#define CONT_VECTORS 5
#define FEATURES_DELAY 1
struct LPCNetEncState{
PitchDNNState pitchdnn;
float analysis_mem[OVERLAP_SIZE];
float mem_preemph;
kiss_fft_cpx prev_if[PITCH_IF_MAX_FREQ];
float if_features[PITCH_IF_FEATURES];
float xcorr_features[PITCH_MAX_PERIOD - PITCH_MIN_PERIOD];
float dnn_pitch;
float pitch_mem[LPC_ORDER];
float pitch_filt;
float exc_buf[PITCH_BUF_SIZE];
float lp_buf[PITCH_BUF_SIZE];
float lp_mem[4];
float lpc[LPC_ORDER];
float features[NB_TOTAL_FEATURES];
float sig_mem[LPC_ORDER];
float burg_cepstrum[2*NB_BANDS];
};
typedef struct {
float gru1_state[PLC_GRU1_STATE_SIZE];
float gru2_state[PLC_GRU2_STATE_SIZE];
} PLCNetState;
#define PLC_BUF_SIZE ((CONT_VECTORS+10)*FRAME_SIZE)
struct LPCNetPLCState {
PLCModel model;
FARGANState fargan;
LPCNetEncState enc;
int loaded;
int arch;
#define LPCNET_PLC_RESET_START fec
float fec[PLC_MAX_FEC][NB_FEATURES];
int analysis_gap;
int fec_read_pos;
int fec_fill_pos;
int fec_skip;
int analysis_pos;
int predict_pos;
float pcm[PLC_BUF_SIZE];
int blend;
float features[NB_TOTAL_FEATURES];
float cont_features[CONT_VECTORS*NB_FEATURES];
int loss_count;
PLCNetState plc_net;
PLCNetState plc_bak[2];
};
void preemphasis(float *y, float *mem, const float *x, float coef, int N);
void compute_frame_features(LPCNetEncState *st, const float *in, int arch);
void lpcnet_reset_signal(LPCNetState *lpcnet);
void run_frame_network(LPCNetState *lpcnet, float *gru_a_condition, float *gru_b_condition, float *lpc, const float *features);
void run_frame_network_deferred(LPCNetState *lpcnet, const float *features);
void run_frame_network_flush(LPCNetState *lpcnet);
void lpcnet_synthesize_tail_impl(LPCNetState *lpcnet, opus_int16 *output, int N, int preload);
void lpcnet_synthesize_impl(LPCNetState *lpcnet, const float *features, opus_int16 *output, int N, int preload);
void lpcnet_synthesize_blend_impl(LPCNetState *lpcnet, const opus_int16 *pcm_in, opus_int16 *output, int N);
#endif

/* ---- lpcnet_tables.c ---- */
/* The contents of this file was automatically generated by dump_lpcnet_tables.c*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "kiss_fft.h"
static const arch_fft_state arch_fft = {0, NULL};
static const opus_int16 fft_bitrev[320] = {
0, 64, 128, 192, 256, 16, 80, 144, 208, 272, 32, 96, 160, 224, 288,
48, 112, 176, 240, 304, 4, 68, 132, 196, 260, 20, 84, 148, 212, 276,
36, 100, 164, 228, 292, 52, 116, 180, 244, 308, 8, 72, 136, 200, 264,
24, 88, 152, 216, 280, 40, 104, 168, 232, 296, 56, 120, 184, 248, 312,
12, 76, 140, 204, 268, 28, 92, 156, 220, 284, 44, 108, 172, 236, 300,
60, 124, 188, 252, 316, 1, 65, 129, 193, 257, 17, 81, 145, 209, 273,
33, 97, 161, 225, 289, 49, 113, 177, 241, 305, 5, 69, 133, 197, 261,
21, 85, 149, 213, 277, 37, 101, 165, 229, 293, 53, 117, 181, 245, 309,
9, 73, 137, 201, 265, 25, 89, 153, 217, 281, 41, 105, 169, 233, 297,
57, 121, 185, 249, 313, 13, 77, 141, 205, 269, 29, 93, 157, 221, 285,
45, 109, 173, 237, 301, 61, 125, 189, 253, 317, 2, 66, 130, 194, 258,
18, 82, 146, 210, 274, 34, 98, 162, 226, 290, 50, 114, 178, 242, 306,
6, 70, 134, 198, 262, 22, 86, 150, 214, 278, 38, 102, 166, 230, 294,
54, 118, 182, 246, 310, 10, 74, 138, 202, 266, 26, 90, 154, 218, 282,
42, 106, 170, 234, 298, 58, 122, 186, 250, 314, 14, 78, 142, 206, 270,
30, 94, 158, 222, 286, 46, 110, 174, 238, 302, 62, 126, 190, 254, 318,
3, 67, 131, 195, 259, 19, 83, 147, 211, 275, 35, 99, 163, 227, 291,
51, 115, 179, 243, 307, 7, 71, 135, 199, 263, 23, 87, 151, 215, 279,
39, 103, 167, 231, 295, 55, 119, 183, 247, 311, 11, 75, 139, 203, 267,
27, 91, 155, 219, 283, 43, 107, 171, 235, 299, 59, 123, 187, 251, 315,
15, 79, 143, 207, 271, 31, 95, 159, 223, 287, 47, 111, 175, 239, 303,
63, 127, 191, 255, 319, };
static const kiss_twiddle_cpx fft_twiddles[320] = {
{1.00000000f, -0.00000000f}, {0.999807239f, -0.0196336918f},
{0.999229014f, -0.0392598175f}, {0.998265624f, -0.0588708036f},
{0.996917307f, -0.0784590989f}, {0.995184720f, -0.0980171412f},
{0.993068457f, -0.117537394f}, {0.990569353f, -0.137012348f},
{0.987688363f, -0.156434461f}, {0.984426558f, -0.175796285f},
{0.980785251f, -0.195090324f}, {0.976765871f, -0.214309156f},
{0.972369909f, -0.233445361f}, {0.967599094f, -0.252491564f},
{0.962455213f, -0.271440446f}, {0.956940353f, -0.290284663f},
{0.951056540f, -0.309017003f}, {0.944806039f, -0.327630192f},
{0.938191354f, -0.346117049f}, {0.931214929f, -0.364470512f},
{0.923879504f, -0.382683426f}, {0.916187942f, -0.400748819f},
{0.908143163f, -0.418659747f}, {0.899748266f, -0.436409235f},
{0.891006529f, -0.453990489f}, {0.881921291f, -0.471396744f},
{0.872496009f, -0.488621235f}, {0.862734377f, -0.505657375f},
{0.852640152f, -0.522498548f}, {0.842217207f, -0.539138317f},
{0.831469595f, -0.555570245f}, {0.820401430f, -0.571787953f},
{0.809017003f, -0.587785244f}, {0.797320664f, -0.603555918f},
{0.785316944f, -0.619093955f}, {0.773010433f, -0.634393275f},
{0.760405958f, -0.649448037f}, {0.747508347f, -0.664252460f},
{0.734322488f, -0.678800762f}, {0.720853567f, -0.693087339f},
{0.707106769f, -0.707106769f}, {0.693087339f, -0.720853567f},
{0.678800762f, -0.734322488f}, {0.664252460f, -0.747508347f},
{0.649448037f, -0.760405958f}, {0.634393275f, -0.773010433f},
{0.619093955f, -0.785316944f}, {0.603555918f, -0.797320664f},
{0.587785244f, -0.809017003f}, {0.571787953f, -0.820401430f},
{0.555570245f, -0.831469595f}, {0.539138317f, -0.842217207f},
{0.522498548f, -0.852640152f}, {0.505657375f, -0.862734377f},
{0.488621235f, -0.872496009f}, {0.471396744f, -0.881921291f},
{0.453990489f, -0.891006529f}, {0.436409235f, -0.899748266f},
{0.418659747f, -0.908143163f}, {0.400748819f, -0.916187942f},
{0.382683426f, -0.923879504f}, {0.364470512f, -0.931214929f},
{0.346117049f, -0.938191354f}, {0.327630192f, -0.944806039f},
{0.309017003f, -0.951056540f}, {0.290284663f, -0.956940353f},
{0.271440446f, -0.962455213f}, {0.252491564f, -0.967599094f},
{0.233445361f, -0.972369909f}, {0.214309156f, -0.976765871f},
{0.195090324f, -0.980785251f}, {0.175796285f, -0.984426558f},
{0.156434461f, -0.987688363f}, {0.137012348f, -0.990569353f},
{0.117537394f, -0.993068457f}, {0.0980171412f, -0.995184720f},
{0.0784590989f, -0.996917307f}, {0.0588708036f, -0.998265624f},
{0.0392598175f, -0.999229014f}, {0.0196336918f, -0.999807239f},
{6.12323426e-17f, -1.00000000f}, {-0.0196336918f, -0.999807239f},
{-0.0392598175f, -0.999229014f}, {-0.0588708036f, -0.998265624f},
{-0.0784590989f, -0.996917307f}, {-0.0980171412f, -0.995184720f},
{-0.117537394f, -0.993068457f}, {-0.137012348f, -0.990569353f},
{-0.156434461f, -0.987688363f}, {-0.175796285f, -0.984426558f},
{-0.195090324f, -0.980785251f}, {-0.214309156f, -0.976765871f},
{-0.233445361f, -0.972369909f}, {-0.252491564f, -0.967599094f},
{-0.271440446f, -0.962455213f}, {-0.290284663f, -0.956940353f},
{-0.309017003f, -0.951056540f}, {-0.327630192f, -0.944806039f},
{-0.346117049f, -0.938191354f}, {-0.364470512f, -0.931214929f},
{-0.382683426f, -0.923879504f}, {-0.400748819f, -0.916187942f},
{-0.418659747f, -0.908143163f}, {-0.436409235f, -0.899748266f},
{-0.453990489f, -0.891006529f}, {-0.471396744f, -0.881921291f},
{-0.488621235f, -0.872496009f}, {-0.505657375f, -0.862734377f},
{-0.522498548f, -0.852640152f}, {-0.539138317f, -0.842217207f},
{-0.555570245f, -0.831469595f}, {-0.571787953f, -0.820401430f},
{-0.587785244f, -0.809017003f}, {-0.603555918f, -0.797320664f},
{-0.619093955f, -0.785316944f}, {-0.634393275f, -0.773010433f},
{-0.649448037f, -0.760405958f}, {-0.664252460f, -0.747508347f},
{-0.678800762f, -0.734322488f}, {-0.693087339f, -0.720853567f},
{-0.707106769f, -0.707106769f}, {-0.720853567f, -0.693087339f},
{-0.734322488f, -0.678800762f}, {-0.747508347f, -0.664252460f},
{-0.760405958f, -0.649448037f}, {-0.773010433f, -0.634393275f},
{-0.785316944f, -0.619093955f}, {-0.797320664f, -0.603555918f},
{-0.809017003f, -0.587785244f}, {-0.820401430f, -0.571787953f},
{-0.831469595f, -0.555570245f}, {-0.842217207f, -0.539138317f},
{-0.852640152f, -0.522498548f}, {-0.862734377f, -0.505657375f},
{-0.872496009f, -0.488621235f}, {-0.881921291f, -0.471396744f},
{-0.891006529f, -0.453990489f}, {-0.899748266f, -0.436409235f},
{-0.908143163f, -0.418659747f}, {-0.916187942f, -0.400748819f},
{-0.923879504f, -0.382683426f}, {-0.931214929f, -0.364470512f},
{-0.938191354f, -0.346117049f}, {-0.944806039f, -0.327630192f},
{-0.951056540f, -0.309017003f}, {-0.956940353f, -0.290284663f},
{-0.962455213f, -0.271440446f}, {-0.967599094f, -0.252491564f},
{-0.972369909f, -0.233445361f}, {-0.976765871f, -0.214309156f},
{-0.980785251f, -0.195090324f}, {-0.984426558f, -0.175796285f},
{-0.987688363f, -0.156434461f}, {-0.990569353f, -0.137012348f},
{-0.993068457f, -0.117537394f}, {-0.995184720f, -0.0980171412f},
{-0.996917307f, -0.0784590989f}, {-0.998265624f, -0.0588708036f},
{-0.999229014f, -0.0392598175f}, {-0.999807239f, -0.0196336918f},
{-1.00000000f, -1.22464685e-16f}, {-0.999807239f, 0.0196336918f},
{-0.999229014f, 0.0392598175f}, {-0.998265624f, 0.0588708036f},
{-0.996917307f, 0.0784590989f}, {-0.995184720f, 0.0980171412f},
{-0.993068457f, 0.117537394f}, {-0.990569353f, 0.137012348f},
{-0.987688363f, 0.156434461f}, {-0.984426558f, 0.175796285f},
{-0.980785251f, 0.195090324f}, {-0.976765871f, 0.214309156f},
{-0.972369909f, 0.233445361f}, {-0.967599094f, 0.252491564f},
{-0.962455213f, 0.271440446f}, {-0.956940353f, 0.290284663f},
{-0.951056540f, 0.309017003f}, {-0.944806039f, 0.327630192f},
{-0.938191354f, 0.346117049f}, {-0.931214929f, 0.364470512f},
{-0.923879504f, 0.382683426f}, {-0.916187942f, 0.400748819f},
{-0.908143163f, 0.418659747f}, {-0.899748266f, 0.436409235f},
{-0.891006529f, 0.453990489f}, {-0.881921291f, 0.471396744f},
{-0.872496009f, 0.488621235f}, {-0.862734377f, 0.505657375f},
{-0.852640152f, 0.522498548f}, {-0.842217207f, 0.539138317f},
{-0.831469595f, 0.555570245f}, {-0.820401430f, 0.571787953f},
{-0.809017003f, 0.587785244f}, {-0.797320664f, 0.603555918f},
{-0.785316944f, 0.619093955f}, {-0.773010433f, 0.634393275f},
{-0.760405958f, 0.649448037f}, {-0.747508347f, 0.664252460f},
{-0.734322488f, 0.678800762f}, {-0.720853567f, 0.693087339f},
{-0.707106769f, 0.707106769f}, {-0.693087339f, 0.720853567f},
{-0.678800762f, 0.734322488f}, {-0.664252460f, 0.747508347f},
{-0.649448037f, 0.760405958f}, {-0.634393275f, 0.773010433f},
{-0.619093955f, 0.785316944f}, {-0.603555918f, 0.797320664f},
{-0.587785244f, 0.809017003f}, {-0.571787953f, 0.820401430f},
{-0.555570245f, 0.831469595f}, {-0.539138317f, 0.842217207f},
{-0.522498548f, 0.852640152f}, {-0.505657375f, 0.862734377f},
{-0.488621235f, 0.872496009f}, {-0.471396744f, 0.881921291f},
{-0.453990489f, 0.891006529f}, {-0.436409235f, 0.899748266f},
{-0.418659747f, 0.908143163f}, {-0.400748819f, 0.916187942f},
{-0.382683426f, 0.923879504f}, {-0.364470512f, 0.931214929f},
{-0.346117049f, 0.938191354f}, {-0.327630192f, 0.944806039f},
{-0.309017003f, 0.951056540f}, {-0.290284663f, 0.956940353f},
{-0.271440446f, 0.962455213f}, {-0.252491564f, 0.967599094f},
{-0.233445361f, 0.972369909f}, {-0.214309156f, 0.976765871f},
{-0.195090324f, 0.980785251f}, {-0.175796285f, 0.984426558f},
{-0.156434461f, 0.987688363f}, {-0.137012348f, 0.990569353f},
{-0.117537394f, 0.993068457f}, {-0.0980171412f, 0.995184720f},
{-0.0784590989f, 0.996917307f}, {-0.0588708036f, 0.998265624f},
{-0.0392598175f, 0.999229014f}, {-0.0196336918f, 0.999807239f},
{-1.83697015e-16f, 1.00000000f}, {0.0196336918f, 0.999807239f},
{0.0392598175f, 0.999229014f}, {0.0588708036f, 0.998265624f},
{0.0784590989f, 0.996917307f}, {0.0980171412f, 0.995184720f},
{0.117537394f, 0.993068457f}, {0.137012348f, 0.990569353f},
{0.156434461f, 0.987688363f}, {0.175796285f, 0.984426558f},
{0.195090324f, 0.980785251f}, {0.214309156f, 0.976765871f},
{0.233445361f, 0.972369909f}, {0.252491564f, 0.967599094f},
{0.271440446f, 0.962455213f}, {0.290284663f, 0.956940353f},
{0.309017003f, 0.951056540f}, {0.327630192f, 0.944806039f},
{0.346117049f, 0.938191354f}, {0.364470512f, 0.931214929f},
{0.382683426f, 0.923879504f}, {0.400748819f, 0.916187942f},
{0.418659747f, 0.908143163f}, {0.436409235f, 0.899748266f},
{0.453990489f, 0.891006529f}, {0.471396744f, 0.881921291f},
{0.488621235f, 0.872496009f}, {0.505657375f, 0.862734377f},
{0.522498548f, 0.852640152f}, {0.539138317f, 0.842217207f},
{0.555570245f, 0.831469595f}, {0.571787953f, 0.820401430f},
{0.587785244f, 0.809017003f}, {0.603555918f, 0.797320664f},
{0.619093955f, 0.785316944f}, {0.634393275f, 0.773010433f},
{0.649448037f, 0.760405958f}, {0.664252460f, 0.747508347f},
{0.678800762f, 0.734322488f}, {0.693087339f, 0.720853567f},
{0.707106769f, 0.707106769f}, {0.720853567f, 0.693087339f},
{0.734322488f, 0.678800762f}, {0.747508347f, 0.664252460f},
{0.760405958f, 0.649448037f}, {0.773010433f, 0.634393275f},
{0.785316944f, 0.619093955f}, {0.797320664f, 0.603555918f},
{0.809017003f, 0.587785244f}, {0.820401430f, 0.571787953f},
{0.831469595f, 0.555570245f}, {0.842217207f, 0.539138317f},
{0.852640152f, 0.522498548f}, {0.862734377f, 0.505657375f},
{0.872496009f, 0.488621235f}, {0.881921291f, 0.471396744f},
{0.891006529f, 0.453990489f}, {0.899748266f, 0.436409235f},
{0.908143163f, 0.418659747f}, {0.916187942f, 0.400748819f},
{0.923879504f, 0.382683426f}, {0.931214929f, 0.364470512f},
{0.938191354f, 0.346117049f}, {0.944806039f, 0.327630192f},
{0.951056540f, 0.309017003f}, {0.956940353f, 0.290284663f},
{0.962455213f, 0.271440446f}, {0.967599094f, 0.252491564f},
{0.972369909f, 0.233445361f}, {0.976765871f, 0.214309156f},
{0.980785251f, 0.195090324f}, {0.984426558f, 0.175796285f},
{0.987688363f, 0.156434461f}, {0.990569353f, 0.137012348f},
{0.993068457f, 0.117537394f}, {0.995184720f, 0.0980171412f},
{0.996917307f, 0.0784590989f}, {0.998265624f, 0.0588708036f},
{0.999229014f, 0.0392598175f}, {0.999807239f, 0.0196336918f},
};
const kiss_fft_state kfft = {
320, /* nfft */
0.0031250000f, /* scale */
-1, /* shift */
{5, 64, 4, 16, 4, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, }, /* factors */
fft_bitrev, /* bitrev*/
fft_twiddles, /* twiddles*/
(arch_fft_state *)&arch_fft, /* arch_fft*/
};
const float half_window[] = {
3.78491532e-05f, 0.000340620492f, 0.000946046319f, 0.00185389258f, 0.00306380726f,
0.00457531959f, 0.00638783723f, 0.00850064680f, 0.0109129101f, 0.0136236614f,
0.0166318044f, 0.0199361145f, 0.0235352255f, 0.0274276342f, 0.0316116922f,
0.0360856056f, 0.0408474281f, 0.0458950549f, 0.0512262285f, 0.0568385124f,
0.0627293140f, 0.0688958541f, 0.0753351897f, 0.0820441842f, 0.0890194997f,
0.0962576419f, 0.103754878f, 0.111507311f, 0.119510807f, 0.127761051f,
0.136253506f, 0.144983411f, 0.153945804f, 0.163135484f, 0.172547072f,
0.182174906f, 0.192013159f, 0.202055752f, 0.212296382f, 0.222728521f,
0.233345464f, 0.244140238f, 0.255105674f, 0.266234398f, 0.277518868f,
0.288951218f, 0.300523549f, 0.312227666f, 0.324055225f, 0.335997701f,
0.348046392f, 0.360192508f, 0.372427016f, 0.384740859f, 0.397124738f,
0.409569323f, 0.422065198f, 0.434602767f, 0.447172493f, 0.459764689f,
0.472369671f, 0.484977663f, 0.497579008f, 0.510163903f, 0.522722721f,
0.535245717f, 0.547723293f, 0.560145974f, 0.572504222f, 0.584788740f,
0.596990347f, 0.609099925f, 0.621108532f, 0.633007407f, 0.644788086f,
0.656442165f, 0.667961538f, 0.679338276f, 0.690564752f, 0.701633692f,
0.712537885f, 0.723270535f, 0.733825266f, 0.744195819f, 0.754376352f,
0.764361382f, 0.774145722f, 0.783724606f, 0.793093503f, 0.802248418f,
0.811185598f, 0.819901764f, 0.828393936f, 0.836659551f, 0.844696403f,
0.852502763f, 0.860077202f, 0.867418647f, 0.874526560f, 0.881400526f,
0.888040781f, 0.894447744f, 0.900622249f, 0.906565487f, 0.912279010f,
0.917764664f, 0.923024654f, 0.928061485f, 0.932878017f, 0.937477291f,
0.941862822f, 0.946038187f, 0.950007319f, 0.953774393f, 0.957343817f,
0.960720181f, 0.963908315f, 0.966913164f, 0.969739914f, 0.972393870f,
0.974880517f, 0.977205336f, 0.979374051f, 0.981392324f, 0.983266115f,
0.985001266f, 0.986603677f, 0.988079309f, 0.989434063f, 0.990674019f,
0.991804957f, 0.992832899f, 0.993763626f, 0.994602919f, 0.995356441f,
0.996029854f, 0.996628702f, 0.997158289f, 0.997623861f, 0.998030603f,
0.998383403f, 0.998687088f, 0.998946249f, 0.999165416f, 0.999348700f,
0.999500215f, 0.999623775f, 0.999723017f, 0.999801278f, 0.999861658f,
0.999907196f, 0.999940455f, 0.999963880f, 0.999979615f, 0.999989510f,
0.999995291f, 0.999998271f, 0.999999523f, 0.999999940f, 1.00000000f,
};
const float dct_table[] = {
0.707106769f, 0.996194720f, 0.984807730f, 0.965925813f, 0.939692616f,
0.906307817f, 0.866025388f, 0.819152057f, 0.766044438f, 0.707106769f,
0.642787635f, 0.573576450f, 0.500000000f, 0.422618270f, 0.342020154f,
0.258819044f, 0.173648179f, 0.0871557444f, 0.707106769f, 0.965925813f,
0.866025388f, 0.707106769f, 0.500000000f, 0.258819044f, 6.12323426e-17f,
-0.258819044f, -0.500000000f, -0.707106769f, -0.866025388f, -0.965925813f,
-1.00000000f, -0.965925813f, -0.866025388f, -0.707106769f, -0.500000000f,
-0.258819044f, 0.707106769f, 0.906307817f, 0.642787635f, 0.258819044f,
-0.173648179f, -0.573576450f, -0.866025388f, -0.996194720f, -0.939692616f,
-0.707106769f, -0.342020154f, 0.0871557444f, 0.500000000f, 0.819152057f,
0.984807730f, 0.965925813f, 0.766044438f, 0.422618270f, 0.707106769f,
0.819152057f, 0.342020154f, -0.258819044f, -0.766044438f, -0.996194720f,
-0.866025388f, -0.422618270f, 0.173648179f, 0.707106769f, 0.984807730f,
0.906307817f, 0.500000000f, -0.0871557444f, -0.642787635f, -0.965925813f,
-0.939692616f, -0.573576450f, 0.707106769f, 0.707106769f, 6.12323426e-17f,
-0.707106769f, -1.00000000f, -0.707106769f, -1.83697015e-16f, 0.707106769f,
1.00000000f, 0.707106769f, 3.06161700e-16f, -0.707106769f, -1.00000000f,
-0.707106769f, -4.28626385e-16f, 0.707106769f, 1.00000000f, 0.707106769f,
0.707106769f, 0.573576450f, -0.342020154f, -0.965925813f, -0.766044438f,
0.0871557444f, 0.866025388f, 0.906307817f, 0.173648179f, -0.707106769f,
-0.984807730f, -0.422618270f, 0.500000000f, 0.996194720f, 0.642787635f,
-0.258819044f, -0.939692616f, -0.819152057f, 0.707106769f, 0.422618270f,
-0.642787635f, -0.965925813f, -0.173648179f, 0.819152057f, 0.866025388f,
-0.0871557444f, -0.939692616f, -0.707106769f, 0.342020154f, 0.996194720f,
0.500000000f, -0.573576450f, -0.984807730f, -0.258819044f, 0.766044438f,
0.906307817f, 0.707106769f, 0.258819044f, -0.866025388f, -0.707106769f,
0.500000000f, 0.965925813f, 3.06161700e-16f, -0.965925813f, -0.500000000f,
0.707106769f, 0.866025388f, -0.258819044f, -1.00000000f, -0.258819044f,
0.866025388f, 0.707106769f, -0.500000000f, -0.965925813f, 0.707106769f,
0.0871557444f, -0.984807730f, -0.258819044f, 0.939692616f, 0.422618270f,
-0.866025388f, -0.573576450f, 0.766044438f, 0.707106769f, -0.642787635f,
-0.819152057f, 0.500000000f, 0.906307817f, -0.342020154f, -0.965925813f,
0.173648179f, 0.996194720f, 0.707106769f, -0.0871557444f, -0.984807730f,
0.258819044f, 0.939692616f, -0.422618270f, -0.866025388f, 0.573576450f,
0.766044438f, -0.707106769f, -0.642787635f, 0.819152057f, 0.500000000f,
-0.906307817f, -0.342020154f, 0.965925813f, 0.173648179f, -0.996194720f,
0.707106769f, -0.258819044f, -0.866025388f, 0.707106769f, 0.500000000f,
-0.965925813f, -4.28626385e-16f, 0.965925813f, -0.500000000f, -0.707106769f,
0.866025388f, 0.258819044f, -1.00000000f, 0.258819044f, 0.866025388f,
-0.707106769f, -0.500000000f, 0.965925813f, 0.707106769f, -0.422618270f,
-0.642787635f, 0.965925813f, -0.173648179f, -0.819152057f, 0.866025388f,
0.0871557444f, -0.939692616f, 0.707106769f, 0.342020154f, -0.996194720f,
0.500000000f, 0.573576450f, -0.984807730f, 0.258819044f, 0.766044438f,
-0.906307817f, 0.707106769f, -0.573576450f, -0.342020154f, 0.965925813f,
-0.766044438f, -0.0871557444f, 0.866025388f, -0.906307817f, 0.173648179f,
0.707106769f, -0.984807730f, 0.422618270f, 0.500000000f, -0.996194720f,
0.642787635f, 0.258819044f, -0.939692616f, 0.819152057f, 0.707106769f,
-0.707106769f, -1.83697015e-16f, 0.707106769f, -1.00000000f, 0.707106769f,
5.51091070e-16f, -0.707106769f, 1.00000000f, -0.707106769f, -2.69484189e-15f,
0.707106769f, -1.00000000f, 0.707106769f, -4.90477710e-16f, -0.707106769f,
1.00000000f, -0.707106769f, 0.707106769f, -0.819152057f, 0.342020154f,
0.258819044f, -0.766044438f, 0.996194720f, -0.866025388f, 0.422618270f,
0.173648179f, -0.707106769f, 0.984807730f, -0.906307817f, 0.500000000f,
0.0871557444f, -0.642787635f, 0.965925813f, -0.939692616f, 0.573576450f,
0.707106769f, -0.906307817f, 0.642787635f, -0.258819044f, -0.173648179f,
0.573576450f, -0.866025388f, 0.996194720f, -0.939692616f, 0.707106769f,
-0.342020154f, -0.0871557444f, 0.500000000f, -0.819152057f, 0.984807730f,
-0.965925813f, 0.766044438f, -0.422618270f, 0.707106769f, -0.965925813f,
0.866025388f, -0.707106769f, 0.500000000f, -0.258819044f, 1.10280111e-15f,
0.258819044f, -0.500000000f, 0.707106769f, -0.866025388f, 0.965925813f,
-1.00000000f, 0.965925813f, -0.866025388f, 0.707106769f, -0.500000000f,
0.258819044f, 0.707106769f, -0.996194720f, 0.984807730f, -0.965925813f,
0.939692616f, -0.906307817f, 0.866025388f, -0.819152057f, 0.766044438f,
-0.707106769f, 0.642787635f, -0.573576450f, 0.500000000f, -0.422618270f,
0.342020154f, -0.258819044f, 0.173648179f, -0.0871557444f, };

View File

@@ -0,0 +1,64 @@
dnn_sources = sources['DEEP_PLC_SOURCES']
dred_sources = sources['DRED_SOURCES']
if opt_dred.enabled()
dnn_sources += dred_sources
endif
osce_sources = sources['OSCE_SOURCES']
if opt_osce.enabled()
dnn_sources += osce_sources
endif
dnn_sources_sse2 = sources['DNN_SOURCES_SSE2']
dnn_sources_sse4_1 = sources['DNN_SOURCES_SSE4_1']
dnn_sources_avx2 = sources['DNN_SOURCES_AVX2']
dnn_sources_neon_intr = sources['DNN_SOURCES_NEON']
dnn_sources_dotprod_intr = sources['DNN_SOURCES_DOTPROD']
dnn_includes = [opus_includes]
dnn_static_libs = []
if host_cpu_family in ['x86', 'x86_64'] and opus_conf.has('OPUS_HAVE_RTCD')
dnn_sources += sources['DNN_SOURCES_X86_RTCD']
endif
if host_cpu_family in ['arm', 'aarch64'] and have_arm_intrinsics_or_asm
if opus_conf.has('OPUS_HAVE_RTCD')
dnn_sources += sources['DNN_SOURCES_ARM_RTCD']
endif
endif
foreach intr_name : ['sse2', 'sse4_1', 'avx2', 'neon_intr', 'dotprod_intr']
have_intr = get_variable('have_' + intr_name)
if not have_intr
continue
endif
intr_sources = get_variable('dnn_sources_' + intr_name)
intr_args = get_variable('opus_@0@_args'.format(intr_name), [])
dnn_static_libs += static_library('dnn_' + intr_name, intr_sources,
c_args: intr_args,
include_directories: dnn_includes,
install: false)
endforeach
dnn_c_args = []
if host_machine.system() == 'windows'
dnn_c_args += ['-DDLL_EXPORT']
endif
if opt_deep_plc.enabled()
dnn_lib = static_library('opus-dnn',
dnn_sources,
c_args: dnn_c_args,
include_directories: dnn_includes,
link_whole: [dnn_static_libs],
dependencies: libm,
install: false)
else
dnn_lib = []
endif

View File

@@ -0,0 +1,416 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "nndsp.h"
#include "arch.h"
#include "nnet.h"
#include "os_support.h"
#include "pitch.h"
#include <math.h>
#ifndef M_PI
#define M_PI 3.141592653589793f
#endif
#define KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel) ((((i_out_channels) * in_channels) + (i_in_channels)) * kernel_size + (i_kernel))
void init_adaconv_state(AdaConvState *hAdaConv)
{
OPUS_CLEAR(hAdaConv, 1);
}
void init_adacomb_state(AdaCombState *hAdaComb)
{
OPUS_CLEAR(hAdaComb, 1);
}
void init_adashape_state(AdaShapeState *hAdaShape)
{
OPUS_CLEAR(hAdaShape, 1);
}
void compute_overlap_window(float *window, int overlap_size)
{
int i_sample;
for (i_sample=0; i_sample < overlap_size; i_sample++)
{
window[i_sample] = 0.5f + 0.5f * cos(M_PI * (i_sample + 0.5f) / overlap_size);
}
}
#ifdef DEBUG_NNDSP
void print_float_vector(const char* name, const float *vec, int length)
{
for (int i = 0; i < length; i ++)
{
printf("%s[%d]: %f\n", name, i, vec[i]);
}
}
#endif
static void scale_kernel(
float *kernel,
int in_channels,
int out_channels,
int kernel_size,
float *gain
)
/* normalizes (p-norm) kernel over input channel and kernel dimension */
{
float norm;
int i_in_channels, i_out_channels, i_kernel;
for (i_out_channels = 0; i_out_channels < out_channels; i_out_channels++)
{
norm = 0;
for (i_in_channels = 0; i_in_channels < in_channels; i_in_channels ++)
{
for (i_kernel = 0; i_kernel < kernel_size; i_kernel++)
{
norm += kernel[KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel)] * kernel[KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel)];
}
}
#ifdef DEBUG_NNDSP
printf("kernel norm: %f, %f\n", norm, sqrt(norm));
#endif
norm = 1.f / (1e-6f + sqrt(norm));
for (i_in_channels = 0; i_in_channels < in_channels; i_in_channels++)
{
for (i_kernel = 0; i_kernel < kernel_size; i_kernel++)
{
kernel[KERNEL_INDEX(i_out_channels, i_in_channels, i_kernel)] *= norm * gain[i_out_channels];
}
}
}
}
static void transform_gains(
float *gains,
int num_gains,
float filter_gain_a,
float filter_gain_b
)
{
int i;
for (i = 0; i < num_gains; i++)
{
gains[i] = exp(filter_gain_a * gains[i] + filter_gain_b);
}
}
void adaconv_process_frame(
AdaConvState* hAdaConv,
float *x_out,
const float *x_in,
const float *features,
const LinearLayer *kernel_layer,
const LinearLayer *gain_layer,
int feature_dim,
int frame_size,
int overlap_size,
int in_channels,
int out_channels,
int kernel_size,
int left_padding,
float filter_gain_a,
float filter_gain_b,
float shape_gain,
float *window,
int arch
)
{
float output_buffer[ADACONV_MAX_FRAME_SIZE * ADACONV_MAX_OUTPUT_CHANNELS];
float kernel_buffer[ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS * ADACONV_MAX_OUTPUT_CHANNELS];
float input_buffer[ADACONV_MAX_INPUT_CHANNELS * (ADACONV_MAX_FRAME_SIZE + ADACONV_MAX_KERNEL_SIZE)];
float kernel0[ADACONV_MAX_KERNEL_SIZE];
float kernel1[ADACONV_MAX_KERNEL_SIZE];
float channel_buffer0[ADACONV_MAX_OVERLAP_SIZE];
float channel_buffer1[ADACONV_MAX_FRAME_SIZE];
float gain_buffer[ADACONV_MAX_OUTPUT_CHANNELS];
float *p_input;
int i_in_channels, i_out_channels, i_sample;
(void) feature_dim; /* ToDo: figure out whether we might need this information */
celt_assert(shape_gain == 1);
celt_assert(left_padding == kernel_size - 1); /* currently only supports causal version. Non-causal version not difficult to implement but will require third loop */
celt_assert(kernel_size < frame_size);
OPUS_CLEAR(output_buffer, ADACONV_MAX_FRAME_SIZE * ADACONV_MAX_OUTPUT_CHANNELS);
OPUS_CLEAR(kernel_buffer, ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS * ADACONV_MAX_OUTPUT_CHANNELS);
OPUS_CLEAR(input_buffer, ADACONV_MAX_INPUT_CHANNELS * (ADACONV_MAX_FRAME_SIZE + ADACONV_MAX_KERNEL_SIZE));
#ifdef DEBUG_NNDSP
print_float_vector("x_in", x_in, in_channels * frame_size);
#endif
/* prepare input */
for (i_in_channels=0; i_in_channels < in_channels; i_in_channels ++)
{
OPUS_COPY(input_buffer + i_in_channels * (kernel_size + frame_size), hAdaConv->history + i_in_channels * kernel_size, kernel_size);
OPUS_COPY(input_buffer + kernel_size + i_in_channels * (kernel_size + frame_size), x_in + frame_size * i_in_channels, frame_size);
}
p_input = input_buffer + kernel_size;
/* calculate new kernel and new gain */
compute_generic_dense(kernel_layer, kernel_buffer, features, ACTIVATION_LINEAR, arch);
compute_generic_dense(gain_layer, gain_buffer, features, ACTIVATION_TANH, arch);
#ifdef DEBUG_NNDSP
print_float_vector("features", features, feature_dim);
print_float_vector("adaconv_kernel_raw", kernel_buffer, in_channels * out_channels * kernel_size);
print_float_vector("adaconv_gain_raw", gain_buffer, out_channels);
#endif
transform_gains(gain_buffer, out_channels, filter_gain_a, filter_gain_b);
scale_kernel(kernel_buffer, in_channels, out_channels, kernel_size, gain_buffer);
#ifdef DEBUG_NNDSP
print_float_vector("adaconv_kernel", kernel_buffer, in_channels * out_channels * kernel_size);
print_float_vector("adaconv_gain", gain_buffer, out_channels);
#endif
/* calculate overlapping part using kernel from last frame */
for (i_out_channels = 0; i_out_channels < out_channels; i_out_channels++)
{
for (i_in_channels = 0; i_in_channels < in_channels; i_in_channels++)
{
OPUS_CLEAR(kernel0, ADACONV_MAX_KERNEL_SIZE);
OPUS_CLEAR(kernel1, ADACONV_MAX_KERNEL_SIZE);
OPUS_COPY(kernel0, hAdaConv->last_kernel + KERNEL_INDEX(i_out_channels, i_in_channels, 0), kernel_size);
OPUS_COPY(kernel1, kernel_buffer + KERNEL_INDEX(i_out_channels, i_in_channels, 0), kernel_size);
celt_pitch_xcorr(kernel0, p_input + i_in_channels * (frame_size + kernel_size) - left_padding, channel_buffer0, ADACONV_MAX_KERNEL_SIZE, overlap_size, arch);
celt_pitch_xcorr(kernel1, p_input + i_in_channels * (frame_size + kernel_size) - left_padding, channel_buffer1, ADACONV_MAX_KERNEL_SIZE, frame_size, arch);
for (i_sample = 0; i_sample < overlap_size; i_sample++)
{
output_buffer[i_sample + i_out_channels * frame_size] += window[i_sample] * channel_buffer0[i_sample];
output_buffer[i_sample + i_out_channels * frame_size] += (1.f - window[i_sample]) * channel_buffer1[i_sample];
}
for (i_sample = overlap_size; i_sample < frame_size; i_sample++)
{
output_buffer[i_sample + i_out_channels * frame_size] += channel_buffer1[i_sample];
}
}
}
OPUS_COPY(x_out, output_buffer, out_channels * frame_size);
#ifdef DEBUG_NNDSP
print_float_vector("x_out", x_out, out_channels * frame_size);
#endif
/* buffer update */
for (i_in_channels=0; i_in_channels < in_channels; i_in_channels ++)
{
OPUS_COPY(hAdaConv->history + i_in_channels * kernel_size, p_input + i_in_channels * (frame_size + kernel_size) + frame_size - kernel_size, kernel_size);
}
OPUS_COPY(hAdaConv->last_kernel, kernel_buffer, kernel_size * in_channels * out_channels);
}
void adacomb_process_frame(
AdaCombState* hAdaComb,
float *x_out,
const float *x_in,
const float *features,
const LinearLayer *kernel_layer,
const LinearLayer *gain_layer,
const LinearLayer *global_gain_layer,
int pitch_lag,
int feature_dim,
int frame_size,
int overlap_size,
int kernel_size,
int left_padding,
float filter_gain_a,
float filter_gain_b,
float log_gain_limit,
float *window,
int arch
)
{
float output_buffer[ADACOMB_MAX_FRAME_SIZE];
float output_buffer_last[ADACOMB_MAX_FRAME_SIZE];
float kernel_buffer[ADACOMB_MAX_KERNEL_SIZE];
float input_buffer[ADACOMB_MAX_FRAME_SIZE + ADACOMB_MAX_LAG + ADACOMB_MAX_KERNEL_SIZE];
float gain, global_gain;
float *p_input;
int i_sample;
float kernel[16];
float last_kernel[16];
(void) feature_dim; /* ToDo: figure out whether we might need this information */
OPUS_CLEAR(output_buffer, ADACOMB_MAX_FRAME_SIZE);
OPUS_CLEAR(kernel_buffer, ADACOMB_MAX_KERNEL_SIZE);
OPUS_CLEAR(input_buffer, ADACOMB_MAX_FRAME_SIZE + ADACOMB_MAX_LAG + ADACOMB_MAX_KERNEL_SIZE);
OPUS_COPY(input_buffer, hAdaComb->history, kernel_size + ADACOMB_MAX_LAG);
OPUS_COPY(input_buffer + kernel_size + ADACOMB_MAX_LAG, x_in, frame_size);
p_input = input_buffer + kernel_size + ADACOMB_MAX_LAG;
/* calculate new kernel and new gain */
compute_generic_dense(kernel_layer, kernel_buffer, features, ACTIVATION_LINEAR, arch);
compute_generic_dense(gain_layer, &gain, features, ACTIVATION_RELU, arch);
compute_generic_dense(global_gain_layer, &global_gain, features, ACTIVATION_TANH, arch);
#ifdef DEBUG_NNDSP
print_float_vector("features", features, feature_dim);
print_float_vector("adacomb_kernel_raw", kernel_buffer, kernel_size);
print_float_vector("adacomb_gain_raw", &gain, 1);
print_float_vector("adacomb_global_gain_raw", &global_gain, 1);
#endif
gain = exp(log_gain_limit - gain);
global_gain = exp(filter_gain_a * global_gain + filter_gain_b);
scale_kernel(kernel_buffer, 1, 1, kernel_size, &gain);
#ifdef DEBUG_NNDSP
print_float_vector("adacomb_kernel", kernel_buffer, kernel_size);
print_float_vector("adacomb_gain", &gain, 1);
#endif
OPUS_CLEAR(kernel, ADACOMB_MAX_KERNEL_SIZE);
OPUS_CLEAR(last_kernel, ADACOMB_MAX_KERNEL_SIZE);
OPUS_COPY(kernel, kernel_buffer, kernel_size);
OPUS_COPY(last_kernel, hAdaComb->last_kernel, kernel_size);
celt_pitch_xcorr(last_kernel, &p_input[- left_padding - hAdaComb->last_pitch_lag], output_buffer_last, ADACOMB_MAX_KERNEL_SIZE, overlap_size, arch);
celt_pitch_xcorr(kernel, &p_input[- left_padding - pitch_lag], output_buffer, ADACOMB_MAX_KERNEL_SIZE, frame_size, arch);
for (i_sample = 0; i_sample < overlap_size; i_sample++)
{
output_buffer[i_sample] = hAdaComb->last_global_gain * window[i_sample] * output_buffer_last[i_sample] + global_gain * (1.f - window[i_sample]) * output_buffer[i_sample];
}
for (i_sample = 0; i_sample < overlap_size; i_sample++)
{
output_buffer[i_sample] += (window[i_sample] * hAdaComb->last_global_gain + (1.f - window[i_sample]) * global_gain) * p_input[i_sample];
}
for (i_sample = overlap_size; i_sample < frame_size; i_sample++)
{
output_buffer[i_sample] = global_gain * (output_buffer[i_sample] + p_input[i_sample]);
}
OPUS_COPY(x_out, output_buffer, frame_size);
#ifdef DEBUG_NNDSP
print_float_vector("x_out", x_out, frame_size);
#endif
/* buffer update */
OPUS_COPY(hAdaComb->last_kernel, kernel_buffer, kernel_size);
OPUS_COPY(hAdaComb->history, p_input + frame_size - kernel_size - ADACOMB_MAX_LAG, kernel_size + ADACOMB_MAX_LAG);
hAdaComb->last_pitch_lag = pitch_lag;
hAdaComb->last_global_gain = global_gain;
}
void adashape_process_frame(
AdaShapeState *hAdaShape,
float *x_out,
const float *x_in,
const float *features,
const LinearLayer *alpha1f,
const LinearLayer *alpha1t,
const LinearLayer *alpha2,
int feature_dim,
int frame_size,
int avg_pool_k,
int arch
)
{
float in_buffer[ADASHAPE_MAX_INPUT_DIM + ADASHAPE_MAX_FRAME_SIZE];
float out_buffer[ADASHAPE_MAX_FRAME_SIZE];
float tmp_buffer[ADASHAPE_MAX_FRAME_SIZE];
int i, k;
int tenv_size;
float mean;
float *tenv;
celt_assert(frame_size % avg_pool_k == 0);
celt_assert(feature_dim + frame_size / avg_pool_k + 1 < ADASHAPE_MAX_INPUT_DIM);
tenv_size = frame_size / avg_pool_k;
tenv = in_buffer + feature_dim;
OPUS_CLEAR(tenv, tenv_size + 1);
OPUS_COPY(in_buffer, features, feature_dim);
/* calculate temporal envelope */
mean = 0;
for (i = 0; i < tenv_size; i++)
{
for (k = 0; k < avg_pool_k; k++)
{
tenv[i] += fabs(x_in[i * avg_pool_k + k]);
}
tenv[i] = log(tenv[i] / avg_pool_k + 1.52587890625e-05f);
mean += tenv[i];
}
mean /= tenv_size;
for (i = 0; i < tenv_size; i++)
{
tenv[i] -= mean;
}
tenv[tenv_size] = mean;
#ifdef DEBUG_NNDSP
print_float_vector("tenv", tenv, tenv_size + 1);
#endif
/* calculate temporal weights */
#ifdef DEBUG_NNDSP
print_float_vector("alpha1_in", in_buffer, feature_dim + tenv_size + 1);
#endif
compute_generic_conv1d(alpha1f, out_buffer, hAdaShape->conv_alpha1f_state, in_buffer, feature_dim, ACTIVATION_LINEAR, arch);
compute_generic_conv1d(alpha1t, tmp_buffer, hAdaShape->conv_alpha1t_state, tenv, tenv_size + 1, ACTIVATION_LINEAR, arch);
#ifdef DEBUG_NNDSP
print_float_vector("alpha1_out", out_buffer, frame_size);
#endif
/* compute leaky ReLU by hand. ToDo: try tanh activation */
for (i = 0; i < frame_size; i ++)
{
float tmp = out_buffer[i] + tmp_buffer[i];
in_buffer[i] = tmp >= 0 ? tmp : 0.2 * tmp;
}
#ifdef DEBUG_NNDSP
print_float_vector("post_alpha1", in_buffer, frame_size);
#endif
compute_generic_conv1d(alpha2, out_buffer, hAdaShape->conv_alpha2_state, in_buffer, frame_size, ACTIVATION_LINEAR, arch);
/* shape signal */
for (i = 0; i < frame_size; i ++)
{
x_out[i] = exp(out_buffer[i]) * x_in[i];
}
}


@@ -0,0 +1,143 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef NNDSP_H
#define NNDSP_H
#include "opus_types.h"
#include "nnet.h"
#include <string.h>
#define ADACONV_MAX_KERNEL_SIZE 16
#define ADACONV_MAX_INPUT_CHANNELS 2
#define ADACONV_MAX_OUTPUT_CHANNELS 2
#define ADACONV_MAX_FRAME_SIZE 80
#define ADACONV_MAX_OVERLAP_SIZE 40
#define ADACOMB_MAX_LAG 300
#define ADACOMB_MAX_KERNEL_SIZE 16
#define ADACOMB_MAX_FRAME_SIZE 80
#define ADACOMB_MAX_OVERLAP_SIZE 40
#define ADASHAPE_MAX_INPUT_DIM 512
#define ADASHAPE_MAX_FRAME_SIZE 160
/*#define DEBUG_NNDSP*/
#ifdef DEBUG_NNDSP
#include <stdio.h>
#endif
void print_float_vector(const char* name, const float *vec, int length);
typedef struct {
float history[ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS];
float last_kernel[ADACONV_MAX_KERNEL_SIZE * ADACONV_MAX_INPUT_CHANNELS * ADACONV_MAX_OUTPUT_CHANNELS];
float last_gain;
} AdaConvState;
typedef struct {
float history[ADACOMB_MAX_KERNEL_SIZE + ADACOMB_MAX_LAG];
float last_kernel[ADACOMB_MAX_KERNEL_SIZE];
float last_global_gain;
int last_pitch_lag;
} AdaCombState;
typedef struct {
float conv_alpha1f_state[ADASHAPE_MAX_INPUT_DIM];
float conv_alpha1t_state[ADASHAPE_MAX_INPUT_DIM];
float conv_alpha2_state[ADASHAPE_MAX_FRAME_SIZE];
} AdaShapeState;
void init_adaconv_state(AdaConvState *hAdaConv);
void init_adacomb_state(AdaCombState *hAdaComb);
void init_adashape_state(AdaShapeState *hAdaShape);
void compute_overlap_window(float *window, int overlap_size);
void adaconv_process_frame(
AdaConvState* hAdaConv,
float *x_out,
const float *x_in,
const float *features,
const LinearLayer *kernel_layer,
const LinearLayer *gain_layer,
int feature_dim, /* not strictly necessary */
int frame_size,
int overlap_size,
int in_channels,
int out_channels,
int kernel_size,
int left_padding,
float filter_gain_a,
float filter_gain_b,
float shape_gain,
float *window,
int arch
);
void adacomb_process_frame(
AdaCombState* hAdaComb,
float *x_out,
const float *x_in,
const float *features,
const LinearLayer *kernel_layer,
const LinearLayer *gain_layer,
const LinearLayer *global_gain_layer,
int pitch_lag,
int feature_dim,
int frame_size,
int overlap_size,
int kernel_size,
int left_padding,
float filter_gain_a,
float filter_gain_b,
float log_gain_limit,
float *window,
int arch
);
void adashape_process_frame(
AdaShapeState *hAdaShape,
float *x_out,
const float *x_in,
const float *features,
const LinearLayer *alpha1f,
const LinearLayer *alpha1t,
const LinearLayer *alpha2,
int feature_dim,
int frame_size,
int avg_pool_k,
int arch
);
#endif


@@ -0,0 +1,149 @@
/* Copyright (c) 2018 Mozilla
2008-2011 Octasic Inc.
2012-2017 Jean-Marc Valin */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <stdlib.h>
#include <math.h>
#include "opus_types.h"
#include "arch.h"
#include "nnet.h"
#include "dred_rdovae_constants.h"
#include "plc_data.h"
#include "fargan.h"
#include "os_support.h"
#include "vec.h"
#ifdef ENABLE_OSCE
#include "osce.h"
#endif
#ifdef NO_OPTIMIZATIONS
#if defined(_MSC_VER)
#pragma message ("Compiling without any vectorization. This code will be very slow")
#else
#warning Compiling without any vectorization. This code will be very slow
#endif
#endif
#define SOFTMAX_HACK
void compute_generic_dense(const LinearLayer *layer, float *output, const float *input, int activation, int arch)
{
compute_linear(layer, output, input, arch);
compute_activation(output, output, layer->nb_outputs, activation, arch);
}
#ifdef ENABLE_OSCE
#define MAX_RNN_NEURONS_ALL IMAX(IMAX(IMAX(FARGAN_MAX_RNN_NEURONS, PLC_MAX_RNN_UNITS), DRED_MAX_RNN_NEURONS), OSCE_MAX_RNN_NEURONS)
#else
#define MAX_RNN_NEURONS_ALL IMAX(IMAX(FARGAN_MAX_RNN_NEURONS, PLC_MAX_RNN_UNITS), DRED_MAX_RNN_NEURONS)
#endif
void compute_generic_gru(const LinearLayer *input_weights, const LinearLayer *recurrent_weights, float *state, const float *in, int arch)
{
int i;
int N;
float zrh[3*MAX_RNN_NEURONS_ALL];
float recur[3*MAX_RNN_NEURONS_ALL];
float *z;
float *r;
float *h;
celt_assert(3*recurrent_weights->nb_inputs == recurrent_weights->nb_outputs);
celt_assert(input_weights->nb_outputs == recurrent_weights->nb_outputs);
N = recurrent_weights->nb_inputs;
z = zrh;
r = &zrh[N];
h = &zrh[2*N];
celt_assert(recurrent_weights->nb_outputs <= 3*MAX_RNN_NEURONS_ALL);
celt_assert(in != state);
compute_linear(input_weights, zrh, in, arch);
compute_linear(recurrent_weights, recur, state, arch);
for (i=0;i<2*N;i++)
zrh[i] += recur[i];
compute_activation(zrh, zrh, 2*N, ACTIVATION_SIGMOID, arch);
for (i=0;i<N;i++)
h[i] += recur[2*N+i]*r[i];
compute_activation(h, h, N, ACTIVATION_TANH, arch);
for (i=0;i<N;i++)
h[i] = z[i]*state[i] + (1-z[i])*h[i];
for (i=0;i<N;i++)
state[i] = h[i];
}
void compute_glu(const LinearLayer *layer, float *output, const float *input, int arch)
{
int i;
float act2[MAX_INPUTS];
celt_assert(layer->nb_inputs == layer->nb_outputs);
compute_linear(layer, act2, input, arch);
compute_activation(act2, act2, layer->nb_outputs, ACTIVATION_SIGMOID, arch);
if (input == output) {
/* Give a vectorization hint to the compiler for the in-place case. */
for (i=0;i<layer->nb_outputs;i++) output[i] = output[i]*act2[i];
} else {
for (i=0;i<layer->nb_outputs;i++) output[i] = input[i]*act2[i];
}
}
#define MAX_CONV_INPUTS_ALL DRED_MAX_CONV_INPUTS
void compute_generic_conv1d(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int activation, int arch)
{
float tmp[MAX_CONV_INPUTS_ALL];
celt_assert(input != output);
celt_assert(layer->nb_inputs <= MAX_CONV_INPUTS_ALL);
if (layer->nb_inputs!=input_size) OPUS_COPY(tmp, mem, layer->nb_inputs-input_size);
OPUS_COPY(&tmp[layer->nb_inputs-input_size], input, input_size);
compute_linear(layer, output, tmp, arch);
compute_activation(output, output, layer->nb_outputs, activation, arch);
if (layer->nb_inputs!=input_size) OPUS_COPY(mem, &tmp[input_size], layer->nb_inputs-input_size);
}
void compute_generic_conv1d_dilation(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int dilation, int activation, int arch)
{
float tmp[MAX_CONV_INPUTS_ALL];
int ksize = layer->nb_inputs/input_size;
int i;
celt_assert(input != output);
celt_assert(layer->nb_inputs <= MAX_CONV_INPUTS_ALL);
if (dilation==1) OPUS_COPY(tmp, mem, layer->nb_inputs-input_size);
else for (i=0;i<ksize-1;i++) OPUS_COPY(&tmp[i*input_size], &mem[i*input_size*dilation], input_size);
OPUS_COPY(&tmp[layer->nb_inputs-input_size], input, input_size);
compute_linear(layer, output, tmp, arch);
compute_activation(output, output, layer->nb_outputs, activation, arch);
if (dilation==1) OPUS_COPY(mem, &tmp[input_size], layer->nb_inputs-input_size);
else {
OPUS_COPY(mem, &mem[input_size], input_size*dilation*(ksize-1)-input_size);
OPUS_COPY(&mem[input_size*dilation*(ksize-1)-input_size], input, input_size);
}
}


@@ -0,0 +1,163 @@
/* Copyright (c) 2018 Mozilla
Copyright (c) 2017 Jean-Marc Valin */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef NNET_H_
#define NNET_H_
#include <stddef.h>
#include "opus_types.h"
#define ACTIVATION_LINEAR 0
#define ACTIVATION_SIGMOID 1
#define ACTIVATION_TANH 2
#define ACTIVATION_RELU 3
#define ACTIVATION_SOFTMAX 4
#define ACTIVATION_SWISH 5
#define WEIGHT_BLOB_VERSION 0
#define WEIGHT_BLOCK_SIZE 64
typedef struct {
const char *name;
int type;
int size;
const void *data;
} WeightArray;
#define WEIGHT_TYPE_float 0
#define WEIGHT_TYPE_int 1
#define WEIGHT_TYPE_qweight 2
#define WEIGHT_TYPE_int8 3
typedef struct {
char head[4];
int version;
int type;
int size;
int block_size;
char name[44];
} WeightHead;
/* Generic sparse affine transformation. */
typedef struct {
const float *bias;
const float *subias;
const opus_int8 *weights;
const float *float_weights;
const int *weights_idx;
const float *diag;
const float *scale;
int nb_inputs;
int nb_outputs;
} LinearLayer;
/* Generic sparse affine transformation. */
typedef struct {
const float *bias;
const float *float_weights;
int in_channels;
int out_channels;
int ktime;
int kheight;
} Conv2dLayer;
void compute_generic_dense(const LinearLayer *layer, float *output, const float *input, int activation, int arch);
void compute_generic_gru(const LinearLayer *input_weights, const LinearLayer *recurrent_weights, float *state, const float *in, int arch);
void compute_generic_conv1d(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int activation, int arch);
void compute_generic_conv1d_dilation(const LinearLayer *layer, float *output, float *mem, const float *input, int input_size, int dilation, int activation, int arch);
void compute_glu(const LinearLayer *layer, float *output, const float *input, int arch);
void compute_gated_activation(const LinearLayer *layer, float *output, const float *input, int activation, int arch);
int parse_weights(WeightArray **list, const void *data, int len);
extern const WeightArray lpcnet_arrays[];
extern const WeightArray plcmodel_arrays[];
extern const WeightArray rdovaeenc_arrays[];
extern const WeightArray rdovaedec_arrays[];
extern const WeightArray fwgan_arrays[];
extern const WeightArray fargan_arrays[];
extern const WeightArray pitchdnn_arrays[];
extern const WeightArray lossgen_arrays[];
int linear_init(LinearLayer *layer, const WeightArray *arrays,
const char *bias,
const char *subias,
const char *weights,
const char *float_weights,
const char *weights_idx,
const char *diag,
const char *scale,
int nb_inputs,
int nb_outputs);
int conv2d_init(Conv2dLayer *layer, const WeightArray *arrays,
const char *bias,
const char *float_weights,
int in_channels,
int out_channels,
int ktime,
int kheight);
void compute_linear_c(const LinearLayer *linear, float *out, const float *in);
void compute_activation_c(float *output, const float *input, int N, int activation);
void compute_conv2d_c(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation);
#if defined(OPUS_ARM_MAY_HAVE_DOTPROD) || defined(OPUS_ARM_MAY_HAVE_NEON_INTR)
#include "arm/dnn_arm.h"
#endif
#if defined(OPUS_X86_MAY_HAVE_SSE2)
#include "x86/dnn_x86.h"
#endif
#ifndef OVERRIDE_COMPUTE_LINEAR
#define compute_linear(linear, out, in, arch) ((void)(arch),compute_linear_c(linear, out, in))
#endif
#ifndef OVERRIDE_COMPUTE_ACTIVATION
#define compute_activation(output, input, N, activation, arch) ((void)(arch),compute_activation_c(output, input, N, activation))
#endif
#ifndef OVERRIDE_COMPUTE_CONV2D
#define compute_conv2d(conv, out, mem, in, height, hstride, activation, arch) ((void)(arch),compute_conv2d_c(conv, out, mem, in, height, hstride, activation))
#endif
#if defined(__x86_64__) && !defined(OPUS_X86_MAY_HAVE_SSE4_1) && !defined(OPUS_X86_MAY_HAVE_AVX2)
#if defined(_MSC_VER)
#pragma message ("Only SSE and SSE2 are available. On newer machines, enable SSSE3/AVX/AVX2 to get better performance")
#else
#warning "Only SSE and SSE2 are available. On newer machines, enable SSSE3/AVX/AVX2 using -march= to get better performance"
#endif
#endif
#endif /* NNET_H_ */


@@ -0,0 +1,247 @@
/* Copyright (c) 2018-2019 Mozilla
2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef NNET_ARCH_H
#define NNET_ARCH_H
#include "nnet.h"
#include "arch.h"
#include "os_support.h"
#include "vec.h"
#define CAT_SUFFIX2(a,b) a ## b
#define CAT_SUFFIX(a,b) CAT_SUFFIX2(a, b)
#define RTCD_SUF(name) CAT_SUFFIX(name, RTCD_ARCH)
/* Force vectorization on for DNN code because some of the loops rely on
compiler vectorization rather than explicitly using intrinsics. */
#if OPUS_GNUC_PREREQ(5,1)
#define GCC_POP_OPTIONS
#pragma GCC push_options
#pragma GCC optimize("tree-vectorize")
#endif
#define MAX_ACTIVATIONS (4096)
static OPUS_INLINE void vec_swish(float *y, const float *x, int N)
{
int i;
float tmp[MAX_ACTIVATIONS];
celt_assert(N <= MAX_ACTIVATIONS);
vec_sigmoid(tmp, x, N);
for (i=0;i<N;i++)
y[i] = x[i]*tmp[i];
}
static OPUS_INLINE float relu(float x)
{
return x < 0 ? 0 : x;
}
/*#define HIGH_ACCURACY */
void RTCD_SUF(compute_activation_)(float *output, const float *input, int N, int activation)
{
int i;
if (activation == ACTIVATION_SIGMOID) {
#ifdef HIGH_ACCURACY
for (int n=0; n<N; n++)
{
output[n] = 1.f / (1 + exp(-input[n]));
}
#else
vec_sigmoid(output, input, N);
#endif
} else if (activation == ACTIVATION_TANH) {
#ifdef HIGH_ACCURACY
for (int n=0; n<N; n++)
{
output[n] = tanh(input[n]);
}
#else
vec_tanh(output, input, N);
#endif
} else if (activation == ACTIVATION_SWISH) {
vec_swish(output, input, N);
} else if (activation == ACTIVATION_RELU) {
for (i=0;i<N;i++)
output[i] = relu(input[i]);
} else if (activation == ACTIVATION_SOFTMAX) {
#ifdef SOFTMAX_HACK
OPUS_COPY(output, input, N);
/*for (i=0;i<N;i++)
output[i] = input[i];*/
#else
float sum = 0;
softmax(output, input, N);
for (i=0;i<N;i++) {
sum += output[i];
}
sum = 1.f/(sum+1e-30);
for (i=0;i<N;i++)
output[i] = sum*output[i];
#endif
} else {
celt_assert(activation == ACTIVATION_LINEAR);
if (input != output) {
for (i=0;i<N;i++)
output[i] = input[i];
}
}
}
void RTCD_SUF(compute_linear_) (const LinearLayer *linear, float *out, const float *in)
{
int i, M, N;
const float *bias;
celt_assert(in != out);
bias = linear->bias;
M = linear->nb_inputs;
N = linear->nb_outputs;
if (linear->float_weights != NULL) {
if (linear->weights_idx != NULL) sparse_sgemv8x4(out, linear->float_weights, linear->weights_idx, N, in);
else sgemv(out, linear->float_weights, N, M, N, in);
} else if (linear->weights != NULL) {
if (linear->weights_idx != NULL) sparse_cgemv8x4(out, linear->weights, linear->weights_idx, linear->scale, N, M, in);
else cgemv8x4(out, linear->weights, linear->scale, N, M, in);
/* Only use SU biases for integer matrices on SU archs. */
#ifdef USE_SU_BIAS
bias = linear->subias;
#endif
}
else OPUS_CLEAR(out, N);
if (bias != NULL) {
for (i=0;i<N;i++) out[i] += bias[i];
}
if (linear->diag) {
/* Diag is only used for GRU recurrent weights. */
celt_assert(3*M == N);
for (i=0;i<M;i++) {
out[i] += linear->diag[i]*in[i];
out[i+M] += linear->diag[i+M]*in[i];
out[i+2*M] += linear->diag[i+2*M]*in[i];
}
}
}
/* Computes non-padded convolution for input [ ksize1 x in_channels x (len2+ksize2) ],
kernel [ out_channels x in_channels x ksize1 x ksize2 ],
storing the output as [ out_channels x len2 ].
We assume that the output dimension along the ksize1 axis is 1,
i.e. processing one frame at a time. */
static void conv2d_float(float *out, const float *weights, int in_channels, int out_channels, int ktime, int kheight, const float *in, int height, int hstride)
{
int i;
int in_stride;
in_stride = height+kheight-1;
for (i=0;i<out_channels;i++) {
int m;
OPUS_CLEAR(&out[i*hstride], height);
for (m=0;m<in_channels;m++) {
int t;
for (t=0;t<ktime;t++) {
int h;
for (h=0;h<kheight;h++) {
int j;
for (j=0;j<height;j++) {
out[i*hstride + j] += weights[i*in_channels*ktime*kheight + m*ktime*kheight + t*kheight + h] *
in[t*in_channels*in_stride + m*in_stride + j + h];
}
}
}
}
}
}
/* There are no intrinsics in this function (or the one above) because the gcc (and hopefully other compilers') auto-vectorizer is smart enough to
   produce the right code by itself based on the compile flags. */
static void conv2d_3x3_float(float *out, const float *weights, int in_channels, int out_channels, const float *in, int height, int hstride)
{
int i;
int in_stride;
int kheight, ktime;
kheight = ktime = 3;
in_stride = height+kheight-1;
for (i=0;i<out_channels;i++) {
int m;
OPUS_CLEAR(&out[i*hstride], height);
for (m=0;m<in_channels;m++) {
int j;
for (j=0;j<height;j++) {
/* Unrolled version of previous function -- compiler will figure out the indexing simplifications. */
out[i*hstride + j] += weights[i*in_channels*ktime*kheight + m*ktime*kheight + 0*kheight + 0]*in[0*in_channels*in_stride + m*in_stride + j + 0]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 0*kheight + 1]*in[0*in_channels*in_stride + m*in_stride + j + 1]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 0*kheight + 2]*in[0*in_channels*in_stride + m*in_stride + j + 2]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 1*kheight + 0]*in[1*in_channels*in_stride + m*in_stride + j + 0]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 1*kheight + 1]*in[1*in_channels*in_stride + m*in_stride + j + 1]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 1*kheight + 2]*in[1*in_channels*in_stride + m*in_stride + j + 2]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 2*kheight + 0]*in[2*in_channels*in_stride + m*in_stride + j + 0]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 2*kheight + 1]*in[2*in_channels*in_stride + m*in_stride + j + 1]
+ weights[i*in_channels*ktime*kheight + m*ktime*kheight + 2*kheight + 2]*in[2*in_channels*in_stride + m*in_stride + j + 2];
}
}
}
}
#define MAX_CONV2D_INPUTS 8192
void RTCD_SUF(compute_conv2d_)(const Conv2dLayer *conv, float *out, float *mem, const float *in, int height, int hstride, int activation)
{
int i;
const float *bias;
float in_buf[MAX_CONV2D_INPUTS];
int time_stride;
celt_assert(in != out);
time_stride = conv->in_channels*(height+conv->kheight-1);
celt_assert(conv->ktime*time_stride <= MAX_CONV2D_INPUTS);
OPUS_COPY(in_buf, mem, (conv->ktime-1)*time_stride);
OPUS_COPY(&in_buf[(conv->ktime-1)*time_stride], in, time_stride);
OPUS_COPY(mem, &in_buf[time_stride], (conv->ktime-1)*time_stride);
bias = conv->bias;
if (conv->kheight == 3 && conv->ktime == 3)
conv2d_3x3_float(out, conv->float_weights, conv->in_channels, conv->out_channels, in_buf, height, hstride);
else
conv2d_float(out, conv->float_weights, conv->in_channels, conv->out_channels, conv->ktime, conv->kheight, in_buf, height, hstride);
if (bias != NULL) {
for (i=0;i<conv->out_channels;i++) {
int j;
for (j=0;j<height;j++) out[i*hstride+j] += bias[i];
}
}
for (i=0;i<conv->out_channels;i++) {
RTCD_SUF(compute_activation_)(&out[i*hstride], &out[i*hstride], height, activation);
}
}
#ifdef GCC_POP_OPTIONS
#pragma GCC pop_options
#endif
#endif


@@ -0,0 +1,35 @@
/* Copyright (c) 2018-2019 Mozilla
2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#define RTCD_ARCH c
#include "nnet_arch.h"

File diff suppressed because it is too large


@@ -0,0 +1,84 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef OSCE_H
#define OSCE_H
#include "opus_types.h"
/*#include "osce_config.h"*/
#ifndef DISABLE_LACE
#include "lace_data.h"
#endif
#ifndef DISABLE_NOLACE
#include "nolace_data.h"
#endif
#include "nndsp.h"
#include "nnet.h"
#include "osce_structs.h"
#include "structs.h"
#define OSCE_METHOD_NONE 0
#ifndef DISABLE_LACE
#define OSCE_METHOD_LACE 1
#endif
#ifndef DISABLE_NOLACE
#define OSCE_METHOD_NOLACE 2
#endif
#if !defined(DISABLE_NOLACE)
#define OSCE_DEFAULT_METHOD OSCE_METHOD_NOLACE
#define OSCE_MAX_RNN_NEURONS NOLACE_FNET_GRU_STATE_SIZE
#elif !defined(DISABLE_LACE)
#define OSCE_DEFAULT_METHOD OSCE_METHOD_LACE
#define OSCE_MAX_RNN_NEURONS LACE_FNET_GRU_STATE_SIZE
#else
#define OSCE_DEFAULT_METHOD OSCE_METHOD_NONE
#define OSCE_MAX_RNN_NEURONS 0
#endif
/* API */
void osce_enhance_frame(
OSCEModel *model, /* I OSCE model struct */
silk_decoder_state *psDec, /* I/O Decoder state */
silk_decoder_control *psDecCtrl, /* I Decoder control */
opus_int16 xq[], /* I/O Decoded speech */
opus_int32 num_bits, /* I Size of SILK payload in bits */
int arch /* I Run-time architecture */
);
int osce_load_models(OSCEModel *hModel, const void *data, int len);
void osce_reset(silk_OSCE_struct *hOSCE, int method);
#endif


@@ -0,0 +1,60 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef OSCE_CONFIG
#define OSCE_CONFIG
#define OSCE_FEATURES_MAX_HISTORY 350
#define OSCE_FEATURE_DIM 93
#define OSCE_MAX_FEATURE_FRAMES 4
#define OSCE_CLEAN_SPEC_NUM_BANDS 64
#define OSCE_NOISY_SPEC_NUM_BANDS 18
#define OSCE_NO_PITCH_VALUE 7
#define OSCE_PREEMPH 0.85f
#define OSCE_PITCH_HANGOVER 0
#define OSCE_CLEAN_SPEC_START 0
#define OSCE_CLEAN_SPEC_LENGTH 64
#define OSCE_NOISY_CEPSTRUM_START 64
#define OSCE_NOISY_CEPSTRUM_LENGTH 18
#define OSCE_ACORR_START 82
#define OSCE_ACORR_LENGTH 5
#define OSCE_LTP_START 87
#define OSCE_LTP_LENGTH 5
#define OSCE_LOG_GAIN_START 92
#define OSCE_LOG_GAIN_LENGTH 1
#endif


@@ -0,0 +1,454 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#define OSCE_SPEC_WINDOW_SIZE 320
#define OSCE_SPEC_NUM_FREQS 161
/*DEBUG*/
/*#define WRITE_FEATURES*/
/*#define DEBUG_PRING*/
/*******/
#include "stack_alloc.h"
#include "osce_features.h"
#include "kiss_fft.h"
#include "os_support.h"
#include "osce.h"
#include "freq.h"
#if defined(WRITE_FEATURES) || defined(DEBUG_PRING)
#include <stdio.h>
#include <stdlib.h>
#endif
static const int center_bins_clean[64] = {
0, 2, 5, 8, 10, 12, 15, 18,
20, 22, 25, 28, 30, 33, 35, 38,
40, 42, 45, 48, 50, 52, 55, 58,
60, 62, 65, 68, 70, 73, 75, 78,
80, 82, 85, 88, 90, 92, 95, 98,
100, 102, 105, 108, 110, 112, 115, 118,
120, 122, 125, 128, 130, 132, 135, 138,
140, 142, 145, 148, 150, 152, 155, 160
};
static const int center_bins_noisy[18] = {
0, 4, 8, 12, 16, 20, 24, 28,
32, 40, 48, 56, 64, 80, 96, 112,
136, 160
};
static const float band_weights_clean[64] = {
0.666666666667f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.400000000000f, 0.400000000000f, 0.400000000000f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.400000000000f, 0.400000000000f, 0.400000000000f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.333333333333f, 0.400000000000f,
0.500000000000f, 0.400000000000f, 0.250000000000f, 0.333333333333f
};
static const float band_weights_noisy[18] = {
0.400000000000f, 0.250000000000f, 0.250000000000f, 0.250000000000f,
0.250000000000f, 0.250000000000f, 0.250000000000f, 0.250000000000f,
0.166666666667f, 0.125000000000f, 0.125000000000f, 0.125000000000f,
0.083333333333f, 0.062500000000f, 0.062500000000f, 0.050000000000f,
0.041666666667f, 0.080000000000f
};
static float osce_window[OSCE_SPEC_WINDOW_SIZE] = {
0.004908718808f, 0.014725683311f, 0.024541228523f, 0.034354408400f, 0.044164277127f,
0.053969889210f, 0.063770299562f, 0.073564563600f, 0.083351737332f, 0.093130877450f,
0.102901041421f, 0.112661287575f, 0.122410675199f, 0.132148264628f, 0.141873117332f,
0.151584296010f, 0.161280864678f, 0.170961888760f, 0.180626435180f, 0.190273572448f,
0.199902370753f, 0.209511902052f, 0.219101240157f, 0.228669460829f, 0.238215641862f,
0.247738863176f, 0.257238206902f, 0.266712757475f, 0.276161601717f, 0.285583828929f,
0.294978530977f, 0.304344802381f, 0.313681740399f, 0.322988445118f, 0.332264019538f,
0.341507569661f, 0.350718204573f, 0.359895036535f, 0.369037181064f, 0.378143757022f,
0.387213886697f, 0.396246695891f, 0.405241314005f, 0.414196874117f, 0.423112513073f,
0.431987371563f, 0.440820594212f, 0.449611329655f, 0.458358730621f, 0.467061954019f,
0.475720161014f, 0.484332517110f, 0.492898192230f, 0.501416360796f, 0.509886201809f,
0.518306898929f, 0.526677640552f, 0.534997619887f, 0.543266035038f, 0.551482089078f,
0.559644990127f, 0.567753951426f, 0.575808191418f, 0.583806933818f, 0.591749407690f,
0.599634847523f, 0.607462493302f, 0.615231590581f, 0.622941390558f, 0.630591150148f,
0.638180132051f, 0.645707604824f, 0.653172842954f, 0.660575126926f, 0.667913743292f,
0.675187984742f, 0.682397150168f, 0.689540544737f, 0.696617479953f, 0.703627273726f,
0.710569250438f, 0.717442741007f, 0.724247082951f, 0.730981620454f, 0.737645704427f,
0.744238692572f, 0.750759949443f, 0.757208846506f, 0.763584762206f, 0.769887082016f,
0.776115198508f, 0.782268511401f, 0.788346427627f, 0.794348361383f, 0.800273734191f,
0.806121974951f, 0.811892519997f, 0.817584813152f, 0.823198305781f, 0.828732456844f,
0.834186732948f, 0.839560608398f, 0.844853565250f, 0.850065093356f, 0.855194690420f,
0.860241862039f, 0.865206121757f, 0.870086991109f, 0.874883999665f, 0.879596685080f,
0.884224593137f, 0.888767277786f, 0.893224301196f, 0.897595233788f, 0.901879654283f,
0.906077149740f, 0.910187315596f, 0.914209755704f, 0.918144082372f, 0.921989916403f,
0.925746887127f, 0.929414632439f, 0.932992798835f, 0.936481041442f, 0.939879024058f,
0.943186419177f, 0.946402908026f, 0.949528180593f, 0.952561935658f, 0.955503880820f,
0.958353732530f, 0.961111216112f, 0.963776065795f, 0.966348024735f, 0.968826845041f,
0.971212287799f, 0.973504123096f, 0.975702130039f, 0.977806096779f, 0.979815820533f,
0.981731107599f, 0.983551773378f, 0.985277642389f, 0.986908548290f, 0.988444333892f,
0.989884851171f, 0.991229961288f, 0.992479534599f, 0.993633450666f, 0.994691598273f,
0.995653875433f, 0.996520189401f, 0.997290456679f, 0.997964603026f, 0.998542563469f,
0.999024282300f, 0.999409713092f, 0.999698818696f, 0.999891571247f, 0.999987952167f,
0.999987952167f, 0.999891571247f, 0.999698818696f, 0.999409713092f, 0.999024282300f,
0.998542563469f, 0.997964603026f, 0.997290456679f, 0.996520189401f, 0.995653875433f,
0.994691598273f, 0.993633450666f, 0.992479534599f, 0.991229961288f, 0.989884851171f,
0.988444333892f, 0.986908548290f, 0.985277642389f, 0.983551773378f, 0.981731107599f,
0.979815820533f, 0.977806096779f, 0.975702130039f, 0.973504123096f, 0.971212287799f,
0.968826845041f, 0.966348024735f, 0.963776065795f, 0.961111216112f, 0.958353732530f,
0.955503880820f, 0.952561935658f, 0.949528180593f, 0.946402908026f, 0.943186419177f,
0.939879024058f, 0.936481041442f, 0.932992798835f, 0.929414632439f, 0.925746887127f,
0.921989916403f, 0.918144082372f, 0.914209755704f, 0.910187315596f, 0.906077149740f,
0.901879654283f, 0.897595233788f, 0.893224301196f, 0.888767277786f, 0.884224593137f,
0.879596685080f, 0.874883999665f, 0.870086991109f, 0.865206121757f, 0.860241862039f,
0.855194690420f, 0.850065093356f, 0.844853565250f, 0.839560608398f, 0.834186732948f,
0.828732456844f, 0.823198305781f, 0.817584813152f, 0.811892519997f, 0.806121974951f,
0.800273734191f, 0.794348361383f, 0.788346427627f, 0.782268511401f, 0.776115198508f,
0.769887082016f, 0.763584762206f, 0.757208846506f, 0.750759949443f, 0.744238692572f,
0.737645704427f, 0.730981620454f, 0.724247082951f, 0.717442741007f, 0.710569250438f,
0.703627273726f, 0.696617479953f, 0.689540544737f, 0.682397150168f, 0.675187984742f,
0.667913743292f, 0.660575126926f, 0.653172842954f, 0.645707604824f, 0.638180132051f,
0.630591150148f, 0.622941390558f, 0.615231590581f, 0.607462493302f, 0.599634847523f,
0.591749407690f, 0.583806933818f, 0.575808191418f, 0.567753951426f, 0.559644990127f,
0.551482089078f, 0.543266035038f, 0.534997619887f, 0.526677640552f, 0.518306898929f,
0.509886201809f, 0.501416360796f, 0.492898192230f, 0.484332517110f, 0.475720161014f,
0.467061954019f, 0.458358730621f, 0.449611329655f, 0.440820594212f, 0.431987371563f,
0.423112513073f, 0.414196874117f, 0.405241314005f, 0.396246695891f, 0.387213886697f,
0.378143757022f, 0.369037181064f, 0.359895036535f, 0.350718204573f, 0.341507569661f,
0.332264019538f, 0.322988445118f, 0.313681740399f, 0.304344802381f, 0.294978530977f,
0.285583828929f, 0.276161601717f, 0.266712757475f, 0.257238206902f, 0.247738863176f,
0.238215641862f, 0.228669460829f, 0.219101240157f, 0.209511902052f, 0.199902370753f,
0.190273572448f, 0.180626435180f, 0.170961888760f, 0.161280864678f, 0.151584296010f,
0.141873117332f, 0.132148264628f, 0.122410675199f, 0.112661287575f, 0.102901041421f,
0.093130877450f, 0.083351737332f, 0.073564563600f, 0.063770299562f, 0.053969889210f,
0.044164277127f, 0.034354408400f, 0.024541228523f, 0.014725683311f, 0.004908718808f
};
static void apply_filterbank(float *x_out, float *x_in, const int *center_bins, const float* band_weights, int num_bands)
{
int b, i;
float frac;
celt_assert(x_in != x_out);
x_out[0] = 0;
for (b = 0; b < num_bands - 1; b++)
{
x_out[b+1] = 0;
for (i = center_bins[b]; i < center_bins[b+1]; i++)
{
frac = (float) (center_bins[b+1] - i) / (center_bins[b+1] - center_bins[b]);
x_out[b] += band_weights[b] * frac * x_in[i];
x_out[b+1] += band_weights[b+1] * (1 - frac) * x_in[i];
}
}
x_out[num_bands - 1] += band_weights[num_bands - 1] * x_in[center_bins[num_bands - 1]];
#ifdef DEBUG_PRINT
for (b = 0; b < num_bands; b++)
{
printf("band[%d]: %f\n", b, x_out[b]);
}
#endif
}
static void mag_spec_320_onesided(float *out, float *in)
{
celt_assert(OSCE_SPEC_WINDOW_SIZE == 320);
kiss_fft_cpx buffer[OSCE_SPEC_WINDOW_SIZE];
int k;
forward_transform(buffer, in);
for (k = 0; k < OSCE_SPEC_NUM_FREQS; k++)
{
out[k] = OSCE_SPEC_WINDOW_SIZE * sqrt(buffer[k].r * buffer[k].r + buffer[k].i * buffer[k].i);
#ifdef DEBUG_PRINT
printf("magspec[%d]: %f\n", k, out[k]);
#endif
}
}
static void calculate_log_spectrum_from_lpc(float *spec, opus_int16 *a_q12, int lpc_order)
{
float buffer[OSCE_SPEC_WINDOW_SIZE] = {0};
int i;
/* zero expansion */
buffer[0] = 1;
for (i = 0; i < lpc_order; i++)
{
buffer[i+1] = - (float)a_q12[i] / (1U << 12);
}
/* calculate and invert magnitude spectrum */
mag_spec_320_onesided(buffer, buffer);
for (i = 0; i < OSCE_SPEC_NUM_FREQS; i++)
{
buffer[i] = 1.f / (buffer[i] + 1e-9f);
}
/* apply filterbank */
apply_filterbank(spec, buffer, center_bins_clean, band_weights_clean, OSCE_CLEAN_SPEC_NUM_BANDS);
/* log and scaling */
for (i = 0; i < OSCE_CLEAN_SPEC_NUM_BANDS; i++)
{
spec[i] = 0.3f * log(spec[i] + 1e-9f);
}
}
static void calculate_cepstrum(float *cepstrum, float *signal)
{
float buffer[OSCE_SPEC_WINDOW_SIZE];
float *spec = &buffer[OSCE_SPEC_NUM_FREQS + 3];
int n;
celt_assert(cepstrum != signal);
for (n = 0; n < OSCE_SPEC_WINDOW_SIZE; n++)
{
buffer[n] = osce_window[n] * signal[n];
}
/* calculate magnitude spectrum */
mag_spec_320_onesided(buffer, buffer);
/* accumulate bands */
apply_filterbank(spec, buffer, center_bins_noisy, band_weights_noisy, OSCE_NOISY_SPEC_NUM_BANDS);
/* log domain conversion */
for (n = 0; n < OSCE_NOISY_SPEC_NUM_BANDS; n++)
{
spec[n] = log(spec[n] + 1e-9f);
#ifdef DEBUG_PRINT
printf("logspec[%d]: %f\n", n, spec[n]);
#endif
}
/* DCT-II (orthonormal) */
celt_assert(OSCE_NOISY_SPEC_NUM_BANDS == NB_BANDS);
dct(cepstrum, spec);
}
static void calculate_acorr(float *acorr, float *signal, int lag)
{
int n, k;
celt_assert(acorr != signal);
for (k = -2; k <= 2; k++)
{
float xx = 0;
float xy = 0;
float yy = 0;
for (n = 0; n < 80; n++)
{
/* obviously wasteful -> fix later */
xx += signal[n] * signal[n];
yy += signal[n - lag + k] * signal[n - lag + k];
xy += signal[n] * signal[n - lag + k];
}
acorr[k+2] = xy / sqrt(xx * yy + 1e-9f);
}
}
static int pitch_postprocessing(OSCEFeatureState *psFeatures, int lag, int type)
{
int new_lag;
int modulus;
#ifdef OSCE_HANGOVER_BUGFIX
#define TESTBIT 1
#else
#define TESTBIT 0
#endif
modulus = OSCE_PITCH_HANGOVER;
if (modulus == 0) modulus ++;
/* hangover is currently disabled to reflect a bug in the python code. ToDo: re-evaluate hangover */
if (type != TYPE_VOICED && psFeatures->last_type == TYPE_VOICED && TESTBIT)
/* enter hangover */
{
new_lag = OSCE_NO_PITCH_VALUE;
if (psFeatures->pitch_hangover_count < OSCE_PITCH_HANGOVER)
{
new_lag = psFeatures->last_lag;
psFeatures->pitch_hangover_count = (psFeatures->pitch_hangover_count + 1) % modulus;
}
}
else if (type != TYPE_VOICED && psFeatures->pitch_hangover_count && TESTBIT)
/* continue hangover */
{
new_lag = psFeatures->last_lag;
psFeatures->pitch_hangover_count = (psFeatures->pitch_hangover_count + 1) % modulus;
}
else if (type != TYPE_VOICED)
/* unvoiced frame after hangover */
{
new_lag = OSCE_NO_PITCH_VALUE;
psFeatures->pitch_hangover_count = 0;
}
else
/* voiced frame: update last_lag */
{
new_lag = lag;
psFeatures->last_lag = lag;
psFeatures->pitch_hangover_count = 0;
}
/* buffer update */
psFeatures->last_type = type;
/* with the current setup this should never happen (but who knows...) */
celt_assert(new_lag);
return new_lag;
}
void osce_calculate_features(
silk_decoder_state *psDec, /* I/O Decoder state */
silk_decoder_control *psDecCtrl, /* I Decoder control */
float *features, /* O input features */
float *numbits, /* O numbits and smoothed numbits */
int *periods, /* O pitch lags on subframe basis */
const opus_int16 xq[], /* I Decoded speech */
opus_int32 num_bits /* I Size of SILK payload in bits */
)
{
int num_subframes, num_samples;
float buffer[OSCE_FEATURES_MAX_HISTORY + OSCE_MAX_FEATURE_FRAMES * 80];
float *frame, *pfeatures;
OSCEFeatureState *psFeatures;
int i, n, k;
#ifdef WRITE_FEATURES
static FILE *f_feat = NULL;
if (f_feat == NULL)
{
f_feat = fopen("assembled_features.f32", "wb");
}
#endif
/*OPUS_CLEAR(buffer, 1);*/
memset(buffer, 0, sizeof(buffer));
num_subframes = psDec->nb_subfr;
num_samples = num_subframes * 80;
psFeatures = &psDec->osce.features;
/* smooth bit count */
psFeatures->numbits_smooth = 0.9f * psFeatures->numbits_smooth + 0.1f * num_bits;
numbits[0] = num_bits;
numbits[1] = psFeatures->numbits_smooth;
for (n = 0; n < num_samples; n++)
{
buffer[OSCE_FEATURES_MAX_HISTORY + n] = (float) xq[n] / (1U<<15);
}
OPUS_COPY(buffer, psFeatures->signal_history, OSCE_FEATURES_MAX_HISTORY);
for (k = 0; k < num_subframes; k++)
{
pfeatures = features + k * OSCE_FEATURE_DIM;
frame = &buffer[OSCE_FEATURES_MAX_HISTORY + k * 80];
memset(pfeatures, 0, OSCE_FEATURE_DIM * sizeof(*pfeatures)); /* precaution */
/* clean spectrum from lpcs (update every other frame) */
if (k % 2 == 0)
{
calculate_log_spectrum_from_lpc(pfeatures + OSCE_CLEAN_SPEC_START, psDecCtrl->PredCoef_Q12[k >> 1], psDec->LPC_order);
}
else
{
OPUS_COPY(pfeatures + OSCE_CLEAN_SPEC_START, pfeatures + OSCE_CLEAN_SPEC_START - OSCE_FEATURE_DIM, OSCE_CLEAN_SPEC_LENGTH);
}
/* noisy cepstrum from signal (update every other frame) */
if (k % 2 == 0)
{
calculate_cepstrum(pfeatures + OSCE_NOISY_CEPSTRUM_START, frame - 160);
}
else
{
OPUS_COPY(pfeatures + OSCE_NOISY_CEPSTRUM_START, pfeatures + OSCE_NOISY_CEPSTRUM_START - OSCE_FEATURE_DIM, OSCE_NOISY_CEPSTRUM_LENGTH);
}
/* pitch hangover and zero value replacement */
periods[k] = pitch_postprocessing(psFeatures, psDecCtrl->pitchL[k], psDec->indices.signalType);
/* auto-correlation around pitch lag */
calculate_acorr(pfeatures + OSCE_ACORR_START, frame, periods[k]);
/* ltp */
celt_assert(OSCE_LTP_LENGTH == LTP_ORDER);
for (i = 0; i < OSCE_LTP_LENGTH; i++)
{
pfeatures[OSCE_LTP_START + i] = (float) psDecCtrl->LTPCoef_Q14[k * LTP_ORDER + i] / (1U << 14);
}
/* frame gain */
pfeatures[OSCE_LOG_GAIN_START] = log((float) psDecCtrl->Gains_Q16[k] / (1UL << 16) + 1e-9f);
#ifdef WRITE_FEATURES
fwrite(pfeatures, sizeof(*pfeatures), 93, f_feat);
#endif
}
/* buffer update */
OPUS_COPY(psFeatures->signal_history, &buffer[num_samples], OSCE_FEATURES_MAX_HISTORY);
}
void osce_cross_fade_10ms(float *x_enhanced, float *x_in, int length)
{
int i;
celt_assert(length >= 160);
for (i = 0; i < 160; i++)
{
x_enhanced[i] = osce_window[i] * x_enhanced[i] + (1.f - osce_window[i]) * x_in[i];
}
}


@@ -0,0 +1,50 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef OSCE_FEATURES_H
#define OSCE_FEATURES_H
#include "structs.h"
#include "opus_types.h"
#define OSCE_NUMBITS_BUGFIX
void osce_calculate_features(
silk_decoder_state *psDec, /* I/O Decoder state */
silk_decoder_control *psDecCtrl, /* I Decoder control */
float *features, /* O input features */
float *numbits, /* O numbits and smoothed numbits */
int *periods, /* O pitch lags on subframe basis */
const opus_int16 xq[], /* I Decoded speech */
opus_int32 num_bits /* I Size of SILK payload in bits */
);
void osce_cross_fade_10ms(float *x_enhanced, float *x_in, int length);
#endif


@@ -0,0 +1,125 @@
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef OSCE_STRUCTS_H
#define OSCE_STRUCTS_H
#include "opus_types.h"
#include "osce_config.h"
#ifndef DISABLE_LACE
#include "lace_data.h"
#endif
#ifndef DISABLE_NOLACE
#include "nolace_data.h"
#endif
#include "nndsp.h"
#include "nnet.h"
/* feature calculation */
typedef struct {
float numbits_smooth;
int pitch_hangover_count;
int last_lag;
int last_type;
float signal_history[OSCE_FEATURES_MAX_HISTORY];
int reset;
} OSCEFeatureState;
#ifndef DISABLE_LACE
/* LACE */
typedef struct {
float feature_net_conv2_state[LACE_FNET_CONV2_STATE_SIZE];
float feature_net_gru_state[LACE_COND_DIM];
AdaCombState cf1_state;
AdaCombState cf2_state;
AdaConvState af1_state;
float preemph_mem;
float deemph_mem;
} LACEState;
typedef struct
{
LACELayers layers;
float window[LACE_OVERLAP_SIZE];
} LACE;
#endif /* #ifndef DISABLE_LACE */
#ifndef DISABLE_NOLACE
/* NoLACE */
typedef struct {
float feature_net_conv2_state[NOLACE_FNET_CONV2_STATE_SIZE];
float feature_net_gru_state[NOLACE_COND_DIM];
float post_cf1_state[NOLACE_COND_DIM];
float post_cf2_state[NOLACE_COND_DIM];
float post_af1_state[NOLACE_COND_DIM];
float post_af2_state[NOLACE_COND_DIM];
float post_af3_state[NOLACE_COND_DIM];
AdaCombState cf1_state;
AdaCombState cf2_state;
AdaConvState af1_state;
AdaConvState af2_state;
AdaConvState af3_state;
AdaConvState af4_state;
AdaShapeState tdshape1_state;
AdaShapeState tdshape2_state;
AdaShapeState tdshape3_state;
float preemph_mem;
float deemph_mem;
} NoLACEState;
typedef struct {
NOLACELayers layers;
float window[LACE_OVERLAP_SIZE];
} NoLACE;
#endif /* #ifndef DISABLE_NOLACE */
/* OSCEModel */
typedef struct {
int loaded;
#ifndef DISABLE_LACE
LACE lace;
#endif
#ifndef DISABLE_NOLACE
NoLACE nolace;
#endif
} OSCEModel;
typedef union {
#ifndef DISABLE_LACE
LACEState lace;
#endif
#ifndef DISABLE_NOLACE
NoLACEState nolace;
#endif
} OSCEState;
#endif


@@ -0,0 +1,238 @@
/* Copyright (c) 2023 Amazon */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <string.h>
#include <stdlib.h>
#include "nnet.h"
#include "os_support.h"
#define SPARSE_BLOCK_SIZE 32
int parse_record(const void **data, int *len, WeightArray *array) {
WeightHead *h = (WeightHead *)*data;
if (*len < WEIGHT_BLOCK_SIZE) return -1;
if (h->block_size < h->size) return -1;
if (h->block_size > *len-WEIGHT_BLOCK_SIZE) return -1;
if (h->name[sizeof(h->name)-1] != 0) return -1;
if (h->size < 0) return -1;
array->name = h->name;
array->type = h->type;
array->size = h->size;
array->data = (void*)((unsigned char*)(*data)+WEIGHT_BLOCK_SIZE);
*data = (void*)((unsigned char*)*data + h->block_size+WEIGHT_BLOCK_SIZE);
*len -= h->block_size+WEIGHT_BLOCK_SIZE;
return array->size;
}
int parse_weights(WeightArray **list, const void *data, int len)
{
int nb_arrays=0;
int capacity=20;
*list = opus_alloc(capacity*sizeof(WeightArray));
while (len > 0) {
int ret;
WeightArray array = {NULL, 0, 0, 0};
ret = parse_record(&data, &len, &array);
if (ret > 0) {
if (nb_arrays+1 >= capacity) {
/* Make sure there's room for the ending NULL element too. */
capacity = capacity*3/2;
*list = opus_realloc(*list, capacity*sizeof(WeightArray));
}
(*list)[nb_arrays++] = array;
} else {
opus_free(*list);
*list = NULL;
return -1;
}
}
(*list)[nb_arrays].name=NULL;
return nb_arrays;
}
static const void *find_array_entry(const WeightArray *arrays, const char *name) {
while (arrays->name && strcmp(arrays->name, name) != 0) arrays++;
return arrays;
}
static const void *find_array_check(const WeightArray *arrays, const char *name, int size) {
const WeightArray *a = find_array_entry(arrays, name);
if (a->name && a->size == size) return a->data;
else return NULL;
}
static const void *opt_array_check(const WeightArray *arrays, const char *name, int size, int *error) {
const WeightArray *a = find_array_entry(arrays, name);
*error = (a->name != NULL && a->size != size);
if (a->name && a->size == size) return a->data;
else return NULL;
}
static const void *find_idx_check(const WeightArray *arrays, const char *name, int nb_in, int nb_out, int *total_blocks) {
int remain;
const int *idx;
const WeightArray *a = find_array_entry(arrays, name);
*total_blocks = 0;
if (a->name == NULL) return NULL;
idx = a->data;
remain = a->size/sizeof(int);
while (remain > 0) {
int nb_blocks;
int i;
nb_blocks = *idx++;
if (remain < nb_blocks+1) return NULL;
for (i=0;i<nb_blocks;i++) {
int pos = *idx++;
if (pos+3 >= nb_in || (pos&0x3)) return NULL;
}
nb_out -= 8;
remain -= nb_blocks+1;
*total_blocks += nb_blocks;
}
if (nb_out != 0) return NULL;
return a->data;
}
int linear_init(LinearLayer *layer, const WeightArray *arrays,
const char *bias,
const char *subias,
const char *weights,
const char *float_weights,
const char *weights_idx,
const char *diag,
const char *scale,
int nb_inputs,
int nb_outputs)
{
int err;
layer->bias = NULL;
layer->subias = NULL;
layer->weights = NULL;
layer->float_weights = NULL;
layer->weights_idx = NULL;
layer->diag = NULL;
layer->scale = NULL;
if (bias != NULL) {
if ((layer->bias = find_array_check(arrays, bias, nb_outputs*sizeof(layer->bias[0]))) == NULL) return 1;
}
if (subias != NULL) {
if ((layer->subias = find_array_check(arrays, subias, nb_outputs*sizeof(layer->subias[0]))) == NULL) return 1;
}
if (weights_idx != NULL) {
int total_blocks;
if ((layer->weights_idx = find_idx_check(arrays, weights_idx, nb_inputs, nb_outputs, &total_blocks)) == NULL) return 1;
if (weights != NULL) {
if ((layer->weights = find_array_check(arrays, weights, SPARSE_BLOCK_SIZE*total_blocks*sizeof(layer->weights[0]))) == NULL) return 1;
}
if (float_weights != NULL) {
layer->float_weights = opt_array_check(arrays, float_weights, SPARSE_BLOCK_SIZE*total_blocks*sizeof(layer->float_weights[0]), &err);
if (err) return 1;
}
} else {
if (weights != NULL) {
if ((layer->weights = find_array_check(arrays, weights, nb_inputs*nb_outputs*sizeof(layer->weights[0]))) == NULL) return 1;
}
if (float_weights != NULL) {
layer->float_weights = opt_array_check(arrays, float_weights, nb_inputs*nb_outputs*sizeof(layer->float_weights[0]), &err);
if (err) return 1;
}
}
if (diag != NULL) {
if ((layer->diag = find_array_check(arrays, diag, nb_outputs*sizeof(layer->diag[0]))) == NULL) return 1;
}
if (weights != NULL) {
if ((layer->scale = find_array_check(arrays, scale, nb_outputs*sizeof(layer->scale[0]))) == NULL) return 1;
}
layer->nb_inputs = nb_inputs;
layer->nb_outputs = nb_outputs;
return 0;
}
int conv2d_init(Conv2dLayer *layer, const WeightArray *arrays,
const char *bias,
const char *float_weights,
int in_channels,
int out_channels,
int ktime,
int kheight)
{
int err;
layer->bias = NULL;
layer->float_weights = NULL;
if (bias != NULL) {
if ((layer->bias = find_array_check(arrays, bias, out_channels*sizeof(layer->bias[0]))) == NULL) return 1;
}
if (float_weights != NULL) {
layer->float_weights = opt_array_check(arrays, float_weights, in_channels*out_channels*ktime*kheight*sizeof(layer->float_weights[0]), &err);
if (err) return 1;
}
layer->in_channels = in_channels;
layer->out_channels = out_channels;
layer->ktime = ktime;
layer->kheight = kheight;
return 0;
}
#if 0
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/stat.h>
#include <stdio.h>
int main()
{
int fd;
void *data;
int len;
int nb_arrays;
int i;
WeightArray *list;
struct stat st;
const char *filename = "weights_blob.bin";
stat(filename, &st);
len = st.st_size;
fd = open(filename, O_RDONLY);
data = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
printf("size is %d\n", len);
nb_arrays = parse_weights(&list, data, len);
for (i=0;i<nb_arrays;i++) {
printf("found %s: size %d\n", list[i].name, list[i].size);
}
printf("%p\n", (void*)list[i].name);
opus_free(list);
munmap(data, len);
close(fd);
return 0;
}
#endif


@@ -0,0 +1,79 @@
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <math.h>
#include "pitchdnn.h"
#include "os_support.h"
#include "nnet.h"
#include "lpcnet_private.h"
float compute_pitchdnn(
PitchDNNState *st,
const float *if_features,
const float *xcorr_features,
int arch
)
{
float if1_out[DENSE_IF_UPSAMPLER_1_OUT_SIZE];
float downsampler_in[NB_XCORR_FEATURES + DENSE_IF_UPSAMPLER_2_OUT_SIZE];
float downsampler_out[DENSE_DOWNSAMPLER_OUT_SIZE];
float conv1_tmp1[(NB_XCORR_FEATURES + 2)*8] = {0};
float conv1_tmp2[(NB_XCORR_FEATURES + 2)*8] = {0};
float output[DENSE_FINAL_UPSAMPLER_OUT_SIZE];
int i;
int pos=0;
float maxval=-1;
float sum=0;
float count=0;
PitchDNN *model = &st->model;
/* IF */
compute_generic_dense(&model->dense_if_upsampler_1, if1_out, if_features, ACTIVATION_TANH, arch);
compute_generic_dense(&model->dense_if_upsampler_2, &downsampler_in[NB_XCORR_FEATURES], if1_out, ACTIVATION_TANH, arch);
/* xcorr*/
OPUS_COPY(&conv1_tmp1[1], xcorr_features, NB_XCORR_FEATURES);
compute_conv2d(&model->conv2d_1, &conv1_tmp2[1], st->xcorr_mem1, conv1_tmp1, NB_XCORR_FEATURES, NB_XCORR_FEATURES+2, ACTIVATION_TANH, arch);
compute_conv2d(&model->conv2d_2, downsampler_in, st->xcorr_mem2, conv1_tmp2, NB_XCORR_FEATURES, NB_XCORR_FEATURES, ACTIVATION_TANH, arch);
compute_generic_dense(&model->dense_downsampler, downsampler_out, downsampler_in, ACTIVATION_TANH, arch);
compute_generic_gru(&model->gru_1_input, &model->gru_1_recurrent, st->gru_state, downsampler_out, arch);
compute_generic_dense(&model->dense_final_upsampler, output, st->gru_state, ACTIVATION_LINEAR, arch);
for (i=0;i<180;i++) {
if (output[i] > maxval) {
pos = i;
maxval = output[i];
}
}
for (i=IMAX(0, pos-2); i<=IMIN(179, pos+2); i++) {
float p = exp(output[i]);
sum += p*i;
count += p;
}
/*printf("%d %f\n", pos, sum/count);*/
return (1.f/60.f)*(sum/count) - 1.5;
/*return 256.f/pow(2.f, (1.f/60.f)*i);*/
}
void pitchdnn_init(PitchDNNState *st)
{
int ret;
OPUS_CLEAR(st, 1);
#ifndef USE_WEIGHTS_FILE
ret = init_pitchdnn(&st->model, pitchdnn_arrays);
#else
ret = 0;
#endif
celt_assert(ret == 0);
}
int pitchdnn_load_model(PitchDNNState *st, const void *data, int len) {
WeightArray *list;
int ret;
parse_weights(&list, data, len);
ret = init_pitchdnn(&st->model, list);
opus_free(list);
if (ret == 0) return 0;
else return -1;
}


@@ -0,0 +1,34 @@
#ifndef PITCHDNN_H
#define PITCHDNN_H
typedef struct PitchDNN PitchDNN;
#include "pitchdnn_data.h"
#define PITCH_MIN_PERIOD 32
#define PITCH_MAX_PERIOD 256
#define NB_XCORR_FEATURES (PITCH_MAX_PERIOD-PITCH_MIN_PERIOD)
typedef struct {
PitchDNN model;
float gru_state[GRU_1_STATE_SIZE];
float xcorr_mem1[(NB_XCORR_FEATURES + 2)*2];
float xcorr_mem2[(NB_XCORR_FEATURES + 2)*2*8];
float xcorr_mem3[(NB_XCORR_FEATURES + 2)*2*8];
} PitchDNNState;
void pitchdnn_init(PitchDNNState *st);
int pitchdnn_load_model(PitchDNNState *st, const void *data, int len);
float compute_pitchdnn(
PitchDNNState *st,
const float *if_features,
const float *xcorr_features,
int arch
);
#endif


@@ -0,0 +1,50 @@
/* This file is auto-generated by gen_tables */
#ifndef TANSIG_TABLE_H
#define TANSIG_TABLE_H
static const float tansig_table[201] = {
0.000000f, 0.039979f, 0.079830f, 0.119427f, 0.158649f,
0.197375f, 0.235496f, 0.272905f, 0.309507f, 0.345214f,
0.379949f, 0.413644f, 0.446244f, 0.477700f, 0.507977f,
0.537050f, 0.564900f, 0.591519f, 0.616909f, 0.641077f,
0.664037f, 0.685809f, 0.706419f, 0.725897f, 0.744277f,
0.761594f, 0.777888f, 0.793199f, 0.807569f, 0.821040f,
0.833655f, 0.845456f, 0.856485f, 0.866784f, 0.876393f,
0.885352f, 0.893698f, 0.901468f, 0.908698f, 0.915420f,
0.921669f, 0.927473f, 0.932862f, 0.937863f, 0.942503f,
0.946806f, 0.950795f, 0.954492f, 0.957917f, 0.961090f,
0.964028f, 0.966747f, 0.969265f, 0.971594f, 0.973749f,
0.975743f, 0.977587f, 0.979293f, 0.980869f, 0.982327f,
0.983675f, 0.984921f, 0.986072f, 0.987136f, 0.988119f,
0.989027f, 0.989867f, 0.990642f, 0.991359f, 0.992020f,
0.992631f, 0.993196f, 0.993718f, 0.994199f, 0.994644f,
0.995055f, 0.995434f, 0.995784f, 0.996108f, 0.996407f,
0.996682f, 0.996937f, 0.997172f, 0.997389f, 0.997590f,
0.997775f, 0.997946f, 0.998104f, 0.998249f, 0.998384f,
0.998508f, 0.998623f, 0.998728f, 0.998826f, 0.998916f,
0.999000f, 0.999076f, 0.999147f, 0.999213f, 0.999273f,
0.999329f, 0.999381f, 0.999428f, 0.999472f, 0.999513f,
0.999550f, 0.999585f, 0.999617f, 0.999646f, 0.999673f,
0.999699f, 0.999722f, 0.999743f, 0.999763f, 0.999781f,
0.999798f, 0.999813f, 0.999828f, 0.999841f, 0.999853f,
0.999865f, 0.999875f, 0.999885f, 0.999893f, 0.999902f,
0.999909f, 0.999916f, 0.999923f, 0.999929f, 0.999934f,
0.999939f, 0.999944f, 0.999948f, 0.999952f, 0.999956f,
0.999959f, 0.999962f, 0.999965f, 0.999968f, 0.999970f,
0.999973f, 0.999975f, 0.999977f, 0.999978f, 0.999980f,
0.999982f, 0.999983f, 0.999984f, 0.999986f, 0.999987f,
0.999988f, 0.999989f, 0.999990f, 0.999990f, 0.999991f,
0.999992f, 0.999992f, 0.999993f, 0.999994f, 0.999994f,
0.999994f, 0.999995f, 0.999995f, 0.999996f, 0.999996f,
0.999996f, 0.999997f, 0.999997f, 0.999997f, 0.999997f,
0.999997f, 0.999998f, 0.999998f, 0.999998f, 0.999998f,
0.999998f, 0.999998f, 0.999999f, 0.999999f, 0.999999f,
0.999999f, 0.999999f, 0.999999f, 0.999999f, 0.999999f,
0.999999f, 0.999999f, 0.999999f, 0.999999f, 0.999999f,
1.000000f, 1.000000f, 1.000000f, 1.000000f, 1.000000f,
1.000000f, 1.000000f, 1.000000f, 1.000000f, 1.000000f,
1.000000f,
};
#endif /*TANSIG_TABLE_H*/


@@ -0,0 +1,128 @@
#include <stdio.h>
#include <math.h>
#include "opus_types.h"
#include "arch.h"
#include "common.h"
#include "tansig_table.h"
#define LPCNET_TEST
// we need to call two versions of each function that have the same
// name, so use #defines to temporarily rename them
#define lpcnet_exp2 lpcnet_exp2_fast
#define tansig_approx tansig_approx_fast
#define sigmoid_approx sigmoid_approx_fast
#define softmax softmax_fast
#define vec_tanh vec_tanh_fast
#define vec_sigmoid vec_sigmoid_fast
#define sgemv_accum16 sgemv_accum16_fast
#define sparse_sgemv_accum16 sparse_sgemv_accum16_fast
#ifdef __AVX__
#include "vec_avx.h"
#ifdef __AVX2__
const char simd[]="AVX2";
#else
const char simd[]="AVX";
#endif
#elif __ARM_NEON__
#include "vec_neon.h"
const char simd[]="NEON";
#else
const char simd[]="none";
#endif
#undef lpcnet_exp2
#undef tansig_approx
#undef sigmoid_approx
#undef softmax
#undef vec_tanh
#undef vec_sigmoid
#undef sgemv_accum16
#undef sparse_sgemv_accum16
#include "vec.h"
#define ROW_STEP 16
#define ROWS ROW_STEP*10
#define COLS 2
#define ENTRIES 2
int test_sgemv_accum16() {
float weights[ROWS*COLS];
float x[COLS];
float out[ROWS], out_fast[ROWS];
int i;
printf("sgemv_accum16.....................: ");
for(i=0; i<ROWS*COLS; i++) {
weights[i] = i;
}
for(i=0; i<ROWS; i++) {
out[i] = 0;
out_fast[i] = 0;
}
for(i=0; i<COLS; i++) {
x[i] = i+1;
}
sgemv_accum16(out, weights, ROWS, COLS, 1, x);
sgemv_accum16_fast(out_fast, weights, ROWS, COLS, 1, x);
for(i=0; i<ROWS; i++) {
if (out[i] != out_fast[i]) {
printf("fail\n");
for(i=0; i<ROWS; i++) {
printf("%d %f %f\n", i, out[i], out_fast[i]);
if (out[i] != out_fast[i])
return 1;
}
}
}
printf("pass\n");
return 0;
}
int test_sparse_sgemv_accum16() {
int rows = ROW_STEP*ENTRIES;
int indx[] = {1,0,2,0,1};
float w[ROW_STEP*(1+2)];
float x[ENTRIES] = {1,2};
float out[ROW_STEP*(1+2)], out_fast[ROW_STEP*(1+2)];
int i;
printf("sparse_sgemv_accum16..............: ");
for(i=0; i<ROW_STEP*(1+2); i++) {
w[i] = i;
out[i] = 0;
out_fast[i] = 0;
}
sparse_sgemv_accum16(out, w, rows, indx, x);
sparse_sgemv_accum16_fast(out_fast, w, rows, indx, x);
for(i=0; i<ROW_STEP*ENTRIES; i++) {
if (out[i] != out_fast[i]) {
printf("fail\n");
for(i=0; i<ROW_STEP*ENTRIES; i++) {
printf("%d %f %f\n", i, out[i], out_fast[i]);
if (out[i] != out_fast[i])
return 1;
}
}
}
printf("pass\n");
return 0;
}
int main() {
printf("testing vector routines on SIMD: %s\n", simd);
int test1 = test_sgemv_accum16();
int test2 = test_sparse_sgemv_accum16();
return test1 || test2;
}
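The two tests above compare a SIMD build of each routine against the generic one. For reference, the accumulating matrix-vector product they exercise can be sketched in plain Python; the column-major indexing with an explicit column stride is an assumption read off the call signature, not verified against vec.h.

```python
def sgemv_accum16_ref(out, weights, rows, cols, col_stride, x):
    """Plain-Python reference for the accumulating matrix-vector product:
    out[i] += weights[j * col_stride + i] * x[j], summed over columns j.
    (Layout assumption only; the C versions additionally process rows in
    blocks of 16, which this reference ignores.)"""
    for j in range(cols):
        for i in range(rows):
            out[i] += weights[j * col_stride + i] * x[j]
    return out
```

Because both C implementations must agree bit-exactly with each other, a mismatch anywhere in the output arrays fails the test.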


@@ -0,0 +1,2 @@
from . import quantization
from . import sparsification


@@ -0,0 +1 @@
from .softquant import soft_quant, remove_soft_quant


@@ -0,0 +1,113 @@
import torch
@torch.no_grad()
def compute_optimal_scale(weight):
with torch.no_grad():
n_out, n_in = weight.shape
assert n_in % 4 == 0
if n_out % 8:
# add padding
pad = 8 - n_out % 8  # pad rows up to the next multiple of 8
weight = torch.cat((weight, torch.zeros((pad, n_in), dtype=weight.dtype, device=weight.device)), dim=0)
weight_max_abs, _ = torch.max(torch.abs(weight), dim=1)
weight_max_sum, _ = torch.max(torch.abs(weight[:, : n_in : 2] + weight[:, 1 : n_in : 2]), dim=1)
scale_max = weight_max_abs / 127
scale_sum = weight_max_sum / 129
scale = torch.maximum(scale_max, scale_sum)
return scale[:n_out]
@torch.no_grad()
def q_scaled_noise(module, weight):
if isinstance(module, torch.nn.Conv1d):
w = weight.permute(0, 2, 1).flatten(1)
noise = torch.rand_like(w) - 0.5
noise[w == 0] = 0 # ignore zero entries from sparsification
scale = compute_optimal_scale(w)
noise = noise * scale.unsqueeze(-1)
noise = noise.reshape(weight.size(0), weight.size(2), weight.size(1)).permute(0, 2, 1)
elif isinstance(module, torch.nn.ConvTranspose1d):
i, o, k = weight.shape
w = weight.permute(2, 1, 0).reshape(k * o, i)
noise = torch.rand_like(w) - 0.5
noise[w == 0] = 0 # ignore zero entries from sparsification
scale = compute_optimal_scale(w)
noise = noise * scale.unsqueeze(-1)
noise = noise.reshape(k, o, i).permute(2, 1, 0)
elif len(weight.shape) == 2:
noise = torch.rand_like(weight) - 0.5
noise[weight == 0] = 0 # ignore zero entries from sparsification
scale = compute_optimal_scale(weight)
noise = noise * scale.unsqueeze(-1)
else:
raise ValueError('unknown quantization setting')
return noise
class SoftQuant:
names: list
def __init__(self, names: list, scale: float) -> None:
self.names = names
self.quantization_noise = None
self.scale = scale
def __call__(self, module, inputs, *args, before=True):
if not module.training: return
if before:
self.quantization_noise = dict()
for name in self.names:
weight = getattr(module, name)
if self.scale is None:
self.quantization_noise[name] = q_scaled_noise(module, weight)
else:
self.quantization_noise[name] = \
self.scale * (torch.rand_like(weight) - 0.5)
with torch.no_grad():
weight.data[:] = weight + self.quantization_noise[name]
else:
for name in self.names:
weight = getattr(module, name)
with torch.no_grad():
weight.data[:] = weight - self.quantization_noise[name]
self.quantization_noise = None
def apply(module, names=['weight'], scale=None):
fn = SoftQuant(names, scale)
for name in names:
if not hasattr(module, name):
raise ValueError(f"module has no attribute {name}")
fn_before = lambda *x : fn(*x, before=True)
fn_after = lambda *x : fn(*x, before=False)
setattr(fn_before, 'sqm', fn)
setattr(fn_after, 'sqm', fn)
module.register_forward_pre_hook(fn_before)
module.register_forward_hook(fn_after)
return fn
def soft_quant(module, names=['weight'], scale=None):
fn = SoftQuant.apply(module, names, scale)
return module
def remove_soft_quant(module, names=['weight']):
for k, hook in module._forward_pre_hooks.items():
if hasattr(hook, 'sqm'):
if isinstance(hook.sqm, SoftQuant) and hook.sqm.names == names:
del module._forward_pre_hooks[k]
for k, hook in module._forward_hooks.items():
if hasattr(hook, 'sqm'):
if isinstance(hook.sqm, SoftQuant) and hook.sqm.names == names:
del module._forward_hooks[k]
return module
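The hooks above perturb the weights by roughly one quantization bin before each forward pass and restore them afterwards, so training sees rounding-like noise. A standalone numpy sketch of that one step (the module itself uses torch forward hooks and an optional per-row optimal scale):

```python
import numpy as np

def soft_quant_step(weight, scale, rng):
    """One training-time step of the soft-quantization idea: add uniform
    noise of about one quantization bin, to be removed after the forward
    pass. Standalone numpy sketch, not the torch-hook implementation."""
    noise = scale * (rng.random(weight.shape) - 0.5)
    noise[weight == 0] = 0  # keep sparsified (exactly zero) entries zero
    return weight + noise, noise

rng = np.random.default_rng(0)
w = np.array([[0.5, -0.25], [0.0, 1.0]])
perturbed, noise = soft_quant_step(w, 1 / 128, rng)
restored = perturbed - noise  # what the post-forward hook restores
```

Subtracting the stored noise afterwards leaves the optimizer's view of the weights unchanged; only the forward pass sees the perturbation.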


@@ -0,0 +1,2 @@
from .relegance import relegance_gradient_weighting, relegance_create_tconv_kernel, relegance_map_relevance_to_input_domain, relegance_resize_relevance_to_input_size
from .meta_critic import MetaCritic


@@ -0,0 +1,85 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
class MetaCritic():
def __init__(self, normalize=False, gamma=0.9, beta=0.0, joint_stats=False):
""" Class for assessing relevance of discriminator scores
Args:
gamma (float, optional): update rate for tracking discriminator stats. Defaults to 0.9.
beta (float, optional): Minimum confidence-related threshold. Defaults to 0.0.
"""
self.normalize = normalize
self.gamma = gamma
self.beta = beta
self.joint_stats = joint_stats
self.disc_stats = dict()
def __call__(self, disc_id, real_scores, generated_scores):
""" calculates relevance from normalized scores
Args:
disc_id (any valid key): id for tracking discriminator statistics
real_scores (torch.tensor): scores for real data
generated_scores (torch.tensor): scores for generated data; expecting device to match real_scores.device
Returns:
torch.tensor: output-domain relevance
"""
if self.normalize:
real_std = torch.std(real_scores.detach()).cpu().item()
gen_std = torch.std(generated_scores.detach()).cpu().item()
std = (real_std**2 + gen_std**2) ** .5
mean = torch.mean(real_scores.detach()).cpu().item() - torch.mean(generated_scores.detach()).cpu().item()
key = 0 if self.joint_stats else disc_id
if key in self.disc_stats:
self.disc_stats[key]['std'] = self.gamma * self.disc_stats[key]['std'] + (1 - self.gamma) * std
self.disc_stats[key]['mean'] = self.gamma * self.disc_stats[key]['mean'] + (1 - self.gamma) * mean
else:
self.disc_stats[key] = {
'std': std + 1e-5,
'mean': mean
}
std = self.disc_stats[key]['std']
mean = self.disc_stats[key]['mean']
else:
mean, std = 0, 1
relevance = torch.relu((real_scores - generated_scores - mean) / std + mean - self.beta)
if False: print(f"relevance({disc_id}): {relevance.min()=} {relevance.max()=} {relevance.mean()=}")
return relevance
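The core of `MetaCritic.__call__` is a thresholded, normalized score gap. A scalar version for intuition (the class operates on tensors and, with `normalize=True`, tracks mean/std with an exponential moving average):

```python
def relevance(real_score, gen_score, mean=0.0, std=1.0, beta=0.0):
    """Scalar sketch of the MetaCritic relevance formula:
    relu((real - generated - mean) / std + mean - beta).
    Positive only when the discriminator separates real from generated
    by more than the tracked mean gap minus the confidence margin beta."""
    return max(0.0, (real_score - gen_score - mean) / std + mean - beta)
```

So a discriminator that cannot tell real from generated (gap at or below threshold) contributes zero relevance, and its gradients get down-weighted.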


@@ -0,0 +1,449 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
import torch.nn.functional as F
def view_one_hot(index, length):
vec = length * [1]
vec[index] = -1
return vec
def create_smoothing_kernel(widths, gamma=1.5):
""" creates a truncated gaussian smoothing kernel for the given widths
Parameters:
-----------
widths: list[Int] or torch.LongTensor
specifies the shape of the smoothing kernel, entries must be > 0.
gamma: float, optional
decay factor for gaussian relative to kernel size
Returns:
--------
kernel: torch.FloatTensor
"""
widths = torch.LongTensor(widths)
num_dims = len(widths)
assert(widths.min() > 0)
centers = widths.float() / 2 - 0.5
sigmas = gamma * (centers + 1)
vals = [((torch.arange(widths[i]) - centers[i]) / sigmas[i]) ** 2 for i in range(num_dims)]
vals = sum([vals[i].view(view_one_hot(i, num_dims)) for i in range(num_dims)])
kernel = torch.exp(- vals)
kernel = kernel / kernel.sum()
return kernel
def create_partition_kernel(widths, strides):
""" creates a partition kernel for mapping a convolutional network output back to the input domain
Given a fully convolutional network with receptive field of shape widths and the given strides, this
function constructs an interpolation kernel whose translations by multiples of the given strides form
a partition of unity on the input domain.
Parameters:
----------
widths: list[Int] or torch.LongTensor
shape of receptive field
strides: list[Int] or torch.LongTensor
total strides of convolutional network
Returns:
kernel: torch.FloatTensor
"""
num_dims = len(widths)
assert num_dims == len(strides) and num_dims in {1, 2, 3}
convs = {1 : F.conv1d, 2 : F.conv2d, 3 : F.conv3d}
widths = torch.LongTensor(widths)
strides = torch.LongTensor(strides)
proto_kernel = torch.ones(torch.minimum(strides, widths).tolist())
# create interpolation kernel eta
eta_widths = widths - strides + 1
if eta_widths.min() <= 0:
print("[create_partition_kernel] warning: receptive field does not cover input domain")
eta_widths = torch.maximum(eta_widths, torch.ones_like(eta_widths))
eta = create_smoothing_kernel(eta_widths).view(1, 1, *eta_widths.tolist())
padding = torch.repeat_interleave(eta_widths - 1, 2, 0).tolist()[::-1] # ordering of dimensions for padding and convolution functions is reversed in torch
padded_proto_kernel = F.pad(proto_kernel, padding)
padded_proto_kernel = padded_proto_kernel.view(1, 1, *padded_proto_kernel.shape)
kernel = convs[num_dims](padded_proto_kernel, eta)
return kernel
def receptive_field(conv_model, input_shape, output_position):
""" estimates boundaries of receptive field connected to output_position via autograd
Parameters:
-----------
conv_model: nn.Module or autograd function
function or model implementing fully convolutional model
input_shape: List[Int]
input shape ignoring batch dimension, i.e. [num_channels, dim1, dim2, ...]
output_position: List[Int]
output position for which the receptive field is determined; the function raises an exception
if output_position is out of bounds for the given input_shape.
Returns:
--------
low: List[Int]
start indices of receptive field
high: List[Int]
stop indices of receptive field
"""
x = torch.randn((1,) + tuple(input_shape), requires_grad=True)
y = conv_model(x)
# collapse channels and remove batch dimension
y = torch.sum(y, 1)[0]
# create mask
mask = torch.zeros_like(y)
index = [torch.tensor(i) for i in output_position]
try:
mask.index_put_(index, torch.tensor(1, dtype=mask.dtype))
except IndexError:
raise ValueError('output_position out of bounds')
(mask * y).sum().backward()
# sum over channels and remove batch dimension
grad = torch.sum(x.grad, dim=1)[0]
tmp = torch.nonzero(grad, as_tuple=True)
low = [t.min().item() for t in tmp]
high = [t.max().item() for t in tmp]
return low, high
def estimate_conv_parameters(model, num_channels, num_dims, width, max_stride=10):
""" attempts to estimate receptive field size, strides and left paddings for given model
Parameters:
-----------
model: nn.Module or autograd function
fully convolutional model for which parameters are estimated
num_channels: Int
number of input channels for model
num_dims: Int
number of input dimensions for model (without channel dimension)
width: Int
width of the input tensor (a hyper-square) on which the receptive fields are derived via autograd
max_stride: Int, optional
assumed maximal stride of the model for any dimension, when set too low the function may fail for
any value of width
Returns:
--------
receptive_field_size: List[Int]
receptive field size in all dimension
strides: List[Int]
stride in all dimensions
left_paddings: List[Int]
left padding in all dimensions; this is relevant for aligning the receptive field on the input plane
Raises:
-------
ValueError, KeyError
"""
input_shape = [num_channels] + num_dims * [width]
output_position1 = num_dims * [width // (2 * max_stride)]
output_position2 = num_dims * [width // (2 * max_stride) + 1]
low1, high1 = receptive_field(model, input_shape, output_position1)
low2, high2 = receptive_field(model, input_shape, output_position2)
widths1 = [h - l + 1 for l, h in zip(low1, high1)]
widths2 = [h - l + 1 for l, h in zip(low2, high2)]
if not all([w1 - w2 == 0 for w1, w2 in zip(widths1, widths2)]) or not all([l1 != l2 for l1, l2 in zip(low1, low2)]):
raise ValueError("[estimate_strides]: width too small to determine strides")
receptive_field_size = widths1
strides = [l2 - l1 for l1, l2 in zip(low1, low2)]
left_paddings = [s * p - l for l, s, p in zip(low1, strides, output_position1)]
return receptive_field_size, strides, left_paddings
def inspect_conv_model(model, num_channels, num_dims, max_width=10000, width_hint=None, stride_hint=None, verbose=False):
""" determines size of receptive field, strides and padding probabilistically
Parameters:
-----------
model: nn.Module or autograd function
fully convolutional model for which parameters are estimated
num_channels: Int
number of input channels for model
num_dims: Int
number of input dimensions for model (without channel dimension)
max_width: Int
maximum width of the input tensor (a hyper-square) on which the receptive fields are derived via autograd
verbose: bool, optional
if true, the function prints parameters for individual trials
Returns:
--------
receptive_field_size: List[Int]
receptive field size in all dimension
strides: List[Int]
stride in all dimensions
left_paddings: List[Int]
left padding in all dimensions; this is relevant for aligning the receptive field on the input plane
Raises:
-------
ValueError
"""
max_stride = max_width // 2
stride = max_stride // 100
width = max_width // 100
if width_hint is not None: width = 2 * width_hint
if stride_hint is not None: stride = stride_hint
did_it = False
while width < max_width and stride < max_stride:
try:
if verbose: print(f"[inspect_conv_model] trying parameters {width=}, {stride=}")
receptive_field_size, strides, left_paddings = estimate_conv_parameters(model, num_channels, num_dims, width, stride)
did_it = True
except:
pass
if did_it: break
width *= 2
if width >= max_width and stride < max_stride:
stride *= 2
width = 2 * stride
if not did_it:
raise ValueError(f'could not determine conv parameter with given max_width={max_width}')
return receptive_field_size, strides, left_paddings
class GradWeight(torch.autograd.Function):
def __init__(self):
super().__init__()
@staticmethod
def forward(ctx, x, weight):
ctx.save_for_backward(weight)
return x.clone()
@staticmethod
def backward(ctx, grad_output):
weight, = ctx.saved_tensors
grad_input = grad_output * weight
return grad_input, None
# API
def relegance_gradient_weighting(x, weight):
"""
Args:
x (torch.tensor): input tensor
weight (torch.tensor or None): weight tensor for gradients of x; if None, no gradient weighting will be applied in backward pass
Returns:
torch.tensor: the unmodified input tensor x
"""
if weight is None:
return x
else:
return GradWeight.apply(x, weight)
def relegance_create_tconv_kernel(model, num_channels, num_dims, width_hint=None, stride_hint=None, verbose=False):
""" creates parameters for mapping output-domain relevance back to the input domain
Args:
model (nn.Module or autograd.Function): fully convolutional model
num_channels (int): number of input channels to model
num_dims (int): number of input dimensions of model (without channel and batch dimension)
width_hint(int or None): optional hint at maximal width of receptive field
stride_hint(int or None): optional hint at maximal stride
Returns:
dict: contains kernel, kernel dimensions, strides and left paddings for transposed convolution
"""
max_width = int(100000 / (10 ** num_dims))
did_it = False
try:
receptive_field_size, strides, left_paddings = inspect_conv_model(model, num_channels, num_dims, max_width=max_width, width_hint=width_hint, stride_hint=stride_hint, verbose=verbose)
did_it = True
except:
# try once again with larger max_width
max_width *= 10
# crash if exception is raised
try:
if not did_it: receptive_field_size, strides, left_paddings = inspect_conv_model(model, num_channels, num_dims, max_width=max_width, width_hint=width_hint, stride_hint=stride_hint, verbose=verbose)
except:
raise RuntimeError("could not determine parameters within given compute budget")
partition_kernel = create_partition_kernel(receptive_field_size, strides)
partition_kernel = torch.repeat_interleave(partition_kernel, num_channels, 1)
tconv_parameters = {
'kernel': partition_kernel,
'receptive_field_shape': receptive_field_size,
'stride': strides,
'left_padding': left_paddings,
'num_dims': num_dims
}
return tconv_parameters
def relegance_map_relevance_to_input_domain(od_relevance, tconv_parameters):
""" maps output-domain relevance to input-domain relevance via transpose convolution
Args:
od_relevance (torch.tensor): output-domain relevance
tconv_parameters (dict): parameter dict as created by relegance_create_tconv_kernel
Returns:
torch.tensor: input-domain relevance. The tensor is left aligned, i.e. the all-zero index of the output corresponds to the all-zero index of the discriminator input.
Otherwise, the size of the output tensor does not need to match the size of the discriminator input. Use relegance_resize_relevance_to_input_size for a
convenient way to adjust the output to the correct size.
Raises:
ValueError: if number of dimensions is not supported
"""
kernel = tconv_parameters['kernel'].to(od_relevance.device)
rf_shape = tconv_parameters['receptive_field_shape']
stride = tconv_parameters['stride']
left_padding = tconv_parameters['left_padding']
num_dims = len(kernel.shape) - 2
# repeat boundary values
od_padding = [rf_shape[i//2] // stride[i//2] + 1 for i in range(2 * num_dims)]
padded_od_relevance = F.pad(od_relevance, od_padding[::-1], mode='replicate')
od_padding = od_padding[::2]
# apply mapping and left trimming
if num_dims == 1:
id_relevance = F.conv_transpose1d(padded_od_relevance, kernel, stride=stride)
id_relevance = id_relevance[..., left_padding[0] + stride[0] * od_padding[0] :]
elif num_dims == 2:
id_relevance = F.conv_transpose2d(padded_od_relevance, kernel, stride=stride)
id_relevance = id_relevance[..., left_padding[0] + stride[0] * od_padding[0] :, left_padding[1] + stride[1] * od_padding[1]:]
elif num_dims == 3:
id_relevance = F.conv_transpose3d(padded_od_relevance, kernel, stride=stride)
id_relevance = id_relevance[..., left_padding[0] + stride[0] * od_padding[0] :, left_padding[1] + stride[1] * od_padding[1]:, left_padding[2] + stride[2] * od_padding[2] :]
else:
raise ValueError(f'[relegance_map_to_input_domain] error: num_dims = {num_dims} not supported')
return id_relevance
def relegance_resize_relevance_to_input_size(reference_input, relevance):
""" adjusts size of relevance tensor to reference input size
Args:
reference_input (torch.tensor): discriminator input tensor for reference
relevance (torch.tensor): input-domain relevance corresponding to input tensor reference_input
Returns:
torch.tensor: resized relevance
Raises:
ValueError: if number of dimensions is not supported
"""
resized_relevance = torch.zeros_like(reference_input)
num_dims = len(reference_input.shape) - 2
with torch.no_grad():
if num_dims == 1:
resized_relevance[:] = relevance[..., : min(reference_input.size(-1), relevance.size(-1))]
elif num_dims == 2:
resized_relevance[:] = relevance[..., : min(reference_input.size(-2), relevance.size(-2)), : min(reference_input.size(-1), relevance.size(-1))]
elif num_dims == 3:
resized_relevance[:] = relevance[..., : min(reference_input.size(-3), relevance.size(-3)), : min(reference_input.size(-2), relevance.size(-2)), : min(reference_input.size(-1), relevance.size(-1))]
else:
raise ValueError(f'[relegance_map_to_input_domain] error: num_dims = {num_dims} not supported')
return resized_relevance
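The mapping back to the input domain rests on a transposed convolution with a partition-of-unity kernel. A minimal 1-D numpy sketch of that operation and the tiling property it relies on (the module uses `F.conv_transpose{1,2,3}d` with the kernel from `create_partition_kernel`):

```python
import numpy as np

def tconv1d(signal, kernel, stride):
    """Minimal 1-D transposed convolution: a sum of shifted, scaled copies
    of the kernel, shifted by multiples of the stride."""
    out = np.zeros((len(signal) - 1) * stride + len(kernel))
    for i, s in enumerate(signal):
        out[i * stride : i * stride + len(kernel)] += s * kernel
    return out

# When the kernel's translations by the stride tile the axis (a partition
# of unity), constant output-domain relevance maps back to constant
# input-domain relevance, the property create_partition_kernel targets.
rel = tconv1d(np.ones(5), np.ones(2), stride=2)
```

Here a length-2 ones kernel with stride 2 tiles the axis exactly, so the mapped relevance is flat rather than showing stride-periodic ripple.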


@@ -0,0 +1,6 @@
from .gru_sparsifier import GRUSparsifier
from .conv1d_sparsifier import Conv1dSparsifier
from .conv_transpose1d_sparsifier import ConvTranspose1dSparsifier
from .linear_sparsifier import LinearSparsifier
from .common import sparsify_matrix, calculate_gru_flops_per_step
from .utils import mark_for_sparsification, create_sparsifier


@@ -0,0 +1,58 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
class BaseSparsifier:
def __init__(self, task_list, start, stop, interval, exponent=3):
# just copying parameters...
self.start = start
self.stop = stop
self.interval = interval
self.exponent = exponent
self.task_list = task_list
# ... and setting counter to 0
self.step_counter = 0
def step(self, verbose=False):
# compute current interpolation factor
self.step_counter += 1
if self.step_counter < self.start:
return
elif self.step_counter < self.stop:
# sparsify only every self.interval-th step
if self.step_counter % self.interval:
return
alpha = ((self.stop - self.step_counter) / (self.stop - self.start)) ** self.exponent
else:
alpha = 0
self.sparsify(alpha, verbose=verbose)
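The schedule implied by `step()` above can be written as a pure function, which makes the polynomial decay easy to inspect:

```python
def sparsification_alpha(step, start, stop, exponent=3):
    """Interpolation factor implied by BaseSparsifier.step():
    1 before `start` (fully dense), decaying polynomially to 0 at `stop`
    (target density reached). The density applied at a step is then
    alpha + (1 - alpha) * target_density."""
    if step < start:
        return 1.0
    if step >= stop:
        return 0.0
    return ((stop - step) / (stop - start)) ** exponent
```

With the default cubic exponent most of the density is removed early in the ramp, leaving the final training steps close to the target density.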


@@ -0,0 +1,123 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
debug=True
def sparsify_matrix(matrix : torch.tensor, density : float, block_size, keep_diagonal : bool=False, return_mask : bool=False):
""" sparsifies matrix with specified block size
Parameters:
-----------
matrix : torch.tensor
matrix to sparsify
density : float
target density
block_size : [int, int]
block size dimensions
keep_diagonal : bool
If true, the diagonal will be kept. This option requires block_size[0] == block_size[1] and defaults to False
"""
m, n = matrix.shape
m1, n1 = block_size
if m % m1 or n % n1:
raise ValueError(f"block size {(m1, n1)} does not divide matrix size {(m, n)}")
# extract diagonal if keep_diagonal = True
if keep_diagonal:
if m != n:
raise ValueError("Attempting to sparsify non-square matrix with keep_diagonal=True")
to_spare = torch.diag(torch.diag(matrix))
matrix = matrix - to_spare
else:
to_spare = torch.zeros_like(matrix)
# calculate energy in sub-blocks
x = torch.reshape(matrix, (m // m1, m1, n // n1, n1))
x = x ** 2
block_energies = torch.sum(torch.sum(x, dim=3), dim=1)
number_of_blocks = (m * n) // (m1 * n1)
number_of_survivors = round(number_of_blocks * density)
# masking threshold
if number_of_survivors == 0:
threshold = 0
else:
threshold = torch.sort(torch.flatten(block_energies)).values[-number_of_survivors]
# create mask
mask = torch.ones_like(block_energies)
mask[block_energies < threshold] = 0
mask = torch.repeat_interleave(mask, m1, dim=0)
mask = torch.repeat_interleave(mask, n1, dim=1)
# perform masking
masked_matrix = mask * matrix + to_spare
if return_mask:
return masked_matrix, mask
else:
return masked_matrix
def calculate_gru_flops_per_step(gru, sparsification_dict=dict(), drop_input=False):
input_size = gru.input_size
hidden_size = gru.hidden_size
flops = 0
input_density = (
sparsification_dict.get('W_ir', [1])[0]
+ sparsification_dict.get('W_in', [1])[0]
+ sparsification_dict.get('W_iz', [1])[0]
) / 3
recurrent_density = (
sparsification_dict.get('W_hr', [1])[0]
+ sparsification_dict.get('W_hn', [1])[0]
+ sparsification_dict.get('W_hz', [1])[0]
) / 3
# input matrix vector multiplications
if not drop_input:
flops += 2 * 3 * input_size * hidden_size * input_density
# recurrent matrix vector multiplications
flops += 2 * 3 * hidden_size * hidden_size * recurrent_density
# biases
flops += 6 * hidden_size
# activations estimated by 10 flops per activation
flops += 30 * hidden_size
return flops
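The same accounting as `calculate_gru_flops_per_step` can be expressed with sizes and densities passed directly, instead of a `torch.nn.GRU` and a sparsification dict; a sketch for back-of-the-envelope estimates:

```python
def gru_flops_per_step(input_size, hidden_size,
                       input_density=1.0, recurrent_density=1.0,
                       drop_input=False):
    """Per-step GRU flops estimate mirroring the torch-based helper:
    multiply-adds for the 3 gates' input and recurrent matrices, scaled
    by block-sparsity densities, plus biases and ~10 flops/activation."""
    flops = 0
    if not drop_input:
        # input matrix-vector products (multiply + add, 3 gates)
        flops += 2 * 3 * input_size * hidden_size * input_density
    # recurrent matrix-vector products (multiply + add, 3 gates)
    flops += 2 * 3 * hidden_size * hidden_size * recurrent_density
    flops += 6 * hidden_size    # biases
    flops += 30 * hidden_size   # activations, estimated at 10 flops each
    return flops
```

For LPCNet-style budgeting this is the quantity multiplied by the sample rate to get a complexity figure in flops per second.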


@@ -0,0 +1,133 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug
class Conv1dSparsifier(BaseSparsifier):
def __init__(self, task_list, start, stop, interval, exponent=3):
""" Sparsifier for torch.nn.Conv1d layers
Parameters:
-----------
task_list : list
task_list contains a list of tuples (conv1d, params), where conv1d is an instance
of torch.nn.Conv1d and params is a tuple (density, [m, n]),
where density is the target density in [0, 1], [m, n] is the shape sub-blocks to which
sparsification is applied.
start : int
training step after which sparsification will be started.
stop : int
training step after which sparsification will be completed.
interval : int
sparsification interval for steps between start and stop. After stop sparsification will be
carried out after every call to Conv1dSparsifier.step()
exponent : float
Interpolation exponent for sparsification interval. In step i sparsification will be carried out
with density (alpha + target_density * (1 - alpha)), where
alpha = ((stop - i) / (stop - start)) ** exponent
Example:
--------
>>> import torch
>>> conv = torch.nn.Conv1d(8, 16, 8)
>>> params = (0.2, [8, 4])
>>> sparsifier = Conv1dSparsifier([(conv, params)], 0, 100, 50)
>>> for i in range(100):
... sparsifier.step()
"""
super().__init__(task_list, start, stop, interval, exponent=exponent)
self.last_mask = None
def sparsify(self, alpha, verbose=False):
""" carries out sparsification step
Call this function after optimizer.step in your
training loop.
Parameters:
----------
alpha : float
density interpolation parameter (1: dense, 0: target density)
verbose : bool
if true, densities are printed out
Returns:
--------
None
"""
with torch.no_grad():
for conv, params in self.task_list:
# reshape weight
if hasattr(conv, 'weight_v'):
weight = conv.weight_v
else:
weight = conv.weight
# Conv1d weight has shape (out_channels, in_channels, kernel_size)
o, i, k = weight.shape
w = weight.permute(0, 2, 1).flatten(1)
target_density, block_size = params
density = alpha + (1 - alpha) * target_density
w, new_mask = sparsify_matrix(w, density, block_size, return_mask=True)
w = w.reshape(o, k, i).permute(0, 2, 1)
weight[:] = w
if self.last_mask is not None:
if not torch.all(self.last_mask * new_mask == new_mask) and debug:
print("weight resurrection in conv.weight")
self.last_mask = new_mask
if verbose:
print(f"conv1d_sparsier[{self.step_counter}]: {density=}")
if __name__ == "__main__":
print("Testing sparsifier")
import torch
conv = torch.nn.Conv1d(8, 16, 8)
params = (0.2, [8, 4])
sparsifier = Conv1dSparsifier([(conv, params)], 0, 100, 5)
for i in range(100):
sparsifier.step(verbose=True)
print(conv.weight)
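The `sparsify_matrix` helper is imported from `.common` and not shown in this diff. As a rough sketch of what block-wise magnitude pruning of this kind typically does (the function name, scoring, and tie-breaking here are assumptions, not the actual `.common` implementation):

```python
import torch

def block_magnitude_sparsify(w, density, block_size):
    # Keep the highest-energy (m x n) blocks of w, zero out the rest.
    # Rough sketch only; the real sparsify_matrix in .common may differ.
    m, n = block_size
    rows, cols = w.shape
    assert rows % m == 0 and cols % n == 0
    blocks = w.reshape(rows // m, m, cols // n, n)
    energy = (blocks ** 2).sum(dim=(1, 3))            # one score per block
    k = max(1, int(round(density * energy.numel())))  # number of blocks to keep
    thresh = torch.topk(energy.flatten(), k).values[-1]
    mask = (energy >= thresh).to(w.dtype)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

torch.manual_seed(0)
w = block_magnitude_sparsify(torch.randn(8, 8), 0.25, [4, 4])
```

With density 0.25 and 4x4 blocks on an 8x8 matrix, exactly one of the four blocks survives, so 16 of the 64 entries remain nonzero.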


@@ -0,0 +1,134 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug
class ConvTranspose1dSparsifier(BaseSparsifier):
def __init__(self, task_list, start, stop, interval, exponent=3):
""" Sparsifier for torch.nn.GRUs
Parameters:
-----------
task_list : list
task_list contains a list of tuples (conv, params), where conv is an instance
of torch.nn.ConvTranspose1d and params is a tuple (density, [m, n]),
where density is the target density in [0, 1], [m, n] is the shape sub-blocks to which
sparsification is applied.
start : int
training step after which sparsification will be started.
stop : int
training step after which sparsification will be completed.
interval : int
sparsification interval for steps between start and stop. After stop, sparsification will be
carried out after every call to ConvTranspose1dSparsifier.step()
exponent : float
Interpolation exponent for the sparsification schedule. In step i, sparsification will be carried out
with density (alpha + (1 - alpha) * target_density), where
alpha = ((stop - i) / (stop - start)) ** exponent
Example:
--------
>>> import torch
>>> conv = torch.nn.ConvTranspose1d(8, 16, 8)
>>> params = (0.2, [8, 4])
>>> sparsifier = ConvTranspose1dSparsifier([(conv, params)], 0, 100, 50)
>>> for i in range(100):
... sparsifier.step()
"""
super().__init__(task_list, start, stop, interval, exponent=exponent)
self.last_mask = None
def sparsify(self, alpha, verbose=False):
""" carries out sparsification step
Call this function after optimizer.step in your
training loop.
Parameters:
----------
alpha : float
density interpolation parameter (1: dense, 0: target density)
verbose : bool
if true, densities are printed out
Returns:
--------
None
"""
with torch.no_grad():
for conv, params in self.task_list:
# reshape weight
if hasattr(conv, 'weight_v'):
weight = conv.weight_v
else:
weight = conv.weight
i, o, k = weight.shape
w = weight.permute(2, 1, 0).reshape(k * o, i)
target_density, block_size = params
density = alpha + (1 - alpha) * target_density
w, new_mask = sparsify_matrix(w, density, block_size, return_mask=True)
w = w.reshape(k, o, i).permute(2, 1, 0)
weight[:] = w
if self.last_mask is not None:
if not torch.all(self.last_mask * new_mask == new_mask) and debug:
print("weight resurrection in conv.weight")
self.last_mask = new_mask
if verbose:
print(f"convtrans1d_sparsier[{self.step_counter}]: {density=}")
if __name__ == "__main__":
print("Testing sparsifier")
import torch
conv = torch.nn.ConvTranspose1d(8, 16, 4, 4)
params = (0.2, [8, 4])
sparsifier = ConvTranspose1dSparsifier([(conv, params)], 0, 100, 5)
for i in range(100):
sparsifier.step(verbose=True)
print(conv.weight)


@@ -0,0 +1,178 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug
class GRUSparsifier(BaseSparsifier):
def __init__(self, task_list, start, stop, interval, exponent=3):
""" Sparsifier for torch.nn.GRUs
Parameters:
-----------
task_list : list
task_list contains a list of tuples (gru, sparsify_dict), where gru is an instance
of torch.nn.GRU and sparsify_dict is a dictionary with keys in {'W_ir', 'W_iz', 'W_in',
'W_hr', 'W_hz', 'W_hn'} corresponding to the input and recurrent weights for the reset,
update, and new gate. The values of sparsify_dict are tuples (density, [m, n], keep_diagonal),
where density is the target density in [0, 1], [m, n] is the shape sub-blocks to which
sparsification is applied and keep_diagonal is a bool variable indicating whether the diagonal
should be kept.
start : int
training step after which sparsification will be started.
stop : int
training step after which sparsification will be completed.
interval : int
sparsification interval for steps between start and stop. After stop sparsification will be
carried out after every call to GRUSparsifier.step()
exponent : float
Interpolation exponent for the sparsification schedule. In step i, sparsification will be carried out
with density (alpha + (1 - alpha) * target_density), where
alpha = ((stop - i) / (stop - start)) ** exponent
Example:
--------
>>> import torch
>>> gru = torch.nn.GRU(10, 20)
>>> sparsify_dict = {
... 'W_ir' : (0.5, [2, 2], False),
... 'W_iz' : (0.6, [2, 2], False),
... 'W_in' : (0.7, [2, 2], False),
... 'W_hr' : (0.1, [4, 4], True),
... 'W_hz' : (0.2, [4, 4], True),
... 'W_hn' : (0.3, [4, 4], True),
... }
>>> sparsifier = GRUSparsifier([(gru, sparsify_dict)], 0, 100, 50)
>>> for i in range(100):
... sparsifier.step()
"""
super().__init__(task_list, start, stop, interval, exponent=exponent)
self.last_masks = {key : None for key in ['W_ir', 'W_in', 'W_iz', 'W_hr', 'W_hn', 'W_hz']}
def sparsify(self, alpha, verbose=False):
""" carries out sparsification step
Call this function after optimizer.step in your
training loop.
Parameters:
----------
alpha : float
density interpolation parameter (1: dense, 0: target density)
verbose : bool
if true, densities are printed out
Returns:
--------
None
"""
with torch.no_grad():
for gru, params in self.task_list:
hidden_size = gru.hidden_size
# input weights
for i, key in enumerate(['W_ir', 'W_iz', 'W_in']):
if key in params:
if hasattr(gru, 'weight_ih_l0_v'):
weight = gru.weight_ih_l0_v
else:
weight = gru.weight_ih_l0
density = alpha + (1 - alpha) * params[key][0]
if verbose:
print(f"[{self.step_counter}]: {key} density: {density}")
weight[i * hidden_size : (i+1) * hidden_size, : ], new_mask = sparsify_matrix(
weight[i * hidden_size : (i + 1) * hidden_size, : ],
density, # density
params[key][1], # block_size
params[key][2], # keep_diagonal (might want to set this to False)
return_mask=True
)
if self.last_masks[key] is not None:
if not torch.all(self.last_masks[key] * new_mask == new_mask) and debug:
print("weight resurrection in weight_ih_l0_v")
self.last_masks[key] = new_mask
# recurrent weights
for i, key in enumerate(['W_hr', 'W_hz', 'W_hn']):
if key in params:
if hasattr(gru, 'weight_hh_l0_v'):
weight = gru.weight_hh_l0_v
else:
weight = gru.weight_hh_l0
density = alpha + (1 - alpha) * params[key][0]
if verbose:
print(f"[{self.step_counter}]: {key} density: {density}")
weight[i * hidden_size : (i+1) * hidden_size, : ], new_mask = sparsify_matrix(
weight[i * hidden_size : (i + 1) * hidden_size, : ],
density,
params[key][1], # block_size
params[key][2], # keep_diagonal (might want to set this to False)
return_mask=True
)
if self.last_masks[key] is not None:
if not torch.all(self.last_masks[key] * new_mask == new_mask) and debug:
print("weight resurrection in weight_hh_l0_v")
self.last_masks[key] = new_mask
if __name__ == "__main__":
print("Testing sparsifier")
gru = torch.nn.GRU(10, 20)
sparsify_dict = {
'W_ir' : (0.5, [2, 2], False),
'W_iz' : (0.6, [2, 2], False),
'W_in' : (0.7, [2, 2], False),
'W_hr' : (0.1, [4, 4], True),
'W_hz' : (0.2, [4, 4], True),
'W_hn' : (0.3, [4, 4], True),
}
sparsifier = GRUSparsifier([(gru, sparsify_dict)], 0, 100, 10)
for i in range(100):
sparsifier.step(verbose=True)
print(gru.weight_hh_l0)
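Between `start` and `stop` the effective density interpolates from fully dense down to the per-gate target. A standalone sketch of that schedule (the form is inferred from the docstrings and the `density = alpha + (1 - alpha) * target_density` line above):

```python
def density_schedule(step, start, stop, target_density, exponent=3):
    # Density applied at a given training step: 1.0 before start,
    # target_density after stop, polynomial interpolation in between.
    if step <= start:
        return 1.0
    if step >= stop:
        return float(target_density)
    alpha = ((stop - step) / (stop - start)) ** exponent
    return alpha + (1 - alpha) * target_density

print(density_schedule(50, 0, 100, 0.3))  # 0.3875
```

The cubic exponent keeps the density close to 1.0 early on, so most of the pruning happens in the later part of the schedule.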


@@ -0,0 +1,128 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import torch
from .base_sparsifier import BaseSparsifier
from .common import sparsify_matrix, debug
class LinearSparsifier(BaseSparsifier):
def __init__(self, task_list, start, stop, interval, exponent=3):
""" Sparsifier for torch.nn.GRUs
Parameters:
-----------
task_list : list
task_list contains a list of tuples (linear, params), where linear is an instance
of torch.nn.Linear and params is a tuple (density, [m, n]),
where density is the target density in [0, 1], [m, n] is the shape sub-blocks to which
sparsification is applied.
start : int
training step after which sparsification will be started.
stop : int
training step after which sparsification will be completed.
interval : int
sparsification interval for steps between start and stop. After stop, sparsification will be
carried out after every call to LinearSparsifier.step()
exponent : float
Interpolation exponent for the sparsification schedule. In step i, sparsification will be carried out
with density (alpha + (1 - alpha) * target_density), where
alpha = ((stop - i) / (stop - start)) ** exponent
Example:
--------
>>> import torch
>>> linear = torch.nn.Linear(8, 16)
>>> params = (0.2, [8, 4])
>>> sparsifier = LinearSparsifier([(linear, params)], 0, 100, 50)
>>> for i in range(100):
... sparsifier.step()
"""
super().__init__(task_list, start, stop, interval, exponent=exponent)
self.last_mask = None
def sparsify(self, alpha, verbose=False):
""" carries out sparsification step
Call this function after optimizer.step in your
training loop.
Parameters:
----------
alpha : float
density interpolation parameter (1: dense, 0: target density)
verbose : bool
if true, densities are printed out
Returns:
--------
None
"""
with torch.no_grad():
for linear, params in self.task_list:
if hasattr(linear, 'weight_v'):
weight = linear.weight_v
else:
weight = linear.weight
target_density, block_size = params
density = alpha + (1 - alpha) * target_density
weight[:], new_mask = sparsify_matrix(weight, density, block_size, return_mask=True)
if self.last_mask is not None:
if not torch.all(self.last_mask * new_mask == new_mask) and debug:
print("weight resurrection in conv.weight")
self.last_mask = new_mask
if verbose:
print(f"linear_sparsifier[{self.step_counter}]: {density=}")
if __name__ == "__main__":
print("Testing sparsifier")
import torch
linear = torch.nn.Linear(8, 16)
params = (0.2, [4, 2])
sparsifier = LinearSparsifier([(linear, params)], 0, 100, 5)
for i in range(100):
sparsifier.step(verbose=True)
print(linear.weight)


@@ -0,0 +1,64 @@
import torch
from dnntools.sparsification import GRUSparsifier, LinearSparsifier, Conv1dSparsifier, ConvTranspose1dSparsifier
def mark_for_sparsification(module, params):
setattr(module, 'sparsify', True)
setattr(module, 'sparsification_params', params)
return module
def create_sparsifier(module, start, stop, interval):
sparsifier_list = []
for m in module.modules():
if hasattr(m, 'sparsify'):
if isinstance(m, torch.nn.GRU):
sparsifier_list.append(
GRUSparsifier([(m, m.sparsification_params)], start, stop, interval)
)
elif isinstance(m, torch.nn.Linear):
sparsifier_list.append(
LinearSparsifier([(m, m.sparsification_params)], start, stop, interval)
)
elif isinstance(m, torch.nn.Conv1d):
sparsifier_list.append(
Conv1dSparsifier([(m, m.sparsification_params)], start, stop, interval)
)
elif isinstance(m, torch.nn.ConvTranspose1d):
sparsifier_list.append(
ConvTranspose1dSparsifier([(m, m.sparsification_params)], start, stop, interval)
)
else:
print(f"[create_sparsifier] warning: module {m} marked for sparsification but no suitable sparsifier exists.")
def sparsify(verbose=False):
for sparsifier in sparsifier_list:
sparsifier.step(verbose)
return sparsify
def count_parameters(model, verbose=False):
total = 0
for name, p in model.named_parameters():
count = p.numel()
if verbose:
print(f"{name}: {count} parameters")
total += count
return total
def estimate_nonzero_parameters(module):
num_zero_parameters = 0
if hasattr(module, 'sparsify'):
params = module.sparsification_params
if isinstance(module, torch.nn.Conv1d) or isinstance(module, torch.nn.ConvTranspose1d):
num_zero_parameters = torch.ones_like(module.weight).sum().item() * (1 - params[0])
elif isinstance(module, torch.nn.GRU):
num_zero_parameters = module.input_size * module.hidden_size * (3 - params['W_ir'][0] - params['W_iz'][0] - params['W_in'][0])
num_zero_parameters += module.hidden_size * module.hidden_size * (3 - params['W_hr'][0] - params['W_hz'][0] - params['W_hn'][0])
elif isinstance(module, torch.nn.Linear):
num_zero_parameters = module.in_features * module.out_features * (1 - params[0])
else:
raise ValueError(f'unknown sparsification method for module of type {type(module)}')
return num_zero_parameters
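For the GRU branch above, the expected number of zeroed weights follows directly from the per-gate densities: each of the three input and three recurrent gate matrices contributes `(1 - density)` of its weights as zeros. A small self-contained check, with densities taken from the example `sparsify_dict` used elsewhere in this package:

```python
import torch

def expected_zeros_gru(gru, d):
    # Mirrors the GRU branch of estimate_nonzero_parameters.
    z = gru.input_size * gru.hidden_size * (3 - d['W_ir'] - d['W_iz'] - d['W_in'])
    z += gru.hidden_size * gru.hidden_size * (3 - d['W_hr'] - d['W_hz'] - d['W_hn'])
    return z

densities = {'W_ir': 0.5, 'W_iz': 0.6, 'W_in': 0.7,
             'W_hr': 0.1, 'W_hz': 0.2, 'W_hn': 0.3}
gru = torch.nn.GRU(10, 20)
print(expected_zeros_gru(gru, densities))  # 1200.0
```

Here the input weights contribute 10 * 20 * 1.2 = 240 zeros and the recurrent weights 20 * 20 * 2.4 = 960, for 1200 in total.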


@@ -0,0 +1 @@
torch


@@ -0,0 +1,48 @@
"""
/* Copyright (c) 2023 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
#!/usr/bin/env python
import os
from setuptools import setup
lib_folder = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(lib_folder, 'requirements.txt'), 'r') as f:
install_requires = list(f.read().splitlines())
print(install_requires)
setup(name='dnntools',
version='1.0',
author='Jan Buethe',
author_email='jbuethe@amazon.de',
description='Non-Standard tools for deep neural network training with PyTorch',
packages=['dnntools', 'dnntools.sparsification', 'dnntools.quantization'],
install_requires=install_requires
)


@@ -0,0 +1,54 @@
# Framewise Auto-Regressive GAN (FARGAN)
Implementation of FARGAN, a low-complexity neural vocoder. Pre-trained models
are provided as C code in the dnn/ directory with the corresponding model in
dnn/models/ directory (name starts with fargan_). If you don't want to train
a new FARGAN model, you can skip straight to the Inference section.
## Data preparation
For data preparation you need to build Opus as detailed in the top-level README.
You will need to use the --enable-deep-plc configure option.
The build will produce an executable named "dump_data".
To prepare the training data, run:
```
./dump_data -train in_speech.pcm out_features.f32 out_speech.pcm
```
Where the in_speech.pcm speech file is a raw 16-bit PCM file sampled at 16 kHz.
The speech data used for training the model can be found at:
https://media.xiph.org/lpcnet/speech/tts_speech_negative_16k.sw
## Training
To perform pre-training, run the following command:
```
python ./train_fargan.py out_features.f32 out_speech.pcm output_dir --epochs 400 --batch-size 4096 --lr 0.002 --cuda-visible-devices 0
```
Once pre-training is complete, run adversarial training using:
```
python adv_train_fargan.py out_features.f32 out_speech.pcm output_dir --lr 0.000002 --reg-weight 5 --batch-size 160 --cuda-visible-devices 0 --initial-checkpoint output_dir/checkpoints/fargan_400.pth
```
The final model will be in output_dir/checkpoints/fargan_adv_50.pth.
The model can optionally be converted to C using:
```
python dump_fargan_weights.py output_dir/checkpoints/fargan_adv_50.pth fargan_c_dir
```
which will create a fargan_data.c and a fargan_data.h file in the fargan_c_dir directory.
Copy these files to the opus/dnn/ directory (replacing the existing ones) and recompile Opus.
## Inference
To run the inference, start by generating the features from the audio using:
```
./fargan_demo -features test_speech.pcm test_features.f32
```
Synthesis can be achieved either using the PyTorch code or the C code.
To synthesize from PyTorch, run:
```
python test_fargan.py output_dir/checkpoints/fargan_adv_50.pth test_features.f32 output_speech.pcm
```
To synthesize from the C code, run:
```
./fargan_demo -fargan-synthesis test_features.f32 output_speech.pcm
```
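The `.pcm` files above are headerless 16-bit little-endian mono audio at 16 kHz. To listen to the synthesized output in a regular audio player, it can be wrapped in a WAV header, for example with this small Python helper (filenames are just examples):

```python
import wave

import numpy as np

def raw_pcm_to_wav(pcm_path, wav_path, rate=16000):
    # Wrap headerless 16-bit little-endian mono PCM in a WAV container.
    pcm = np.fromfile(pcm_path, dtype="<i2")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)    # 16-bit samples
        w.setframerate(rate)
        w.writeframes(pcm.tobytes())

# e.g. raw_pcm_to_wav("output_speech.pcm", "output_speech.wav")
```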


@@ -0,0 +1,278 @@
import os
import argparse
import random
import numpy as np
import sys
import math as m
import torch
from torch import nn
import torch.nn.functional as F
import tqdm
import fargan
from dataset import FARGANDataset
from stft_loss import *
source_dir = os.path.split(os.path.abspath(__file__))[0]
sys.path.append(os.path.join(source_dir, "../osce/"))
import models as osce_models
def fmap_loss(scores_real, scores_gen):
num_discs = len(scores_real)
loss_feat = 0
for k in range(num_discs):
num_layers = len(scores_gen[k]) - 1
f = 4 / num_discs / num_layers
for l in range(num_layers):
loss_feat += f * F.l1_loss(scores_gen[k][l], scores_real[k][l].detach())
return loss_feat
parser = argparse.ArgumentParser()
parser.add_argument('features', type=str, help='path to feature file in .f32 format')
parser.add_argument('signal', type=str, help='path to signal file in .s16 format')
parser.add_argument('output', type=str, help='path to output folder')
parser.add_argument('--suffix', type=str, help="model name suffix", default="")
parser.add_argument('--cuda-visible-devices', type=str, help="comma-separated list of CUDA visible device indices, default: CUDA_VISIBLE_DEVICES", default=None)
model_group = parser.add_argument_group(title="model parameters")
model_group.add_argument('--cond-size', type=int, help="first conditioning size, default: 256", default=256)
model_group.add_argument('--gamma', type=float, help="Use A(z/gamma), default: 0.9", default=0.9)
model_group.add_argument('--softquant', action="store_true", help="enables soft quantization during training")
training_group = parser.add_argument_group(title="training parameters")
training_group.add_argument('--batch-size', type=int, help="batch size, default: 128", default=128)
training_group.add_argument('--lr', type=float, help='learning rate, default: 5e-4', default=5e-4)
training_group.add_argument('--epochs', type=int, help='number of training epochs, default: 50', default=50)
training_group.add_argument('--sequence-length', type=int, help='sequence length, default: 60', default=60)
training_group.add_argument('--lr-decay', type=float, help='learning rate decay factor, default: 0.0', default=0.0)
training_group.add_argument('--initial-checkpoint', type=str, help='initial checkpoint to start training from, default: None', default=None)
training_group.add_argument('--reg-weight', type=float, help='regression loss weight, default: 1.0', default=1.0)
training_group.add_argument('--fmap-weight', type=float, help='feature matching loss weight, default: 1.0', default=1.)
args = parser.parse_args()
if args.cuda_visible_devices is not None:
os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible_devices
# checkpoints
checkpoint_dir = os.path.join(args.output, 'checkpoints')
checkpoint = dict()
os.makedirs(checkpoint_dir, exist_ok=True)
# training parameters
batch_size = args.batch_size
lr = args.lr
epochs = args.epochs
sequence_length = args.sequence_length
lr_decay = args.lr_decay
adam_betas = [0.8, 0.99]
adam_eps = 1e-8
features_file = args.features
signal_file = args.signal
# model parameters
cond_size = args.cond_size
checkpoint['batch_size'] = batch_size
checkpoint['lr'] = lr
checkpoint['lr_decay'] = lr_decay
checkpoint['epochs'] = epochs
checkpoint['sequence_length'] = sequence_length
checkpoint['adam_betas'] = adam_betas
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
checkpoint['model_args'] = ()
checkpoint['model_kwargs'] = {'cond_size': cond_size, 'gamma': args.gamma, 'softquant': args.softquant}
print(checkpoint['model_kwargs'])
model = fargan.FARGAN(*checkpoint['model_args'], **checkpoint['model_kwargs'])
#discriminator
disc_name = 'fdmresdisc'
disc = osce_models.model_dict[disc_name](
architecture='free',
design='f_down',
fft_sizes_16k=[2**n for n in range(6, 12)],
freq_roi=[0, 7400],
max_channels=256,
noise_gain=0.0
)
if args.initial_checkpoint is not None:
checkpoint = torch.load(args.initial_checkpoint, map_location='cpu')
model.load_state_dict(checkpoint['state_dict'], strict=False)
checkpoint['state_dict'] = model.state_dict()
dataset = FARGANDataset(features_file, signal_file, sequence_length=sequence_length)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, betas=adam_betas, eps=adam_eps)
optimizer_disc = torch.optim.AdamW([p for p in disc.parameters() if p.requires_grad], lr=lr, betas=adam_betas, eps=adam_eps)
# learning rate scheduler
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lambda x : 1 / (1 + lr_decay * x))
scheduler_disc = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer_disc, lr_lambda=lambda x : 1 / (1 + lr_decay * x))
states = None
spect_loss = MultiResolutionSTFTLoss(device).to(device)
for param in model.parameters():
param.requires_grad = False
batch_count = 0
if __name__ == '__main__':
model.to(device)
disc.to(device)
for epoch in range(1, epochs + 1):
m_r = 0
m_f = 0
s_r = 1
s_f = 1
running_cont_loss = 0
running_disc_loss = 0
running_gen_loss = 0
running_fmap_loss = 0
running_reg_loss = 0
running_wc = 0
print(f"training epoch {epoch}...")
with tqdm.tqdm(dataloader, unit='batch') as tepoch:
for i, (features, periods, target, lpc) in enumerate(tepoch):
if epoch == 1 and i == 400:
for param in model.parameters():
param.requires_grad = True
for param in model.cond_net.parameters():
param.requires_grad = False
for param in model.sig_net.cond_gain_dense.parameters():
param.requires_grad = False
optimizer.zero_grad()
features = features.to(device)
#lpc = lpc.to(device)
#lpc = lpc*(args.gamma**torch.arange(1,17, device=device))
#lpc = fargan.interp_lpc(lpc, 4)
periods = periods.to(device)
if True:
target = target[:, :sequence_length*160]
#lpc = lpc[:,:sequence_length*4,:]
features = features[:,:sequence_length+4,:]
periods = periods[:,:sequence_length+4]
else:
target=target[::2, :]
#lpc=lpc[::2,:]
features=features[::2,:]
periods=periods[::2,:]
target = target.to(device)
#target = fargan.analysis_filter(target, lpc[:,:,:], nb_subframes=1, gamma=args.gamma)
#nb_pre = random.randrange(1, 6)
nb_pre = 2
pre = target[:, :nb_pre*160]
output, _ = model(features, periods, target.size(1)//160 - nb_pre, pre=pre, states=None)
output = torch.cat([pre, output], -1)
# discriminator update
scores_gen = disc(output.detach().unsqueeze(1))
scores_real = disc(target.unsqueeze(1))
disc_loss = 0
for scale in scores_gen:
disc_loss += ((scale[-1]) ** 2).mean()
m_f = 0.9 * m_f + 0.1 * scale[-1].detach().mean().cpu().item()
s_f = 0.9 * s_f + 0.1 * scale[-1].detach().std().cpu().item()
for scale in scores_real:
disc_loss += ((1 - scale[-1]) ** 2).mean()
m_r = 0.9 * m_r + 0.1 * scale[-1].detach().mean().cpu().item()
s_r = 0.9 * s_r + 0.1 * scale[-1].detach().std().cpu().item()
disc_loss = 0.5 * disc_loss / len(scores_gen)
winning_chance = 0.5 * m.erfc( (m_r - m_f) / m.sqrt(2 * (s_f**2 + s_r**2)) )
running_wc += winning_chance
disc.zero_grad()
disc_loss.backward()
optimizer_disc.step()
# model update
scores_gen = disc(output.unsqueeze(1))
if False: # todo: check whether that makes a difference
with torch.no_grad():
scores_real = disc(target.unsqueeze(1))
cont_loss = fargan.sig_loss(target[:, nb_pre*160:nb_pre*160+80], output[:, nb_pre*160:nb_pre*160+80])
specc_loss = spect_loss(output, target.detach())
reg_loss = (.00*cont_loss + specc_loss)
loss_gen = 0
for scale in scores_gen:
loss_gen += ((1 - scale[-1]) ** 2).mean() / len(scores_gen)
feat_loss = args.fmap_weight * fmap_loss(scores_real, scores_gen)
reg_weight = args.reg_weight# + 15./(1 + (batch_count/7600.))
gen_loss = reg_weight * reg_loss + feat_loss + loss_gen
model.zero_grad()
gen_loss.backward()
optimizer.step()
#model.clip_weights()
scheduler.step()
scheduler_disc.step()
running_cont_loss += cont_loss.detach().cpu().item()
running_gen_loss += loss_gen.detach().cpu().item()
running_disc_loss += disc_loss.detach().cpu().item()
running_fmap_loss += feat_loss.detach().cpu().item()
running_reg_loss += reg_loss.detach().cpu().item()
tepoch.set_postfix(cont_loss=f"{running_cont_loss/(i+1):8.5f}",
reg_weight=f"{reg_weight:8.5f}",
gen_loss=f"{running_gen_loss/(i+1):8.5f}",
disc_loss=f"{running_disc_loss/(i+1):8.5f}",
fmap_loss=f"{running_fmap_loss/(i+1):8.5f}",
reg_loss=f"{running_reg_loss/(i+1):8.5f}",
wc = f"{running_wc/(i+1):8.5f}",
)
batch_count = batch_count + 1
# save checkpoint
checkpoint_path = os.path.join(checkpoint_dir, f'fargan{args.suffix}_adv_{epoch}.pth')
checkpoint['state_dict'] = model.state_dict()
checkpoint['disc_state_dict'] = disc.state_dict()
checkpoint['loss'] = {
'cont': running_cont_loss / len(dataloader),
'gen': running_gen_loss / len(dataloader),
'disc': running_disc_loss / len(dataloader),
'fmap': running_fmap_loss / len(dataloader),
'reg': running_reg_loss / len(dataloader)
}
checkpoint['epoch'] = epoch
torch.save(checkpoint, checkpoint_path)
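The `winning_chance` statistic tracked in the loop above treats the discriminator scores for real and generated audio as roughly Gaussian with the tracked running means and standard deviations; under that assumption the probability that a generated sample out-scores a real one reduces to the erfc expression used in the loop. As a standalone check:

```python
import math

def winning_chance(m_r, m_f, s_r, s_f):
    # P(fake score > real score) for independent Gaussians
    # N(m_f, s_f^2) and N(m_r, s_r^2); same formula as in the training loop.
    return 0.5 * math.erfc((m_r - m_f) / math.sqrt(2 * (s_f ** 2 + s_r ** 2)))

print(winning_chance(0.0, 0.0, 1.0, 1.0))  # 0.5: identical score distributions
```

When the two score distributions coincide the generator "wins" half the time, which is why a winning chance near 0.5 indicates a balanced adversarial game.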


@@ -0,0 +1,61 @@
import torch
import numpy as np
import fargan
class FARGANDataset(torch.utils.data.Dataset):
def __init__(self,
feature_file,
signal_file,
frame_size=160,
sequence_length=15,
lookahead=1,
nb_used_features=20,
nb_features=36):
self.frame_size = frame_size
self.sequence_length = sequence_length
self.lookahead = lookahead
self.nb_features = nb_features
self.nb_used_features = nb_used_features
pcm_chunk_size = self.frame_size*self.sequence_length
self.data = np.memmap(signal_file, dtype='int16', mode='r')
#self.data = self.data[1::2]
self.nb_sequences = len(self.data)//(pcm_chunk_size)-4
self.data = self.data[(4-self.lookahead)*self.frame_size:]
self.data = self.data[:self.nb_sequences*pcm_chunk_size]
#self.data = np.reshape(self.data, (self.nb_sequences, pcm_chunk_size))
sizeof = self.data.strides[-1]
self.data = np.lib.stride_tricks.as_strided(self.data, shape=(self.nb_sequences, pcm_chunk_size*2),
strides=(pcm_chunk_size*sizeof, sizeof))
self.features = np.reshape(np.memmap(feature_file, dtype='float32', mode='r'), (-1, nb_features))
sizeof = self.features.strides[-1]
self.features = np.lib.stride_tricks.as_strided(self.features, shape=(self.nb_sequences, self.sequence_length*2+4, nb_features),
strides=(self.sequence_length*self.nb_features*sizeof, self.nb_features*sizeof, sizeof))
#self.periods = np.round(50*self.features[:,:,self.nb_used_features-2]+100).astype('int')
self.periods = np.round(np.clip(256./2**(self.features[:,:,self.nb_used_features-2]+1.5), 32, 255)).astype('int')
self.lpc = self.features[:, :, self.nb_used_features:]
self.features = self.features[:, :, :self.nb_used_features]
print("lpc_size:", self.lpc.shape)
def __len__(self):
return self.nb_sequences
def __getitem__(self, index):
features = self.features[index, :, :].copy()
if self.lookahead != 0:
lpc = self.lpc[index, 4-self.lookahead:-self.lookahead, :].copy()
else:
lpc = self.lpc[index, 4:, :].copy()
data = self.data[index, :].copy().astype(np.float32) / 2**15
periods = self.periods[index, :].copy()
#lpc = lpc*(self.gamma**np.arange(1,17))
#lpc=lpc[None,:,:]
#lpc = fargan.interp_lpc(lpc, 4)
#lpc=lpc[0,:,:]
return features, periods, data, lpc
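The `as_strided` calls above build overlapping views of the memory-mapped data without copying: each sequence is `pcm_chunk_size * 2` samples long but consecutive sequences advance by only `pcm_chunk_size`, so neighbouring items share half their samples. A toy version of the same trick:

```python
import numpy as np

data = np.arange(10, dtype=np.int16)
chunk = 2                        # stand-in for pcm_chunk_size
n_seq = len(data) // chunk - 1   # last window must still fit
sizeof = data.strides[-1]
# rows of length 2*chunk, starting chunk samples apart (50% overlap)
windows = np.lib.stride_tricks.as_strided(
    data, shape=(n_seq, chunk * 2), strides=(chunk * sizeof, sizeof))
print(windows.tolist())  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Because the rows are views into the same buffer, `__getitem__` above copies each slice (`.copy()`) before handing it to the training loop.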


@@ -0,0 +1,112 @@
import os
import sys
import argparse
import torch
from torch import nn
sys.path.append(os.path.join(os.path.split(__file__)[0], '../weight-exchange'))
import wexchange.torch
import fargan
#from models import model_dict
unquantized = [ 'cond_net.pembed', 'cond_net.fdense1', 'sig_net.cond_gain_dense', 'sig_net.gain_dense_out' ]
unquantized2 = [
'cond_net.pembed',
'cond_net.fdense1',
'cond_net.fconv1',
'cond_net.fconv2',
'cont_net.0',
'sig_net.cond_gain_dense',
'sig_net.fwc0.conv',
'sig_net.fwc0.glu.gate',
'sig_net.dense1_glu.gate',
'sig_net.gru1_glu.gate',
'sig_net.gru2_glu.gate',
'sig_net.gru3_glu.gate',
'sig_net.skip_glu.gate',
'sig_net.skip_dense',
'sig_net.sig_dense_out',
'sig_net.gain_dense_out'
]
description=f"""
This is an unsafe dumping script for FARGAN models. It assumes that all weights are contained in Linear, Conv1d or GRU layers
and will fail to export any other weights.
Furthermore, the quantize option relies on the following explicit list of layers to be excluded:
{unquantized}.
Modify this script manually if adjustments are needed.
"""
parser = argparse.ArgumentParser(description=description)
parser.add_argument('weightfile', type=str, help='weight file path')
parser.add_argument('export_folder', type=str)
parser.add_argument('--export-filename', type=str, default='fargan_data', help='filename for source and header file (.c and .h will be added), defaults to fargan_data')
parser.add_argument('--struct-name', type=str, default='FARGAN', help='name for C struct, defaults to FARGAN')
parser.add_argument('--quantize', action='store_true', help='apply quantization')
if __name__ == "__main__":
args = parser.parse_args()
print(f"loading weights from {args.weightfile}...")
saved_gen= torch.load(args.weightfile, map_location='cpu')
saved_gen['model_args'] = ()
saved_gen['model_kwargs'] = {'cond_size': 256, 'gamma': 0.9}
model = fargan.FARGAN(*saved_gen['model_args'], **saved_gen['model_kwargs'])
model.load_state_dict(saved_gen['state_dict'], strict=False)
def _remove_weight_norm(m):
try:
torch.nn.utils.remove_weight_norm(m)
except ValueError: # this module didn't have weight norm
return
model.apply(_remove_weight_norm)
print("dumping model...")
quantize_model=args.quantize
output_folder = args.export_folder
os.makedirs(output_folder, exist_ok=True)
writer = wexchange.c_export.c_writer.CWriter(os.path.join(output_folder, args.export_filename), model_struct_name=args.struct_name, add_typedef=True)
for name, module in model.named_modules():
if quantize_model:
quantize=name not in unquantized
scale = None if quantize else 1/128
else:
quantize=False
scale=1/128
if isinstance(module, nn.Linear):
print(f"dumping linear layer {name}...")
wexchange.torch.dump_torch_dense_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)
elif isinstance(module, nn.Conv1d):
print(f"dumping conv1d layer {name}...")
wexchange.torch.dump_torch_conv1d_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)
elif isinstance(module, nn.GRU):
print(f"dumping GRU layer {name}...")
wexchange.torch.dump_torch_gru_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale, recurrent_scale=scale)
elif isinstance(module, nn.GRUCell):
print(f"dumping GRUCell layer {name}...")
wexchange.torch.dump_torch_grucell_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale, recurrent_scale=scale)
elif isinstance(module, nn.Embedding):
print(f"dumping Embedding layer {name}...")
wexchange.torch.dump_torch_embedding_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)
#wexchange.torch.dump_torch_embedding_weights(writer, module)
else:
print(f"Ignoring layer {name}...")
writer.close()


@@ -0,0 +1,346 @@
import os
import sys
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import filters
from torch.nn.utils import weight_norm
#from convert_lsp import lpc_to_lsp, lsp_to_lpc
from rc import lpc2rc, rc2lpc
source_dir = os.path.split(os.path.abspath(__file__))[0]
sys.path.append(os.path.join(source_dir, "../dnntools"))
from dnntools.quantization import soft_quant
Fs = 16000
fid_dict = {}
def dump_signal(x, filename):
return # debug dumping disabled; remove this early return to write signals to disk
if filename in fid_dict:
fid = fid_dict[filename]
else:
fid = open(filename, "w")
fid_dict[filename] = fid
x = x.detach().numpy().astype('float32')
x.tofile(fid)
def sig_l1(y_true, y_pred):
return torch.mean(abs(y_true-y_pred))/torch.mean(abs(y_true))
def sig_loss(y_true, y_pred):
t = y_true/(1e-15+torch.norm(y_true, dim=-1, p=2, keepdim=True))
p = y_pred/(1e-15+torch.norm(y_pred, dim=-1, p=2, keepdim=True))
return torch.mean(1.-torch.sum(p*t, dim=-1))
def interp_lpc(lpc, factor):
#print(lpc.shape)
#f = (np.arange(factor)+.5*((factor+1)%2))/factor
lsp = torch.atanh(lpc2rc(lpc))
#print("lsp0:")
#print(lsp)
shape = lsp.shape
#print("shape is", shape)
shape = (shape[0], shape[1]*factor, shape[2])
interp_lsp = torch.zeros(shape, device=lpc.device)
for k in range(factor):
f = (k+.5*((factor+1)%2))/factor
interp = (1-f)*lsp[:,:-1,:] + f*lsp[:,1:,:]
interp_lsp[:,factor//2+k:-(factor//2):factor,:] = interp
for k in range(factor//2):
interp_lsp[:,k,:] = interp_lsp[:,factor//2,:]
for k in range((factor+1)//2):
interp_lsp[:,-k-1,:] = interp_lsp[:,-(factor+3)//2,:]
#print("lsp:")
#print(interp_lsp)
return rc2lpc(torch.tanh(interp_lsp))
def analysis_filter(x, lpc, nb_subframes=4, subframe_size=40, gamma=.9):
device = x.device
batch_size = lpc.size(0)
nb_frames = lpc.shape[1]
sig = torch.zeros(batch_size, subframe_size+16, device=device)
x = torch.reshape(x, (batch_size, nb_frames*nb_subframes, subframe_size))
out = torch.zeros((batch_size, 0), device=device)
#if gamma is not None:
# bw = gamma**(torch.arange(1, 17, device=device))
# lpc = lpc*bw[None,None,:]
ones = torch.ones((*(lpc.shape[:-1]), 1), device=device)
zeros = torch.zeros((*(lpc.shape[:-1]), subframe_size-1), device=device)
a = torch.cat([ones, lpc], -1)
a_big = torch.cat([a, zeros], -1)
fir_mat_big = filters.toeplitz_from_filter(a_big)
#print(a_big[:,0,:])
for n in range(nb_frames):
for k in range(nb_subframes):
sig = torch.cat([sig[:,subframe_size:], x[:,n*nb_subframes + k, :]], 1)
exc = torch.bmm(fir_mat_big[:,n,:,:], sig[:,:,None])
out = torch.cat([out, exc[:,-subframe_size:,0]], 1)
return out
# weight initialization and clipping
def init_weights(module):
if isinstance(module, nn.GRU):
for p in module.named_parameters():
if p[0].startswith('weight_hh_'):
nn.init.orthogonal_(p[1])
def gen_phase_embedding(periods, frame_size):
device = periods.device
batch_size = periods.size(0)
nb_frames = periods.size(1)
w0 = 2*torch.pi/periods
w0_shift = torch.cat([2*torch.pi*torch.rand((batch_size, 1), device=device)/frame_size, w0[:,:-1]], 1)
cum_phase = frame_size*torch.cumsum(w0_shift, 1)
fine_phase = w0[:,:,None]*torch.broadcast_to(torch.arange(frame_size, device=device), (batch_size, nb_frames, frame_size))
embed = torch.unsqueeze(cum_phase, 2) + fine_phase
embed = torch.reshape(embed, (batch_size, -1))
return torch.cos(embed), torch.sin(embed)
class GLU(nn.Module):
def __init__(self, feat_size, softquant=False):
super(GLU, self).__init__()
torch.manual_seed(5)
self.gate = weight_norm(nn.Linear(feat_size, feat_size, bias=False))
if softquant:
self.gate = soft_quant(self.gate)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d)\
or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x):
out = x * torch.sigmoid(self.gate(x))
return out
class FWConv(nn.Module):
def __init__(self, in_size, out_size, kernel_size=2, softquant=False):
super(FWConv, self).__init__()
torch.manual_seed(5)
self.in_size = in_size
self.kernel_size = kernel_size
self.conv = weight_norm(nn.Linear(in_size*self.kernel_size, out_size, bias=False))
self.glu = GLU(out_size, softquant=softquant)
if softquant:
self.conv = soft_quant(self.conv)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d)\
or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x, state):
xcat = torch.cat((state, x), -1)
#print(x.shape, state.shape, xcat.shape, self.in_size, self.kernel_size)
out = self.glu(torch.tanh(self.conv(xcat)))
return out, xcat[:,self.in_size:]
# simulate activation quantization: dither by up to half a 1/127 step, clamp to [-1, 1]
def n(x):
return torch.clamp(x + (1./127.)*(torch.rand_like(x)-.5), min=-1., max=1.)
class FARGANCond(nn.Module):
def __init__(self, feature_dim=20, cond_size=256, pembed_dims=12, softquant=False):
super(FARGANCond, self).__init__()
self.feature_dim = feature_dim
self.cond_size = cond_size
self.pembed = nn.Embedding(224, pembed_dims)
self.fdense1 = nn.Linear(self.feature_dim + pembed_dims, 64, bias=False)
self.fconv1 = nn.Conv1d(64, 128, kernel_size=3, padding='valid', bias=False)
self.fdense2 = nn.Linear(128, 80*4, bias=False)
if softquant:
self.fconv1 = soft_quant(self.fconv1)
self.fdense2 = soft_quant(self.fdense2)
self.apply(init_weights)
nb_params = sum(p.numel() for p in self.parameters())
print(f"cond model: {nb_params} weights")
def forward(self, features, period):
features = features[:,2:,:]
period = period[:,2:]
p = self.pembed(period-32)
features = torch.cat((features, p), -1)
tmp = torch.tanh(self.fdense1(features))
tmp = tmp.permute(0, 2, 1)
tmp = torch.tanh(self.fconv1(tmp))
tmp = tmp.permute(0, 2, 1)
tmp = torch.tanh(self.fdense2(tmp))
#tmp = torch.tanh(self.fdense2(tmp))
return tmp
class FARGANSub(nn.Module):
def __init__(self, subframe_size=40, nb_subframes=4, cond_size=256, softquant=False):
super(FARGANSub, self).__init__()
self.subframe_size = subframe_size
self.nb_subframes = nb_subframes
self.cond_size = cond_size
self.cond_gain_dense = nn.Linear(80, 1)
#self.sig_dense1 = nn.Linear(4*self.subframe_size+self.passthrough_size+self.cond_size, self.cond_size, bias=False)
self.fwc0 = FWConv(2*self.subframe_size+80+4, 192, softquant=softquant)
self.gru1 = nn.GRUCell(192+2*self.subframe_size, 160, bias=False)
self.gru2 = nn.GRUCell(160+2*self.subframe_size, 128, bias=False)
self.gru3 = nn.GRUCell(128+2*self.subframe_size, 128, bias=False)
self.gru1_glu = GLU(160, softquant=softquant)
self.gru2_glu = GLU(128, softquant=softquant)
self.gru3_glu = GLU(128, softquant=softquant)
self.skip_glu = GLU(128, softquant=softquant)
#self.ptaps_dense = nn.Linear(4*self.cond_size, 5)
self.skip_dense = nn.Linear(192+160+2*128+2*self.subframe_size, 128, bias=False)
self.sig_dense_out = nn.Linear(128, self.subframe_size, bias=False)
self.gain_dense_out = nn.Linear(192, 4)
if softquant:
self.gru1 = soft_quant(self.gru1, names=['weight_hh', 'weight_ih'])
self.gru2 = soft_quant(self.gru2, names=['weight_hh', 'weight_ih'])
self.gru3 = soft_quant(self.gru3, names=['weight_hh', 'weight_ih'])
self.skip_dense = soft_quant(self.skip_dense)
self.sig_dense_out = soft_quant(self.sig_dense_out)
self.apply(init_weights)
nb_params = sum(p.numel() for p in self.parameters())
print(f"subframe model: {nb_params} weights")
def forward(self, cond, prev_pred, exc_mem, period, states, gain=None):
device = exc_mem.device
#print(cond.shape, prev.shape)
cond = n(cond)
dump_signal(gain, 'gain0.f32')
gain = torch.exp(self.cond_gain_dense(cond))
dump_signal(gain, 'gain1.f32')
idx = 256-period[:,None]
rng = torch.arange(self.subframe_size+4, device=device)
idx = idx + rng[None,:] - 2
mask = idx >= 256
idx = idx - mask*period[:,None]
pred = torch.gather(exc_mem, 1, idx)
pred = n(pred/(1e-5+gain))
prev = exc_mem[:,-self.subframe_size:]
dump_signal(prev, 'prev_in.f32')
prev = n(prev/(1e-5+gain))
dump_signal(prev, 'pitch_exc.f32')
dump_signal(exc_mem, 'exc_mem.f32')
tmp = torch.cat((cond, pred, prev), 1)
#fpitch = taps[:,0:1]*pred[:,:-4] + taps[:,1:2]*pred[:,1:-3] + taps[:,2:3]*pred[:,2:-2] + taps[:,3:4]*pred[:,3:-1] + taps[:,4:]*pred[:,4:]
fpitch = pred[:,2:-2]
#tmp = self.dense1_glu(torch.tanh(self.sig_dense1(tmp)))
fwc0_out, fwc0_state = self.fwc0(tmp, states[3])
fwc0_out = n(fwc0_out)
pitch_gain = torch.sigmoid(self.gain_dense_out(fwc0_out))
gru1_state = self.gru1(torch.cat([fwc0_out, pitch_gain[:,0:1]*fpitch, prev], 1), states[0])
gru1_out = self.gru1_glu(n(gru1_state))
gru1_out = n(gru1_out)
gru2_state = self.gru2(torch.cat([gru1_out, pitch_gain[:,1:2]*fpitch, prev], 1), states[1])
gru2_out = self.gru2_glu(n(gru2_state))
gru2_out = n(gru2_out)
gru3_state = self.gru3(torch.cat([gru2_out, pitch_gain[:,2:3]*fpitch, prev], 1), states[2])
gru3_out = self.gru3_glu(n(gru3_state))
gru3_out = n(gru3_out)
gru3_out = torch.cat([gru1_out, gru2_out, gru3_out, fwc0_out], 1)
skip_out = torch.tanh(self.skip_dense(torch.cat([gru3_out, pitch_gain[:,3:4]*fpitch, prev], 1)))
skip_out = self.skip_glu(n(skip_out))
sig_out = torch.tanh(self.sig_dense_out(skip_out))
dump_signal(sig_out, 'exc_out.f32')
#taps = self.ptaps_dense(gru3_out)
#taps = .2*taps + torch.exp(taps)
#taps = taps / (1e-2 + torch.sum(torch.abs(taps), dim=-1, keepdim=True))
#dump_signal(taps, 'taps.f32')
dump_signal(pitch_gain, 'pgain.f32')
#sig_out = (sig_out + pitch_gain*fpitch) * gain
sig_out = sig_out * gain
exc_mem = torch.cat([exc_mem[:,self.subframe_size:], sig_out], 1)
prev_pred = torch.cat([prev_pred[:,self.subframe_size:], fpitch], 1)
dump_signal(sig_out, 'sig_out.f32')
return sig_out, exc_mem, prev_pred, (gru1_state, gru2_state, gru3_state, fwc0_state)
class FARGAN(nn.Module):
def __init__(self, subframe_size=40, nb_subframes=4, feature_dim=20, cond_size=256, passthrough_size=0, has_gain=False, gamma=None, softquant=False):
super(FARGAN, self).__init__()
self.subframe_size = subframe_size
self.nb_subframes = nb_subframes
self.frame_size = self.subframe_size*self.nb_subframes
self.feature_dim = feature_dim
self.cond_size = cond_size
self.cond_net = FARGANCond(feature_dim=feature_dim, cond_size=cond_size, softquant=softquant)
self.sig_net = FARGANSub(subframe_size=subframe_size, nb_subframes=nb_subframes, cond_size=cond_size, softquant=softquant)
def forward(self, features, period, nb_frames, pre=None, states=None):
device = features.device
batch_size = features.size(0)
prev = torch.zeros(batch_size, 256, device=device)
exc_mem = torch.zeros(batch_size, 256, device=device)
nb_pre_frames = pre.size(1)//self.frame_size if pre is not None else 0
states = (
torch.zeros(batch_size, 160, device=device),
torch.zeros(batch_size, 128, device=device),
torch.zeros(batch_size, 128, device=device),
torch.zeros(batch_size, (2*self.subframe_size+80+4)*1, device=device)
)
sig = torch.zeros((batch_size, 0), device=device)
cond = self.cond_net(features, period)
if pre is not None:
exc_mem[:,-self.frame_size:] = pre[:, :self.frame_size]
start = 1 if nb_pre_frames>0 else 0
for n in range(start, nb_frames+nb_pre_frames):
for k in range(self.nb_subframes):
pos = n*self.frame_size + k*self.subframe_size
#print("now: ", preal.shape, prev.shape, sig_in.shape)
pitch = period[:, 3+n]
gain = .03*10**(0.5*features[:, 3+n, 0:1]/np.sqrt(18.0))
#gain = gain[:,:,None]
out, exc_mem, prev, states = self.sig_net(cond[:, n, k*80:(k+1)*80], prev, exc_mem, pitch, states, gain=gain)
if n < nb_pre_frames:
out = pre[:, pos:pos+self.subframe_size]
exc_mem[:,-self.subframe_size:] = out
else:
sig = torch.cat([sig, out], 1)
states = [s.detach() for s in states]
return sig, states
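
In `FARGANSub.forward`, the pitch-delayed prediction is gathered by indexing `period` samples back into the 256-sample excitation history (with two extra samples on each side), and any index that runs past the end of the buffer is wrapped back by one more pitch period. A numpy sketch of that indexing with toy sizes:

```python
import numpy as np

# Toy sizes (hypothetical): 16-sample history instead of 256, subframe of 6.
mem_size, subframe_size, period = 16, 6, 5
exc_mem = np.arange(mem_size)                       # stand-in excitation history
idx = mem_size - period + np.arange(subframe_size + 4) - 2
idx = idx - (idx >= mem_size) * period              # wrap: repeat the last pitch cycle
pred = exc_mem[idx]
print(pred)  # [ 9 10 11 12 13 14 15 11 12 13]
```

The wrap means that when the pitch period is shorter than the subframe, the gathered excitation simply repeats the most recent pitch cycle.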


@@ -0,0 +1,46 @@
import torch
from torch import nn
import torch.nn.functional as F
import math
def toeplitz_from_filter(a):
device = a.device
L = a.size(-1)
size0 = (*(a.shape[:-1]), L, L+1)
size = (*(a.shape[:-1]), L, L)
rnge = torch.arange(0, L, dtype=torch.int64, device=device)
z = torch.tensor(0, device=device)
idx = torch.maximum(rnge[:,None] - rnge[None,:] + 1, z)
a = torch.cat([a[...,:1]*0, a], -1)
#print(a)
a = a[...,None,:]
#print(idx)
a = torch.broadcast_to(a, size0)
idx = torch.broadcast_to(idx, size)
#print(idx)
return torch.gather(a, -1, idx)
def filter_iir_response(a, N):
device = a.device
L = a.size(-1)
ar = a.flip(dims=(2,))
size = (*(a.shape[:-1]), N)
R = torch.zeros(size, device=device)
R[:,:,0] = torch.ones((a.shape[:-1]), device=device)
for i in range(1, L):
R[:,:,i] = - torch.sum(ar[:,:,L-i-1:-1] * R[:,:,:i], axis=-1)
#R[:,:,i] = - torch.einsum('ijk,ijk->ij', ar[:,:,L-i-1:-1], R[:,:,:i])
for i in range(L, N):
R[:,:,i] = - torch.sum(ar[:,:,:-1] * R[:,:,i-L+1:i], axis=-1)
#R[:,:,i] = - torch.einsum('ijk,ijk->ij', ar[:,:,:-1], R[:,:,i-L+1:i])
return R
if __name__ == '__main__':
#a = torch.tensor([ [[1, -.9, 0.02], [1, -.8, .01]], [[1, .9, 0], [1, .8, 0]]])
a = torch.tensor([ [[1, -.9, 0.02], [1, -.8, .01]]])
A = toeplitz_from_filter(a)
#print(A)
R = filter_iir_response(a, 5)
RA = toeplitz_from_filter(R)
print(RA)
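
`toeplitz_from_filter` turns a filter into a lower-triangular Toeplitz matrix so that a matrix-vector product computes the FIR filtering (this is what `analysis_filter` relies on). A numpy mirror of the index construction for a single order-2 filter, using the same toy coefficients as the test above:

```python
import numpy as np

# Single filter a(z) = 1 - 0.9 z^-1 + 0.02 z^-2.
a = np.array([1.0, -0.9, 0.02])
L = len(a)
a_pad = np.concatenate([[0.0], a])    # index 0 maps to an explicit zero
idx = np.maximum(np.arange(L)[:, None] - np.arange(L)[None, :] + 1, 0)
A = a_pad[idx]                        # lower-triangular Toeplitz matrix
print(A)
```

Row `i` holds the filter taps shifted by `i`, so `A @ x` equals the convolution of `a` with `x` truncated to length `L`.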


@@ -0,0 +1,29 @@
import torch
def rc2lpc(rc):
order = rc.shape[-1]
lpc=rc[...,0:1]
for i in range(1, order):
lpc = torch.cat([lpc + rc[...,i:i+1]*torch.flip(lpc,dims=(-1,)), rc[...,i:i+1]], -1)
#print("to:", lpc)
return lpc
def lpc2rc(lpc):
order = lpc.shape[-1]
rc = lpc[...,-1:]
for i in range(order-1, 0, -1):
ki = lpc[...,-1:]
lpc = lpc[...,:-1]
lpc = (lpc - ki*torch.flip(lpc,dims=(-1,)))/(1 - ki*ki)
rc = torch.cat([lpc[...,-1:] , rc], -1)
return rc
if __name__ == "__main__":
rc = torch.tensor([[.5, -.5, .6, -.6]])
print(rc)
lpc = rc2lpc(rc)
print(lpc)
rc2 = lpc2rc(lpc)
print(rc2)
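
The two functions above implement the Levinson step-up (`rc2lpc`) and step-down (`lpc2rc`) recursions; for any reflection coefficients with |k| < 1 the round trip is the identity. A numpy re-derivation (same toy order-4 values) as a cross-check:

```python
import numpy as np

def rc_to_lpc(rc):
    # step-up: a^(m) = [a^(m-1) + k_m * reverse(a^(m-1)), k_m]
    lpc = rc[:1].copy()
    for i in range(1, len(rc)):
        lpc = np.concatenate([lpc + rc[i] * lpc[::-1], rc[i:i + 1]])
    return lpc

def lpc_to_rc(lpc):
    # step-down: k_m = a_m^(m); a^(m-1) = (a^(m) - k_m * reverse(a^(m))) / (1 - k_m^2)
    lpc = np.array(lpc, dtype=float)
    rc = []
    while lpc.size:
        k = lpc[-1]
        rc.append(k)
        lpc = (lpc[:-1] - k * lpc[:-1][::-1]) / (1 - k * k)
    return np.array(rc[::-1])

rc = np.array([.5, -.5, .6, -.6])
assert np.allclose(lpc_to_rc(rc_to_lpc(rc)), rc)
```

The `torch.atanh`/`torch.tanh` wrapping in `interp_lpc` maps these coefficients to an unbounded space where linear interpolation keeps the filter stable.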


@@ -0,0 +1,186 @@
"""STFT-based Loss modules."""
import torch
import torch.nn.functional as F
import numpy as np
import torchaudio
def stft(x, fft_size, hop_size, win_length, window):
"""Perform STFT and convert to magnitude spectrogram.
Args:
x (Tensor): Input signal tensor (B, T).
fft_size (int): FFT size.
hop_size (int): Hop size.
win_length (int): Window length.
window (str): Window function type.
Returns:
Tensor: Magnitude spectrogram (B, fft_size // 2 + 1, #frames).
"""
#x_stft = torch.stft(x, fft_size, hop_size, win_length, window, return_complex=False)
#real = x_stft[..., 0]
#imag = x_stft[..., 1]
# (kan-bayashi): clamp is needed to avoid nan or inf
#return torchaudio.functional.amplitude_to_DB(torch.abs(x_stft),db_multiplier=0.0, multiplier=20,amin=1e-05,top_db=80)
#return torch.clamp(torch.abs(x_stft), min=1e-7)
x_stft = torch.stft(x, fft_size, hop_size, win_length, window, return_complex=True)
return torch.clamp(torch.abs(x_stft), min=1e-7)
class SpectralConvergenceLoss(torch.nn.Module):
"""Spectral convergence loss module."""
def __init__(self):
"""Initilize spectral convergence loss module."""
super(SpectralConvergenceLoss, self).__init__()
def forward(self, x_mag, y_mag):
"""Calculate forward propagation.
Args:
x_mag (Tensor): Magnitude spectrogram of predicted signal (B, #frames, #freq_bins).
y_mag (Tensor): Magnitude spectrogram of groundtruth signal (B, #frames, #freq_bins).
Returns:
Tensor: Spectral convergence loss value.
"""
x_mag = torch.sqrt(x_mag)
y_mag = torch.sqrt(y_mag)
return torch.norm(y_mag - x_mag, p=1) / torch.norm(y_mag, p=1)
class LogSTFTMagnitudeLoss(torch.nn.Module):
"""Log STFT magnitude loss module."""
def __init__(self):
"""Initilize los STFT magnitude loss module."""
super(LogSTFTMagnitudeLoss, self).__init__()
def forward(self, x, y):
"""Calculate forward propagation.
Args:
x (Tensor): Magnitude spectrogram of predicted signal (B, #frames, #freq_bins).
y (Tensor): Magnitude spectrogram of groundtruth signal (B, #frames, #freq_bins).
Returns:
Tensor: Log STFT magnitude loss value.
"""
#F.l1_loss(torch.sqrt(y_mag), torch.sqrt(x_mag)) +
#F.l1_loss(torchaudio.functional.amplitude_to_DB(y_mag,db_multiplier=0.0, multiplier=20,amin=1e-05,top_db=80),\
#torchaudio.functional.amplitude_to_DB(x_mag,db_multiplier=0.0, multiplier=20,amin=1e-05,top_db=80))
#y_mag[:,:y_mag.size(1)//2,:] = y_mag[:,:y_mag.size(1)//2,:] *0.0
#return F.l1_loss(torch.log(y_mag) + torch.sqrt(y_mag), torch.log(x_mag) + torch.sqrt(x_mag))
#return F.l1_loss(y_mag, x_mag)
error_loss = F.l1_loss(y, x) #+ F.l1_loss(torch.sqrt(y), torch.sqrt(x))#F.l1_loss(torch.log(y), torch.log(x))#
#x = torch.log(x)
#y = torch.log(y)
#x = x.permute(0,2,1).contiguous()
#y = y.permute(0,2,1).contiguous()
'''mean_x = torch.mean(x, dim=1, keepdim=True)
mean_y = torch.mean(y, dim=1, keepdim=True)
var_x = torch.var(x, dim=1, keepdim=True)
var_y = torch.var(y, dim=1, keepdim=True)
std_x = torch.std(x, dim=1, keepdim=True)
std_y = torch.std(y, dim=1, keepdim=True)
x_minus_mean = x - mean_x
y_minus_mean = y - mean_y
pearson_corr = torch.sum(x_minus_mean * y_minus_mean, dim=1, keepdim=True) / \
(torch.sqrt(torch.sum(x_minus_mean ** 2, dim=1, keepdim=True) + 1e-7) * \
torch.sqrt(torch.sum(y_minus_mean ** 2, dim=1, keepdim=True) + 1e-7))
numerator = 2.0 * pearson_corr * std_x * std_y
denominator = var_x + var_y + (mean_y - mean_x)**2
ccc = numerator/denominator
ccc_loss = F.l1_loss(1.0 - ccc, torch.zeros_like(ccc))'''
return error_loss #+ ccc_loss#+ ccc_loss
class STFTLoss(torch.nn.Module):
"""STFT loss module."""
def __init__(self, device, fft_size=1024, shift_size=120, win_length=600, window="hann_window"):
"""Initialize STFT loss module."""
super(STFTLoss, self).__init__()
self.fft_size = fft_size
self.shift_size = shift_size
self.win_length = win_length
self.window = getattr(torch, window)(win_length).to(device)
self.spectral_convergence_loss = SpectralConvergenceLoss()
self.log_stft_magnitude_loss = LogSTFTMagnitudeLoss()
def forward(self, x, y):
"""Calculate forward propagation.
Args:
x (Tensor): Predicted signal (B, T).
y (Tensor): Groundtruth signal (B, T).
Returns:
Tensor: Spectral convergence loss value.
Tensor: Log STFT magnitude loss value.
"""
x_mag = stft(x, self.fft_size, self.shift_size, self.win_length, self.window)
y_mag = stft(y, self.fft_size, self.shift_size, self.win_length, self.window)
sc_loss = self.spectral_convergence_loss(x_mag, y_mag)
mag_loss = self.log_stft_magnitude_loss(x_mag, y_mag)
return sc_loss, mag_loss
class MultiResolutionSTFTLoss(torch.nn.Module):
'''def __init__(self,
device,
fft_sizes=[2048, 1024, 512, 256, 128, 64],
hop_sizes=[512, 256, 128, 64, 32, 16],
win_lengths=[2048, 1024, 512, 256, 128, 64],
window="hann_window"):'''
'''def __init__(self,
device,
fft_sizes=[2048, 1024, 512, 256, 128, 64],
hop_sizes=[256, 128, 64, 32, 16, 8],
win_lengths=[1024, 512, 256, 128, 64, 32],
window="hann_window"):'''
def __init__(self,
device,
fft_sizes=[2560, 1280, 640, 320, 160, 80],
hop_sizes=[640, 320, 160, 80, 40, 20],
win_lengths=[2560, 1280, 640, 320, 160, 80],
window="hann_window"):
super(MultiResolutionSTFTLoss, self).__init__()
assert len(fft_sizes) == len(hop_sizes) == len(win_lengths)
self.stft_losses = torch.nn.ModuleList()
for fs, ss, wl in zip(fft_sizes, hop_sizes, win_lengths):
self.stft_losses += [STFTLoss(device, fs, ss, wl, window)]
def forward(self, x, y):
"""Calculate forward propagation.
Args:
x (Tensor): Predicted signal (B, T).
y (Tensor): Groundtruth signal (B, T).
Returns:
Tensor: Multi resolution spectral convergence loss value.
Tensor: Multi resolution log STFT magnitude loss value.
"""
sc_loss = 0.0
mag_loss = 0.0
for f in self.stft_losses:
sc_l, mag_l = f(x, y)
sc_loss += sc_l
#mag_loss += mag_l
sc_loss /= len(self.stft_losses)
mag_loss /= len(self.stft_losses)
return sc_loss #mag_loss #+
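
`SpectralConvergenceLoss` above is an L1 ratio on square-rooted magnitude spectra, so it is 0 for identical signals and 1 when the prediction is silence. A numpy sketch with `rfft` standing in for `torch.stft` (single frame, no clamping):

```python
import numpy as np

def spectral_convergence(x_sig, y_sig, n_fft=64):
    # sqrt-compressed magnitudes, as in SpectralConvergenceLoss.forward
    xm = np.sqrt(np.abs(np.fft.rfft(x_sig, n_fft)))
    ym = np.sqrt(np.abs(np.fft.rfft(y_sig, n_fft)))
    return np.sum(np.abs(ym - xm)) / np.sum(ym)

t = np.arange(64) / 64.0
y = np.sin(2 * np.pi * 5 * t)
print(spectral_convergence(y, y))              # 0.0 for identical signals
print(spectral_convergence(np.zeros(64), y))   # 1.0 for silence vs. signal
```

Being a ratio, the loss is insensitive to overall gain errors, which is why the (currently disabled) magnitude term would complement it.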


@@ -0,0 +1,128 @@
import os
import argparse
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import tqdm
import fargan
from dataset import FARGANDataset
nb_features = 36
nb_used_features = 20
parser = argparse.ArgumentParser()
parser.add_argument('model', type=str, help='FARGAN model checkpoint')
parser.add_argument('features', type=str, help='path to feature file in .f32 format')
parser.add_argument('output', type=str, help='path to output file (16-bit PCM)')
parser.add_argument('--cuda-visible-devices', type=str, help="comma-separated list of cuda visible device indices, default: CUDA_VISIBLE_DEVICES", default=None)
model_group = parser.add_argument_group(title="model parameters")
model_group.add_argument('--cond-size', type=int, help="first conditioning size, default: 256", default=256)
args = parser.parse_args()
if args.cuda_visible_devices is not None:
os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible_devices
features_file = args.features
signal_file = args.output
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
checkpoint = torch.load(args.model, map_location='cpu')
model = fargan.FARGAN(*checkpoint['model_args'], **checkpoint['model_kwargs'])
model.load_state_dict(checkpoint['state_dict'], strict=False)
features = np.reshape(np.memmap(features_file, dtype='float32', mode='r'), (1, -1, nb_features))
lpc = features[:,4-1:-1,nb_used_features:]
features = features[:, :, :nb_used_features]
#periods = np.round(50*features[:,:,nb_used_features-2]+100).astype('int')
periods = np.round(np.clip(256./2**(features[:,:,nb_used_features-2]+1.5), 32, 255)).astype('int')
nb_frames = features.shape[1]
#nb_frames = 1000
gamma = checkpoint['model_kwargs']['gamma']
def lpc_synthesis_one_frame(frame, filt, buffer, weighting_vector=np.ones(16)):
out = np.zeros_like(frame)
filt = np.flip(filt)
inp = frame[:]
for i in range(0, inp.shape[0]):
s = inp[i] - np.dot(buffer*weighting_vector, filt)
buffer[0] = s
buffer = np.roll(buffer, -1)
out[i] = s
return out
def inverse_perceptual_weighting (pw_signal, filters, weighting_vector):
#inverse perceptual weighting= H_preemph / W(z/gamma)
signal = np.zeros_like(pw_signal)
buffer = np.zeros(16)
num_frames = pw_signal.shape[0] //160
assert num_frames == filters.shape[0]
for frame_idx in range(0, num_frames):
in_frame = pw_signal[frame_idx*160: (frame_idx+1)*160][:]
out_sig_frame = lpc_synthesis_one_frame(in_frame, filters[frame_idx, :], buffer, weighting_vector)
signal[frame_idx*160: (frame_idx+1)*160] = out_sig_frame[:]
buffer[:] = out_sig_frame[-16:]
return signal
def inverse_perceptual_weighting40 (pw_signal, filters):
#inverse perceptual weighting= H_preemph / W(z/gamma)
signal = np.zeros_like(pw_signal)
buffer = np.zeros(16)
num_frames = pw_signal.shape[0] //40
assert num_frames == filters.shape[0]
for frame_idx in range(0, num_frames):
in_frame = pw_signal[frame_idx*40: (frame_idx+1)*40][:]
out_sig_frame = lpc_synthesis_one_frame(in_frame, filters[frame_idx, :], buffer)
signal[frame_idx*40: (frame_idx+1)*40] = out_sig_frame[:]
buffer[:] = out_sig_frame[-16:]
return signal
from scipy.signal import lfilter
if __name__ == '__main__':
model.to(device)
features = torch.tensor(features).to(device)
#lpc = torch.tensor(lpc).to(device)
periods = torch.tensor(periods).to(device)
weighting = gamma**np.arange(1, 17)
lpc = lpc*weighting
lpc = fargan.interp_lpc(torch.tensor(lpc), 4).numpy()
sig, _ = model(features, periods, nb_frames - 4)
#weighting_vector = np.array([gamma**i for i in range(16,0,-1)])
sig = sig.detach().cpu().numpy().flatten() # move back to CPU before converting to numpy
sig = lfilter(np.array([1.]), np.array([1., -.85]), sig)
#sig = inverse_perceptual_weighting40(sig, lpc[0,:,:])
pcm = np.round(32768*np.clip(sig, a_max=.99, a_min=-.99)).astype('int16')
pcm.tofile(signal_file)
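
The `lfilter` call above applies the de-emphasis filter 1/(1 - 0.85 z^-1) that undoes the usual pre-emphasis; its impulse response is just the geometric sequence 0.85^n. A pure-numpy recurrence equivalent, as a sanity check:

```python
import numpy as np

def deemphasis(x, coef=0.85):
    # y[n] = x[n] + coef * y[n-1], same as lfilter([1.], [1., -coef], x)
    y = np.empty(len(x))
    mem = 0.0
    for i, s in enumerate(x):
        mem = s + coef * mem
        y[i] = mem
    return y

print(deemphasis(np.array([1.0, 0.0, 0.0, 0.0])))  # [1.0, 0.85, 0.7225, 0.614125]
```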


@@ -0,0 +1,169 @@
import os
import argparse
import random
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import tqdm
import fargan
from dataset import FARGANDataset
from stft_loss import *
parser = argparse.ArgumentParser()
parser.add_argument('features', type=str, help='path to feature file in .f32 format')
parser.add_argument('signal', type=str, help='path to signal file in .s16 format')
parser.add_argument('output', type=str, help='path to output folder')
parser.add_argument('--suffix', type=str, help="model name suffix", default="")
parser.add_argument('--cuda-visible-devices', type=str, help="comma-separated list of cuda visible device indices, default: CUDA_VISIBLE_DEVICES", default=None)
model_group = parser.add_argument_group(title="model parameters")
model_group.add_argument('--cond-size', type=int, help="first conditioning size, default: 256", default=256)
model_group.add_argument('--gamma', type=float, help="Use A(z/gamma), default: 0.9", default=0.9)
model_group.add_argument('--softquant', action="store_true", help="enables soft quantization during training")
training_group = parser.add_argument_group(title="training parameters")
training_group.add_argument('--batch-size', type=int, help="batch size, default: 512", default=512)
training_group.add_argument('--lr', type=float, help='learning rate, default: 1e-3', default=1e-3)
training_group.add_argument('--epochs', type=int, help='number of training epochs, default: 20', default=20)
training_group.add_argument('--sequence-length', type=int, help='sequence length, default: 15', default=15)
training_group.add_argument('--lr-decay', type=float, help='learning rate decay factor, default: 1e-4', default=1e-4)
training_group.add_argument('--initial-checkpoint', type=str, help='initial checkpoint to start training from, default: None', default=None)
args = parser.parse_args()
if args.cuda_visible_devices is not None:
os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda_visible_devices
# checkpoints
checkpoint_dir = os.path.join(args.output, 'checkpoints')
checkpoint = dict()
os.makedirs(checkpoint_dir, exist_ok=True)
# training parameters
batch_size = args.batch_size
lr = args.lr
epochs = args.epochs
sequence_length = args.sequence_length
lr_decay = args.lr_decay
adam_betas = [0.8, 0.95]
adam_eps = 1e-8
features_file = args.features
signal_file = args.signal
# model parameters
cond_size = args.cond_size
checkpoint['batch_size'] = batch_size
checkpoint['lr'] = lr
checkpoint['lr_decay'] = lr_decay
checkpoint['epochs'] = epochs
checkpoint['sequence_length'] = sequence_length
checkpoint['adam_betas'] = adam_betas
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
checkpoint['model_args'] = ()
checkpoint['model_kwargs'] = {'cond_size': cond_size, 'gamma': args.gamma, 'softquant': args.softquant}
print(checkpoint['model_kwargs'])
model = fargan.FARGAN(*checkpoint['model_args'], **checkpoint['model_kwargs'])
#model = fargan.FARGAN()
#model = nn.DataParallel(model)
if args.initial_checkpoint is not None:
checkpoint = torch.load(args.initial_checkpoint, map_location='cpu')
model.load_state_dict(checkpoint['state_dict'], strict=False)
checkpoint['state_dict'] = model.state_dict()
dataset = FARGANDataset(features_file, signal_file, sequence_length=sequence_length)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, betas=adam_betas, eps=adam_eps)
# learning rate scheduler
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lambda x : 1 / (1 + lr_decay * x))
states = None
spect_loss = MultiResolutionSTFTLoss(device).to(device)
if __name__ == '__main__':
model.to(device)
for epoch in range(1, epochs + 1):
running_specc = 0
running_cont_loss = 0
running_loss = 0
print(f"training epoch {epoch}...")
with tqdm.tqdm(dataloader, unit='batch') as tepoch:
for i, (features, periods, target, lpc) in enumerate(tepoch):
optimizer.zero_grad()
features = features.to(device)
#lpc = torch.tensor(fargan.interp_lpc(lpc.numpy(), 4))
#print("interp size", lpc.shape)
#lpc = lpc.to(device)
#lpc = lpc*(args.gamma**torch.arange(1,17, device=device))
#lpc = fargan.interp_lpc(lpc, 4)
periods = periods.to(device)
if (np.random.rand() > 0.1):
target = target[:, :sequence_length*160]
#lpc = lpc[:,:sequence_length*4,:]
features = features[:,:sequence_length+4,:]
periods = periods[:,:sequence_length+4]
else:
target=target[::2, :]
#lpc=lpc[::2,:]
features=features[::2,:]
periods=periods[::2,:]
target = target.to(device)
#print(target.shape, lpc.shape)
#target = fargan.analysis_filter(target, lpc[:,:,:], nb_subframes=1, gamma=args.gamma)
#nb_pre = random.randrange(1, 6)
nb_pre = 2
pre = target[:, :nb_pre*160]
sig, states = model(features, periods, target.size(1)//160 - nb_pre, pre=pre, states=None)
sig = torch.cat([pre, sig], -1)
cont_loss = fargan.sig_loss(target[:, nb_pre*160:nb_pre*160+160], sig[:, nb_pre*160:nb_pre*160+160])
specc_loss = spect_loss(sig, target.detach())
loss = .03*cont_loss + specc_loss
loss.backward()
optimizer.step()
#model.clip_weights()
scheduler.step()
running_specc += specc_loss.detach().cpu().item()
running_cont_loss += cont_loss.detach().cpu().item()
running_loss += loss.detach().cpu().item()
tepoch.set_postfix(loss=f"{running_loss/(i+1):8.5f}",
cont_loss=f"{running_cont_loss/(i+1):8.5f}",
specc=f"{running_specc/(i+1):8.5f}",
)
# save checkpoint
checkpoint_path = os.path.join(checkpoint_dir, f'fargan{args.suffix}_{epoch}.pth')
checkpoint['state_dict'] = model.state_dict()
checkpoint['loss'] = running_loss / len(dataloader)
checkpoint['epoch'] = epoch
torch.save(checkpoint, checkpoint_path)
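The LambdaLR schedule above multiplies the base learning rate by 1/(1 + lr_decay * step) after every optimizer step. A minimal sketch of the resulting decay curve (the base_lr and lr_decay values here are illustrative, not the training defaults):

```python
def decayed_lr(base_lr, lr_decay, step):
    # same rule as lr_lambda above: multiplicative factor 1 / (1 + lr_decay * step)
    return base_lr / (1.0 + lr_decay * step)

base_lr, lr_decay = 1e-3, 1e-4   # illustrative values only
print(decayed_lr(base_lr, lr_decay, 0))      # full base rate at step 0
print(decayed_lr(base_lr, lr_decay, 10000))  # halved after 1/lr_decay steps
```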


@@ -0,0 +1,88 @@
import os
import sys
import argparse
import torch
from torch import nn
sys.path.append(os.path.join(os.path.split(__file__)[0], '../weight-exchange'))
import wexchange.torch
from models import model_dict
unquantized = [
'bfcc_with_corr_upsampler.fc',
'cont_net.0',
'fwc6.cont_fc.0',
'fwc6.fc.0',
'fwc6.fc.1.gate',
'fwc7.cont_fc.0',
'fwc7.fc.0',
'fwc7.fc.1.gate'
]
description=f"""
This is an unsafe dumping script for FWGAN models. It assumes that all weights are contained in Linear, Conv1d or GRU layers
and will fail to export any other weights.
Furthermore, the quantize option relies on the following explicit list of layers to be excluded:
{unquantized}.
Modify this script manually if adjustments are needed.
"""
parser = argparse.ArgumentParser(description=description)
parser.add_argument('model', choices=['fwgan400', 'fwgan500'], help='model name')
parser.add_argument('weightfile', type=str, help='weight file path')
parser.add_argument('export_folder', type=str)
parser.add_argument('--export-filename', type=str, default='fwgan_data', help='filename for source and header file (.c and .h will be added), defaults to fwgan_data')
parser.add_argument('--struct-name', type=str, default='FWGAN', help='name for C struct, defaults to FWGAN')
parser.add_argument('--quantize', action='store_true', help='apply quantization')
if __name__ == "__main__":
args = parser.parse_args()
model = model_dict[args.model]()
print(f"loading weights from {args.weightfile}...")
saved_gen= torch.load(args.weightfile, map_location='cpu')
model.load_state_dict(saved_gen)
def _remove_weight_norm(m):
try:
torch.nn.utils.remove_weight_norm(m)
except ValueError: # this module didn't have weight norm
return
model.apply(_remove_weight_norm)
print("dumping model...")
quantize_model=args.quantize
output_folder = args.export_folder
os.makedirs(output_folder, exist_ok=True)
writer = wexchange.c_export.c_writer.CWriter(os.path.join(output_folder, args.export_filename), model_struct_name=args.struct_name)
for name, module in model.named_modules():
if quantize_model:
quantize=name not in unquantized
scale = None if quantize else 1/128
else:
quantize=False
scale=1/128
if isinstance(module, nn.Linear):
print(f"dumping linear layer {name}...")
wexchange.torch.dump_torch_dense_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)
if isinstance(module, nn.Conv1d):
print(f"dumping conv1d layer {name}...")
wexchange.torch.dump_torch_conv1d_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale)
if isinstance(module, nn.GRU):
print(f"dumping GRU layer {name}...")
wexchange.torch.dump_torch_gru_weights(writer, module, name.replace('.', '_'), quantize=quantize, scale=scale, recurrent_scale=scale)
writer.close()


@@ -0,0 +1,141 @@
import os
import time
import torch
import numpy as np
from scipy import signal as si
from scipy.io import wavfile
import argparse
from models import model_dict
parser = argparse.ArgumentParser()
parser.add_argument('model', choices=['fwgan400', 'fwgan500'], help='model name')
parser.add_argument('weightfile', type=str, help='weight file')
parser.add_argument('input', type=str, help='input: feature file or folder with feature files')
parser.add_argument('output', type=str, help='output: wav file name or folder name, depending on input')
########################### Signal Processing Layers ###########################
def preemphasis(x, coef=-0.85):
return si.lfilter(np.array([1.0, coef]), np.array([1.0]), x).astype('float32')
def deemphasis(x, coef=-0.85):
return si.lfilter(np.array([1.0]), np.array([1.0, coef]), x).astype('float32')
gamma = 0.92
weighting_vector = np.array([gamma**i for i in range(16,0,-1)])
def lpc_synthesis_one_frame(frame, filt, buffer, weighting_vector=np.ones(16)):
out = np.zeros_like(frame)
filt = np.flip(filt)
inp = frame[:]
for i in range(0, inp.shape[0]):
s = inp[i] - np.dot(buffer*weighting_vector, filt)
buffer[0] = s
buffer = np.roll(buffer, -1)
out[i] = s
return out
def inverse_perceptual_weighting (pw_signal, filters, weighting_vector):
#inverse perceptual weighting= H_preemph / W(z/gamma)
pw_signal = preemphasis(pw_signal)
signal = np.zeros_like(pw_signal)
buffer = np.zeros(16)
num_frames = pw_signal.shape[0] //160
assert num_frames == filters.shape[0]
for frame_idx in range(0, num_frames):
in_frame = pw_signal[frame_idx*160: (frame_idx+1)*160][:]
out_sig_frame = lpc_synthesis_one_frame(in_frame, filters[frame_idx, :], buffer, weighting_vector)
signal[frame_idx*160: (frame_idx+1)*160] = out_sig_frame[:]
buffer[:] = out_sig_frame[-16:]
return signal
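Since deemphasis applies the exact inverse filter of preemphasis, chaining the two should reconstruct the input up to float32 rounding; a small self-contained check of that property, using the same scipy filters as above:

```python
import numpy as np
from scipy import signal as si

def preemphasis(x, coef=-0.85):
    # FIR filter 1 + coef * z^-1
    return si.lfilter(np.array([1.0, coef]), np.array([1.0]), x).astype('float32')

def deemphasis(x, coef=-0.85):
    # IIR filter 1 / (1 + coef * z^-1), the exact inverse of preemphasis
    return si.lfilter(np.array([1.0]), np.array([1.0, coef]), x).astype('float32')

x = np.random.RandomState(0).randn(1600).astype('float32')
roundtrip = deemphasis(preemphasis(x))
print(np.max(np.abs(roundtrip - x)))  # small: float32 rounding noise only
```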
def process_item(generator, feature_filename, output_filename, verbose=False):
feat = np.memmap(feature_filename, dtype='float32', mode='r')
num_feat_frames = len(feat) // 36
feat = np.reshape(feat, (num_feat_frames, 36))
bfcc = np.copy(feat[:, :18])
corr = np.copy(feat[:, 19:20]) + 0.5
bfcc_with_corr = torch.from_numpy(np.hstack((bfcc, corr))).type(torch.FloatTensor).unsqueeze(0)#.to(device)
period = torch.from_numpy((0.1 + 50 * np.copy(feat[:, 18:19]) + 100)\
.astype('int32')).type(torch.long).view(1,-1)#.to(device)
lpc_filters = np.copy(feat[:, -16:])
start_time = time.time()
x1 = generator(period, bfcc_with_corr, torch.zeros(1,320)) #this means the vocoder runs in complete synthesis mode with zero history audio frames
end_time = time.time()
total_time = end_time - start_time
x1 = x1.squeeze(1).squeeze(0).detach().cpu().numpy()
gen_seconds = len(x1)/16000
out = deemphasis(inverse_perceptual_weighting(x1, lpc_filters, weighting_vector))
if verbose:
print(f"Took {total_time:.3f}s to generate {len(x1)} samples ({gen_seconds}s) -> {gen_seconds/total_time:.2f}x real time")
out = np.clip(np.round(2**15 * out), -2**15, 2**15 -1).astype(np.int16)
wavfile.write(output_filename, 16000, out)
########################### The inference loop over folder containing lpcnet feature files #################################
if __name__ == "__main__":
args = parser.parse_args()
generator = model_dict[args.model]()
#Load the FWGAN500Hz Checkpoint
saved_gen= torch.load(args.weightfile, map_location='cpu')
generator.load_state_dict(saved_gen)
#this is just to remove the weight_norm from the model layers as it's no longer needed
def _remove_weight_norm(m):
try:
torch.nn.utils.remove_weight_norm(m)
except ValueError: # this module didn't have weight norm
return
generator.apply(_remove_weight_norm)
#enable inference mode
generator = generator.eval()
print('Successfully loaded the generator model ... start generation:')
if os.path.isdir(args.input):
os.makedirs(args.output, exist_ok=True)
for fn in os.listdir(args.input):
print(f"processing input {fn}...")
feature_filename = os.path.join(args.input, fn)
output_filename = os.path.join(args.output, os.path.splitext(fn)[0] + f"_{args.model}.wav")
process_item(generator, feature_filename, output_filename)
else:
process_item(generator, args.input, args.output)
print("Finished!")


@@ -0,0 +1,7 @@
from .fwgan400 import FWGAN400ContLarge
from .fwgan500 import FWGAN500Cont
model_dict = {
'fwgan400': FWGAN400ContLarge,
'fwgan500': FWGAN500Cont
}


@@ -0,0 +1,308 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm
import numpy as np
which_norm = weight_norm
#################### Definition of basic model components ####################
#Convolutional layer with 1 frame look-ahead (used for feature PreCondNet)
class ConvLookahead(nn.Module):
def __init__(self, in_ch, out_ch, kernel_size, dilation=1, groups=1, bias= False):
super(ConvLookahead, self).__init__()
torch.manual_seed(5)
self.padding_left = (kernel_size - 2) * dilation
self.padding_right = 1 * dilation
self.conv = which_norm(nn.Conv1d(in_ch,out_ch,kernel_size,dilation=dilation, groups=groups, bias= bias))
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x):
x = F.pad(x,(self.padding_left, self.padding_right))
conv_out = self.conv(x)
return conv_out
#(modified) GLU Activation layer definition
class GLU(nn.Module):
def __init__(self, feat_size):
super(GLU, self).__init__()
torch.manual_seed(5)
self.gate = which_norm(nn.Linear(feat_size, feat_size, bias=False))
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d)\
or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x):
out = torch.tanh(x) * torch.sigmoid(self.gate(x))
return out
#GRU layer definition
class ContForwardGRU(nn.Module):
def __init__(self, input_size, hidden_size, num_layers=1):
super(ContForwardGRU, self).__init__()
torch.manual_seed(5)
self.hidden_size = hidden_size
self.cont_fc = nn.Sequential(which_norm(nn.Linear(64, self.hidden_size, bias=False)),
nn.Tanh())
self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True,\
bias=False)
self.nl = GLU(self.hidden_size)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x, x0):
self.gru.flatten_parameters()
h0 = self.cont_fc(x0).unsqueeze(0)
output, h0 = self.gru(x, h0)
return self.nl(output)
# Framewise convolution layer definition
class ContFramewiseConv(torch.nn.Module):
def __init__(self, frame_len, out_dim, frame_kernel_size=3, act='glu', causal=True):
super(ContFramewiseConv, self).__init__()
torch.manual_seed(5)
self.frame_kernel_size = frame_kernel_size
self.frame_len = frame_len
if (causal == True) or (self.frame_kernel_size == 2):
self.required_pad_left = (self.frame_kernel_size - 1) * self.frame_len
self.required_pad_right = 0
self.cont_fc = nn.Sequential(which_norm(nn.Linear(64, self.required_pad_left, bias=False)),
nn.Tanh()
)
else:
self.required_pad_left = (self.frame_kernel_size - 1)//2 * self.frame_len
self.required_pad_right = (self.frame_kernel_size - 1)//2 * self.frame_len
self.fc_input_dim = self.frame_kernel_size * self.frame_len
self.fc_out_dim = out_dim
if act=='glu':
self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
GLU(self.fc_out_dim)
)
if act=='tanh':
self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
nn.Tanh()
)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or\
isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x, x0):
if self.frame_kernel_size == 1:
return self.fc(x)
x_flat = x.reshape(x.size(0),1,-1)
pad = self.cont_fc(x0).view(x0.size(0),1,-1)
x_flat_padded = torch.cat((pad, x_flat), dim=-1).unsqueeze(2)
x_flat_padded_unfolded = F.unfold(x_flat_padded,\
kernel_size= (1,self.fc_input_dim), stride=self.frame_len).permute(0,2,1).contiguous()
out = self.fc(x_flat_padded_unfolded)
return out
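The flatten/pad/unfold sequence in ContFramewiseConv.forward is equivalent to a frame-wise sliding window: for a kernel of K frames, output row t sees frames t-K+1 .. t concatenated, with the learned cont_fc padding standing in for the missing history. A numpy sketch of that indexing, using zeros in place of the learned pad:

```python
import numpy as np

def framewise_unfold(x, frame_len, kernel):
    # x: (num_frames, frame_len) -> (num_frames, kernel * frame_len), where
    # row t is frames t-kernel+1 .. t concatenated (zero history padding here)
    flat = np.concatenate([np.zeros((kernel - 1) * frame_len), x.reshape(-1)])
    return np.stack([flat[t * frame_len : t * frame_len + kernel * frame_len]
                     for t in range(x.shape[0])])

x = np.arange(12, dtype=float).reshape(4, 3)  # 4 frames of length 3
u = framewise_unfold(x, frame_len=3, kernel=3)
print(u[2])  # frames 0, 1, 2 concatenated: [0. 1. 2. 3. 4. 5. 6. 7. 8.]
```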
# A fully-connected based upsampling layer definition
class UpsampleFC(nn.Module):
def __init__(self, in_ch, out_ch, upsample_factor):
super(UpsampleFC, self).__init__()
torch.manual_seed(5)
self.in_ch = in_ch
self.out_ch = out_ch
self.upsample_factor = upsample_factor
self.fc = nn.Linear(in_ch, out_ch * upsample_factor, bias=False)
self.nl = nn.Tanh()
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or\
isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x):
batch_size = x.size(0)
x = x.permute(0, 2, 1)
x = self.nl(self.fc(x))
x = x.reshape((batch_size, -1, self.out_ch))
x = x.permute(0, 2, 1)
return x
########################### The complete model definition #################################
class FWGAN400ContLarge(nn.Module):
def __init__(self):
super().__init__()
torch.manual_seed(5)
self.bfcc_with_corr_upsampler = UpsampleFC(19,80,4)
self.feat_in_conv1 = ConvLookahead(160,256,kernel_size=5)
self.feat_in_nl1 = GLU(256)
self.cont_net = nn.Sequential(which_norm(nn.Linear(321, 160, bias=False)),
nn.Tanh(),
which_norm(nn.Linear(160, 160, bias=False)),
nn.Tanh(),
which_norm(nn.Linear(160, 80, bias=False)),
nn.Tanh(),
which_norm(nn.Linear(80, 80, bias=False)),
nn.Tanh(),
which_norm(nn.Linear(80, 64, bias=False)),
nn.Tanh(),
which_norm(nn.Linear(64, 64, bias=False)),
nn.Tanh())
self.rnn = ContForwardGRU(256,256)
self.fwc1 = ContFramewiseConv(256, 256)
self.fwc2 = ContFramewiseConv(256, 128)
self.fwc3 = ContFramewiseConv(128, 128)
self.fwc4 = ContFramewiseConv(128, 64)
self.fwc5 = ContFramewiseConv(64, 64)
self.fwc6 = ContFramewiseConv(64, 40)
self.fwc7 = ContFramewiseConv(40, 40)
self.init_weights()
self.count_parameters()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or\
isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def count_parameters(self):
num_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
print(f"Total number of {self.__class__.__name__} network parameters = {num_params}\n")
def create_phase_signals(self, periods):
batch_size = periods.size(0)
progression = torch.arange(1, 160 + 1, dtype=periods.dtype, device=periods.device).view((1, -1))
progression = torch.repeat_interleave(progression, batch_size, 0)
phase0 = torch.zeros(batch_size, dtype=periods.dtype, device=periods.device).unsqueeze(-1)
chunks = []
for sframe in range(periods.size(1)):
f = (2.0 * torch.pi / periods[:, sframe]).unsqueeze(-1)
chunk_sin = torch.sin(f * progression + phase0)
chunk_sin = chunk_sin.reshape(chunk_sin.size(0),-1,40)
chunk_cos = torch.cos(f * progression + phase0)
chunk_cos = chunk_cos.reshape(chunk_cos.size(0),-1,40)
chunk = torch.cat((chunk_sin, chunk_cos), dim = -1)
phase0 = phase0 + 160 * f
chunks.append(chunk)
phase_signals = torch.cat(chunks, dim=1)
return phase_signals
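create_phase_signals accumulates phase0 by 160 * f per frame so the pitch sinusoid stays continuous across frame boundaries. For a constant pitch period, the concatenated per-frame chunks must therefore equal one long sinusoid; a numpy sketch of that invariant:

```python
import numpy as np

period = 50.0
f = 2.0 * np.pi / period
n = np.arange(1, 161)          # per-frame sample progression, as above

phase0 = 0.0
chunks = []
for _ in range(3):             # three frames of 160 samples
    chunks.append(np.sin(f * n + phase0))
    phase0 += 160 * f          # carry the phase into the next frame
stitched = np.concatenate(chunks)

direct = np.sin(f * np.arange(1, 3 * 160 + 1))
print(np.max(np.abs(stitched - direct)))  # ~0: phase is continuous
```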
def gain_multiply(self, x, c0):
gain = 10**(0.5*c0/np.sqrt(18.0))
gain = torch.repeat_interleave(gain, 160, dim=-1)
gain = gain.reshape(gain.size(0),1,-1).squeeze(1)
return x * gain
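gain_multiply maps the first cepstral coefficient c0 back to a linear amplitude via 10**(0.5*c0/sqrt(18)) and repeats that gain over the 160 samples of each frame. The scalar mapping in isolation:

```python
import numpy as np

def c0_to_gain(c0):
    # invert the scaled log10 energy encoding of the first cepstral coefficient
    return 10.0 ** (0.5 * c0 / np.sqrt(18.0))

print(c0_to_gain(0.0))                  # 1.0: zero c0 gives unit gain
print(c0_to_gain(2.0 * np.sqrt(18.0)))  # 10.0, i.e. +20 dB of amplitude
```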
def forward(self, pitch_period, bfcc_with_corr, x0):
norm_x0 = torch.norm(x0,2, dim=-1, keepdim=True)
x0 = x0 / torch.sqrt((1e-8) + norm_x0**2)
x0 = torch.cat((torch.log(norm_x0 + 1e-7), x0), dim=-1)
p_embed = self.create_phase_signals(pitch_period).permute(0, 2, 1).contiguous()
envelope = self.bfcc_with_corr_upsampler(bfcc_with_corr.permute(0,2,1).contiguous())
feat_in = torch.cat((p_embed , envelope), dim=1)
wav_latent1 = self.feat_in_nl1(self.feat_in_conv1(feat_in).permute(0,2,1).contiguous())
cont_latent = self.cont_net(x0)
rnn_out = self.rnn(wav_latent1, cont_latent)
fwc1_out = self.fwc1(rnn_out, cont_latent)
fwc2_out = self.fwc2(fwc1_out, cont_latent)
fwc3_out = self.fwc3(fwc2_out, cont_latent)
fwc4_out = self.fwc4(fwc3_out, cont_latent)
fwc5_out = self.fwc5(fwc4_out, cont_latent)
fwc6_out = self.fwc6(fwc5_out, cont_latent)
fwc7_out = self.fwc7(fwc6_out, cont_latent)
waveform = fwc7_out.reshape(fwc7_out.size(0),1,-1).squeeze(1)
waveform = self.gain_multiply(waveform,bfcc_with_corr[:,:,:1])
return waveform


@@ -0,0 +1,260 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm
import numpy as np
which_norm = weight_norm
#################### Definition of basic model components ####################
#Convolutional layer with 1 frame look-ahead (used for feature PreCondNet)
class ConvLookahead(nn.Module):
def __init__(self, in_ch, out_ch, kernel_size, dilation=1, groups=1, bias= False):
super(ConvLookahead, self).__init__()
torch.manual_seed(5)
self.padding_left = (kernel_size - 2) * dilation
self.padding_right = 1 * dilation
self.conv = which_norm(nn.Conv1d(in_ch,out_ch,kernel_size,dilation=dilation, groups=groups, bias= bias))
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x):
x = F.pad(x,(self.padding_left, self.padding_right))
conv_out = self.conv(x)
return conv_out
#(modified) GLU Activation layer definition
class GLU(nn.Module):
def __init__(self, feat_size):
super(GLU, self).__init__()
torch.manual_seed(5)
self.gate = which_norm(nn.Linear(feat_size, feat_size, bias=False))
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d)\
or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x):
out = torch.tanh(x) * torch.sigmoid(self.gate(x))
return out
#GRU layer definition
class ContForwardGRU(nn.Module):
def __init__(self, input_size, hidden_size, num_layers=1):
super(ContForwardGRU, self).__init__()
torch.manual_seed(5)
self.hidden_size = hidden_size
#This is to initialize the layer with history audio samples for continuation.
self.cont_fc = nn.Sequential(which_norm(nn.Linear(320, self.hidden_size, bias=False)),
nn.Tanh())
self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True,\
bias=False)
self.nl = GLU(self.hidden_size)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x, x0):
self.gru.flatten_parameters()
h0 = self.cont_fc(x0).unsqueeze(0)
output, h0 = self.gru(x, h0)
return self.nl(output)
# Framewise convolution layer definition
class ContFramewiseConv(torch.nn.Module):
def __init__(self, frame_len, out_dim, frame_kernel_size=3, act='glu', causal=True):
super(ContFramewiseConv, self).__init__()
torch.manual_seed(5)
self.frame_kernel_size = frame_kernel_size
self.frame_len = frame_len
if (causal == True) or (self.frame_kernel_size == 2):
self.required_pad_left = (self.frame_kernel_size - 1) * self.frame_len
self.required_pad_right = 0
#This is to initialize the layer with history audio samples for continuation.
self.cont_fc = nn.Sequential(which_norm(nn.Linear(320, self.required_pad_left, bias=False)),
nn.Tanh()
)
else:
#This means non-causal frame-wise convolution. We don't use it at the moment
self.required_pad_left = (self.frame_kernel_size - 1)//2 * self.frame_len
self.required_pad_right = (self.frame_kernel_size - 1)//2 * self.frame_len
self.fc_input_dim = self.frame_kernel_size * self.frame_len
self.fc_out_dim = out_dim
if act=='glu':
self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
GLU(self.fc_out_dim)
)
if act=='tanh':
self.fc = nn.Sequential(which_norm(nn.Linear(self.fc_input_dim, self.fc_out_dim, bias=False)),
nn.Tanh()
)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or\
isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def forward(self, x, x0):
if self.frame_kernel_size == 1:
return self.fc(x)
x_flat = x.reshape(x.size(0),1,-1)
pad = self.cont_fc(x0).view(x0.size(0),1,-1)
x_flat_padded = torch.cat((pad, x_flat), dim=-1).unsqueeze(2)
x_flat_padded_unfolded = F.unfold(x_flat_padded,\
kernel_size= (1,self.fc_input_dim), stride=self.frame_len).permute(0,2,1).contiguous()
out = self.fc(x_flat_padded_unfolded)
return out
########################### The complete model definition #################################
class FWGAN500Cont(nn.Module):
def __init__(self):
super().__init__()
torch.manual_seed(5)
#PrecondNet:
self.bfcc_with_corr_upsampler = nn.Sequential(nn.ConvTranspose1d(19,64,kernel_size=5,stride=5,padding=0,\
bias=False),
nn.Tanh())
self.feat_in_conv = ConvLookahead(128,256,kernel_size=5)
self.feat_in_nl = GLU(256)
#GRU:
self.rnn = ContForwardGRU(256,256)
#Frame-wise convolution stack:
self.fwc1 = ContFramewiseConv(256, 256)
self.fwc2 = ContFramewiseConv(256, 128)
self.fwc3 = ContFramewiseConv(128, 128)
self.fwc4 = ContFramewiseConv(128, 64)
self.fwc5 = ContFramewiseConv(64, 64)
self.fwc6 = ContFramewiseConv(64, 32)
self.fwc7 = ContFramewiseConv(32, 32, act='tanh')
self.init_weights()
self.count_parameters()
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv1d) or isinstance(m, nn.ConvTranspose1d) or isinstance(m, nn.Linear) or\
isinstance(m, nn.Embedding):
nn.init.orthogonal_(m.weight.data)
def count_parameters(self):
num_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
print(f"Total number of {self.__class__.__name__} network parameters = {num_params}\n")
def create_phase_signals(self, periods):
batch_size = periods.size(0)
progression = torch.arange(1, 160 + 1, dtype=periods.dtype, device=periods.device).view((1, -1))
progression = torch.repeat_interleave(progression, batch_size, 0)
phase0 = torch.zeros(batch_size, dtype=periods.dtype, device=periods.device).unsqueeze(-1)
chunks = []
for sframe in range(periods.size(1)):
f = (2.0 * torch.pi / periods[:, sframe]).unsqueeze(-1)
chunk_sin = torch.sin(f * progression + phase0)
chunk_sin = chunk_sin.reshape(chunk_sin.size(0),-1,32)
chunk_cos = torch.cos(f * progression + phase0)
chunk_cos = chunk_cos.reshape(chunk_cos.size(0),-1,32)
chunk = torch.cat((chunk_sin, chunk_cos), dim = -1)
phase0 = phase0 + 160 * f
chunks.append(chunk)
phase_signals = torch.cat(chunks, dim=1)
return phase_signals
def gain_multiply(self, x, c0):
gain = 10**(0.5*c0/np.sqrt(18.0))
gain = torch.repeat_interleave(gain, 160, dim=-1)
gain = gain.reshape(gain.size(0),1,-1).squeeze(1)
return x * gain
def forward(self, pitch_period, bfcc_with_corr, x0):
#This should create a latent representation of shape [Batch_dim, 500 frames, 256 elements per frame]
p_embed = self.create_phase_signals(pitch_period).permute(0, 2, 1).contiguous()
envelope = self.bfcc_with_corr_upsampler(bfcc_with_corr.permute(0,2,1).contiguous())
feat_in = torch.cat((p_embed , envelope), dim=1)
wav_latent = self.feat_in_nl(self.feat_in_conv(feat_in).permute(0,2,1).contiguous())
#Generation with continuation using history samples x0 starts from here:
rnn_out = self.rnn(wav_latent, x0)
fwc1_out = self.fwc1(rnn_out, x0)
fwc2_out = self.fwc2(fwc1_out, x0)
fwc3_out = self.fwc3(fwc2_out, x0)
fwc4_out = self.fwc4(fwc3_out, x0)
fwc5_out = self.fwc5(fwc4_out, x0)
fwc6_out = self.fwc6(fwc5_out, x0)
fwc7_out = self.fwc7(fwc6_out, x0)
waveform_unscaled = fwc7_out.reshape(fwc7_out.size(0),1,-1).squeeze(1)
waveform = self.gain_multiply(waveform_unscaled,bfcc_with_corr[:,:,:1])
return waveform


@@ -0,0 +1,27 @@
# Packet loss simulator
This code is an attempt at simulating better packet loss scenarios. The most common way of simulating
packet loss is to use a random sequence where each packet loss event is uncorrelated with previous events.
That is a simplistic model, since real losses often occur in bursts. This model instead uses real data
to build a generative model for packet loss.
We use the training data provided for the Audio Deep Packet Loss Concealment Challenge, which is available at:
http://plcchallenge2022pub.blob.core.windows.net/plcchallengearchive/test_train.tar.gz
To create the training data, run:
`./process_data.sh /<path>/test_train/train/lossy_signals/`
That will create an ASCII `loss_sorted.txt` file with all loss data sorted in increasing packet loss
percentage. Then run:
`python ./train_lossgen.py`
to train a model.
To generate a sequence, run:
`python3 ./test_lossgen.py <checkpoint> <percentage> output.txt --length 10000`
where `<checkpoint>` is the .pth model file and `<percentage>` is the amount of loss (e.g. 0.2 for 20% loss).
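The generated output file holds one 0/1 flag per line (1 = packet lost), so both the realized loss rate and the burstiness can be checked with a few lines of numpy; a sketch on a small stand-in sequence (replace it with np.loadtxt on the generated file):

```python
import numpy as np

# seq = np.loadtxt('output.txt', dtype=int)  # one 0/1 flag per line
seq = np.array([0, 1, 1, 0, 0, 1, 0, 0])    # small stand-in sequence

loss_rate = seq.mean()                       # fraction of lost packets
pad = np.concatenate([[0], seq, [0]])
changes = np.flatnonzero(np.diff(pad))       # alternating burst starts/ends
burst_lengths = changes[1::2] - changes[::2]
print(loss_rate, burst_lengths)              # 0.375 [2 1]
```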


@@ -0,0 +1,101 @@
"""
/* Copyright (c) 2022 Amazon
Written by Jan Buethe */
/*
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
"""
import os
import argparse
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), '../weight-exchange'))
parser = argparse.ArgumentParser()
parser.add_argument('checkpoint', type=str, help='model checkpoint')
parser.add_argument('output_dir', type=str, help='output folder')
args = parser.parse_args()
import torch
import numpy as np
import lossgen
from wexchange.torch import dump_torch_weights
from wexchange.c_export import CWriter, print_vector
def c_export(args, model):
message = f"Auto generated from checkpoint {os.path.basename(args.checkpoint)}"
writer = CWriter(os.path.join(args.output_dir, "lossgen_data"), message=message, model_struct_name='LossGen', enable_binary_blob=False, add_typedef=True)
writer.header.write(
f"""
#include "opus_types.h"
"""
)
dense_layers = [
('dense_in', "lossgen_dense_in"),
('dense_out', "lossgen_dense_out")
]
for name, export_name in dense_layers:
layer = model.get_submodule(name)
dump_torch_weights(writer, layer, name=export_name, verbose=True, quantize=False, scale=None)
gru_layers = [
("gru1", "lossgen_gru1"),
("gru2", "lossgen_gru2"),
]
max_rnn_units = max([dump_torch_weights(writer, model.get_submodule(name), export_name, verbose=True, input_sparse=False, quantize=True, scale=None, recurrent_scale=None)
for name, export_name in gru_layers])
writer.header.write(
f"""
#define LOSSGEN_MAX_RNN_UNITS {max_rnn_units}
"""
)
writer.close()
if __name__ == "__main__":
os.makedirs(args.output_dir, exist_ok=True)
checkpoint = torch.load(args.checkpoint, map_location='cpu')
model = lossgen.LossGen(*checkpoint['model_args'], **checkpoint['model_kwargs'])
model.load_state_dict(checkpoint['state_dict'], strict=False)
#model = LossGen()
#checkpoint = torch.load(args.checkpoint, map_location='cpu')
#model.load_state_dict(checkpoint['state_dict'])
c_export(args, model)


@@ -0,0 +1,29 @@
import torch
from torch import nn
import torch.nn.functional as F
class LossGen(nn.Module):
def __init__(self, gru1_size=16, gru2_size=16):
super(LossGen, self).__init__()
self.gru1_size = gru1_size
self.gru2_size = gru2_size
self.dense_in = nn.Linear(2, 8)
self.gru1 = nn.GRU(8, self.gru1_size, batch_first=True)
self.gru2 = nn.GRU(self.gru1_size, self.gru2_size, batch_first=True)
self.dense_out = nn.Linear(self.gru2_size, 1)
def forward(self, loss, perc, states=None):
#print(states)
device = loss.device
batch_size = loss.size(0)
if states is None:
gru1_state = torch.zeros((1, batch_size, self.gru1_size), device=device)
gru2_state = torch.zeros((1, batch_size, self.gru2_size), device=device)
else:
gru1_state = states[0]
gru2_state = states[1]
x = torch.tanh(self.dense_in(torch.cat([loss, perc], dim=-1)))
gru1_out, gru1_state = self.gru1(x, gru1_state)
gru2_out, gru2_state = self.gru2(gru1_out, gru2_state)
return self.dense_out(gru2_out), [gru1_state, gru2_state]


@@ -0,0 +1,17 @@
#!/bin/sh
#directory containing the loss files
datadir=$1
for i in $datadir/*_is_lost.txt
do
perc=`cat $i | awk '{a+=$1}END{print a/NR}'`
echo $perc $i
done > percentage_list.txt
sort -n percentage_list.txt | awk '{print $2}' > percentage_sorted.txt
for i in `cat percentage_sorted.txt`
do
cat $i
done > loss_sorted.txt
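The awk one-liner above just averages the 0/1 flags in each loss file; the same per-file percentage can be computed in Python, as a rough equivalent sketch:

```python
def loss_percentage(path):
    # mean of the 0/1 loss flags in one *_is_lost.txt file,
    # equivalent to awk '{a+=$1}END{print a/NR}' above
    with open(path) as f:
        flags = [int(line.split()[0]) for line in f if line.strip()]
    return sum(flags) / len(flags)
```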


@@ -0,0 +1,42 @@
import lossgen
import os
import argparse
import torch
import numpy as np
parser = argparse.ArgumentParser()
parser.add_argument('model', type=str, help='lossgen model checkpoint')
parser.add_argument('percentage', type=float, help='percentage loss')
parser.add_argument('output', type=str, help='path to output file (ascii)')
parser.add_argument('--length', type=int, help="length of sequence to generate", default=500)
args = parser.parse_args()
checkpoint = torch.load(args.model, map_location='cpu')
model = lossgen.LossGen(*checkpoint['model_args'], **checkpoint['model_kwargs'])
model.load_state_dict(checkpoint['state_dict'], strict=False)
states=None
last = torch.zeros((1,1,1))
perc = torch.tensor((args.percentage,))[None,None,:]
seq = torch.zeros((0,1,1))
one = torch.ones((1,1,1))
zero = torch.zeros((1,1,1))
if __name__ == '__main__':
for i in range(args.length):
prob, states = model(last, perc, states=states)
prob = torch.sigmoid(prob)
states[0] = states[0].detach()
states[1] = states[1].detach()
loss = one if np.random.rand() < prob else zero
last = loss
seq = torch.cat([seq, loss])
np.savetxt(args.output, seq[:,:,0].numpy().astype('int'), fmt='%d')
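The loop above feeds each sampled loss flag back in as the next input. A torch-free sketch of the same autoregressive sampling pattern, with a toy probability function standing in for the LossGen forward pass:

```python
import random

def sample_sequence(prob_fn, length, seed=0):
    # prob_fn(last) -> probability that the next packet is lost, given the
    # previous loss flag; a toy stand-in for the LossGen model
    rng = random.Random(seed)
    last, seq = 0, []
    for _ in range(length):
        loss = 1 if rng.random() < prob_fn(last) else 0
        seq.append(loss)
        last = loss
    return seq

# bursty toy model: a loss is far likelier right after another loss
seq = sample_sequence(lambda last: 0.5 if last else 0.05, 1000)
print(sum(seq) / len(seq))   # overall loss rate, well below 0.5
```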
