
TL;DR: I can't figure out why my neural network won't give me sensible output. I assume it's something to do with how I'm presenting the input data to it, but I have no idea how to fix it.

Background:

I am using matched pairs of speech samples to generate a model which morphs one person's voice into another's. Some standard pre-processing steps have already been applied, and they can be reversed in order to generate a new speech file.

With these I am attempting to train a very simple neural network that translates each input vector into its target vector, from which a waveform can then be reconstructed.

I understand what I'm trying to do mathematically, but that isn't helping me make Keras/TensorFlow actually do it.

Inputs:

As inputs to my model I have vectors containing Fourier transform values from the source speech sample, each matched with its counterpart target vector.

These vectors contain the FT values from each 25 ms fragment of utterance and are of the form $[r_1, i_1, ..., r_n, i_n]$, where $r$ is the real part of each value and $i$ is the imaginary part.
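
For illustration, one of these vectors relates to the complex FFT of a single frame roughly as follows (the frame length of 1100 samples and the use of np.fft.rfft are assumptions chosen so the sizes match the 551 complex values / 1102 floats my code expects; my actual pre-processing may differ):

import numpy as np

# one 25 ms fragment of the waveform; 1100 samples is a placeholder length
frame = np.random.randn(1100)

# 1100 // 2 + 1 = 551 complex FFT values for this frame
spectrum = np.fft.rfft(frame)

# flatten into [r_1, i_1, ..., r_n, i_n] -> 1102 floats
flat = np.empty(2 * spectrum.size)
flat[0::2] = spectrum.real   # even positions hold the real parts
flat[1::2] = spectrum.imag   # odd positions hold the imaginary parts

# the layout is reversible, which is what reconstruction relies on
restored = flat[0::2] + 1j * flat[1::2]
assert np.allclose(restored, spectrum)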

I am assembling these pairs into a dataset, reshaping each input vector as I do so:

import numpy as np
import tensorflow as tf
from pathlib import Path


def create_dataset(filepaths):
    """
    :param filepaths: array containing the locations of the relevant files
    :return: a tensorflow dataset constructed from the source data
    """
    examples = []
    labels = []

    for item in filepaths:
        try:
            source = np.load(Path(item[0]))
            target = np.load(Path(item[1]))

            # load mapping: pull the digit characters out of the mapping file
            # and pair them up as (source frame index, target frame index)
            with open(Path(item[2]), 'r') as f:
                l = [int(s) for s in list(f.read()) if s.isdigit()]
                it = iter(l)
                mapping = zip(it, it)

            for entry in mapping:
                x, y = entry
                ex, lab = source[x], target[y]
                ex_ph, lab_ph = np.empty(1102), np.empty(1102)

                # split the values into their real and imaginary parts and append to the appropriate array
                for i in range(0, 1102, 2):
                    idx = int(i / 2)

                    ex_ph[i] = ex[idx].real
                    ex_ph[i+1] = ex[idx].imag
                    lab_ph[i] = lab[idx].real
                    lab_ph[i+1] = lab[idx].imag

                examples.append(ex_ph.reshape(1, 1102))

                # I'm not reshaping the labels based on a theory that doing so was messing with my loss function
                labels.append(lab_ph)

        except FileNotFoundError as e:
            print(e)

    return tf.data.Dataset.from_tensor_slices((examples, labels))
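
For reference, the dataset is built and consumed roughly like this (the file paths and batch size are placeholders; as far as I understand, a tf.data dataset needs to be batched before it is handed to model.fit):

# each entry is (source FT file, target FT file, mapping file) -- placeholder paths
filepaths = [
    ('../data/source_0.npy', '../data/target_0.npy', '../data/mapping_0.txt'),
    # ...
]

dataset = create_dataset(filepaths)

# shuffle and batch before training; 32 is an arbitrary example batch size
# (the split into training / validation / test sets happens elsewhere)
dataset = dataset.shuffle(10000).batch(32)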

This is then being passed to the neural network:

def train(training_set, validation_set, test_set, filename):
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(1102,)),
                                 tf.keras.layers.Dense(551, activation='relu'),
                                 tf.keras.layers.Dense(1102)])

    model.compile(loss="mean_squared_error", optimizer="sgd")

    model.fit(training_set, epochs=1, validation_data=validation_set)

    model.evaluate(test_set)
    model.save(f'../data/models/{filename}.h5')
    model.summary()
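
After training, I turn the network's output back into audio roughly along these lines (a sketch only: the model filename, the input file, and the simple frame-by-frame inverse FFT with plain concatenation are stand-ins for whatever my reversed pre-processing actually does):

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('../data/models/example.h5')   # placeholder name

# flattened source FT vectors, shape (num_frames, 1102) -- placeholder path
input_frames = np.load('../data/example_input.npy')

predicted = model.predict(input_frames)   # shape (num_frames, 1102)

# undo the [r_1, i_1, ...] layout to recover complex FFT frames
complex_frames = predicted[:, 0::2] + 1j * predicted[:, 1::2]

# naive reconstruction: inverse-FFT each 551-bin frame and concatenate the
# resulting fragments (the real pipeline reverses the original pre-processing)
fragments = np.fft.irfft(complex_frames, axis=1)
waveform = fragments.reshape(-1)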

and I get out... crackling. Every time, no matter how much data I throw at it. I assume I'm doing something obviously and horribly wrong with the way I'm setting this up.
