TL;DR: I can't figure out why my neural network won't give me a sensible output. I assume it has something to do with how I'm presenting the input data to it, but I have no idea how to fix it.
Background:
I am using matched pairs of speech samples to train a model that morphs one person's voice into another's. Some standard pre-processing steps have already been applied; they can be reversed to generate a new speech file.
With these I am attempting to build a very simple neural network that translates the input vector into the output vector, from which a waveform is then reconstructed.
I understand what I'm trying to do mathematically, but that's not helping me make Keras/TensorFlow actually do it.
Inputs:
As inputs to my model I have vectors containing Fourier Transform (FT) values from the input speech sample, matched with their counterpart target vectors.
Each vector contains the FT values from one 25 ms fragment of an utterance and has the form $[r_1, i_1, \ldots, r_n, i_n]$, where $r_k$ is the real part and $i_k$ the imaginary part of the $k$-th value.
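To make the layout concrete, here is a minimal NumPy sketch of packing one frame of complex FT values into the interleaved real vector and unpacking it again. The helper names `pack_frame` and `unpack_frame` are made up for illustration and are not part of the code in this post; the count of 551 bins is assumed from the 1102-element vectors used in the dataset code.

```python
import numpy as np

N_BINS = 551  # assumed: 1102 real values / 2 values per complex bin


def pack_frame(frame):
    """Interleave a complex vector as [r_1, i_1, ..., r_n, i_n]."""
    frame = np.asarray(frame, dtype=np.complex128)
    packed = np.empty(2 * frame.size, dtype=np.float64)
    packed[0::2] = frame.real  # even slots hold the real parts
    packed[1::2] = frame.imag  # odd slots hold the imaginary parts
    return packed


def unpack_frame(packed):
    """Invert pack_frame: rebuild the complex vector from the interleaved reals."""
    packed = np.asarray(packed, dtype=np.float64)
    return packed[0::2] + 1j * packed[1::2]


# Small demonstration on random data (not real speech).
rng = np.random.default_rng(0)
frame = rng.normal(size=N_BINS) + 1j * rng.normal(size=N_BINS)
packed = pack_frame(frame)
print(packed.shape)                              # (1102,)
print(np.allclose(unpack_frame(packed), frame))  # True
```

The slicing form does the same job as an element-by-element loop over the 551 bins, just in two vectorised assignments.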
I build a dataset from these pairs, reshaping each input vector as I go:
    from pathlib import Path

    import numpy as np
    import tensorflow as tf


    def create_dataset(filepaths):
        """
        :param filepaths: array containing the locations of the relevant files
        :return: a tensorflow dataset constructed from the source data
        """
        examples = []
        labels = []
        for item in filepaths:
            try:
                source = np.load(Path(item[0]))
                target = np.load(Path(item[1]))
                # load mapping
                with open(Path(item[2]), 'r') as f:
                    l = [int(s) for s in list(f.read()) if s.isdigit()]
                it = iter(l)
                mapping = zip(it, it)
                for entry in mapping:
                    x, y = entry
                    ex, lab = source[x], target[y]
                    ex_ph, lab_ph = np.empty(1102), np.empty(1102)
                    # split the values into their real and imaginary parts and append to the appropriate array
                    for i in range(0, 1102, 2):
                        idx = i // 2
                        ex_ph[i] = ex[idx].real
                        ex_ph[i+1] = ex[idx].imag
                        lab_ph[i] = lab[idx].real
                        lab_ph[i+1] = lab[idx].imag
                    examples.append(ex_ph.reshape(1, 1102))
                    # I'm not reshaping the labels based on a theory that doing so was messing with my loss function
                    labels.append(lab_ph)
            except FileNotFoundError as e:
                print(e)
        return tf.data.Dataset.from_tensor_slices((examples, labels))
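The mapping step in `create_dataset` relies on the `zip(it, it)` idiom to turn a flat list of integers into (source index, target index) pairs. Here is a minimal standalone sketch of that idiom. It assumes the mapping file stores the indices as integers separated by arbitrary non-digit characters; the actual file format isn't shown in this post, and `parse_index_pairs` is a hypothetical helper name. One thing it makes visible: a character-by-character scan with `isdigit()` would split an index like 12 into 1 and 2, whereas matching whole digit runs keeps it intact.

```python
import re


def parse_index_pairs(text):
    """Return [(x1, y1), (x2, y2), ...] from a string of integers.

    Assumption: indices are non-negative integers separated by any
    non-digit characters (spaces, commas, brackets, newlines, ...).
    """
    numbers = [int(tok) for tok in re.findall(r"\d+", text)]
    if len(numbers) % 2:
        raise ValueError("expected an even number of indices")
    it = iter(numbers)
    # zip pulls from the same iterator twice per step,
    # so consecutive values become (source, target) pairs.
    return list(zip(it, it))


print(parse_index_pairs("[(0, 0), (1, 2), (12, 15)]"))
# [(0, 0), (1, 2), (12, 15)]
```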
This is then being passed to the neural network:
    def train(training_set, validation_set, test_set, filename):
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(1102,)),
            tf.keras.layers.Dense(551, activation='relu'),
            tf.keras.layers.Dense(1102),
        ])
        model.compile(loss="mean_squared_error", optimizer="sgd")
        model.fit(training_set, epochs=1, validation_data=validation_set)
        model.evaluate(test_set)
        model.save(f'../data/models/{filename}.h5')
        model.summary()  # summary() prints the table itself and returns None
And what I get out is... crackling. Every time, no matter how much data I throw at it. I assume I'm doing something obviously and horribly wrong in the way I've set this up.
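One way to narrow this down is to check that the packing and the inverse transform round-trip cleanly with no network in between. The sketch below assumes the FT values come from a real FFT (`np.fft.rfft`) of a 1100-sample frame, which would give the 551 bins implied by the 1102-element vectors. Both the frame length and the helper names are assumptions, since the actual pre-processing isn't shown here.

```python
import numpy as np

FRAME_LEN = 1100  # assumed frame length; rfft of 1100 samples yields 551 bins


def frame_to_vector(frame):
    """Real FFT of one frame, stored as [r_1, i_1, ..., r_n, i_n]."""
    spectrum = np.fft.rfft(frame)
    vec = np.empty(2 * spectrum.size)
    vec[0::2] = spectrum.real
    vec[1::2] = spectrum.imag
    return vec


def vector_to_frame(vec, frame_len=FRAME_LEN):
    """Inverse of frame_to_vector: rebuild the complex bins, then inverse FFT."""
    spectrum = vec[0::2] + 1j * vec[1::2]
    return np.fft.irfft(spectrum, n=frame_len)


# Random noise stands in for a speech frame here.
rng = np.random.default_rng(0)
frame = rng.normal(size=FRAME_LEN)
vec = frame_to_vector(frame)
restored = vector_to_frame(vec)
print(vec.shape)                         # (1102,)
print(np.max(np.abs(restored - frame)))  # should be on the order of floating-point noise
```

If a check like this reproduces the original frames but the model output still crackles, the problem is more likely in the training setup than in the reconstruction.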