
In this tutorial, they build a speech recognition model that classifies a one-second audio clip as one of ten predefined words. Suppose we modify the problem as follows: given an Arabic dataset, we want to build a dialect recognition model that classifies a two-second audio clip as one of $n$ local dialects, using ten predefined sentences. That is, for each of these ten sentences there are $x$ different phrases and idioms that carry the same meaning$^*$. How can I take advantage of the tutorial to solve the modified problem?

$*$ The $x$ different phrases and idioms for each sentence are not predefined.

Abdulkader

1 Answer


The tutorial you link is not very relevant; there are already existing implementations of your exact problem.

You can use https://github.com/swshon/dialectID_e2e; there are many other similar implementations on GitHub.
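Whichever implementation you start from, each clip has to be brought to a fixed two-second length and labelled with its dialect rather than with a word. Here is a minimal sketch of that preparation step. It assumes 16 kHz mono audio already loaded as a list of floats; the sampling rate, the placeholder dialect names, and the helper names `pad_or_trim` and `label_index` are illustrative assumptions, not taken from the tutorial or from the repository above.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # assumed sampling rate in Hz (not stated in the question)
CLIP_SECONDS = 2.0     # clip length given in the question
# Hypothetical placeholder labels; replace with the n dialects in your dataset.
DIALECTS = ["dialect_0", "dialect_1", "dialect_2"]


def pad_or_trim(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    seconds: float = CLIP_SECONDS,
) -> List[float]:
    """Return exactly sample_rate * seconds samples.

    Longer clips are cut at the end; shorter clips are zero-padded at the end.
    """
    target = int(sample_rate * seconds)
    clipped = list(samples[:target])
    return clipped + [0.0] * (target - len(clipped))


def label_index(dialect: str) -> int:
    """Map a dialect name to its integer class index."""
    return DIALECTS.index(dialect)
```

The class label here is the dialect only. The sentence or idiom spoken in the clip is not part of the label, so the fact that the $x$ phrasings are not predefined does not change the number of output classes, which stays at $n$.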

Nikolay Shmyrev