First of all, you mention that your data are categorical. For purely nominal categories, I don't see how you can define a similarity, and therefore a distance (error) between the predicted value and the ground truth. You can do that only if the data are ordinal.
If you want to just classify between normal and anomalous points (binary classification), without caring about further classification of the anomaly types themselves, one of the most common algorithms is the One-Class Support Vector Machine (OC-SVM).
Anomalies are unpredictable in nature and sometimes hard to replicate and record. Therefore, there is usually a lack of anomalous data, and supervised learning approaches suffer: if you sacrifice some "precious" anomalous points to train the algorithm, you cannot use them to test it.
The main advantage of OC-SVM is that it is a semi-supervised method: you train it only on normal data, and at test time it flags samples that deviate from the learned behaviour as anomalous. Thus, you "save" all the rare anomalous points for testing purposes!
Take a look at this short Python example; it has all you need :)
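To make the train-on-normal-only idea concrete, here is a rough sketch of my own (not the example referred to above), assuming scikit-learn's `OneClassSVM` and made-up 2D numeric data. The data, the variable names, and the `gamma` and `nu` values are all illustrative assumptions. Categorical features would first need a numeric encoding (e.g. one-hot), which is not shown here.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: "normal" points cluster tightly around the origin.
X_train = 0.3 * rng.standard_normal((200, 2))        # normal only, used for training
X_test_normal = 0.3 * rng.standard_normal((20, 2))   # unseen normal points
# Hypothetical anomalies: far away from the normal cluster, never seen in training.
X_test_anomalous = rng.uniform(low=4, high=6, size=(20, 2))

# nu is an upper bound on the fraction of training points treated as outliers;
# gamma controls the RBF kernel width. Both values here are illustrative guesses.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
clf.fit(X_train)  # no labels and no anomalies are needed to fit

# predict() returns +1 for points judged normal and -1 for points judged anomalous.
pred_normal = clf.predict(X_test_normal)
pred_anomalous = clf.predict(X_test_anomalous)

print("normal points flagged as anomalous:", int((pred_normal == -1).sum()), "of", len(pred_normal))
print("anomalous points detected:", int((pred_anomalous == -1).sum()), "of", len(pred_anomalous))
```

Note that every anomalous point is kept for evaluation; none is spent on training. On real data you would tune `nu` and `gamma`, for instance against a small labelled validation set if you can afford one.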