
I've inherited a neural network project at the company I work for. The person who developed it gave me some very basic training to get up and running, and I've maintained it for a while. The current neural network classifies messages for telcos: it routes them to support people in different areas, like "activation", "no signal", "internet", etc. The network has been working flawlessly. The structure of this neural network is as follows:

    from keras.models import Sequential
    from keras.layers import Dense, Activation, Dropout

    model = Sequential()
    model.add(Dense(500, input_shape=(len(train_x[0]),)))  # input width = message vector length
    model.add(Activation('relu'))
    model.add(Dropout(0.6))
    model.add(Dense(250))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(len(train_y[0])))  # one output unit per category
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='Adamax',
                  metrics=['accuracy'])

This uses a Word2Vec embedding, and has been trained with a "clean" file: all special characters and numbers are removed from both the training file and the input data.
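For context, one common way to produce fixed-size message vectors like train_x for a network like this is to average the Word2Vec vectors of the words in each message. The sketch below (using gensim) is only illustrative, with a made-up toy corpus, and is not the production code:

    # Illustrative only: build per-message vectors by averaging Word2Vec word vectors.
    # Assumes gensim >= 4.0; the corpus and tokenization are placeholders.
    import numpy as np
    from gensim.models import Word2Vec

    corpus = [["hola", "no", "tengo", "senal"],
              ["quiero", "activar", "mi", "linea"]]

    w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

    def message_vector(tokens, model):
        """Average the vectors of known words; zeros if none are known."""
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

    train_x = np.array([message_vector(m, w2v) for m in corpus])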

Now I've been assigned to make a neural network to detect if a message will be cataloged as "moderated" (meaning it's an insult, spam, or just people commenting on a Facebook post) or "operative", meaning the message is actually a question for the company.

What I did was start from the current model and reduce the number of categories to two. It didn't go very well: the word embedding was in Spanish from Argentina, and the training data was Spanish from Peru. I made a new embedding and accuracy increased by a fair margin (we are looking for insults and other curse words; in Spanish, a curse word in one country can be a perfectly normal word in another: in Spain "coger" means "to take", while in Argentina it means "to f__k", and "concha" means shell in most countries, but in Argentina it means "c__t". You get the idea).

I trained the network with 300,000 messages. Roughly 40% of these were classified as "moderated". I tried all sorts of combinations of cycles and epochs. The accuracy slowly increased to nearly 0.9, while the loss stayed around 0.5.

But when testing the neural network, "operative" messages generally seem to be correctly classified, with scores around 0.9, while "moderated" messages aren't: they come out around 0.6 or less. At some point I tried putting multiple insults in a single message (even pasting sample data as input data), but it didn't seem to improve things.

Word2Vec works fantastically. The words are correctly "lumped" together (I learned a few insults in Peruvian Spanish thanks to it).

I put the neural network in production for a week to gather statistics. Basically 90% of the messages went unclassified, 5% were classified correctly, and 5% were classified wrong. Since the network has two categories, this seems to mean the neural network is just giving random guesses.

So, the questions are:

  • Is it possible to accomplish this task with a neural network?
  • Is the structure of this neural network correct for this task?
  • Are 300k messages enough to train the neural network?
  • Do I need to clean up the data by removing uppercase letters, special characters, numbers, etc.?
hjf

2 Answers


The Project Summarized

The project goal appears to be a common one: routing correspondence efficiently to maintain good but low-cost customer and public relations. A few features of the project were mentioned.

  • Neural network project
  • Received some design and project history from predecessor
  • Classifies messages for telcos
  • Sends results to support groups at appropriate locales
  • Uses two ReLU layers, ending with a softmax
  • Word2Vec embedding
  • Trained with a clean language file
  • All special characters and numbers removed

The requirements for the current development were indicated. The current work is to develop a neural network that places incoming messages into one of two categories accurately and reliably.

  • Moderated — insulting, fraudulent in purpose (spam), or trivial routine chatter
  • Operative — relevant question requiring internal human attention

Research and development is beginning along reasonable lines.

  • Trained with 300,000 messages
  • Word2Vec used
  • 40% of the messages classified as moderated
  • Permuted cycles and epochs
  • Training accuracy slowly approached 0.9
  • Loss stays near 0.5
  • In test, operative accuracy 0.9, moderated accuracy max of 0.6

First Obstacle and Feasibility

The first obstacle encountered is that in QA using production environment data, 90% of the messages were left unclassified, 5% of the classifications were accurate, and the remaining 5% were inaccurate.

It is correct that the even split between 5% accurate and 5% inaccurate classifications indicates that the information learned does not yet transfer to the quality assurance phase using real production messages. In information theory phraseology, no bits of usable information were transferred and entropy remained unchanged in this first experiment.

These kinds of disappointments are not uncommon when first approaching the use of AI in an existing business environment, so this initial outcome should not be taken as a sign that the idea won't work. The approach will likely work, especially with foul language, which is not dependent on cultural references, analogies, or other semantic complexity.

Recognizing notices that are for audit purposes only, such as social network notifications or purchase confirmations, can be handled through rules. The rule creation and maintenance can theoretically be automated too, and some proprietary systems exist that do exactly that. Such automation can be learned using the appropriate training data, but real-time feedback is usually employed, and those systems are usually model based. That is an option for further down the R&D road.

The scope of the project is probably too small, but that's not a big surprise either. Most projects suffer from early overoptimism. A pertinent quote from Redford's The Milagro Beanfield War illuminates the practical purpose of optimism.

        APPARITION

I don't know if your friend knows what he's in for.

        AMARANTE

Nobody would do anything if they knew what they were in for.

Initial Comments

It is not necessary to reduce the number of message categories to two, but there is nothing wrong with starting R&D by refining approach and high level design with the simplest case.

The last layer may train more efficiently if a binary threshold is used for the activation function instead of softmax, since only one bit of output is needed when there are only two categories. This also forces the network training objective to be the definitive selection of a category, which may benefit the overall rate of R&D progress.
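One common realization of that idea in Keras is a single sigmoid output trained with binary cross-entropy, with the hard 0/1 decision applied only at prediction time (a hard threshold is not differentiable, so it cannot be the training activation itself). A minimal sketch, reusing the layer sizes from the question and assuming train_y is a vector of 0/1 labels (1 = moderated):

    # Sketch: two-category model with one sigmoid output instead of a two-way softmax.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    model = Sequential()
    model.add(Dense(500, input_shape=(len(train_x[0]),), activation='relu'))
    model.add(Dropout(0.6))
    model.add(Dense(250, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))      # one unit: P(moderated)
    model.compile(loss='binary_crossentropy',
                  optimizer='Adamax',
                  metrics=['accuracy'])

    # The hard decision is made at inference time:
    # label = int(model.predict(message_vector)[0][0] >= 0.5)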

There may be ways of improving outcomes by adding more metrics in the code beyond just 'accuracy'. Others who work with such details every day may have more domain specific knowledge in this regard.
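For example, tracking per-class precision and recall during training would make a bias against the moderated class visible long before a production trial. A minimal sketch, assuming a reasonably recent Keras and the single-output variant sketched above:

    # Sketch: report precision and recall alongside accuracy during training.
    from keras.metrics import Precision, Recall

    model.compile(loss='binary_crossentropy',
                  optimizer='Adamax',
                  metrics=['accuracy',
                           Precision(name='precision'),
                           Recall(name='recall')])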

Culture and Pattern Detection

Insults and curse words are entirely different kinds of things. Foul language is a linguistic symbol or phrase that fits into a broadcasting or publishing category of prohibition. The rules of prohibition are well established in most languages and could be held in a configuration file along with the permutations of each symbol or phrase. In the case of sh*t, related forms include sh*tty, sh*thead, and so on.
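A minimal sketch of such a configuration-driven check, with placeholder patterns rather than a real block list, might look like this:

    # Sketch: configuration-driven foul-language rules; each regex covers the
    # common permutations of one base term. Patterns are placeholders only.
    import re

    PROHIBITED_PATTERNS = [
        r"\bsh[i\*]t\w*\b",      # sh*t, sh*tty, sh*thead, ...
        r"\bconcha\w*\b",        # flag only for locales where it is offensive
    ]

    def matches_prohibited(text):
        return any(re.search(p, text, flags=re.IGNORECASE) for p in PROHIBITED_PATTERNS)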

It is also useful to distinguish the sub-sets of foul language.

  • Cursing (expressing the wish for calamity to befall the recipient)
  • Swearing (considered blasphemy by some)
  • Exclamations that are considered foul by publishers and broadcasters
  • Additional items parents don't want their children to hear
  • Edge cases like crap

The term foul language is a super-set of these.

Distribution Alignment

Learning algorithms and theory are based on probabilistic alignment of feature distributions between training and use. The distribution of the training data must closely resemble the distribution found when the trained AI component is later used. If not, the convergence of the learning process on some optimal behavior defined by gain or loss functions may succeed, but the execution of that behavior in the business or industry may fail.

Internationalization

Multilingual AI should usually be fully internationalized. Training on one dialect and then using the result on another will almost always perform poorly. That creates a data acquisition challenge.

As stated above, classification and learning depend on the alignment of statistical distributions between data used in training and data processing relying on the use of what was learned. This is also true of human learning, so this requirement will not likely be overcome any time soon.

All these forms of foul language must be programmed flexibly across these cultural dimensions.

  • Character set
  • Collation order
  • Language
  • Dialect
  • Other locale related determinants
  • Education level
  • Economic strata

Once one of these is included in the model (which will be imperative), there is no reason the others cannot be included at little cost, so it is wise to begin with standard dimensions of flexibility. The alternative will likely lead to costly branching complexity to represent specific rules, which could have been made more maintainable by generalizing for international use up front.
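One cheap way to keep that flexibility is to key the rule data by a standard locale identifier from the start, so that supporting a new dialect means adding data rather than code. A small sketch with placeholder entries (the Peruvian list in particular is hypothetical):

    # Sketch: term lists keyed by (language, region); entries are placeholders.
    FOUL_TERMS_BY_LOCALE = {
        ("es", "AR"): {"concha", "coger"},   # offensive in Argentina
        ("es", "PE"): {"concha"},            # hypothetical Peruvian list
        ("es", "ES"): set(),                 # neither term is offensive in Spain
    }

    def offensive_terms(language, region):
        # Fall back to an empty set for locales without a curated list yet.
        return FOUL_TERMS_BY_LOCALE.get((language, region), set())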

Insult Recognition

Insults require comprehension beyond the current state of technology. Cognitive science may change that in the future, but projections are mere conjecture.

Use of a regular expression engine with a fuzzy logic comparator is achievable and may appease the stakeholders of the project, but identifying insults may be infeasible at this time, and expectations should be set accordingly to avoid later surprises. Consider these examples.

  • The nose on your face looks like a camel.
  • Kiss the darkest part of my lily white. (From the Avatar screenplay)

The word combinations in these are not likely to be in some data set you can use for training, so Word2Vec will not help in these types of cases. Additional layers may assist with proper handling of at least some of the semantic and referential complexity of insults, but only some.

Explicit Answers to Explicit Questions

Is it possible to accomplish this task with a neural network?

Yes, in combination with excellence in higher level system design and best practices for internationalization.

Is the structure of this neural network correct for this task?

The initial experiments look like a reasonable beginning toward what would later be correct enough. Do not be discouraged, but don't expect the first pass at something like this to look much like what passes user acceptance testing a year from now. Experts can't pull that rate of R&D progress off, unless they hack and cobble something together from previous work.

Are 300k messages enough to train the neural network?

Probably not. In fact, 300m messages will not catch all combinations of cultural references, analogies, colloquialisms, variations in dialect, plays on words, and games that spammers play to avoid detection.

What would really help is a feedback mechanism, so that production outcomes drive the training rather than a necessarily limited data set. Canned data sets are usually restricted in the accuracy of their probabilistic representation of social phenomena. None will likely infer dialect and other locale features well enough to better detect insults. A Parisian insult may have nothing in common with a Creole insult.

The feedback mechanism must be based on impressions in some way to become and remain accurate. The impressions must be labelled with all the locale data that is reasonably easy to collect and possibly correlated to the impression.

This implies the use of rules acquisition, fuzzy logic control, reinforcement learning, or the application of naive Bayesian approaches somewhere appropriate within the system architecture.
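As a concrete example of the last option, a naive Bayes classifier over word counts is a cheap baseline that can be refit continuously as agents correct labels in production. A minimal sketch with scikit-learn (the messages and labels below are placeholders):

    # Sketch: naive Bayes baseline that can be retrained as labelled feedback arrives.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    messages = ["no tengo senal", "andate a la ..."]   # placeholder examples
    labels = ["operative", "moderated"]

    baseline = make_pipeline(CountVectorizer(), MultinomialNB())
    baseline.fit(messages, labels)

    # When agents correct a prediction in production, append the message and the
    # corrected label to the stored training set and refit periodically.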

Do I need to clean up the data from uppercase, special characters, numbers etc?

Numbers can be relevant. Because of historical events and religious texts, numbers such as 13 and 666 can by themselves carry offensive or superstitious connotations. One can also use numbers and punctuation to convey word content. Here are some examples of spam-detection-resistant click bait, followed by a sketch of how such substitutions could be normalized.

  • I've got a 6ex opportunity 4u.
  • Wanna 69?
  • Values are rising 50%! We have 9 investment choices 4 you to check out.
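A small sketch of that normalization, with an illustrative (not exhaustive) substitution table:

    # Sketch: undo a few common digit-for-word substitutions before classification.
    # The substitution table is illustrative, not exhaustive.
    import re

    SUBSTITUTIONS = {
        r"\b4u\b": "for you",
        r"\b6ex\b": "sex",
        r"\b4\b": "for",
    }

    def deobfuscate(text):
        for pattern, replacement in SUBSTITUTIONS.items():
            text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
        return text

    print(deobfuscate("I've got a 6ex opportunity 4u."))   # -> "I've got a sex opportunity for you."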

The meaning of the term special character is vague and ambiguous. Any character in UTF-8 is legitimate for almost all Internet communications today. HTML5 provides additional entities beginning with an ampersand and ending with a semicolon. (See https://dev.w3.org/html5/html-author/charref.)

Filtering these out is a mistake. Spammers leverage these standards to penetrate spam detection. In this example, the stroke similarity between a capital ell (L) and the British pound symbol (£) is exploited to produce spam-detection-resistant click bait.

  • Do you like hot £egs?

Removing special characters that fit within the Internet standards of UTF-8 and HTML entities will likely lead to disaster. It is recommended not to follow that part of the predecessor's design.
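Rather than stripping such characters, a safer step is to decode HTML entities and map known look-alike characters back to the letters they imitate. A minimal sketch (the confusables table is illustrative; real tables such as Unicode's confusables list are far larger):

    # Sketch: decode HTML entities, then map a few look-alike characters to the
    # letters they imitate instead of deleting them. The table is illustrative.
    import html

    CONFUSABLES = {"£": "L", "$": "S", "€": "E"}

    def normalize(text):
        text = html.unescape(text)                 # "&pound;egs" -> "£egs"
        return "".join(CONFUSABLES.get(ch, ch) for ch in text)

    print(normalize("Do you like hot &pound;egs?"))   # -> "Do you like hot Legs?"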

Regarding emoticons and other ideograms, these are linguistic elements that may encode in text the volume, pitch, or tone modulation of speech, or they may represent facial expressions or body language. In many languages ideograms are used in place of words. For a global system running in parallel with the blogosphere, emoticons are part of linguistic expression.

For that reason, they are not significantly different than word roots, prefixes, suffixes, conjugations, or word pairs as linguistic elements which can also express emotion as well as logical reasoning. For the learning algorithm to learn categorization behavior in the presence of ideograms, the ideograms must remain in training features and later in real time processing of those features using the results of training.

Additional Information

Some additional information is covered in this existing post: Spam Detection using Recurrent Neural Networks.

Since spam detection is closely related to fraud detection, the spammer fraudulently acting as if a relationship already exists with the recipient, this existing post may be of assistance too: Can we implement GAN (Generative adversarial neural networks) for classication problem like Fraud detecion?

Another resource that may help is this: https://www.tensorflow.org/tutorials/representation/word2vec

Douglas Daseeco

First your questions:

Yes, it is possible to accomplish this with a neural network; this is actually very similar to your current working model (the idea is the same, just different classes). So there is no real reason to start changing the architecture, especially since, from what I understand, you are not a deep-learning expert. I mean, I'm sure you could poke and play with the model to improve it, but that is not your current concern. Right now it seems that your model isn't training properly.

I do think that 300k messages should be enough at least for a decent model (way better than random choice).

There are some standard preprocessing steps that you should perform on your input text (if you haven't already). In general you should normalize it (convert it to lower case) and remove noise (like special characters). You can read more about different methods in the following link: Text Preprocessing in Python: Steps, Tools, and Examples
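A minimal sketch of those two steps (whether dropping punctuation actually helps in this case is debatable, see the other answer, so treat it as something to test rather than a given):

    # Sketch: basic normalization -- lower-case and strip punctuation.
    import re

    def preprocess(text):
        text = text.lower()                      # normalize case
        text = re.sub(r"[^\w\s]", " ", text)     # drop punctuation; \w keeps accented letters
        return re.sub(r"\s+", " ", text).strip() # collapse whitespace

    print(preprocess("¡Hola! ¿No tengo SEÑAL?"))   # -> "hola no tengo señal"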

Suggestions and questions:

From your initial explanation, your new model has 2 outputs (moderated and operative); however, later you mention that 90% of the input was not classified. So do you have a third class (no classification), or are you using some probability threshold, below which a message is left unclassified? Because if you only have 2 classes, I would suggest using a binary cross-entropy loss and not a categorical one.
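If it is a threshold, it is worth checking explicitly how that cut-off behaves on held-out data instead of looking at accuracy alone. A small sketch, assuming you keep the two-way softmax and have a validation set val_x / val_y where val_y holds class indices (0 = operative, 1 = moderated):

    # Sketch: measure how a confidence cut-off behaves on a validation set.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    THRESHOLD = 0.7                              # whatever cut-off production uses
    val_y = np.asarray(val_y)

    probs = model.predict(val_x)                 # shape (n_messages, 2)
    confident = probs.max(axis=1) >= THRESHOLD   # messages that clear the cut-off
    preds = probs.argmax(axis=1)

    print("unclassified fraction:", 1 - confident.mean())
    print(confusion_matrix(val_y[confident], preds[confident]))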

I have a strong suspicion that there is something wrong with either your training dataset or your training process. Please provide more info about the dataset (number of samples from each class, how it was labeled) and your training process (are you using part of the data for validation? how many epochs? etc.).

Mark.F
  • You are correct, I'm far from being an expert. The data was hand labeled: the company has hundreds of agents answering questions, and they classified the messages. Roughly 40% of them were classified as "moderated", and the rest fell into various other categories that I lumped together as "operative" (I also tried classifying with all 10 existing categories; it didn't help much either). 90% of the input was indeed not classified because I set a threshold of 0.7, so 90% of the time the winning probability was lower than that. The model was trained for up to 100 cycles of 100 epochs, a couple of hours of training on a GTX 1080. – hjf Dec 28 '18 at 14:02
  • I'm using a 20% validation split as well, and batch size is 500. – hjf Dec 28 '18 at 14:04