Question 1. I am wondering whether this field (using RNNs for email spam detection) is worth more research or whether it is a closed research field.
The use of RNNs to detect spam grew out of the use of artificial neural networks to detect fraud in telecommunications and the financial industry. That work was driven by the rise of attacks on long-distance lines, ATMs, banks, and credit card systems, both online and at the data centers that support physical points of sale.
Although the basic RNN design has given way to the newer LSTM and GRU approaches and their variants and extensions, artificial neural networks are now one of the primary fraud detection technologies. The dominance of this strategy extends to spam detection, which is closely tied to fraud: spammers present the appearance of a relationship with their recipients that does not exist.
Improving computing designs that recognize patterns in time series data, and applying those designs to fraud detection, to countermeasures, and to detecting and routing or deleting unwanted incoming information, will remain a stable area of research and development for the foreseeable future.
Question 2. What is the oldest published paper in this field?
There is no single oldest published paper. The first papers on RNNs are listed in the answer to "Where can I find the original paper that introduced RNNs?", but the move from pattern-based detection to artificial networks, and then to stateful artificial networks, was gradual. The earliest deployments of these networks in server-side or client-side solutions occurred before any papers were published on the specific topic of RNN use in spam detection.
Question 3. What are the pros and cons of using RNNs for email spam detection over other classification methods?
Spam also has a strong temporal element. What one considers undesirable spam in one year may be considered mission-critical email a few years later, and vice versa. Performance in this space therefore includes the speed, accuracy, and reliability of classification, but also adaptation to users' changing classification needs.
It is because of these four performance characteristics, taken together, that stateful networks derived from RNNs are commonly used for spam detection. Supporting this kind of adaptivity requires gated learning and forgetting at the cell level, which makes the LSTM and GRU variants common choices.
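To make the cell-level gating concrete, here is a minimal pure-Python sketch of a single GRU cell (the Cho et al., 2014 formulation) run over a tokenized message. The toy vocabulary, the one-hot encoding, the random untrained weights, the update_bias parameter, and the spam_score readout are all hypothetical illustrations, not a working filter; a real system would learn its weights from labeled mail, typically with a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class GRUCell:
    """Minimal GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0, update_bias=0.0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Hypothetical knob: a large positive value pushes the update gate toward 1 (keep history).
        self.update_bias = update_bias
        self.W_z, self.U_z = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.W_r, self.U_r = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.W_h, self.U_h = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x, h_prev):
        # Update gate z: how much of the previous state to keep.
        z = [sigmoid(a + b + self.update_bias)
             for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h_prev))]
        # Reset gate r: how much of the previous state feeds the candidate.
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h_prev))]
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        # Candidate state built from the current input and the reset-scaled history.
        h_cand = [math.tanh(a + b)
                  for a, b in zip(matvec(self.W_h, x), matvec(self.U_h, reset_h))]
        # Interpolate: z near 1 retains history (slow forgetting), z near 0 overwrites it.
        return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


VOCAB = ["free", "money", "click", "meeting", "tomorrow"]  # hypothetical toy vocabulary


def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]


def spam_score(words, cell, w_out, b_out=0.0):
    """Run the GRU over the message and squash the final state into a (0, 1) score."""
    h = [0.0] * cell.hidden_size
    for word in words:
        h = cell.step(one_hot(word), h)
    return h, sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)


cell = GRUCell(input_size=len(VOCAB), hidden_size=4, seed=42)
h, score = spam_score(["free", "money", "click"], cell, w_out=[1.0, -1.0, 0.5, 0.5])
print(len(h), 0.0 < score < 1.0)  # 4 True
```

Because the weights are untrained, the score itself means nothing here; the point is the mechanism. Raising update_bias drives the update gate toward 1, so each hidden unit holds on to earlier context; lowering it lets new tokens overwrite the state quickly. Training sets that balance per unit, which is the learned, cell-level forgetting that makes LSTM and GRU cells attractive when what counts as spam drifts over time.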
Semantic document classification rides on an emerging set of technologies, primarily artificial network designs that begin to approach cognitive understanding of text by storing linguistic structure in forms that allow analogy, comparison, and composition. Semantic algorithms that perform these operations on fuzzy associations, in combination with recursive artificial networks, may emerge as the dominant design as they are further developed.
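The analogy, comparison, and composition operations can be illustrated with the well-known word-embedding example "man is to king as woman is to ?". The sketch below uses hand-made 3-dimensional vectors with labeled axes; these vectors are hypothetical, since learned embeddings have hundreds of unlabeled dimensions. It is a toy illustration of the vector arithmetic only, not a semantic spam classifier.

```python
import math

# Hypothetical 3-d embeddings with hand-picked axes [royalty, male, female].
EMBEDDINGS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "princess": [0.8, 0.0, 1.0],
    "prince": [0.8, 1.0, 0.0],
}


def cosine(u, v):
    """Comparison: cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by composing b - a + c, then comparing by cosine."""
    target = [vb - va + vc for va, vb, vc in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = (w for w in EMBEDDINGS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


print(analogy("man", "king", "woman"))  # queen
```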
References
Detecting Spam Blogs: A Machine Learning Approach, Pranam Kolari, 2006
Automated labeling of bugs and tickets using attention-based mechanisms in recurrent neural networks, Volodymyr Lyubinets et al., 2018
Spam Filter Through Deep Learning and Information Retrieval, Weicheng Zhang, 2018
An Unsupervised Neural Network Approach to Profiling the Behavior of Mobile Phone Users for Use in Fraud Detection, Peter Burge, John Shawe-Taylor, Journal of Parallel and Distributed Computing, Volume 61 Issue 7, July 2001, pp 915-925
Intelligent junk mail detection using neural networks, Michael Vinther, June 2002
Mining for fraud, Margaret Weatherford, IEEE Intelligent Systems, 2002
Discovering golden nuggets: data mining in financial application, D. Zhang, L. Zhou, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 34, No. 4, November 2004
A Comprehensive Survey of Data Mining-based Fraud Detection Research, C. Phua, V. Lee, K. Smith, R. Gayler, arXiv preprint, 2007