How to determine if an Amazon review is likely to be fake using text classification

Question

I'm currently in the research stage of building a web app in ASP.NET where the user can input a URL to an Amazon product, then the app would determine how likely its reviews are to be genuine. I need help figuring out what algorithm to use in determining if a certain review is likely to be deceptive. I want my app to behave similarly to Fakespot or ReviewMeta. I realise a tool like this won't be 100% accurate and that's fine.

So far I've read parts of this book, which seems to be recommended a lot to NLP newbies, but I haven't found anything that applies to such a specific problem. I also read this article, but it's based on hotel, restaurant and doctor reviews. I'm trying to find a more general method that can be applied to any product. Any help would be greatly appreciated!

Aiden Grossman · Answer 1 · 2017-10-03T00:58:21.513

This is not that hard a problem once you have a lot of training data, but first you have to get that data one way or another. Many of the models that give high accuracy need a lot of it.

With data in hand, you will probably want to use a long short-term memory (LSTM) recurrent neural network together with a word2vec model, or maybe even a sentence2vec/paragraph2vec model. This can give fairly good results with a little tweaking, but if you want really accurate results you will want to try an ensemble. An ensemble, in the sense used here, is when the outputs of multiple neural networks are fed through another classifier (often XGBoost) to achieve better classification results. You might also want to try a little feature engineering.

Finally, to deploy the model you can take one of several approaches. First, you can use TensorFlow Serving. TensorFlow was created by Google and is used by much of the machine learning community. However, TensorFlow's main API is in Python, so you will need TensorFlow Serving to use the model from your ASP.NET app. Or you could use Microsoft's CNTK. Or you could write a custom implementation, but this would take a lot of time and not be worthwhile unless you are doing research.

If you really do not want to use deep learning for one reason or another, you can probably use a simpler model, but you will need to understand your data. You also need to understand your data if you are doing deep learning, but maybe not as much.
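As a rough illustration of the feature engineering mentioned above, here is a minimal Python sketch that turns a review into a handful of numeric features. The function name `review_features` and the specific features (exclamation count, upper-case ratio, first-person pronoun ratio, vocabulary diversity) are assumptions made for this example, not signals the answer names and not validated predictors of deception. They would normally be fed to a classifier alongside, or instead of, learned embeddings.

```python
import re

# Hypothetical word list chosen for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our", "us"}


def review_features(text: str) -> dict:
    """Turn one review into a small dictionary of numeric features."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    n_words = len(words)
    letters = [c for c in text if c.isalpha()]
    n_upper = sum(1 for c in letters if c.isupper())
    n_first_person = sum(1 for w in lowered if w in FIRST_PERSON)
    return {
        "n_words": n_words,
        "n_exclamations": text.count("!"),
        # Share of letters that are upper case (0.0 for text without letters).
        "upper_ratio": n_upper / len(letters) if letters else 0.0,
        # Share of words that are first-person pronouns.
        "first_person_ratio": n_first_person / n_words if n_words else 0.0,
        # Share of distinct words; low values mean repetitive text.
        "unique_ratio": len(set(lowered)) / n_words if n_words else 0.0,
    }


features = review_features("BEST product ever!!! I love it")
```

Whether any of these features actually separates genuine from deceptive reviews has to be checked against labeled data.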

score 2 · Answer 2 · answered Nov 13 '17 at 02:54

This is a common question, but beforehand you need a lot of labeled data; the more, the better.

Since reviews on Amazon are not labeled as deceptive or not, you may have to label them manually.

Then you can use the NLP techniques described by Aiden Grossman. A simple approach is to represent each review as a vector with one entry per word in the vocabulary, with a single output (deceptive or not). A better approach is to use word2vec embeddings as the input and an LSTM as the hidden layer.
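The simple word-vector representation can be sketched in a few lines of plain Python. This is a bag-of-words count vector, which is one reading of "every word as one entry of a vector"; the helper names (`tokenize`, `build_vocabulary`, `vectorize`) and the two toy reviews are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(reviews):
    """Map every distinct word in the training reviews to a vector index."""
    words = sorted({tok for review in reviews for tok in tokenize(review)})
    return {word: i for i, word in enumerate(words)}


def vectorize(review, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokenize(review)).items():
        if word in vocabulary:  # words unseen in training are ignored
            vector[vocabulary[word]] = count
    return vector


# Toy, made-up reviews; real training data would be labeled Amazon reviews.
reviews = ["Great product, great price", "Terrible product"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(r, vocabulary) for r in reviews]
# vocabulary: great=0, price=1, product=2, terrible=3
# vectors:    [2, 1, 1, 0] and [0, 0, 1, 1]
```

Each vector, paired with its deceptive/genuine label, becomes one training row for a classifier. Word order is lost in this representation, which is the main thing the word2vec plus LSTM approach adds back.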

Beware that this resembles an anomaly detection problem: deceptive reviews may be extremely rare. Handle precision and recall carefully.
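To see why precision and recall matter more than accuracy when deceptive reviews are rare, consider this small sketch. The labels are invented (5 deceptive reviews out of 100), and `some_model` is a hypothetical classifier's output, not a real result.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = deceptive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up data: 5 deceptive reviews followed by 95 genuine ones.
y_true = [1] * 5 + [0] * 95

# A useless model that calls every review genuine.
always_genuine = [0] * 100

# A hypothetical model that catches 4 of the 5 deceptive reviews
# and raises 2 false alarms on genuine ones.
some_model = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

print(accuracy(y_true, always_genuine), precision_recall(y_true, always_genuine))
print(accuracy(y_true, some_model), precision_recall(y_true, some_model))
```

The always-genuine model reaches 95% accuracy while finding no deceptive reviews at all (recall 0), so accuracy alone would make it look nearly as good as the second model.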

ASP.NET is OK. You can run your inference model as a backend service, and the ASP.NET app can call this service with any JS framework.
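A backend inference service of this kind could look roughly like the following, using only Python's standard library. Everything specific here is an assumption for illustration: the JSON field names `review` and `fake_probability`, the port, and above all `score_review`, which is a placeholder where the trained model would be called.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_review(text: str) -> float:
    """Placeholder for the trained model; returns a value in [0, 1].

    Hypothetical stand-in so the sketch runs. A real service would
    call the trained classifier here.
    """
    return min(1.0, text.count("!") / 10)


def handle_payload(body: bytes):
    """Parse a JSON request body and return (status_code, response_dict)."""
    try:
        text = json.loads(body.decode("utf-8"))["review"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'review' field"}
    if not isinstance(text, str):
        return 400, {"error": "'review' must be a string"}
    return 200, {"fake_probability": score_review(text)}


class ScoreHandler(BaseHTTPRequestHandler):
    """Answers POST requests carrying a review with a JSON score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_payload(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run(port: int = 8000):
    """Start the service; the port is an arbitrary choice for local testing."""
    HTTPServer(("127.0.0.1", port), ScoreHandler).serve_forever()
```

Calling `run()` starts the service. The ASP.NET side then only needs to POST a body such as `{"review": "..."}` and read the score from the JSON response, so the web app and the model can be written in different languages.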
