I'm building a web application that collects schema.org data from different webshops such as Amazon, Shopify, etc. It collects data every 6 hours and shows the current and lowest price, so users can monitor products and buy at the lowest price.

My goal is to recognize products from different shops as the same product. Every shop has its own title for the same product.

Example:

Google Pixel 2 64GB Clearly White (Unlocked) Smartphone 
Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty) 
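
To make the matching problem concrete, here is a minimal baseline sketch, not a finished solution: it compares two titles by the Jaccard similarity of their lowercase token sets. The helper names `tokens` and `jaccard` and the third title `other` are hypothetical and only serve as illustration.

```python
import re


def tokens(title: str) -> set:
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two titles (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


title_a = "Google Pixel 2 64GB Clearly White (Unlocked) Smartphone"
title_b = "Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty)"
# Hypothetical title of a different product, used only for contrast.
other = "Apple iPhone 8 64GB Space Gray (Unlocked) Smartphone"

print(jaccard(title_a, title_b))  # the two Pixel 2 titles share most tokens
print(jaccard(title_a, other))    # the unrelated title shares far fewer
```

A score like this could serve as one input feature for a model, but on its own it cannot tell apart variants that differ in a single token (for example 64GB vs. 128GB).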

Problems:

  1. I don't have a lot of data (only products chosen by the user)
  2. it needs to support every new product for which the app has no data history
malioboro
Mr.Code
  • "needs to support every new product that app doesn't have data history": to achieve this, you will have to build a model that is able to _generalise_, which may require a _regularisation technique_. Furthermore, if you don't have a lot of data, you likely don't want to use a very complex model. Is your data labelled? – nbro May 12 '19 at 21:36
  • Yes and yes. I only have data for the products/sites that users add to the app. The data is labeled as on schema.org: title, rating, price, etc. Any ideas? – Mr.Code May 12 '19 at 22:00
  • You could try an SVM: train it to predict the label of each product in the training set and validate the results on a validation dataset. Given that you have only a small amount of data, I would use cross-validation. If the SVM doesn't produce decent results on the validation dataset, you will likely need more data, or you could try another model. – nbro May 12 '19 at 22:03
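
The cross-validation suggested in the comments can be sketched in plain Python. This is only an illustration under stated assumptions: a simple similarity threshold stands in for the SVM, the scores and labels are made-up toy values, and the names `k_fold_indices`, `cross_validate`, `fit_threshold` and `predict_same` are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def cross_validate(samples, labels, fit, predict, k=3):
    """Return the mean validation accuracy of a model over k folds."""
    accuracies = []
    for train, val in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in val)
        accuracies.append(correct / len(val))
    return sum(accuracies) / len(accuracies)


def fit_threshold(scores, labels):
    """Stand-in model (not an SVM): midpoint between the class mean scores.

    Assumes both classes occur in the training fold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_same(threshold, score):
    """Predict 1 (same product) when the similarity exceeds the threshold."""
    return 1 if score > threshold else 0


# Made-up similarity scores for title pairs; 1 = same product, 0 = different.
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
labels = [1, 0, 1, 0, 1, 0]
print(cross_validate(scores, labels, fit_threshold, predict_same, k=3))
```

Because every sample is used for validation exactly once, the averaged accuracy makes better use of a small dataset than a single train/validation split would.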

0 Answers