Best feature engineering approach for interest-based age classification

Asked Mar 11 '23 at 17:44

Active Mar 11 '23 at 18:08

Viewed 25 times

I have a dataset of users (one per row), each with a list of their interests (IAB categories). It looks like this:

user_id | gender | list of interests
--------+--------+--------------------------------
user 1  | male   | games, productivity
user 2  | female | games, lifestyle, design
user 3  | male   | travel, games, messaging
user 4  | male   | messaging, blogging, lifestyle
...

Since the number of unique interests is small (~500) and the number of rows is large (~67M), what feature engineering practices should I follow to help an ML model achieve better accuracy?

P.S.: A simple model with one-hot/count vectorization yields an accuracy of ~52%.
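
For concreteness, here is a minimal sketch of what the one-hot (multi-hot) baseline might look like on the sample rows above. It is only an illustration under assumptions: the `users` dict, the `vocab` mapping, and the `multi_hot` helper are hypothetical names, not the actual pipeline, and the data is just the four example users.

```python
# Hypothetical sample mirroring the table above; not the real ~67M-row dataset.
users = {
    "user 1": ["games", "productivity"],
    "user 2": ["games", "lifestyle", "design"],
    "user 3": ["travel", "games", "messaging"],
    "user 4": ["messaging", "blogging", "lifestyle"],
}

# Build a stable vocabulary: one column index per unique interest (sorted by name).
unique_interests = sorted({name for interests in users.values() for name in interests})
vocab = {name: index for index, name in enumerate(unique_interests)}


def multi_hot(interests, vocab):
    """Return a dense 0/1 vector with a 1 in the column of each listed interest."""
    row = [0] * len(vocab)
    for name in interests:
        row[vocab[name]] = 1
    return row


# One encoded row per user, keyed by user id.
matrix = {user_id: multi_hot(interests, vocab) for user_id, interests in users.items()}

for user_id, row in matrix.items():
    print(user_id, row)
```

Dense Python lists are used here only for readability; at ~67M rows by ~500 columns a sparse representation would likely be needed in practice.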

edited Mar 11 '23 at 18:08

asked Mar 11 '23 at 17:44

theodre7

Your title suggests a question about a metric, but your question asks about feature engineering for a fixed metric (accuracy). Please update either the title or the question. – SajanGohil Mar 11 '23 at 18:03
@SajanGohil updated the title – theodre7 Mar 11 '23 at 18:09
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Mar 13 '23 at 15:50

0 Answers