I have the following situation:
Stock | Time_Stamps | Feature_1 | Feature_2 | Feature_n | Price |
---|---|---|---|---|---|
Stock_1 | 2019 | 0.5 | 1.0 | 1.0 | 100 |
Stock_1 | 2020 | 0.7 | 1.3 | 0.9 | 90 |
Stock_2 | 2019 | 0.3 | 0.9 | 1.1 | 110 |
Stock_2 | 2020 | 0.2 | 0.8 | 1.1 | 120 |
Stock_n | year_n | value_n | value_n | value_n | price_n |
So this is how my data table is structured. My original df has 100+ features and roughly 70,000 observations across 2,000+ stocks - the table above is only a simplification.
I want to train an LSTM on this data table and investigate how the features correlate with the price. It's a common idea, nothing new, so please spare the "this will not work" comments.
I am generally interested in how you would approach this problem. We have multiple inputs (features) for our time series forecast, with 8 time stamps (8 years) per stock. However, as I understand it, I'd have to train my model for every stock separately, which is inconvenient.
How would you pre-process my data, so that I can train a decent model?
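To make the question concrete, here is a minimal sketch of the kind of preprocessing I have in mind (toy values from the table above; the target choice of "last observed price per stock" is just an assumption for illustration): group the long-format panel by stock, sort each group by year, and stack the per-stock sequences into one 3-D array of shape `(n_stocks, n_timesteps, n_features)`, so that each stock becomes one sample in a single LSTM's batch rather than its own model.

```python
import numpy as np
import pandas as pd

# Toy version of the panel above (hypothetical values).
df = pd.DataFrame({
    "Stock":       ["Stock_1", "Stock_1", "Stock_2", "Stock_2"],
    "Time_Stamps": [2019, 2020, 2019, 2020],
    "Feature_1":   [0.5, 0.7, 0.3, 0.2],
    "Feature_2":   [1.0, 1.3, 0.9, 0.8],
    "Price":       [100, 90, 110, 120],
})

feature_cols = ["Feature_1", "Feature_2"]

# Build one (timesteps, features) sequence per stock, sorted by year,
# and pick a target per stock (here: the last observed price, as an example).
sequences, targets = [], []
for _, g in df.sort_values("Time_Stamps").groupby("Stock"):
    sequences.append(g[feature_cols].to_numpy())
    targets.append(g["Price"].iloc[-1])

# Stack into a single 3-D array: (n_stocks, n_timesteps, n_features).
# This assumes every stock has the same number of time stamps (8 in the
# real data); otherwise sequences would need padding/masking first.
X = np.stack(sequences)
y = np.array(targets)
print(X.shape, y.shape)  # (2, 2, 2) (2,) for the toy data
```

With the real data this would give `X` of shape `(2000+, 8, 100+)`, which is exactly the batch format a recurrent layer expects, so one model can be trained on all stocks at once.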