Training Issue in Solving the Multi-Dimensional Multiple Knapsack Problem with a Transformer Model and the PPO and SAC Algorithms

Question

I'm reaching out to the brilliant minds of the AI community to seek help with a challenging issue in my project on solving the multi-dimensional multiple knapsack problem using a transformer model. As part of my master's thesis focusing on resource management in cloud computing, this research aims to contribute to the advancement of cloud computing resource management techniques.

In the GitHub repository linked at the end of this post, I have implemented two variants of Proximal Policy Optimization (PPO) and one variant of Discrete Soft Actor-Critic (SAC) to train a transformer model. The goal is for the model to highlight the object-knapsack connections at each stage of the problem-solving process.
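To make the setting concrete, here is a minimal sketch of the multi-dimensional multiple knapsack problem as described above. It is not the code from the repository: the names `Item`, `Knapsack`, `fits`, and `assign` are hypothetical, and the reward (the item's value if it fits, zero otherwise) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    value: float
    weights: List[float]  # one weight per resource dimension


@dataclass
class Knapsack:
    capacities: List[float]  # one capacity per resource dimension
    used: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start empty unless a usage vector was supplied.
        if not self.used:
            self.used = [0.0] * len(self.capacities)


def fits(item: Item, knapsack: Knapsack) -> bool:
    """True if the item respects every capacity dimension of the knapsack."""
    return all(
        u + w <= c
        for u, w, c in zip(knapsack.used, item.weights, knapsack.capacities)
    )


def assign(item: Item, knapsack: Knapsack) -> float:
    """Place the item if feasible and return an assumed reward.

    Assumption: reward is the item's value on success and 0.0 otherwise.
    """
    if not fits(item, knapsack):
        return 0.0
    knapsack.used = [u + w for u, w in zip(knapsack.used, item.weights)]
    return item.value
```

In this picture, one "object-knapsack connection" corresponds to a single call such as `assign(item, knapsack)`, and an episode is a sequence of such calls.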

However, I'm encountering a significant obstacle: training doesn't yield the desired performance. It seems the model doesn't learn even after more than 100,000 training iterations.

Further investigation has led us to suspect that the decoder input lacks a fixed order, which introduces a large amount of variation and weakens the model's output. However, there may be other causes that we have yet to discover. We are uncertain how to overcome this challenge and would appreciate collaborating with others to find a solution.
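One way this suspicion could be tested is to impose a fixed order on the decoder input and check whether the variation drops. The sketch below is an untested idea, not something taken from the repository. It assumes the decoder input can be represented as a list of `(object_index, knapsack_index)` pairs, which is a hypothetical encoding, and `canonical_order` is a hypothetical helper name.

```python
from typing import List, Tuple

# Hypothetical encoding of one decoder-input element.
Pair = Tuple[int, int]  # (object_index, knapsack_index)


def canonical_order(pairs: List[Pair]) -> List[Pair]:
    """Sort pairs by knapsack index, then by object index.

    Any two permutations of the same set of pairs map to the same
    sequence, so the decoder would always see one fixed ordering.
    """
    return sorted(pairs, key=lambda p: (p[1], p[0]))
```

If the same set of connections always reaches the decoder in the same order, any remaining variation in the output would have to come from somewhere else, which would help narrow down the cause.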

I would greatly appreciate any assistance, ideas, or suggestions that could shed light on this matter. If you have experience or expertise in reinforcement learning algorithms, transformer models, or solving optimization problems, please take a look at my project and help me find a solution. Together, let's tackle this challenge and push the boundaries of cloud computing resource management.

Feel free to explore the GitHub repository to learn more about the project. https://github.com/ImMohammadHosseini/MKP-RL

