First of all, there are many factors that determine how well a model will work: the amount of data, the source of the data, the hyperparameters, the model type, the training time, and so on. All of these affect accuracy, and no classifier works best in general. It all depends on these factors, and no single model satisfies every situation, at least for now.
To improve accuracy, the first step is to get those factors right so that the classifier has a better chance of doing well.
First of all, how much data do you have? If you are working with HTML web pages, you probably need at least 10,000 samples; with that much data, overfitting should be manageable. You also need to clean the data. One common step is tokenization: split the text into words, build a dictionary (vocabulary) from them, and encode each word as a specific number so that identical words share the same encoding. You are using the raw HTML as input, which contains a lot of unnecessary information such as tags and attributes; you could try removing those, or strip all HTML tags entirely if they are not required. The key to cleaning the data is to extract only the pieces of information that are important and necessary for the model to work.
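As a rough idea, the cleaning and tokenization step could look something like the sketch below (this is just an illustration, assuming BeautifulSoup is available for stripping tags; the function names are placeholders):

```python
from bs4 import BeautifulSoup

def clean_html(raw_html):
    # Strip tags/attributes and keep only the visible text
    return BeautifulSoup(raw_html, "html.parser").get_text(separator=" ")

def build_vocab(documents):
    # Map each unique word to an integer id (0 is reserved for padding/unknown)
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def encode(doc, vocab):
    # Replace every word with its id; unseen words fall back to 0
    return [vocab.get(word, 0) for word in doc.lower().split()]
```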
Then, you should explore the model. For an NLP (Natural Language Processing) task, your best bet is an RNN (Recurrent Neural Network). This type of network has memory cells that help with text, because text often has long-range links within a paragraph: one sentence may use a "she" that refers to a person mentioned two sentences earlier. If you just feed every word encoding into an MLP, the network has no such memory with which to learn those long-range connections. An RNN is also time dependent, meaning it processes the tokens one by one in order. This makes the text more intuitive for the network, since text is designed to be read forward, not all at once.
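To make the "one token at a time, with memory" idea concrete, here is a bare-bones sketch of the recurrence a simple RNN computes (purely illustrative; the dimensions are arbitrary):

```python
import numpy as np

hidden_size, input_size = 64, 32
W_x = np.random.randn(hidden_size, input_size) * 0.01   # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def run_rnn(token_vectors):
    # The hidden state h carries information forward from earlier tokens
    h = np.zeros(hidden_size)
    for x in token_vectors:          # process the sequence one token at a time
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h                         # a summary of the whole sequence
```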
Your current method is to first vectorize the HTML, then feed it into a random forest classifier. A random forest works well, but it does not scale as nicely with more data: its accuracy tends to plateau as the dataset grows, whereas the accuracy of a deep neural network keeps improving with more data. On the other hand, a deep neural network needs a large amount of data to start with. If you do not have much data (< 10,000 samples), the random forest should remain your method of choice; if the data is larger, or you plan to add more, you should try a deep learning based method.
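For reference, the kind of pipeline you describe might look like this in scikit-learn (assuming TF-IDF as the vectorizer; swap in whatever vectorization you are actually using, and `texts`/`labels` are placeholders for your data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# texts: list of cleaned page texts, labels: corresponding class labels
pipeline = make_pipeline(
    TfidfVectorizer(max_features=20000),    # vectorize the (cleaned) HTML text
    RandomForestClassifier(n_estimators=300),
)
scores = cross_val_score(pipeline, texts, labels, cv=5)
print(scores.mean())
```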
For a deep learning based method, ULMFiT is a great model to try. It uses an LSTM (Long Short-Term Memory) network, which is a type of RNN, with language-model pretraining and several other techniques to increase accuracy. You can try it with the fast.ai implementation: https://nlp.fast.ai/
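As a rough idea of what that looks like with a recent fastai version (the DataFrame and column names here are assumptions, so adapt them to your data):

```python
from fastai.text.all import *

# df has a 'text' column (cleaned page text) and a 'label' column (the class)
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label', valid_pct=0.2)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4)   # fine-tune the pretrained language model for classification
```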
If you wish to try a method that you can practically implement yourself, you could use a plain LSTM with one-hot (or integer) encoded input, as sketched below. However, don't use word2vec for preprocessing, since your input data is HTML code: pretrained word2vec vectors are built for ordinary English text, not HTML tags and attributes. Moreover, a custom encoding will work better, because you can train the encoding itself as part of the training process.
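A minimal sketch of that kind of model in Keras might look like the following (`vocab_size`, `num_classes`, and the layer sizes are placeholders; a trainable embedding layer plays the role of the learned encoding):

```python
import tensorflow as tf

vocab_size, num_classes = 20000, 5   # placeholders: match your vocabulary and labels

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128, mask_zero=True),  # encoding learned during training
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: padded sequences of word ids (see the encode() sketch above), y: integer labels
# model.fit(X, y, epochs=5, validation_split=0.1)
```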
Hope this helps.