Data pre-processing and feature extraction are by far the most important parts of any machine learning pipeline. They matter even more than the model you choose to do the classification.
Unfortunately, pre-processing and feature extraction differ for every type of data. You need to experiment with the data yourself to find out what works best for its nature. With experience you start noticing patterns across data types. For example, as you are doing, building a word vector is an effective means of feature extraction for text-based data.
"I could just add tweet location at the end of the vector I suppose, but that would give it a very small weighting."
This is not true for any machine learning algorithm I would choose. A model should not assign weights to inputs based on their position in the feature vector; weights should be assigned based on the explained variance (information gain) each feature provides.
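To illustrate with a hypothetical sketch: below, the one informative feature is placed last in the vector, and a logistic regression still learns the largest weight for it. The data is synthetic and the setup is an assumption, not a general proof, but it shows that position in the array carries no special weighting.

```python
# A feature's position in the input vector does not determine its weight.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
noise = rng.normal(size=(n, 3))      # three uninformative features
signal = rng.normal(size=(n, 1))     # one informative feature
y = (signal[:, 0] > 0).astype(int)   # label depends only on `signal`

X = np.hstack([noise, signal])       # informative feature is LAST
model = LogisticRegression().fit(X, y)

weights = np.abs(model.coef_[0])
print(weights.argmax())              # index of the dominant coefficient
```

So appending tweet location at the end of the vector would not shrink its influence; what matters is how much it actually helps separate the classes.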
After you do your pre-processing and feature extraction, you can further refine your feature set with common dimensionality-reduction methods such as principal component analysis (PCA) and linear discriminant analysis (LDA), both available in standard libraries.
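For instance, PCA in scikit-learn looks like this (synthetic data here; in practice `X` would be your extracted feature matrix):

```python
# Refining a feature set with PCA: project 50 raw features onto the
# 10 directions of highest variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # 200 samples, 50 raw features

pca = PCA(n_components=10)       # keep the 10 strongest components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # (200, 10)
print(pca.explained_variance_ratio_.sum())   # variance retained
```

Note the difference between the two: PCA is unsupervised (it ignores labels), while LDA uses class labels to find directions that separate the classes.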