Short answer
To select the proper dataset to construct, you should first figure out which metric you will use to compare, and then select the dataset construction that gives the better value of that metric. There is no single best metric; it depends on the task and on which type of error you consider more important.
If you believe errors should not be normalized across classes, then use overall accuracy, and keep your dataset at the natural distribution (so 1-2% positive cases).
If you believe errors should be normalized across classes, then use PR-AUC or ROC-AUC, and re-balance your dataset so that the classes are a little closer to 1:1. The exact ratio can only be determined by testing and comparing the PR-AUC or ROC-AUC values.
How to select the best metric?
Two popular metrics are ROC-AUC and PR-AUC. ROC curves (Receiver Operating Characteristic) plot the true positive rate against the false positive rate, while PR curves (Precision-Recall) plot precision against recall. AUC stands for "area under the curve": any single point on the curve corresponds to one choice of classifier threshold, so integrating over all thresholds (i.e. the entire area under the curve) is the most general way of comparing whether one model is doing better than another.
Although both ROC curves and PR curves account for class imbalance to some degree, PR curves are more sensitive to it. The paper The Relationship Between Precision-Recall and ROC Curves concludes that if the PR-AUC is good then the ROC-AUC will always also be good, but not the other way around. The difference comes from the fact that under heavy class imbalance a false positive hurts the PR curve significantly more than the ROC curve: the false positive rate divides by the (huge) number of negatives, while precision divides only by the number of predicted positives.
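As a rough illustration, here is a small sketch using scikit-learn's `roc_auc_score` and `average_precision_score` on synthetic scores (the score distributions are made up purely to show the effect, not taken from any real dataset):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Synthetic 1% positive dataset: 100 positives, 9,900 negatives.
y_true = np.concatenate([np.ones(100), np.zeros(9_900)])

# Positives score high, but about 5% of negatives also score high;
# those become false positives at the high-recall thresholds.
pos_scores = rng.uniform(0.6, 1.0, size=100)
neg_scores = np.where(rng.uniform(size=9_900) < 0.05,
                      rng.uniform(0.6, 1.0, size=9_900),
                      rng.uniform(0.0, 0.4, size=9_900))
y_score = np.concatenate([pos_scores, neg_scores])

# ROC-AUC stays high (~0.97) because the ~500 false positives are a small
# fraction of the 9,900 negatives; PR-AUC collapses (~0.17) because they
# swamp the 100 true positives in the precision term.
print("ROC-AUC:", roc_auc_score(y_true, y_score))
print("PR-AUC :", average_precision_score(y_true, y_score))
```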
Total accuracy, on the other hand, does not normalize for class imbalance at all, and therefore favors the majority class.
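For example (continuing the hypothetical 1% positive dataset above), a classifier that simply predicts the majority class for every sample already reaches 99% accuracy while catching zero positives:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.concatenate([np.ones(100), np.zeros(9_900)])
y_pred = np.zeros_like(y_true)  # always predict the majority (negative) class

print(accuracy_score(y_true, y_pred))  # 0.99, despite missing every positive
```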
As a result:
- if you do not care about normalizing for class imbalance, choose total accuracy, which optimizes for the largest number of correct predictions (regardless of class)
- if you want to normalize your metric across class imbalance, and being penalized for false positives is at all important to you, choose PR-AUC
- if you want to normalize your metric across class imbalance, and don't particularly care about false positives, PR-AUC or ROC-AUC may both work for you
If it helps: for most imbalanced problems, people usually go with PR curves.
By the way, (this paper) studies class imbalance in neural networks by optimizing the ROC curves, and shows that you should have equal numbers of positive and negative examples. So if you want the best performance in terms of ROC-AUC, you should do the 50:50 split. I haven't read a similar study that optimizes for PR-AUC, but my intuition is that the conclusion would be the same (a 50:50 split to optimize for PR-AUC as well).
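If it helps to see what the re-balancing step might look like, here is a minimal sketch that undersamples the majority class down to a 50:50 split (the `X`/`y` arrays and the `undersample_to_balance` helper are hypothetical; oversampling the minority class or using class weights are equally valid ways to get the same effect):

```python
import numpy as np

def undersample_to_balance(X, y, rng=None):
    """Return a 50:50 subset by randomly dropping majority-class rows."""
    if rng is None:
        rng = np.random.default_rng(0)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Keep an equal number of rows from each class (all of the minority,
    # a random subset of the majority).
    n = min(len(pos_idx), len(neg_idx))
    keep = np.concatenate([
        rng.choice(pos_idx, size=n, replace=False),
        rng.choice(neg_idx, size=n, replace=False),
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]
```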