So basically you have three values per security gap:
- The type of gap (i.e. its label)
- The risk of that gap for that specific website. As I understand it, this value is different for each website, even if the type of the security gap is the same.
- The number of occurrences of the gap
One reasonable way to combine the features is to build a feature vector whose indices are the gap types and whose values are each gap's risk multiplied by its number of occurrences. However, this loses information, because a gap with risk 1 occurring 3 times becomes identical to a gap with risk 3 occurring once.
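To make the information loss concrete, here is a minimal sketch in plain Python. The gap type names and the `product_features` helper are made up for illustration, and I am assuming a gap that does not occur on a site is encoded as 0.

```python
# Hypothetical gap types; these names are placeholders, not from the question.
GAP_TYPES = ["xss", "sql_injection", "open_redirect"]

def product_features(gaps):
    """gaps maps gap type -> (risk, occurrences); absent types are encoded as 0."""
    features = []
    for gap_type in GAP_TYPES:
        risk, occurrences = gaps.get(gap_type, (0, 0))
        features.append(risk * occurrences)
    return features

site_a = {"xss": (1, 3)}  # risk 1, found 3 times
site_b = {"xss": (3, 1)}  # risk 3, found once

vec_a = product_features(site_a)
vec_b = product_features(site_b)
print(vec_a, vec_b)  # both are [3, 0, 0], so the two sites are indistinguishable
```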
A different way to combine the features is simply to build a feature vector consisting of the gap risk values with the gap occurrences appended (for a feature vector of length 2 * num(type_of_gaps)). Assuming a model with a fully connected structure, the model itself may learn the relationship between each risk value and its occurrence count during training.
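A rough sketch of the appended layout, again in plain Python with made-up gap type names and a hypothetical `concat_features` helper. Absent gaps are assumed to be encoded as 0 in both halves.

```python
# Hypothetical gap types; these names are placeholders, not from the question.
GAP_TYPES = ["xss", "sql_injection", "open_redirect"]

def concat_features(gaps):
    """gaps maps gap type -> (risk, occurrences).

    Returns all risk values followed by all occurrence counts,
    so the result has length 2 * len(GAP_TYPES).
    """
    risks = [gaps.get(gap_type, (0, 0))[0] for gap_type in GAP_TYPES]
    counts = [gaps.get(gap_type, (0, 0))[1] for gap_type in GAP_TYPES]
    return risks + counts

low_risk_often = concat_features({"xss": (1, 3)})
high_risk_once = concat_features({"xss": (3, 1)})
print(low_risk_often)  # [1, 0, 0, 3, 0, 0]
print(high_risk_once)  # [3, 0, 0, 1, 0, 0]
```

Unlike the multiplied version, the two cases stay distinct here; the price is that the risk of a gap and its count sit `len(GAP_TYPES)` positions apart, so the model has to discover that they belong together.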
A third way would be to build a feature vector of (risk, occurrences) tuples, which is not so much feature engineering as restructuring the data into a manageable data type. However, once flattened, you can combine this with a model that uses a 1D convolution layer with kernel size 2 and stride 2, so that each tuple is processed together rather than all the values separately.
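To illustrate how kernel size 2 with stride 2 lines up with the flattened tuples, here is a hand-written sketch in plain Python. The `interleave` and `conv1d` helpers and the kernel weights are invented for the example; in a real model the weights would be learned, and you would use your framework's 1D convolution layer (for instance `torch.nn.Conv1d` with `kernel_size=2, stride=2` in PyTorch) rather than this loop.

```python
# Hypothetical gap types; these names are placeholders, not from the question.
GAP_TYPES = ["xss", "sql_injection", "open_redirect"]

def interleave(gaps):
    """Flatten (risk, occurrences) tuples into [risk0, count0, risk1, count1, ...]."""
    flat = []
    for gap_type in GAP_TYPES:
        risk, occurrences = gaps.get(gap_type, (0, 0))
        flat.extend([risk, occurrences])
    return flat

def conv1d(values, kernel, stride):
    """Single-channel 1D convolution (cross-correlation) without padding or bias."""
    size = len(kernel)
    return [
        sum(w * v for w, v in zip(kernel, values[start:start + size]))
        for start in range(0, len(values) - size + 1, stride)
    ]

site = {"xss": (1, 3), "sql_injection": (3, 1)}
flat = interleave(site)              # [1, 3, 3, 1, 0, 0]

# Made-up weights: 1.0 for the risk slot, 0.5 for the occurrence slot.
per_gap = conv1d(flat, kernel=[1.0, 0.5], stride=2)
print(per_gap)                       # [2.5, 3.5, 0.0], one output per gap type
```

Because the stride equals the kernel size, every window covers exactly one (risk, occurrences) pair and never straddles two gap types.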