
I have a data set containing, for each website, the number of security gaps and the risk level of each gap.

Suppose the data set has two features: the first is the number of occurrences of a specific security gap, and the second is the risk that this gap poses to a specific website.

How can I combine these two features into one?

What is the best way to apply feature engineering to these features?

Thanks

Kroshtan
  • Can you elaborate on the purpose of the features, the task, and what you've already thought of or tried? Without knowing the goal, the features themselves don't say much. – Kroshtan Aug 08 '22 at 14:50
  • The features are the number of gaps and the risk level of each gap; the output is the security level of the website. – Issa Mansour Aug 12 '22 at 04:47

1 Answer


So basically you have three values per security gap:

  1. The type of gap (i.e. its label)
  2. The risk of that gap for that specific website. As I understand it, this value differs per website, even when the type of security gap is the same.
  3. The number of occurrences of the gap

One reasonable way to combine the features is to build a feature vector indexed by gap type, where each value is the gap's risk multiplied by its number of occurrences. However, this loses information: a gap with risk 1 occurring 3 times becomes identical to a gap with risk 3 occurring once.
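
As a minimal sketch of this first option, assuming each website is described by a dict mapping gap type to a (risk, occurrences) pair: the gap type names and the `weighted_vector` helper are made up for illustration and are not from the question. The two example sites show the collision described above.

```python
# Hypothetical gap types; a real data set would define its own list.
GAP_TYPES = ["sql_injection", "xss", "open_port"]


def weighted_vector(gaps):
    """Return one risk * occurrences value per gap type.

    `gaps` maps gap type -> (risk, occurrences); missing types count as 0.
    """
    vector = []
    for gap_type in GAP_TYPES:
        risk, count = gaps.get(gap_type, (0, 0))
        vector.append(risk * count)
    return vector


site_a = {"xss": (1, 3)}  # risk 1, occurring 3 times
site_b = {"xss": (3, 1)}  # risk 3, occurring once

vec_a = weighted_vector(site_a)
vec_b = weighted_vector(site_b)
print(vec_a, vec_b)  # both sites map to the same vector
```

Because the product collapses the two inputs into one number, the model cannot tell these two sites apart afterwards.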

A different way to combine the features is simply to concatenate the gap risk values and the gap occurrence counts into one feature vector (of length 2 * num(type_of_gaps)). Assuming a model with a fully connected structure, the model can learn the relationship between the paired values itself during training.
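
A rough sketch of the concatenated layout, under the same assumed input format (a dict of gap type to (risk, occurrences)); the gap type names and `concatenated_vector` are hypothetical. Risks fill the first half of the vector and occurrence counts the second half.

```python
# Hypothetical gap types; a real data set would define its own list.
GAP_TYPES = ["sql_injection", "xss", "open_port"]


def concatenated_vector(gaps):
    """Return all risks followed by all occurrence counts.

    The result has length 2 * len(GAP_TYPES); missing types count as 0.
    """
    risks = [gaps.get(gap_type, (0, 0))[0] for gap_type in GAP_TYPES]
    counts = [gaps.get(gap_type, (0, 0))[1] for gap_type in GAP_TYPES]
    return risks + counts


vec_a = concatenated_vector({"xss": (1, 3)})  # risk 1, occurring 3 times
vec_b = concatenated_vector({"xss": (3, 1)})  # risk 3, occurring once
print(vec_a)
print(vec_b)
```

Unlike the product, this layout keeps the two example sites distinguishable, at the cost of doubling the input size.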

A third way would be to build a feature vector of (risk, occurrences) tuples, which is not so much feature engineering as restructuring the data into a manageable data type. However, once flattened, you can combine this with a model that uses a 1D convolution layer with kernel size 2 and stride 2, so that each tuple is processed together instead of all the values separately.
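
To illustrate why kernel size 2 with stride 2 lines up with the tuples, here is a small pure-Python sketch. The helper names and the kernel weights are placeholders chosen for the example; in a real model the weights would be learned, and the convolution would normally come from a deep learning library rather than hand-written code.

```python
# Hypothetical gap types; a real data set would define its own list.
GAP_TYPES = ["sql_injection", "xss", "open_port"]


def interleaved_vector(gaps):
    """Flatten (risk, occurrences) tuples as [risk_0, count_0, risk_1, count_1, ...]."""
    flat = []
    for gap_type in GAP_TYPES:
        risk, count = gaps.get(gap_type, (0, 0))
        flat.extend([risk, count])
    return flat


def conv1d(signal, kernel, stride):
    """Plain 1D convolution (cross-correlation form, no padding)."""
    size = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[start:start + size]))
        for start in range(0, len(signal) - size + 1, stride)
    ]


flat = interleaved_vector({"sql_injection": (5, 1), "xss": (1, 3)})
# Placeholder weights: one for the risk, one for the occurrence count.
kernel = [1.0, 0.5]
out = conv1d(flat, kernel, stride=2)
print(flat)
print(out)  # one output value per gap type
```

With stride 2, each window covers exactly one (risk, occurrences) pair and never straddles two gap types, so the layer produces one combined value per gap.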

Kroshtan