A company entrusts a Data Scientist with the mission of processing and valuing data for the research or treatment of events related to traces of computer attacks. I was wondering how would he get the train data.
I guess he would need to exploit the logs of the different devices of the clients and use statistical, Machine Learning and visualization techniques in order to bring a better understanding of the attacks in progress and to identify the weak signals of attacks... But how would he get labelled data?
He might get the logs of attacks received before, but that might not have the same signature with the attacks that are going to come later? So it might be difficult to create a reliable product?