I want to understand automatic Neural Architecture Search (NAS). I have already read multiple papers, but I cannot figure out what the actual search space of NAS is, or how classical hyper-parameters are treated in NAS.
My understanding:
NAS aims to find a well-performing model in the search space of all possible model architectures, using some search strategy and performance estimation strategy. There are architecture-specific hyper-parameters (in the simplest feed-forward network case) such as the number of hidden layers, the number of hidden neurons per layer, and the type of activation function per neuron. There are also classical hyper-parameters such as the learning rate, the dropout rate, etc.
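For concreteness, this is roughly how I picture the two groups of hyper-parameters. The names and value ranges below are my own illustrative assumptions, not taken from any particular NAS paper or library:

```python
import random

# Architecture-specific hyper-parameters: they define the network's structure.
architecture_space = {
    "num_hidden_layers": [1, 2, 3, 4],
    "hidden_units_per_layer": [16, 32, 64, 128],
    "activation": ["relu", "tanh", "sigmoid"],
}

# Classical (training) hyper-parameters: they do not change the structure.
training_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout_rate": [0.0, 0.2, 0.5],
}

def sample(space):
    """Draw one configuration uniformly at random from a discrete space."""
    return {name: random.choice(choices) for name, choices in space.items()}

architecture = sample(architecture_space)    # one candidate architecture
training_config = sample(training_space)     # one candidate training setup
print(architecture)
print(training_config)
```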
What I don't understand is:

1. What exactly is part of the model architecture as defined above? Is it only the architecture-specific hyper-parameters, or also the classical hyper-parameters? In other words, what spans the search space in NAS: only the architecture-specific hyper-parameters, or the classical hyper-parameters as well (in terms of the sketch above: `architecture_space` alone, or `architecture_space` together with `training_space`)?
2. If only the architecture-specific hyper-parameters are part of the NAS search space, what happens to the classical hyper-parameters? A given architecture (with a fixed configuration of the architecture-specific hyper-parameters) might perform better or worse depending on the classical hyper-parameters, so not taking the classical hyper-parameters into account in the NAS search space might result in a sub-optimal final architecture, or not? (See the second sketch below for the kind of joint search I have in mind.)
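What I imagine by "taking the classical hyper-parameters into account" is a joint search over both groups, sketched below. This is purely hypothetical on my part; `evaluate` is a placeholder for training a candidate and returning a validation score, and the code reuses `architecture_space`, `training_space`, and `sample` from the first sketch:

```python
def evaluate(architecture, training_config):
    """Placeholder: in a real setup this would build the network defined by
    `architecture`, train it with the settings in `training_config`, and
    return a validation score. Here it just returns a dummy random score."""
    return random.random()

# Joint random search: architecture and classical hyper-parameters are sampled
# together, so candidate architectures are not all judged under one fixed
# (possibly unsuitable) training configuration.
best_score, best_candidate = float("-inf"), None
for _ in range(20):  # tiny illustrative search budget
    architecture = sample(architecture_space)
    training_config = sample(training_space)
    score = evaluate(architecture, training_config)
    if score > best_score:
        best_score, best_candidate = score, (architecture, training_config)

print(best_candidate, best_score)
```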