For questions related to the architecture of AI models, e.g. the architecture of neural networks.
Questions tagged [architecture]
83 questions
19 votes, 2 answers
Are Modular Neural Networks more effective than large, monolithic networks at any tasks?
Modular/Multiple Neural Networks (MNNs) revolve around training smaller, independent networks that can feed into each other or into another, higher-level network.
In principle, the hierarchical organization could allow us to make sense of more complex problem…

Harsh Sikka
16 votes, 2 answers
How can I automate the choice of the architecture of a neural network for an arbitrary problem?
Assume that I want to solve a problem with a neural network that I either can't fit to existing architectures (perceptron, Kohonen, etc.), or I'm simply not aware of the existence of those, or I'm unable to understand their mechanics and I rely on my…

Zoltán Schmidt
11 votes, 1 answer
Why is the merged neural network of AlphaGo Zero more efficient than two separate neural networks?
AlphaGo Zero contains several improvements compared to its predecessors. Architectural details of AlphaGo Zero can be seen in this cheat sheet.
One of those improvements is using a single neural network that calculates move probabilities and the…

Demento
9 votes, 4 answers
Should neural nets be deeper the more complex the learning problem is?
I know it's not an exact science. But would you say that generally for more complicated tasks, deeper nets are required?

Gilad Deutsch
7 votes, 2 answers
Why do very deep non-ResNet architectures perform worse compared to shallower ones for the same iteration? Shouldn't they just train slower?
My understanding of the vanishing gradient problem in deep networks is that, as backprop progresses through the layers, the gradients become small, and thus training progresses more slowly. I'm having a hard time reconciling this understanding with images…

Intent Filters
7 votes, 1 answer
How do neural network topologies affect GPU/TPU acceleration?
I was thinking about different neural network topologies for some applications. However, I am not sure how this would affect the efficiency of hardware acceleration using GPU/TPU/some other chip.
If, instead of layers that would be fully connected,…

user2316602
6 votes, 1 answer
Are there well-established ways of mixing different inputs (e.g. image and numbers)?
I am interested in the possibility of having extra input along with the main data. For instance, a medical application that would rely mostly on an image: how could one also account for sex, age, etc.?
It is certainly possible to put the output of…

Mathieu Bouville
5 votes, 2 answers
Why do Transformers have a sequence limit at inference time?
As far as I understand, the Transformer's time complexity increases quadratically with the sequence length. As a result, to make training feasible, a maximum sequence length is set, and to allow batching, all sequences smaller…

chessprogrammer
5 votes, 2 answers
What's the difference between architectures and backbones?
In the paper "ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery", the authors talk about using:
Feature Pyramid Networks (as the architecture)
EfficientNet-B2 (as the backbone)
Performance…

codinggirl123
5 votes, 1 answer
How to create an AI to solve a word search?
At first, this sounds ridiculous. Of course there is an easy way to write a program to solve a word search.
But what I would like to do is write a program that solves a word search like a human.
That is, use or invent different strategies, e.g.…

zooby
4 votes, 2 answers
Are Neural Net architectures accidental discoveries?
Recently, I have been learning about new neural networks, which are used for specialized purposes, like speech recognition, image recognition, etc. The more I discover, the more I am amazed by the cleverness behind models such as RNNs and CNNs.…
user9947
4 votes, 2 answers
Which neural network can I use to solve this constrained optimisation problem?
Let $\mathcal{S}$ be the training data set, where each input $u^i \in \mathcal{S}$ has $d$ features.
I want to design an ANN so that the cost function below is minimized (the sum of the square of pairwise differences between model outputs) and the…

user3489173
4 votes, 1 answer
What is a unified neural network model?
In many articles (for example, in the YOLO paper, this paper or this one), I see the term "unified" being used. I was wondering what the meaning of "unified" in this case is.

Reactionic
4 votes, 2 answers
Is a basic neural network architecture better with small datasets?
I'm currently trying to predict one output value from 52 input values. The problem is that I only have around 100 rows of data that I can use.
Will I get more accurate results when I use a small architecture than when I use multiple layers with a…

Yari Nowicki
4 votes, 1 answer
Get the position of an object, out of an image
I have some images with a fixed background and a single object that is placed at a different position on that background in each image. I want to find a way to extract, in an unsupervised way, the positions of that object. For example,…