Does it make sense to compare images (samples) with words (features)?

Question

Consider the following paragraphs from the introduction of the chapter "Recurrent Neural Networks" in the textbook Dive into Deep Learning:

So far we encountered two types of data: tabular data and image data. For the latter we designed specialized layers to take advantage of the regularity in them. In other words, if we were to permute the pixels in an image, it would be much more difficult to reason about its content of something that would look much like the background of a test pattern in the times of analog TV.

Most importantly, so far we tacitly assumed that our data are all drawn from some distribution, and all the examples are independently and identically distributed (i.i.d.). Unfortunately, this is not true for most data. For instance, the words in this paragraph are written in sequence, and it would be quite difficult to decipher its meaning if they were permuted randomly. Likewise, image frames in a video, the audio signal in a conversation, and the browsing behavior on a website, all follow sequential order. It is thus reasonable to assume that specialized models for such data will do better at describing them.

In neural networks, we generally use the words "instance", "example", and "data point" to refer to a particular row of a dataset. In the case of a CNN, an instance is typically an image, and in the case of an RNN, an instance is typically a sequence of text (a sentence, a paragraph, or a whole document).

Every instance contains features: in image data, pixels are generally treated as the features, and in text data, either characters or words are generally treated as the features.
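To make the terminology concrete, here is a minimal sketch in plain Python. The toy data and the helper names `shuffle_instances` and `shuffle_features` are my own assumptions for illustration; they do not come from the textbook. It only shows the two levels I am distinguishing: reordering the examples of a dataset versus reordering the features inside a single example.

```python
import random

# Hypothetical toy data, invented for illustration only.
# Image dataset: each instance is one image, flattened to a list of pixel features.
images = [
    [0, 255, 0, 255],   # instance 0: a 2x2 image in row-major order
    [255, 255, 0, 0],   # instance 1
]

# Text dataset: each instance is one sentence; its features are the words.
sentences = [
    ["the", "cat", "sat"],          # instance 0
    ["words", "follow", "order"],   # instance 1
]


def shuffle_instances(dataset, seed=0):
    """Reorder the rows (examples); each example's features stay intact."""
    rng = random.Random(seed)
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    return shuffled


def shuffle_features(instance, seed=0):
    """Reorder the features inside a single example (pixels or words)."""
    rng = random.Random(seed)
    shuffled = list(instance)
    rng.shuffle(shuffled)
    return shuffled


print(len(images), "image instances,", len(images[0]), "pixel features each")
print(len(sentences), "text instances,", len(sentences[0]), "word features each")

# Dataset level: the sentences appear in a different order, each one unchanged.
print(shuffle_instances(sentences))

# Feature level: the words of one sentence are permuted.
print(shuffle_features(sentences[0]))
```

In this sketch, `shuffle_instances` corresponds to permuting examples and `shuffle_features` corresponds to permuting pixels within an image or words within a sentence.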

With this context, let me explain my doubt.

I have an issue understanding the paragraphs. The issue is the comparison of examples/instances of image data with features of text data. Pixels can be compared to words/characters and images can be compared to sentences or words or paragraphs or text documents based on the context.

In the second paragraph, it is said that the examples (images) are i.i.d. But then the images are compared with words in a paragraph, but words are features, not examples. As the paragraph is not saying that pixels are i.i.d., how can the words in paragraphs be used for comparison?

I am not sure I understand your question. I understand it has to do with images, words, sentences, being iid or not, and to compare words with images, but I don't understand what the specific question is. I don't understand this "As the paragraph is not saying that pixels are i.i.d. how can the words in paragraphs be used for comparison?". I mean, why do you need both the pixels and the words to be iid to compare them? If pixels and words are features, then you can compare them, in the sense that you can say that both are features. What do you mean by "compare"? — nbro, Mar 09 '22 at 09:48
@nbro In the second paragraph, they say that the examples are i.i.d. (w.r.t. image data). But then they talk about the words in a paragraph. How do I reconcile the two? — hanugm, Mar 09 '22 at 11:52
@nbro If they said pixels are i.i.d., then they could compare them with words. But they say images are i.i.d., so how can they compare them with words instead of paragraphs? — hanugm, Mar 09 '22 at 11:55
It seems to me you think that words are "features" and that we can't compare words with images because images are "samples" and not features. Is this your question? I would recommend that you edit your post to remove unnecessary details and just leave the specific question that highlights your doubt/problem. — nbro, Mar 10 '22 at 09:15
Thanks. I've rewritten the title to make it clearer what the problem is. Make sure that's your question. Feel free to edit your post again to improve it ;) — nbro, Mar 10 '22 at 09:47
Thanks @nbro. After you decreased your activity, mine also decreased involuntarily :( — hanugm, Mar 10 '22 at 09:56
It's fine. We're all volunteers here, so we're not forced to be here. Nobody will pay you for being here. We're here because we believe in or like this community. I needed this break from AISE and to lower my activity here for a while. When you're a mod, people will attack you only because you're a mod and because you take a decision that is not consistent with their expectations. I don't want to deal with toxic users. That made my life worse and more stressful. I can avoid this stress! — nbro, Mar 10 '22 at 12:18
_For now_, I prefer to be a regular user and help whenever I can (especially, edits, clarifications and answer questions that can be answered quickly), without having to deal with problematic users. I think that every small contribution can be helpful. Voting, edits, commenting, asking and answering questions or reviewing the queues. You don't have to do everything. If we're about 10 active users every day (and they don't always have to be the same), and everyone does a little, the community should be able to self-maintain, assuming that our general activity doesn't increase much. — nbro, Mar 10 '22 at 12:18
