
I am faced with a problem that I bet has already been solved, but that I have never seen before. Perhaps if I describe it abstractly, someone can point me to the relevant literature.

It goes like this: I have a dataset of images $I_j$ and numerical features $f_{1,j}, f_{2,j}, ..., f_{k, j}$. In production, I don't have access to the features $f_{i,j}$, which I know to be more relevant to my learning problem than the images.

However, these features $f_{i,j}$ are also very hard to obtain, and there is a known (informal) correlation between the images and the numerical features $f_{i,j}$. So the whole point of using the images is to avoid having to measure the $f_{i,j}$.

Now comes the question: is there any useful way I could learn a representation of the images that "exposes" the maximum information about the features $f_{i,j}$? Perhaps some form of autoencoder with a regularization term that penalizes codings that are uncorrelated with the features?
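To make the idea concrete, here is a minimal sketch of the kind of objective I have in mind. It is not a claim about how this should be done: it assumes a purely linear encoder and decoder, replaces the "uncorrelated codings" penalty with a plain squared error for predicting the features from the code, and uses synthetic data in place of the real images. The names `train_supervised_autoencoder`, `code_dim`, and `lam` are made up for illustration.

```python
import numpy as np

def train_supervised_autoencoder(X, F, code_dim=4, lam=1.0, lr=1e-2,
                                 steps=500, seed=0):
    """Linear autoencoder whose code Z = X @ E must also predict F.

    Loss = mean_j ||Z_j D - X_j||^2 + lam * mean_j ||Z_j P - F_j||^2
    The second term is the assumed "feature-exposing" regularizer.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = F.shape[1]
    E = rng.normal(scale=0.1, size=(d, code_dim))  # encoder
    D = rng.normal(scale=0.1, size=(code_dim, d))  # decoder
    P = rng.normal(scale=0.1, size=(code_dim, k))  # feature predictor
    history = []
    for _ in range(steps):
        Z = X @ E
        R_x = Z @ D - X  # reconstruction residual
        R_f = Z @ P - F  # feature-prediction residual
        history.append((R_x ** 2).sum() / n + lam * (R_f ** 2).sum() / n)
        # Analytic gradients of the loss above.
        g_D = (2.0 / n) * Z.T @ R_x
        g_P = (2.0 * lam / n) * Z.T @ R_f
        g_Z = (2.0 / n) * (R_x @ D.T + lam * R_f @ P.T)
        g_E = X.T @ g_Z
        E -= lr * g_E
        D -= lr * g_D
        P -= lr * g_P
    return E, D, P, history

# Synthetic stand-in data (an assumption, not the real dataset):
# X plays the role of flattened images, F the role of the f_{i,j}.
rng = np.random.default_rng(1)
n, d, k = 200, 16, 2
X = rng.normal(size=(n, d))
A = rng.normal(size=(d, k)) / np.sqrt(d)
F = X @ A + 0.1 * rng.normal(size=(n, k))

E, D, P, history = train_supervised_autoencoder(X, F)
print("first loss:", history[0], "last loss:", history[-1])
```

In production only the encoder `E` would be applied to new images, since the features are used solely in the training loss. The weight `lam` trades reconstruction quality against how much the code is forced to carry information about the features; a real version would presumably use a convolutional encoder on the $(64,64)$ images instead of a linear map.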

Any ideas are welcome. Thanks!

Edit:

In the problem that I have, the images $I_j$ are $(64,64)$ images of tumor nuclei, and the $f_{i,j}$ are genetic information about each cell, such as the expression of certain biomarkers (HER2, ER, etc.).


  • Can you please provide a practical example of what $f_{i,j}$ is? – Alberto Aug 16 '23 at 12:12
  • @AlbertoSinigaglia Done. – Alek Fröhlich Aug 16 '23 at 12:39
  • It looks like the features you want to learn are explanatory for some other classification that you don't mention? And that you want to train a model that predicts that classification from an image as the primary task, whilst also having these features expressed within the model? Could you please clarify? – Neil Slater Aug 16 '23 at 13:19
  • @NeilSlater A first example could be to classify a cell as HER2+ or HER2- based only on the image data. (During training, the autoencoder would have access to a reliable HER2 score for each nucleus.) – Alek Fröhlich Aug 16 '23 at 13:53
  • I'm not really an expert in this field, so I still don't get what the $f$ are, but I think you can formulate this as a contrastive problem, using the biomarkers to attract or pull away latent vectors. – Alberto Aug 16 '23 at 23:09
