Questions tagged [data-preprocessing]

For questions related to the concept of data pre-processing, which includes, for example, cleaning, instance selection, normalization, transformation, feature extraction or selection.

For more info, see e.g. https://en.wikipedia.org/wiki/Data_pre-processing.
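
As a small illustration of the normalization step named in the tag description, here is a sketch of z-score standardization in plain Python. The function names are hypothetical. The point it shows is that the mean and standard deviation are computed on the training data only and then reused for any later data.

```python
from statistics import mean, pstdev

def fit_standardizer(column):
    """Compute the mean and population standard deviation of a training feature column."""
    mu = mean(column)
    sigma = pstdev(column)
    # Guard against constant features, which would otherwise cause a division by zero.
    return mu, (sigma if sigma > 0 else 1.0)

def standardize(column, mu, sigma):
    """Shift and scale values using statistics fitted on the training data."""
    return [(x - mu) / sigma for x in column]

# Usage: fit on training values, then apply the same transform to new values.
train_col = [2.0, 4.0, 6.0]
mu, sigma = fit_standardizer(train_col)
print(standardize([4.0, 8.0], mu, sigma))
```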

165 questions
10 votes, 2 answers

How can I encode angle data to train neural networks?

I am training a neural network where the target data is a vector of angles in radians (between $0$ and $2\pi$). I am looking for study material on how to encode this data. Can you supply me with a book or research paper that covers this topic…
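
One common encoding for circular targets like these (a possible approach, not necessarily what the answers recommend) is to represent each angle by its sine and cosine, so that $0$ and $2\pi$ map to the same point, and to decode the network's predictions with `atan2`. A minimal sketch, with hypothetical function names:

```python
import math

def encode_angles(angles):
    """Map each angle (radians) to a (sin, cos) pair so that 0 and 2*pi coincide."""
    return [(math.sin(a), math.cos(a)) for a in angles]

def decode_angles(pairs):
    """Recover angles from (possibly unnormalized) (sin, cos) predictions, wrapped to 0..2*pi."""
    # atan2 returns values in (-pi, pi]; the modulo shifts negative results up by 2*pi.
    return [math.atan2(s, c) % (2 * math.pi) for s, c in pairs]
```

Because `atan2` only uses the ratio of its two arguments, the decoded angle does not require the predicted pair to lie exactly on the unit circle.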

7 votes, 1 answer

How to solve the problem of too big activations when using genetic algorithms to train neural networks?

I am trying to create a fixed-topology MLP from scratch (with C#), which can solve some simple problems, such as the XOR problem and MNIST classification. The network will be trained purely with genetic algorithms instead of back-propagation. Here…

7 votes, 2 answers

Does data skew matter in a classification problem?

I'm working on an image classification problem using a neural network. In the training data set, 90% of the samples fall into 10% of all categories, while 10% of the samples fall into the other 90% of categories. So the examples are not evenly distributed…
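
One common way to counter this kind of skew is to weight the loss per class. The sketch below computes inverse-frequency weights with the formula `n_samples / (n_classes * count)`, the same heuristic scikit-learn uses for `class_weight="balanced"`. The function name is hypothetical, and resampling the data is an alternative.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Rare classes get large weights, frequent classes get small ones,
    # so each class contributes the same total weight to the loss.
    return {cls: n / (k * c) for cls, c in counts.items()}

# Usage: 9 samples of a frequent class and 1 of a rare class.
print(inverse_frequency_weights(["a"] * 9 + ["b"]))
```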

6 votes, 1 answer

How should I deal with variable-length inputs for neural networks?

I am a complete beginner in the field of AI. I am basically a pharma professional without much coding experience. I use GUI-based tools for neural networks. I am trying to develop an ANN that receives a protein sequence as input and produces as…

6 votes, 1 answer

How to deal with images of different sizes, which need to be passed to a model of fixed input size, without losing details and spatial information?

I have the following problem while using convolutional neural networks to detect forgeries: resizing the image to fit the required input size may not be a good approach, because forgery detection largely relies on the details of images, for example,…

5 votes, 1 answer

Does the term "data augmentation" imply increasing the training dataset?

I have a manuscript that has been reviewed, and one of the reviewers commented on my use of the term "data augmentation", saying that it might not be the appropriate term in my case (explained below). I collected a large dataset of short audio files…

5 votes, 1 answer

What is "conditioning" on a feature?

On page 98 of Jet Substructure at the Large Hadron Collider: A Review of Recent Advances in Theory and Machine Learning, the author writes: Redacted phase space: Studying the distribution of inputs and the network performance after conditioning on…

4 votes, 3 answers

Would this relatively small dataset be enough to train a CNN?

Scenario: I am trying to create a dataset with images of my choice for different animal classes. I am going to use those images to train a CNN for classification. Problem: Let's assume I somehow don't have the privilege to collect too many images and was…

4 votes, 1 answer

How to fill missing values in a dataset where some properties can be inputs and outputs?

I have a dataset with missing values that I would like to fill using machine learning methods. In more detail, there are $n$ individuals, for which up to 10 properties are provided, all numerical. The fact is, there are no individuals for which all…

4 votes, 1 answer

How should I deal with variable input sizes for a neural network classifier?

I am currently working on a project where I have a sensor in a shoe that records the $X$, $Y$, and $Z$ axes from an accelerometer and a gyroscope. Every millisecond, I get 6 data points. Now, the goal is, if I do an action, such as jumping or kicking,…
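
For variable-length recordings like these, one common option is to pad every sequence to the length of the longest one and keep a mask, so that a model (for example, an RNN with masking) can ignore the padded steps. A minimal sketch, assuming each recording is a non-empty list of 6-value readings; `pad_batch` is a hypothetical helper, not a library function:

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences of feature vectors to a common length.

    Returns the padded batch and a mask marking real (1) vs. padded (0) time steps.
    """
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        missing = max_len - len(seq)
        # Copy the real readings, then append all-pad readings at the end.
        padded.append([list(step) for step in seq] +
                      [[pad_value] * n_features for _ in range(missing)])
        mask.append([1] * len(seq) + [0] * missing)
    return padded, mask

# Usage: a 3-step recording and a 1-step recording, 6 values per step.
batch, mask = pad_batch([[[1.0] * 6] * 3, [[2.0] * 6]])
print(mask)
```

Alternatives include resampling every recording to a fixed number of steps or classifying fixed-size sliding windows.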

3 votes, 1 answer

Why is the short-time Fourier transform used for preprocessing audio samples?

I've been told this is how I should be preprocessing audio samples, but what information does this method actually give me? What are the alternatives, and why shouldn't I use them?
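
To make the question concrete, the following naive sketch shows what an STFT computes: the signal is cut into overlapping frames, each frame is multiplied by a Hann window, and the magnitude of its discrete Fourier transform is taken, giving a grid of frequency content over time. The frame length and hop size are arbitrary illustrative values, and real pipelines would use an FFT library rather than this quadratic-time DFT.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=8, hop=4):
    """Naive short-time Fourier transform: DFT magnitude of each windowed frame."""
    # Periodic Hann window tapers the frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # For a real signal, bins 0 .. frame_len // 2 carry all the information.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames  # shape: (num_frames, frame_len // 2 + 1)

# Usage: a cosine that completes 2 cycles per 8 samples peaks in bin 2 of every frame.
sig = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
print(len(stft_magnitude(sig)))
```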

3 votes, 1 answer

How are sentences numerically encoded before passing them to neural networks?

I'm trying to understand NLP, specifically how sentences can be used as inputs and outputs in neural network architectures. As we know, an ANN only works with numerical data. That means the sentences must be converted to numbers, right? Suppose I have this…
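
A minimal sketch of the simplest numerical encoding: split each sentence into tokens, assign each distinct token an integer ID, and map unseen tokens to a reserved "unknown" ID. Real NLP pipelines usually use subword tokenizers and then feed these IDs into an embedding layer; the whitespace splitting and the helper names here are illustrative assumptions.

```python
def build_vocab(sentences):
    """Map each lowercase whitespace-separated token to an integer ID (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Convert a sentence to a list of integer IDs, using <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in sentence.lower().split()]

# Usage: "bird" was never seen while building the vocabulary, so it maps to <unk>.
vocab = build_vocab(["The cat sat", "the dog sat"])
print(encode("the bird sat", vocab))
```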

3 votes, 1 answer

Process 2TB worth of conversational data hoarded over 40 years. How can I pass this into GPT to ask questions about it?

I'm still very new to this stuff. I have close to 2TB worth of data hoarded, from IRC chats to everyday chats with friends and family. Is there a way to pass this much data into GPT to ask questions about it? Or would I require something…

3 votes, 0 answers

How to deal with a variable number of channels of the inputs?

I have a problem in which my input data may have a varying number of channels. Let me explain with an example. Imagine we have a classification problem in which we wish to identify if certain species are present in wildlife photographs. This can…

3 votes, 2 answers

Is pre-processing used in deep learning?

I'm new to deep learning. I wanted to know: do we use pre-processing in deep learning? Or is it only used in machine learning? I searched the internet for it and its methods, but I didn't find a suitable answer.