Questions tagged [data-science]

Use for questions related to Data Science aspects of AI. Generally speaking, only basic Data Science questions should be asked on this Stack, ideally involving the fundamental concepts. For more advanced questions related to Data Science, please use the Data Science stack: https://datascience.stackexchange.com/

Data Science Stack: https://datascience.stackexchange.com/

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, both structured and unstructured, similar to data mining.

Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.

When Harvard Business Review called it "The Sexiest Job of the 21st Century", the term became a buzzword, and it is now often applied to business analytics, or even to arbitrary uses of data, or used as a sexed-up term for statistics. While many university programs now offer a data science degree, there is no consensus on a definition or on curriculum contents. Because of the current popularity of this term, there are many "advocacy efforts" surrounding it.

https://en.wikipedia.org/wiki/Data_science

54 questions
7 votes, 1 answer

How to implement exploration function and learning rate in Q Learning

I'm trying to implement Q-learning (state-based representation and no neural / deep stuff) but I'm having a hard time getting it to learn anything. I believe my issue is with the exploration function and/or learning rate. Thing is, I see different…
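The two pieces this question asks about can be illustrated with a minimal tabular sketch. It assumes hashable states and a discrete action list; the class and parameter names are illustrative, not taken from the question. Epsilon-greedy exploration picks a random action with probability `epsilon`, and the learning rate `alpha = 1 / N(s, a)` decays with the visit count of each state-action pair, which makes each Q-value the running average of the targets it has received.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning sketch: epsilon-greedy exploration plus a
    visit-count learning rate alpha = 1 / N(s, a). Names are illustrative."""

    def __init__(self, actions, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.gamma = gamma                  # discount factor
        self.epsilon = epsilon              # exploration probability
        self.q = defaultdict(float)         # Q[(state, action)], defaults to 0.0
        self.visits = defaultdict(int)      # N[(state, action)]
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore with probability epsilon, otherwise act greedily on Q.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, done):
        self.visits[(s, a)] += 1
        alpha = 1.0 / self.visits[(s, a)]   # decays as (s, a) is revisited
        best_next = 0.0 if done else max(self.q[(s_next, b)] for b in self.actions)
        target = r + self.gamma * best_next
        self.q[(s, a)] += alpha * (target - self.q[(s, a)])
```

A constant small `alpha` (for example 0.1) is the other common choice; it keeps adapting in non-stationary settings but never fully averages out noisy rewards.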
5 votes, 3 answers

What should we do when we have equal observations with different labels?

Suppose we have a labeled data set with columns $A$, $B$, and $C$ and a binary outcome variable $X$. Suppose we have rows as follows:

row  A  B  C  X
1    1  2  3  1
2    4  2  3  0
3    6  5  1  1
4    1  2  3  0

Should we throw away either row 1 or row 4…
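Rows 1 and 4 share the feature vector (1, 2, 3) but have different labels, which means $X$ is not a deterministic function of $A$, $B$, $C$. One way to look at this, sketched below with the question's four rows, is to keep both rows and treat the target as a probability, P(X = 1 | A, B, C); the function names are hypothetical.

```python
from collections import defaultdict

# The four rows from the question: (row id, A, B, C, X)
rows = [
    (1, 1, 2, 3, 1),
    (2, 4, 2, 3, 0),
    (3, 6, 5, 1, 1),
    (4, 1, 2, 3, 0),
]


def conflicting_groups(rows):
    """Group rows by feature vector (A, B, C); return groups whose labels disagree."""
    groups = defaultdict(list)
    for row_id, a, b, c, x in rows:
        groups[(a, b, c)].append((row_id, x))
    return {feats: members for feats, members in groups.items()
            if len({x for _, x in members}) > 1}


def positive_rate(rows):
    """Fraction of rows with X = 1 for each distinct feature vector."""
    totals = defaultdict(lambda: [0, 0])  # feats -> [positives, count]
    for _, a, b, c, x in rows:
        totals[(a, b, c)][0] += x
        totals[(a, b, c)][1] += 1
    return {feats: pos / n for feats, (pos, n) in totals.items()}
```

Deleting either row would bias that estimate toward whichever label survives; a probabilistic classifier trained on all rows learns the 0.5 rate for (1, 2, 3) instead.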
3 votes, 0 answers

How to mitigate or design out hidden feedback loops when designing ML systems?

Two months ago, I found myself working on a churn detection problem which can be briefly described as follows: Assume the current date is N Use customer behavior for N-1,..N-x dates to develop training dataset Train model and make prediction at…
3 votes, 0 answers

Speaker Identification / Recognition for small audio files

I am working on a speaker identification problem using a GMM (Gaussian Mixture Model). I only have to identify one user present in the given audio, so for the second class, noise or silent audio may or may not be used, just like in image classification for an object…
3 votes, 1 answer

Can alpha-beta pruning be used for applications apart from games?

Can alpha-beta pruning/minimax be used for systems apart from games? Like for selecting the right customer for a product, etc. (the typical data science problems)? I have seen people do it, but can't understand how. Can someone help me understand…
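Alpha-beta pruning assumes a two-player, zero-sum setting in which one side maximizes a score and the other minimizes it, so outside games it only fits problems that can be framed that way (for example, a worst-case decision against an adversary). The sketch below is a generic textbook version over a tree given as nested lists, where a number is a leaf score; the representation is an assumption for illustration.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of a game tree with alpha-beta pruning.

    node is either a number (leaf score) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return value
```

For a tree such as `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` the result is 3, and the second subtree is abandoned after its first leaf (2), because the maximizer can already guarantee 3.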
3 votes, 1 answer

How to make a distinction between item feature and environment feature?

My data is stock data with features such as stocks' closing prices. I am curious to know if I can put economic features such as 'national interest rate' or 'unemployment rate' beside each stock's features. Data: Date Ticker Open High Low …
Eiffelbear
  • 131
  • 3
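An economic ("environment") feature has one value per date that is shared by every ticker on that date, so it is typically attached with a many-to-one join on `Date`. A minimal stdlib sketch, assuming rows are dicts keyed by the question's column names and a hypothetical `macro_by_date` mapping; with pandas the same idea is `stock_df.merge(macro_df, on="Date", how="left")`.

```python
def attach_macro_features(stock_rows, macro_by_date, keys):
    """Append date-level (environment) features to each per-ticker row.

    stock_rows: list of dicts with at least a 'Date' key.
    macro_by_date: hypothetical {date: {feature_name: value}} mapping.
    Missing dates or features become None.
    """
    out = []
    for row in stock_rows:
        macro = macro_by_date.get(row["Date"], {})
        out.append({**row, **{k: macro.get(k) for k in keys}})
    return out
```

One caution: macroeconomic figures are usually published with a delay, so keying them by release date rather than reference period avoids leaking future information into training rows.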
2 votes, 1 answer

What is the impact of scaling the features on the performance of the model?

I am trying to generate a model that uses several physicochemical properties of a molecule (including number of atoms, number of rings, volume, etc.) to predict a numeric value $Y$. I would like to use PLS Regression, and I understand that…
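PLS picks directions that maximize covariance with $Y$, so a feature measured on a large scale (such as volume) can dominate one measured in small counts (such as number of rings) purely because of its units. A common remedy is to z-score each column first; the stdlib sketch below is illustrative, and scikit-learn's `PLSRegression` applies this kind of scaling itself when its `scale` parameter is left at its default of `True`.

```python
from statistics import mean, pstdev


def standardize_columns(X):
    """Z-score each column of X (a list of rows): (x - mean) / std.

    Uses the population standard deviation; constant columns map to 0.0.
    Returns the scaled rows and the per-column (mean, std) so the same
    transform can be applied to new data.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*X)]
    scaled = [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
              for row in X]
    return scaled, stats
```

For example, columns `[10, 20, 30]` and `[1.0, 2.0, 3.0]` differ by a factor of 10 in scale but become identical after standardization, so neither dominates the covariance.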
2 votes, 2 answers

Ensemble models - XGBoost

I am building 2 models using XGBoost, one with x number of parameters and the other with y number of parameters of the data set. It is a classification problem. A yes-yes, no-no case is easy, but what should I do when one model predicts a yes and…
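One common way to resolve a yes/no disagreement is soft voting: average the two models' predicted probabilities of "yes" (for example, from the scikit-learn-style `predict_proba` method that `XGBClassifier` provides) and threshold the average. The sketch below is one option among several (stacking a meta-model is another); `weight_a` and `threshold` are assumed tuning knobs, best chosen on validation data.

```python
def soft_vote(p_model_a, p_model_b, weight_a=0.5, threshold=0.5):
    """Combine two models' P(yes) by a weighted average, then threshold it.

    Returns (combined probability, "yes" or "no").
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    p = weight_a * p_model_a + (1.0 - weight_a) * p_model_b
    return p, ("yes" if p >= threshold else "no")
```

With equal weights, a confident "yes" (0.9) outvotes a mild "no" (0.3) because the average is 0.6, which a hard majority vote between two models cannot express.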
2 votes, 0 answers

Deep NN architecture for predicting a matrix from two matrices

Recently my friend asked me a question: having two input matrices X and Y (each size NxD) where D >> N, and ground truth matrix Z of size DxD, what deep architecture shall I use to learn a deep model of this representation? N ~ is in the order of…
2 votes, 0 answers

How does one deal with images that are too large to fit in the GPU memory for doing ML image analysis?

How does one deal with images that are too large to fit in the GPU memory for doing ML image analysis? I am interested in detecting small structures on images which are themselves many GB in size. Beyond simple downsampling and maybe doing…
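A common alternative to downsampling is to process the image in overlapping tiles that each fit in GPU memory, then stitch the per-tile predictions back together. When the overlap is at least as large as the structures of interest, a structure cut by one tile boundary appears whole in a neighbouring tile. The sketch below only computes the tile windows; how tiles are read lazily from disk depends on the image format and is not shown.

```python
def tile_coords(height, width, tile, overlap):
    """Yield (top, left, bottom, right) windows of at most tile x tile pixels
    that cover the whole image, with `overlap` pixels shared by neighbours."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        s = list(range(0, length - tile + 1, stride))
        if s[-1] + tile < length:
            s.append(length - tile)  # last window sits flush with the edge
        return s

    for top in starts(height):
        for left in starts(width):
            yield top, left, min(top + tile, height), min(left + tile, width)
```

For a 10 x 10 image with `tile=4` and `overlap=1`, this yields 9 windows starting at rows and columns 0, 3 and 6, which together cover every pixel.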
2 votes, 0 answers

How to decide which column has more weightage to output

As per the image, we can see that Column_A's value is directly proportional to the output, while a change in the value of Column_B has no effect on the output. So basically I want to know whether there is any algorithm with which I can get the weightage of the columns that are affecting more…
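One model-agnostic way to measure how much each column matters is permutation importance: shuffle one column at a time and record how much a fitted model's error increases. A column the model ignores, like Column_B here, scores zero. The sketch below assumes a hypothetical `predict` function standing in for any fitted regression model; scikit-learn also ships `sklearn.inspection.permutation_importance` for its own estimators.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in mean squared error when each column is shuffled.

    predict: maps a list of feature rows (lists) to a list of predictions.
    X: list of rows (lists); y: list of true outputs.
    """
    rng = random.Random(seed)

    def mse(rows):
        preds = predict(rows)
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between column j and the output
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increase += mse(shuffled) - baseline
        importances.append(increase / n_repeats)
    return importances
```

Because the score depends on the model, it describes what the model relies on; with strongly correlated columns, shuffling one may understate its importance.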
2 votes, 1 answer

What are the possible ways to handle imbalance in multi-class image datasets?

Class imbalance in image datasets is one of the major factors affecting the performance of DL models. Some of the methods that I found to tackle it are oversampling, under-sampling, and SMOTE. Over-sampling has the drawback that it can make the model overfit. Under-sampling results in loss…
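A technique not in that list is reweighting the loss instead of resampling the data: each class gets a weight inversely proportional to its frequency, so rare classes contribute more per example and no images are duplicated or discarded. The sketch below uses the same formula as scikit-learn's `class_weight="balanced"` heuristic, n_samples / (n_classes * count_c); the resulting weights could then be passed to a weighted loss such as PyTorch's `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count_c).

    With these weights, the weighted count of every class is equal, and the
    total weight over all samples equals the number of samples.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}
```

For labels with 6, 3 and 1 examples per class, the weights are about 0.56, 1.11 and 3.33, so the single rare example weighs as much as the six common ones combined.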
2 votes, 1 answer

How to tackle the human error made in labeling datasets for classification tasks like facial expression recognition?

I am working on a facial expression recognition task. One of the most challenging issues that I faced was human error in labeling the datasets (e.g., FER2013). Are there any ways to handle incorrect labeling of datasets in the classification…
2 votes, 0 answers

Taking a machine learning model to production/deployment

I've designed a machine learning model for the predictive maintenance of machines. The data used for training and testing the ML model is the data from various sensors connected to various parts of the machines. Now, I'm searching for a good…
santobedi
  • 71
  • 2