For questions related to the Python package scikit-learn (or sklearn).
Questions tagged [scikit-learn]
26 questions
5
votes
2 answers
Why isn't my decision tree classifier able to solve the XOR problem properly?
I was trying to solve an XOR problem, and the dataset looks like the one in the image.
I plotted the tree and got this result:
As I understand, the tree should have depth 2 and four leaves. The first comparison is annoying, because it is close to…

Pedro Paiva
- 53
- 4
4
votes
0 answers
When computing the ROC-AUC score for multi-class classification problems, when should we use One-vs-Rest and One-vs-One?
The sklearn documentation for roc_auc_score states that the parameter multi_class can take the value 'ovr' (which stands for One-vs-Rest) or 'ovo' (which stands for One-vs-One). These values are only applicable for multi-class…

Leockl
- 151
- 1
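To make the One-vs-Rest / One-vs-One distinction in the question above concrete, here is a minimal plain-Python sketch of the two macro-averaging schemes. It does not call scikit-learn; the helper names (`binary_auc`, `ovr_auc`, `ovo_auc`) and the toy scores are hypothetical, and the code is only intended to mirror the usual macro-averaged definitions, not any library's implementation.

```python
from itertools import combinations


def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ovr_auc(y_true, proba, classes):
    """Macro average of 'class c versus every other class' AUCs."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]
        aucs.append(binary_auc(labels, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


def ovo_auc(y_true, proba, classes):
    """Macro average over class pairs, using only the samples of the two classes."""
    aucs = []
    for a, b in combinations(range(len(classes)), 2):
        rows = [(y, p) for y, p in zip(y_true, proba) if y in (classes[a], classes[b])]
        a_vs_b = binary_auc([1 if y == classes[a] else 0 for y, _ in rows],
                            [p[a] for _, p in rows])
        b_vs_a = binary_auc([1 if y == classes[b] else 0 for y, _ in rows],
                            [p[b] for _, p in rows])
        aucs.append((a_vs_b + b_vs_a) / 2)
    return sum(aucs) / len(aucs)


# Made-up, deliberately imbalanced toy data: three samples of class 0, one each of 1 and 2.
classes = [0, 1, 2]
y_true = [0, 0, 0, 1, 2]
proba = [
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.4, 0.1, 0.5],
]

ovr = ovr_auc(y_true, proba, classes)
ovo = ovo_auc(y_true, proba, classes)
print("OvR macro AUC:", ovr)
print("OvO macro AUC:", ovo)
```

On this toy data the two averages differ, because OvR compares each class against a pooled "rest" while OvO scores each pair of classes in isolation. In scikit-learn itself the corresponding options are passed as lowercase strings, `multi_class='ovr'` or `multi_class='ovo'`.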
4
votes
2 answers
Can ML be used to curve fit data based on dataset of example fits?
Say I have x,y data connected by a function with some additional parameters (a,b,c):
$$ y = f(x ; a, b, c) $$
Now given a set of data points (x and y) I want to determine a,b,c. If I know the model for $f$, this is a simple curve fitting problem.…

argentum2f
- 151
- 1
- 7
2
votes
0 answers
How does matrix factorization help with recommendations when it converges to the initial user-item matrix?
We can say that matrix factorization of a matrix $R$, in general, is finding two matrices $P$ and $Q$ such that $R \approx P Q^{T}$ with some constraints on $P$ and $Q$. Looking at some matrix factorization algorithms on the internet like…

KindNewbie
- 21
- 2
2
votes
0 answers
Suitable deep learning algorithms for spatial / geometric data
I have a task of classifying spatial data from a geographic information system. More precisely, I need a way to filter out unnecessary line segments from the CAD system before loading into the GIS (see the attached picture, colors for illustrative…

Oleg Bizin
- 121
- 2
2
votes
1 answer
Is it compulsory to normalize the dataset if doing so can negatively impact the performance of binary logistic regression?
I am using a raw data set with 4 feature variables (Total Cholesterol, Systolic Blood Pressure, Diastolic Blood Pressure, and Cigarette count) to do a binomial classification (find stroke likelihood) using the logistic regression algorithm.
I made sure…

GYSHIDO
- 51
- 4
1
vote
0 answers
Using ML to uncover procedural logic
The game Elite Dangerous has a procedurally generated galaxy of some 400 billion star systems.
Each star system in the game can be uniquely identified by a 64-bit number (id64) which is used as a seed for building the system but can also be decoded…

Paulo Rodrigues
- 11
- 1
1
vote
1 answer
Unexpected behaviour on using class weights in loss
I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. To tackle the problem of class imbalance, I use sklearn’s class_weight while…

helloworld
- 65
- 6
1
vote
1 answer
Why does sklearn perceptron converge for linearly inseparable data points?
I learned that the perceptron algorithm only converges if the dataset is linearly separable. I am implementing this algorithm using scikit-learn.
The blue and orange points are from the training set, while red and green are from the test set.…

jacquesadit00
- 13
- 2
1
vote
1 answer
How can I interpret the value returned by the score(X) method of sklearn.neighbors.KernelDensity?
For sklearn.neighbors.KernelDensity, the sklearn KDE documentation says of its score(X) method:
Compute the log-likelihood of each sample under the model
For the 'gaussian' kernel, I have implemented hyper-parameter tuning for the…

Arun
- 225
- 1
- 8
1
vote
1 answer
Interpretation of feature selection based on the model
The description of feature selection based on a random forest uses trees without pruning.
Do I need to use tree pruning?
The thing is, if I don't prune the trees, the forest will overfit.
Below in the picture is the importance of features based on 500…

user287629
- 45
- 4
1
vote
0 answers
How can I split the data into training and validation sets such that entries with a certain value are kept together?
I have the following kind of data frame. This is just an example:
A 1 Normal
A 2 Normal
A 3 Stress
B 1 Normal
B 2 Stress
B 3 Stress
C 1 Normal
C 2 Normal
C 3 Normal
I want to do 5-fold cross-validation and split the data using
skf =…

user1631306
- 81
- 5
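The grouped-split question above can be illustrated with a small, dependency-free sketch. It assumes that the first column (A, B, C) is the value whose rows must stay together; the function name `group_k_fold` and the round-robin fold assignment are hypothetical choices for illustration, not the asker's code. scikit-learn also ships group-aware splitters in `sklearn.model_selection` (for example `GroupKFold` and `StratifiedGroupKFold`), which are likely the better choice in practice.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, val_idx) pairs in which no group appears on both sides."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Deal whole groups into folds round-robin, so a group is never split.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, val


# Toy rows mirroring the example in the question: (group, index, label).
rows = [
    ("A", 1, "Normal"), ("A", 2, "Normal"), ("A", 3, "Stress"),
    ("B", 1, "Normal"), ("B", 2, "Stress"), ("B", 3, "Stress"),
    ("C", 1, "Normal"), ("C", 2, "Normal"), ("C", 3, "Normal"),
]
groups = [r[0] for r in rows]

# Only three groups exist in the toy data, so at most three folds are possible here.
splits = list(group_k_fold(groups, n_splits=3))
for train, val in splits:
    print("train groups:", sorted({groups[i] for i in train}),
          "| validation groups:", sorted({groups[i] for i in val}))
```

Note that this sketch ignores the label column, so it does not stratify; with real data and more groups, a stratified group-aware splitter would also try to balance Normal and Stress across folds.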
1
vote
0 answers
How can I use gradient boosting with multiple features?
I'm trying to use gradient boosting and I'm using sklearn's GradientBoostingClassifier class.
My problem is that I have a data frame with 5 columns and I want to use these columns as features. I want to use them continuously. I mean I want each…

Kamran Hosseini
- 111
- 3
0
votes
1 answer
cross_val_score of sklearn and LinearRegression scoring method
cross_val_score (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html) uses the estimator’s default scorer (if available) and LinearRegression (the estimator I use -…
0
votes
1 answer
Can I implement a sklearn model inside a PyTorch nn.Module?
I am making a custom PyTorch model that, at some point, clusters a latent space that was created by another, previous routine of the model (Autoencoder).
In a bit more detail, my model is a regular Autoencoder, but in every training step, I want to…

puradrogasincortar
- 189
- 6