Questions tagged [cosine-similarity]
9 questions
1
vote
1 answer
Cheap differentiable similarity metrics of vectors
I am looking to compute the similarity between a large set of vectors during neural network training, a process that becomes considerably expensive with the wrong metric. So far, I am making use of cosine similarity, but I found that the…

postnubilaphoebus
- 345
- 1
- 11
1
vote
2 answers
Given embedding vector A and vector B, how to find top k embedding vectors such that they are similar to vector A and dissimilar to vector B
Which would be the better approach for getting the top k embedding vectors that are similar to embedding vector A and dissimilar to vector B?
Approach 1:
calculate f(V) = cosine_similarity(A,V) - cosine_similarity(B,V) for each vector V
sort…
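The scoring rule in Approach 1 can be sketched as follows. This is only a rough illustration, assuming NumPy arrays and no zero-length vectors; the excerpt is cut off after "sort", so taking the k highest scores after sorting is an assumption based on the question title, and the function name `top_k_similar_to_a_not_b` is hypothetical.

```python
import numpy as np


def top_k_similar_to_a_not_b(a, b, candidates, k):
    """Rank candidates V by f(V) = cos(A, V) - cos(B, V); return top-k indices and scores.

    a, b:        1-D arrays of shape (d,)
    candidates:  2-D array of shape (n, d), one candidate vector per row
    Assumption: none of the vectors has zero norm.
    """
    def unit(x):
        # Scale to unit length along the last axis so dot products equal cosines.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a_hat, b_hat, v_hat = unit(a), unit(b), unit(candidates)

    # f(V) for every candidate at once: similarity to A minus similarity to B.
    scores = v_hat @ a_hat - v_hat @ b_hat

    # Sort descending by score and keep the first k (assumed from the title).
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Small usage example with made-up 2-D vectors.
A = np.array([1.0, 0.0])
B = np.array([0.0, 1.0])
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
top_idx, top_scores = top_k_similar_to_a_not_b(A, B, V, k=2)
```

Note that a candidate pointing away from B can outrank A itself under this score, as the last row of `V` does here, which may or may not be the behaviour the question is after.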

Shubham
- 11
- 3
0
votes
2 answers
How do I choose a good threshold for classification (using cosine similarity scores)?
I am using OpenAI's text-embedding-ada-002 embedding model to do a semantic search on a database of articles to find articles that are most related to a given input text. I am looking for a way to define a minimum similarity score to prevent…

Stefan
- 1
0
votes
0 answers
Word Embeddings but for Logical reasoning in custom knowledge GPT-3.5 bot
So I have created a chatbot using GPT-3.5 turbo. I have a vector database that holds vector embeddings of brands, ratings, commission percentages, outlets, tags, etc. Here's how the system is designed.
The user asks a question.
The question is…
0
votes
0 answers
Given a document and a set of keywords, I want to calculate how well each keyword describes the document
The question is in the title, but here is an example:
Context: "I often go swimming in the ocean"
Keywords: "water", "bird", ...
The keyword water + context should output a higher value, while bird + context should output a lower value.
I tried using a…
0
votes
0 answers
How can you add data to BERT? Will 10-20 books added affect the word embeddings?
I will be using BERT to get word embeddings before performing cosine similarity analysis on my data. According to this paper, the accuracy of word embeddings can be improved by updating the model with domain-specific textbooks. They do not provide…

learner
- 13
- 3
0
votes
0 answers
Model to generate suggestions for improving the cosine similarity of two documents?
I am working on a system that compares a source document to a target document and then generates alternative variations of the source document. The goal is to reach a higher cosine similarity between the two documents, say +80%. There's already…

Samewise
- 1
- 1
0
votes
2 answers
How to reduce the number of clusters produced by the Markov Clustering Algorithm?
I have used the Markov Clustering Algorithm (MCL) to cluster tweets based on their similarity. However, I got far too many clusters, and most of them contain only one tweet. Any suggestions to reduce the number of clusters?

Adnan Hussein
- 23
- 3
0
votes
0 answers
How to calculate cosine similarity for classification when you have, say, 10000 samples belonging to two classes
Does anyone have experience with using Cosine Similarity for text classification? I see a number of articles on how to find cosine similarity between documents using Doc2Vec, Gensim, etc.
I have a classification problem (binary) where I want to try…

Sanny28
- 1
- 1