Questions tagged [large-language-models]

Large Language Model (LLM) is a collective term for large natural language models trained on massive quantities of unlabelled text using self-supervised learning. Notable examples include BERT, GPT-(2, 3, 3.5, 4), LaMDA, Chinchilla, PaLM, and LLaMA. There is no formal definition of the term large language model.

71 questions
12
votes
4 answers

Why do LLMs and RNNs learn so fast during inference but, ironically, so slowly during training?

Why do LLMs learn so fast during inference but, ironically, so slowly during training? That is, if you teach an AI a new concept in a prompt, it will learn and use the concept perfectly and flawlessly, through the whole prompt, after just one shot…
10
votes
1 answer

How does the (decoder-only) transformer architecture work?

How does the (decoder-only) transformer architecture used in impressive models such as GPT-4 work?
4
votes
1 answer

Who invented DAN?

DAN was a prompt that went through many, many iterations during the initial months of ChatGPT’s release to the public. DAN is an acronym which stood for “Do Anything Now”, and was a prompt specifically designed to circumvent the guidelines OpenAI…
hmltn
  • 103
  • 9
4
votes
0 answers

LLM-like architecture capable of dynamically learning from its own output

Large Language Models (LLMs) have demonstrated remarkable capabilities in quick learning during inference. They can effectively grasp a concept from a single example and generate relevant outputs. However, a noticeable limitation of LLMs is their…
4
votes
2 answers

What makes reproducing a model like GPT3/GPT3.5/ChatGPT difficult?

Is it difficult for other companies to train a model similar to ChatGPT, and what makes it difficult? What is challenging about reproducing the results obtained by OpenAI with ChatGPT/GPT3.5? Would it be possible for a company like Meta or Google to…
Robin van Hoorn
  • 1,810
  • 7
  • 32
3
votes
3 answers

Why can't Lucene search be used to power LLM applications?

With respect to LLM applications using the RAG (retrieval-augmented generation) architecture, people have started taking it for granted that they will be powered by a vector database. E.g., see this: The most important piece of the preprocessing pipeline, from…
morpheus
  • 214
  • 5
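A toy sketch of the premise behind this question: the retriever in a RAG pipeline need not be a vector database. The snippet below scores documents by crude keyword overlap, a stripped-down stand-in for the lexical/BM25 scoring Lucene does far more thoroughly; the documents and query are made up for illustration.

```python
# Toy lexical retrieval: rank documents by shared terms with the query.
# (Real Lucene/BM25 also weights by term frequency and document length.)
docs = {
    "doc1": "Lucene is a full-text search library written in Java",
    "doc2": "A vector database stores embeddings for similarity search",
    "doc3": "RAG pipelines retrieve context before generating an answer",
}

def lexical_score(query: str, text: str) -> int:
    """Count the terms the query and the document have in common."""
    return len(set(query.lower().split()) & set(text.lower().split()))

query = "full-text search library"
best = max(docs, key=lambda d: lexical_score(query, docs[d]))
print(best)  # doc1 shares the most terms with the query
```

The retrieved text would then be pasted into the LLM prompt exactly as a vector-database hit would be; the generation step does not care how the context was found.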
3
votes
2 answers

How to formulate a reliable ChatGPT prompt for sentiment analysis of a text, and show that it is reliable?

I have a dataset which consists of roughly 400,000 sentences, and I want to give each sentence to ChatGPT so it classifies each one as positive or negative. My question is, where can I find a reliable / trusted prompt to do that and provide evidence…
cnmesr
  • 131
  • 3
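One common approach, sketched below, is to constrain the model to a fixed output vocabulary so responses are trivially parseable, then validate the prompt against a small hand-labelled sample and report agreement. The template is hypothetical, not an officially recommended prompt.

```python
# Hypothetical prompt template for binary sentiment labelling.
# Forcing the answer to exactly one of two words makes the output easy
# to parse and the prompt easy to audit against labelled examples.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following sentence as exactly one word, "
    "either 'positive' or 'negative'. Respond with only that word.\n\n"
    "Sentence: {sentence}\n"
    "Sentiment:"
)

def build_prompt(sentence: str) -> str:
    """Fill the template with one sentence from the dataset."""
    return PROMPT_TEMPLATE.format(sentence=sentence)

print(build_prompt("I loved this movie."))
```

Reliability can then be demonstrated empirically: run the prompt over, say, a few hundred sentences with known labels and report the agreement rate, rather than trusting any prompt on faith.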
3
votes
1 answer

How could ChatGPT avoid consuming what it produces?

Considering the popularity of ChatGPT, we can imagine that in the near future many people will use it to produce lots of text content on the internet, such as blogs and forums. Productivity will be significantly improved. However, I have a worry…
zzzgoo
  • 141
  • 4
3
votes
0 answers

Why do LLMs like GPT-3 or Bloom use Vanilla Transformer instead of long sequence variants like Transformer-XL?

Is there any particular reason that the most recent and successful large language models like GPT-3 or Bloom utilize a vanilla Transformer architecture instead of an arguably superior long-sequence variant such as Transformer-XL, LongFormer,…
hokage555
  • 31
  • 2
2
votes
1 answer

Could hallucinations be the demise of the AI hype?

For quite some time now, I have been evaluating ChatGPT's capability to deliver accurate and helpful responses. While its performance is undeniably impressive, the issue of hallucinations poses a significant drawback to this otherwise capable…
machine_1
  • 133
  • 4
2
votes
0 answers

RAM Capacity of Mac Studio with M2 Ultra for inference of 65B LLM

How much RAM would be needed on a Mac Studio with M2 Ultra to run inference with a 65B-parameter LLM? There are three options: 64GB, 128GB, and 192GB. If using the Apple M2 Ultra with 24‑core CPU, 76‑core GPU, and 32‑core Neural Engine with 192GB Unified Memory, how would…
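The back-of-the-envelope arithmetic behind this question: the weights alone take parameters × bits-per-parameter ÷ 8 bytes, before counting the KV cache and activations. A minimal sketch (pure arithmetic, not vendor-measured figures):

```python
# Rough memory estimate for a 65B-parameter model at common precisions.
# Weights only; the KV cache and activations add further overhead.
def weights_gb(n_params: float, bits_per_param: int) -> float:
    """GB needed just to hold the weights at the given precision."""
    return n_params * bits_per_param / 8 / 1e9

n = 65e9  # 65 billion parameters

for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    print(f"{label}: ~{weights_gb(n, bits):.1f} GB")
```

At fp16 the weights alone (~130 GB) already rule out the 64GB option and leave little headroom at 128GB, which is one reason 4-bit quantization (~32.5 GB for the weights) is popular for running models of this size on unified-memory machines.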
2
votes
1 answer

Is it possible to use LLMs for regression tasks?

I want to use LLMs to predict edge weights in a graph based on attributes between two nodes. Is this even possible? If not, what would you recommend? I tried to look up uses of LLM in regression tasks, but haven't had much luck finding anything…
2
votes
1 answer

OpenAI: What is the difference between model "gpt-3.5-turbo" and "gpt-3.5-turbo-0301"?

I have performed an API call to OpenAI's endpoint https://api.openai.com/v1/models . The endpoint lists the currently available engines, and provides basic information about each one such as the owner and availability. As a logged-in user, I get a…
knb
  • 143
  • 1
  • 6
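For context, "gpt-3.5-turbo" is generally described as a floating alias that tracks the latest snapshot, while "gpt-3.5-turbo-0301" pins the March 1, 2023 snapshot. A minimal sketch of parsing the kind of JSON the /v1/models endpoint returns; the response below is abridged and illustrative, not a real capture:

```python
import json

# Abridged, illustrative response from GET https://api.openai.com/v1/models.
# Real responses carry more fields (created, permission, ...) per model.
sample = """
{
  "object": "list",
  "data": [
    {"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-3.5-turbo-0301", "object": "model", "owned_by": "openai"}
  ]
}
"""

# Collect the model ids from the "data" array.
models = [m["id"] for m in json.loads(sample)["data"]]
print(models)
```

Both ids appear in the listing because the alias and the dated snapshot are served as separate entries; code that pins a dated snapshot keeps the same behaviour when the alias is later repointed.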
2
votes
2 answers

How does a LLM (transformer) pick words from its vocabulary?

I have a very rough understanding of the "attention/self-attention" mechanism of transformer models and how it can be used to process a set of word vectors provided as an input/prompt to the encoder of a network, and how this will produce…
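In outline, the model's final layer produces one logit per vocabulary entry for the next-token position; a softmax turns those logits into probabilities, and a decoding rule (greedy, sampling, top-k, …) picks the token. A minimal sketch with a made-up four-word vocabulary and made-up logits:

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary and the logits the final linear layer might produce
# for the next-token position (values invented for illustration).
vocab = ["cat", "dog", "the", "ran"]
logits = [2.0, 1.0, 0.1, -1.0]

probs = softmax(logits)

# Greedy decoding: always pick the single most probable token.
greedy = vocab[max(range(len(probs)), key=probs.__getitem__)]

# Sampling: draw a token at random according to the distribution.
random.seed(0)
sampled = random.choices(vocab, weights=probs, k=1)[0]

print(greedy)  # "cat" has the highest logit, so greedy picks it
```

A real model does the same thing over a vocabulary of tens of thousands of subword tokens, with the chosen token appended to the context before the next step.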
2
votes
2 answers

What temperature would you recommend for the ChatGPT API?

I believe it is recommended to use a tiny bit of temperature with GPT-3 even for non-creative tasks, like 0.2 or so (I am not entirely sure why). Last I checked, and if I remember correctly, the examples from OpenAI on their GitHub page…
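Temperature divides the logits before the softmax, so T < 1 sharpens the distribution (more deterministic) and T > 1 flattens it (more varied). A small sketch with made-up logits showing the effect:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: low T sharpens, high T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.0]  # invented scores; the first token is the favourite

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token probability = {probs[0]:.3f}")
```

At T=0.2 the top token takes nearly all the probability mass (~0.99), while at T=2.0 the three tokens are close to even (~0.51 for the top one), which is why a small non-zero temperature keeps outputs mostly deterministic without being rigidly so.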
Page 1 of 5