Questions tagged [large-language-models]

Large Language Model (LLM) is a collective term for large natural language models trained on massive quantities of unlabelled text using self-supervised learning. Notable examples include BERT, GPT-(2, 3, 3.5, 4), LaMDA, Chinchilla, PaLM, and LLaMA. There is no formal definition of the term large language model.

71 questions
12
votes
4 answers

Why do LLMs and RNNs learn so fast during inference but, ironically, so slowly during training?

Why do LLMs learn so fast during inference but, ironically, so slowly during training? That is, if you teach an AI a new concept in a prompt, it will learn and use the concept perfectly and flawlessly, through the whole prompt, after just one shot…
10
votes
1 answer

How does the (decoder-only) transformer architecture work?

How does the (decoder-only) transformer architecture used in impressive models such as GPT-4 work?
4
votes
1 answer

Who invented DAN?

DAN was a prompt that went through many, many iterations during the initial months of ChatGPT’s release to the public. DAN is an acronym which stood for “Do Anything Now”, and was a prompt specifically designed to circumvent the guidelines OpenAI…
hmltn
  • 103
  • 9
4
votes
0 answers

LLM-like architecture capable of dynamically learning from its own output

Large Language Models (LLMs) have demonstrated remarkable capabilities in quick learning during inference. They can effectively grasp a concept from a single example and generate relevant outputs. However, a noticeable limitation of LLMs is their…
4
votes
2 answers

What makes reproducing a model like GPT3/GPT3.5/ChatGPT difficult?

Is it difficult for other companies to train a model similar to ChatGPT, and what makes it difficult? What is challenging about reproducing the results obtained by OpenAI with ChatGPT/GPT3.5? Would it be possible for a company like Meta or Google to…
Robin van Hoorn
  • 1,810
  • 7
  • 32
3
votes
3 answers

Why can't Lucene search be used to power LLM applications?

With respect to LLM applications using the RAG (retrieval-augmented generation) architecture, people have started taking it for granted that they will be powered by a vector database. E.g., see this: The most important piece of the preprocessing pipeline, from…
morpheus
  • 214
  • 5
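A toy sketch of the premise behind this question: the retriever in a RAG pipeline need not be a vector database. The snippet below scores documents by crude keyword overlap, a stripped-down stand-in for the lexical/BM25 scoring Lucene does far more thoroughly; the documents and query are made up for illustration.

```python
# Toy lexical retrieval: rank documents by shared terms with the query.
# (Real Lucene/BM25 also weights by term frequency and document length.)
docs = {
    "doc1": "Lucene is a full-text search library written in Java",
    "doc2": "A vector database stores embeddings for similarity search",
    "doc3": "RAG pipelines retrieve context before generating an answer",
}

def lexical_score(query: str, text: str) -> int:
    """Count the terms the query and the document have in common."""
    return len(set(query.lower().split()) & set(text.lower().split()))

query = "full-text search library"
best = max(docs, key=lambda d: lexical_score(query, docs[d]))
print(best)  # doc1 shares the most terms with the query
```

The retrieved text would then be pasted into the LLM prompt exactly as a vector-database hit would be; the generation step does not care how the context was found.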
3
votes
2 answers

How to formulate a reliable ChatGPT prompt for sentiment analysis of a text, and show that it is reliable?

I have a dataset which consists of roughly 400,000 sentences, and I want to give each sentence to ChatGPT so it classifies each one as positive or negative. My question is, where can I find a reliable / trusted prompt to do that and provide evidence…
cnmesr
  • 131
  • 3
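One common approach, sketched below, is to constrain the model to a fixed output vocabulary so responses are trivially parseable, then validate the prompt against a small hand-labelled sample and report agreement. The template is hypothetical, not an officially recommended prompt.

```python
# Hypothetical prompt template for binary sentiment labelling.
# Forcing the answer to exactly one of two words makes the output easy
# to parse and the prompt easy to audit against labelled examples.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following sentence as exactly one word, "
    "either 'positive' or 'negative'. Respond with only that word.\n\n"
    "Sentence: {sentence}\n"
    "Sentiment:"
)

def build_prompt(sentence: str) -> str:
    """Fill the template with one sentence from the dataset."""
    return PROMPT_TEMPLATE.format(sentence=sentence)

print(build_prompt("I loved this movie."))
```

Reliability can then be demonstrated empirically: run the prompt over, say, a few hundred sentences with known labels and report the agreement rate, rather than trusting any prompt on faith.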
3
votes
1 answer

How could ChatGPT avoid consuming what it produces?

Considering the popularity of ChatGPT, we can imagine that in the near future many people will use it to produce lots of text content on the internet, such as blogs and forums. Productivity will be significantly improved. However, I have a worry…
zzzgoo
  • 141
  • 4
3
votes
0 answers

Why do LLMs like GPT-3 or Bloom use Vanilla Transformer instead of long sequence variants like Transformer-XL?

Is there any particular reason that the most recent and successful large language models like GPT-3 or Bloom utilize a vanilla Transformer architecture instead of an arguably superior long-sequence variant such as Transformer-XL, LongFormer,…
hokage555
  • 31
  • 2
2
votes
1 answer

Could hallucinations be the demise of the AI hype?

For quite some time now, I have been evaluating ChatGPT's capability to deliver accurate and helpful responses. While its performance is undeniably impressive, the issue of hallucinations poses a significant drawback to this otherwise capable…
machine_1
  • 133
  • 4
2
votes
0 answers

RAM Capacity of Mac Studio with M2 Ultra for inference of 65B LLM

How much RAM would be needed on a Mac Studio with M2 Ultra to run inference with a 65B-parameter LLM? There are three options: 64GB, 128GB, and 192GB. If using the Apple M2 Ultra with 24‑core CPU, 76‑core GPU, and 32‑core Neural Engine with 192GB Unified Memory, how would…
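The back-of-the-envelope arithmetic behind this question: the weights alone take parameters × bits-per-parameter ÷ 8 bytes, before counting the KV cache and activations. A minimal sketch (pure arithmetic, not vendor-measured figures):

```python
# Rough memory estimate for a 65B-parameter model at common precisions.
# Weights only; the KV cache and activations add further overhead.
def weights_gb(n_params: float, bits_per_param: int) -> float:
    """GB needed just to hold the weights at the given precision."""
    return n_params * bits_per_param / 8 / 1e9

n = 65e9  # 65 billion parameters

for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    print(f"{label}: ~{weights_gb(n, bits):.1f} GB")
```

At fp16 the weights alone (~130 GB) already rule out the 64GB option and leave little headroom at 128GB, which is one reason 4-bit quantization (~32.5 GB for the weights) is popular for running models of this size on unified-memory machines.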
2
votes
1 answer

Is it possible to use LLMs for regression tasks?

I want to use LLMs to predict edge weights in a graph based on attributes between two nodes. Is this even possible? If not, what would you recommend? I tried to look up uses of LLM in regression tasks, but haven't had much luck finding anything…
2
votes
1 answer

OpenAI: What is the difference between model "gpt-3.5-turbo" and "gpt-3.5-turbo-0301"?

I have performed an API call to OpenAI's endpoint https://api.openai.com/v1/models . The endpoint lists the currently available engines, and provides basic information about each one such as the owner and availability. As a logged-in user, I get a…
knb
  • 143
  • 1
  • 6
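For context, "gpt-3.5-turbo" is generally described as a floating alias that tracks the latest snapshot, while "gpt-3.5-turbo-0301" pins the March 1, 2023 snapshot. A minimal sketch of parsing the kind of JSON the /v1/models endpoint returns; the response below is abridged and illustrative, not a real capture:

```python
import json

# Abridged, illustrative response from GET https://api.openai.com/v1/models.
# Real responses carry more fields (created, permission, ...) per model.
sample = """
{
  "object": "list",
  "data": [
    {"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
    {"id": "gpt-3.5-turbo-0301", "object": "model", "owned_by": "openai"}
  ]
}
"""

# Collect the model ids from the "data" array.
models = [m["id"] for m in json.loads(sample)["data"]]
print(models)
```

Both ids appear in the listing because the alias and the dated snapshot are served as separate entries; code that pins a dated snapshot keeps the same behaviour when the alias is later repointed.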
2
votes
2 answers

How does a LLM (transformer) pick words from its vocabulary?

I have a very rough understanding of the "attention/self-attention" mechanism of transformer models and how it can be used to process a set of word vectors provided as an input/prompt to the encoder of a network, and how this will produce…
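In outline, the model's final layer produces one logit per vocabulary entry for the next-token position; a softmax turns those logits into probabilities, and a decoding rule (greedy, sampling, top-k, …) picks the token. A minimal sketch with a made-up four-word vocabulary and made-up logits:

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary and the logits the final linear layer might produce
# for the next-token position (values invented for illustration).
vocab = ["cat", "dog", "the", "ran"]
logits = [2.0, 1.0, 0.1, -1.0]

probs = softmax(logits)

# Greedy decoding: always pick the single most probable token.
greedy = vocab[max(range(len(probs)), key=probs.__getitem__)]

# Sampling: draw a token at random according to the distribution.
random.seed(0)
sampled = random.choices(vocab, weights=probs, k=1)[0]

print(greedy)  # "cat" has the highest logit, so greedy picks it
```

A real model does the same thing over a vocabulary of tens of thousands of subword tokens, with the chosen token appended to the context before the next step.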
2
votes
2 answers

What temperature would you recommend for the ChatGPT API?

I believe it is recommended to use a tiny bit of temperature with GPT-3 even for non-creative tasks, like 0.2 or so (I am not entirely sure why). Last I checked, and if I remember correctly, the examples from OpenAI on their GitHub page…
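Temperature divides the logits before the softmax, so T < 1 sharpens the distribution (more deterministic) and T > 1 flattens it (more varied). A small sketch with made-up logits showing the effect:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: low T sharpens, high T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.0]  # invented scores; the first token is the favourite

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token probability = {probs[0]:.3f}")
```

At T=0.2 the top token takes nearly all the probability mass (~0.99), while at T=2.0 the three tokens are close to even (~0.51 for the top one), which is why a small non-zero temperature keeps outputs mostly deterministic without being rigidly so.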
Page 1 of 5