Large language models can pick up the cultural biases present in the data they were trained on.
Some models, including GPT-4, are trained on input data in multiple languages. Some languages, like English, are used by people from many different cultures and nations, while others, like German, are used by a much more culturally homogeneous group. Cultural biases are correlated with cultures, which are correlated with languages.
Now, there is an interesting question: Which biases does a model learn?
Mainly the biases associated with the most used input language?
Or an averaged set of biases, as if all input material had been translated into a common language and the language skills themselves were learned independently?
Or does it learn different biases in different languages?
To explore this, I asked a question touching on a bias that should be closely tied to a particular language, once in that language and once in English.
The result was surprising: I found a difference in bias depending on the language - but in the opposite direction from what I expected.
There is a pretty strong pro-privacy bias in Germany, in part because privacy was routinely invaded in the east of the country until 1989.
I used the following prompts for GPT-4; both mean the same thing:
"Should we have a database of all address changes?"
and
"Sollten wir eine Datenbank mit allen Addressänderungen haben?"
The English answers* were very explicit that there are pros and cons, while the German answers were clearly positive. I would have expected the opposite bias, but that is not the point here. The point is that GPT-4 expresses significantly different biases depending on the language the question is asked in.
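For anyone who wants to rerun the comparison programmatically, here is a minimal sketch using the OpenAI Python SDK. The model name "gpt-4" and the number of repetitions are assumptions for illustration, and the answers still have to be compared by hand:

```python
# Minimal sketch: ask the same question in two languages, several times each,
# at temperature 0.7. Assumes the OpenAI Python SDK and an API key in the
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

PROMPTS = {
    "en": "Should we have a database of all address changes?",
    "de": "Sollten wir eine Datenbank mit allen Adressänderungen haben?",
}

def ask(prompt: str, n_runs: int = 5) -> list[str]:
    """Ask the same question n_runs times and collect the answers."""
    answers = []
    for _ in range(n_runs):
        response = client.chat.completions.create(
            model="gpt-4",   # model name is an assumption
            temperature=0.7,
            messages=[{"role": "user", "content": prompt}],
        )
        answers.append(response.choices[0].message.content)
    return answers

for lang, prompt in PROMPTS.items():
    print(f"--- answers in {lang} ---")
    for answer in ask(prompt):
        print(answer, "\n")
```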
But how does that even work? I had assumed that GPT-4 being fluent in many languages, including local dialects, implied that it understands language as such and answers the same question the same way regardless of which of two languages I ask it in.
One explanation would be that it treats the language as implying a specific cultural context, and answers within that context.
Another would be that I am somehow interacting with two separate parts of the system - that it learns separate world models for separate languages.
The answer may be "we don't know", but I would also be interested in speculation about how it could work.
(*) I asked each question repeatedly with a temperature of 0.7; the difference was significant, not a random fluctuation within the range of valid answers.