Using bless I can see my gedit output is ASCII. Can gedit process some kind of Unicode?
-
Gedit can handle a lot of encodings. – muru May 09 '16 at 07:16
-
Yes, the website says UTF-8 but my Preferences menu doesn't seem to offer any encoding selector. – H2ONaCl May 09 '16 at 07:47
-
The code points in Unicode that are also in ASCII have the same values; it's backwards compatible. So a file that contains no code points outside of ASCII will be recognized as ASCII. – Stefano Palazzo Sep 13 '16 at 09:16
2 Answers
When you click Save As, the lower left corner of the dialog offers some encodings to choose from. Choose "add and remove" (the last entry) to get a list of all available encodings, including various Unicode encodings.

-
I confirm that this solution solved my problem on a Raspberry Pi 3 Model B ver 1.2 running Raspbian Jessie, with a Samsung UN32D5500, playing videos with subtitle files using miniDLNA version 1.1.4. Otherwise SRT subtitles don't show, and they can even crash video playback. Thank you. – JohnBR Apr 16 '21 at 23:23
So, I gave Bruni a screenshot for their answer to show what they meant. But then I tested the result. You can indeed select UTF-8 encoding in gedit, or any other text editor. However, unless these files contain non-ASCII characters**, they will be detected as ASCII. Indeed, the same holds if you create a "plain text" (dubious term*) file by any method, and this answer has the reason:
When all your chars are < 128 ASCII and UTF-8 are the same. ASCII is a subset of UTF-8 (and also a subset of latin1 and many other encoding formats).
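To see the quoted claim at the byte level, here is a minimal sketch that assumes a POSIX shell with od available. It prints the byte values of an ASCII-only string and finds the largest one; the variable names are only for illustration.

```shell
# Print each byte of an ASCII-only string as an unsigned decimal value.
bytes=$(printf 'unicorns' | od -An -tu1)
echo "byte values: $bytes"

# Find the largest byte value. Anything below 128 is valid ASCII and,
# byte for byte, also valid UTF-8.
max=0
for b in $bytes; do
    if [ "$b" -gt "$max" ]; then
        max=$b
    fi
done
echo "largest byte value: $max"
```

Since every byte is below 128, nothing in the data itself distinguishes "ASCII" from "UTF-8" here.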
I challenge anyone to test this answer; I can only create a "UTF-8" text file on my system by adding non-ASCII characters to it, even though all my terminals, all my text editors and my locale are set to UTF-8:
$ echo unicorns > rainbows; file rainbows
rainbows: ASCII text
Redirecting echo creates a file that file says is ASCII (try it yourself!)
$ echo ユニコーン >> rainbows; file rainbows
rainbows: UTF-8 Unicode text
Does appending non-ASCII characters automagically change the encoding? No, it just forces file to see that the encoding really is UTF-8, because the content can no longer be limited to ASCII.
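If you want to check this without relying on file, a rough sketch follows. It assumes a POSIX shell with tr, wc and mktemp, and that the script itself is saved as UTF-8; count_non_ascii is a hypothetical helper name, not an existing tool. It counts the bytes with values of 128 or more before and after appending the katakana line.

```shell
# Hypothetical helper: count bytes >= 0x80 in a file (0 means pure ASCII).
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp)
printf 'unicorns\n' > "$tmp"
before=$(count_non_ascii "$tmp")   # no high bytes: indistinguishable from ASCII
printf 'ユニコーン\n' >> "$tmp"
after=$(count_non_ascii "$tmp")    # each of the 5 katakana characters is 3 bytes in UTF-8
echo "before: $before, after: $after"
rm -f "$tmp"
```

The first count is zero, which is why a detector has no evidence of anything beyond ASCII until the non-ASCII characters arrive.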
TL;DR
Don't worry, your "ASCII" text files are UTF-8 files in disguise (their UTF-8-ness cannot be detected), and will be parsed as you want & expect.
*You were interested enough to ask, so perhaps you already understand what the writer of this article is telling us. This piece explains more about encoding and, specifically, why ASCII != UTF-8 and why you need to know how you encoded your text. I have extracted:
The Single Most Important Fact About Encodings
If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII.
There Ain't No Such Thing As Plain Text.
If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
Almost every stupid "my website looks like gibberish" or "she can't read my emails when I use accents" problem comes down to one naive programmer who didn't understand the simple fact that if you don't tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.
** Fun Fact: @ByteCommander pointed out to me that file only looks at the first 50-100kb of the file, so if there are non-ASCII chars far from the beginning of a text file, then file will still think it is ASCII.
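The effect of inspecting only a prefix can be imitated with a small sketch. This is not how file is implemented; guess_from_prefix is a hypothetical helper, and the 1000 and 5000 byte limits are arbitrary values chosen for the demonstration. It assumes head -c, tr, wc and mktemp are available.

```shell
# Hypothetical helper: guess the encoding from only the first $2 bytes of $1,
# loosely imitating a detector that does not read the whole file.
guess_from_prefix() {
    high=$(head -c "$2" "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')
    if [ "$high" -eq 0 ]; then
        echo "ASCII"
    else
        echo "non-ASCII"
    fi
}

tmp=$(mktemp)
# 2000 bytes of ASCII followed by one non-ASCII character at the very end.
head -c 2000 /dev/zero | tr '\000' 'a' > "$tmp"
printf 'é\n' >> "$tmp"
short=$(guess_from_prefix "$tmp" 1000)   # prefix stops before the é
full=$(guess_from_prefix "$tmp" 5000)    # prefix covers the whole file
echo "first 1000 bytes: $short; first 5000 bytes: $full"
rm -f "$tmp"
```

The short prefix reports ASCII even though the file as a whole is not, which is the same blind spot described above.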
-
I don't understand your challenge. All ASCII-encoded files are by definition UTF-8-encoded as well. file just tries to guess the "narrowest" character encoding of a file. – David Foerster Sep 13 '16 at 09:15
-
I think the real question is "why are my files ASCII when I expect them to be Unicode?" @DavidFoerster – Zanna Sep 13 '16 at 09:24