I frequently encounter text files (such as subtitle files in my native language, Persian) with character encoding problems. These files are created on Windows and saved with an unsuitable encoding (apparently ANSI), so they show up as unreadable gibberish, like this:
On Windows, this is easy to fix by using Notepad++ to convert the encoding to UTF-8, as shown below:
The correct, readable result looks like this:
I've searched a lot for a similar solution on GNU/Linux, but unfortunately the suggested solutions (e.g. this question) don't work. Most of all, I've seen people suggest iconv and recode, but I have had no luck with these tools. I've tested many commands, including the following, and all have failed:
$ recode ISO-8859-15..UTF8 file.txt
$ iconv -f ISO8859-15 -t UTF-8 file.txt > out.txt
$ iconv -f WINDOWS-1252 -t UTF-8 file.txt > out.txt
None of these worked!
I'm using Ubuntu 14.04 and I'm looking for a simple solution (either GUI or CLI) that works just as Notepad++ does.
One important aspect of being "simple" is that the user should not have to determine the source encoding; rather, the tool should detect the source encoding automatically, and the user should only have to provide the target encoding. Nevertheless, I would also be glad to know about a working solution that requires the source encoding to be provided.
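To make the "no need to name the source encoding" requirement concrete, here is a minimal sketch of one possible approach, not a real detector. It only checks whether the file is already valid UTF-8 and otherwise converts from a single fallback encoding. The function name `to_utf8` is hypothetical, and the default fallback of WINDOWS-1256 is an assumption that fits Arabic-script files written on Windows; it will be wrong for other legacy encodings.

```shell
#!/bin/sh
# to_utf8 is a hypothetical helper, not an existing tool.
# Usage: to_utf8 INPUT OUTPUT [FALLBACK_ENCODING]
to_utf8() {
    in=$1
    out=$2
    # Assumption: legacy files use the Arabic-script Windows code page.
    fallback=${3:-WINDOWS-1256}

    # iconv exits non-zero when the input is not valid in the -f encoding,
    # so a UTF-8 to UTF-8 pass serves as a validity check.
    if iconv -f UTF-8 -t UTF-8 "$in" > /dev/null 2>&1; then
        # Already valid UTF-8: copy unchanged.
        cp -- "$in" "$out"
    else
        # Not valid UTF-8: assume the fallback encoding and convert.
        iconv -f "$fallback" -t UTF-8 "$in" > "$out"
    fi
}
```

Something like `to_utf8 file.txt out.txt` would then cover the common case. INPUT and OUTPUT must be different files, since the output redirection would truncate the input before iconv reads it.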
If someone needs a test case to try different solutions, the above example is available via this link.
vim '+set fileencoding=utf-8' '+wq' file.txt. – muru Apr 14 '15 at 11:55
iso-639 but that doesn't seem to be available in either iconv or recode. At least, I don't see it in the output of iconv -l. – terdon Apr 14 '15 at 15:16
vim but it didn't work. – Seyed Mohammad Apr 14 '15 at 16:26
iconv -f CP1256 -t UTF-8 ... or equivalently iconv -f WINDOWS-1256 ... appear to at least give the right kind of script - I suspect they are the nearest Arabic equivalents of the Persian characters? In my locale the letter order is L-R but I suspect that's corrected if you run iconv in a suitable R-L locale. – steeldriver Apr 14 '15 at 18:06
iso-639 is not seen as an available encoding for iconv and recode? If so, what would the correct encoding be? – terdon Sep 03 '15 at 12:20
ex '+set fileencoding=utf-8' '+wq' <FILE>. For me this was the safest way. More here: https://stackoverflow.com/a/52823709/3223785 – Eduardo Lucio Oct 16 '18 at 03:35
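As a small illustration of steeldriver's comment, the sketch below feeds the same four bytes to iconv with two different source encodings. The sample bytes and the helper name `sample` are made up for this example and are not taken from the linked test file. Octal escapes are used because not every printf supports `\x`.

```shell
# Bytes 0xD3 0xE1 0xC7 0xE3 (written in octal) spell "سلام" in Windows-1256.
sample() { printf '\323\341\307\343\n'; }

# Wrong source encoding: the conversion succeeds but yields Latin mojibake.
sample | iconv -f WINDOWS-1252 -t UTF-8    # prints: ÓáÇã

# Source encoding suggested in the comments: the Arabic-script text comes out.
sample | iconv -f WINDOWS-1256 -t UTF-8    # prints: سلام
```

This may explain why the WINDOWS-1252 and ISO-8859-15 attempts in the question "failed" without any error: in those encodings the bytes are valid Western European letters, so iconv converts them faithfully to the wrong characters.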