A character encoding is a mapping from a set of characters to their on-disk representation. jEdit can use any encoding supported by the Java platform.
Buffers in memory are always stored in UTF-16
encoding, which means each character is mapped to an integer between 0
and 65535. UTF-16 is the native encoding supported by
Java, and has a large enough range of characters to support most modern
languages.
When a buffer is loaded, it is converted from its on-disk encoding to
UTF-16 using a specified character encoding.
The default encoding, used to load files for which no other
encoding is specified, can be set in the Encodings option pane; see
the section called “The Encodings Pane”.
Unless you change this setting, it will be your operating system's
native encoding, for example
MacRoman on macOS,
windows-1252 on Windows, and
ISO-8859-1 on Unix.
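On the Java platform, the operating system's native encoding is exposed as the default charset; a quick way to see which encoding that would be on your system is the following sketch (the class name is illustrative, not part of jEdit):

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    public static void main(String[] args) {
        // The JVM's default charset reflects the operating system's
        // native encoding, e.g. windows-1252 on a Western-locale Windows.
        Charset defaultCs = Charset.defaultCharset();
        System.out.println("Default encoding: " + defaultCs.name());
    }
}
```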
An encoding can be explicitly set when opening a file in the file system browser's Commands>Encoding menu.
Note that there is no general way to auto-detect the encoding used by a file. However, jEdit supports “encoding detectors”: some are provided in the core, and others may be provided by plugins through the services API. From the Encodings option pane (see the section called “The Encodings Pane”), you can customize which detectors are used and the order in which they are tried. Here are some of the encoding detectors recognized by jEdit:
BOM: Files beginning with a Unicode byte order mark, such as UTF-16
files, are auto-detected, because they begin with a certain fixed
character sequence. Note that plain UTF-8 does not mandate a
specific header, and thus cannot be auto-detected this way, unless the
file in question is an XML file.
XML-PI: Encodings used in XML files with an XML PI like the following are auto-detected:
<?xml version="1.0" encoding="UTF-8"?>
html: Encodings specified in HTML files with a
content= attribute in a
meta element may be auto-detected:
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">
python: Python has its own way of specifying encoding at the top of a file.
# -*- coding: utf-8 -*-
buffer-local-property: Enable buffer-local properties' syntax (see the section called “Buffer-Local Properties”) at the top of the file to specify encoding.
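The detector chain described above can be sketched as follows. This is an illustrative simplification, not jEdit's actual implementation; the class name, method name, and regular expressions are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingSniffer {
    // Hypothetical detector chain: try a byte order mark first, then an
    // XML processing instruction, then a Python-style coding line.
    // Returns the detected encoding name, or null if nothing matched.
    public static String detect(byte[] head) {
        // 1. BOM detector: fixed byte sequences at the start of the file.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";    // UTF-8 BOM
        if (startsWith(head, 0xFE, 0xFF))       return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE))       return "UTF-16LE";

        // 2/3. Text-based detectors: decode the head as Latin-1 so every
        // byte maps to some character, then look for declarations.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher xmlPi = Pattern
            .compile("<\\?xml[^>]*encoding=\"([^\"]+)\"").matcher(text);
        if (xmlPi.find()) return xmlPi.group(1);
        Matcher coding = Pattern.compile("coding:\\s*([-\\w.]+)").matcher(text);
        if (coding.find()) return coding.group(1);

        return null; // no detector matched
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```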
The encoding that will be used to save the current buffer is shown in the status bar, and can be changed in the Utilities>Buffer Options dialog box. Note that changing this setting has no effect on the buffer's contents; if you opened a file with the wrong encoding and got garbage, you will need to reload it. File>Reload is an easy way to do so.
If a file is opened without an explicit encoding specified and it appears in the recent file list, jEdit will use the encoding last used when working with that file; otherwise the default encoding will be used.
While the world is slowly converging on UTF-8 and UTF-16 encodings for storing text, a wide range of older encodings are still in widespread use and Java supports most of them.
The simplest character encoding still in use is ASCII, or
“American Standard Code for Information Interchange”.
ASCII encodes Latin letters used in English, in addition to numbers
and a range of punctuation characters. Each ASCII character consists
of 7 bits, so there is a limit of 128 distinct characters, which makes
it unsuitable for anything other than English text. jEdit will load
and save files as ASCII if the US-ASCII encoding is used.
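The 128-character limit is easy to demonstrate with Java's charset API: an encoder for US-ASCII simply cannot represent accented characters. (The class name below is illustrative.)

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class AsciiLimit {
    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(ascii.canEncode("jEdit")); // plain Latin letters: true
        System.out.println(ascii.canEncode("café"));  // 'é' is outside ASCII: false
    }
}
```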
Because ASCII is unsuitable for international use, most
operating systems use an 8-bit extension of ASCII, with the first
128 values mapped to the ASCII characters, and the rest used to
encode accents, umlauts, and various more esoteric
typographical marks. The three major operating systems all extend
ASCII in a different way. Files written by Macintosh programs can be
read using the
MacRoman encoding; Windows text
files are usually stored as
windows-1252. In the
Unix world, the
8859_1 character encoding has
found widespread usage.
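The incompatibility between these extensions shows up when the same byte is decoded under each one. The sketch below assumes the MacRoman charset is available in your Java runtime (mainstream JDKs ship it under the canonical name x-MacRoman, with MacRoman as an alias):

```java
import java.nio.charset.Charset;

public class ExtendedAscii {
    public static void main(String[] args) {
        byte[] data = { (byte) 0x92 }; // one byte above the ASCII range
        for (String name : new String[] { "MacRoman", "windows-1252", "ISO-8859-1" }) {
            if (Charset.isSupported(name)) {
                String decoded = new String(data, Charset.forName(name));
                System.out.printf("%-12s -> U+%04X%n", name, (int) decoded.charAt(0));
            }
        }
        // MacRoman decodes 0x92 as 'í', windows-1252 as a right single
        // quotation mark, and ISO-8859-1 as an invisible control character.
    }
}
```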
On Windows, various other encodings, referred to as
code pages and identified by number, are used
to store non-English text. The corresponding Java encoding name is
windows- followed by the code page number.
Many common cross-platform international character sets are also supported:
KOI8_R for Russian text,
GBK for Chinese, and
SJIS for Japanese.
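Which encoding names are actually usable depends on the Java runtime. Charset.isSupported answers that without throwing an exception, and it accepts the aliases used above (the class name in this sketch is illustrative):

```java
import java.nio.charset.Charset;

public class CheckEncodings {
    public static void main(String[] args) {
        // These names are Java aliases for KOI8-R, GBK and Shift_JIS.
        for (String name : new String[] { "KOI8_R", "GBK", "SJIS" }) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```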