DBCS

This article is about character sets. For other meanings, see DBCS (disambiguation).

A double-byte character set (DBCS) is a character set that represents each character with two bytes. A DBCS supports national languages that contain a large number of unique characters or symbols: one byte can distinguish at most 256 (2^8) values, while two bytes can distinguish up to 65,536 (2^16). Examples of such languages include Japanese, Korean, and Chinese.

The term has two basic meanings:

In CJK (Chinese, Japanese and Korean) computing, the term "DBCS" traditionally means a character set in which every graphic character not representable by an accompanying SBCS is encoded in two bytes; Han characters would generally comprise most of these two-byte characters.

The term "DBCS" can also mean a character set in which all characters (including all control characters) are encoded in two bytes.
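The difference between the two meanings can be illustrated with a minimal Python sketch. It assumes Python's built-in shift_jis codec as an example of the first meaning (two-byte characters accompanying an SBCS), and uses utf-16-be, restricted to characters in the Basic Multilingual Plane, only as a stand-in for the second meaning. The choice of codecs and the sample text are assumptions made for illustration.

```python
# Byte length of each character under two kinds of encoding.
text = "A\n日"  # a Latin letter, a control character, and a Han character

# Meaning 1: only characters missing from the accompanying SBCS take two bytes.
mixed = [len(ch.encode("shift_jis")) for ch in text]

# Meaning 2: every character, control characters included, takes two bytes.
# (utf-16-be behaves this way only for Basic Multilingual Plane characters.)
fixed = [len(ch.encode("utf-16-be")) for ch in text]

print(mixed)  # [1, 1, 2]
print(fixed)  # [2, 2, 2]
```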

Contents

1 The DBCS in CJK computing

2 Controversy

3 See also

4 External links

The DBCS in CJK computing

The term DBCS traditionally refers to a character set in which each graphic character is encoded in two bytes. Such a DBCS always has lead bytes with the most significant bit set (that is, byte values of 0x80 or above, outside the 7-bit range), and is always paired with a single-byte character set (SBCS). Furthermore, for the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with halfwidth characters and the DBCS with fullwidth characters.
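The lead-byte rule described above can be sketched in Python as follows. This is a simplified illustration, not a general decoder: the helper name split_characters is hypothetical, and the sketch assumes that every byte with the high bit set starts a two-byte character. That assumption holds for well-formed EUC-KR text of the kind used here, but not for every code page (Shift-JIS, for example, also has single-byte characters above 0x7F).

```python
def split_characters(data: bytes) -> list:
    """Split a mixed SBCS/DBCS byte string into per-character chunks.

    Simplifying assumption: a byte with the most significant bit set is
    a lead byte that starts a two-byte character; any other byte is a
    complete single-byte character. Input is assumed to be well formed.
    """
    chunks = []
    i = 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1
        chunks.append(data[i:i + width])
        i += width
    return chunks


encoded = "A한B".encode("euc_kr")
chunks = split_characters(encoded)
decoded = [chunk.decode("euc_kr") for chunk in chunks]

# With halfwidth = 1 column and fullwidth = 2 columns, the byte count
# doubles as the display width on a traditional text terminal.
columns = len(encoded)

print(decoded)  # ['A', '한', 'B']
print(columns)  # 4
```

Because one-byte characters are shown halfwidth and two-byte characters fullwidth, software that only counts bytes can still align such text correctly, which is the compatibility benefit the paragraph refers to.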

Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022. For example, "DBCS" can sometimes mean a double-byte encoding that is specifically not EUC.

Note that this original meaning of DBCS differs from what some consider correct usage today. Some insist that these character sets be properly called either multi-byte character sets (MBCS) or variable-width encodings, because character sets such as EUC-JP, EUC-TW, GB18030 and UTF-8 use more than two bytes for some characters and a single byte for others.
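The variable width of such encodings is easy to observe. The following sketch assumes Python's built-in utf-8 and gb18030 codecs and three arbitrarily chosen sample characters; it simply counts the bytes each character occupies.

```python
# Bytes per character in two variable-width encodings.
samples = ["A", "日", "😀"]  # ASCII letter, Han character, emoji

widths = {
    codec: [len(ch.encode(codec)) for ch in samples]
    for codec in ("utf-8", "gb18030")
}

for codec, lengths in widths.items():
    print(codec, lengths)
# utf-8 [1, 3, 4]
# gb18030 [1, 2, 4]
```

Neither encoding uses exactly two bytes for every character, which is why the labels MBCS or variable-width encoding fit them better than DBCS.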

Controversy

Some people use DBCS to mean the UTF-16 and UTF-8 encodings, while others use it to mean older (pre-Unicode) code pages that use more than one byte per character. Shift-JIS, GB2312 and Big5 are a few code pages in which a character can take more than one byte, but even for these the term DBCS is incorrect, because they are really multi-byte character sets (MBCS). Some IBM mainframes do have true DBCS code pages, which contain only the double-byte portion of a multi-byte code page.

The term "DBCS enablement" is ambiguous when used for software internationalization. It can mean either writing software for East Asian markets using older, code-page-based technology, or using Unicode. Sometimes the term also implies translation into an East Asian language. Usually, "Unicode enablement" means internationalizing software by using Unicode, while "DBCS enablement" means using the mutually incompatible code pages of the various East Asian countries. Since Unicode, unlike many other code pages, supports all the major languages of East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually desired only when much older operating systems or applications do not support Unicode.
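The incompatibility between East Asian code pages can be demonstrated with a short Python sketch. It assumes Python's built-in shift_jis, gbk and utf-8 codecs; the sample strings and variable names are illustrative only.

```python
# The same two bytes mean different things under different code pages.
raw = "日".encode("shift_jis")
as_japanese = raw.decode("shift_jis")             # the intended character
as_chinese = raw.decode("gbk", errors="replace")  # an unrelated character

# A legacy code page covers one market; mixed-language text does not fit.
mixed = "日本語 한국어"  # Japanese and Korean in one string
try:
    mixed.encode("shift_jis")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False  # Shift-JIS has no Hangul

# Unicode encodings handle the mixed text without loss.
round_trip = mixed.encode("utf-8").decode("utf-8")

print(as_japanese == as_chinese)  # False
print(legacy_ok)                  # False
print(round_trip == mixed)        # True
```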

See also

SBCS

MBCS

Variable-width encoding

External links

Microsoft's definition of DBCS

IBM's definition of DBCS
