CCSID

CCSID is an abbreviation used by IBM for "Coded Character Set Identifier". It is a 16-bit number that represents a specific encoding of a specific code page. For example, Unicode is a code page that has several encoding forms, such as UTF-8, UTF-16 and UTF-32.

Contents

1 What Is the Difference between a Code Page and a CCSID?

2 Examples

3 Reference

4 External links

What Is the Difference between a Code Page and a CCSID?

The terms code page and CCSID are often used interchangeably even though they are not synonymous. A code page may be only part of what makes up a CCSID. The following definitions help to illustrate this point, from glyph to CCSID and everything in between.

A glyph is the actual physical pattern of pixels or ink that shows up on a display or printout.

A character is a concept that covers all glyphs associated with a certain symbol. For instance, the letter F set in bold, in italics, underlined, in color, or in a different font produces a different glyph each time, but every one of those glyphs represents the same character. The various modifiers (bold, italic, underline, color, and font) do not change the F's essential F-ness.

A character set contains the characters necessary to allow a particular human to carry on a meaningful interaction with the computer. This level is the first one to separate characters into various alphabets (Latin, Arabic, Hebrew, Cyrillic, and so on) or ideographic groups (Chinese, Korean, and so on).

A code page represents a particular assignment of code point values to glyphs. The code point is the logical representation of the computer's internal byte representation of that character. Many characters are represented by different code points in different code pages. All code points in a code page contain the same number of bytes. Certain character sets can be adequately represented with single-byte code pages (at most 256 code points), but many require more than that. Examples of the latter include JIS X 0208 and Unicode.
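As a small illustration of one character having different code points in different code pages, the sketch below uses Python's built-in codecs cp437, cp1252 and cp037 as stand-ins for the corresponding code pages. The codec names belong to Python rather than to CDRA, and the helper code_point_in is a hypothetical name introduced only for this example.

```python
# Python codec names used as stand-ins for code pages (an assumption of this sketch).
CODE_PAGES = ["cp437", "cp1252", "cp037"]


def code_point_in(char: str, code_page: str) -> int:
    """Return the single-byte code point of char in the given code page."""
    encoded = char.encode(code_page)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single-byte character in {code_page}")
    return encoded[0]


# Print the code point of each character in each code page.
for char in ("A", "\u00e9"):  # "A" and "e with acute accent"
    for page in CODE_PAGES:
        print(f"{char!r} in {page}: 0x{code_point_in(char, page):02X}")
```

The letter A sits at 0x41 in the ASCII-based pages but at 0xC1 in the EBCDIC page, and the accented letter moves between the two ASCII-based pages as well.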

An encoding scheme is the byte format of a code page. It maps code point values to byte values in a computer. For example, UTF-8 and UTF-16BE are two encodings of the same Unicode code page. In IBM's CDRA, this is typically represented with an ESID (Encoding Scheme IDentifier). EUC and ISO-2022 are other examples of encoding schemes.
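The point that one code point can have several byte formats can be sketched with Python's standard codecs. The example encodes the Euro sign (U+20AC) in three Unicode encodings. The helper hex_bytes is a hypothetical name for this illustration, and Python's codec names are not ESIDs.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return the bytes as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


euro = "\u20ac"  # EURO SIGN, Unicode code point U+20AC

# Same code point, three different byte sequences.
for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    print(f"{encoding:10} {hex_bytes(euro, encoding)}")
```

The code point value stays the same in every case. Only the byte layout chosen by the encoding scheme changes.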

A coded character set identifier (CCSID) contains all of the information necessary to assign and preserve the meaning and rendering of characters through various stages of processing and interchange. This information always includes at least one code page, but may include multiple code pages of differing byte-lengths. The CCSID also has an associated encoding scheme that governs how various code points are to be handled. This mechanism allows a program to recognize bidirectional orientation, character shaping (mainly of Arabic characters), and other complex encoding information.
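To make "multiple code pages of differing byte-lengths" more concrete, the following sketch splits a Shift-JIS byte string into its one-byte and two-byte units. It relies on Python's shift_jis codec, which is assumed here only as an approximation and is not identical to any particular IBM CCSID. The lead-byte ranges are the conventional Shift-JIS ones, and split_shift_jis is a hypothetical helper.

```python
from typing import List


def split_shift_jis(data: bytes) -> List[bytes]:
    """Split Shift-JIS bytes into one-byte and two-byte units by lead-byte range."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Lead bytes 0x81-0x9F and 0xE0-0xFC start a two-byte (DBCS) character;
        # every other byte is a complete single-byte (SBCS) character.
        width = 2 if (0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC) else 1
        units.append(data[i:i + width])
        i += width
    return units


# Latin "A", halfwidth katakana A (U+FF71), hiragana A (U+3042).
text = "A\uff71\u3042"
encoded = text.encode("shift_jis")
for unit in split_shift_jis(encoded):
    kind = "SBCS" if len(unit) == 1 else "DBCS"
    print(kind, unit.hex(" ").upper())
```

The first two characters come out as single bytes and the third as a byte pair, so a single string mixes units from a single-byte and a double-byte code page.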

Examples

The following examples show how some CCSIDs are made up of other CCSIDs.

CCSID 932
Character Set   Code Page   CCSID   Encoding Scheme
1122            897         897     SBCS
370             301         301     DBCS

CCSID 942
Character Set   Code Page   CCSID   Encoding Scheme
1172            1041        1041    SBCS
370             301         301     DBCS

CCSID 5028
Character Set   Code Page   CCSID   Encoding Scheme
1170            897         4993    SBCS
370             301         301     DBCS

All three of these variant Shift-JIS CCSIDs are MBCS (multi-byte character set) CCSIDs. The SBCS (single-byte character set) portion differs among them, while the DBCS (double-byte character set) portion is the same in all three. CCSID 932 uses the original code page 897, identified as CCSID 897. CCSID 5028 uses an updated version of code page 897, identified as CCSID 4993. CCSID 942 uses a different SBCS code page from the other two, namely 1041.
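The composition described above can be modelled as a small data structure. This is only a sketch: the Component class, the MIXED_CCSIDS mapping and the component_ccsids helper are hypothetical names, and the numbers are copied from the tables in this section rather than read from any IBM registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One row of a CCSID table from this section."""
    character_set: int
    code_page: int
    ccsid: int
    encoding_scheme: str  # "SBCS" or "DBCS"


# Values copied from the three tables above.
MIXED_CCSIDS = {
    932: (Component(1122, 897, 897, "SBCS"), Component(370, 301, 301, "DBCS")),
    942: (Component(1172, 1041, 1041, "SBCS"), Component(370, 301, 301, "DBCS")),
    5028: (Component(1170, 897, 4993, "SBCS"), Component(370, 301, 301, "DBCS")),
}


def component_ccsids(scheme: str) -> set:
    """Collect the distinct component CCSIDs that use the given encoding scheme."""
    return {
        part.ccsid
        for parts in MIXED_CCSIDS.values()
        for part in parts
        if part.encoding_scheme == scheme
    }


print("DBCS components:", sorted(component_ccsids("DBCS")))
print("SBCS components:", sorted(component_ccsids("SBCS")))
```

The DBCS set has a single member shared by all three CCSIDs, while the SBCS set has three members, one per CCSID.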

Also notice that CCSIDs 5028 and 4993 each differ by 4096 (1000 in hexadecimal) from their predecessors: 5028 = 932 + 4096 and 4993 = 897 + 4096. Adding 4096 to the predecessor CCSID that has the same code page identifier is a common way for CDRA to denote an upgraded CCSID.
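The arithmetic can be checked with a few lines of code. The function name offset_steps is hypothetical, and the sketch only tests whether two CCSID numbers differ by a whole multiple of hexadecimal 1000. It does not claim that every such pair is actually related in CDRA.

```python
UPGRADE_OFFSET = 0x1000  # 4096 in decimal


def offset_steps(predecessor: int, successor: int):
    """Return how many 0x1000 steps separate two CCSIDs, or None if they do not fit."""
    diff = successor - predecessor
    if diff > 0 and diff % UPGRADE_OFFSET == 0:
        return diff // UPGRADE_OFFSET
    return None


# Pairs taken from the examples in this section.
for old, new in ((932, 5028), (897, 4993), (932, 942)):
    print(f"{old} -> {new}: {offset_steps(old, new)}")
```

The first two pairs are one step apart, while 932 and 942 do not fit the pattern because 942 is a distinct CCSID and not an upgrade of 932.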

There are a few reasons for this amount of complexity.

Many CCSIDs are used in IBM databases, such as DB2, where a database field supports only an SBCS, DBCS or MBCS string. CCSIDs allow programs to tell which of these is in use.

When characters are added or replaced, as happened with the introduction of the Euro currency sign, a different CCSID is used, so it is possible to tell whether stored strings support those additions. This versioning is important for the integrity of the data.

Resources can be reused among similar CCSIDs. [1]
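The versioning reason can be illustrated with the Euro sign. The sketch below uses Python's cp037 and cp1140 codecs as stand-ins for an EBCDIC code page without and with the Euro sign. Treating these codec names as equivalent to particular CCSIDs is an assumption of the example, and supports is a hypothetical helper.

```python
def supports(char: str, codec: str) -> bool:
    """Report whether the codec can encode the given character."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


stored = b"\x9f"  # one stored byte whose meaning depends on the identifier kept with it

# The same byte decodes to a different character under each codec.
for codec in ("cp037", "cp1140"):
    print(f"{codec}: byte 9F decodes to {stored.decode(codec)!r}, "
          f"Euro supported: {supports(chr(0x20AC), codec)}")
```

The same stored byte decodes to the generic currency sign under one codec and to the Euro sign under the other, which is why the identifier kept with the data has to change when the repertoire changes.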

Reference

IBM CDRA (Character Data Representation Architecture) glossary of terms

IBM Globalization Terminology

[1] http://www.ibm.com/software/globalization/cdra/chapter7.jsp

External links

Complete description of IBM CDRA (Character Data Representation Architecture) - This includes a more detailed description of the architecture surrounding CCSIDs.

IBM's complete list of CCSIDs and other various related identifiers

List of CCSIDs supported on the IBM System i computer
