MySQL Tutorial/Table/COLLATION Character Set

Материал из SQL эксперт
Версия от 12:49, 26 мая 2010; Admin (обсуждение | вклад) (1 версия)
(разн.) ← Предыдущая | Текущая версия (разн.) | Следующая → (разн.)
Перейти к: навигация, поиск

Character Sets and Collations in MySQL

MySQL supports the use of character sets for the MyISAM, MEMORY, NDBCluster, and InnoDB storage engines.

A character set is a set of symbols and encodings.

A collation is a set of rules for comparing characters in a character set.

The MySQL server can support multiple character sets.

To list the available character sets, use the SHOW CHARACTER SET statement

Quote from MySQL document http://www.mysql.ru



   <source lang="sql">

mysql> mysql> SHOW CHARACTER SET; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | cp1250 | Windows Central European | cp1250_general_ci | 1 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | | cp866 | DOS Russian | cp866_general_ci | 1 | | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 | | macce | Mac Central European | macce_general_ci | 1 | | macroman | Mac West European | macroman_general_ci | 1 | | cp852 | DOS Central European | cp852_general_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 | | cp1256 | Windows Arabic | cp1256_general_ci | 1 | | cp1257 | Windows Baltic | cp1257_general_ci | 1 | | binary | Binary pseudo charset | binary | 1 | | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 | | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | +----------+-----------------------------+---------------------+--------+ 36 rows in set (0.00 sec) mysql></source>


character_set_system

   <source lang="sql">

SHOW VARIABLES LIKE "character_set_system"; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | character_set_system | utf8 | +----------------------+-------+ 1 row in set (0.00 sec) mysql></source>


Column Character Set and Collation

Every CHAR, VARCHAR, or TEXT Column has a column character set and a column collation.

Column definition syntax has optional clauses for specifying the column character set and collation:



   <source lang="sql">

col_name {CHAR | VARCHAR | TEXT} (col_length)[CHARACTER SET charset_name] [COLLATE collation_name]</source>


Database, Table, and Column Definition with Character and collate

   <source lang="sql">

mysql> mysql> CREATE DATABASE myDatabase DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci; Query OK, 1 row affected (0.02 sec) mysql> mysql> mysql> mysql> USE myDatabase; Database changed mysql> mysql> CREATE TABLE myTable

   -> (
   ->     c1 CHAR(10)
   -> );

Query OK, 0 rows affected (0.03 sec) mysql> mysql> drop table myTable; Query OK, 0 rows affected (0.00 sec) mysql> mysql> drop database myDatabase; Query OK, 0 rows affected (0.00 sec) mysql> mysql> use test; Database changed</source>


Every database has a database character set and a database collation.

The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

All database options are stored in a text file named db.opt that can be found in the database directory.



   <source lang="sql">

CREATE DATABASE db_name

   [[DEFAULT] CHARACTER SET charset_name]
   [[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name

   [[DEFAULT] CHARACTER SET charset_name]
   [[DEFAULT] COLLATE collation_name]</source>
   
  

List the available character set

   <source lang="sql">

SHOW CHARACTER SET; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | cp1250 | Windows Central European | cp1250_general_ci | 1 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | | cp866 | DOS Russian | cp866_general_ci | 1 | | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 | | macce | Mac Central European | macce_general_ci | 1 | | macroman | Mac West European | macroman_general_ci | 1 | | cp852 | DOS Central European | cp852_general_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 | | cp1256 | Windows Arabic | cp1256_general_ci | 1 | | cp1257 | Windows Baltic | cp1257_general_ci | 1 | | binary | Binary pseudo charset | binary | 1 | | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 | | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | +----------+-----------------------------+---------------------+--------+ 36 rows in set (0.00 sec) mysql></source>


Server Character Set and Collation

You can use --character-set-server for the character set.

Along with it, you can add --collation-server for the collation.



   <source lang="sql">

shell> mysqld shell> mysqld --character-set-server=latin1 shell> mysqld --character-set-server=latin1 --collation-server=latin1_swedish_ci</source>


SHOW CREATE DATABASE displays the CREATE DATABASE statement that creates a given database:

   <source lang="sql">

mysql> mysql> SHOW CREATE DATABASE test; +----------+-----------------------------------------------------------------+ | Database | Create Database | +----------+-----------------------------------------------------------------+ | test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ | +----------+-----------------------------------------------------------------+ 1 row in set (0.00 sec) mysql></source>


SHOW CREATE TABLE displays the CREATE TABLE statement to create a given table.

   <source lang="sql">

mysql> mysql> CREATE TABLE myTable

   -> (
   ->     c1 CHAR(10)
   -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Query OK, 0 rows affected (0.05 sec) mysql> mysql> desc myTable; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+-------+ | c1 | char(10) | YES | | NULL | | +-------+----------+------+-----+---------+-------+ 1 row in set (0.00 sec) mysql> mysql> SHOW CREATE TABLE myTable; +---------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +---------+------------------------------------------------------------------------------------------------------------------------------------------------+ | myTable | CREATE TABLE `mytable` (

 `c1` char(10) collate latin1_danish_ci default NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_danish_ci | +---------+------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> mysql> drop table myTable; Query OK, 0 rows affected (0.00 sec) mysql> mysql></source>


Table and Column Definition with character and collate

   <source lang="sql">

mysql> mysql> CREATE TABLE myTable

   -> (
   ->     c1 CHAR(10)
   -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Query OK, 0 rows affected (0.03 sec) mysql> mysql> desc myTable; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+-------+ | c1 | char(10) | YES | | NULL | | +-------+----------+------+-----+---------+-------+ 1 row in set (0.00 sec) mysql> mysql> drop table myTable; Query OK, 0 rows affected (0.00 sec) mysql></source>


Table Character Set and Collation

Every table has a table character set and a table collation.

The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying them:



   <source lang="sql">

CREATE TABLE tbl_name (column_list)

   [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]]

ALTER TABLE tbl_name

   [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]</source>
   
  

TEXT CHARACTER SET latin1 COLLATE latin1_bin

   <source lang="sql">

mysql> mysql> mysql> CREATE TABLE myTable

   -> (
   ->    ID SMALLINT UNSIGNED,
   ->    Name VARCHAR(40),
   ->    Description TEXT CHARACTER SET latin1 COLLATE latin1_bin
   -> );

Query OK, 0 rows affected (0.03 sec) mysql> mysql> desc myTable; +-------------+----------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+----------------------+------+-----+---------+-------+ | ID | smallint(5) unsigned | YES | | NULL | | | Name | varchar(40) | YES | | NULL | | | Description | text | YES | | NULL | | +-------------+----------------------+------+-----+---------+-------+ 3 rows in set (0.01 sec) mysql> mysql> drop table myTable; Query OK, 0 rows affected (0.00 sec) mysql></source>


The output from SHOW COLLATION includes all available character sets.

It takes an optional LIKE clause that indicates which collation names to match.



   <source lang="sql">

mysql> mysql> mysql> SHOW COLLATION LIKE "latin1%"; +-------------------+---------+----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +-------------------+---------+----+---------+----------+---------+ | latin1_german1_ci | latin1 | 5 | | Yes | 1 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | Yes | 1 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | Yes | 1 | | latin1_general_cs | latin1 | 49 | | Yes | 1 | | latin1_spanish_ci | latin1 | 94 | | Yes | 1 | +-------------------+---------+----+---------+----------+---------+ 8 rows in set (0.00 sec)</source>


The SHOW CHARACTER SET command shows all available character sets.

It takes an optional LIKE clause that indicates which character set names to match.



   <source lang="sql">

mysql> mysql> mysql> SHOW CHARACTER SET LIKE "latin%"; +---------+-----------------------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+-----------------------------+-------------------+--------+ | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | +---------+-----------------------------+-------------------+--------+ 4 rows in set (0.00 sec) mysql> mysql></source>


The SHOW COLUMNS statement displays the collations of a table"s columns when invoked as SHOW FULL COLUMNS

Columns with CHAR, VARCHAR, or TEXT data types have collations.

Numeric and other non-character types have no collation (indicated by NULL as the Collation value).



   <source lang="sql">

mysql> mysql> mysql> mysql> mysql> CREATE TABLE myTable

   -> (
   ->     c1 CHAR(10)
   -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Query OK, 0 rows affected (0.03 sec) mysql> mysql> desc myTable; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+-------+ | c1 | char(10) | YES | | NULL | | +-------+----------+------+-----+---------+-------+ 1 row in set (0.00 sec) mysql> mysql> SHOW FULL COLUMNS FROM myTable\G

                                                      • 1. row ***************************
    Field: c1
     Type: char(10)
Collation: latin1_danish_ci
     Null: YES
      Key:
  Default: NULL
    Extra:

Privileges: select,insert,update,references

  Comment:

1 row in set (0.00 sec) mysql> mysql> drop table myTable; Query OK, 0 rows affected (0.00 sec) mysql></source>


Using alter command to change characgter set

   <source lang="sql">

mysql> CREATE TABLE myTable

   -> (
   ->     col1 CHAR(10)
   -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Query OK, 0 rows affected (0.05 sec) mysql> mysql> mysql> ALTER TABLE myTable MODIFY col1 CHAR(50) CHARACTER SET greek; Query OK, 0 rows affected (0.03 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE myTable MODIFY col1 CHAR(50) CHARACTER SET utf8; Query OK, 0 rows affected (0.03 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> mysql> mysql> drop table myTable; Query OK, 0 rows affected (0.00 sec) mysql></source>