MySQL Tutorial/Table/COLLATION Character Set

Материал из SQL эксперт
Перейти к: навигация, поиск

Character Sets and Collations in MySQL

MySQL supports the use of character sets for the MyISAM, MEMORY, NDBCluster, and InnoDB storage engines.

A character set is a set of symbols and encodings.

A collation is a set of rules for comparing characters in a character set.

The MySQL server can support multiple character sets.

To list the available character sets, use the SHOW CHARACTER SET statement

Quote from MySQL document http://www.mysql.ru



mysql>
mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+
36 rows in set (0.00 sec)
mysql>


character_set_system

SHOW VARIABLES LIKE "character_set_system";        
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| character_set_system | utf8  |
+----------------------+-------+
1 row in set (0.00 sec)
mysql>


Column Character Set and Collation

Every CHAR, VARCHAR, or TEXT Column has a column character set and a column collation.

Column definition syntax has optional clauses for specifying the column character set and collation:



col_name {CHAR | VARCHAR | TEXT} (col_length)[CHARACTER SET charset_name] [COLLATE collation_name]


Database, Table, and Column Definition with Character and collate

mysql>
mysql> CREATE DATABASE myDatabase DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;
Query OK, 1 row affected (0.02 sec)
mysql>
mysql> 
mysql>
mysql> USE myDatabase;
Database changed
mysql>
mysql> CREATE TABLE myTable
    -> (
    ->     c1 CHAR(10)
    -> );
Query OK, 0 rows affected (0.03 sec)
mysql>
mysql> drop table myTable;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> drop database myDatabase;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> use test;
Database changed


Every database has a database character set and a database collation.

The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

All database options are stored in a text file named db.opt that can be found in the database directory.



CREATE DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]
ALTER DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]


List the available character set

SHOW CHARACTER SET;        
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+
36 rows in set (0.00 sec)
mysql>


Server Character Set and Collation

You can use --character-set-server for the character set.

Along with it, you can add --collation-server for the collation.



shell> mysqld
shell> mysqld --character-set-server=latin1
shell> mysqld --character-set-server=latin1 --collation-server=latin1_swedish_ci


SHOW CREATE DATABASE displays the CREATE DATABASE statement that creates a given database:

mysql>
mysql> SHOW CREATE DATABASE test;
+----------+-----------------------------------------------------------------+
| Database | Create Database                                                 |
+----------+-----------------------------------------------------------------+
| test     | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>


SHOW CREATE TABLE displays the CREATE TABLE statement to create a given table.

mysql>
mysql> CREATE TABLE myTable
    -> (
    ->     c1 CHAR(10)
    -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 0 rows affected (0.05 sec)
mysql>
mysql> desc myTable;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| c1    | char(10) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql>
mysql> SHOW CREATE TABLE myTable;
+---------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                   |
+---------+------------------------------------------------------------------------------------------------------------------------------------------------+
| myTable | CREATE TABLE `mytable` (
  `c1` char(10) collate latin1_danish_ci default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_danish_ci |
+---------+------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
mysql> drop table myTable;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql>


Table and Column Definition with character and collate

mysql>
mysql> CREATE TABLE myTable
    -> (
    ->     c1 CHAR(10)
    -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 0 rows affected (0.03 sec)
mysql>
mysql> desc myTable;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| c1    | char(10) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql>
mysql> drop table myTable;
Query OK, 0 rows affected (0.00 sec)
mysql>


Table Character Set and Collation

Every table has a table character set and a table collation.

The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying them:



CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]]
ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]


TEXT CHARACTER SET latin1 COLLATE latin1_bin

mysql>
mysql>
mysql> CREATE TABLE myTable
    -> (
    ->    ID SMALLINT UNSIGNED,
    ->    Name VARCHAR(40),
    ->    Description TEXT CHARACTER SET latin1 COLLATE latin1_bin
    -> );
Query OK, 0 rows affected (0.03 sec)
mysql>
mysql> desc myTable;
+-------------+----------------------+------+-----+---------+-------+
| Field       | Type                 | Null | Key | Default | Extra |
+-------------+----------------------+------+-----+---------+-------+
| ID          | smallint(5) unsigned | YES  |     | NULL    |       |
| Name        | varchar(40)          | YES  |     | NULL    |       |
| Description | text                 | YES  |     | NULL    |       |
+-------------+----------------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
mysql>
mysql> drop table myTable;
Query OK, 0 rows affected (0.00 sec)
mysql>


The output from SHOW COLLATION includes all available character sets.

It takes an optional LIKE clause that indicates which collation names to match.



mysql>
mysql>
mysql> SHOW COLLATION LIKE "latin1%";
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         | Yes      |       1 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         | Yes      |       1 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         | Yes      |       1 |
| latin1_general_cs | latin1  | 49 |         | Yes      |       1 |
| latin1_spanish_ci | latin1  | 94 |         | Yes      |       1 |
+-------------------+---------+----+---------+----------+---------+
8 rows in set (0.00 sec)


The SHOW CHARACTER SET command shows all available character sets.

It takes an optional LIKE clause that indicates which character set names to match.



mysql>
mysql>
mysql> SHOW CHARACTER SET LIKE "latin%";
+---------+-----------------------------+-------------------+--------+
| Charset | Description                 | Default collation | Maxlen |
+---------+-----------------------------+-------------------+--------+
| latin1  | cp1252 West European        | latin1_swedish_ci |      1 |
| latin2  | ISO 8859-2 Central European | latin2_general_ci |      1 |
| latin5  | ISO 8859-9 Turkish          | latin5_turkish_ci |      1 |
| latin7  | ISO 8859-13 Baltic          | latin7_general_ci |      1 |
+---------+-----------------------------+-------------------+--------+
4 rows in set (0.00 sec)
mysql>
mysql>


The SHOW COLUMNS statement displays the collations of a table"s columns when invoked as SHOW FULL COLUMNS

Columns with CHAR, VARCHAR, or TEXT data types have collations.

Numeric and other non-character types have no collation (indicated by NULL as the Collation value).



mysql>
mysql>
mysql>
mysql>
mysql> CREATE TABLE myTable
    -> (
    ->     c1 CHAR(10)
    -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 0 rows affected (0.03 sec)
mysql>
mysql> desc myTable;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| c1    | char(10) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql>
mysql> SHOW FULL COLUMNS FROM myTable\G
*************************** 1. row ***************************
     Field: c1
      Type: char(10)
 Collation: latin1_danish_ci
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references
   Comment:
1 row in set (0.00 sec)
mysql>
mysql> drop table myTable;
Query OK, 0 rows affected (0.00 sec)
mysql>


Using alter command to change characgter set

mysql> CREATE TABLE myTable
    -> (
    ->     col1 CHAR(10)
    -> ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 0 rows affected (0.05 sec)
mysql>
mysql>
mysql> ALTER TABLE myTable MODIFY col1 CHAR(50) CHARACTER SET greek;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE myTable MODIFY col1 CHAR(50) CHARACTER SET utf8;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql>
mysql>
mysql> drop table myTable;
Query OK, 0 rows affected (0.00 sec)
mysql>