std::codecvt
template<
    class InternT,
    class ExternT,
    class State
> class codecvt;
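The class template std::codecvt encapsulates conversion of character strings, including wide and multibyte strings, from one encoding to another. All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.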
The standard library provides the following standalone (locale-independent) specializations:
Defined in header <locale>
| Specialization | Conversion |
|---|---|
| std::codecvt<char, char, std::mbstate_t> | identity conversion |
| std::codecvt<char16_t, char, std::mbstate_t> | conversion between UTF-16 and UTF-8 (since C++11)(deprecated in C++20) |
| std::codecvt<char16_t, char8_t, std::mbstate_t> | conversion between UTF-16 and UTF-8 (since C++20) |
| std::codecvt<char32_t, char, std::mbstate_t> | conversion between UTF-32 and UTF-8 (since C++11)(deprecated in C++20) |
| std::codecvt<char32_t, char8_t, std::mbstate_t> | conversion between UTF-32 and UTF-8 (since C++20) |
| std::codecvt<wchar_t, char, std::mbstate_t> | conversion between the system's native wide and single-byte narrow character sets |
In addition, every locale object constructed in a C++ program implements its own (locale-specific) versions of these specializations.
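As a minimal sketch of how one of these specializations can be obtained and inspected, the block below looks up the identity facet in the classic "C" locale with std::use_facet and reads three of its properties. The struct identity_facet_info and the function query_identity_facet are hypothetical names introduced only for this illustration.

```cpp
#include <cwchar>
#include <locale>

// Hypothetical helper type: properties reported by the identity facet.
struct identity_facet_info {
    bool always_noconv; // true if the facet never needs to convert
    int encoding;       // external characters per internal character, if constant
    int max_length;     // most external characters consumed per internal character
};

// Hypothetical helper: queries codecvt<char, char, mbstate_t> in the classic locale.
identity_facet_info query_identity_facet()
{
    using Identity = std::codecvt<char, char, std::mbstate_t>;
    // The classic "C" locale is always available and holds every standard facet.
    const Identity& cvt = std::use_facet<Identity>(std::locale::classic());
    return { cvt.always_noconv(), cvt.encoding(), cvt.max_length() };
}
```

Calling query_identity_facet() from main and printing the three fields is enough to try it out. For the identity specialization, always_noconv is expected to be true, which is what lets a narrow file stream skip conversion entirely.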
Member types
| Member type | Definition |
|---|---|
| intern_type | InternT |
| extern_type | ExternT |
| state_type | State |
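Member functions
| Member | Description |
|---|---|
| (constructor) | constructs a new codecvt facet (public member function) |
| (destructor) | destructs a codecvt facet (protected member function) |
| out | invokes do_out (public member function) |
| in | invokes do_in (public member function) |
| unshift | invokes do_unshift (public member function) |
| encoding | invokes do_encoding (public member function) |
| always_noconv | invokes do_always_noconv (public member function) |
| length | invokes do_length (public member function) |
| max_length | invokes do_max_length (public member function) |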
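The public members are thin wrappers over the virtual do_ functions, so calling in() directly is one way to see the conversion interface at work. The sketch below widens a narrow string through the codecvt<wchar_t, char, std::mbstate_t> facet of a locale. The function name widen and its error handling are assumptions made for this illustration, and the default classic locale is only expected to handle plain ASCII input.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a narrow string to a wide string using
// the codecvt<wchar_t, char, mbstate_t> facet of the given locale.
std::wstring widen(const std::string& narrow,
                   const std::locale& loc = std::locale::classic())
{
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& cvt = std::use_facet<Facet>(loc);

    std::mbstate_t state{};                  // initial conversion state
    std::wstring wide(narrow.size(), L'\0'); // at most one wchar_t per input byte
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    // in() converts external (char) input to internal (wchar_t) output.
    const Facet::result r = cvt.in(state,
        narrow.data(), narrow.data() + narrow.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);

    if (r != Facet::ok)
        throw std::runtime_error("codecvt::in did not complete the conversion");

    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}
```

The from_next and to_next out-parameters report how far the conversion got in each buffer, which is why the result is trimmed to to_next rather than to the size of the input.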
Member objects
| Member name | Type |
|---|---|
| id [static] | std::locale::id |
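Protected member functions
| Member | Description |
|---|---|
| do_out [virtual] | converts a string from InternT to ExternT, such as when writing to a file (virtual protected member function) |
| do_in [virtual] | converts a string from ExternT to InternT, such as when reading from a file (virtual protected member function) |
| do_unshift [virtual] | generates the termination character sequence of ExternT characters for an incomplete conversion (virtual protected member function) |
| do_encoding [virtual] | returns the number of ExternT characters necessary to produce one InternT character, if that number is constant (virtual protected member function) |
| do_always_noconv [virtual] | tests whether the facet encodes an identity conversion for all valid argument values (virtual protected member function) |
| do_length [virtual] | calculates the length of the ExternT string that would be consumed by conversion into a given InternT buffer (virtual protected member function) |
| do_max_length [virtual] | returns the maximum number of ExternT characters that could be converted into a single InternT character (virtual protected member function) |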
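Because the conversion logic lives in these virtual functions, a user-defined facet can customize it by deriving from a codecvt specialization and overriding them. The following is a rough sketch under stated assumptions: upper_out_codecvt and to_upper_via_facet are hypothetical names, the facet only upper-cases characters through std::toupper on output, and it is exercised directly rather than imbued into a stream.

```cpp
#include <cctype>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: upper-cases characters on output, leaves input untouched.
class upper_out_codecvt : public std::codecvt<char, char, std::mbstate_t>
{
public:
    explicit upper_out_codecvt(std::size_t refs = 0)
        : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
    result do_out(state_type&,
                  const char* from, const char* from_end, const char*& from_next,
                  char* to, char* to_end, char*& to_next) const override
    {
        from_next = from;
        to_next = to;
        // Copy while both buffers have room, converting one character at a time.
        while (from_next != from_end && to_next != to_end) {
            const unsigned char c = static_cast<unsigned char>(*from_next++);
            *to_next++ = static_cast<char>(std::toupper(c));
        }
        // partial signals that the output buffer filled up before the input ended.
        return from_next == from_end ? ok : partial;
    }

    // Report that conversion is needed, so callers do not bypass do_out.
    bool do_always_noconv() const noexcept override { return false; }
};

// Hypothetical helper: runs a string through the facet's public out() wrapper.
std::string to_upper_via_facet(const std::string& s)
{
    upper_out_codecvt cvt(1); // refs == 1: this scope, not a locale, owns the facet
    std::mbstate_t state{};
    std::string result(s.size(), '\0');
    const char* from_next = nullptr;
    char* to_next = nullptr;
    cvt.out(state, s.data(), s.data() + s.size(), from_next,
            &result[0], &result[0] + result.size(), to_next);
    result.resize(static_cast<std::size_t>(to_next - &result[0]));
    return result;
}
```

Overriding do_always_noconv matters here: the base identity facet reports that no conversion is ever needed, and a stream buffer is allowed to skip the facet entirely when that is the case.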
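Inherited from std::codecvt_base
| Member type | Definition |
|---|---|
| enum result { ok, partial, error, noconv }; | unscoped enumeration type |

| Enumeration constant | Definition |
|---|---|
| ok | conversion was completed with no error |
| partial | not all source characters were converted |
| error | encountered an invalid character |
| noconv | no conversion required, input and output types are the same |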
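Callers of in() and out() are expected to branch on these result values. The sketch below, with the hypothetical helper names describe and identity_out_result, maps each constant to a short label and shows the noconv case by calling out() on the identity specialization from the classic locale.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: short label for each codecvt_base::result value.
std::string describe(std::codecvt_base::result r)
{
    switch (r) {
        case std::codecvt_base::ok:      return "ok";
        case std::codecvt_base::partial: return "partial";
        case std::codecvt_base::error:   return "error";
        case std::codecvt_base::noconv:  return "noconv";
    }
    return "unknown";
}

// Hypothetical helper: calls out() on the identity facet and returns its result.
std::codecvt_base::result identity_out_result()
{
    using Identity = std::codecvt<char, char, std::mbstate_t>;
    const Identity& cvt = std::use_facet<Identity>(std::locale::classic());

    std::mbstate_t state{};
    const char src[] = "abc";
    char dst[4] = {};
    const char* from_next = nullptr;
    char* to_next = nullptr;

    // The identity facet performs no work: internal and external types match.
    return cvt.out(state, src, src + 3, from_next, dst, dst + 4, to_next);
}
```

When noconv is returned nothing is written to the output buffer, so a caller that needs the data in dst has to copy the source range itself.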
Example
The following example reads a UTF-8 file using a locale that implements UTF-8 conversion in codecvt<wchar_t, char, mbstate_t>, and converts a UTF-8 string to UTF-16 using one of the standard specializations of std::codecvt.
    #include <codecvt>
    #include <cstdint>
    #include <fstream>
    #include <iomanip>
    #include <iostream>
    #include <locale>
    #include <string>
    #include <utility>

    // utility wrapper to adapt locale-bound facets for wstring/wbuffer convert
    template<class Facet>
    struct deletable_facet : Facet
    {
        template<class... Args>
        deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
        ~deletable_facet() {}
    };

    int main()
    {
        // UTF-8 narrow multibyte encoding
        std::string data = reinterpret_cast<const char*>(+u8"z\u00df\u6c34\U0001f34c");
        // or reinterpret_cast<const char*>(+u8"zß水🍌")
        // or "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9f\x8d\x8c"

        std::ofstream("text.txt") << data;

        // using the system-supplied locale's codecvt facet
        std::wifstream fin("text.txt");
        // reading from wifstream will use codecvt<wchar_t, char, mbstate_t>
        // this locale's codecvt converts UTF-8 to UCS4 (on systems such as Linux)
        fin.imbue(std::locale("en_US.UTF-8"));
        std::cout << "The UTF-8 file contains the following UCS4 code points: \n";
        // cast to an integer type: streaming wchar_t to a narrow stream is deleted in C++20
        for (wchar_t c; fin >> c; )
            std::cout << "U+" << std::hex << std::setw(4) << std::setfill('0')
                      << static_cast<std::uint32_t>(c) << '\n';

        // using the standard (locale-independent) codecvt facet
        std::wstring_convert<
            deletable_facet<std::codecvt<char16_t, char, std::mbstate_t>>, char16_t> conv16;
        std::u16string str16 = conv16.from_bytes(data);

        std::cout << "The UTF-8 file contains the following UTF-16 code points: \n";
        // same cast for char16_t, for the same reason
        for (char16_t c : str16)
            std::cout << "U+" << std::hex << std::setw(4) << std::setfill('0')
                      << static_cast<std::uint16_t>(c) << '\n';
    }
Output:
    The UTF-8 file contains the following UCS4 code points:
    U+007a
    U+00df
    U+6c34
    U+1f34c
    The UTF-8 file contains the following UTF-16 code points:
    U+007a
    U+00df
    U+6c34
    U+d83c
    U+df4c
See also
| Character conversions | locale-defined multibyte (UTF-8, GB18030) | UTF-8 | UTF-16 |
|---|---|---|---|
| UTF-16 | mbrtoc16 / c16rtomb (with C11's DR488) | codecvt<char16_t, char, mbstate_t>, codecvt_utf8_utf16<char16_t>, codecvt_utf8_utf16<char32_t>, codecvt_utf8_utf16<wchar_t> | N/A |
| UCS2 | c16rtomb (without C11's DR488) | codecvt_utf8<char16_t>, codecvt_utf8<wchar_t> (Windows) | codecvt_utf16<char16_t>, codecvt_utf16<wchar_t> (Windows) |
| UTF-32 | mbrtoc32 / c32rtomb | codecvt<char32_t, char, mbstate_t> | codecvt_utf16<char32_t> |
| system wide: UTF-32 (non-Windows), UCS2 (Windows) | mbsrtowcs / wcsrtombs | No | No |
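| Name | Description |
|---|---|
| codecvt_base | defines character conversion errors (class) |
| codecvt_byname | creates a codecvt facet for the named locale (class template) |
| codecvt_utf8 (C++11)(deprecated in C++17) | converts between UTF-8 and UCS2/UCS4 (class template) |
| codecvt_utf16 (C++11)(deprecated in C++17) | converts between UTF-16 and UCS2/UCS4 (class template) |
| codecvt_utf8_utf16 (C++11)(deprecated in C++17) | converts between UTF-8 and UTF-16 (class template) |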