Source: https://timsong-cpp.github.io/cppwp/n3337/lex.string
string-literal:
encoding-prefix: u8 u U L
s-char-sequence:
s-char:
raw-string:
r-char-sequence:
r-char:
d-char-sequence:
d-char:
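A string literal is a sequence of characters surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**", u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively.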
The characters '(' and ')' are permitted in a raw-string. Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n". The raw string
R"(??)"
is equivalent to "\?\?". The raw string
R"#(
)??="
)#"
is equivalent to "\n)\?\?=\"\n".
u8"asdf"
const char
u"asdf"
char16_t
const char16_t
U"asdf"
char32_t
const char32_t
L"asdf"
const wchar_t
u"a"
u"b"
u"ab"
U"a"
U"b"
U"ab"
L"a"
L"b"
L"ab"
"b"
"a"
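Characters in concatenated strings are kept distinct. For example, "\xA" "B" contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB').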
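The rule that concatenated characters stay distinct can be checked at compile time. This is a minimal sketch; the function names `concat_length`, `first_char`, and `second_char` are hypothetical and exist only for illustration.

```cpp
#include <cstddef>

// "\xA" "B" holds two characters plus the terminating '\0'. If the escape
// had absorbed the following 'B', the array would hold only two elements.
static_assert(sizeof("\xA" "B") == 3, "two characters and a terminator");

// Number of characters before the terminator.
std::size_t concat_length() { return sizeof("\xA" "B") - 1; }

// The individual characters of the concatenated literal.
char first_char() { return ("\xA" "B")[0]; }
char second_char() { return ("\xA" "B")[1]; }
```

Splitting a literal this way is also the usual technique for ending a hexadecimal escape before a character that would otherwise be read as another hexadecimal digit.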
'\0'
'
\'
"
\
char
U'\0'
L'\0'
u'\0'
0x0
0x10FFFF
Source: https://timsong-cpp.github.io/cppwp/n4140/lex.string
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
Source: https://timsong-cpp.github.io/cppwp/n4659/lex.string
A string-literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.
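The delimiter rules can be illustrated with a few raw strings compared against their escaped spellings. This is a sketch only; the variable names and the helper `raw_examples_hold` are hypothetical, chosen for this example.

```cpp
#include <cstring>

// The content contains )" which would end a raw string written with an
// empty delimiter, so a non-empty delimiter ("delim") is used.
const char* const quoted = R"delim(say )" here)delim";

// A delimiter of exactly 16 characters, the longest the rule permits.
const char* const longest = R"0123456789abcdef(x)0123456789abcdef";

// Backslashes and quotes inside a raw string are ordinary characters.
const char* const assignment = R"(x = "\"y\"")";

// Returns true when every raw string matches its escaped equivalent.
bool raw_examples_hold() {
    return std::strcmp(quoted, "say )\" here") == 0
        && std::strcmp(longest, "x") == 0
        && std::strcmp(assignment, "x = \"\\\"y\\\"\"") == 0;
}
```

A 17-character delimiter would violate the limit, so a conforming compiler is expected to reject it.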
Source: https://timsong-cpp.github.io/cppwp/n4868/lex.string
R"(x = "\"y\"")"
"x = \"\\\"y\\\"\""
const char8_t
Note: A single c-char may produce more than one char16_t character in the form of surrogate pairs. A surrogate pair is a representation for a single code point as a sequence of two 16-bit code units.
A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it is initialized with the given characters.
wchar_t
Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a string-literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. Table 11 has some examples of valid concatenations.
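A short sketch of how the prefix and rawness behave under concatenation. The variable names `joined`, `mixed`, and `wide_joined` are hypothetical and serve only this example.

```cpp
#include <string>

// An unprefixed literal next to a u-prefixed one is treated as if it had
// the same prefix, so the result is a char16_t string.
const std::u16string joined = u"a" "b";

// Rawness does not carry over: the backslash in the raw part stays a
// backslash, while the escape in the ordinary part is still processed.
const std::u16string mixed = R"(a\n)" u"\tb";

// The same prefix rule applies to wide string literals.
const std::wstring wide_joined = L"a" "b";
```

Here `mixed` holds five code units: 'a', a backslash, 'n', a horizontal tab, and 'b'.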
char8_t
Note: The size of a char16_t string literal is the number of code units, not the number of characters.
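The difference between code units and characters shows up with any code point outside the Basic Multilingual Plane. The sketch below uses U+1F600 as an arbitrary example and assumes char16_t literals are encoded as UTF-16, which C++20 requires and which earlier implementations commonly provide. The names are hypothetical.

```cpp
#include <cstddef>

// U+1F600 needs a surrogate pair (two code units) in UTF-16 but only a
// single code unit in UTF-32.
constexpr std::size_t utf16_units = sizeof(u"\U0001F600") / sizeof(char16_t);
constexpr std::size_t utf32_units = sizeof(U"\U0001F600") / sizeof(char32_t);

static_assert(utf16_units == 3, "two surrogates plus the terminating u'\\0'");
static_assert(utf32_units == 2, "one code unit plus the terminating U'\\0'");

// The two halves of the surrogate pair.
char16_t high_surrogate() { return u"\U0001F600"[0]; }
char16_t low_surrogate() { return u"\U0001F600"[1]; }
```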
Note: Any universal-character-names are required to correspond to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal) ([lex.charset]).
The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.
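The size rule for narrow string literals can be spot-checked with `sizeof`. This is a sketch; `visible_length` and `ucn_size` are hypothetical helper names, and the last check assumes the implementation can represent U+00E9 in its narrow encoding.

```cpp
#include <cstddef>
#include <cstring>

static_assert(sizeof("") == 1, "only the terminating '\\0'");
static_assert(sizeof("abc") == 4, "three characters plus '\\0'");
static_assert(sizeof("a\tb") == 4, "an escape sequence counts once");
static_assert(sizeof("a\0b") == 4, "an embedded '\\0' still counts");

// strlen stops at the first '\0', so it can report less than the size.
std::size_t visible_length() { return std::strlen("a\0b"); }

// A universal-character-name contributes at least one element; the exact
// count depends on the narrow encoding in use.
std::size_t ucn_size() { return sizeof("\u00E9"); }
```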
Note: The effect of attempting to modify a string-literal is undefined.
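Because modifying a string literal is undefined, code that needs a mutable string typically copies the literal first. A minimal sketch, with the hypothetical function name `capitalized`:

```cpp
#include <string>

// Initializing a char array from a literal copies the characters, so the
// array can be changed without touching the literal itself.
std::string capitalized() {
    char buffer[] = "hello";
    buffer[0] = 'H';
    return std::string(buffer);
}
```

Writing through a pointer obtained from the literal instead (for example after a `const_cast`) would fall under the undefined behavior described above.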
Source: https://timsong-cpp.github.io/cppwp/n4950/lex.string
basic-s-char:
"ordinary string"
R"(ordinary raw string)"
L"wide string"
LR"w(wide raw string)w"
u8"UTF-8 string"
u8R"x(UTF-8 raw string)x"
u"UTF-16 string"
uR"y(UTF-16 raw string)y"
U"UTF-32 string"
UR"z(UTF-32 raw string)z"
Note: A string-literal's rawness has no effect on the determination of the common encoding-prefix.
R"(\u00)" "41"
1
'A'
Note: The effect of attempting to modify a string literal object is undefined.
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding. If a character lacks representation in the associated character encoding, then the string-literal is conditionally-supported and an implementation-defined code unit sequence is encoded.
Note: No character lacks representation in any Unicode encoding form.
When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence.
Note: The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently.
Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:
Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
If v does not exceed the range of representable values of the string-literal's array element type, then the value is v.
Otherwise, if the string-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the string-literal's array element type, then the value is the unique value of the string-literal's array element type T that is congruent to v modulo 2^N, where N is the width of T.
T
Otherwise, the string-literal is ill-formed.
When encoding a stateful character encoding, these sequences should have no effect on encoding state.
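The numeric-escape rules above can be sketched as follows, assuming an 8-bit char. The function names are hypothetical. Implementations of earlier standard revisions may describe the out-of-range narrow case differently, so this is an illustration rather than a portability guarantee.

```cpp
// Hexadecimal and octal escapes each contribute exactly one code unit.
static_assert(sizeof("\xFF") == 2, "one code unit plus the terminating '\\0'");
static_assert(sizeof("\377") == 2, "octal 377 is the same value as hex FF");

// Reading through unsigned char sidesteps the question of whether plain
// char is signed: if char is signed, the stored value is the one congruent
// to 255 modulo 2^8, which converts back to 255.
unsigned int hex_escape_value() {
    return static_cast<unsigned char>("\xFF"[0]);
}

unsigned int octal_escape_value() {
    return static_cast<unsigned char>("\377"[0]);
}

// For char16_t the value 0xFFFF fits the element type directly.
unsigned int utf16_escape_value() { return u"\xFFFF"[0]; }
```

By the last rule in the list, an escape such as `\x100` in an unprefixed literal with 8-bit char has no representable value and makes the literal ill-formed.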
Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence. When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.