Commit fd9ffcc

Correctly fold unknown-8bit originating from encoded words.
The unknown-8bit trick was designed to deal with unknown bytes in an ASCII message, and it works fine for that. However, I also tried to extend it to handle bytes that can't be decoded using the charset specified in an encoded word, and there it fails because there can be other non-ASCII characters that were *successfully* decoded. The fix is simple: do the unknown-8bit encoding using the utf-8 codec. This is especially appropriate since anyone trying to do recovery on an unknown byte string will probably attempt utf-8 first.
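A minimal illustration of the failure mode described above, using the same bytes as the new test case. The helper `encode_unknown_8bit` is hypothetical and only mimics the one line the patch changes. It is not part of the email package.

```python
# Bytes from the new test's encoded word: five complete UTF-8 characters
# followed by one truncated lead byte (0xE7) that cannot be decoded.
raw = bytes.fromhex("e5aea2e688b6e6ada3e8a68fe4baa4e7")

# surrogateescape keeps the undecodable byte as the lone surrogate U+DCE7,
# while the valid characters decode normally.
text = raw.decode("utf-8", "surrogateescape")


def encode_unknown_8bit(s, codec):
    """Hypothetical helper: turn a surrogate-escaped str back into bytes."""
    return s.encode(codec, "surrogateescape")


# Old approach: works for ASCII text carrying stray 8-bit bytes.
ascii_case = b"caf\xe9".decode("ascii", "surrogateescape")
ascii_roundtrip = encode_unknown_8bit(ascii_case, "ascii")

# ...but raises once a successfully decoded non-ASCII character is present,
# because surrogateescape only maps U+DC80..U+DCFF back to bytes.
try:
    encode_unknown_8bit(text, "ascii")
    old_failed = False
except UnicodeEncodeError:
    old_failed = True

# New approach: UTF-8 re-encodes the decoded characters and the error
# handler restores the escaped byte, giving back the original bytes.
new_bytes = encode_unknown_8bit(text, "utf-8")

print(old_failed, new_bytes == raw)  # True True
```

Note that the UTF-8 variant still round-trips the pure-ASCII case, so the change does not regress the situation the trick was originally designed for.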
1 parent dcac498 commit fd9ffcc

2 files changed (+8 -1 lines)

Lib/email/_encoded_words.py

Lines changed: 1 addition & 1 deletion
@@ -219,7 +219,7 @@ def encode(string, charset='utf-8', encoding=None, lang=''):

     """
     if charset == 'unknown-8bit':
-        bstring = string.encode('ascii', 'surrogateescape')
+        bstring = string.encode('utf-8', 'surrogateescape')
     else:
         bstring = string.encode(charset)
     if encoding is None:
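A simplified sketch of what the patched branch produces when the result is wrapped in a base64 ("b") encoded word. `encode_word_b` is a hypothetical stand-in, not the real `encode()`: it always uses the "b" CTE and ignores the `encoding` and `lang` parameters.

```python
import base64


def encode_word_b(string, charset="utf-8"):
    """Hypothetical simplification of the patched encode(): always 'b' CTE."""
    if charset == "unknown-8bit":
        # Patched behaviour: UTF-8 for decoded characters, surrogateescape
        # to restore the original undecodable bytes.
        bstring = string.encode("utf-8", "surrogateescape")
    else:
        bstring = string.encode(charset)
    payload = base64.b64encode(bstring).decode("ascii")
    return f"=?{charset}?b?{payload}?="


text = bytes.fromhex("e5aea2e688b6e6ada3e8a68fe4baa4e7").decode(
    "utf-8", "surrogateescape"
)
print(encode_word_b(text, "unknown-8bit"))
# prints =?unknown-8bit?b?5a6i5oi25q2j6KaP5Lqk5w==?=
```

The output matches the folded value the new test expects (minus the leading space and trailing newline the folder adds).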

Lib/test/test_email/test__header_value_parser.py

Lines changed: 7 additions & 0 deletions
@@ -3255,5 +3255,12 @@ def test_long_filename_attachment(self):
             " filename*1*=_TEST_TES.txt\n",
             )
 
+    def test_encoded_word_with_undecodable_bytes(self):
+        self._test(parser.get_address_list(
+            ' =?utf-8?Q?=E5=AE=A2=E6=88=B6=E6=AD=A3=E8=A6=8F=E4=BA=A4=E7?='
+            )[0],
+            ' =?unknown-8bit?b?5a6i5oi25q2j6KaP5Lqk5w==?=\n',
+            )
+
 if __name__ == '__main__':
     unittest.main()
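A quick standalone check that the test's expected output is lossless: the base64 payload of the folded `unknown-8bit` word carries exactly the bytes of the original Q-encoded input. The payload strings are copied from the test. Using `binascii.a2b_qp(..., header=True)` as a Q decoder is an approximation that suffices here because the payload contains only `=XX` escapes.

```python
import base64
import binascii

# Payloads copied from test_encoded_word_with_undecodable_bytes.
q_payload = "=E5=AE=A2=E6=88=B6=E6=AD=A3=E8=A6=8F=E4=BA=A4=E7"
b_payload = "5a6i5oi25q2j6KaP5Lqk5w=="

# RFC 2047 'Q' encoding is close to quoted-printable; header=True makes
# a2b_qp treat '_' as a space, as Q encoding requires.
input_bytes = binascii.a2b_qp(q_payload, header=True)
output_bytes = base64.b64decode(b_payload)

print(input_bytes == output_bytes)  # True
```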