Skip to content

Conversation

@bitdancer
Copy link
Member

The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that. However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were successfully decoded. The fix is
simple: do the unknown-8bit encoding using the utf-8 codec. This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.

@bitdancer bitdancer requested a review from a team as a code owner December 10, 2025 14:50
@bitdancer bitdancer self-assigned this Dec 10, 2025
The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that.  However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were *successfully* decoded.  The fix is
simple: do the unknown-8bit encoding using the utf-8 codec.  This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
@bitdancer bitdancer force-pushed the undecodable_encoded_words branch from 4ae90b4 to 1bba134 Compare December 10, 2025 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant