Don't abort scan on non-UTF-8 module dictionary keys (#273)#280
Open
arpitjain099 wants to merge 1 commit into
Open
Don't abort scan on non-UTF-8 module dictionary keys (#273)#280arpitjain099 wants to merge 1 commit into
arpitjain099 wants to merge 1 commit into
Conversation
convert_dictionary_to_python used PyDict_SetItemString, which decodes the key as strict UTF-8. YARA module dictionary keys are SIZED_STRING values holding arbitrary bytes (for instance pe.version_info keys read straight from the binary), so a non-UTF-8 key raised UnicodeDecodeError. That exception propagated out of the modules_callback as a SystemError and aborted the whole scan. Build the key explicitly with a tolerant decoder (PyUnicode_DecodeUTF8 with the 'replace' handler) and use the key length so embedded NULs are handled too. Keys stay str, so this is not a breaking change: valid keys are unchanged and invalid bytes become U+FFFD instead of aborting. Fixes VirusTotal#273 Signed-off-by: Arpit Jain <arpitjain099@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A scan aborts when a module dictionary has a key that is not valid UTF-8 (issue #273). This shows up with the pe module, whose version_info keys come straight from the binary and can be arbitrary bytes.
convert_dictionary_to_pythonbuilds each key withPyDict_SetItemString, which decodes strictly as UTF-8, so a bad key raisesUnicodeDecodeError. That propagates out of the modules_callback as aSystemErrorand kills the whole scan.This builds the key with
PyUnicode_DecodeUTF8(..., "replace")and the key length instead, so invalid bytes become U+FFFD rather than raising, embedded NULs are handled, and keys staystr(no breaking change). Valid keys are unchanged.I verified locally by adding a non-UTF-8 key to the bundled tests module: unpatched it reproduces the SystemError from the issue, and with the fix the scan completes with
{"foo": b"foo", "bar": b"bar", "k�": b"..."}. The full suite still passes (47 tests). Happy to add a fixture-based regression test if you would prefer one, though the natural place for it is the yara submodule test module.