Skip to content

Don't abort scan on non-UTF-8 module dictionary keys (#273)#280

Open
arpitjain099 wants to merge 1 commit into
VirusTotal:masterfrom
arpitjain099:chore/scan-nonutf8-module-dict
Open

Don't abort scan on non-UTF-8 module dictionary keys (#273)#280
arpitjain099 wants to merge 1 commit into
VirusTotal:masterfrom
arpitjain099:chore/scan-nonutf8-module-dict

Conversation

@arpitjain099

Copy link
Copy Markdown

A scan aborts when a module dictionary has a key that is not valid UTF-8 (issue #273). This shows up with the pe module, whose version_info keys come straight from the binary and can be arbitrary bytes. convert_dictionary_to_python builds each key with PyDict_SetItemString, which decodes strictly as UTF-8, so a bad key raises UnicodeDecodeError. That propagates out of the modules_callback as a SystemError and kills the whole scan.

This builds the key with PyUnicode_DecodeUTF8(..., "replace") and the key length instead, so invalid bytes become U+FFFD rather than raising, embedded NULs are handled, and keys stay str (no breaking change). Valid keys are unchanged.

I verified locally by adding a non-UTF-8 key to the bundled tests module: unpatched it reproduces the SystemError from the issue, and with the fix the scan completes with {"foo": b"foo", "bar": b"bar", "k�": b"..."}. The full suite still passes (47 tests). Happy to add a fixture-based regression test if you would prefer one, though the natural place for it is the yara submodule test module.

convert_dictionary_to_python used PyDict_SetItemString, which decodes the
key as strict UTF-8. YARA module dictionary keys are SIZED_STRING values
holding arbitrary bytes (for instance pe.version_info keys read straight
from the binary), so a non-UTF-8 key raised UnicodeDecodeError. That
exception propagated out of the modules_callback as a SystemError and
aborted the whole scan.

Build the key explicitly with a tolerant decoder (PyUnicode_DecodeUTF8
with the 'replace' handler) and use the key length so embedded NULs are
handled too. Keys stay str, so this is not a breaking change: valid keys
are unchanged and invalid bytes become U+FFFD instead of aborting.

Fixes VirusTotal#273

Signed-off-by: Arpit Jain <arpitjain099@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant