Return split db-discovery reports as zip bytes #18

Open
jiatolentino wants to merge 1 commit into datamasque:main from jiatolentino:db-report-split-zip

@jiatolentino jiatolentino commented Jun 17, 2026

Contributor

Summary

get_db_discovery_result_report now returns the raw bytes when the Admin Server splits a large discovery report into a zip, instead of decoding the binary as text (which corrupted the archive).

Background

The Admin Server now auto-splits discovery reports above ~1M rows into a zip of numbered CSV parts (see datamasque DM-3165). The endpoint returns application/zip with an X-DM-Download-Format: zip header in that case.

What changed

  • get_db_discovery_result_report(...) return type is now Union[str, bytes]:
    • normal CSV report → str (unchanged),
    • split report → bytes (the zip), detected via the X-DM-Download-Format: zip response header.
  • Both forms can be passed straight to start_async_ruleset_generation_from_csv, which already accepts a CSV or a zip of CSV parts.
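The header-based branching described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `report_from_response` is a hypothetical helper, the UTF-8 decoding of the plain CSV case is an assumption, and only the `X-DM-Download-Format: zip` header comes from the description.

```python
from typing import Mapping, Union

ZIP_FORMAT_HEADER = "X-DM-Download-Format"  # header name taken from the PR description


def report_from_response(headers: Mapping[str, str], body: bytes) -> Union[str, bytes]:
    """Return zip bytes untouched for a split report, else decoded CSV text.

    Hypothetical helper: the real client method may be structured differently.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get(ZIP_FORMAT_HEADER.lower(), "").strip().lower() == "zip":
        return body  # binary archive: decoding it as text would corrupt it
    # Assumption: plain CSV reports are UTF-8 encoded.
    return body.decode("utf-8")
```

Keeping the check on the header rather than on the body means a server that never sends the header always takes the `str` path.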

Testing

  • pytest — added a test asserting zip bytes are returned (not decoded) when the header is present.
  • ruff, ruff format, mypy clean.

Compatibility

  • Backward compatible with older servers: without the X-DM-Download-Format: zip header, it returns str, exactly as before.
  • Requires the Admin Server DM-3165 change to actually emit a zip.
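Callers that read the report themselves, rather than passing it on, now need to handle both return types. One possible way to normalise either form into a list of CSV texts is sketched below. `report_csv_parts` is a hypothetical name, and the UTF-8 encoding and name-sorted ordering of the zip members are assumptions, since the part naming scheme is not given here.

```python
import io
import zipfile
from typing import List, Union


def report_csv_parts(report: Union[str, bytes]) -> List[str]:
    """Return the report as a list of CSV texts, one per part (hypothetical helper)."""
    if isinstance(report, str):
        # Unsplit report: a single CSV document.
        return [report]
    with zipfile.ZipFile(io.BytesIO(report)) as archive:
        # Assumption: members are UTF-8 CSV files. Sorting by name gives a
        # stable order, but matches numeric order only if names are zero-padded.
        names = sorted(archive.namelist())
        return [archive.read(name).decode("utf-8") for name in names]
```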

@jiatolentino jiatolentino self-assigned this Jun 17, 2026
@jiatolentino jiatolentino requested a review from alxboyle June 18, 2026 02:13
@jiatolentino jiatolentino marked this pull request as ready for review June 18, 2026 02:13
@jiatolentino jiatolentino requested review from carlosfunk and removed request for alxboyle June 22, 2026 22:01

@carlosfunk carlosfunk left a comment

Member

start_async_ruleset_generation_from_csv hard-codes the filename to ruleset.csv despite the contents. Suggest a small update:

elif isinstance(csv_content, bytes):
    content = BytesIO(csv_content)
    if csv_content[:4] == b"PK\x03\x04":
        filename = "ruleset.zip"

@jiatolentino jiatolentino force-pushed the db-report-split-zip branch from c03845e to 6968215 Compare June 22, 2026 23:42
@jiatolentino

Contributor Author

@carlosfunk

start_async_ruleset_generation_from_csv hard-codes the filename to ruleset.csv despite the contents. Suggest a small update:

elif isinstance(csv_content, bytes):
    content = BytesIO(csv_content)
    if csv_content[:4] == b"PK\x03\x04":
        filename = "ruleset.zip"

Fixed. It now detects the zip magic (PK\x03\x04) on bytes input and uploads as ruleset.zip with content_type="application/zip"; everything else is still uploaded as ruleset.csv with content_type="text/csv". Added a test covering the zip case.
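For reference, the selection logic described in this reply could look roughly like the sketch below. It is an illustration only: `upload_parts` is a hypothetical helper, not the function in this branch, and the UTF-8 encoding of `str` input is an assumption. The filenames, content types, and magic bytes are the ones named above.

```python
from io import BytesIO
from typing import Tuple, Union

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a non-empty zip


def upload_parts(csv_content: Union[str, bytes]) -> Tuple[str, BytesIO, str]:
    """Return (filename, file object, content type) for the upload (hypothetical helper)."""
    if isinstance(csv_content, str):
        # Assumption: text input is encoded as UTF-8 before upload.
        return "ruleset.csv", BytesIO(csv_content.encode("utf-8")), "text/csv"
    if csv_content[:4] == ZIP_MAGIC:
        # Bytes that start with the zip signature are uploaded as an archive.
        return "ruleset.zip", BytesIO(csv_content), "application/zip"
    # Any other bytes are treated as a plain CSV, as before.
    return "ruleset.csv", BytesIO(csv_content), "text/csv"
```

An empty archive starts with a different signature (PK\x05\x06) and would fall through to the CSV branch, which seems acceptable here because a split report always contains at least one part.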

@jiatolentino jiatolentino requested a review from carlosfunk June 22, 2026 23:46