Skip to content

fix: exclude wiped dataclips from work order body search#4889

Open
mvanhorn wants to merge 2 commits into
OpenFn:mainfrom
mvanhorn:fix/4824-clear-search-vector-on-wipe
Open

fix: exclude wiped dataclips from work order body search#4889
mvanhorn wants to merge 2 commits into
OpenFn:mainfrom
mvanhorn:fix/4824-clear-search-vector-on-wipe

Conversation

@mvanhorn

Copy link
Copy Markdown

Description

This PR fixes a data-retention hole where wiped dataclips remained searchable by their erased content.

When a dataclip is wiped (per-run wipe with save_dataclips: false, or the project data-retention job), its body/request are cleared to NULL and wiped_at is stamped, but its full-text search_vector column is left intact (the indexing trigger/worker only runs on insert, never on the wipe UPDATE). The work order body search matches that stale vector with no wiped_at guard, so a wiped dataclip stays searchable by the exact content that was meant to be erased.

The fix adds an is_nil(input_dataclip.wiped_at) guard to the :body branch of Invocation.build_search_fields_where/2, ANDed onto the full-text match. This targets the :input_dataclip binding that body search already uses, so erased dataclip content is no longer discoverable regardless of any stale search_vector, and other search fields (id / log / dataclip_name / status) are unaffected.

This mirrors the read-side guard already present in search_workorders_for_retry/2 via exclude_wiped_dataclips/1, but applied at the precise binding the body full-text predicate matches rather than the work order's own dataclip.

Closes #4824

Validation steps

  1. Create a work order whose input dataclip body contains a distinctive token, and confirm body search (search_fields: ["body"]) returns it.
  2. Wipe the dataclip (Lightning.Runs.wipe_dataclips/1, or the project data-retention job).
  3. Search again by the same token with the body filter and confirm it no longer appears, even though the dataclip's search_vector is still populated.
  4. Confirm a separate, non-wiped dataclip carrying the same token still matches (no over-filtering).

The two new tests in test/lightning/invocation_test.exs (describe "search_workorders/3") cover the regression and the non-wiped match case, including a positive control proving the body vector is populated before the negative assertion.

Additional notes for the reviewer

  1. The guard is read-side only and additive to the body predicate; it does not change the wipe path, the search_vector workers, or any migration.
  2. An earlier approach that also nulled search_vector in Query.wipe_dataclips/1 was dropped: search_vector is not a declared schema field (it is managed by raw-SQL workers), so a schema update_all referencing it would not be valid.

AI Usage

Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):

  • I have used Claude Code
  • I have used another model
  • I have not used AI

You can read more details in our
Responsible AI Policy

Pre-submission checklist

  • I have performed an AI review of my code (we recommend using /review
    with Claude Code)
  • I have implemented and tested all related authorization policies.
    (e.g., :owner, :admin, :editor, :viewer) — n/a, this is a
    read-side search filter with no authorization surface.
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

When a dataclip is wiped (per-run wipe or the project data-retention job),
its body/request are cleared and wiped_at is stamped, but its full-text
search_vector is left intact, so wiped dataclips stayed searchable by the
exact erased content via work order body search.

Guard the body full-text match on is_nil(input_dataclip.wiped_at) so erased
dataclip content is no longer discoverable, regardless of any stale
search_vector. The guard targets the input_dataclip binding the body search
already uses, so it does not affect other search fields.

Fixes OpenFn#4824
The raw Repo.query! binding passed the dataclip id as a 36-char string to a
$1::uuid parameter, which Postgrex rejects (expects a 16-byte binary). Wrap it
in Ecto.UUID.dump!/1, matching the existing pattern in runs_test.exs.
@mvanhorn

Copy link
Copy Markdown
Author

Fixed the test_elixir failure: my new test passed the dataclip id as a string to a $1::uuid raw query parameter, which Postgrex rejects (it wants a 16-byte binary). Wrapped it in Ecto.UUID.dump!/1, matching the existing pattern in runs_test.exs. Verified locally against Elixir 1.18.3-otp-27 + Postgres - the test now passes.

Heads-up on the lint job: it's failing on deps.audit and hex.audit (format, credo, and sobelow all pass). Those are dependency-level audit findings unrelated to this PR, which only touches invocation.ex and its test, so they're likely pre-existing on main. Let me know if you'd like me to look at the flagged dependencies separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: New Issues

Development

Successfully merging this pull request may close these issues.

Wiped dataclips remain searchable — search_vector not cleared on wipe

1 participant