Skip to content

Fix: in case of filesystem corruption, prevent lookup soft lockup#91

Open
EricKim27 wants to merge 4 commits into
sysprog21:masterfrom
EricKim27:master
Open

Fix: in case of filesystem corruption, prevent lookup soft lockup#91
EricKim27 wants to merge 4 commits into
sysprog21:masterfrom
EricKim27:master

Conversation

@EricKim27

@EricKim27 EricKim27 commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

This is a new fix that continues from pull request #88

In this code, it ensures in case of nr_blk == 0, the system does not soft lockup, and return -EUCLEAN, because nr_blk == 0 means the filesystem is corrupted.


Summary by cubic

Prevent soft lockup during directory lookups by failing fast on corrupted entries. If nr_blk == 0, return -EUCLEAN; includes minor style cleanups like removing unnecessary brackets.

  • Bug Fixes
    • Add a guard in __file_lookup that checks !dblock->files[_fi].nr_blk and returns -EUCLEAN, annotated with unlikely() to optimize the edge-case path.

Written for commit 27bc38e. Summary will update on new commits.

Review in cubic

@RoyWFHuang

RoyWFHuang commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

As I stated in PR88,

I believe we don't need this check. From an algorithmic perspective, dblock->files[_fi].nr_blk = 0 shouldn't appear in this function. If it does, it indicates we need to address other issues.

The nr_blk field is designed to speed up reads and writes . If a check is required for every read and write operation, it defeats the original design purpose.

Therefore, if you encounter an issue with dblock->files[_fi].nr_blk = 0, you can ask this question and describe how you triggered it.

@EricKim27

EricKim27 commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

@RoyWFHuang
I agree that nr_blk == 0 should never occur and your fix would prevent any of this from happening.

But this error I encountered caused a soft lockup due to __file_lookup looping indefinitely, and I think it is better to return -EUCLEAN rather than allowing it to loop indefinitely.

It is not that I think nr_blk being 0 is what happens on normal operation, but if a corrupted metadata reaches this path, whether it be a bug in the future or a corrupted image, the current __file_lookup may loop forever, and my proposed check makes it a filesystem error that is visible on userspace.

I do understand the concern about preserving invariants and keeping lookup without any unnecessary checks for impossible on a regular filesystem(which is why I added unlikely() to the context). I just wanted to explain why I considered the guarding mechanism to be necessary even after the root cause was remedied.

@EricKim27

Copy link
Copy Markdown
Contributor Author

Following the discussion, I documented the investigation that led to identifying the root cause found on pull request #90

Posting it here as a supplementary documentation in case it is useful for future debugging, maintenance, or anyone curious.

link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants