Handle concurrent binary downloads using file locks #656
base: main
/**
 * Unified handler for any binary-related failure.
 * Checks for existing or old binaries and prompts user once.
 */
private async handleAnyBinaryFailure(
Maybe this is too much defensive programming: essentially we check whether a binary exists and try to just execute it; otherwise, we search for an .old-* binary and try to use that.
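The lookup order described above can be sketched as a pure function over a directory listing. This is a minimal sketch, assuming old binaries are named binName + ".old-" + suffix (as in the coder-linux.old-123 example later in this thread) and that newer copies should be tried first; DirEntry and fallbackCandidates are hypothetical names, not the extension's code.

```typescript
// Hypothetical view of one directory entry: file name plus modification time.
interface DirEntry {
  name: string;
  mtimeMs: number;
}

/**
 * Order the binaries worth trying after a failed download:
 * the current binary first (if present), then ".old-*" copies, newest first.
 */
function fallbackCandidates(binName: string, entries: DirEntry[]): string[] {
  const current = entries.filter((e) => e.name === binName).map((e) => e.name);
  const old = entries
    .filter((e) => e.name.startsWith(`${binName}.old-`))
    .sort((a, b) => b.mtimeMs - a.mtimeMs) // most recently modified first
    .map((e) => e.name);
  return [...current, ...old];
}
```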
bytesDownloaded: 1500,
totalBytes: 10000,
status: "downloading",
timestamp: Date.now(),
export interface DownloadProgress {
bytesDownloaded: number;
totalBytes: number | null;
status: "downloading" | "verifying";
}
Shouldn't we add timestamp here?
Oh, I forgot to update the tests. In the first iteration I had a timestamp in there, but it is no longer used since we now rely on lock-file staleness (which is handled by proper-lockfile).
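To illustrate why a separate timestamp field becomes redundant: proper-lockfile considers a lock stale when the lock's mtime has not been refreshed within its stale window (10 seconds by default), and a live holder keeps refreshing that mtime. The check below is a simplified sketch of that idea, not the library's implementation; isLockStale and DEFAULT_STALE_MS are hypothetical names.

```typescript
// Default staleness window of proper-lockfile's `stale` option, in milliseconds.
const DEFAULT_STALE_MS = 10_000;

/**
 * A lock counts as abandoned once its holder has not refreshed the lock's
 * mtime for longer than `staleMs`. A crashed holder stops refreshing, so its
 * lock eventually turns stale and another process may take it over.
 */
function isLockStale(
  lockMtimeMs: number,
  nowMs: number,
  staleMs: number = DEFAULT_STALE_MS,
): boolean {
  return nowMs - lockMtimeMs > staleMs;
}
```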
src/core/cliUtils.ts (outdated)
const stats = await Promise.all(
  oldBinaries.map(async (f) => ({
    path: f,
    mtime: (await fs.stat(f)).mtime,
  })),
);
Would we want to use Promise.allSettled() here instead so we don't accidentally lose the entire dataset if fs.stat(f) were to fail?
Oh, good idea; that way we can at least attempt to run one of them. I've replaced this with:
const stats = await Promise.allSettled(
oldBinaries.map(async (f) => ({
path: f,
mtime: (await fs.stat(f)).mtime,
})),
).then((result) =>
result
.filter((promise) => promise.status === "fulfilled")
.map((promise) => promise.value),
);
We could potentially log here for files that could not be read, similar to rmOld 🤔
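Following up on the logging idea, here is a minimal sketch that keeps the fulfilled results and reports each failed stat instead of silently dropping it. statAll, the warn callback, and the injectable statFn are hypothetical names for illustration; the real code would presumably log through the extension's output channel.

```typescript
import { stat } from "node:fs/promises";

// Data collected for each candidate ".old-*" binary.
interface BinaryStat {
  path: string;
  mtime: Date;
}

/**
 * Stat every candidate, keeping the ones that succeed. Promise.all would reject
 * on the first failed stat and lose every result; allSettled keeps the rest,
 * and each failure is reported through `warn` (hypothetical logger callback).
 */
async function statAll(
  files: string[],
  warn: (message: string) => void = console.warn,
  statFn: (file: string) => Promise<{ mtime: Date }> = (file) => stat(file),
): Promise<BinaryStat[]> {
  const results = await Promise.allSettled(
    files.map(async (f) => ({ path: f, mtime: (await statFn(f)).mtime })),
  );
  const stats: BinaryStat[] = [];
  // allSettled preserves input order, so results[i] belongs to files[i].
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      stats.push(result.value);
    } else {
      warn(`Could not stat ${files[i]}: ${String(result.reason)}`);
    }
  });
  return stats;
}
```

Using the index from forEach avoids wrapping each stat call in its own try/catch just to remember which file failed.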
mafredri left a comment
I'm not against the file locking implementation used here, but asking out of curiosity: would IPC communication between VS Code windows have been an option here, like with the login prompt?
Thanks for working on this ❤️, concurrent downloads have been a pain point for me!
src/core/binaryLock.ts (outdated)
const release = await this.safeAcquireLock(binPath);
if (release) {
  clearInterval(interval);
  this.output.debug("Download completed by another process");
If we acquire the lock, could it also mean that the other process failed to download? Do we need to handle that case separately?
Actually, you are right: if this was a takeover, it could mean that the other process was stuck. The logic that follows is still correct, though, since we essentially re-check whether we have the right binary and download it if needed.
We already log "Acquired download lock" right after this, so I'll just remove this log line.
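The point above (acquiring the lock only proves the previous holder is gone, not that its download succeeded) can be sketched as: wait for the lock, then re-verify before deciding whether to download. This is a minimal sketch; tryAcquire, isBinaryCurrent, download, waitForLock, and ensureBinary are hypothetical stand-ins for the real lock and version checks.

```typescript
// Release callback, like the one a lock library returns after acquiring a lock.
type Release = () => Promise<void>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Poll until `tryAcquire` yields a release function. */
async function waitForLock(
  tryAcquire: () => Promise<Release | undefined>,
  intervalMs: number,
): Promise<Release> {
  for (;;) {
    const release = await tryAcquire();
    if (release) return release;
    await sleep(intervalMs);
  }
}

/**
 * Getting the lock means the other process finished, failed, or was taken
 * over as stale, so the binary is re-checked instead of assumed valid.
 */
async function ensureBinary(
  tryAcquire: () => Promise<Release | undefined>,
  isBinaryCurrent: () => Promise<boolean>,
  download: () => Promise<void>,
  intervalMs = 500,
): Promise<void> {
  const release = await waitForLock(tryAcquire, intervalMs);
  try {
    if (!(await isBinaryCurrent())) {
      await download();
    }
  } finally {
    await release();
  }
}
```

Awaiting each attempt before sleeping, rather than using setInterval with an async callback, keeps two acquisition attempts from overlapping when one of them is slow.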
);
if (existingCheck.version) {
  // Perfect match - use without prompting
  if (existingCheck.matches) {
If this is true, why did we ever try to download and fail?
This could just be a lot of defensive programming; see https://github.com/coder/vscode-coder/pull/656/files#r2538524344
We could have encountered an error at any stage here while another process downloaded the binary.
case 304: {
  this.output.info("Using existing binary since server returned a 304");
  // Version mismatch - prompt user
  if (await this.promptUseExistingBinary(existingCheck.version, message)) {
Unless I misread, this prompt says Run/Exit. If I select Exit, I would not expect an old binary to be used. Consider changing the terminology?
Hmm, yeah: if you click "Cancel", you might get another prompt with an even older version. We should perhaps show this prompt only once, for the first match (whether it's binPath or an old binary).
I added throw error; so that if binPath exists, we never even attempt to read old binaries.
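The revised flow (prompt at most once, and rethrow if the user declines binPath) could look roughly like this sketch. VersionCheck mirrors the version/matches fields used in the snippets above; useExistingOrThrow and promptUseExisting are hypothetical names, not the PR's actual methods.

```typescript
/** Result of inspecting a binary; version is null when it is missing or unreadable. */
interface VersionCheck {
  version: string | null;
  matches: boolean;
}

/**
 * Decide whether to fall back to the binary at binPath after a failure.
 * Returns true to use binPath, false when nothing is there (so the caller may
 * look at .old-* binaries), and rethrows `error` if the user declines.
 */
async function useExistingOrThrow(
  error: Error,
  existing: VersionCheck,
  promptUseExisting: (version: string) => Promise<boolean>,
): Promise<boolean> {
  if (!existing.version) return false; // nothing usable at binPath
  if (existing.matches) return true; // perfect match: no prompt needed
  if (await promptUseExisting(existing.version)) return true;
  // The user rejected the newest local binary, so older ones are not offered.
  throw error;
}
```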
  (oldCheck.matches ||
    (await this.promptUseExistingBinary(oldCheck.version, message)))
) {
  await fs.rename(oldBinaries[0], binPath);
Let's assume binPath already exists, but the existing check did not match, and the user selected Exit in the prompt. Then we fall back to an old binary and ask the user again. Now the rename may fail and we throw an error.
Why do we have to rename vs using the old binary path as-is? Too many hard-coded references to the non-old path?
The flow here is a bit sketchy, I agree. I've changed it so that if you reject binPath, we do not attempt to read old binaries, since to me that makes even less sense (if you don't want to run the most up-to-date binary, why would you run an older one?).
See https://github.com/coder/vscode-coder/pull/656/files#r2545981797
We rename because a lot of the logic depends on the proper name, for example removing old binaries without touching binPath.
);
if (
  oldCheck.version &&
  (oldCheck.matches ||
Why do we only try one old binary when we are tracking all of them?
I'm trying to think about when this condition might actually happen, i.e. we have an old binary and it's the right version. This means you either downgraded coderd or switched deployments. In either case, any of the old binaries may be the correct one.
We also clear old binaries when we download new ones, so it's unlikely that we have multiple (it could happen if we never remove them because of errors). Different deployments don't apply here since each deployment has its own folder, but yes, if you downgrade, then preferring the most recent one might not matter. Should we just take the first match, whatever it is?
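If every tracked old binary is considered rather than only oldBinaries[0], the selection might look like this sketch: walk the candidates newest first, return the first exact version match (e.g. after a downgrade), and otherwise fall back to the newest readable one so the user is prompted at most once. findOldBinary, OldBinaryChoice, and the injected check callback are hypothetical names.

```typescript
interface VersionCheck {
  version: string | null;
  matches: boolean;
}

interface OldBinaryChoice {
  path: string;
  version: string;
  exact: boolean; // true when the version matches the server exactly
}

/**
 * Scan old binaries (expected newest first). Prefer any exact version match;
 * otherwise return the newest binary whose version could be read.
 */
async function findOldBinary(
  oldBinaries: string[],
  check: (path: string) => Promise<VersionCheck>,
): Promise<OldBinaryChoice | undefined> {
  let newestReadable: OldBinaryChoice | undefined;
  for (const path of oldBinaries) {
    const result = await check(path);
    if (!result.version) continue; // unreadable or missing: skip it
    if (result.matches) return { path, version: result.version, exact: true };
    if (!newestReadable) {
      newestReadable = { path, version: result.version, exact: false };
    }
  }
  return newestReadable;
}
```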
  binPath + ".temp-" + Math.random().toString(36).substring(8);

try {
  const removed = await cliUtils.rmOld(binPath);
Why do we remove the old binaries before our download has completed successfully? Is this to ensure that we don't try to download in case updating the binary would fail (e.g. in use on Windows)?
We only remove old binaries; the proper binary (binPath) is kept. Old binaries should have been removed at the end of the previous download, but we remove them here because there are fewer conflicts (this was already the case before).
For example, this is what the folder might look like:
coder-linux
coder-linux.old-123
So we remove only the "old" binaries and keep the most recent one, for example in case the download fails.
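The folder example suggests a simple naming rule: only entries named binName + ".old-" + suffix are removal candidates, so binPath itself is never touched. A minimal sketch of that selection, assuming this naming scheme; oldBinariesToRemove is a hypothetical helper, not the actual rmOld.

```typescript
import * as path from "node:path";

/**
 * Pick the files an rmOld-style cleanup would delete: ".old-*" siblings of
 * the binary. The current binary (binPath) is deliberately excluded so that
 * a working copy survives if the next download fails.
 */
function oldBinariesToRemove(binPath: string, dirEntries: string[]): string[] {
  const binName = path.basename(binPath);
  const dir = path.dirname(binPath);
  return dirEntries
    .filter((name) => name.startsWith(`${binName}.old-`))
    .map((name) => path.join(dir, name));
}
```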
  this.output.info("Using existing binary since server returned a 304");
  return binPath;
}
case 404: {
Theoretically, this could also be caused by a poorly configured reverse proxy. I'm just raising a thought; I don't think we need to handle it here. There's no easy way to check this either (e.g. there is no https://dev.coder.com/bin/.exists).
That is true, but keep in mind that their Coder logs would contain the failed request at the error level, so they could potentially identify this quickly from the logs 🤔
@mafredri We could maybe use this for progress monitoring, but we definitely need a lock file, since that makes handling crashes, staleness, and stuck downloads much simpler. Libraries like …
6d67d87 to f6d76c8
mtojek left a comment
Thanks for addressing my comments! As long as other reviewers approve, you're good to go 👍
mafredri left a comment
Nothing to add, thanks for amending/answering my comments 👍🏻
Closes #575