(core) be careful when reassigning a doc to a worker it was on before

Summary:
In a multi-worker setup, importing a .grist document is implemented in a somewhat clunky way:

 * First a random worker receives the upload, and updates Grist's various stores appropriately (database, redis, s3).
 * Then a random worker is assigned to serve the document.

If the worker serving the document fails, there is a chance that the document will end up assigned to the worker that handled its upload. Currently the worker will misbehave in this case. This diff:

 * Ports a multi-worker test from test/home to run in test/s3, and adds a test simulating a bad scenario seen in the wild.
 * Fixes persistence of any existing document checksum in redis when a worker is assigned.
 * Adds a check for the case where a worker is assigned a document to serve and finds that document already cached locally. It isn't safe to rely only on the document checksum in redis, since that may have expired.
 * Explicitly claims the document on the uploading worker, so this situation becomes even less likely to arise.
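
The concern behind the third bullet can be illustrated with a small decision function. This is only a sketch of one conservative policy, not the code in this diff: `Assignment`, `chooseAction`, and the two action names are hypothetical, and the assumption that a worker falls back to the durable store whenever it cannot confirm its local copy is mine.

```typescript
// Hypothetical illustration only: none of these names come from the diff.
type Action = 'use-local-copy' | 'refetch-from-store';

interface Assignment {
  // Checksum of the copy cached on this worker, or null if nothing is cached.
  localChecksum: string | null;
  // Checksum recorded in the shared cache, or null if the entry has expired.
  sharedChecksum: string | null;
}

// Assumed policy: trust the local copy only when a shared checksum is
// present and agrees with it. A missing shared checksum proves nothing,
// because the entry may simply have expired.
function chooseAction(a: Assignment): Action {
  if (a.localChecksum === null) { return 'refetch-from-store'; }
  if (a.sharedChecksum === null) { return 'refetch-from-store'; }
  return a.localChecksum === a.sharedChecksum
    ? 'use-local-copy'
    : 'refetch-from-store';
}
```

The point of the sketch is the second branch: treating an absent checksum as "nothing to compare against, so the local copy is fine" is the kind of shortcut that can serve a stale document.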

Test Plan: added test

Reviewers: dsagal

Reviewed By: dsagal

Subscribers: dsagal

Differential Revision: https://phab.getgrist.com/D3305
This commit is contained in:
Paul Fitzpatrick
2022-03-07 09:27:43 -05:00
parent 321019217d
commit c4d3d7d3bb
9 changed files with 137 additions and 48 deletions

@@ -137,23 +137,27 @@ export class TestingHooks implements ITestingHooks {
   public async setDocWorkerActivation(workerId: string, active: 'active'|'inactive'|'crash'):
     Promise<void> {
     log.info("TestingHooks.setDocWorkerActivation called with", workerId, active);
-    for (const server of this._workerServers) {
-      if (server.worker.id === workerId || server.worker.publicUrl === workerId) {
-        switch (active) {
-          case 'active':
-            await server.restartListening();
-            break;
-          case 'inactive':
-            await server.stopListening();
-            break;
-          case 'crash':
-            await server.stopListening('crash');
-            break;
-        }
-        return;
-      }
+    const matches = this._workerServers.filter(
+      server => server.worker.id === workerId ||
+        server.worker.publicUrl === workerId ||
+        (server.worker.publicUrl.startsWith('http://localhost:') &&
+         workerId.startsWith('http://localhost:') &&
+         new URL(server.worker.publicUrl).host === new URL(workerId).host));
+    if (matches.length !== 1) {
+      throw new Error(`could not find worker: ${workerId}`);
+    }
+    const server = matches[0];
+    switch (active) {
+      case 'active':
+        await server.restartListening();
+        break;
+      case 'inactive':
+        await server.stopListening();
+        break;
+      case 'crash':
+        await server.stopListening('crash');
+        break;
     }
-    throw new Error(`could not find worker: ${workerId}`);
   }
 
   public async flushAuthorizerCache(): Promise<void> {
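
The worker-matching rule introduced in the hunk above can be tried in isolation. The following is a standalone approximation rather than the project's code: `WorkerInfo`, `matchesWorker`, and `findWorker` are hypothetical names, and the sketch assumes a runtime that provides the global `URL` class (Node.js or a browser).

```typescript
// Hypothetical stand-in for the worker description used in the diff.
interface WorkerInfo {
  id: string;
  publicUrl: string;
}

const LOCAL_PREFIX = 'http://localhost:';

// A worker matches on exact id, exact public URL, or, when both values are
// localhost URLs, on host (hostname plus port) alone.
function matchesWorker(worker: WorkerInfo, workerId: string): boolean {
  if (worker.id === workerId || worker.publicUrl === workerId) {
    return true;
  }
  return worker.publicUrl.startsWith(LOCAL_PREFIX) &&
    workerId.startsWith(LOCAL_PREFIX) &&
    new URL(worker.publicUrl).host === new URL(workerId).host;
}

// Require exactly one match, so that both a missing worker and an
// ambiguous identifier are reported as errors.
function findWorker(workers: WorkerInfo[], workerId: string): WorkerInfo {
  const matches = workers.filter(w => matchesWorker(w, workerId));
  if (matches.length !== 1) {
    throw new Error(`could not find worker: ${workerId}`);
  }
  return matches[0];
}
```

Comparing on `host` means two localhost URLs that differ only in path or trailing slash still identify the same worker, which is presumably what the test setup needs when workers are addressed by port.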
@@ -164,6 +168,13 @@ export class TestingHooks implements ITestingHooks {
     }
   }
 
+  public async flushDocs(): Promise<void> {
+    log.info("TestingHooks.flushDocs called");
+    for (const server of this._workerServers) {
+      await server.testFlushDocs();
+    }
+  }
+
   // Returns a Map from docId to number of connected clients for all open docs across servers,
   // but represented as an array of pairs, to be serializable.
   public async getDocClientCounts(): Promise<Array<[string, number]>> {