gristlabs/grist-core: app/gen-server/lib/DocApiForwarder.ts

import * as express from "express";
import fetch, { RequestInit } from 'node-fetch';
import {AbortController} from 'node-abort-controller';
import { ApiError } from 'app/common/ApiError';
import { SHARE_KEY_PREFIX } from 'app/common/gristUrls';
import { removeTrailingSlash } from 'app/common/gutil';
import { HomeDBManager } from "app/gen-server/lib/HomeDBManager";
import { assertAccess, getOrSetDocAuth, getTransitiveHeaders, RequestWithLogin } from 'app/server/lib/Authorizer';
import { IDocWorkerMap } from "app/server/lib/DocWorkerMap";
import { expressWrap } from "app/server/lib/expressWrap";
import { GristServer } from "app/server/lib/GristServer";
import { getAssignmentId } from "app/server/lib/idUtils";
import { addAbortHandler } from "app/server/lib/requestUtils";

/**
 * Forwards /api/docs/:docId requests (tables, downloads, attachments, etc.) to the doc worker
 * handling the :docId document, first making sure the user has at least view access to the
 * document; otherwise the request is rejected. For performance reasons we stream the body
 * directly from the request, which requires that nothing reads the request beforehand; in
 * particular, DocApiForwarder must be registered before bodyParser.
 *
 * Use:
 *   const docApiForwarder = new DocApiForwarder(getDocWorkerMap(), dbManager, gristServer);
 *   docApiForwarder.addEndpoints(app);
 *
 * Note that it expects the userId and jsonErrorHandler middleware to be set up elsewhere
 * to apply to these routes.
 */
export class DocApiForwarder {
  constructor(private _docWorkerMap: IDocWorkerMap, private _dbManager: HomeDBManager,
              private _gristServer: GristServer) {
  }

  public addEndpoints(app: express.Application) {
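    // Rewrite share URLs onto the regular docs endpoints by folding the share key into a
    // special docId, so the rest of the forwarding logic can treat a share like any other
    // document. E.g. "/api/s/<key>/tables" becomes "/api/docs/" + SHARE_KEY_PREFIX + "<key>/tables"
    // (a sketch of the mapping performed by the replace() below).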
    app.use((req, res, next) => {
      if (req.url.startsWith('/api/s/')) {
        req.url = req.url.replace('/api/s/', `/api/docs/${SHARE_KEY_PREFIX}`);
      }
      next();
    });

    // Middleware to forward a request about an existing document that the user has access to.
    // We do not check whether the document has been soft-deleted; that will be checked by
    // the worker if needed.
    const withDoc = expressWrap(this._forwardToDocWorker.bind(this, true, 'viewers'));
    // Middleware to forward a request without a pre-existing document (for imports/uploads).
    const withoutDoc = expressWrap(this._forwardToDocWorker.bind(this, false, null));
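    // Middleware to forward a request about an existing document without asserting any access
    // level; used below only for /assign, which is expected to do its own checking (an
    // assumption based on its use for unauthenticated worker reassignment).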
    const withDocWithoutAuth = expressWrap(this._forwardToDocWorker.bind(this, true, null));
    app.use('/api/docs/:docId/tables', withDoc);
    app.use('/api/docs/:docId/force-reload', withDoc);
    app.use('/api/docs/:docId/recover', withDoc);
    app.use('/api/docs/:docId/remove', withDoc);
    app.delete('/api/docs/:docId', withDoc);
    app.use('/api/docs/:docId/download', withDoc);
    app.use('/api/docs/:docId/send-to-drive', withDoc);
    app.use('/api/docs/:docId/fork', withDoc);
    app.use('/api/docs/:docId/create-fork', withDoc);
    app.use('/api/docs/:docId/apply', withDoc);
    app.use('/api/docs/:docId/attachments', withDoc);
    app.use('/api/docs/:docId/snapshots', withDoc);
    app.use('/api/docs/:docId/usersForViewAs', withDoc);
    app.use('/api/docs/:docId/replace', withDoc);
    app.use('/api/docs/:docId/flush', withDoc);
    app.use('/api/docs/:docId/states', withDoc);
    app.use('/api/docs/:docId/compare', withDoc);
    app.use('/api/docs/:docId/assign', withDocWithoutAuth);
    app.use('/api/docs/:docId/webhooks/queue', withDoc);
    app.use('/api/docs/:docId/webhooks', withDoc);
    app.use('/api/docs/:docId/assistant', withDoc);
    app.use('/api/docs/:docId/sql', withDoc);
    app.use('/api/docs/:docId/forms/:vsId', withDoc);
    app.use('^/api/docs$', withoutDoc);
  }
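
  // Forwards the request to the doc worker responsible for the document (a summary of the
  // logic below): with withDocId true the document is resolved and, if `role` is given, access
  // at that level is asserted; with withDocId false the request is forwarded under a generic
  // 'import' assignment, for requests with no pre-existing document.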
  private async _forwardToDocWorker(
    withDocId: boolean, role: 'viewers'|null, req: express.Request, res: express.Response,
  ): Promise<void> {
    let docId: string|null = null;
    if (withDocId) {
      const docAuth = await getOrSetDocAuth(req as RequestWithLogin, this._dbManager,
                                            this._gristServer, req.params.docId);
      if (role) {
        assertAccess(role, docAuth, {allowRemoved: true});
      }
      docId = docAuth.docId;
    }
    // Use the docId for worker assignment, rather than req.params.docId, which could be a urlId.
    const assignmentId = getAssignmentId(this._docWorkerMap, docId === null ? 'import' : docId);
    if (!this._docWorkerMap) {
      throw new ApiError('no worker map', 404);
    }
    const docStatus = await this._docWorkerMap.assignDocWorker(assignmentId);

    // Construct the new URL from the origin and path prefix of `docWorker.internalUrl`, while
    // fully preserving the original URL's remaining path and query params.
    const docWorkerUrl = new URL(docStatus.docWorker.internalUrl);
    const url = new URL(req.originalUrl, docWorkerUrl.origin);
    url.pathname = removeTrailingSlash(docWorkerUrl.pathname) + url.pathname;
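    // For illustration (hypothetical values): with internalUrl "https://worker-1.internal/dw/w1"
    // and originalUrl "/api/docs/abc123/tables?limit=5", the forwarded URL becomes
    // "https://worker-1.internal/dw/w1/api/docs/abc123/tables?limit=5".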
    const headers: {[key: string]: string} = {
      ...getTransitiveHeaders(req),
      'Content-Type': req.get('Content-Type') || 'application/json',
    };
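    // Also forward a couple of per-request headers verbatim. (Which endpoints consume
    // X-Sort/X-Limit is not specified here; the forwarding itself is unconditional.)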
    for (const key of ['X-Sort', 'X-Limit']) {
      const hdr = req.get(key);
      if (hdr) { headers[key] = hdr; }
    }
    const controller = new AbortController();
    // If the original request is aborted, abort the forwarded request too. (Currently this only
    // affects some export/download requests which can abort long-running work.)
    addAbortHandler(req, res, () => controller.abort());
    const options: RequestInit = {
      method: req.method,
      headers,
      signal: controller.signal,
    };
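    // Stream the request body straight through to the worker for methods that carry one. This
    // is why this forwarder must be registered before body-parsing middleware (see the class
    // comment): once a body parser consumes the stream, there is nothing left to pipe.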
    if (['POST', 'PATCH', 'PUT'].includes(req.method)) {
      // Uses `req` as a stream.
      options.body = req;
    }
    const docWorkerRes = await fetch(url.href, options);
    res.status(docWorkerRes.status);
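    // Mirror selected response headers from the worker, so that downloads and caching behave
    // as if the response came directly from it.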
    for (const key of ['content-type', 'content-disposition', 'cache-control']) {
      const value = docWorkerRes.headers.get(key);
      if (value) { res.set(key, value); }
    }
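    // Pipe the worker's response body to the client, resolving once the response finishes and
    // rejecting on an error from either stream (which is also how an abort surfaces).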
    return new Promise<void>((resolve, reject) => {
      docWorkerRes.body.on('error', reject);
      res.on('error', reject);
      res.on('finish', resolve);
      docWorkerRes.body.pipe(res);
    });
  }
}