import * as express from "express";
import fetch, { RequestInit } from 'node-fetch';

(core) For exporting XLSX, do it memory-efficiently in a worker thread.

Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
  crashes. The fix is to use the "streaming writer" option of ExcelJS
  (https://github.com/exceljs/exceljs#streaming-xlsx-writercontents).
  Empirically, on one example, max memory went down from 3G to 100M.
- Exporting is also CPU-intensive and synchronous, and can block node for tens
  of seconds. The fix is to do the work in a worker thread; this diff uses the
  "piscina" library for a pool of threads.
- Additionally, adds a ProcessMonitor that logs memory and CPU usage,
  particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
  the user will cancel the work being done in the worker thread.

Test Plan:
Updated previous export tests; memory and CPU performance tested manually by
watching the output of ProcessMonitor. The difference is visible in these log
excerpts.

Before (total time to serve the request: 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```

After (total time to serve the request: 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```

Note in "Before" that heapTotalMB climbs above 2GB, and an "intervalMs" of 17
seconds indicates that node was unresponsive for that long. In "After",
heapTotalMB stays low, and the main thread remains responsive the whole time.

Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
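
To make the worker-thread pattern concrete, here is a minimal sketch of offloading an export to a piscina pool with cancellation wired to the request. It is not Grist's actual export code: the worker script path, `handleExport`, and the task payload are illustrative assumptions.

```typescript
import * as express from 'express';
import Piscina from 'piscina';
import { AbortController } from 'node-abort-controller';

// One pool shared across requests; the worker script path is hypothetical.
const pool = new Piscina({ filename: require.resolve('./exportXlsxWorker') });

async function handleExport(req: express.Request, res: express.Response) {
  const controller = new AbortController();
  // If the client disconnects, abort the task; piscina cancels it whether it
  // is still queued or already running in a worker thread.
  req.on('close', () => controller.abort());
  const data: Buffer = await pool.run({ docId: req.params.docId }, { signal: controller.signal });
  res.type('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet').send(data);
}
```
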
import { AbortController } from 'node-abort-controller';
import { ApiError } from 'app/common/ApiError';

(core) add initial support for special shares

Summary:
This gives a mechanism for controlling access within a document that is
distinct from (though implemented with the same machinery as) granular access
rules. It was hard to find a good way to insert this that didn't dissolve into
a soup of complications, so here's what I went with:
* When reading rules, if there are shares, extra rules are added.
* If there are shares, all rules are made conditional on a "ShareRef" user
  property.
* "ShareRef" is null when a doc is accessed in the normal way, and the row id
  of a share when the doc is accessed via that share.

There's no UI for controlling shares yet (George is working on it for forms),
but you can do it by editing the `_grist_Shares` table in a document. Suppose
you make a fresh document with a single page/table/widget; then to create an
empty share you can do:
```
gristDocPageModel.gristDoc.get().docData.sendAction(['AddRecord', '_grist_Shares', null, {linkId: 'xyz', options: '{"publish": true}'}])
```
If you look at the home db now, there should be something in the `shares` table:
```
$ sqlite3 -table landing.db "select * from shares"
+----+------------------------+------------------------+--------------+---------+
| id | key                    | doc_id                 | link_id      | options |
+----+------------------------+------------------------+--------------+---------+
|  1 | gSL4g38PsyautLHnjmXh2K | 4qYuace1xP2CTcPunFdtan | xyz          | ...     |
+----+------------------------+------------------------+--------------+---------+
```
If you take the key from that (gSL4g38PsyautLHnjmXh2K in this case) and replace
the document's urlId in its URL with `s.<key>` (in this case
`s.gSL4g38PsyautLHnjmXh2K`), then you can use the regular document landing page
(it will be quite blank initially) or the API endpoints via the share. E.g. for
me, `http://localhost:8080/o/docs/s0gSL4g38PsyautLHnjmXh2K/share-inter-3`
accesses the doc.

To actually share some material, these commands are useful:
```
gristDocPageModel.gristDoc.get().docData.getMetaTable('_grist_Views_section').getRecords()
gristDocPageModel.gristDoc.get().docData.sendAction(['UpdateRecord', '_grist_Views_section', 1, {shareOptions: '{"publish": true, "form": true}'}])
gristDocPageModel.gristDoc.get().docData.getMetaTable('_grist_Pages').getRecords()
gristDocPageModel.gristDoc.get().docData.sendAction(['UpdateRecord', '_grist_Pages', 1, {shareRef: 1}])
```
For a share to be effective: at least one page needs its shareRef set to the
rowId of the share; at least one widget on one of those pages needs its
shareOptions set to `{"publish": true, "form": true}` (meaning turn on sharing,
and include form sharing); and the share itself needs `{"publish": true}` in
its options.

I think special shares are somewhat incompatible with public sharing, since by
their nature (allowing access to all endpoints) they easily expose the docId,
and changing that would be hard.

Test Plan: tests added
Reviewers: dsagal, georgegevoian
Reviewed By: dsagal, georgegevoian
Subscribers: jarek, dsagal
Differential Revision: https://phab.getgrist.com/D4144
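
As a hedged illustration of calling the API through such a share (the table name Table1 is made up, and whether a given endpoint responds depends on what the share publishes), the `/api/s/` prefix handled below in addEndpoints maps a share key onto the regular docs API:

```typescript
import fetch from 'node-fetch';

async function readThroughShare() {
  // Share key from the example above; '/api/s/<key>/...' is rewritten in
  // addEndpoints below to '/api/docs/<SHARE_KEY_PREFIX><key>/...'.
  const shareKey = 'gSL4g38PsyautLHnjmXh2K';
  const resp = await fetch(`http://localhost:8080/api/s/${shareKey}/tables/Table1/records`);
  if (!resp.ok) { throw new Error(`share request failed: ${resp.status}`); }
  console.log(await resp.json());
}
```
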
import { SHARE_KEY_PREFIX } from 'app/common/gristUrls';
import { removeTrailingSlash } from 'app/common/gutil';
import { HomeDBManager } from "app/gen-server/lib/HomeDBManager";
import { assertAccess, getOrSetDocAuth, getTransitiveHeaders, RequestWithLogin } from 'app/server/lib/Authorizer';
import { IDocWorkerMap } from "app/server/lib/DocWorkerMap";
import { expressWrap } from "app/server/lib/expressWrap";
import { GristServer } from "app/server/lib/GristServer";
import { getAssignmentId } from "app/server/lib/idUtils";
import { addAbortHandler } from "app/server/lib/requestUtils";

/**
 * Forwards requests to /api/docs endpoints to the doc worker handling the :docId document.
 * Makes sure the user has at least view access to the document, and otherwise rejects the
 * request. For performance reasons we stream the body directly from the request, which
 * requires that nothing reads req first; in particular, DocApiForwarder should be registered
 * before bodyParser.
 *
 * Use:
 *       const docApiForwarder = new DocApiForwarder(getDocWorkerMap(), dbManager, gristServer);
 *       docApiForwarder.addEndpoints(app);
 *
 * Note that it expects userId and jsonErrorHandler middleware to be set up outside
 * to apply to these routes.
 */
export class DocApiForwarder {
  constructor(private _docWorkerMap: IDocWorkerMap, private _dbManager: HomeDBManager,
              private _gristServer: GristServer) {
  }

  public addEndpoints(app: express.Application) {
    app.use((req, res, next) => {
      if (req.url.startsWith('/api/s/')) {
        req.url = req.url.replace('/api/s/', `/api/docs/${SHARE_KEY_PREFIX}`);
      }
      next();
    });

    // Middleware to forward a request about an existing document that the user has access to.
    // We do not check whether the document has been soft-deleted; that will be checked by
    // the worker if needed.

(core) support GRIST_WORKER_GROUP to place worker into an exclusive group

Summary:
In an emergency, we may want to serve certain documents with "old" workers as
we fix problems. This diff adds some support for that.
* Creates duplicate task definitions and services for staging and production
  doc workers (called grist-docs-staging2 and grist-docs-prod2), pulling from
  distinct docker tags (staging2 and prod2). The services are set to have zero
  workers until we need them.
* These new workers are started with a new env variable `GRIST_WORKER_GROUP`
  set to `secondary`.
* The `GRIST_WORKER_GROUP` variable, if set, makes the worker available to
  documents in the named group, and only that group.
* An unauthenticated `/assign` endpoint is added to documents which, when
  POSTed to, checks that the doc is served by a worker in the desired group
  for that doc (as set manually in redis) and, if not, frees the doc up for
  reassignment. This makes it possible to move individual docs between workers
  without redeployments.

The bash scripts added are a record of how the task definitions + services
were created. The services could just have been copied manually, but the task
definitions will need to be updated whenever the definitions for the main doc
workers are updated, so it is worth scripting that.

For example, if a certain document were to fail on a new deployment of Grist,
but rolling back the full deployment wasn't practical:
* Set the prod2 tag in docker to the desired codebase for that document.
* Set desired_count for the grist-docs-prod2 service to non-zero.
* Set doc-<docid>-group for that doc in redis to secondary.
* Hit /api/docs/<docid>/assign to move the doc to grist-docs-prod2.
(If the document needs to be reverted to a previous snapshot, that currently
needs doing manually; it could be made simpler, but that's not in scope for
this diff.)

Test Plan: added tests
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2649
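
As a sketch of the last step of that recipe from a script's point of view (the host and docId are placeholders, and the Redis step is assumed to be done already):

```typescript
import fetch from 'node-fetch';

// Nudge a doc onto a worker in its (manually configured) group: the /assign
// endpoint registered below re-checks the group and frees the doc for
// reassignment if it is on the wrong worker.
async function reassignDoc(docId: string) {
  const resp = await fetch(`https://docs.example.com/api/docs/${docId}/assign`, { method: 'POST' });
  console.log(`assign ${docId}: HTTP ${resp.status}`);
}
```
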
    const withDoc = expressWrap(this._forwardToDocWorker.bind(this, true, 'viewers'));
    // Middleware to forward a request without a pre-existing document (for imports/uploads).
    const withoutDoc = expressWrap(this._forwardToDocWorker.bind(this, false, null));
    const withDocWithoutAuth = expressWrap(this._forwardToDocWorker.bind(this, true, null));
    app.use('/api/docs/:docId/tables', withDoc);
    app.use('/api/docs/:docId/force-reload', withDoc);
    app.use('/api/docs/:docId/recover', withDoc);
    app.use('/api/docs/:docId/remove', withDoc);
    app.delete('/api/docs/:docId', withDoc);
    app.use('/api/docs/:docId/download', withDoc);
    app.use('/api/docs/:docId/send-to-drive', withDoc);
    app.use('/api/docs/:docId/fork', withDoc);
    app.use('/api/docs/:docId/create-fork', withDoc);
    app.use('/api/docs/:docId/apply', withDoc);
    app.use('/api/docs/:docId/attachments', withDoc);
    app.use('/api/docs/:docId/snapshots', withDoc);
    app.use('/api/docs/:docId/usersForViewAs', withDoc);
    app.use('/api/docs/:docId/replace', withDoc);
    app.use('/api/docs/:docId/flush', withDoc);
    app.use('/api/docs/:docId/states', withDoc);
    app.use('/api/docs/:docId/compare', withDoc);
    app.use('/api/docs/:docId/assign', withDocWithoutAuth);
    app.use('/api/docs/:docId/webhooks/queue', withDoc);

(core) Adding /webhooks endpoint

Summary:
- New /webhooks endpoint that lists all webhooks in a document (available to
  owners).
- Monitoring webhook usage and saving it in memory or Redis.
- Loosening the _unsubscribe API endpoint, so that the information returned
  from the /webhooks endpoint is enough to unsubscribe.
- Owners can remove a webhook without the unsubscribe key.

The endpoint lists all webhooks that are registered in a document, not just
webhooks from a single table.

There are two status fields: the first for the webhook, the second for the
last request attempt. A webhook can have five statuses: 'idle', 'sending',
'retrying', 'postponed', 'error', which roughly describe what the sendLoop is
currently doing. The 'error' status describes a situation where all request
attempts failed and the queue needed to be drained, so some requests were
dropped. The last request status can only be 'success', 'failure', or
'rejected'. Rejected means that the last batch was dropped because the queue
was too long.

Test Plan: New and updated tests
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3727
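
The two status fields described above can be summarized as types; this is a hedged sketch for clarity, not Grist's actual definitions:

```typescript
// What the send loop is currently doing for a webhook.
type WebhookStatus = 'idle' | 'sending' | 'retrying' | 'postponed' | 'error';

// Outcome of the most recent request attempt; 'rejected' means the last batch
// was dropped because the queue was too long.
type WebhookLastRequestStatus = 'success' | 'failure' | 'rejected';

interface WebhookUsageSketch {
  status: WebhookStatus;
  lastRequestStatus: WebhookLastRequestStatus | null;
}
```
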
    app.use('/api/docs/:docId/webhooks', withDoc);
    app.use('/api/docs/:docId/assistant', withDoc);
    app.use('/api/docs/:docId/sql', withDoc);
    app.use('/api/docs/:docId/forms/:vsId', withDoc);
    app.use('^/api/docs$', withoutDoc);
  }

  private async _forwardToDocWorker(
    withDocId: boolean, role: 'viewers'|null, req: express.Request, res: express.Response,
  ): Promise<void> {
    let docId: string|null = null;
    if (withDocId) {
      const docAuth = await getOrSetDocAuth(req as RequestWithLogin, this._dbManager,
                                            this._gristServer, req.params.docId);
      if (role) {
        assertAccess(role, docAuth, {allowRemoved: true});
      }
      docId = docAuth.docId;
    }
    // Use the docId for worker assignment, rather than req.params.docId, which could be a urlId.
    const assignmentId = getAssignmentId(this._docWorkerMap, docId === null ? 'import' : docId);

    if (!this._docWorkerMap) {
      throw new ApiError('no worker map', 404);
    }
    const docStatus = await this._docWorkerMap.assignDocWorker(assignmentId);

    // Construct the new url by keeping only the origin and path prefix of `docWorker.internalUrl`,
    // and otherwise reflecting the original url fully (remaining path and query params).
    const docWorkerUrl = new URL(docStatus.docWorker.internalUrl);
    const url = new URL(req.originalUrl, docWorkerUrl.origin);
    url.pathname = removeTrailingSlash(docWorkerUrl.pathname) + url.pathname;
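    // For example (hypothetical values): with docWorker.internalUrl set to
    // 'http://worker-1:8080/dw/w1' and an original request url of
    // '/api/docs/abc123/download?template=1', the result is
    // 'http://worker-1:8080/dw/w1/api/docs/abc123/download?template=1'.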

    const headers: {[key: string]: string} = {
      ...getTransitiveHeaders(req),
      'Content-Type': req.get('Content-Type') || 'application/json',
    };
    for (const key of ['X-Sort', 'X-Limit']) {
      const hdr = req.get(key);
      if (hdr) { headers[key] = hdr; }
    }

    const controller = new AbortController();
    // If the original request is aborted, abort the forwarded request too. (Currently this only
    // affects some export/download requests which can abort long-running work.)
    addAbortHandler(req, res, () => controller.abort());

    const options: RequestInit = {
      method: req.method,
      headers,
      signal: controller.signal,
    };
    if (['POST', 'PATCH', 'PUT'].includes(req.method)) {
      // Use `req` itself as the body, streaming the request through without buffering it.
      options.body = req;
    }
    const docWorkerRes = await fetch(url.href, options);
    res.status(docWorkerRes.status);
    for (const key of ['content-type', 'content-disposition', 'cache-control']) {
      const value = docWorkerRes.headers.get(key);
      if (value) { res.set(key, value); }
    }
    return new Promise<void>((resolve, reject) => {
      docWorkerRes.body.on('error', reject);
      res.on('error', reject);
      res.on('finish', resolve);
      docWorkerRes.body.pipe(res);
    });
  }
}
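
Finally, a hedged sketch of the registration order the class comment calls for (the surrounding app setup is assumed; `express.json()` stands in for whatever body parser the app uses):

```typescript
const app = express();
const docApiForwarder = new DocApiForwarder(getDocWorkerMap(), dbManager, gristServer);
// Register the forwarder first, so forwarded requests still have an unread body stream.
docApiForwarder.addEndpoints(app);
// Body-parsing middleware comes after, for routes that do need parsed bodies.
app.use(express.json());
```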