(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
/**
|
|
|
|
* Overview of Excel exports, which now use worker-threads.
|
|
|
|
*
|
|
|
|
* 1. The flow starts with downloadXLSX() method called in the main thread (or streamXLSX() used for
|
|
|
|
* Google Drive export).
|
|
|
|
* 2. It uses the 'piscina' library to call a makeXLSX* method in a worker thread, registered in
|
|
|
|
* workerExporter.ts, to export full doc, a table, or a section.
|
2023-06-02 13:30:19 +00:00
|
|
|
* 3. Each of those methods calls a doMakeXLSX* method defined in that file. I.e. downloadXLSX()
|
|
|
|
* is called in the main thread, but makeXLSX() and doMakeXLSX() are called in the worker thread.
|
|
|
|
* 4. doMakeXLSX* methods get data using an ActiveDocSource, which uses Rpc (from grain-rpc
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
* module) to request data over a message port from the ActiveDoc in the main thread.
|
|
|
|
* 5. The resulting stream of Excel data is streamed back to the main thread using Rpc too.
|
|
|
|
*/
|
2021-07-21 08:46:03 +00:00
|
|
|
import {ActiveDoc} from 'app/server/lib/ActiveDoc';
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
import {ActiveDocSource, ActiveDocSourceDirect, DownloadOptions, ExportParameters} from 'app/server/lib/Export';
|
2022-07-04 14:14:55 +00:00
|
|
|
import log from 'app/server/lib/log';
|
2023-06-15 20:14:27 +00:00
|
|
|
import {addAbortHandler} from 'app/server/lib/requestUtils';
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
import * as express from 'express';
|
2022-07-04 14:14:55 +00:00
|
|
|
import contentDisposition from 'content-disposition';
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
import {Rpc} from 'grain-rpc';
|
|
|
|
import {AbortController} from 'node-abort-controller';
|
2023-06-02 13:30:19 +00:00
|
|
|
import {Writable} from 'stream';
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
import {MessageChannel} from 'worker_threads';
|
|
|
|
import Piscina from 'piscina';
|
|
|
|
|
2023-06-02 13:30:19 +00:00
|
|
|
// If this file is imported from within a worker thread, we'll create more thread pools from each
|
|
|
|
// thread, with a potential for an infinite loop of doom. Better to catch that early.
|
|
|
|
if (Piscina.isWorkerThread) {
|
|
|
|
throw new Error("ExportXLSX must not be imported from within a worker thread");
|
|
|
|
}
|
|
|
|
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
// Configure the thread-pool to use for exporting XLSX files.
|
|
|
|
const exportPool = new Piscina({
|
|
|
|
filename: __dirname + '/workerExporter.js',
|
|
|
|
minThreads: 0,
|
|
|
|
maxThreads: 4,
|
|
|
|
maxQueue: 100, // Fail if this many tasks are already waiting for a thread.
|
|
|
|
idleTimeout: 10_000, // Drop unused threads after 10s of inactivity.
|
|
|
|
});
|
2021-09-01 21:07:53 +00:00
|
|
|
|
|
|
|
/**
|
2022-09-14 18:55:44 +00:00
|
|
|
* Converts `activeDoc` to XLSX and sends the converted data through `res`.
|
2021-09-01 21:07:53 +00:00
|
|
|
*/
|
|
|
|
export async function downloadXLSX(activeDoc: ActiveDoc, req: express.Request,
|
2022-09-14 18:55:44 +00:00
|
|
|
res: express.Response, options: DownloadOptions) {
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
const {filename} = options;
|
|
|
|
res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
|
|
|
|
res.setHeader('Content-Disposition', contentDisposition(filename + '.xlsx'));
|
|
|
|
return streamXLSX(activeDoc, req, res, options);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Converts `activeDoc` to XLSX and sends to the given outputStream.
|
|
|
|
*/
|
|
|
|
export async function streamXLSX(activeDoc: ActiveDoc, req: express.Request,
|
|
|
|
outputStream: Writable, options: ExportParameters) {
|
2021-09-01 21:07:53 +00:00
|
|
|
log.debug(`Generating .xlsx file`);
|
2023-07-19 17:37:22 +00:00
|
|
|
const {tableId, viewSectionId, filters, sortOrder, linkingFilter} = options;
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
const testDates = (req.hostname === 'localhost');
|
2022-09-14 18:55:44 +00:00
|
|
|
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
const { port1, port2 } = new MessageChannel();
|
|
|
|
try {
|
|
|
|
const rpc = new Rpc({
|
|
|
|
sendMessage: async (m) => port1.postMessage(m),
|
|
|
|
logger: { info: m => {}, warn: m => log.warn(m) },
|
|
|
|
});
|
|
|
|
rpc.registerImpl<ActiveDocSource>("activeDocSource", new ActiveDocSourceDirect(activeDoc, req));
|
|
|
|
rpc.on('message', (chunk) => { outputStream.write(chunk); });
|
|
|
|
port1.on('message', (m) => rpc.receiveMessage(m));
|
|
|
|
|
2023-06-02 13:30:19 +00:00
|
|
|
// For request cancelling to work, remember that such requests are forwarded via DocApiForwarder.
|
|
|
|
const abortController = new AbortController();
|
|
|
|
const cancelWorker = () => abortController.abort();
|
|
|
|
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
// When the worker thread is done, it closes the port on its side, and we listen to that to
|
|
|
|
// end the original request (the incoming HTTP request, in case of a download).
|
2023-06-02 13:30:19 +00:00
|
|
|
port1.on('close', () => {
|
|
|
|
outputStream.end();
|
|
|
|
req.off('close', cancelWorker);
|
|
|
|
});
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
|
2023-06-15 20:14:27 +00:00
|
|
|
addAbortHandler(req, outputStream, cancelWorker);
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
|
|
|
|
const run = (method: string, ...args: any[]) => exportPool.run({port: port2, testDates, args}, {
|
|
|
|
name: method,
|
|
|
|
signal: abortController.signal,
|
|
|
|
transferList: [port2],
|
|
|
|
});
|
|
|
|
|
|
|
|
// hanlding 3 cases : full XLSX export (full file), view xlsx export, table xlsx export
|
|
|
|
try {
|
|
|
|
if (viewSectionId) {
|
2023-07-19 17:37:22 +00:00
|
|
|
await run('makeXLSXFromViewSection', viewSectionId, sortOrder, filters, linkingFilter);
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
} else if (tableId) {
|
|
|
|
await run('makeXLSXFromTable', tableId);
|
|
|
|
} else {
|
|
|
|
await run('makeXLSX');
|
|
|
|
}
|
|
|
|
log.debug('XLSX file generated');
|
|
|
|
} catch (e) {
|
|
|
|
// We fiddle with errors in workerExporter to preserve extra properties like 'status'. Make
|
|
|
|
// the result an instance of Error again here (though we won't know the exact class).
|
|
|
|
throw (e instanceof Error) ? e : Object.assign(new Error(e.message), e);
|
|
|
|
}
|
|
|
|
} finally {
|
|
|
|
port1.close();
|
|
|
|
port2.close();
|
|
|
|
}
|
2021-09-01 21:07:53 +00:00
|
|
|
}
|
2021-07-21 08:46:03 +00:00
|
|
|
|