2023-06-06 17:08:50 +00:00
|
|
|
import log from 'app/server/lib/log';
|
|
|
|
import { ITelemetry } from 'app/server/lib/Telemetry';
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
|
|
|
|
const MONITOR_PERIOD_MS = 5_000; // take a look at memory usage this often
|
|
|
|
const MEMORY_DELTA_FRACTION = 0.1; // fraction by which usage should change to get reported
|
|
|
|
const CPU_DELTA_FRACTION = 0.1; // by how much cpu usage should change to get reported
|
|
|
|
const MONITOR_LOG_PERIOD_MS = 600_000; // log usage at least this often
|
|
|
|
|
|
|
|
let _timer: NodeJS.Timeout|undefined;
|
|
|
|
let _lastTickTime: number = Date.now();
|
|
|
|
let _lastReportTime: number = 0;
|
|
|
|
let _lastReportedHeapUsed: number = 0;
|
|
|
|
let _lastCpuUsage: NodeJS.CpuUsage = {system: 0, user: 0};
|
|
|
|
let _lastReportedCpuAverage: number = 0;
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Monitor process memory (heap) and CPU usage, reporting as telemetry on an interval, and more
|
|
|
|
* often when usage ticks up or down by a big enough delta.
|
|
|
|
*
|
2023-06-06 17:08:50 +00:00
|
|
|
* There is a single global process monitor, reporting to the `telemetry` object passed into the
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
* first call to start().
|
|
|
|
*
|
|
|
|
* Returns a function that stops the monitor, or null if there was already a process monitor
|
|
|
|
* running, and no new one was started.
|
|
|
|
*
|
|
|
|
* Reports:
|
|
|
|
* - heapUsedMB: Size of JS heap in use, in MiB.
|
|
|
|
* - heapTotalMB: Total heap size, in MiB, allocated for JS by v8.
|
|
|
|
* - cpuAverage: Fraction between 0 and 1, cpu usage over the last MONITOR_PERIOD_MS. Note it
|
|
|
|
* includes usage from all threads, so may exceed 1.
|
|
|
|
* - intervalMs: Interval (in milliseconds) over which cpuAverage is reported. Being much
|
|
|
|
* higher than MONITOR_PERIOD_MS is a sign of being CPU bound for that long.
|
|
|
|
*/
|
2023-06-06 17:08:50 +00:00
|
|
|
export function start(telemetry: ITelemetry): (() => void) | undefined {
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
if (!_timer) {
|
|
|
|
// Initialize variables needed for accurate first-tick measurement.
|
|
|
|
_lastTickTime = Date.now();
|
|
|
|
_lastCpuUsage = process.cpuUsage();
|
2023-06-06 17:08:50 +00:00
|
|
|
_timer = setInterval(() => monitor(telemetry), MONITOR_PERIOD_MS);
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
|
|
|
|
return function stop() {
|
|
|
|
clearInterval(_timer);
|
|
|
|
_timer = undefined;
|
|
|
|
};
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-06-06 17:08:50 +00:00
|
|
|
function monitor(telemetry: ITelemetry) {
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
const memoryUsage = process.memoryUsage();
|
|
|
|
const heapUsed = memoryUsage.heapUsed;
|
|
|
|
const cpuUsage = process.cpuUsage();
|
|
|
|
const now = Date.now();
|
|
|
|
|
|
|
|
const intervalMs = now - _lastTickTime;
|
|
|
|
// Note that cpuUsage info is in microseconds, while intervalMs is milliseconds.
|
|
|
|
const cpuAverage = (cpuUsage.system + cpuUsage.user - _lastCpuUsage.system - _lastCpuUsage.user)
|
|
|
|
/ 1000 / intervalMs;
|
|
|
|
_lastCpuUsage = cpuUsage;
|
|
|
|
_lastTickTime = now;
|
|
|
|
|
|
|
|
// Report usage when:
|
|
|
|
// (a) enough time has passed (MONITOR_LOG_PERIOD_MS)
|
|
|
|
// (b) memory usage ticked up or down enough since the last report
|
|
|
|
// (c) average cpu usage ticked up or down enough since the last report
|
|
|
|
if (
|
|
|
|
now > _lastReportTime + MONITOR_LOG_PERIOD_MS ||
|
|
|
|
Math.abs(heapUsed - _lastReportedHeapUsed) > _lastReportedHeapUsed * MEMORY_DELTA_FRACTION ||
|
|
|
|
Math.abs(cpuAverage - _lastReportedCpuAverage) > CPU_DELTA_FRACTION
|
|
|
|
) {
|
2023-06-06 17:08:50 +00:00
|
|
|
telemetry.logEvent('processMonitor', {
|
|
|
|
full: {
|
|
|
|
heapUsedMB: Math.round(memoryUsage.heapUsed/1024/1024),
|
|
|
|
heapTotalMB: Math.round(memoryUsage.heapTotal/1024/1024),
|
|
|
|
cpuAverage: Math.round(cpuAverage * 100) / 100,
|
|
|
|
intervalMs,
|
|
|
|
},
|
|
|
|
})
|
|
|
|
.catch(e => log.error('failed to log telemetry event processMonitor', e));
|
(core) For exporting XLSX, do it memory-efficiently in a worker thread.
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
2023-06-01 13:09:50 +00:00
|
|
|
_lastReportedHeapUsed = heapUsed;
|
|
|
|
_lastReportedCpuAverage = cpuAverage;
|
|
|
|
_lastReportTime = now;
|
|
|
|
}
|
|
|
|
}
|