(core) tweak throttling to work for gvisor/runsc
Summary:
Grist has, up to now, used a throttling mechanism that gives a sandbox free rein until it uses more than some threshold percentage of a CPU for some time. At that point, we start sending STOP and CONT signals on a duty cycle, with longer and longer STOPped periods, until CPU usage falls to the threshold. The general idea is to do short jobs quickly, while throttling long jobs (thus unfortunately making them even longer) in order to continue doing other short jobs quickly.
The runsc sandbox is not a single process; there are in fact 5 per sandbox in our setup. Runsc can work with KVM or ptrace. KVM is not available to us, so we use ptrace. With ptrace, there is one process that is the appropriate one to duty cycle, and another that needs to receive a signal in order to yield. This diff adds the necessary machinery.
This is a conservative change, where I stick with our existing throttling mechanism and adapt it to the new sandbox. It would be reasonable to consider switching throttling mechanisms, since the OS allows a lot. We can set a quota for how much CPU a process can use within a given period, for example. However, the overall behavior with that would be quite different from what we have, so it feels like this would need more discussion.
The implementation uses the Linux utility `pgrep`, since portability is not important (runsc is only available on Linux) and there's no Node API for enumerating the children of a process.
The diff contains some tweaks to `buildtools/contain.sh` to streamline experimenting with Grist and runsc on a Mac. It is important for throttling that Node and the sandbox processes are in the same process namespace. If Docker is in between them, then some extra machinery is needed (a proxy throttler and a way to communicate with it), which I chose not to implement.
Test Plan: added test; a lot of manual testing
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3113
2021-11-04 20:44:59 +00:00
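
The duty-cycle idea described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 10% step, and the 90% cap are hypothetical and are not taken from Grist's `Throttle` class.

```typescript
// Illustrative duty-cycle planner. All names and constants here are
// hypothetical; this is not the Throttle class imported by the file below.

interface CyclePlan {
  stopMs: number;  // how long the process stays STOPped in this period
  runMs: number;   // how long it runs after CONT in this period
}

// Lengthen the stopped share of the period while CPU use is above target,
// and shorten it again once usage drops. Percentages are integers so the
// arithmetic stays exact.
function nextStoppedPercent(current: number, cpuPercent: number, targetPercent: number,
                            stepPercent: number = 10, maxPercent: number = 90): number {
  if (cpuPercent > targetPercent) {
    return Math.min(maxPercent, current + stepPercent);
  }
  return Math.max(0, current - stepPercent);
}

// Split one period into a stopped part and a running part.
function planCycle(periodMs: number, stoppedPercent: number): CyclePlan {
  const stopMs = Math.round(periodMs * stoppedPercent / 100);
  return { stopMs, runMs: periodMs - stopMs };
}
```

A caller would send SIGSTOP, wait `stopMs`, send SIGCONT, wait `runMs`, and then re-measure CPU before planning the next period. With ptrace, a second process would also need a signal so that it yields, as the summary notes.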

import { delay } from 'app/common/delay';
import * as log from 'app/server/lib/log';
import { Throttle } from 'app/server/lib/Throttle';

import * as pidusage from '@gristlabs/pidusage';
import * as childProcess from 'child_process';
import * as util from 'util';

const execFile = util.promisify(childProcess.execFile);

/**
 * Sandbox usage information that we log periodically (currently just memory).
 */
export interface ISandboxUsage {
  memory: number;
}

/**
 * Control interface for a sandbox. Looks like it doesn't do much, but there may be
 * background activities (specifically, throttling).
 */
export interface ISandboxControl {
  getUsage(): Promise<ISandboxUsage>;  // Poll usage information for the sandbox.
  prepareToClose(): void;              // Start shutting down (but don't wait).
  close(): Promise<void>;              // Wait for shut down.
  kill(): Promise<void>;               // Send kill signals to any related processes.
}

/**
 * Control a single process directly. A thin wrapper around the Throttle class.
 */
export class DirectProcessControl implements ISandboxControl {
  private _throttle?: Throttle;

  constructor(private _process: childProcess.ChildProcess, logMeta?: log.ILogMeta) {
    if (process.env.GRIST_THROTTLE_CPU) {
      this._throttle = new Throttle({
        pid: _process.pid,
        logMeta: {...logMeta, pid: _process.pid},
      });
    }
  }

  public async close() {
    this.prepareToClose();
  }

  public prepareToClose() {
    this._throttle?.stop();
    this._throttle = undefined;
  }

  public async kill() {
    this._process.kill('SIGKILL');
  }

  public async getUsage() {
    const memory = (await pidusage(this._process.pid)).memory;
    return { memory };
  }
}

/**
 * Dummy control interface that does no monitoring or throttling.
 */
export class NoProcessControl implements ISandboxControl {
  constructor(private _process: childProcess.ChildProcess) {
  }

  public async close() {
  }

  public prepareToClose() {
  }

  public async kill() {
    this._process.kill('SIGKILL');
  }

  public async getUsage() {
    return { memory: Infinity };
  }
}

/**
 * Control interface when multiple processes are involved, playing different roles.
 * This is entirely conceived with gvisor's runsc in mind.
 *
 * As a process is starting up, we scan it and its children (recursively) for processes
 * that match certain "recognizers". For gvisor runsc, we'll be picking out a sandbox
 * process from its peers handling filesystem access, and a ptraced process that is
 * effectively the data engine.
 *
 * This setup is very much developed by inspection, and could have weaknesses.
 * TODO: check if more processes need to be included in memory counting.
 * TODO: check if there could be multiple ptraced processes to deal with if user were
 * to create extra processes within sandbox (which we don't yet attempt to prevent).
 *
 * The gvisor container could be configured with operating system help to limit
 * CPU usage in various ways, but I don't yet see a way to get something analogous
 * to Throttle's operation.
 */
export class SubprocessControl implements ISandboxControl {
  private _throttle?: Throttle;
  private _monitoredProcess: Promise<ProcessInfo|null>;
  private _active: boolean;
  private _foundDocker: boolean = false;

  constructor(private _options: {
    pid: number,  // pid of process opened by Grist
    recognizers: {
      sandbox: (p: ProcessInfo) => boolean,   // we will stop/start this process for throttling
      memory?: (p: ProcessInfo) => boolean,   // read memory from this process (default: sandbox)
      cpu?: (p: ProcessInfo) => boolean,      // read cpu from this process (default: sandbox)
      traced?: (p: ProcessInfo) => boolean,   // stop this as well for throttling (default: none)
    },
    logMeta?: log.ILogMeta,
  }) {
    this._active = true;
    this._monitoredProcess = this._scan().catch(e => {
      log.rawDebug(`Subprocess control failure: ${e}`, this._options.logMeta || {});
      return null;
    });
  }

  public async close() {
    this.prepareToClose();
    await this._monitoredProcess.catch(() => null);
  }

  public prepareToClose() {
    this._active = false;
    this._throttle?.stop();
    this._throttle = undefined;
  }

  public async kill() {
    if (this._foundDocker) {
      process.kill(this._options.pid, 'SIGKILL');
      return;
    }
    for (const proc of await this._getAllProcesses()) {
      try {
        process.kill(proc.pid, 'SIGKILL');
      } catch (e) {
        // Don't worry if process is already killed.
        if (e.code !== 'ESRCH') { throw e; }
      }
    }
  }

  public async getUsage() {
    try {
      const monitoredProcess = await this._monitoredProcess;
      if (!monitoredProcess) { return { memory: Infinity }; }
      const pid = monitoredProcess.pid;
      const memory = (await pidusage(pid)).memory;
      return { memory };
    } catch (e) {
      return { memory: Infinity };
    }
  }

  /**
   * Look for the desired children. Should be run once on process startup.
   * This method will check all children once per second until it finds the
   * desired ones or we are closed.
   *
   * It returns information about the child to be monitored by getUsage().
   * It also has a side effect of kicking off throttling.
   */
  private async _scan(): Promise<ProcessInfo> {
    while (this._active) {
      const processes = await this._getAllProcesses();
      const unrecognizedProcess = undefined as ProcessInfo|undefined;
      const recognizedProcesses = {
        sandbox: unrecognizedProcess,
        memory: unrecognizedProcess,
        cpu: unrecognizedProcess,
        traced: unrecognizedProcess,
      };
      let missing = false;
      for (const key of Object.keys(recognizedProcesses) as Array<keyof typeof recognizedProcesses>) {
        const recognizer = this._options.recognizers[key];
        if (!recognizer) { continue; }
        for (const proc of processes) {
          if (proc.label.includes('docker')) {
            this._foundDocker = true;
            throw new Error('docker barrier found');
          }
          if (recognizer(proc)) {
            recognizedProcesses[key] = proc;
            continue;
          }
        }
        if (!recognizedProcesses[key]) { missing = true; }
      }
      if (!missing) {
        this._configure(recognizedProcesses);
        return recognizedProcesses.memory || recognizedProcesses.sandbox!;  // sandbox recognizer is mandatory
      }
      await delay(1000);
    }
    throw new Error('not found');
  }

  /**
   * Having found the desired children, we configure ourselves here, kicking off
   * throttling if needed.
   */
  private _configure(processes: { sandbox?: ProcessInfo, cpu?: ProcessInfo,
                                  memory?: ProcessInfo, traced?: ProcessInfo }) {
    if (!processes.sandbox) { return; }
    if (process.env.GRIST_THROTTLE_CPU) {
      this._throttle = new Throttle({
        pid: processes.sandbox.pid,
        readPid: processes.cpu?.pid,
        tracedPid: processes.traced?.pid,
        logMeta: {...this._options.logMeta,
                  pid: processes.sandbox.pid,
                  otherPids: [processes.cpu?.pid,
                              processes.memory?.pid,
                              processes.traced?.pid]},
      });
    }
  }

  /**
   * Return the root process and all its (nested) children.
   */
  private _getAllProcesses(): Promise<ProcessInfo[]> {
    const rootProcess = {pid: this._options.pid, label: 'root', parentLabel: ''};
    return this._addChildren([rootProcess]);
  }

  /**
   * Take a list of processes, and add children of all those processes,
   * recursively.
   */
  private async _addChildren(processes: ProcessInfo[]): Promise<ProcessInfo[]> {
    const nestedProcesses = await Promise.all(processes.map(async proc => {
      const children = await this._getChildren(proc.pid, proc.label);
      return [proc, ...await this._addChildren(children)];
    }));
    return ([] as ProcessInfo[]).concat(...nestedProcesses);
  }

  /**
   * Figure out the direct children of a parent process.
   */
  private async _getChildren(pid: number, parentLabel: string): Promise<ProcessInfo[]> {
    // Use "pgrep" to find children of a process, in the absence of any better way.
    // This only needs to happen a few times as sandbox is starting up, so doesn't need
    // to be super-optimized.
    // This currently is only good for Linux. Mechanically, it will run on Macs too,
    // but process naming is slightly different. But this class is currently only useful
    // for gvisor's runsc, which runs on Linux only.
    const cmd =
      execFile('pgrep', ['--list-full', '--parent', String(pid)])
        .catch(() => execFile('pgrep', ['-l', '-P', String(pid)]))  // mac version of pgrep
        .catch(() => ({ stdout: '' }));
    const result = (await cmd).stdout;
    const parts = result
      .split('\n')
      .map(line => line.trim())
      .map(line => line.split(' ', 2))
      .map(part => {
        return {
          pid: parseInt(part[0], 10) || 0,
          label: part[1] || '',
          parentLabel,
        };
      });
    return parts.filter(part => part.pid !== 0);
  }
}

/**
 * The information we need about processes is their pid, some kind of label (whatever
 * pgrep reports, which is a version of their command line), and the label of the process's
 * parent (blank if it has none).
 */
export interface ProcessInfo {
  pid: number;
  label: string;
  parentLabel: string;
}
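
To make the parsing step in `_getChildren` and the role of recognizers more concrete, here is a small stand-alone sketch. It mirrors the split-and-filter logic above but runs on a canned string rather than calling `pgrep`. The `ProcInfo` interface, the `parsePgrepOutput` helper, and the process labels are hypothetical, chosen only for illustration; real runsc process names may differ.

```typescript
// Stand-alone copy of the ProcessInfo shape so this sketch compiles on its own.
interface ProcInfo {
  pid: number;
  label: string;
  parentLabel: string;
}

// Parse "pid label ..." lines. As in _getChildren, split(' ', 2) keeps only
// the first two space-separated tokens, so the label is the first word after
// the pid and any further arguments are dropped.
function parsePgrepOutput(stdout: string, parentLabel: string): ProcInfo[] {
  return stdout
    .split('\n')
    .map(line => line.trim().split(' ', 2))
    .map(part => ({ pid: parseInt(part[0], 10) || 0, label: part[1] || '', parentLabel }))
    .filter(info => info.pid !== 0);  // drop blank or unparseable lines
}

// Hypothetical recognizers; the labels are made up for this example.
const recognizers = {
  sandbox: (p: ProcInfo) => p.label.includes('sandbox-proc'),
  traced: (p: ProcInfo) => p.parentLabel.includes('sandbox-proc'),
};

const sample = '101 sandbox-proc --flag\n102 helper\n';
const found = parsePgrepOutput(sample, 'root');
```

Here `found` holds two entries, and only the first satisfies `recognizers.sandbox`. A `traced` recognizer keyed on `parentLabel` is one way to pick out a child of the sandbox process, which is why `ProcessInfo` carries the parent's label.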