This adds a new `GRIST_SANDBOX_FLAVOR=pyodide` option where the
version of Python used for the data engine is wasm, and so can
be run by node like the rest of the back end. It still runs as
a separate process.
There are a few small version changes made to packages to avoid
various awkwardnesses present in the current versions. All existing
tests pass.
This is very experimental. To use, you'll need something with
a bash shell and make. First do:
```
cd sandbox/pyodide
make setup # README.md and Makefile have details
cd ..
```
Then running Grist as:
```
GRIST_SANDBOX_FLAVOR=pyodide yarn start
```
should work. Adding a formula with content:
```
import sys; return sys.version
```
should return a different Python version than other sandboxes.
The motivation for this work is to have a form of sandboxing
that will work on Windows for Grist Electron (for Linux we have
gvisor/runsc, for Mac we have sandbox-exec, but I haven't found
anything comparable for Windows).
It also brings a back-end-free version of Grist a bit closer, for
use-cases where that would make sense - such as serving a report
(in the form of a Grist document) on a static site.
* Replace `ormconfig.js` with a newer mechanism of configuring
TypeORM that can be included in the source code properly.
The path to `ormconfig.js` has always been awkward to handle,
and eliminating the file makes building different Grist setups
a bit simpler.
* Remove `electron` package. It is barely used, just for some old
remnants of an older attempt at electron packaging. It was used
for two types, which I left at `any` for now. More code pruning is
no doubt possible here, but I'd rather do it when Electron packaging
has solidified.
* Add a hook for replacing the login system, and for adding some
extra middleware the login system may need.
* Add support for some more possible locations of Python, which
arise when a standalone version of it is included in the Electron
package. This isn't very general purpose, just configurations
that I found useful.
* Support using grist-core within a yarn workspace - the only tweak
needed was webpack related.
* Allow an external ID to be optionally associated with documents.
Summary:
- Use newer flag in .npmrc to avoid warnings
- Fix check in WidgetRepository, useful for development but was broken
- Fix macSandboxExec for Macs that require libRosettaRuntime
- Make sure row count in Raw Data listing is visible when it takes more space
Test Plan: Tested manually
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3759
Removes a comment now that `gvisor` works fine with grist-core, and is packaged in the docker image. Reorders possible sandbox flavors to de-emphasize `pynbox` since it isn't packaged in the docker image.
Summary: Extend formula error messages with explanations from https://github.com/friendly-traceback/friendly-traceback. Only for Python 3.
Test Plan: Updated several Python tests. In general, these require separate branches for Python 2 and 3.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3542
Summary:
Building:
- Builds no longer wait for tsc for either client, server, or test targets. All use esbuild which is very fast.
- Build still runs tsc, but only to report errors. This may be turned off with `SKIP_TSC=1` env var.
- Grist-core continues to build using tsc.
- Esbuild requires ES6 module semantics. Typescript's esModuleInterop is turned
on, so that tsc accepts and enforces correct usage.
- Client-side code is watched and bundled by webpack as before (using esbuild-loader)
Code changes:
- Imports must now follow ES6 semantics: `import * as X from ...` produces a
module object; to import functions or class instances, use `import X from ...`.
- Everything is now built with isolatedModules flag. Some exports were updated for it.
Packages:
- Upgraded browserify dependency, and related packages (used for the distribution-building step).
- Building the distribution now uses esbuild's minification. babel-minify is no longer used.
Test Plan: Should have no behavior changes, existing tests should pass, and docker image should build too.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3506
Summary:
- Upgrades to build-related packages:
- Upgrade typescript, related libraries and typings.
- Upgrade webpack, eslint; add tsc-watch, node-dev, eslint_d.
- Build organization changes:
- Build webpack from original typescript, transpiling only; with errors still
reported by a background tsc watching process.
- Typescript-related changes:
- Reduce imports of AWS dependencies (very noticeable speedup)
- Avoid auto-loading global @types
- Client code is now built with isolatedModules flag (for safe transpilation)
- Use allowJs to avoid copying JS files manually.
- Linting changes
- Enhance Arcanist ESLintLinter to run before/after commands, and set up to use eslint_d
- Update eslint config, and include .eslintignore to avoid linting generated files.
- Include a bunch of eslint-prompted and eslint-generated fixes
- Add no-unused-expression rule to eslint, and fix a few warnings about it
- Other items:
- Refactor cssInput to avoid circular dependency
- Remove a bit of unused code, libraries, dependencies
Test Plan: No behavior changes, all existing tests pass. There are 30 tests fewer reported because `test_gpath.py` was removed (it's been unused for years)
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: paulfitz
Differential Revision: https://phab.getgrist.com/D3498
Summary:
This allows limiting the memory available to documents in the sandbox when gvisor is used. If memory limit is exceeded, we offer to open doc in recovery mode. Recovery mode is tweaked to open docs with tables in "ondemand" mode, which will generally take less memory and allow for deleting rows.
The limit is on the size of the virtual address space available to the sandbox (`RLIMIT_AS`), which in practice appears to function as one would want, and is the only practical option. There is a documented `RLIMIT_RSS` limit to `specifies the limit (in bytes) of the process's resident set (the number of virtual pages resident in RAM)` but this is no longer enforced by the kernel (neither the host nor gvisor).
When the sandbox runs out of memory, there are many ways it can fail. This diff catches all the ones I saw, but there could be more.
Test Plan: added tests
Reviewers: alexmojaki
Reviewed By: alexmojaki
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3398
Summary:
Use openpyxl instead of messytables (which used xlrd internally) in import_xls.py.
Skip empty rows since excel files can easily contain huge numbers of them.
Drop support for xls files (which openpyxl doesn't support) in favour of the newer xlsx format.
Fix some details relating to python virtualenvs and dependencies, as Jenkins was failing to find new Python dependencies.
Test Plan: Mostly relying on existing tests. Updated various tests which referred to xls files instead of xlsx. Added a Python test for skipping empty rows.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3406
Summary:
- Include docId when available for client-side error reporting
- Distinguish sandbox crashes from forced exits
Test Plan: Tested manually
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3373
Summary:
This adds support for gvisor sandboxing in core. When Grist is run outside of a container, regular gvisor can be used (if on linux), and will run in rootless mode. When Grist is run inside a container, docker's default policy is insufficient for running gvisor, so a fork of gvisor is used that has less defence-in-depth but can run without privileges.
Sandboxing is automatically turned on in the Grist core container. It is not turned on automatically when built from source, since it is operating-system dependent.
This diff may break a complex method of testing Grist with gvisor on macs that I may have been the only person using. If anyone complains I'll find time on a mac to fix it :)
This diff includes a small "easter egg" to force document loads, primarily intended for developer use.
Test Plan: existing tests pass; checked that core and saas docker builds function
Reviewers: alexmojaki
Reviewed By: alexmojaki
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3333
Summary: Capture the stacktrace (via SandboxError) in `_pyCallWait` instead of `_onSandboxMsg` where it's always the same.
Test Plan:
Tested manually, found for example that the stacktrace in the logs changed from being rather useless:
```
at NSandbox._onSandboxMsg (/home/alex/work/grist/_build/core/app/server/lib/NSandbox.js:229:36)
at /home/alex/work/grist/_build/core/app/server/lib/NSandbox.js:179:18
at Unmarshaller.parse (/home/alex/work/grist/_build/core/app/common/marshal.js:289:21)
at NSandbox._onSandboxData (/home/alex/work/grist/_build/core/app/server/lib/NSandbox.js:174:28)
at Socket.<anonymous> (/home/alex/work/grist/_build/core/app/server/lib/NSandbox.js:63:59)
at Socket.emit (events.js:315:20)
at Socket.EventEmitter.emit (domain.js:467:12)
at addChunk (internal/streams/readable.js:309:12)
at readableAddChunk (internal/streams/readable.js:284:9)
at Socket.Readable.push (internal/streams/readable.js:223:10)
at Pipe.onStreamRead (internal/stream_base_commons.js:188:23)
```
to being somewhat more helpful:
```
at NSandbox._pyCallWait (/home/alex/work/grist/_build/core/app/server/lib/NSandbox.js:134:19)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async ActiveDoc.applyActionsToDataEngine (/home/alex/work/grist/_build/core/app/server/lib/ActiveDoc.js:1080:39)
at async Sharing._applyActionsToDataEngine (/home/alex/work/grist/_build/core/app/server/lib/Sharing.js:325:37)
```
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: paulfitz
Differential Revision: https://phab.getgrist.com/D3329
Summary: This is https://phab.getgrist.com/D3205 plus some changes (https://github.com/dsagal/grist/compare/type-convert...type-convert-server?expand=1) that move the conversion process to the backend. A new user action ConvertFromColumn uses `call_external` so that the data engine can delegate back to ActiveDoc. Code for creating formatters and parsers is significantly refactored so that most of the logic is in `common` and can be used in different ways.
Test Plan: The original diff adds plenty of tests.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3240
Summary:
This updates the grist-core README to list specific features of Grist,
to make it easier for a casual visitor to get a sense of its scope. Adds links
to some new resources (reviews, templates, grist v airtable post) that could
also help. Adds python3 to docker image so that templates work without fuss.
Test Plan: existing tests should pass
Reviewers: georgegevoian
Reviewed By: georgegevoian
Subscribers: dsagal, anaisconce
Differential Revision: https://phab.getgrist.com/D3204
Summary:
Grist has, up to now, used a throttling mechanism that allows a sandbox free rein until it starts using above some threshold percentage of a cpu for some time - at that point, we start sending STOP and CONT signals on a duty cycle, with longer and longer STOPped periods until cpu usage is at a threshold. The general idea is to do short jobs quickly, while throttling long jobs (thus unfortunately making them even longer) in order to continue doing other short jobs quickly.
The runsc sandbox is not a single process, there are in fact 5 per sandbox in our setup. Runsc can work with kvm or ptrace. Kvm is not available to us, so we use ptrace. With ptrace, there is one process that is the appropriate one to duty cycle, and another that needs to receive a signal in order to yield. This diff adds the necessary machinery.
This is a conservative change, where I stick with our existing throttling mechanism and adapt it to the new sandbox. It would be reasonable to consider switching throttling. There's a lot the OS allows. We can set a quota for how much cpu a process can use within a given period, for example. However the overall behavior with that would be quite different to what we have, so feels like this would need more discussion.
The implementation contains use of a linux utility `pgrep` since portability is not important (runsc is only available on linux) and there's no node api for enumerating children of a process.
The diff contains some tweaks to `buildtools/contain.sh` to streamline experimenting with Grist and runsc on a mac. It is important for throttling that node and the sandbox processes are in the same process name space, if docker is in between them then some extra machinery is needed (a proxy throttler and a way to communicate with it) which I chose not to implement.
Test Plan: added test; a lot of manual testing
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3113
Summary:
This verifies that all existing tests are capable of running under python3/gvisor, and fixes the small issues that came up. It does not yet activate python3 tests on all diffs, only diffs that specifically request them.
* Adds a suffix in test names and output directories for tests run with PYTHON_VERSION=3, so that results of the same test run with and without the flag can be aggregated cleanly.
* Adds support for checkpointing to the gvisor sandbox adapter.
* Prepares a checkpoint made after grist python code has loaded in the gvisor sandbox.
* Changes how `DOC_URL` is passed to the sandbox, since it can no longer be passed in as an environment variable when using checkpoints.
* Uses the checkpoint to speed up tests using the gvisor sandbox, otherwise a lot of tests need more time (especially on mac under docker).
* Directs jenkins to run all tests with python2 and python3 when a new file `buildtools/changelogs/python.txt` is touched (this diff counts as touching that file).
* Tweaks miscellaneous tests
- some needed fixes exposed by slightly different timing
- a small number actually give different results in py3 (removal of `u` prefixes).
- some needed a little more time
The DOC_URL change is not the ultimate solution we want for DOC_URL. Eventually it should be a variable that gets updated, like the date perhaps. This is just a small pragmatic change to preserve existing behavior.
Tests are run mindlessly as py3, and for some tests it won't change anything (e.g. if they do not use NSandbox). Tests are not run in parallel, doubling overall test time.
Checkpoints could be useful in deployment, though this diff doesn't use them there.
The application of checkpoints doesn't check for other configuration like 3-versus-5-pipe that we don't actually use.
Python2 tests run using pynbox as always for now.
The diff got sufficiently bulky that I didn't tackle running py3 on "regular" diffs in it. My preference, given that most tests don't appear to stress the python side of things, would be to make a selection of the tests that do and a few wild cards, and run those tests on both pythons rather then all of them. For diffs making a significant python change, I'd propose touching buildtools/changelogs/python.txt for full tests. But this is a conversation in progress.
A total of 6886 tests ran on this diff.
Test Plan: this is a step in preparing tests for py3 transition
Reviewers: dsagal
Reviewed By: dsagal
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3066
Summary:
docker is slow on macs, so use native sandbox-exec by
default for tests involving python3 on macs.
Test Plan: updated test
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3068
Summary: There was a bad regex processing the document url passed to the sandbox.
Test Plan: added test
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3048
Summary:
This adds `runsc` and `python3` to the grist-server images. For deployments with GRIST_EXPERIMENTAL_PLUGINS=1 (dev + staging but not prod) a hack is added to use `python3` under `runsc` for documents with a special title (`activate-python3-magic` or similar).
This will simplify experiments on behavior of this configuration under realistic conditions.
Hopefully, before landing this, I'll be able to switch to storing a python flag in a document options cell being added by @georgegevoian in a parallel diff, since using the doc title is super hacky :-).
Test Plan: tested manually on worker built locally
Reviewers: dsagal, alexmojaki
Reviewed By: dsagal, alexmojaki
Subscribers: georgegevoian
Differential Revision: https://phab.getgrist.com/D2998
Summary:
Move all the plugins python code into the main folder with the core code.
Register file importing functions in the same main.py entrypoint as the data engine.
Remove options relating to different entrypoints and code directories. The only remaining plugin-specific option in NSandbox is the import directory/mount, i.e. where files to be parsed are placed.
Test Plan: this
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D2965
Summary:
This adds python2 to the gvisor sandbox image. It can be used instead
of the default python3 by setting PYTHON_VERSION to 2 (or calling run.py with python2).
This is useful for making side-by-side comparisons with code running python3.
Test Plan: manual
Reviewers: alexmojaki
Reviewed By: alexmojaki
Differential Revision: https://phab.getgrist.com/D2957
Summary:
* Moves essential plugins to grist-core, so that basic imports (e.g. csv) work.
* Adds support for a `GRIST_SANDBOX_FLAVOR` flag that can systematically override how the data engine is run.
- `GRIST_SANDBOX_FLAVOR=pynbox` is "classic" nacl-based sandbox.
- `GRIST_SANDBOX_FLAVOR=docker` runs engines in individual docker containers. It requires an image specified in `sandbox/docker` (alternative images can be named with `GRIST_SANDBOX` flag - need to contain python and engine requirements). It is a simple reference implementation for sandboxing.
- `GRIST_SANDBOX_FLAVOR=unsandboxed` runs whatever local version of python is specified by a `GRIST_SANDBOX` flag directly, with no sandboxing. Engine requirements must be installed, so an absolute path to a python executable in a virtualenv is easiest to manage.
- `GRIST_SANDBOX_FLAVOR=gvisor` runs the data engine via gvisor's runsc. Experimental, with implementation not included in grist-core. Since gvisor runs on Linux only, this flavor supports wrapping the sandboxes in a single shared docker container.
* Tweaks some recent express query parameter code to work in grist-core, which has a slightly different version of express (smoke test doesn't catch this since in Jenkins core is built within a workspace that has node_modules, and wires get crossed - in a dev environment the problem on master can be seen by doing `buildtools/build_core.sh /tmp/any_path_outside_grist`).
The new sandbox options do not have tests yet, nor does this they change the behavior of grist servers today. They are there to clean up and consolidate a collection of patches I've been using that were getting cumbersome, and make it easier to run experiments.
I haven't looked closely at imports beyond core.
Test Plan: tested manually against regular grist and grist-core, including imports
Reviewers: alexmojaki, dsagal
Reviewed By: alexmojaki
Differential Revision: https://phab.getgrist.com/D2942
Summary:
This switches to using stdin/stdout for RPC calls to the sandbox, rather than specially allocated side channels. Plain text error information remains on stderr.
The motivation for the change is to simplify use of sandboxes, some of which support extra file descriptors and some of which don't.
The new style of communication is made the default, but I'm not committed to this, just that it be easy to switch to if needed. It is possible I'll need to switch the communication method again in the near future.
One reason not to make this default would be windows support, which is likely broken since stdin/stdout are by default in text mode.
Test Plan: existing tests pass
Reviewers: dsagal, alexmojaki
Reviewed By: dsagal, alexmojaki
Differential Revision: https://phab.getgrist.com/D2897
Summary:
Run JS with a value for SANDBOX_BUFFERS_DIR, then run test_replay in python with the same value to replay just the python code.
See test_replay.py for more info.
Test Plan:
Record some data, e.g. `SANDBOX_BUFFERS_DIR=manual npm start` or `SANDBOX_BUFFERS_DIR=server ./test/testrun.sh server`.
Then run `SANDBOX_BUFFERS_DIR=server python -m unittest test_replay` from within `core/sandbox/grist` to replay the input from the JS.
Sample of the output will look like this:
```
Checking /tmp/sandbox_buffers/server/2021-06-16T15:13:59.958Z
True
Checking /tmp/sandbox_buffers/server/2021-06-16T15:16:37.170Z
True
Checking /tmp/sandbox_buffers/server/2021-06-16T15:14:22.378Z
True
```
Reviewers: paulfitz, dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2866
Summary:
Replaces https://phab.getgrist.com/D2854
Refactoring of NSandbox:
- Simplify arguments to NSandbox.spawn. Only half the arguments were used depending on the flavour, adding a layer of confusion.
- Ensure the same environment variables are passed to both flavours of sandbox
- Simplify passing down environment variables.
Implement deterministic mode with libfaketime and a seeded random instance.
- Include static prebuilt libfaketime.so.1, may need another solution in future for other platforms.
Recording pycalls:
- Add script recordDocumentPyCalls.js to open a single document outside of tests.
- Refactor out recordPyCalls.ts to support various uses.
- Add afterEach hook to save all pycalls from server tests under $PYCALLS_DIR
- Make docTools usable without mocha.
- Add useLocalDoc and loadLocalDoc for loading non-fixture documents
Test Plan:
Made a document with formulas NOW() and UUID()
Compare two document openings in normal mode:
diff <(test/recordDocumentPyCalls.js samples/d4W6NrzCMNVSVD6nWgNrGC.grist /dev/stdout) \
<(test/recordDocumentPyCalls.js samples/d4W6NrzCMNVSVD6nWgNrGC.grist /dev/stdout)
Output:
< 1623407499.58132,
---
> 1623407499.60376,
1195c1195
< "B": "bd2487f6-63c9-4f02-bbbc-5c0d674a2dc6"
---
> "B": "22e1a4fd-297f-4b86-91a2-bc42cc6da4b2"
`export DETERMINISTIC_MODE=1` and repeat. diff is empty!
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D2857
Summary:
* adds a smoke test to grist-core
* fixes a problem with highlight.js failing to load correctly
* skips survey for default user
* freshens docker build
Utility files in test/nbrowser are moved to core/test/nbrowser, so that gristUtils are available there. This increased the apparent size of the diff as "./" import paths needed replacing with "test/nbrowser/" paths. The utility files are untouched, except for the code to start a server - it now has a small grist-core specific conditional in it.
Test Plan: adds test
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2768
Summary:
* Adds a `SELF_HYPERLINK()` python function, with optional keyword arguments to set a label, the page, and link parameters.
* Adds a `UUID()` python function, since using python's uuid.uuidv4 hits a problem accessing /dev/urandom in the sandbox. UUID makes no particular quality claims since it doesn't use an audited implementation. A difficult to guess code is convenient for some use cases that `SELF_HYPERLINK()` enables.
The canonical URL for a document is mutable, but older versions generally forward. So for implementation simplicity the document url is passed it on sandbox creation and remains fixed throughout the lifetime of the sandbox. This could and should be improved in future.
The URL is passed into the sandbox as a `DOC_URL` environment variable.
The code for creating the URL is factored out of `Notifier.ts`. Since the url is a function of the organization as well as the document, some rejiggering is needed to make that information available to DocManager.
On document imports, the new document is registered in the database slightly earlier now, in order to keep the procedure for constructing the URL in different starting conditions more homogeneous.
Test Plan: updated test
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2759
Summary:
this moves sandbox/grist to core, and adds a requirements.txt
file for reconstructing the content of sandbox/thirdparty.
Test Plan:
existing tests pass.
Tested core functionality manually. Tested docker build manually.
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2563
Summary: This moves enough server material into core to run a home server. The data engine is not yet incorporated (though in manual testing it works when ported).
Test Plan: existing tests pass
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2552