A small test harness bundle was recently added that is breaking the docker image build. It could be added to the docker image, but that would introduce a bunch of extraneous test file dependencies. So this tweaks the build to simply skip the test bundle if its primary source file is not found.
Also added some other test fixes along the way:
* make a custom widget test more reliable
* update a localization test now that `pt` exists
* store more log info in artifact on error
There's a little nest of SelectBy tests that sometimes fail.
They are the only tests with an import of a helper function from
an other file that contains tests. Such imports have caused trouble
with mocha in the past. I'm not sure if that is the case now, but
I'd like to eliminate it as a possibility.
Summary:
This tweaks the prompting so that the user's message is given on its own instead of as a docstring within Python. This is so that the prompt makes sense when:
- the user asks a question such as "Can you write me a formula which does ...?" rather than describing their formula as a docstring would, or
- the user sends a message that doesn't ask for a formula at all (https://grist.slack.com/archives/C0234CPPXPA/p1687699944315069?thread_ts=1687698078.832209&cid=C0234CPPXPA)
Also added wording for the model to refuse when the user asks for something that the model cannot do.
Because the code (and maybe in some cases the model) for non-ChatGPT models relies on the prompt consisting entirely of Python code produced by the data engine (which no longer contains the user's message) those code paths have been disabled for now. Updating them now seems like undesirable drag, I think it'd be better to revisit this when iteration/experimentation has slowed down and stabilised.
Test Plan:
Added entries to the formula dataset where the response shouldn't contain a formula, indicated by the value `1` for the new column `no_formula`.
This is somewhat successful, as the model does refuse to help in some of the new test cases, but not all. Performance on existing entries also seems a bit worse, but it's hard to distinguish this from random noise. Hopefully this can be remedied in the future with more work, e.g. automatic followup messages containing example inputs and outputs.
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3936
Summary:
This uses a newer version of mocha in grist-core so that tests can be run in parallel. That allows more tests to be moved without slowing things down overall. Tests moved are venerable browser tests; only the ones that "just work" or worked without too much trouble to are moved, in order to keep the diff from growing too large. Will wrestle with more in follow up.
Parallelism is at the file level, rather than the individual test.
The newer version of mocha isn't needed for grist-saas repo; tests are parallelized in our internal CI by other means. I've chosen to allocate files to workers in a cruder way than our internal CI, based on initial characters rather than an automated process. The automated process would need some reworking to be compatible with mocha running in parallel mode.
Test Plan: this diff was tested first on grist-core, then ported to grist-saas so saas repo history will correctly track history of moved files.
Reviewers: jarek
Reviewed By: jarek
Subscribers: jarek
Differential Revision: https://phab.getgrist.com/D3927
Summary:
Some edits to virtual tables (such as webhook lists) happen
via a route that was not yet handled. Actually Cyprien (the
original author) had handled this case but it got removed
because I didn't know what it was for :-). This brings back
support for edits by this route.
Test Plan: added a test
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3924
Summary: I looked through the template documents mentioned in `formula-dataset-index.csv` and selected formulas involving lookups to add to the CSV, particularly nontrivial formulas.
Test Plan: Running the test script on the new dataset gives a score of 47/61 compared to the previous 45/47, i.e. it scores 2/14 on the new entries. Lookups are clearly challenging and we'll need to add more information to the prompt, maybe even consider a more complicated strategy than a single prompt. This diff is purely for expanding the dataset, improving performance will come later.
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3931
Summary: Converts `rec.` to `$` in AI generated formulas, and removes redundant `return` at the end.
Test Plan: Expanded unit test.
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3933
Summary:
The GristDocTutorial table is now always visible to users with edit
access to the trunk, and the Share menu is now available within
tutorial forks, making it easier for editors to replace the original
tutorial trunk with changes made in the fork, and for viewers to export
their copy of the tutorial.
Also, changes to the GristDocTutorial table are now immediately reflected
in the tutorial popup.
Test Plan: Browser tests.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3930
Summary:
- Cleaning stripe data after billing tests
- Better stripe webhook test integration, that should fix test interference
- Not importing why-is-node-running when its not needed, which improves dev experience.
Test Plan: Modified
Reviewers: dsagal
Reviewed By: dsagal
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3932
Summary:
- Fly flyctl cleanup which has been failing; deleting volumes before app no longer seems needed
- Move fly-deploy and add workflow scripts to make fly preview work in grist-core too
Test Plan: Tested that previews still work in this repo; and tested with a copy in grist-core.
Reviewers: jarek
Reviewed By: jarek
Subscribers: jarek
Differential Revision: https://phab.getgrist.com/D3928
Summary:
1. Introduces another highlight for link-selector rows, with the same color as
regular selection, and allowing to overlap with regular selection.
2. Don't show "secondary" cursors (those in inactive sections), to keep a single
cursor on the screen, since having multiple (which different in color) could
cause confusion.
3. An unrelated improvement (prompted by a new fixture doc) is to default the
active section to the top-left one (rather than the one with smallest rowId).
4. Another unrelated improvement (prompted by a test affected by the previous unrelated improvement) is to skip chart widgets when searching (previously search would step through those with an invisible "cursor").
Includes also tweaks for better testing on Arm-based Macs:
- Add support for TEST_CHROME_BINARY_PATH environment variable (helpful for a Mac arm64 architecture workaround)
- Remove unsetting of SELENIUM_REMOTE_URL when running headless (unlikely to affect anyone, and can be done outside the script, but interferes with the Mac workaround)
Test Plan: Added a new test case that cursor and linking-selector CSS classes are present or absent appropriately. Fixed test affected by the fix to default active section.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3891
Summary:
It became hard to detect aborted connections in node 16.
In node 14, req.on('close', ...) did the job. Thid diff adds a
work-around, until a better way is discovered or added.
Aborting a req will typically lead to 'close' being called
on the response, without writableFinished being set.
- https://github.com/nodejs/node/issues/38924
- https://github.com/nodejs/node/issues/40775
Test Plan:
existing DocApiForwarder test passes; manually
checking on various node versions.
Reviewers: JakubSerafin
Reviewed By: JakubSerafin
Differential Revision: https://phab.getgrist.com/D3923
Summary:
The previous code for extracting a Python formula from the LLM completion involved some shaky string manipulation which this improves on.
Overall the 'test results' from `runCompletion` went from 37/47 to 45/47 for `gpt-3.5-turbo-0613`.
The biggest problem that motivated these changes was that it assumed that code was always inside a markdown code block
(i.e. triple backticks) and so if there was no block there was no code. But the completion often consists of *only* code
with no accompanying explanation or markdown. By parsing the completion in Python instead of JS,
we can easily check if the entire completion is valid Python syntax and accept it if it is.
I also noticed one failure resulting from the completion containing the full function (instead of just the body)
and necessary imports before that function instead of inside. The new parsing moves import inside.
Test Plan: Added a Python unit test
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: paulfitz
Differential Revision: https://phab.getgrist.com/D3922
Not only GREP_TESTS can be assigned a single word like:
GREP_TESTS=DocApi yarn test
But also can be assigned a whole sentence part:
GREP_TESTS="supports ascending sort" yarn test
That's especially useful to run a single test (and not a whole test
suit)
Co-authored-by: Florent FAYOLLE <florent.fayolle@beta.gouv.fr>
Summary:
On Firefox and Safari, setting scrollLeft to a max safe integer was
causing it to be treated as 0. It's not clear why - for now, the
scrollWidth is used instead.
Also fixes a bug where the column title popup wouldn't appear for a
new column if tab was previously used to close the same popup for
the last column.
Test Plan: Browser test.
Reviewers: JakubSerafin
Reviewed By: JakubSerafin
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3911
Summary:
Previously we failed to log signup info for users who signed up via
Google. This fixes that issue by recording it on first post-signup
visit. It also includes signup as a new telemetry event, recorded at the
same point.
Test Plan: Tested locally to see that a signup produces an appropriate log message and telemetry event.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3921
Summary:
In the future, it may be helpful to capture some output to display in the
formula editor (e.g. the result of calling print()), but for now it's best
to redirect stdout/stderr to avoid excessive logging.
Test Plan: Tested manually.
Reviewers: alexmojaki
Reviewed By: alexmojaki
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3919
Summary:
- Move css module for the login page css to core/, to be reusable in core/ pages.
- Move /welcome/teams implementation to WelcomeSitePicker.ts
- List users for personal sites, as well as team sites.
- Add org param to setSessionActive() API method and end endpoint, to allow
switching the specified org to another user.
- Add a little safety to getOrgUrl() function.
Test Plan: Added a test case for the new behaviors of the /welcome/teams page.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3914
Summary:
The issue is in the app due to the possibility of subtle differences in
order of events. It's hard to trigger a wrong order, but the fix is
intended to make it impossible.
Test Plan: Running test with a bunch of iterations, to see how reliable it is
Reviewers: georgegevoian
Reviewed By: georgegevoian
Subscribers: georgegevoian
Differential Revision: https://phab.getgrist.com/D3918
Summary:
This adds a `yarn cli settings telemetry [--json] [--all]` command
that allows telemetry settings to be inspected. It is useful for
keeping documentation about telemetry up to date.
Test Plan:
manual (a bit cheeky; justified on basis of breakage
not being very important yet, this is essentially an internal
feature)
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3917
Summary: Also fixes a few small bugs with telemetry collection.
Test Plan: Server and manual tests.
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3915
Summary:
For grist-static, we want to the data engine to be able to call external/exported JS functions directly,
rather than via the node 'server' living in another thread which requires synchronous communication hackery.
As a step in that direction, this diff changes the exported functions that we care about (guessColInfo and convertFromColumn)
to just using the top-level functions instead of relying on fields in ActiveDoc, namely docData.
For guessColInfo, this is done by directly passing the small amount of metadata that was previously retrieved from the DocData.
For convertFromColumn, disentangling DocData is a lot more complicated, so instead we construct a fresh DocData object using
the required metadata tables which are now passed in by the data engine.
Test Plan: Existing tests
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3913
Summary:
Adds support for optional telemetry to grist-core.
A new environment variable, GRIST_TELEMETRY_LEVEL, controls the level of telemetry collected.
Test Plan: Server and unit tests.
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: dsagal, anaisconce
Differential Revision: https://phab.getgrist.com/D3880
Summary:
Filter bar buttons used to overflow, which required horizontal scrolling. Buttons
now wrap instead.
Test Plan: Tested manually.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3901
Summary:
sanitazing errors output in webhooks to protect users data (not show them in logs and other places).
Because redis is returing whole payload when error occur, best approach is to hijack exception as close to redis operation as posible and sanitize the data.
We need to know data structure do do this corretly tho. Currently I decided to just censore everything that has "payload" key.
Test Plan: Because logs that need to be sanitized come from redis, to be valid tested we should force redis to crash. It's hard to do in our integration test setup. In this moment, unit test is all we got.
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3905
Summary:
Adding a way to detach an editor. Initially only implemented for the formula editor, includes redesign for the AI part.
- Initially, the detached editor is tight with the formula assistant and both are behind GRIST_FORMULA_ASSISTANT flag, but this can be relaxed
later on, as the detached editor can be used on its own.
- Detached editor is only supported in regular fields and on the creator panel. It is not supported yet for conditional styles, due to preview limitations.
- Old code for the assistant was removed completely, as it was only a temporary solution, but the AI conversation part was copied to the new one.
- Prompting was not modified in this diff, it will be included in the follow-up with more test cases.
Test Plan: Added only new tests; existing tests should pass.
Reviewers: JakubSerafin
Reviewed By: JakubSerafin
Differential Revision: https://phab.getgrist.com/D3863
Summary:
- Move makeXLSX* methods to workerExporter file to avoid the risk of creating a piscina worker pool from a thread.
- Increase request timeout in ExportsAccessRules test that started failing occasionally
Test Plan: Test should succeed more reliably
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3910
Summary:
- Excel exports were awfully memory-inefficient, causing occasional docWorker
crashes. The fix is to use the "streaming writer" option of ExcelJS
https://github.com/exceljs/exceljs#streaming-xlsx-writercontents. (Empirically
on one example, max memory went down from 3G to 100M)
- It's also CPU intensive and synchronous, and can block node for tens of
seconds. The fix is to use a worker-thread. This diff uses "piscina" library
for a pool of threads.
- Additionally, adds ProcessMonitor that logs memory and cpu usage,
particularly when those change significantly.
- Also introduces request cancellation, so that a long download cancelled by
the user will cancel the work being done in the worker thread.
Test Plan:
Updated previous export tests; memory and CPU performance tested
manually by watching output of ProcessMonitor.
Difference visible in these log excerpts:
Before (total time to serve request 22 sec):
```
Telemetry processMonitor heapUsedMB=2187, heapTotalMB=2234, cpuAverage=1.13, intervalMs=17911
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0.66, intervalMs=5005
Telemetry processMonitor heapUsedMB=2188, heapTotalMB=2234, cpuAverage=0, intervalMs=5005
Telemetry processMonitor heapUsedMB=71, heapTotalMB=75, cpuAverage=0.13, intervalMs=5002
```
After (total time to server request 18 sec):
```
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=0.5, intervalMs=5001
Telemetry processMonitor heapUsedMB=109, heapTotalMB=144, cpuAverage=1.39, intervalMs=5002
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.13, intervalMs=5000
Telemetry processMonitor heapUsedMB=94, heapTotalMB=131, cpuAverage=1.35, intervalMs=5001
```
Note in "Before" that heapTotalMB goes up to 2GB in the first case, and "intervalMs" of 17 seconds indicates that node was unresponsive for that long. In the second case, heapTotalMB stays low, and the main thread remains responsive the whole time.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3906
Test Plan:
Tested listed versions against docs.getgrist.com using browserstack, and tested
with an iPhone and local server that new settings are noticed on mobile.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3909