Summary:
Sometimes the model repeats the classes given in the prompt which would mess up extracting the actual formula. This diff solves this by:
1. Changes the generated Python schema so that (a) the thing that needs completing is a plain top level function instead of a property/method inside the class and (2) the classes are fully valid syntax, which makes it easier to
2. Remove classes from the parsed Python code when converting the completion to a formula.
3. Tweak the prompt wording to discourage including classes in general, especially because sometimes the model tries to solve the problem by defining extra methods/attributes/classes.
While I was at it, I changed type hints to use builtins (e.g. `list` instead of `List`) to prevent `from typing import List` which was happening sometimes and would look weird in a formula. Similarly I removed `@dataclass` since that also implies an import, and this also fits with the tweaked wording that the classes are fake.
Test Plan:
Added a new test case to the formula dataset which triggers the unwanted behaviour. The factors that seem to trigger the problem are (1) a small schema so the classes are easier to repeat and (2) the need to import modules, which the model wants to place before all other code. The case failed before this diff and succeeded after. The tweaked wording reduces the chances of repeating the classes but didn't eliminate it, so forcibly removing the classes in Python was needed.
There were also a couple of other existing cases where repeating the classes was observed before but not after.
Overall the score increased from 49 to 51 out of 69 (including the new case). At one point the score was 53, but changes in whitespace were enough to make it drop again.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D4000
* REQUEST now supports POST
* Add extra flag for enabling REQUEST, also update README and comments
Co-authored-by: John Cant <a.jonncant@gmail.com>
Co-authored-by: Alex Hall <alex.mojaki@gmail.com>
Summary: Mostly just changes to tests to make them more flexible.
Test Plan: Python tests pass locally with 3.10 and 3.11. Making tests run in CI on these versions will happen in grist-core.
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: paulfitz
Differential Revision: https://phab.getgrist.com/D3978
Summary:
Upgrades asttokens mainly because I suspected it would have helped with an error in a user's document (see https://grist.slack.com/archives/C0234CPPXPA/p1686145370484509) but we were unable to confirm that. Also adds a related test that I (wrongly) expected to have a similar problem before upgrading asttokens.
Upgrades wrapt for Python 3.11 support. Closes https://github.com/gristlabs/grist-core/issues/534.
Not upgrading the dependencies for Python 2 because pip is giving me errors when I try to install to the `thirdparty` folder. Both these upgrades are motivated by issues specific to Python 3 so this doesn't seem worth pursuing immediately.
Test Plan: Existing tests.
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: paulfitz
Differential Revision: https://phab.getgrist.com/D3973
Summary:
Replaces https://phab.getgrist.com/D3940, particularly to avoid doing potentially unwanted things automatically.
Adds optional fields `evaluateCurrentFormula?: boolean; rowId?: number` to `FormulaAssistanceContext` (part of `AssistanceRequest`). When `evaluateCurrentFormula` is `true`, calls a new function `evaluate_formula` in the sandbox which computes the existing formula in the column (regardless of anything the AI may have suggested) and uses that to generate an additional system message which is added before the user's message. In theory this could be used in an interface where users ask why a formula doesn't work, including possibly a formula suggested by the AI. For now, it's only used in `runCompletion_impl.ts` for experimenting.
Also cleaned up a bit, removing `_chatMode` which is always `true` now, and uses of `regenerate` which is always `false`.
Test Plan: Updated `runCompletion_impl` to optionally use the new feature, in which case it now scores 51/68 instead of 49/68.
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3970
Summary:
The image Jenkins used up to now was from 2021. We updated buildtools/jenkins-ec2/Dockerfile since then but couldn't use new image; we can now that an issue with mounts in gvisor has been resolved.
The new image uses newer Chrome (111 vs 95); 111 is the latest version that works with the webdriver version we are using (after that one, copy-pasting tests fail).
In addition, this adds a more precise way to maintain python dependencies: by specifying top-level dependencies in the new core/sandbox/requirements3.in. An updated build step compiles that into requirements3.txt, and syncs venv to that using pip-sync.
This addresses the issue that previously, removing a module from dependencies would not have caused the build to remove it from sandbox_venv3.
Test Plan: Existing tests pass on the new image.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3952
Summary:
- Replace logger module by the standard module 'logging'.
- When a log message from the sandbox includes newlines (e.g. for tracebacks),
keep those lines together in the Node log message.
Previously each line was a different message, making it difficult to view
tracebacks, particularly in prod where each line becomes a separate message
object.
- Fix assorted lint errors.
Test Plan: Added a test for the log-line splitting and escaping logic.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3956
Summary:
This tweaks the prompting so that the user's message is given on its own instead of as a docstring within Python. This is so that the prompt makes sense when:
- the user asks a question such as "Can you write me a formula which does ...?" rather than describing their formula as a docstring would, or
- the user sends a message that doesn't ask for a formula at all (https://grist.slack.com/archives/C0234CPPXPA/p1687699944315069?thread_ts=1687698078.832209&cid=C0234CPPXPA)
Also added wording for the model to refuse when the user asks for something that the model cannot do.
Because the code (and maybe in some cases the model) for non-ChatGPT models relies on the prompt consisting entirely of Python code produced by the data engine (which no longer contains the user's message) those code paths have been disabled for now. Updating them now seems like undesirable drag, I think it'd be better to revisit this when iteration/experimentation has slowed down and stabilised.
Test Plan:
Added entries to the formula dataset where the response shouldn't contain a formula, indicated by the value `1` for the new column `no_formula`.
This is somewhat successful, as the model does refuse to help in some of the new test cases, but not all. Performance on existing entries also seems a bit worse, but it's hard to distinguish this from random noise. Hopefully this can be remedied in the future with more work, e.g. automatic followup messages containing example inputs and outputs.
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3936
Summary: Converts `rec.` to `$` in AI generated formulas, and removes redundant `return` at the end.
Test Plan: Expanded unit test.
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3933
Summary:
The previous code for extracting a Python formula from the LLM completion involved some shaky string manipulation which this improves on.
Overall the 'test results' from `runCompletion` went from 37/47 to 45/47 for `gpt-3.5-turbo-0613`.
The biggest problem that motivated these changes was that it assumed that code was always inside a markdown code block
(i.e. triple backticks) and so if there was no block there was no code. But the completion often consists of *only* code
with no accompanying explanation or markdown. By parsing the completion in Python instead of JS,
we can easily check if the entire completion is valid Python syntax and accept it if it is.
I also noticed one failure resulting from the completion containing the full function (instead of just the body)
and necessary imports before that function instead of inside. The new parsing moves import inside.
Test Plan: Added a Python unit test
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: paulfitz
Differential Revision: https://phab.getgrist.com/D3922
Summary:
In the future, it may be helpful to capture some output to display in the
formula editor (e.g. the result of calling print()), but for now it's best
to redirect stdout/stderr to avoid excessive logging.
Test Plan: Tested manually.
Reviewers: alexmojaki
Reviewed By: alexmojaki
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3919
Summary:
For grist-static, we want to the data engine to be able to call external/exported JS functions directly,
rather than via the node 'server' living in another thread which requires synchronous communication hackery.
As a step in that direction, this diff changes the exported functions that we care about (guessColInfo and convertFromColumn)
to just using the top-level functions instead of relying on fields in ActiveDoc, namely docData.
For guessColInfo, this is done by directly passing the small amount of metadata that was previously retrieved from the DocData.
For convertFromColumn, disentangling DocData is a lot more complicated, so instead we construct a fresh DocData object using
the required metadata tables which are now passed in by the data engine.
Test Plan: Existing tests
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3913
Summary:
Due to a mishap, two distinct migrations with the same migration
number were introduced into Grist. This diff reconciles them as
best we can, by adding another migration to make sure both desired
changes have run (and running them if not).
Test Plan:
updated a test; checked manually that documents
with different 38 migrations are handled as expected.
Reviewers: georgegevoian, jarek
Reviewed By: georgegevoian, jarek
Differential Revision: https://phab.getgrist.com/D3895
Summary:
This adds a UI panel for managing webhooks. Work started by Cyprien Pindat. You can find the UI on a document's settings page. Main changes relative to Cyprien's demo:
* Changed behavior of virtual table to be more consistent with the rest of Grist, by factoring out part of the implementation of on-demand tables.
* Cell values that would create an error can now be denied and reverted (as for the rest of Grist).
* Changes made by other users are integrated in a sane way.
* Basic undo/redo support is added using the regular undo/redo stack.
* The table list in the drop-down is now updated if schema changes.
* Added a notification from back-end when webhook status is updated so constant polling isn't needed to support multi-user operation.
* Factored out webhook specific logic from general virtual table support.
* Made a bunch of fixes to various broken behavior.
* Added tests.
The code remains somewhat unpolished, and behavior in the presence of errors is imperfect in general but may be adequate for this case.
I assume that we'll soon be lifting the restriction on the set of domains that are supported for webhooks - otherwise we'd want to provide some friendly way to discover that list of supported domains rather than just throwing an error.
I don't actually know a lot about how the front-end works - it looks like tables/columns/fields/sections can be safely added if they have string ids that won't collide with bone fide numeric ids from the back end. Sneaky.
Contains a migration, so needs an extra reviewer for that.
Test Plan: added tests
Reviewers: jarek, dsagal
Reviewed By: jarek, dsagal
Differential Revision: https://phab.getgrist.com/D3856
* add support for conversational state to assistance endpoint
This refactors the assistance code somewhat, to allow carrying
along some conversational state. It extends the OpenAI-flavored
assistant to make use of that state to have a conversation.
The front-end is tweaked a little bit to allow for replies that
don't have any code in them (though I didn't get into formatting
such replies nicely).
Currently tested primarily through the runCompletion script,
which has been extended a bit to allow testing simulated
conversations (where an error is pasted in follow-up, or
an expected-vs-actual comparison).
Co-authored-by: George Gevoian <85144792+georgegevoian@users.noreply.github.com>
Summary:
- When importing into a Ref column, use lookupOne() formula for correct previews.
- When selecting columns to import into a Ref column, now a Numeric column like
'Order' will produce two options: "Order" and "Order (as row ID)".
- Fixes exports to correct the formatting of visible columns. This addresses multiple bugs:
1. Formatting wasn't used, e.g. a Ref showing a custom-formatted date was still presented as YYYY-MM-DD in CSVs.
2. Ref showing a Numeric column was formatted as if a row ID (e.g. `Table1[1.5]`), which is very wrong.
- If importing into a table that doesn't have a primary view, don't switch page after import.
Refactorings:
- Generalize GenImporterView to be usable in more cases; removed near-duplicated logic from node side
- Some other refactoring in importing code.
- Fix field/column option selection in ValueParser
- Add NUM() helper to turn integer-valued floats into ints, useful for "as row ID" lookups.
Test Plan: Added test cases for imports into reference columns, updated Exports test fixtures.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3875
Summary:
This addresses two issues, differently:
- For a formula with leading whitespace, like " 1+1", it is stored as is, but
is fixed to work (it should be valid Python, and whitespace is only stripped out
at parsing time to avoid intentation errors caused by the way it gets parsed)
- For a formula with a leading equals-sign ("="), it is stripped out on the
client side before the formula is stored. Grist documentation uses leading
"=" to indicate formulas (because UI shows an "=" icon), and Excel formulas
actually contain the leading "=", so it is a common mistake to include it.
Test Plan: Added new test cases
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3873
Summary: Also adds an explanatory comment for a recently-added column.
Test Plan: N/A
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3854
Summary:
When a formula used a local variable referring to a user table (which is
a global name), e.g. `myvar = MyTable; myvar.lookupOne(...)`, renaming
logic mistakenly used the inferred name (`MyTable`) in places where the
actual variable name (`myvar`) should have been used.
Additionally, we should not rename local variables, even if they match a
global name.
This fixes both issues.
Test Plan: Added a test case that checks that local variables aren't renamed.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3846
Test Plan: Added a test case to trigger this situation.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3832
Summary:
The problem is that the implementation for a summary update was relying on type consistency to get columns (ie: matches agains colId and type).
Type consistency is an attempt at maintaining consistent type across same-named column for summaries of same table.
But the problem is that the consistency of types is NOT a strict guarantee or an invariant, more of a best-effort attempt (there are too many possible sequences of operations possible with renaming/adding/removing in summary tables and the underlying table).
With current implementation and with a document violating the type consistency, a summary table could end up with fields referencing columns to the former summary table (more detail below(1)). Which is a bad state (yields js errors on the client).
This diff fixes this issue by relaxing the type comparison when search for same-named column.
(1) __Below is a description of how a violation of type consistency could end-up in bad state document (example taken from the reported bug):__
> In this document, let's assume two summary tables `Table1 [by A]` and `Table1 [Totals]`. Let's also assume Table1 and `Table1 [Totals]` both have an `Amount(Numeric)` column, and that `Table1 [by A]` has one `Amount(Any)` column (violating the type consistency principle). Now when users wanted to change the `Table1 [Totals]` section to group by 'A', grist found that there is already a summary table with same grouping. But it couldn't find a matching column for `Amount(Numeric)` so it created a new one. Except that because there was still an `Amount(Any)` the new column was named `Amount2` which caused following code to ignore it and in particular forgetting to update it's corresponding section's field which was then pointing toward the column of a different table (which is bad).
Test Plan: Added python test.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3809
Summary:
- For python2, skip some tests of renaming which produce different results
because of an un-upgradable astroid version.
- Fix test affected by pyCall() having changed to async; avoid hanging timeout
callback in case of error.
Test Plan: All test cases should now pass (with 4 getting skipped)
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3819
This adds a new `GRIST_SANDBOX_FLAVOR=pyodide` option where the
version of Python used for the data engine is wasm, and so can
be run by node like the rest of the back end. It still runs as
a separate process.
There are a few small version changes made to packages to avoid
various awkwardnesses present in the current versions. All existing
tests pass.
This is very experimental. To use, you'll need something with
a bash shell and make. First do:
```
cd sandbox/pyodide
make setup # README.md and Makefile have details
cd ..
```
Then running Grist as:
```
GRIST_SANDBOX_FLAVOR=pyodide yarn start
```
should work. Adding a formula with content:
```
import sys; return sys.version
```
should return a different Python version than other sandboxes.
The motivation for this work is to have a form of sandboxing
that will work on Windows for Grist Electron (for Linux we have
gvisor/runsc, for Mac we have sandbox-exec, but I haven't found
anything comparable for Windows).
It also brings a back-end-free version of Grist a bit closer, for
use-cases where that would make sense - such as serving a report
(in the form of a Grist document) on a static site.
Summary:
Whole numbers, when imported from Excel into a Text column show up
without decimals (e.g. "300"), but when imported from Google Sheets show
up with decimals (e.g. "300.0"). The decimals are hard for end-users to
remove. Fix by treating whole numbers consistently as ints.
Test Plan: Added a fixture reproducing the issue, and a test case.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3800
Summary:
- A bunch of optimizations guided by python profiling (esp. py-spy)
- Big one is optimizing Record/RecordSet attribute access
- Adds tracemalloc printout when running test_replay with PYTHONTRACEMALLOC=1 (on PY3)
(but memory size is barely affected by these changes)
- Testing with RECORD_SANDBOX_BUFFERS_DIR, loading and calculating a particular
very large doc (CRM), time taken improved from 73.9s to 54.8s (26% faster)
Test Plan: No behavior changes intended; relying on existing tests to verify that.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3781
Summary: This is a backend part for the formula AI.
Test Plan: New tests
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: cyprien
Differential Revision: https://phab.getgrist.com/D3786
There was no script for updating typescript schema information after
a python-based document migration. Moving one in here, along with its
test. Tweaked the code slightly to work with grist-core's directory
structure. Also fixed a formatting error in mocha calls that was resulting
in some root tests not running.
Summary:
Fix for a bug that prevented two users to change column types at
the same time.
Test Plan: Added and updated
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3745
Summary:
for users who don't automatically have deep rights
to the document, provide them with attachment metadata only
for rows they have access to. This is a little tricky to
do efficiently. We provide attachment metadata when an
individual table is fetched, rather than on initial document
load, so we don't block that load on a full document scan.
We provide attachment metadata to a client when we see that
we are shipping rows mentioning particular attachments,
without making any effort to keep track of the metadata they
already have.
Test Plan: updated tests
Reviewers: dsagal, jarek
Reviewed By: dsagal, jarek
Differential Revision: https://phab.getgrist.com/D3722
Summary:
This is just a convenience for myself. I happen to have a version of
gvisor on my Linux dev machine that differs from what we use in our
containers. There's a small difference in user setup that only manifests
itself when importing files. Grist uses a directory readable only by
the creating user, created outside the container, and then accessed
within the container. For that to work, the user identities have to
line up exactly. This adds a flag I can set in my environment to make
things work. An alternative solution that doesn't require a flag
would be to make the temporary directories readable by other users,
but that seemed a bigger change than justified.
Ideally we'd make a very robust and easy to run sandbox for Linux
users, and I have ideas there for the future.
Test Plan: manual
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D3742
Summary:
Adds a new UI for writing access rule memos.
Migrates old memos (written as Python comments) to the new UI.
Test Plan: Browser and migration tests.
Reviewers: jarek, dsagal
Reviewed By: jarek
Subscribers: dsagal, paulfitz
Differential Revision: https://phab.getgrist.com/D3726