Commit Graph

18 Commits

Author SHA1 Message Date
George Gevoian
ebcfd2074f Fix bug that skips empty columns during imports
A faulty conditional in _makeDefaultTransformRule was the cause of the
bug. The conditional isn't necessary, as it's unreachable from the
import flows, so it was removed.
2022-08-11 11:05:30 -07:00
Paul Fitzpatrick
ec8ab598cb (core) add a yarn run cli tool, and add a sqlite gristify option
Summary:
This adds rudimentary support for opening certain SQLite files in Grist.

If you have a file such as `landing.db` in Grist, you can convert it to Grist format by doing (either in monorepo or grist-core):
```
yarn run cli -h
yarn run cli sqlite -h
yarn run cli sqlite gristify landing.db
```

The file is now openable by Grist. To actually do so with the regular Grist server, you'll need to either import it, or convert some doc you don't care about in the `samples/` directory to be a soft link to it (and then force a reload).

This implementation is a rudimentary experiment. Here are some awkwardnesses:
 * Only tables that happen to have a column called `id`, and where the column happens to be an integer, can be opened directly with Grist as it is today. That could be generalized, but it looked more than a Gristathon's worth of work, so I instead used SQLite views.
 * Grist will handle tables that start with an uncapitalized letter a bit erratically. You can successfully add columns, for example, but removing them will cause sadness - Grist will rename the table in a confused way.
 * I didn't attempt to deal with column names with spaces etc (though views could deal with those).
 * I haven't tried to do any fancy type mapping.
 * Columns with constraints can make adding new rows impossible in Grist, since Grist requires that a row can be added with just a single cell set.

Test Plan: added small test

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3502
2022-07-14 12:00:30 -04:00
Dmitry S
51ff72c15e (core) Faster builds all around.
Summary:
Building:
- Builds no longer wait for tsc for either client, server, or test targets. All use esbuild which is very fast.
- Build still runs tsc, but only to report errors. This may be turned off with `SKIP_TSC=1` env var.
- Grist-core continues to build using tsc.
- Esbuild requires ES6 module semantics. Typescript's esModuleInterop is turned
  on, so that tsc accepts and enforces correct usage.
- Client-side code is watched and bundled by webpack as before (using esbuild-loader)

Code changes:
- Imports must now follow ES6 semantics: `import * as X from ...` produces a
  module object; to import functions or class instances, use `import X from ...`.
- Everything is now built with isolatedModules flag. Some exports were updated for it.

Packages:
- Upgraded browserify dependency, and related packages (used for the distribution-building step).
- Building the distribution now uses esbuild's minification. babel-minify is no longer used.

Test Plan: Should have no behavior changes, existing tests should pass, and docker image should build too.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3506
2022-07-04 10:42:40 -04:00
Dmitry S
309ddb0fe7 (core) Move guessing logic for column types to run in node once for all columns.
Summary:
Previously, columns of type Any were created and modified one by one by reusing
the "empty column" logic from the data engine. This copies that logic to Node,
and sets the type of all columns together, to create them with the correct type
in the AddTable call.

This makes imports about twice faster (when slowness is due to many columns),
but doesn't address all cases where individual handling of columns causes slowness.

Test Plan: Added a test case for the new helper function.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3427
2022-05-19 12:49:51 -04:00
Alex Hall
6c90de4d62 (core) Switch excel import parsing from messytables+xlrd to openpyxl, and ignore empty rows
Summary:
Use openpyxl instead of messytables (which used xlrd internally) in import_xls.py.

Skip empty rows since excel files can easily contain huge numbers of them.

Drop support for xls files (which openpyxl doesn't support) in favour of the newer xlsx format.

Fix some details relating to python virtualenvs and dependencies, as Jenkins was failing to find new Python dependencies.

Test Plan: Mostly relying on existing tests. Updated various tests which referred to xls files instead of xlsx. Added a Python test for skipping empty rows.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3406
2022-05-12 14:43:21 +02:00
George Gevoian
ad04744b4a (core) Fix import bug when skipping non-text columns
Summary:
Skipping columns during incremental imports wasn't working for certain
column types, such as numeric columns. The column's default value was
being used instead (e.g. 0), overwriting values in the destination
table.

Test Plan: Browser tests.

Reviewers: jarek

Reviewed By: jarek

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3402
2022-04-28 12:46:44 -07:00
George Gevoian
f02174eb7e (core) Fix error when canceling import
Summary:
If cancel was clicked while a transform section was still being
generated in the Importer, an error was thrown. This refactors
the cancelImportFiles API action to take in the file upload id
in place of the entire DataSourceTransformed parameter, which
contains other values that are irrelevant to canceling. One of those
values, the transform section id, was causing the error to be thrown
since it was momentarily null.

Test Plan: Tested manually.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Differential Revision: https://phab.getgrist.com/D3317
2022-03-10 16:24:49 -08:00
Alex Hall
321019217d (core) Lossless imports
Summary:
- Removed string parsing and some type guessing code from parse_data.py. That logic is now implicitly done by ValueGuesser by leaving the initial column type as Any. parse_data.py mostly comes into play when importing files (e.g. Excel) containing values that already have types, i.e. numbers and dates.
- 0s and 1s are treated as numbers instead of booleans to keep imports lossless.
- Removed dateguess.py and test_dateguess.py.
- Changed what `guessDateFormat` does when multiple date formats work equally well for the given data, in order to be consistent with the old dateguess.py.
- Columns containing numbers are now always imported as Numeric, never Int.
- Removed `NullIfEmptyParser` because it was interfering with the new system. Its purpose was to avoid pointlessly changing a column from Any to Text when no actual data was inserted. A different solution to that problem was already added to `_ensure_column_accepts_data` in the data engine in a recent related diff.

Test Plan:
- Added 2 `nbrowser/Importer2` tests.
- Updated various existing tests.
- Extended testing of `guessDateFormat`. Added `guessDateFormats` to show how ambiguous dates are handled internally.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3302
2022-03-08 12:14:39 +02:00
Edward Betts
d6e0e1fee3 Correct spelling mistakes 2022-02-19 09:46:49 +00:00
George Gevoian
6abe7d5827 (core) Use original column headers during imports
Summary:
When possible, the original column headers from imported
files will now be used as the labels for Grist columns. This includes
values that were previously invalid Grist column identifiers, such
as those containing Unicode.

Test Plan: Updated server and browser tests.

Reviewers: jarek

Reviewed By: jarek

Differential Revision: https://phab.getgrist.com/D3261
2022-02-13 16:50:19 -08:00
Alex Hall
d1a848b44a (core) Parse string cell values in Doc API and Imports
Summary:
- Adds a function `parseUserAction` for parsing strings in UserActions to `ValueParser.ts`
- Adds a boolean option `parseStrings` to use `parseUserAction` in `ActiveDoc.applyUserActions`, off by default.
- Uses `parseStrings` by default in DocApi (set `?noparse=true` in a request to disable) when adding/updating records through the `/data` or `/records` endpoints or in general with the `/apply` endpoint.
- Uses `parseStrings` for various actions in `ActiveDocImport`. Since most types are parsed in Python before these actions are constructed, this only affects references, which still look like errors in the import preview. Importing references can also easily still run into more complicated problems discussed in https://grist.slack.com/archives/C0234CPPXPA/p1639514844028200

Test Plan:
- Added tests to DocApi to compare behaviour with and without string parsing.
- Added a new browser test, fixture doc, and fixture CSV to test importing a file containing references.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3183
2021-12-17 15:40:58 +02:00
Jarosław Sadziński
1ae586cf42 (core) Adding Skip options when importing multiple tables.
Summary:
Adding new destination "Skip" for multiple table imports.
Selecting this destination skips the import and makes the preview grayed out.

Test Plan: New Tests

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3181
2021-12-13 19:07:33 +01:00
George Gevoian
c6aa9b65d4 (core) Fix bug preventing importing of nested json files
Summary:
BulkAddRecord when finishing imports of nested JSON was throwing
an error due to unchecked access of referencing tables. This adds
a guard to prepare_new_values to handle such cases.

Imports happened to cause this to occur because the order that
imported tables are created/populated isn't aware of references
between tables, so it's possible for a reference column to
exist (momentarily) without a valid reference to another table.
These references are currently fixed after all imported tables are
created/populated.

Test Plan: Browser test.

Reviewers: dsagal

Reviewed By: dsagal

Subscribers: dsagal

Differential Revision: https://phab.getgrist.com/D3144
2021-11-18 17:06:03 -08:00
George Gevoian
08b1286f4f (core) Add column matching to Importer
Summary:
The Importer dialog is now maximized, showing additional column
matching options and information on the left, with the preview
table shown on the right. Columns can be mapped via a select menu
listing all source columns, or by clicking a formula field next to
the menu and directly editing the transform formula.

Test Plan: Browser tests.

Reviewers: jarek

Reviewed By: jarek

Differential Revision: https://phab.getgrist.com/D3096
2021-11-09 12:30:52 -08:00
George Gevoian
62db263d1f (core) Add diff preview to Importer
Summary:
Updates the preview table in Importer to show a diff of changes
when importing into an existing table and updating existing records.

Test Plan: Browser tests.

Reviewers: paulfitz

Reviewed By: paulfitz

Differential Revision: https://phab.getgrist.com/D3060
2021-10-08 14:15:07 -07:00
George Gevoian
e1780e4f58 (core) Migrate import code from data engine to Node
Summary:
Finishing imports now occurs in Node instead of the
data engine, which makes it possible to import into
on-demand tables. Merging code was also refactored
and now uses a SQL query to diff source and destination
tables in order to determine what to update or add.

Also fixes a bug where incremental imports involving
Excel files with multiple sheets would fail due to the UI
not serializing merge options correctly.

Test Plan: Browser tests.

Reviewers: jarek

Reviewed By: jarek

Differential Revision: https://phab.getgrist.com/D3046
2021-10-04 10:27:00 -07:00
George Gevoian
8a7edb6257 (core) Enable incremental imports
Summary:
The import dialog now has an option to 'Update existing records',
which when checked will allow for selection of 1 or more fields
to match source and destination tables on.

If all fields match, then the matched record in the
destination table will be merged with the incoming record
from the source table. This means the incoming values will
replace the destination table values, unless the incoming
values are blank.

Additional merge strategies are implemented in the data
engine, but the import dialog only uses one of the
strategies currently. The others can be exposed in the UI
in the future, and tweak the behavior of how source
and destination values should be merged in different contexts,
such as when blank values exist.

Test Plan: Python and browser tests.

Reviewers: paulfitz

Reviewed By: paulfitz

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3020
2021-09-16 09:15:54 -07:00
Paul Fitzpatrick
5ef889addd (core) move home server into core
Summary: This moves enough server material into core to run a home server.  The data engine is not yet incorporated (though in manual testing it works when ported).

Test Plan: existing tests pass

Reviewers: dsagal

Reviewed By: dsagal

Differential Revision: https://phab.getgrist.com/D2552
2020-07-21 20:39:10 -04:00