Summary:
Previously, columns of type Any were created and modified one by one by reusing
the "empty column" logic from the data engine. This copies that logic to Node,
and sets the type of all columns together, to create them with the correct type
in the AddTable call.
This makes imports about twice faster (when slowness is due to many columns),
but doesn't address all cases where individual handling of columns causes slowness.
Test Plan: Added a test case for the new helper function.
Reviewers: alexmojaki
Reviewed By: alexmojaki
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3427
Summary:
Use openpyxl instead of messytables (which used xlrd internally) in import_xls.py.
Skip empty rows since excel files can easily contain huge numbers of them.
Drop support for xls files (which openpyxl doesn't support) in favour of the newer xlsx format.
Fix some details relating to python virtualenvs and dependencies, as Jenkins was failing to find new Python dependencies.
Test Plan: Mostly relying on existing tests. Updated various tests which referred to xls files instead of xlsx. Added a Python test for skipping empty rows.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3406
Summary:
Skipping columns during incremental imports wasn't working for certain
column types, such as numeric columns. The column's default value was
being used instead (e.g. 0), overwriting values in the destination
table.
Test Plan: Browser tests.
Reviewers: jarek
Reviewed By: jarek
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3402
Summary:
If cancel was clicked while a transform section was still being
generated in the Importer, an error was thrown. This refactors
the cancelImportFiles API action to take in the file upload id
in place of the entire DataSourceTransformed parameter, which
contains other values that are irrelevant to canceling. One of those
values, the transform section id, was causing the error to be thrown
since it was momentarily null.
Test Plan: Tested manually.
Reviewers: alexmojaki
Reviewed By: alexmojaki
Differential Revision: https://phab.getgrist.com/D3317
Summary:
- Removed string parsing and some type guessing code from parse_data.py. That logic is now implicitly done by ValueGuesser by leaving the initial column type as Any. parse_data.py mostly comes into play when importing files (e.g. Excel) containing values that already have types, i.e. numbers and dates.
- 0s and 1s are treated as numbers instead of booleans to keep imports lossless.
- Removed dateguess.py and test_dateguess.py.
- Changed what `guessDateFormat` does when multiple date formats work equally well for the given data, in order to be consistent with the old dateguess.py.
- Columns containing numbers are now always imported as Numeric, never Int.
- Removed `NullIfEmptyParser` because it was interfering with the new system. Its purpose was to avoid pointlessly changing a column from Any to Text when no actual data was inserted. A different solution to that problem was already added to `_ensure_column_accepts_data` in the data engine in a recent related diff.
Test Plan:
- Added 2 `nbrowser/Importer2` tests.
- Updated various existing tests.
- Extended testing of `guessDateFormat`. Added `guessDateFormats` to show how ambiguous dates are handled internally.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3302
Summary:
When possible, the original column headers from imported
files will now be used as the labels for Grist columns. This includes
values that were previously invalid Grist column identifiers, such
as those containing Unicode.
Test Plan: Updated server and browser tests.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3261
Summary:
- Adds a function `parseUserAction` for parsing strings in UserActions to `ValueParser.ts`
- Adds a boolean option `parseStrings` to use `parseUserAction` in `ActiveDoc.applyUserActions`, off by default.
- Uses `parseStrings` by default in DocApi (set `?noparse=true` in a request to disable) when adding/updating records through the `/data` or `/records` endpoints or in general with the `/apply` endpoint.
- Uses `parseStrings` for various actions in `ActiveDocImport`. Since most types are parsed in Python before these actions are constructed, this only affects references, which still look like errors in the import preview. Importing references can also easily still run into more complicated problems discussed in https://grist.slack.com/archives/C0234CPPXPA/p1639514844028200
Test Plan:
- Added tests to DocApi to compare behaviour with and without string parsing.
- Added a new browser test, fixture doc, and fixture CSV to test importing a file containing references.
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3183
Summary:
Adding new destination "Skip" for multiple table imports.
Selecting this destination skips the import and makes the preview grayed out.
Test Plan: New Tests
Reviewers: georgegevoian
Reviewed By: georgegevoian
Differential Revision: https://phab.getgrist.com/D3181
Summary:
BulkAddRecord when finishing imports of nested JSON was throwing
an error due to unchecked access of referencing tables. This adds
a guard to prepare_new_values to handle such cases.
Imports happened to cause this to occur because the order that
imported tables are created/populated isn't aware of references
between tables, so it's possible for a reference column to
exist (momentarily) without a valid reference to another table.
These references are currently fixed after all imported tables are
created/populated.
Test Plan: Browser test.
Reviewers: dsagal
Reviewed By: dsagal
Subscribers: dsagal
Differential Revision: https://phab.getgrist.com/D3144
Summary:
The Importer dialog is now maximized, showing additional column
matching options and information on the left, with the preview
table shown on the right. Columns can be mapped via a select menu
listing all source columns, or by clicking a formula field next to
the menu and directly editing the transform formula.
Test Plan: Browser tests.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3096
Summary:
Updates the preview table in Importer to show a diff of changes
when importing into an existing table and updating existing records.
Test Plan: Browser tests.
Reviewers: paulfitz
Reviewed By: paulfitz
Differential Revision: https://phab.getgrist.com/D3060
Summary:
Finishing imports now occurs in Node instead of the
data engine, which makes it possible to import into
on-demand tables. Merging code was also refactored
and now uses a SQL query to diff source and destination
tables in order to determine what to update or add.
Also fixes a bug where incremental imports involving
Excel files with multiple sheets would fail due to the UI
not serializing merge options correctly.
Test Plan: Browser tests.
Reviewers: jarek
Reviewed By: jarek
Differential Revision: https://phab.getgrist.com/D3046
Summary:
The import dialog now has an option to 'Update existing records',
which when checked will allow for selection of 1 or more fields
to match source and destination tables on.
If all fields match, then the matched record in the
destination table will be merged with the incoming record
from the source table. This means the incoming values will
replace the destination table values, unless the incoming
values are blank.
Additional merge strategies are implemented in the data
engine, but the import dialog only uses one of the
strategies currently. The others can be exposed in the UI
in the future, and tweak the behavior of how source
and destination values should be merged in different contexts,
such as when blank values exist.
Test Plan: Python and browser tests.
Reviewers: paulfitz
Reviewed By: paulfitz
Subscribers: alexmojaki
Differential Revision: https://phab.getgrist.com/D3020
Summary: This moves enough server material into core to run a home server. The data engine is not yet incorporated (though in manual testing it works when ported).
Test Plan: existing tests pass
Reviewers: dsagal
Reviewed By: dsagal
Differential Revision: https://phab.getgrist.com/D2552