Commit Graph

14 Commits

Author SHA1 Message Date
Alex Hall
6c90de4d62 (core) Switch excel import parsing from messytables+xlrd to openpyxl, and ignore empty rows
Summary:
Use openpyxl instead of messytables (which used xlrd internally) in import_xls.py.

Skip empty rows since excel files can easily contain huge numbers of them.

Drop support for xls files (which openpyxl doesn't support) in favour of the newer xlsx format.

Fix some details relating to python virtualenvs and dependencies, as Jenkins was failing to find new Python dependencies.

Test Plan: Mostly relying on existing tests. Updated various tests which referred to xls files instead of xlsx. Added a Python test for skipping empty rows.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3406
2022-05-12 14:43:21 +02:00
George Gevoian
ad04744b4a (core) Fix import bug when skipping non-text columns
Summary:
Skipping columns during incremental imports wasn't working for certain
column types, such as numeric columns. The column's default value was
being used instead (e.g. 0), overwriting values in the destination
table.

Test Plan: Browser tests.

Reviewers: jarek

Reviewed By: jarek

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3402
2022-04-28 12:46:44 -07:00
George Gevoian
f02174eb7e (core) Fix error when canceling import
Summary:
If cancel was clicked while a transform section was still being
generated in the Importer, an error was thrown. This refactors
the cancelImportFiles API action to take in the file upload id
in place of the entire DataSourceTransformed parameter, which
contains other values that are irrelevant to canceling. One of those
values, the transform section id, was causing the error to be thrown
since it was momentarily null.

Test Plan: Tested manually.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Differential Revision: https://phab.getgrist.com/D3317
2022-03-10 16:24:49 -08:00
Alex Hall
321019217d (core) Lossless imports
Summary:
- Removed string parsing and some type guessing code from parse_data.py. That logic is now implicitly done by ValueGuesser by leaving the initial column type as Any. parse_data.py mostly comes into play when importing files (e.g. Excel) containing values that already have types, i.e. numbers and dates.
- 0s and 1s are treated as numbers instead of booleans to keep imports lossless.
- Removed dateguess.py and test_dateguess.py.
- Changed what `guessDateFormat` does when multiple date formats work equally well for the given data, in order to be consistent with the old dateguess.py.
- Columns containing numbers are now always imported as Numeric, never Int.
- Removed `NullIfEmptyParser` because it was interfering with the new system. Its purpose was to avoid pointlessly changing a column from Any to Text when no actual data was inserted. A different solution to that problem was already added to `_ensure_column_accepts_data` in the data engine in a recent related diff.

Test Plan:
- Added 2 `nbrowser/Importer2` tests.
- Updated various existing tests.
- Extended testing of `guessDateFormat`. Added `guessDateFormats` to show how ambiguous dates are handled internally.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3302
2022-03-08 12:14:39 +02:00
Edward Betts
d6e0e1fee3 Correct spelling mistakes 2022-02-19 09:46:49 +00:00
George Gevoian
6abe7d5827 (core) Use original column headers during imports
Summary:
When possible, the original column headers from imported
files will now be used as the labels for Grist columns. This includes
values that were previously invalid Grist column identifiers, such
as those containing Unicode.

Test Plan: Updated server and browser tests.

Reviewers: jarek

Reviewed By: jarek

Differential Revision: https://phab.getgrist.com/D3261
2022-02-13 16:50:19 -08:00
Alex Hall
d1a848b44a (core) Parse string cell values in Doc API and Imports
Summary:
- Adds a function `parseUserAction` for parsing strings in UserActions to `ValueParser.ts`
- Adds a boolean option `parseStrings` to use `parseUserAction` in `ActiveDoc.applyUserActions`, off by default.
- Uses `parseStrings` by default in DocApi (set `?noparse=true` in a request to disable) when adding/updating records through the `/data` or `/records` endpoints or in general with the `/apply` endpoint.
- Uses `parseStrings` for various actions in `ActiveDocImport`. Since most types are parsed in Python before these actions are constructed, this only affects references, which still look like errors in the import preview. Importing references can also easily still run into more complicated problems discussed in https://grist.slack.com/archives/C0234CPPXPA/p1639514844028200

Test Plan:
- Added tests to DocApi to compare behaviour with and without string parsing.
- Added a new browser test, fixture doc, and fixture CSV to test importing a file containing references.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3183
2021-12-17 15:40:58 +02:00
Jarosław Sadziński
1ae586cf42 (core) Adding Skip options when importing multiple tables.
Summary:
Adding new destination "Skip" for multiple table imports.
Selecting this destination skips the import and makes the preview grayed out.

Test Plan: New Tests

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3181
2021-12-13 19:07:33 +01:00
George Gevoian
c6aa9b65d4 (core) Fix bug preventing importing of nested json files
Summary:
BulkAddRecord when finishing imports of nested JSON was throwing
an error due to unchecked access of referencing tables. This adds
a guard to prepare_new_values to handle such cases.

Imports happened to cause this to occur because the order that
imported tables are created/populated isn't aware of references
between tables, so it's possible for a reference column to
exist (momentarily) without a valid reference to another table.
These references are currently fixed after all imported tables are
created/populated.

Test Plan: Browser test.

Reviewers: dsagal

Reviewed By: dsagal

Subscribers: dsagal

Differential Revision: https://phab.getgrist.com/D3144
2021-11-18 17:06:03 -08:00
George Gevoian
08b1286f4f (core) Add column matching to Importer
Summary:
The Importer dialog is now maximized, showing additional column
matching options and information on the left, with the preview
table shown on the right. Columns can be mapped via a select menu
listing all source columns, or by clicking a formula field next to
the menu and directly editing the transform formula.

Test Plan: Browser tests.

Reviewers: jarek

Reviewed By: jarek

Differential Revision: https://phab.getgrist.com/D3096
2021-11-09 12:30:52 -08:00
George Gevoian
62db263d1f (core) Add diff preview to Importer
Summary:
Updates the preview table in Importer to show a diff of changes
when importing into an existing table and updating existing records.

Test Plan: Browser tests.

Reviewers: paulfitz

Reviewed By: paulfitz

Differential Revision: https://phab.getgrist.com/D3060
2021-10-08 14:15:07 -07:00
George Gevoian
e1780e4f58 (core) Migrate import code from data engine to Node
Summary:
Finishing imports now occurs in Node instead of the
data engine, which makes it possible to import into
on-demand tables. Merging code was also refactored
and now uses a SQL query to diff source and destination
tables in order to determine what to update or add.

Also fixes a bug where incremental imports involving
Excel files with multiple sheets would fail due to the UI
not serializing merge options correctly.

Test Plan: Browser tests.

Reviewers: jarek

Reviewed By: jarek

Differential Revision: https://phab.getgrist.com/D3046
2021-10-04 10:27:00 -07:00
George Gevoian
8a7edb6257 (core) Enable incremental imports
Summary:
The import dialog now has an option to 'Update existing records',
which when checked will allow for selection of 1 or more fields
to match source and destination tables on.

If all fields match, then the matched record in the
destination table will be merged with the incoming record
from the source table. This means the incoming values will
replace the destination table values, unless the incoming
values are blank.

Additional merge strategies are implemented in the data
engine, but the import dialog only uses one of the
strategies currently. The others can be exposed in the UI
in the future, and tweak the behavior of how source
and destination values should be merged in different contexts,
such as when blank values exist.

Test Plan: Python and browser tests.

Reviewers: paulfitz

Reviewed By: paulfitz

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3020
2021-09-16 09:15:54 -07:00
Paul Fitzpatrick
5ef889addd (core) move home server into core
Summary: This moves enough server material into core to run a home server.  The data engine is not yet incorporated (though in manual testing it works when ported).

Test Plan: existing tests pass

Reviewers: dsagal

Reviewed By: dsagal

Differential Revision: https://phab.getgrist.com/D2552
2020-07-21 20:39:10 -04:00