Commit Graph

4 Commits

Author SHA1 Message Date
Dmitry S
309ddb0fe7 (core) Move guessing logic for column types to run in node once for all columns.
Summary:
Previously, columns of type Any were created and modified one by one by reusing
the "empty column" logic from the data engine. This copies that logic to Node,
and sets the type of all columns together, to create them with the correct type
in the AddTable call.

This makes imports about twice faster (when slowness is due to many columns),
but doesn't address all cases where individual handling of columns causes slowness.

Test Plan: Added a test case for the new helper function.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Subscribers: alexmojaki

Differential Revision: https://phab.getgrist.com/D3427
2022-05-19 12:49:51 -04:00
Alex Hall
321019217d (core) Lossless imports
Summary:
- Removed string parsing and some type guessing code from parse_data.py. That logic is now implicitly done by ValueGuesser by leaving the initial column type as Any. parse_data.py mostly comes into play when importing files (e.g. Excel) containing values that already have types, i.e. numbers and dates.
- 0s and 1s are treated as numbers instead of booleans to keep imports lossless.
- Removed dateguess.py and test_dateguess.py.
- Changed what `guessDateFormat` does when multiple date formats work equally well for the given data, in order to be consistent with the old dateguess.py.
- Columns containing numbers are now always imported as Numeric, never Int.
- Removed `NullIfEmptyParser` because it was interfering with the new system. Its purpose was to avoid pointlessly changing a column from Any to Text when no actual data was inserted. A different solution to that problem was already added to `_ensure_column_accepts_data` in the data engine in a recent related diff.

Test Plan:
- Added 2 `nbrowser/Importer2` tests.
- Updated various existing tests.
- Extended testing of `guessDateFormat`. Added `guessDateFormats` to show how ambiguous dates are handled internally.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3302
2022-03-08 12:14:39 +02:00
Alex Hall
5213972d24 (core) Guess numeric formatting options
Summary:
Change NumberParse.parse to return not just the parsed number but also information it gathered along the way about how the input string was formatted.

Use this in the new NumberParse.guessOptions to guess the actual widget options based on an array of strings.

Use NumberParse.guessOptions in TypeConversion (for when a user explicitly chooses to change type) and in ValueGuesser (for guesses about strings entered into empty columns).

Test Plan: Adds unit tests for NumberParse and ValueGuesser and updates the TypeChange2 nbrowser test.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3294
2022-03-03 21:32:03 +02:00
Alex Hall
599545fb11 (core) Fuller guessing of type and options when adding first data to blank columns
Summary:
Adds `common/ValueGuesser.ts` with logic for guessing column type and widget options (only for dates/datetimes) from an array of strings, and converting the strings to the guessed type in a lossless manner, so that converting back to Text gives the original values.

Changes `_ensure_column_accepts_data` in Python to call an exported JS method using the new logic where possible.

Test Plan: Added `test/common/ValueGuesser.ts` to unit test the core guessing logic and a DocApi end-to-end test for what happens to new columns.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3290
2022-03-01 22:00:45 +02:00