Commit Graph

7 Commits

Author SHA1 Message Date
Alex Hall
321019217d (core) Lossless imports
Summary:
- Removed string parsing and some type guessing code from parse_data.py. That logic is now implicitly done by ValueGuesser by leaving the initial column type as Any. parse_data.py mostly comes into play when importing files (e.g. Excel) containing values that already have types, i.e. numbers and dates.
- 0s and 1s are treated as numbers instead of booleans to keep imports lossless.
- Removed dateguess.py and test_dateguess.py.
- Changed what `guessDateFormat` does when multiple date formats work equally well for the given data, in order to be consistent with the old dateguess.py.
- Columns containing numbers are now always imported as Numeric, never Int.
- Removed `NullIfEmptyParser` because it was interfering with the new system. Its purpose was to avoid pointlessly changing a column from Any to Text when no actual data was inserted. A different solution to that problem was already added to `_ensure_column_accepts_data` in the data engine in a recent related diff.

Test Plan:
- Added 2 `nbrowser/Importer2` tests.
- Updated various existing tests.
- Extended testing of `guessDateFormat`. Added `guessDateFormats` to show how ambiguous dates are handled internally.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3302
2022-03-08 12:14:39 +02:00
Edward Betts
d6e0e1fee3 Correct spelling mistakes 2022-02-19 09:46:49 +00:00
Dmitry S
64d9faed5a (core) Fix import parsing from choking up on Python isdigit() surprises
Summary:
Python isdigit() returns true for unicode characters such as "²", which fail
when used as an argument to int().

Instead, be explicit about only considering characters 0-9 to be digits.

Test Plan: Added a test case which produces an error without this change.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Differential Revision: https://phab.getgrist.com/D3027
2021-09-20 16:17:34 -04:00
Dmitry S
26356fe588 (core) Fix bug with "maximum recursion depth exceeded" in imports.
Summary:
Our date-guessing logic analyzes text in full looking for date parts.
This diff skip all that work when text is so long that we don't need to
consider it to be a valid date.

This is a quick fix. There are probably many other cases when we don't
need to try hard to parse arbitrary text as dates.

Test Plan: Added a fixture and test case that would trigger the error without the fix.

Reviewers: paulfitz

Subscribers: paulfitz

Differential Revision: https://phab.getgrist.com/D2992
2021-08-20 17:44:48 -04:00
Alex Hall
4d526da58f (core) Move file import plugins into core/sandbox/grist
Summary:
Move all the plugins python code into the main folder with the core code.

Register file importing functions in the same main.py entrypoint as the data engine.

Remove options relating to different entrypoints and code directories. The only remaining plugin-specific option in NSandbox is the import directory/mount, i.e. where files to be parsed are placed.

Test Plan: this

Reviewers: paulfitz

Reviewed By: paulfitz

Subscribers: dsagal

Differential Revision: https://phab.getgrist.com/D2965
2021-08-09 18:37:14 +02:00
Alex Hall
16f297a250 (core) Simple Python 3 compatibility changes
Summary: Changes that move towards python 3 compatibility that are easy to review without much thought

Test Plan: The tests

Reviewers: dsagal

Reviewed By: dsagal

Differential Revision: https://phab.getgrist.com/D2873
2021-06-22 17:13:17 +02:00
Paul Fitzpatrick
b82eec714a (core) move data engine code to core
Summary:
this moves sandbox/grist to core, and adds a requirements.txt
file for reconstructing the content of sandbox/thirdparty.

Test Plan:
existing tests pass.
Tested core functionality manually.  Tested docker build manually.

Reviewers: dsagal

Reviewed By: dsagal

Differential Revision: https://phab.getgrist.com/D2563
2020-07-29 08:57:25 -04:00