gristlabs_grist-core

mirror of https://github.com/gristlabs/grist-core.git synced 2025-12-10 05:31:55 +00:00

Author	SHA1	Message	Date
Dmitry S	d5a4605d2a	(core) Improve encoding detection for csv imports, and make encoding an editable option. Summary: - Using a sample of data was causing poor detection if the sample were cut mid-character. Switch to using line-based detection. - Add a simple option for changing encoding. No convenient UI is offered since config UI is auto-generated, but this at least makes it possible to recover from bad guesses. - Upgrades chardet library for good measure. - Also fixes python3-building step, to more reliably rebuild Python dependencies when requirements3.* files change. Test Plan: Added a python-side test case, and a browser test that encodings can be switched, errors are displayed, and wrong encodings fail recoverably. Reviewers: alexmojaki Reviewed By: alexmojaki Differential Revision: https://phab.getgrist.com/D3979	2023-08-24 09:50:52 -04:00
Dmitry S	534615dd50	(core) Update logging in sandbox code, and log tracebacks as single log messages. Summary: - Replace logger module by the standard module 'logging'. - When a log message from the sandbox includes newlines (e.g. for tracebacks), keep those lines together in the Node log message. Previously each line was a different message, making it difficult to view tracebacks, particularly in prod where each line becomes a separate message object. - Fix assorted lint errors. Test Plan: Added a test for the log-line splitting and escaping logic. Reviewers: georgegevoian Reviewed By: georgegevoian Differential Revision: https://phab.getgrist.com/D3956	2023-07-18 11:21:25 -04:00
Dmitry S	17569561bf	(core) Fix issue that ints would be imported with a trailing ".0" from Google Sheets. Summary: Whole numbers, when imported from Excel into a Text column show up without decimals (e.g. "300"), but when imported from Google Sheets show up with decimals (e.g. "300.0"). The decimals are hard for end-users to remove. Fix by treating whole numbers consistently as ints. Test Plan: Added a fixture reproducing the issue, and a test case. Reviewers: georgegevoian Reviewed By: georgegevoian Differential Revision: https://phab.getgrist.com/D3800	2023-02-26 15:24:15 -05:00
Yohan Boniface	4ff5a2eaa7	Be more accepting with None value in headers candidate (#331) We already filter out a line with only None values, and sometimes Excel or LibreOffice mistakes the real number of columns, adding one or more that have no value at all.	2022-10-31 15:57:26 -04:00
Alex Hall	df65219729	(core) Remove messytables completely, particularly for excel imports Summary: Mirror of https://github.com/gristlabs/grist-core/pull/289 Test Plan: In addition to the PR, tweaked one Excel fixture file and tests involving that and one other fixture. Reviewers: paulfitz Reviewed By: paulfitz Differential Revision: https://phab.getgrist.com/D3640	2022-09-28 17:15:42 +02:00
Yohan Boniface	54703e2794	Remove messytables dependency from xlsx import	2022-09-21 14:20:19 +02:00
Yohan Boniface	045227cb52	Update sandbox/grist/imports/import_csv.py Co-authored-by: Alex Hall <alex.mojaki@gmail.com>	2022-09-20 17:34:59 +02:00
Yohan Boniface	ce31d1632d	Update sandbox/grist/imports/import_csv.py Co-authored-by: Alex Hall <alex.mojaki@gmail.com>	2022-09-20 17:34:41 +02:00
Yohan Boniface	57c8f9f4fe	csv importer: mimic messytables defaults for now	2022-09-20 17:28:46 +02:00
Yohan Boniface	2544736aa8	Applying review from @alexmojaki	2022-09-20 17:22:28 +02:00
Yohan Boniface	9bbf66e50e	wip: remove dependency to messytables	2022-09-20 17:22:28 +02:00
Yohan Boniface	83985ab3cf	test(import_csv): highlight differences between passed and returned options in parse_file	2022-09-20 16:29:23 +02:00
Yohan Boniface	462b66b7ee	Add tests to cover CSV parsed options	2022-09-20 15:44:08 +02:00
Yohan Boniface	7bd895ef42	Add BooleanConverter to map proper boolean cells to a Bool column Note that only proper boolean will be considered, but not integers nor truthy or falsy strings.	2022-09-10 07:07:45 +02:00
Yohan Boniface	8bc5c7d595	Fix columns with falsy cells wrongly parsed as dates (#276) E.g. before this commit, this table would result in Date columns: | A | B | | ----- | -- | | FALSE | 0 | For now, even FALSE is parsed as Numeric (not sure why we don't have a BooleanConverter).	2022-09-09 15:13:34 -04:00
Alex Hall	42afb17e36	(core) Run and test imports only in Python 3, upgrade openpyxl, fix weird date handling Summary: Python 2 only needs to be supported for the sake of old documents and formulas. This doesn't apply to the separate sandboxes that parse files for imports. Using Python 3 only allows using newer libraries and library versions. In particular, the latest version of openpyxl doesn't support Python 2. This will also make it easier to make other similar changes in the future, such as replacing messytables with a modern library. See https://grist.slack.com/archives/C0234CPPXPA/p1661261829343999?thread_ts=1661260442.837959&cid=C0234CPPXPA The latest openpyxl is better at handling a particular edge case with broken dates in Excel, but still doesn't quite do what we want, so we monkeypatch it. Discussion: https://grist.slack.com/archives/C02EGJ1FUCV/p1661440851911869?thread_ts=1661154219.515549&cid=C02EGJ1FUCV Setting `preferredPythonVersion` to '3' in SafePythonComponent ensures that JS always creates import sandboxes that use Python 3. Within Python, a module used by all imports will raise an error in Python 2. Python unit tests of imports are now only run in Python 3, using the `load_tests` protocol of `unittest`. Test Plan: Mostly existing tests. Added another strange date to the Excel fixture. Reviewers: dsagal Reviewed By: dsagal Subscribers: dsagal Differential Revision: https://phab.getgrist.com/D3606	2022-09-02 16:27:34 +02:00
George Gevoian	9b08666f96	(core) Handle importing xls files with invalid dimensions Summary: This addresses a rare bug where xls files with invalid dimensions could not be imported into Grist due to how openpyxl handles parsing them. Test Plan: Server test. Reviewers: alexmojaki Reviewed By: alexmojaki Differential Revision: https://phab.getgrist.com/D3485	2022-06-16 08:39:17 -07:00
Alex Hall	af1564d410	(core) Convert row tuples to lists to fix excel import error Summary: openpyxl was producing tuples while some older code expects lists. Choosing to convert the tuples to lists (instead of making the other code work with tuples) in case there's other similar issues still out there. Should fix the error mentioned in https://grist.slack.com/archives/C0234CPPXPA/p1652797247167719: ``` Traceback (most recent call last): File "/gristroot/grist/sandbox/grist/sandbox.py", line 103, in run ret = self._functions[fname](args) File "/gristroot/grist/sandbox/grist/imports/register.py", line 11, in parse_excel return import_file(file_source) File "/gristroot/grist/sandbox/grist/imports/import_xls.py", line 20, in import_file parse_options, tables = parse_file(path) File "/gristroot/grist/sandbox/grist/imports/import_xls.py", line 26, in parse_file return parse_open_file(f) File "/gristroot/grist/sandbox/grist/imports/import_xls.py", line 69, in parse_open_file table_data_with_types = parse_data.get_table_data(rows, len(headers)) File "/gristroot/grist/sandbox/grist/parse_data.py", line 215, in get_table_data row.extend([""] * missing_values) AttributeError: 'tuple' object has no attribute 'extend' ``` Test Plan: Existing tests. Haven't figured out how to reproduce the original error. Reviewers: georgegevoian Reviewed By: georgegevoian Differential Revision: https://phab.getgrist.com/D3434	2022-05-17 22:40:46 +02:00
Alex Hall	6c90de4d62	(core) Switch excel import parsing from messytables+xlrd to openpyxl, and ignore empty rows Summary: Use openpyxl instead of messytables (which used xlrd internally) in import_xls.py. Skip empty rows since excel files can easily contain huge numbers of them. Drop support for xls files (which openpyxl doesn't support) in favour of the newer xlsx format. Fix some details relating to python virtualenvs and dependencies, as Jenkins was failing to find new Python dependencies. Test Plan: Mostly relying on existing tests. Updated various tests which referred to xls files instead of xlsx. Added a Python test for skipping empty rows. Reviewers: georgegevoian Reviewed By: georgegevoian Differential Revision: https://phab.getgrist.com/D3406	2022-05-12 14:43:21 +02:00
Dmitry S	e59dcc142d	(core) Show proper message on empty Excel import, rather than a code error Summary: - Previously showed "UnboundLocalError". Now will show: Import failed: Failed to parse Excel file. Error: No tables found (1 empty tables skipped) - Also fix logging for import code Test Plan: Added a test case Reviewers: georgegevoian Reviewed By: georgegevoian Differential Revision: https://phab.getgrist.com/D3396	2022-04-27 00:49:28 -04:00
Dmitry S	8269c33d01	(core) When importing JSON, create columns of type Numeric rather than Int Summary: JSON import logic was creating columns of type Int when JSON contained integral values. This causes errors with large values (e.g. millisecond timestamps), and Numeric is generally the more convenient and common default. Test Plan: TBD Reviewers: jarek, alexmojaki Reviewed By: jarek, alexmojaki Subscribers: jarek, alexmojaki Differential Revision: https://phab.getgrist.com/D3339	2022-03-30 09:54:35 -04:00
Alex Hall	321019217d	(core) Lossless imports Summary: - Removed string parsing and some type guessing code from parse_data.py. That logic is now implicitly done by ValueGuesser by leaving the initial column type as Any. parse_data.py mostly comes into play when importing files (e.g. Excel) containing values that already have types, i.e. numbers and dates. - 0s and 1s are treated as numbers instead of booleans to keep imports lossless. - Removed dateguess.py and test_dateguess.py. - Changed what `guessDateFormat` does when multiple date formats work equally well for the given data, in order to be consistent with the old dateguess.py. - Columns containing numbers are now always imported as Numeric, never Int. - Removed `NullIfEmptyParser` because it was interfering with the new system. Its purpose was to avoid pointlessly changing a column from Any to Text when no actual data was inserted. A different solution to that problem was already added to `_ensure_column_accepts_data` in the data engine in a recent related diff. Test Plan: - Added 2 `nbrowser/Importer2` tests. - Updated various existing tests. - Extended testing of `guessDateFormat`. Added `guessDateFormats` to show how ambiguous dates are handled internally. Reviewers: georgegevoian Reviewed By: georgegevoian Differential Revision: https://phab.getgrist.com/D3302	2022-03-08 12:14:39 +02:00
Edward Betts	d6e0e1fee3	Correct spelling mistakes	2022-02-19 09:46:49 +00:00
Dmitry S	64d9faed5a	(core) Fix import parsing from choking up on Python isdigit() surprises Summary: Python isdigit() returns true for unicode characters such as "²", which fail when used as an argument to int(). Instead, be explicit about only considering characters 0-9 to be digits. Test Plan: Added a test case which produces an error without this change. Reviewers: alexmojaki Reviewed By: alexmojaki Differential Revision: https://phab.getgrist.com/D3027	2021-09-20 16:17:34 -04:00
Dmitry S	26356fe588	(core) Fix bug with "maximum recursion depth exceeded" in imports. Summary: Our date-guessing logic analyzes text in full looking for date parts. This diff skips all that work when text is so long that we don't need to consider it to be a valid date. This is a quick fix. There are probably many other cases when we don't need to try hard to parse arbitrary text as dates. Test Plan: Added a fixture and test case that would trigger the error without the fix. Reviewers: paulfitz Subscribers: paulfitz Differential Revision: https://phab.getgrist.com/D2992	2021-08-20 17:44:48 -04:00
Alex Hall	4d526da58f	(core) Move file import plugins into core/sandbox/grist Summary: Move all the plugins python code into the main folder with the core code. Register file importing functions in the same main.py entrypoint as the data engine. Remove options relating to different entrypoints and code directories. The only remaining plugin-specific option in NSandbox is the import directory/mount, i.e. where files to be parsed are placed. Test Plan: this Reviewers: paulfitz Reviewed By: paulfitz Subscribers: dsagal Differential Revision: https://phab.getgrist.com/D2965	2021-08-09 18:37:14 +02:00
Alex Hall	16f297a250	(core) Simple Python 3 compatibility changes Summary: Changes that move towards python 3 compatibility that are easy to review without much thought Test Plan: The tests Reviewers: dsagal Reviewed By: dsagal Differential Revision: https://phab.getgrist.com/D2873	2021-06-22 17:13:17 +02:00
Paul Fitzpatrick	b82eec714a	(core) move data engine code to core Summary: this moves sandbox/grist to core, and adds a requirements.txt file for reconstructing the content of sandbox/thirdparty. Test Plan: existing tests pass. Tested core functionality manually. Tested docker build manually. Reviewers: dsagal Reviewed By: dsagal Differential Revision: https://phab.getgrist.com/D2563	2020-07-29 08:57:25 -04:00

28 Commits
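
Note on commit 64d9faed5a (isdigit() surprises): the behaviour described in that message can be reproduced with a few lines of plain Python. The sketch below is illustrative only and is not taken from the repository; the helper name is_ascii_digits is hypothetical, and the actual fix in the import code may be written differently.

```python
def is_ascii_digits(text):
    """Hypothetical helper: True only for non-empty strings made solely of 0-9."""
    return bool(text) and all("0" <= ch <= "9" for ch in text)


superscript_two = "\u00b2"  # the character "²" mentioned in the commit message

# str.isdigit() accepts the superscript, but int() rejects it.
isdigit_says_yes = superscript_two.isdigit()
try:
    int(superscript_two)
    int_accepts = True
except ValueError:
    int_accepts = False

# The explicit 0-9 check agrees with what int() can actually parse here.
explicit_check = is_ascii_digits(superscript_two)
```

With a check of this kind, a string such as "300" still passes, while "²" is rejected before it ever reaches int().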