(core) Improve encoding detection for csv imports, and make encoding an editable option.

Summary:
- Using a sample of data caused poor detection when the sample was
  cut mid-character. Switch to line-based detection.
- Add a simple option for changing encoding. No convenient UI is offered
  since config UI is auto-generated, but this at least makes it possible to
  recover from bad guesses.
- Upgrade the chardet library for good measure.

- Also fix the python3 build step so that Python dependencies are
  rebuilt more reliably when requirements3.* files change.
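
To illustrate the first point, here is a minimal sketch of why a byte sample
cut mid-character misleads detection while a sample cut on line boundaries
does not. It uses only the standard library and is not the code from this
commit; the helper names (decodes_as_utf8, byte_sample, line_sample) are
hypothetical, and the data mirrors the Greek fixture added below.

```python
# Illustrative sketch only; helper names are hypothetical, not from the commit.
data = (
    "Name,Age,Επάγγελμα,Πόλη\n"
    "John Smith,30,Γιατρός,Athens\n"
).encode("utf-8")


def decodes_as_utf8(chunk: bytes) -> bool:
    """Return True if the chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def byte_sample(raw: bytes, size: int) -> bytes:
    """Take the first `size` bytes, possibly splitting a multi-byte character."""
    return raw[:size]


def line_sample(raw: bytes, size: int) -> bytes:
    """Take whole lines until at least `size` bytes have been collected."""
    lines = []
    total = 0
    for line in raw.splitlines(keepends=True):
        lines.append(line)
        total += len(line)
        if total >= size:
            break
    return b"".join(lines)


# "Name,Age," is 9 bytes, so a 10-byte cut lands inside the 2-byte "Ε".
print(decodes_as_utf8(byte_sample(data, 10)))
print(decodes_as_utf8(line_sample(data, 10)))
```

The sketch assumes a byte-oriented encoding where a newline is a single byte;
a real detector would feed such whole lines to chardet rather than test for
UTF-8 directly.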

Test Plan:
Added a python-side test case, and a browser test that encodings can
be switched, errors are displayed, and wrong encodings fail recoverably.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Differential Revision: https://phab.getgrist.com/D3979
Dmitry S
2023-08-18 17:03:27 -04:00
parent b9adcefcce
commit d5a4605d2a
9 changed files with 160 additions and 26 deletions

@@ -0,0 +1,4 @@
Name,Age,Επάγγελμα,Πόλη
John Smith,30,Γιατρός,Athens
Μαρία Παπαδοπούλου,25,Engineer,Thessaloniki
Δημήτρης Johnson,40,Δικηγόρος,Piraeus