(core) Improve encoding detection for csv imports, and make encoding an editable option.

Summary:
- Using a sample of data was causing poor detection if the sample were cut mid-character. Switch to using line-based detection.
- Add a simple option for changing encoding. No convenient UI is offered since config UI is auto-generated, but this at least makes it possible to recover from bad guesses.
- Upgrades chardet library for good measure.
- Also fixes python3-building step, to more reliably rebuild Python dependencies when requirements3.* files change.

Test Plan: Added a python-side test case, and a browser test that encodings can be switched, errors are displayed, and wrong encodings fail recoverably.

Reviewers: alexmojaki

Reviewed By: alexmojaki

Differential Revision: https://phab.getgrist.com/D3979
2026-03-02 04:09:24 +00:00 · 2023-08-18 17:03:27 -04:00
parent b9adcefcce
commit d5a4605d2a
9 changed files with 160 additions and 26 deletions
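
The mid-character problem named in the summary can be illustrated with a minimal sketch. This is not the code from this commit (the actual detection runs on the Python side with chardet); the helper names `byteSample`, `lineSample`, and `isValidUtf8` are hypothetical, and the sketch assumes an ASCII-compatible encoding where the byte 0x0A always marks a line end.

```typescript
// Hypothetical illustration only: shows why a fixed-size byte sample can
// mislead encoding detection, and how trimming to whole lines avoids it.
const data: Uint8Array = new TextEncoder().encode(
  "name,city\nZoë,Zürich\nJosé,Málaga\n"
);

// Strict UTF-8 check: a fatal decoder throws on malformed or truncated input.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Fixed-size sample: may end in the middle of a multi-byte character.
function byteSample(bytes: Uint8Array, size: number): Uint8Array {
  return bytes.subarray(0, size);
}

// Line-based sample: keep only the complete lines that fit within `size`.
// Falls back to the raw sample if it contains no newline at all.
function lineSample(bytes: Uint8Array, size: number): Uint8Array {
  const head = bytes.subarray(0, size);
  const lastNewline = head.lastIndexOf(0x0a);
  return lastNewline >= 0 ? head.subarray(0, lastNewline + 1) : head;
}

// A 13-byte cut lands between the two bytes of "ë" in "Zoë".
const cutMidCharacter = isValidUtf8(byteSample(data, 13));
const cutAtLineEnd = isValidUtf8(lineSample(data, 13));
```

With this input, the 13-byte sample is rejected as UTF-8 even though the file is valid UTF-8, while the line-trimmed sample decodes cleanly. A detector fed the first kind of sample has reason to guess a different encoding.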
--- a/app/plugin/FileParserAPI-ti.ts
+++ b/app/plugin/FileParserAPI-ti.ts
@@ -15,6 +15,7 @@ export const ParseFileAPI = t.iface([], {
 export const ParseOptions = t.iface([], {
  "NUM_ROWS": t.opt("number"),
  "SCHEMA": t.opt(t.array("ParseOptionSchema")),
+  "WARNING": t.opt("string"),
 });

 export const ParseOptionSchema = t.iface([], {
--- a/app/plugin/FileParserAPI.ts
+++ b/app/plugin/FileParserAPI.ts
@@ -20,6 +20,7 @@ export interface ParseFileAPI {
 export interface ParseOptions {
  NUM_ROWS?: number;
  SCHEMA?: ParseOptionSchema[];
+  WARNING?: string;     // Only on response, includes a warning from parsing, if any.
 }

 /**