Move phab docs to /documentation (#882)

George Gevoian 2024-03-05 08:35:48 -05:00 committed by GitHub
parent 1e169599d1
commit 0c60446f9c
9 changed files with 467 additions and 6 deletions


@ -2,7 +2,7 @@
* dispose.js provides tools to components that needs to dispose of resources, such as
* destroy DOM, and unsubscribe from events. The motivation with examples is presented here:
*
* https://phab.getgrist.com/w/disposal/
* /documentation/disposal/disposal.md
*/
@ -191,7 +191,7 @@ Object.assign(Disposable.prototype, {
}
// Finish by wiping out the object, since nothing should use it after dispose().
// See https://phab.getgrist.com/w/disposal/ for more motivation.
// See /documentation/disposal.md for more motivation.
wipeOutObject(this);
}
});


@ -938,7 +938,7 @@ export function extractOrgParts(reqHost: string|undefined, reqPath: string): Org
orgFromHost = getOrgFromHost(reqHost);
if (orgFromHost) {
// Some subdomains are shared, and do not reflect the name of an organization.
// See https://phab.getgrist.com/w/hosting/v1/urls/ for a list.
// See /documentation/urls.md for a list.
if (/^(api|v1-.*|doc-worker-.*)$/.test(orgFromHost)) {
orgFromHost = null;
}


@ -31,7 +31,7 @@ const BLACKLISTED_SUBDOMAINS = new Set([
/**
*
* Checks whether the subdomain is on the list of forbidden subdomains.
* See https://phab.getgrist.com/w/hosting/v1/urls/#organization-subdomains
* See /documentation/urls.md#organization-subdomains
*
* Also enforces various sanity checks.
*


@ -9,7 +9,7 @@ export interface GristTable {
// This is documenting what is currently returned by the core plugins. Capitalization
// is python-style.
//
// TODO: could be worth reconciling with: https://phab.getgrist.com/w/grist_data_format/.
// TODO: could be worth reconciling with: /documentation/grist-data-format.md.
table_name: string | null; // currently allow names to be null
column_metadata: GristColumn[];
table_data: any[][];

148
documentation/disposal.md Normal file

@ -0,0 +1,148 @@
# Disposal and Cleanup
Garbage-collected languages make you think that you don't need to worry about cleanup for your objects. In reality, there are still often cases when you do. This page gives some examples, and describes a library to simplify it.
## What's the problem
In the examples, we care about a situation when you have a JS object that is responsible for certain UI, i.e. DOM, listening to DOM changes to update state elsewhere, and listening to outside changes to update state to the DOM.
### DOM Elements
So this JS object knows how to create the DOM. Removing the DOM, when the component is to be removed, is usually easy: `parentNode.removeChild(child)`. Since it's a manual operation, you may define some method to do this, named perhaps "destroy" or "dispose" or "cleanup".
If there is logic tied to your DOM either via JQuery events, or KnockoutJS bindings, you'll want to clean up the node specially: for JQuery, use `.remove()` or `.empty()` methods; for KnockoutJS, use `ko.removeNode()` or `ko.cleanNode()`. KnockoutJS's methods automatically call JQuery-related cleanup functions if JQuery is loaded in the page.
### Subscriptions and Computed Observables
But there is more. Consider this knockout code, adapted from their simplest example of a computed observable:

    function FullNameWidget(firstName, lastName) {
      this.fullName = ko.computed(function() {
        return firstName() + " " + lastName();
      });
      ...
    }

Here we have a constructor for a component which takes two observables as constructor parameters, and creates a new observable which depends on the two inputs. Whenever `firstName` or `lastName` changes, `this.fullName` gets recomputed. This makes it easy to create knockout-based bindings, e.g. to have a DOM element reflect the full name when either first or last name changes.
Now, what happens when this component is destroyed? It removes its associated DOM. Now when `firstName` or `lastName` change, there are no visible changes. But the function to recompute `this.fullName` still gets called, and still retains a reference to `this`, preventing the object from being garbage-collected.
The issue is that `this.fullName` is subscribed to `firstName` and `lastName` observables. It needs to be unsubscribed when the component is destroyed.
KnockoutJS recognizes this and makes it easy: just call `this.fullName.dispose()`. We just have to remember to do it when we destroy the component.
This situation would exist without knockout too: the issue is that the component is listening to external changes to update the DOM that it is responsible for. When the component is gone, it should stop listening.
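The same leak can be reproduced without any library. The following is a minimal sketch, assuming a hand-rolled `makeObservable` helper (hypothetical, not Knockout's API) whose `subscribe()` returns an object with a `dispose()` method. It shows that only the widget that was never disposed keeps reacting to changes.

```javascript
// Hypothetical stand-in for an observable; this is not Knockout's implementation.
function makeObservable(value) {
  const listeners = new Set();
  function observable(newValue) {
    if (arguments.length === 0) { return value; }   // read
    value = newValue;                               // write, then notify
    listeners.forEach(function(cb) { cb(value); });
  }
  observable.subscribe = function(cb) {
    listeners.add(cb);
    return { dispose: function() { listeners.delete(cb); } };
  };
  return observable;
}

// A component that recomputes its full name whenever either input changes.
function FullNameWidget(firstName, lastName) {
  const self = this;
  self.recomputeCount = 0;
  function recompute() {
    self.recomputeCount++;
    self.fullName = firstName() + " " + lastName();
  }
  recompute();
  // Keep the subscriptions so that dispose() can undo them.
  self._subs = [firstName.subscribe(recompute), lastName.subscribe(recompute)];
}
FullNameWidget.prototype.dispose = function() {
  this._subs.forEach(function(sub) { sub.dispose(); });
};

const firstName = makeObservable("Ada");
const lastName = makeObservable("Lovelace");
const leaked = new FullNameWidget(firstName, lastName);
const cleaned = new FullNameWidget(firstName, lastName);
cleaned.dispose();

// Only the widget that was never disposed is recomputed by this change.
firstName("Grace");
```

After the last line, `leaked` has been recomputed a second time, while `cleaned` still holds the value from its construction.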
### Tying life of subscriptions to DOM
Since the situation above is so common in KnockoutJS, it offers some assistance. Specifically, when a computed observable is created using knockout's own binding syntax (by specifying a JS expression in an HTML attribute), knockout will clean it up automatically when the DOM node is removed using `ko.removeNode()` or `ko.cleanNode()`.
Knockout also allows tying other cleanup to DOM node removal, as documented on the [Custom disposal logic](http://knockoutjs.com/documentation/custom-bindings-disposal.html) page.
In the example above, you could use `ko.utils.domNodeDisposal.addDisposeCallback(node, function() { self.fullName.dispose(); })`, and when you destroy the component and remove the `node` via `ko.removeNode()` or `ko.cleanNode()`, the `fullName` observable will be properly disposed.
### Other knockout subscriptions
There are other situations with subscriptions. For example, we may want to subscribe to a `viewId` observable, and when it changes, replace the currently-rendered View component. This might look like so:

    function GristDoc() {
      this.viewId = ko.observable();
      this.viewId.subscribe(function(viewId) {
        this.loadView(viewId);
      }, this);
    }

Once GristDoc is destroyed, the subscription to `this.viewId` still exists, so `this.viewId` retains a reference to `this` (for calling the callback). Technically, there is no problem: as long as there are no references to `this.viewId` from outside this object, the whole cycle should be garbage-collected.
But it's very risky: if anything else has a reference to `this.viewId` (e.g. if `this.viewId` is itself subscribed to, say, `window.history` changes), then the entire `GristDoc` is unavailable to garbage-collection, including all the DOM to which it probably retains references even after that DOM is detached from the page.
Besides the memory leak, it means that when `this.viewId` changes, it will continue calling `this.loadView()`, continuing to update DOM that no one will ever see. Over time, that would of course slow down the browser, but would be hard to detect and debug.
Again, KnockoutJS offers a way to unsubscribe: `.subscribe()` returns a `ko.subscription` object, which in turn has a `dispose()` method. We just need to call it, and the callback will be unsubscribed.
### Backbone Events
To be clear, the problem isn't with Knockout, it's with the idea of subscribing to outside events. Backbone allows listening to events, which creates the same problem, and Backbone offers a similar solution.
For example, let's say you have a component that listens to an outside event and does stuff. With a made-up example, you might have a constructor like:

    function Game(basket) {
      basket.on('points:scored', function(team, points) {
        // Update UI to show updated points for the team.
      });
    }

Let's say that a `Game` object is destroyed, and a new one created, but the `basket` persists across Games. As the user continues to score points on the basket, the old (supposedly destroyed) Game object continues to have that inline callback called. It may not be showing anything, but only because the DOM it's updating is no longer attached to the page. It's still taking resources, and may even continue to send stuff to the server.
We need to clean up when we destroy the Game object. In this example, it's pretty annoying. We'd have to save the `basket` object and callback in member variables (like `this.basket`, `this.callback`), so that in the cleanup method, we could call `this.basket.off('points:scored', this.callback)`.
Many people have been bitten by this in Backbone (see this [stackoverflow post](http://stackoverflow.com/questions/14041042/backbone-0-9-9-difference-between-listento-and-on), which has a bunch of links to blog posts about it).
Backbone's solution is the `listenTo()` method. You'd use it like so:

    function Game(basket) {
      this.listenTo(basket, 'points:scored', function(team, points) {
        // Update UI to show updated points for the team.
      });
    }

Then when you destroy the Game object, you only have to call `this.stopListening()`. It keeps track of what you listened to, and unsubscribes. You just have to remember to call it. (Certain objects in Backbone will call `stopListening()` automatically when they are being cleaned up.)
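To illustrate why `listenTo()` removes the bookkeeping burden, here is a minimal sketch of the idea. It assumes a hand-rolled `Emitter` and `Listener` mixin (both hypothetical stand-ins, much simpler than Backbone's own `Events`): the listener records every subscription it makes, so a single `stopListening()` call can undo them all.

```javascript
// Hypothetical minimal emitter; Backbone.Events is the real, fuller implementation.
function Emitter() { this._handlers = {}; }
Emitter.prototype.on = function(name, cb) {
  (this._handlers[name] = this._handlers[name] || []).push(cb);
};
Emitter.prototype.off = function(name, cb) {
  this._handlers[name] = (this._handlers[name] || []).filter(function(h) { return h !== cb; });
};
Emitter.prototype.trigger = function(name) {
  const args = Array.prototype.slice.call(arguments, 1);
  (this._handlers[name] || []).slice().forEach(function(h) { h.apply(null, args); });
};

// Mixin that remembers every (source, event, callback) so it can undo them all.
const Listener = {
  listenTo: function(source, name, cb) {
    const bound = cb.bind(this);
    this._listening = this._listening || [];
    this._listening.push({ source: source, name: name, cb: bound });
    source.on(name, bound);
  },
  stopListening: function() {
    (this._listening || []).forEach(function(l) { l.source.off(l.name, l.cb); });
    this._listening = [];
  }
};

function Game(basket) {
  this.points = 0;
  this.listenTo(basket, 'points:scored', function(team, points) {
    this.points += points;
  });
}
Object.assign(Game.prototype, Listener);

const basket = new Emitter();
const oldGame = new Game(basket);
basket.trigger('points:scored', 'home', 2);
oldGame.stopListening();              // "destroy" the old game
const newGame = new Game(basket);
basket.trigger('points:scored', 'home', 3);
```

Here the old game stops at 2 points, and only the new game sees the later score.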
### Internal events
If a component listens to an event on a DOM element it itself owns, and if it's using JQuery, then we don't need to do anything special. If on destruction of the component, we clean up the DOM element using `ko.removeNode()`, the JQuery event bindings should automatically be removed. (This hasn't been rigorously verified, but if correct, is a reason to use JQuery for browser events rather than native `addEventListener`.)
## How to do cleanup uniformly
Since we need to destroy the components' DOM explicitly, the components should provide a method to call for that. By analogy with KnockoutJS, let's call it `dispose()`.
- We know that it needs to remove the DOM that the component is responsible for, probably using `ko.removeNode`.
- If the component used Backbone's `listenTo()`, it should call `stopListening()` to unsubscribe from Backbone events.
- If the component maintains any knockout subscriptions or computed observables, it should call `.dispose()` on them.
- If the component owns other components, then those should be cleaned up recursively, by calling `.dispose()` on those.
The trick is how to make it easy to remember to do all necessary cleanup. I propose keeping track of each object that needs cleanup at the point where it first enters the picture.
## 'Disposable' class
The idea is to have a class that can be mixed into (or inherited by) any object, and whose purpose is to keep track of things this object "owns", that it should be responsible for cleaning up. To combine the examples above:

    function Component(firstName, lastName, basket) {
      this.fullName = this.autoDispose(ko.computed(function() {
        return firstName() + " " + lastName();
      }));
      this.viewId = ko.observable();
      this.autoDispose(this.viewId.subscribe(function(viewId) {
        this.loadView(viewId);
      }, this));
      this.ourDom = this.autoDispose(somewhere.appendChild(some_dom_we_create));
      this.listenTo(basket, 'points:scored', function(team, points) {
        // Update UI to show updated points for the team.
      });
    }

Note the `this.autoDispose()` calls. They mark the argument as being owned by `this`. When `this.dispose()` is called, those values get disposed of as well.
The disposal itself is fairly straightforward: if the object has a `dispose` method, we'll call that. If it's a DOM node, we'll call `ko.removeNode` on it. The `dispose()` method of Disposable objects will always call `this.stopListening()` if such a method exists, so that subscriptions using Backbone's `listenTo` are cleaned up automatically.
To do additional cleanup when `dispose()` is called, the derived class can override `dispose()`, do its other cleanup, then call `Disposable.prototype.dispose.call(this)`.
For convenience, the Disposable class provides a few other methods:
- `disposeRelease(part)`: releases an owned object, so that it doesn't get auto-disposed.
- `disposeDiscard(part)`: disposes of an owned object early (rather than wait for `this.dispose`).
- `isDisposed()`: returns whether `this.dispose()` has already been called.
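
The ownership bookkeeping described above can be sketched in a few lines. This is a simplified illustration, not the actual `dispose.js`: it assumes every owned part has a `dispose()` method, omits DOM nodes and `stopListening()`, and disposes parts in reverse order of registration, which is an arbitrary choice made for this sketch.

```javascript
// Simplified sketch of the ownership idea; not the real dispose.js.
function Disposable() {}
Object.assign(Disposable.prototype, {
  // Mark `part` as owned by this object and return it for convenient assignment.
  autoDispose: function(part) {
    this._owned = this._owned || [];
    this._owned.push(part);
    return part;
  },
  // Stop owning `part` without disposing it.
  disposeRelease: function(part) {
    this._owned = (this._owned || []).filter(function(p) { return p !== part; });
    return part;
  },
  // Dispose of an owned part early, rather than waiting for dispose().
  disposeDiscard: function(part) {
    this.disposeRelease(part);
    part.dispose();
  },
  isDisposed: function() { return this._disposed === true; },
  dispose: function() {
    // Reverse order of registration (an assumption of this sketch).
    (this._owned || []).slice().reverse().forEach(function(p) { p.dispose(); });
    this._owned = [];
    this._disposed = true;
  }
});

// Usage: each owned part only needs a dispose() method.
const log = [];
function part(name) { return { dispose: function() { log.push(name); } }; }

const owner = new Disposable();
const a = owner.autoDispose(part('a'));
const b = owner.autoDispose(part('b'));
const c = owner.autoDispose(part('c'));
owner.disposeRelease(b);   // b is no longer owned, so it survives dispose()
owner.disposeDiscard(c);   // c is disposed right away
owner.dispose();           // disposes the remaining owned part, a
```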
### Destroying destroyed objects
There is one more thing that Disposable class's `dispose()` method will do: destroy the object, as in ruin, wreck, wipe out. Specifically, it will go through all properties of `this`, and set each to a junk value. This achieves two goals:
1. In any of the examples above, if you forgot to mark anything with `this.autoDispose()`, and some callback continues to be called after the object has been destroyed, you'll get errors. Not just silent waste of resources that slow down the site and are hard to detect.
2. It removes references, potentially breaking reference chains that would otherwise keep memory alive. Imagine that something wrongly retains a reference to a destroyed object (which logically nothing should, but something might by mistake). If it tries to use the object, it will fail (see point 1). But even if it doesn't access the object, it's preventing the garbage collector from cleaning up any of the object's contents. If we break references, then in this situation the GC can still collect all the properties of the destroyed object.
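
A minimal sketch of the first goal follows. The helper name `wipeOut` and the choice of `null` as the junk value are assumptions made for illustration; they are not taken from the actual implementation.

```javascript
// Hypothetical helper: replace every own property with a junk value (null here).
function wipeOut(obj) {
  Object.keys(obj).forEach(function(key) { obj[key] = null; });
}

function Counter() {
  const self = this;
  this.state = { clicks: 0 };
  // A callback that something external might keep calling by mistake.
  this.onClick = function() { self.state.clicks++; };
}

const counter = new Counter();
const staleCallback = counter.onClick;   // simulates a forgotten subscription
staleCallback();                         // works while the object is alive
const clicksBefore = counter.state.clicks;

wipeOut(counter);

let failure = null;
try {
  staleCallback();   // self.state is now null, so this throws a TypeError
} catch (e) {
  failure = e;
}
```

The stale callback now fails loudly instead of silently updating state that nobody will ever look at.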
## Conclusion
All JS client-side components that need cleanup (e.g. maintain DOM, observables, listen to events, or subscribe to anything), should inherit from `Disposable`. To destroy them, call their `.dispose()` method. Whenever they take responsibility for any piece that requires cleanup, they should wrap that piece in `this.autoDispose()`.
This should go a long way towards avoiding leaks and slowdowns.


@ -0,0 +1,218 @@
# Grist Data Format
Grist Data Format is used to send and receive data from a Grist document. For example, an implementer of an import module would need to translate data to Grist Data Format. A user of Grist Basket APIs would fetch and upload data in Grist Data Format.
The format is optimized for tabular data. A table consists of rows and columns, containing a single value for each row for each column. Various types are supported for the values.
Each column has a name and a type. The type is not strict: a column may contain values of other types. However, the type is the intended type of the value for that column, and allows those values to be represented more efficiently.
Grist Data Format is readily serialized to JSON. Other serializations are possible; for example, see below for a .proto file that allows serializing Grist Data Format as a protocol buffer.
## Format Specification
### Document
At the top, Grist Data Format is a Document object with a single key “tables” mapping to an array of Tables:
```javascript
{
tables: [Tables…]
}
```
### Table
```javascript
{
name: "TableName",
colinfo: [ColInfo…],
columns: ColData
}
```
The `name` is the name of the table. The `colinfo` array has an item to describe each column, and `columns` is the actual table data in column-oriented layout.
### ColInfo
```javascript
{
name: "ColName",
type: "ColType",
options: <arbitrary options>
}
```
The `name` is the name of the column, and `type` is its type. The field `options` optionally specifies type-specific options that affect the column (e.g. the number of decimal places to display for a floating-point number).
### ColData
```javascript
{
<colName1>: ColValues,
<colName2>: ColValues,
...
}
```
The data in the table is represented as an object mapping a column name to an array of values for the column. This column-oriented representation allows for the representation of data to be more concise.
### ColValues
```javascript
[CellValue, CellValue, ...]
```
ColValues is an array of all values for the column. We'll refer to the type of each value as `CellValue`. ColValues has an entry for each row in the table. In particular, each ColValues array in a ColData object has the same number of entries.
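As an illustration of the column-oriented layout, the sketch below converts a Table into an array of row objects and checks that all columns have the same number of entries. The function name `tableToRows` and the sample `People` table are made up for this example.

```javascript
// Convert a Table in column-oriented layout into an array of row objects.
function tableToRows(table) {
  const names = table.colinfo.map(function(col) { return col.name; });
  const lengths = names.map(function(name) {
    if (!Array.isArray(table.columns[name])) {
      throw new Error('Missing column data: ' + name);
    }
    return table.columns[name].length;
  });
  // Every ColValues array must have one entry per row.
  const numRows = lengths.length > 0 ? lengths[0] : 0;
  if (lengths.some(function(len) { return len !== numRows; })) {
    throw new Error('Columns of ' + table.name + ' differ in length');
  }
  const rows = [];
  for (let i = 0; i < numRows; i++) {
    const row = {};
    names.forEach(function(name) { row[name] = table.columns[name][i]; });
    rows.push(row);
  }
  return rows;
}

// Sample data, invented for this example.
const people = {
  name: "People",
  colinfo: [{ name: "id", type: "Int" }, { name: "Name", type: "Text" }],
  columns: { id: [1, 2], Name: ["Alice", "Bob"] }
};
const rows = tableToRows(people);
// rows is [{ id: 1, Name: "Alice" }, { id: 2, Name: "Bob" }]
```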
### CellValue
CellValue represents the value in one cell. We support various types of values, documented below. When represented as JSON, CellValue is one of the following JSON types:
- string
- number
- bool
- null
- array of the form `[typeCode, args...]`
The interpretation of CellValue is affected by the column's type, and described in more detail below.
## JSON Schema
The description above can be summarized by this JSON Schema:
```json
{
"definitions": {
"Table": {
"type": "object",
"properties": {
"name": { "type": "string" },
"colinfo": { "type": "array", "items": { "$ref": "#/definitions/ColInfo" } },
"columns": { "$ref": "#/definitions/ColData" }
}
},
"ColInfo": {
"type": "object",
"properties": {
"name": { "type": "string" },
"type": { "type": "string" },
"options": { "type": "object" }
}
},
"ColData": {
"type": "object",
"additionalProperties": { "$ref": "#/definitions/ColValues" }
},
"ColValues": {
"type": "array",
"items": { "type": "CellValue" }
}
},
"type": "object",
"properties": {
"tables": { "type": "array", "items": { "$ref": "#/definitions/Table" } }
}
}
```
## Record identifiers
Each table should have a column named `id`, whose values should be unique across the table. It is used to identify records in queries and actions. Its details, including its type, are left for now outside the scope of this specification, because the format isn't affected by them.
## Naming
Names for tables and columns must consist of alphanumeric ASCII characters or underscore (i.e. `[0-9a-zA-Z_]`). They may not start with an underscore or a digit. Different tables and different columns within a table must have unique names case-insensitively (i.e. they cannot differ in case only).
Certain names (`id` being one of them) may be reserved, e.g. by Grist, for internal purposes, and would not be usable for user data. Such restrictions are outside the scope of this specification.
Note that this combination of rules allows table and column names to be valid identifiers in pretty much every programming language (including Python and Javascript), as well as valid names of columns in databases.
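The character and uniqueness rules can be checked mechanically. Here is a minimal sketch; the function names are made up for illustration, and reserved names are deliberately not handled since they are outside the scope of this specification.

```javascript
// ASCII letters, digits and underscore only; the first character must be a letter
// (names may not start with an underscore or a digit).
const VALID_NAME = /^[a-zA-Z][0-9a-zA-Z_]*$/;

function isValidName(name) {
  return VALID_NAME.test(name);
}

// Returns the names that collide with an earlier name when case is ignored.
function findCaseInsensitiveDuplicates(names) {
  const seen = new Set();
  const duplicates = [];
  names.forEach(function(name) {
    const key = name.toLowerCase();
    if (seen.has(key)) { duplicates.push(name); }
    seen.add(key);
  });
  return duplicates;
}
```

For example, `isValidName("_private")` and `isValidName("1st")` are both false, and `findCaseInsensitiveDuplicates(["Name", "Age", "name"])` reports `"name"` as a collision.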
## Value Types
The format supports a number of data types. Some types have a short representation (e.g. `Numeric` as a JSON `number`, and `Text` as a JSON `string`), but all types have an explicit representation as well.
The explicit representation of a value is an array `[typeCode, args...]`. The first member of the array is a string code that defines the type of the value. The rest of the elements are arguments used to construct the actual value.
The following table lists currently supported types and their short and explicit representations.
| **Type Name** | **Short Repr** | **[Type Code, Args...]** | **Description** |
|---|---|---|---|
| `Numeric` | `number`* | `['n',number]` | double-precision floating point number |
| `Text` | `string`* | `['s',string]` | Unicode string |
| `Bool` | `bool`* | `['b',bool]` | Boolean value (true or false) |
| `Null` | `null`* | `null` | Null value (no special explicit representation) |
| `Int` | `number` | `['i',number]` | 32-bit integer |
| `Date` | `number` | `['d',number]` | Calendar date, represented as seconds since Epoch to 00:00 UTC on that date. |
| `DateTime` | `number` | `['D',number]` | Instant in time, represented as seconds since Epoch |
| `Reference` | `number` | `['R',number]` | Identifier of a record in a table. |
| `ReferenceList` | | `['L',number,...]` | List of record identifiers |
| `Choice` | `string` | `['C',string]` | Unicode string selected from a list of choices. |
| `PositionNumber` | `number` | `['P',number]` | a double used to order records relative to each other. |
| `Image` | | `['I',string]` | Binary data representing an image, encoded as base64 |
| `List` | | `['l',values,...]` | List of values of any type. |
| `JSON` | | `['J',object]` | JSON-serializable object |
| `Error` | | `['E',string,string?,value?]` | Exception, with first argument exception type, second an optional message, and optionally a third containing additional info. |
An important goal is to represent data efficiently in the common case. When a value matches the column's type, the short representation is used. For example, in a Numeric column, a Numeric value is represented as a `number`, and in a Date column, a Date value is represented as a `number`.
If a value does not match the column's type, then the short representation is used when it's one of the starred types in the table AND the short type is different from the column's short type.
For example:
- In a Numeric column, Numeric is `number`, Text is `string` (being a starred type), but a Date is `['d',number]`.
- In a Date column, Date is `number`, and Numeric value is `['n',number]`, because even though it's starred, it conflicts with Date's own short type.
- In a Text column, Text is `string`, Numeric is `number` (starred), and Date is `['d',number]` (not starred).
Note how for the common case of a value matching the column's type, we can always use the short representation. But the format still allows values to have an explicit type that's different from the specified one.
Note also that columns of any of the starred types use the same interpretation for contained values.
The primary use case is to allow, for example, storing a value like "N/A" or "TBD" or "Ask Bob" in a column of type Numeric or Date. Another important case is to store errors produced by a computation.
Other complex types may be added in the future.
## Column Types
Any of the types listed in the table above may be specified as a column type.
In addition, a column type may specify type `Any`. For the purpose of type interpretations, it works the same as any of the starred types, but it does not convey anything about the expected type of value for the column.
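The rules for choosing between the short and the explicit representation, including `Any` columns, can be sketched as follows. This is an illustrative reading of the rules above, not a reference implementation: the function name `encodeCell` is made up, the `TYPES` map is transcribed from the table for single-argument types only, and list-like types (`ReferenceList`, `List`, `Error`) and `JSON` are left out.

```javascript
// Short JSON type and explicit type code per type, transcribed from the table above.
// `starred` marks types whose short form may also appear in columns of other types.
const TYPES = {
  Numeric:        { code: 'n', short: 'number', starred: true },
  Text:           { code: 's', short: 'string', starred: true },
  Bool:           { code: 'b', short: 'bool',   starred: true },
  Int:            { code: 'i', short: 'number' },
  Date:           { code: 'd', short: 'number' },
  DateTime:       { code: 'D', short: 'number' },
  Reference:      { code: 'R', short: 'number' },
  Choice:         { code: 'C', short: 'string' },
  PositionNumber: { code: 'P', short: 'number' },
  Image:          { code: 'I' }                    // no short representation
};

function encodeCell(columnType, valueType, value) {
  if (valueType === 'Null') { return null; }       // Null has no explicit form
  const vt = TYPES[valueType];
  if (!vt) { throw new Error('Type not covered by this sketch: ' + valueType); }
  // `Any` (or a column type without a short form) has no short type to conflict with.
  const ct = TYPES[columnType] || {};
  const matches = (valueType === columnType);
  const shortAllowed = vt.short && (matches || (vt.starred && vt.short !== ct.short));
  return shortAllowed ? value : [vt.code, value];
}
```

With this sketch, `encodeCell('Numeric', 'Text', 'N/A')` stays a plain string, while `encodeCell('Date', 'Numeric', 5)` becomes `['n', 5]` because a bare number in a Date column would be read as a Date.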
## Other serializations
Grist Data Format is naturally serialized to JSON, which is fast and convenient to use in Javascript code. It is also possible to serialize it in other ways, e.g. as a Google protobuf.
Here is a `.proto` definition file that allows for efficient protobuf representation of data in Grist Data Format.
```proto
message Document {
repeated Table tables = 1;
}
message Table {
string name = 1;
repeated ColInfo colinfo = 2;
repeated ColData columns = 3;
}
message ColInfo {
string name = 1;
string type = 2;
string options = 3;
}
message ColData {
repeated Value value = 1;
}
message Value {
oneof value {
double vNumeric = 1;
string vText = 2;
bool vBool = 3;
// Absence of a set field represents a null
int32 vInt = 5;
double vDate = 6;
double vDateTime = 7;
int32 vReference = 8;
ValueList vReferenceList = 9;
string vChoice = 10;
double vPositionNumber = 11;
bytes vImage = 12;
ValueList vList = 13;
string vJSON = 14;
ValueList vError = 15;
}
}
message ValueList {
repeated Value value = 1;
}
```


@ -0,0 +1,42 @@
# Migrations
If you change Grist schema, i.e. the schema of the Grist metadata tables (in `sandbox/grist/schema.py`), you'll have to increment the `SCHEMA_VERSION` (on top of that file) and create a migration. A migration is a set of actions that would get applied to a document at the previous version, to make it satisfy the new schema.
To add a migration, add a function to `sandbox/grist/migrations.py`, of this form (using the new version number):
```python
@migration(schema_version=11)
def migration11(tdset):
return tdset.apply_doc_actions([
add_column('_grist_Views_section', 'embedId', 'Text'),
])
```
Some migrations need to actually add or modify the data in a document. You can look at other migrations in that file for examples.
If you are doing anything other than adding a column or a table, you must read this document to the end.
## Philosophy of migrations
Migrations are tricky. Normally, we think about the software we are writing, but migrations work with documents that were created by an older version of the software, which may not have the logic you think our software has, and MAY have logic that the current version knows nothing about.
This is why migrations code uses its own "dumb" implementation for loading and examining data (see `sandbox/grist/table_data_set.py`), because trying to load an older document using our primary code base will usually fail, since the document will not satisfy our current assumptions.
## Restrictions
The rules below should make it at least barely possible to share documents by people who are not all on the same Grist version (even so, it will require more work). It should also make it somewhat safe to upgrade and then open the document with a previous version.
WARNING: Do not remove, modify, or rename metadata tables or columns.
Mark old columns and tables as deprecated using a comment. We may want to add a feature to mark them in code, to prevent their use in new versions. For now, it's enough to add a comment and remove references to the deprecated entities throughout code. An important goal is to prevent adding same-named entities in the future, or reusing the same column with a different meaning. So please add a comment of the form:
```python
# <columnName> is deprecated as of version XX. Do not remove or reuse.
```
To justify keeping old columns around, consider what would happen if A (at version 10) communicates with B (at version 11). If column "foo" exists in v10, and is deleted in v11, then A may send actions that refer to "foo", and B would consider them invalid, since B's code has no idea what "foo" is. The solution is that B needs to still know about "foo", hence we don't remove old columns.
Similar justification applies to renaming columns, or modifying them (e.g. changing a type).
WARNING: If you change the meaning or type of a column, you have to create a new column with a new name.
You'll also need to write a migration to fill it from the old column, and would mark the old column as deprecated.

53
documentation/urls.md Normal file

@ -0,0 +1,53 @@
Document URLs
-----------------
Status: WIP
Options
* An id (e.g. google)
* Several ids (e.g. airtable)
* A text name
* Several text names (e.g. github)
* An id and friendly name (e.g. dropbox)
Leaning towards an id and friendly name. Only id is interpreted by router. Name is checked only to make sure it matches current name of document. If not, we redirect to revised url before proceeding.
Length of ids depends on whether we'll be using them for obscurity to enable anyone-who-has-link-can-view style security.
Possible URLs
---------------
* docs.getgrist.com/viwpHfmtMHmKBUSyh/Document+Name
* orgname.getgrist.com/viwpHfmtMHmKBUSyh/Document+Name
* getgrist.com/d/viwpHfmtMHmKBUSyh/Document+Name
* getgrist.com/d/tblWVZDtvlsIFsuOR/viwpHfmtMHmKBUSyh/Document+Name
* getgrist.com/d/dd5bf494e709246c7601e27722e3aee656b900082c3f5f1598ae1475c35c2c4b/Document+Name
* getgrist.com/doc/fTSIMrZT3fDTvW7XDBq1b7nhWa24Zl55EVpsaO3TBBE/Document%20Name
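
To make the "id and friendly name" option concrete, here is a rough sketch of how a router might treat such URLs. Everything in it is an assumption for illustration: the `/d/<id>/<name>` shape is only one of the candidates listed above, the `+`-for-space encoding is taken from the example URLs, and `resolveDocPath` and its lookup callback are hypothetical names.

```javascript
// Hypothetical: parse "/d/<docId>/<name>" and decide whether to serve or redirect.
// Only the id selects the document; the name is compared with the current name.
function resolveDocPath(path, currentNameOf) {
  const match = /^\/d\/([^/]+)(?:\/([^/]*))?$/.exec(path);
  if (!match) { return { action: 'not-found' }; }
  const docId = match[1];
  const currentName = currentNameOf(docId);
  if (currentName === undefined) { return { action: 'not-found' }; }
  // Canonical form uses "+" for spaces, as in the example URLs (an assumption).
  const canonical = '/d/' + docId + '/' + encodeURIComponent(currentName).replace(/%20/g, '+');
  return path === canonical ? { action: 'serve', docId: docId }
                            : { action: 'redirect', to: canonical };
}

// Sample lookup table, invented for this example.
const docNames = { viwpHfmtMHmKBUSyh: 'Document Name' };
function lookupName(docId) { return docNames[docId]; }

const served = resolveDocPath('/d/viwpHfmtMHmKBUSyh/Document+Name', lookupName);
const renamed = resolveDocPath('/d/viwpHfmtMHmKBUSyh/Old+Name', lookupName);
```

A request that carries a stale name is redirected to the revised URL rather than rejected, so old links keep working after a document is renamed.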
Organization subdomains
------------------------------
Organizations get to choose a subdomain, and will access their workspaces and documents at `orgname.getgrist.com`. In addition, personal workspaces need to be uniquely determined by a URL, using `docs-` followed by the numeric id of the "personal organization":
* docs-1234.getgrist.com/
* docs.getgrist.com/o/docs-1234/
Since subdomains need to play along with all the other subdomains we use for getgrist.com, the following is a list of names that may NOT be used by any organization:
* `docs-\d+` to identify personal workspaces
* Anything that starts with underscore (`_`) (this includes special subdomains like `_domainkey`)
* Subdomains used by us for various purposes. As of 2018-10-09, these include:
    * aws
    * gristlogin
    * issues
    * metrics
    * phab
    * releases
    * test
    * vpn
    * www
Some more reserved subdomains:
* doc-worker-NN
* v1-* (this could be released eventually, but currently in our code and/or routing "v1-mock", "v1-docs", "v1-static", and any other "v1-*" are special)
* docs
* api
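
The lists above can be combined into a single check. The sketch below is illustrative only and is not the actual implementation: the name `isReservedSubdomain` is made up, lowercasing the input is an assumption, and the `doc-worker-` and `v1-` patterns follow the regular expression used in `extractOrgParts`.

```javascript
// Fixed names from the two lists above.
const RESERVED_NAMES = new Set([
  'aws', 'gristlogin', 'issues', 'metrics', 'phab', 'releases', 'test', 'vpn', 'www',
  'docs', 'api'
]);

// Returns true if an organization may NOT use this subdomain.
function isReservedSubdomain(subdomain) {
  const name = subdomain.toLowerCase();   // assumption: compare case-insensitively
  return RESERVED_NAMES.has(name) ||
    /^docs-\d+$/.test(name) ||            // personal workspaces
    name.startsWith('_') ||               // e.g. _domainkey
    /^doc-worker-.*$/.test(name) ||       // document workers
    /^v1-.*$/.test(name);                 // "v1-mock", "v1-docs", "v1-static", ...
}
```

For example, `docs-1234`, `_domainkey` and `v1-static` are all rejected, while a name such as `docs-team` is still available because it does not match `docs-\d+`.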


@ -4,7 +4,7 @@ create tables, add and remove columns, etc, Grist stores various document metada
users' tables, views, etc.) also in tables.
Before changing this file, please review:
https://phab.getgrist.com/w/migrations/
/documentation/migrations.md
"""