(core) Add BulkAddOrUpdateRecord action for efficiency

Summary:
This diff adds a new `BulkAddOrUpdateRecord` user action, which is what it sounds like:

- A bulk version of the existing `AddOrUpdateRecord` action.
- Much more efficient for operating on many records than applying many individual actions.
- Column values are specified as maps from `colId` to arrays of values as usual.
- Produces bulk versions of `AddRecord` and `UpdateRecord` actions instead of many individual actions.
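
As a rough illustration of the "maps from `colId` to arrays of values" shape, here is a minimal sketch. The type names, the `toBulkColValues` helper, the `People` table, and its columns are hypothetical and not taken from the Grist codebase. The argument order of the action is assumed to mirror the call made by the JS API in this diff.

```typescript
// Hypothetical types for illustration; not Grist's actual definitions.
type CellValue = string | number | boolean | null;
type ColValues = Record<string, CellValue>;
type BulkColValues = Record<string, CellValue[]>;

// Turn per-record column values (all sharing the same keys) into
// one array per column, preserving record order.
function toBulkColValues(rows: ColValues[]): BulkColValues {
  const result: BulkColValues = {};
  const colIds = rows.length > 0 ? Object.keys(rows[0]) : [];
  for (const colId of colIds) {
    result[colId] = rows.map(row => row[colId]);
  }
  return result;
}

// Two records to look up by Email, each with a Name to set.
const requireValues = toBulkColValues([{Email: "a@example.com"}, {Email: "b@example.com"}]);
const fieldValues = toBulkColValues([{Name: "Alice"}, {Name: "Bob"}]);

// Assumed layout: [actionName, tableId, require, fields, options].
const action = ["BulkAddOrUpdateRecord", "People", requireValues, fieldValues, {}];
```

Index `i` in every array refers to the same record, so all arrays in one action must have the same length.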

Examples of users wanting to use something like `AddOrUpdateRecord` with large numbers of records:

- https://grist.slack.com/archives/C0234CPPXPA/p1651789710290879
- https://grist.slack.com/archives/C0234CPPXPA/p1660743493480119
- https://grist.slack.com/archives/C0234CPPXPA/p1660333148491559
- https://grist.slack.com/archives/C0234CPPXPA/p1663069291726159

I tested what made many `AddOrUpdateRecord` actions slow in the first place. The time went almost entirely to producing many individual `AddRecord` user actions, and about half of that was spent processing the resulting `AddRecord` doc actions. Lookups and updates were not a problem. With these changes, the slowness is gone.

The Python user action implementation is more complex but there are no surprises. The JS API now groups `records` based on the keys of `require` and `fields` so that `BulkAddOrUpdateRecord` can be applied to each group.
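
The grouping step can be sketched without lodash as follows. This is a simplified stand-in for the `groupBy` call in the diff, not the actual implementation. The `UpsertRecord` interface and the `shapeKey` and `groupByShape` names are hypothetical.

```typescript
// Hypothetical record shape, modelled on the `require`/`fields` pairs in this diff.
interface UpsertRecord {
  require: Record<string, unknown>;
  fields?: Record<string, unknown>;
}

// Build a key from the sorted column ids, so that key order within a record
// does not affect which group it lands in.
function shapeKey(rec: UpsertRecord): string {
  const requireKeys = Object.keys(rec.require).sort().join(",");
  const fieldsKeys = Object.keys(rec.fields || {}).sort().join(",");
  return `${requireKeys}:${fieldsKeys}`;
}

// Collect records with identical shapes; each group can then become
// one bulk action with no gaps to fill in its column arrays.
function groupByShape(records: UpsertRecord[]): UpsertRecord[][] {
  const groups = new Map<string, UpsertRecord[]>();
  for (const rec of records) {
    const key = shapeKey(rec);
    const group = groups.get(key);
    if (group) {
      group.push(rec);
    } else {
      groups.set(key, [rec]);
    }
  }
  return Array.from(groups.values());
}

const groups = groupByShape([
  {require: {Email: "a@example.com"}, fields: {Name: "Alice"}},
  {require: {Email: "b@example.com"}},
  {require: {Email: "c@example.com"}, fields: {Name: "Carol"}},
]);
```

Here the first and third records share a shape and end up in one group, while the second, which has no `fields`, forms a group of its own.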

Test Plan: Update and extend Python and DocApi tests.

Reviewers: jarek, paulfitz

Reviewed By: jarek, paulfitz

Subscribers: jarek

Differential Revision: https://phab.getgrist.com/D3642
Alex Hall
2022-09-28 15:13:07 +02:00
parent df65219729
commit 1864b7ba5d
6 changed files with 261 additions and 40 deletions


@@ -355,6 +355,10 @@ export function parseUserAction(ua: UserAction, docData: DocData): UserAction {
ua = _parseUserActionColValues(ua, docData, false, 2);
ua = _parseUserActionColValues(ua, docData, false, 3);
return ua;
+ case 'BulkAddOrUpdateRecord':
+ ua = _parseUserActionColValues(ua, docData, true, 2);
+ ua = _parseUserActionColValues(ua, docData, true, 3);
+ return ua;
default:
return ua;
}


@@ -5,6 +5,7 @@ import { arrayRepeat } from './gutil';
import flatMap = require('lodash/flatMap');
import isEqual = require('lodash/isEqual');
import pick = require('lodash/pick');
+ import groupBy = require('lodash/groupBy');
/**
* An implementation of the TableOperations interface, given a platform
@@ -59,8 +60,21 @@ export class TableOperationsImpl implements TableOperations {
allow_empty_require: upsertOptions?.allowEmptyRequire
};
const recordOptions: OpOptions = pick(upsertOptions, 'parseStrings');
- const actions = records.map(rec =>
-   ["AddOrUpdateRecord", tableId, rec.require, rec.fields || {}, options]);
+ // Group records based on having the same keys in `require` and `fields`.
+ // A single bulk action will be applied to each group.
+ // We don't want one bulk action for all records that might have different shapes,
+ // because that would require filling arrays with null values.
+ const recGroups = groupBy(records, rec => {
+   const requireKeys = Object.keys(rec.require).sort().join(',');
+   const fieldsKeys = Object.keys(rec.fields || {}).sort().join(',');
+   return `${requireKeys}:${fieldsKeys}`;
+ });
+ const actions = Object.values(recGroups).map(group => {
+   const require = convertToBulkColValues(group.map(r => ({fields: r.require})));
+   const fields = convertToBulkColValues(group.map(r => ({fields: r.fields || {}})));
+   return ["BulkAddOrUpdateRecord", tableId, require, fields, options];
+ });
await this._applyUserActions(tableId, [...fieldNames(records)],
actions, recordOptions);
return [];


@@ -77,6 +77,10 @@ function isAclTable(tableId: string): boolean {
return ['_grist_ACLRules', '_grist_ACLResources'].includes(tableId);
}
+ function isAddOrUpdateRecordAction(a: UserAction): boolean {
+   return ['AddOrUpdateRecord', 'BulkAddOrUpdateRecord'].includes(String(a[0]));
+ }
// A list of key metadata tables that need special handling. Other metadata tables may
// refer to material in some of these tables but don't need special handling.
// TODO: there are other metadata tables that would need access control, or redesign -
@@ -128,8 +132,9 @@ const OTHER_RECOGNIZED_ACTIONS = new Set([
'BulkRemoveRecord',
'ReplaceTableData',
- // A data action handled specially because of read needs.
+ // Data actions handled specially because of read needs.
'AddOrUpdateRecord',
+ 'BulkAddOrUpdateRecord',
// Groups of actions.
'ApplyDocActions',
@@ -979,21 +984,22 @@ export class GranularAccess implements GranularAccessForBundle {
// way to do that within the data engine as currently
// formulated. Could perhaps be done for on-demand tables though.
private async _checkAddOrUpdateAccess(docSession: OptDocSession, actions: UserAction[]) {
- if (!scanActionsRecursively(actions, (a) => a[0] === 'AddOrUpdateRecord')) {
+ if (!scanActionsRecursively(actions, isAddOrUpdateRecordAction)) {
// Don't need to apply this particular check.
return;
}
// Fail if being combined with anything fancy.
if (scanActionsRecursively(actions, (a) => {
const name = a[0];
- return !['ApplyUndoActions', 'ApplyDocActions', 'AddOrUpdateRecord'].includes(String(name)) &&
+ return !['ApplyUndoActions', 'ApplyDocActions'].includes(String(name)) &&
+ !isAddOrUpdateRecordAction(a) &&
!(isDataAction(a) && !getTableId(a).startsWith('_grist_'));
})) {
throw new Error('Can only combine AddOrUpdate with simple data changes');
}
// Check for read access, and that we're not touching metadata.
await applyToActionsRecursively(actions, async (a) => {
- if (a[0] !== 'AddOrUpdateRecord') { return; }
+ if (!isAddOrUpdateRecordAction(a)) { return; }
const tableId = validTableIdString(a[1]);
if (tableId.startsWith('_grist_')) {
throw new Error(`AddOrUpdate cannot yet be used on metadata tables`);