(core) Python optimizations to speed up data engine

Summary:
- A bunch of optimizations guided by python profiling (esp. py-spy)
- Big one is optimizing Record/RecordSet attribute access
- Adds tracemalloc printout when running test_replay with PYTHONTRACEMALLOC=1 (on PY3)
  (but memory size is barely affected by these changes)

- Testing with RECORD_SANDBOX_BUFFERS_DIR, loading and calculating a particular
  very large doc (CRM), time taken improved from 73.9s to 54.8s (26% faster)

Test Plan: No behavior changes intended; relying on existing tests to verify that.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3781
This commit is contained in:
Dmitry S
2023-02-04 11:20:13 -05:00
parent 7c448d746f
commit 9d4eeda480
8 changed files with 144 additions and 84 deletions

View File

@@ -161,9 +161,9 @@ class SimpleLookupMapColumn(BaseLookupMapColumn):
def _recalc_rec_method(self, rec, table):
old_key = self._row_key_map.lookup_left(rec._row_id)
# Note that rec._get_col(_col_id) is what creates the correct dependency, as well as ensures
# Note that getattr(rec, _col_id) is what creates the correct dependency, as well as ensures
# that the columns used to index by are brought up-to-date (in case they are formula columns).
new_key = tuple(_extract(rec._get_col(_col_id)) for _col_id in self._col_ids_tuple)
new_key = tuple(_extract(getattr(rec, _col_id)) for _col_id in self._col_ids_tuple)
try:
self._row_key_map.insert(rec._row_id, new_key)
@@ -188,9 +188,9 @@ class ContainsLookupMapColumn(BaseLookupMapColumn):
# looked up with CONTAINS()
new_keys_groups = []
for col_id in self._col_ids_tuple:
# Note that _get_col is what creates the correct dependency, as well as ensures
# Note that getattr() is what creates the correct dependency, as well as ensures
# that the columns used to index by are brought up-to-date (in case they are formula columns).
group = rec._get_col(extract_column_id(col_id))
group = getattr(rec, extract_column_id(col_id))
if isinstance(col_id, _Contains):
# Check that the cell targeted by CONTAINS() has an appropriate type.