(core) Python optimizations to speed up data engine

Summary:
- A set of optimizations guided by Python profiling (esp. py-spy)
- The biggest win is optimizing Record/RecordSet attribute access
- Adds tracemalloc printout when running test_replay with PYTHONTRACEMALLOC=1 (on PY3)
  (but memory size is barely affected by these changes)

- Testing with RECORD_SANDBOX_BUFFERS_DIR, loading and calculating a particular
  very large doc (CRM), time taken dropped from 73.9s to 54.8s (about 26% less time)
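
The tracemalloc printout mentioned above follows a simple pattern: reset the peak counter, run the workload, then read the current and peak traced sizes. A minimal standalone sketch of that pattern, assuming Python 3.9+ for `tracemalloc.reset_peak()`; the helper name `measure_peak` is hypothetical and is not part of this commit:

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func and report traced memory around the call.

    Returns (result, current_bytes, peak_bytes). Both sizes are None when
    tracemalloc is not tracing (e.g. PYTHONTRACEMALLOC is not set).
    Hypothetical helper for illustration only.
    """
    tracing = tracemalloc.is_tracing()
    if tracing:
        # Start the peak measurement from the current usage (Python 3.9+).
        tracemalloc.reset_peak()
    result = func(*args, **kwargs)
    if tracing:
        current, peak = tracemalloc.get_traced_memory()
        return result, current, peak
    return result, None, None


if __name__ == "__main__":
    # Tracing can be enabled with PYTHONTRACEMALLOC=1 or explicitly, as here.
    tracemalloc.start()
    _, mem_size, mem_peak = measure_peak(lambda: [0] * 100000)
    print("mem_size {}, mem_peak {}".format(mem_size, mem_peak))
    tracemalloc.stop()
```

Tracing slows execution noticeably, so timing numbers should be taken from runs without it.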

Test Plan: No behavior changes intended; relying on existing tests to verify that.

Reviewers: georgegevoian

Reviewed By: georgegevoian

Differential Revision: https://phab.getgrist.com/D3781
Dmitry S
2023-02-04 11:20:13 -05:00
parent 7c448d746f
commit 9d4eeda480
8 changed files with 144 additions and 84 deletions

@@ -44,7 +44,7 @@ import unittest
from main import run
from sandbox import Sandbox
import six
def marshal_load_all(path):
  result = []
@@ -65,6 +65,7 @@ class TestReplay(unittest.TestCase):
    root = os.environ.get("RECORD_SANDBOX_BUFFERS_DIR")
    if not root:
      self.skipTest("RECORD_SANDBOX_BUFFERS_DIR not set")
    for dirpath, dirnames, filenames in os.walk(root):
      if "input" not in filenames:
        continue
@@ -76,9 +77,18 @@ class TestReplay(unittest.TestCase):
      new_output_path = os.path.join(dirpath, "new_output")
      with open(input_path, "rb") as external_input:
        with open(new_output_path, "wb") as external_output:
          if six.PY3:
            import tracemalloc # pylint: disable=import-error
            tracemalloc.reset_peak()
          sandbox = Sandbox(external_input, external_output)
          run(sandbox)
          # Run with env PYTHONTRACEMALLOC=1 to trace and print peak memory (runs much slower).
          if six.PY3 and tracemalloc.is_tracing():
            mem_size, mem_peak = tracemalloc.get_traced_memory()
            print("mem_size {}, mem_peak {}".format(mem_size, mem_peak))
      original_output = marshal_load_all(output_path)
      # _send_to_js does two layers of marshalling,