Generic logic for trie updates, follow-ups #12361

Longarithm · 2024-11-01T15:56:37Z

Description

Follow-ups after #12324.

Use memory usages computed on the fly for memtries.
Fix structure of trie/mem, including resharding.rs and updating.rs. Lots of logic there is not relevant to memtries anymore.
Remove TrieNodeWithSize. It is still used in trie iteration and state parts generation. It's not really needed there, the only usecase is TrieNodeWithSize::Empty which probably should be handled in other way.
Generalise flatten_nodes, find if using the same Value type is possible, find if updated_nodes also can be abstracted, as this is just a vector of new nodes for both cases.
Generalise lookups.
I think we can use some common type for ValueHandle and FlatStateValue, but don't understand how.

The text was updated successfully, but these errors were encountered:

Conclude the work needed to remove copypaste for `insert` and `delete` for memtries and trie storage. #12324 ### Changes Final API for trie update * ensure_updated, take_node, place_node, place_node_at, get_node_ref - API to work with new nodes. Convenient to create parent-child references on the fly using `GenericNodeOrIndex`. * store_value, delete_value - API to store newly inserted values, doesn't need such careful handling. The diff is unfortunately a mess of type changes and renamings. They are mostly straightforward though. Probably it is fine to do some random checks that conversion is done correctly (like UpdatedMemTrieNode -> GenericUpdatedTrieNodeWithSize). I took memtries impl as a base, so the biggest challenge was to reintroduce memory usage recomputation. Probably worth checking correctness with `Trie::insert|delete` which are completely removed now. I don't fully get why the LoC diff is +70 in total... Maybe it is spent to setup all the traits, and introducing memory usages for memtries - if we can remove `TrieNodeWithSize`, it will get better. And having more lines for setup but less lines of complex copypaste on which we heavily rely is already better. ### Testing I ran `near-store` tests on all 4 combinations - (current on generic impl for memtries) x (current or generic impl for trie storage). It helped to catch two bugs when merging trie storage and memtries code. ### Future possibilities (#12361) * Make `updating.rs` smaller. I moved everything there because it was easier for accessing new traits in `Trie` when I did intermediate work. * Remove `TrieNodeWithSize`. I hoped to do this but it is still used in trie iteration and state parts generation. It's not really needed there, the only usecase is `TrieNodeWithSize::Empty` which probably should be handled in other way. * Generalise flatten_nodes, find if using the same Value type is possible, find if `updated_nodes` also can be abstracted, as this is just a vector of new nodes for both cases. * Generalise lookups.

Issue: #12361 After #12359, we can take memory usages from `updated_nodes` instead of recomputing them again. The original plan was to remove reading child nodes completely, where it is not necessary. However, we still need to read hash, and hash is still a part of child' `MemTrieNodeId`... Moreover, it doesn't make sense to store hashes of children in the node for memory savings. I think we just need to live with it then. At least the usecase is very clear now. If we ever want to make memtrie recording cleaner (make it a side effect of `MemTrieNodePtr::view`?), we need to skip recording node only if its hash is needed.

shreyan-gupta · 2024-11-06T16:55:08Z

Alex and I discussed a bit more on how could we improve the Trie and MemTrie interface and interop and here are a couple of ideas

Recorder code for Trie and MemTrie are separate and in a mess. We don't know a good "location"/abstraction level to put the recorder to generalize between Trie and Memtrie.
Recorder follows a different logic from tracking trie changes. One is used for state witness, other is used to figure out ref count deltas. These are almost the same but not quite.
Like mentioned in the description above, the lookup code for Trie and MemTrie are pretty much the same and can be generalized. See memtrie_lookup in core/store/src/trie/mem/lookup.rs and lookup_from_state_column in core/store/src/trie/mod.rs
Iterator code for memtrie and trie is pretty much the same and can be generalized
Generalize flatten_nodes function (mentioned in description). For more context, we have the TrieStorageUpdate::flatten_nodes function that iteratively does post order DFS of trie nodes. In comparison to that, for memtrie, we have the post_order_traverse_updated_nodes in MemTrieUpdate that is a recursive function but ONLY returns the order of node traversal.
We have GenericUpdatedTrieNode but we should also consider having type GenericTrieNode for all non-update operations.

Longarithm added the T-core Team: issues relevant to the core team label Nov 1, 2024

Longarithm mentioned this issue Nov 1, 2024

feat: generic trie insert and delete #12359

Merged

Longarithm mentioned this issue Nov 1, 2024

fix: remove recomputing memtrie node memory usages #12362

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generic logic for trie updates, follow-ups #12361

Generic logic for trie updates, follow-ups #12361

Longarithm commented Nov 1, 2024 •

edited

Loading

shreyan-gupta commented Nov 6, 2024

Generic logic for trie updates, follow-ups #12361

Generic logic for trie updates, follow-ups #12361

Comments

Longarithm commented Nov 1, 2024 • edited Loading

Description

shreyan-gupta commented Nov 6, 2024

Longarithm commented Nov 1, 2024 •

edited

Loading