Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic logic for trie updates, follow-ups #12361

Open
Longarithm opened this issue Nov 1, 2024 · 1 comment
Open

Generic logic for trie updates, follow-ups #12361

Longarithm opened this issue Nov 1, 2024 · 1 comment
Labels
T-core Team: issues relevant to the core team

Comments

@Longarithm
Copy link
Member

Longarithm commented Nov 1, 2024

Description

Follow-ups after #12324.

  • Use memory usages computed on the fly for memtries.
  • Fix structure of trie/mem, including resharding.rs and updating.rs. Lots of logic there is not relevant to memtries anymore.
  • Remove TrieNodeWithSize. It is still used in trie iteration and state parts generation. It's not really needed there, the only usecase is TrieNodeWithSize::Empty which probably should be handled in other way.
  • Generalise flatten_nodes, find if using the same Value type is possible, find if updated_nodes also can be abstracted, as this is just a vector of new nodes for both cases.
  • Generalise lookups.
  • I think we can use some common type for ValueHandle and FlatStateValue, but don't understand how.
@Longarithm Longarithm added the T-core Team: issues relevant to the core team label Nov 1, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 1, 2024
Conclude the work needed to remove copypaste for `insert` and `delete`
for memtries and trie storage. #12324

### Changes

Final API for trie update

* ensure_updated, take_node, place_node, place_node_at, get_node_ref -
API to work with new nodes. Convenient to create parent-child references
on the fly using `GenericNodeOrIndex`.
* store_value, delete_value - API to store newly inserted values,
doesn't need such careful handling.

The diff is unfortunately a mess of type changes and renamings. They are
mostly straightforward though. Probably it is fine to do some random
checks that conversion is done correctly (like UpdatedMemTrieNode ->
GenericUpdatedTrieNodeWithSize).

I took memtries impl as a base, so the biggest challenge was to
reintroduce memory usage recomputation. Probably worth checking
correctness with `Trie::insert|delete` which are completely removed now.

I don't fully get why the LoC diff is +70 in total... Maybe it is spent
to setup all the traits, and introducing memory usages for memtries - if
we can remove `TrieNodeWithSize`, it will get better. And having more
lines for setup but less lines of complex copypaste on which we heavily
rely is already better.

### Testing

I ran `near-store` tests on all 4 combinations - (current on generic
impl for memtries) x (current or generic impl for trie storage). It
helped to catch two bugs when merging trie storage and memtries code.

### Future possibilities (#12361)

* Make `updating.rs` smaller. I moved everything there because it was
easier for accessing new traits in `Trie` when I did intermediate work.
* Remove `TrieNodeWithSize`. I hoped to do this but it is still used in
trie iteration and state parts generation. It's not really needed there,
the only usecase is `TrieNodeWithSize::Empty` which probably should be
handled in other way.
* Generalise flatten_nodes, find if using the same Value type is
possible, find if `updated_nodes` also can be abstracted, as this is
just a vector of new nodes for both cases.
* Generalise lookups.
github-merge-queue bot pushed a commit that referenced this issue Nov 4, 2024
Issue: #12361

After #12359, we can take memory usages from `updated_nodes` instead of
recomputing them again.

The original plan was to remove reading child nodes completely, where it
is not necessary. However, we still need to read hash, and hash is still
a part of child' `MemTrieNodeId`... Moreover, it doesn't make sense to
store hashes of children in the node for memory savings.

I think we just need to live with it then. At least the usecase is very
clear now. If we ever want to make memtrie recording cleaner (make it a
side effect of `MemTrieNodePtr::view`?), we need to skip recording node
only if its hash is needed.
@shreyan-gupta
Copy link
Contributor

Alex and I discussed a bit more on how could we improve the Trie and MemTrie interface and interop and here are a couple of ideas

  • Recorder code for Trie and MemTrie are separate and in a mess. We don't know a good "location"/abstraction level to put the recorder to generalize between Trie and Memtrie.
  • Recorder follows a different logic from tracking trie changes. One is used for state witness, other is used to figure out ref count deltas. These are almost the same but not quite.
  • Like mentioned in the description above, the lookup code for Trie and MemTrie are pretty much the same and can be generalized. See memtrie_lookup in core/store/src/trie/mem/lookup.rs and lookup_from_state_column in core/store/src/trie/mod.rs
  • Iterator code for memtrie and trie is pretty much the same and can be generalized
  • Generalize flatten_nodes function (mentioned in description). For more context, we have the TrieStorageUpdate::flatten_nodes function that iteratively does post order DFS of trie nodes. In comparison to that, for memtrie, we have the post_order_traverse_updated_nodes in MemTrieUpdate that is a recursive function but ONLY returns the order of node traversal.
  • We have GenericUpdatedTrieNode but we should also consider having type GenericTrieNode for all non-update operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-core Team: issues relevant to the core team
Projects
None yet
Development

No branches or pull requests

2 participants