-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Add fold_node RFC #1526
base: master
Are you sure you want to change the base?
[RFC] Add fold_node RFC #1526
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several other design questions which I think need to be discussed. I split those into separate comments to organize discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, since the node is supposed to handle multiple input streams, we need to decide how similar these streams must be. Should the same initial value be used, or should it be per stream? Should the same binary operation be used, or can/should it be separate for each stream? Should the types of input and output be the same for all streams, or can those differ as well?
Of course I see that in the proposal all those things are the same for all streams. But it is not discussed, rather assumed as given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use cases should help in answering this question. Probably, the same type and same initializer value will make sense for most use cases, but we should add support for this choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A related question is on the number of input and output ports. Do we expect all streams to go into a single port, or should there be a single port per stream, or even many-to-many? Of course we can add a preceding indexer_node
for the port-per-stream setup in front of a single input port (and it would also nicely create tagged_msg
es). But if the 1-1 correspondence is expected to be the typical use, maybe we would not want to complicate it. Also there is no "deindexer" node to put after a single output port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting point. We have a split_node that is complementary to join_node, joining and splitting tuples. But nothing that is complementary to an indexer_node. The indexer_node was introduced to tagged messages that came into a single point of a functional_node. So far, I don't think we've seen the need to automatically send indexed messages to different output ports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we see the case of folding a single stream to be important enough, and would it be sufficiently easy to do with the node designed to handle multiple streams? Perhaps some code samples could show that.
as an input message together with the actual data to compute. It is proposed to use ``tbb::flow::tagged_msg`` class for that purpose | ||
as an input for the ``fold_node`` that would be sent by the predecessor or by manual ``try_put`` calls: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tagged_msg
was designed to hold data of multiple types, it's a 'variant' on steroids. If we decide to require a single data type for all streams, tagged_msg
will likely be not that convenient to use and may also add some overheads. In other words, I think the way to combine the data with the stream index will depend on the overall node design.
From the implementation perspective, there can be multiple approaches for implementing the ``fold_stream_end`` that affects the user API. | ||
Some of them were considered in a [separate section](#fold_stream_end-implementation-approaches). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that all the approaches are based on a message counter. The question then is, at which point the total number of messages to be folded (the stream "size") becomes known. If it is known a priori, or if we want to compute partial folds for each K messages, the counter might be set even at construction. If the stream size is dynamic but known before the first message, we may require that the stream is somehow "initialized" to set that counter.
The most flexible, but also the most complicated and risky approach is to get the stream size at any point, or to get the end of stream message. If the message does not carry any counter, there are problems with potential "still in the flight" messages that might be missed. If the message does carry a counter, what to do if more messages were already received and folded? Also, the message type is kind of problematic, as both the stream size and the end of stream message are likely of different type than the stream data.
A possible design alternative is to have a dedicated port for the "end of stream" signals.
In this case, having the internal counter not equal to _0_ indicates that there are still elements to process and once it is equal to _0_, the result | ||
of the fold is considered full and can be propagated to the ``fold_node`` successors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the single counter for both expected (negative count) and received (positive count) messages, which triggers at 0, has a downside - it loses the information about the number of messages and cannot pass it down. This value was computed as a byproduct of folding, and in theory it might not be known anywhere else, so it could possibly be useful in the subsequent processing.
The same flexible approach can be applied for solving this as well: | ||
* If the invocation of _fold operation_ with an object of type | ||
|
||
It is also an open question, should ``input_type`` be allowed as a first, second, or both operands of _fold operation_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, there can be a separate optional argument for the stream index.
is well-formed - propagate the index of the stream to the body. | ||
* Otherwise - the index of the input stream is not propagated to the _fold operation_. | ||
|
||
### Should ``InputType`` and ``OutputType`` be different? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, generally these should be different types. If I remember correctly, one of our motivating scenarios is computing a histogram for a data sequence; it is obviously of a different type than the data elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is important to outline some use cases which motivate creation of the fold node. It will guide the design and help to prevent "suboptimal" decisions.
A few simplified use case ideas to consider:
- compute the word/token count in a text file read line by line.
- compute the sum of floating-point values, assuming that the sum might be many orders of magnitude bigger than some individual values.
- "reduce by key": for a key-value sequence, compute the value sums for all different keys.
|
||
// Submit stream1 | ||
for (auto item : stream1) { | ||
f_node.try_put(input_type{stream1_index, item}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f_node.try_put(input_type{stream1_index, item}); | |
f_node.try_put(fold_input{stream1_index, item}); |
} | ||
|
||
for (auto item : stream2) { | ||
f_node.try_put(input_type{stream2_index, item}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f_node.try_put(input_type{stream2_index, item}); | |
f_node.try_put(fold_input{stream2_index, item}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use cases should help in answering this question. Probably, the same type and same initializer value will make sense for most use cases, but we should add support for this choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting point. We have a split_node that is complementary to join_node, joining and splitting tuples. But nothing that is complementary to an indexer_node. The indexer_node was introduced to tagged messages that came into a single point of a functional_node. So far, I don't think we've seen the need to automatically send indexed messages to different output ports.
Description
Add RFC document for new
fold_node
in the Flow Graph.Fixes # - issue number(s) if exists
Type of change
Choose one or multiple, leave empty if none of the other choices apply
Add a respective label(s) to PR if you have permissions
Tests
Documentation
Breaks backward compatibility
Notify the following users
List users with
@
to send notificationsOther information