Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiles: update default value field in message Profile #608

Closed
wants to merge 1 commit into from

Conversation

florianl
Copy link
Contributor

@florianl florianl commented Dec 9, 2024

To indicate the default kind of the sample type, the pointer should point to the array of sample_type. This avoids issues where the string that is pointed to by default_sample_type_strindex does not match with the value types in sample_type.

ping @open-telemetry/profiling-maintainers @jhalliday

To indicate the default kind of the sample type, the pointer should point to the array of `sample_type`. This avoids issues where the string that is pointed to by `default_sample_type_strindex` does not match with the value types in `sample_type`.

ping @open-telemetry/profiling-maintainers @jhalliday
@florianl florianl requested review from a team as code owners December 9, 2024 07:11
@@ -269,6 +265,9 @@ message Profile {
// The field is optional, however if it is present then equivalent converted data should be populated in other fields
// of this message as far as is practicable.
bytes original_payload = 21;

// Index into the sample_type array for the type of the default sample value.
int32 default_sample_type_index = 23;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the discussion in #606 (comment) this field ID could also be 16. 23 is chosen here, as this is the next free field ID once #606 is merged.

@@ -269,6 +265,9 @@ message Profile {
// The field is optional, however if it is present then equivalent converted data should be populated in other fields
// of this message as far as is practicable.
bytes original_payload = 21;

// Index into the sample_type array for the type of the default sample value.
int32 default_sample_type_index = 23;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids issues where the string that is pointed to by default_sample_type_strindex does not match with the value types in sample_type.

Did you see this issue in practice? It's not very obvious to me that this change is worth it:

  • The distinction between _index and _strindex can be subtle to some producers and consumers, causing bugs potentially.
  • The index can still be out of bound and needs care similar to the string. Arguably, bugs around not adjusting the index can be more subtle - e.g. if certain profile processing drops some sample types but does not adjust the default sample type, it may be easier to notice and debug that if the default type is a string.

Also note that this changes the semantics of unset default sample type compared to pprof: here the unset field will effectively mean first sample type, in pprof that means last:

  // Index into the string table of the type of the preferred sample
  // value. If unset, clients should default to the last sample value.
  int64 default_sample_type = 14;

Copy link
Contributor Author

@florianl florianl Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The named issues around _index vs _strindex and indexing issues of these fields can be argued about every _strindex and _index field in the protocol and therefore are not specific to this field.

Unfortunately there are implementations, where default_sample_type pointed to microseconds (in the string table) and the unit in sample_type was in nanoseconds. Which one has the precedence in such a case?
If default_sample_type_index is an index into sample_type then such issues will not happen.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that example it seems odd that the sample type name had the unit. I'd expect the name of the sample type to be "time" or something like that, and that's what the default sample type field would also contain. And microseconds or nanoseconds would be the unit field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm mixing something. The value in sample_type was [["cpu","nanoseconds"]] - just like the example in the documentation to sample_type. While default_sample_type pointed to a value in the string_table that referenced microseconds.
So to me the values in sample_type and default_sample_type are valid by themselves. But as the unit (nanoseconds) in sample_type conflicts with the value in default_sample_type it is hard to interpret the profile in the correct way.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_sample_type_strindex is supposed to point to one of ValueType.type_strindex values.

For example, for sample types of [{"cpu_time","nanoseconds"}, {"num_samples", "count"}], default_sample_type_strindex can point to the string index for cpu_time or num_samples, but not to nanoseconds or count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your example :) I think it points out exactly the issue with this field. While I run into issues with nanoseconds vs microseconds your example set it even to something different.
If one looks at the data one can usually differentiate and interpret data correctly. But such differences make it hard to build tooling for automation and potential recommendation.
For that reason, default_sample_type_strindex should become default_sample_type_index and point to an element in sample_type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the unit name is clearly not unique - two or more field can use the same unit (e.g. bytes). I'm not sure why this is ambiguous tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my motivation with this proposal is to avoid named issues in the future by pointing the index for the default value to the elements that already exist in sample_type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened #610 as an alternative and will close this proposal.

@florianl
Copy link
Contributor Author

An alternative is proposed with #610

@florianl florianl closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants