Telemetry design #11175
Looks good! Thanks!!
### Security

- Providing a method for creating a hook in Framework MSBuild
Suggested change:
- Providing a method for creating a hook in Framework MSBuild
- Providing and/or documenting a method for creating a hook in Framework MSBuild
### Security

- Providing a method for creating a hook in Framework MSBuild
- document the security implications of hooking custom telemetry Exporters/Collectors in Framework
Suggested change:
- document the security implications of hooking custom telemetry Exporters/Collectors in Framework
- If a custom hooking solution is used, document the security implications of hooking custom telemetry Exporters/Collectors in Framework

Since we plan to use AppDomainManager, we are relying on an existing solution that is outside of our trust boundaries.
### Data handling

- Implement head [Sampling](https://opentelemetry.io/docs/concepts/sampling/) with the granularity of a MSBuild.exe invocation/VS instance.
- VS Data handles tail sampling in their infrastructure so that storage is not overwhelmed by a large number of build events.
As discussed, we should not prevent ourselves from adding (in future versions):
- different sampling rates for different namespaces/activities
- the ability to configure the overall and per-namespace sampling from server side (e.g. storing it in the .msbuild folder in the user profile if different than the default values set from server side; this would obviously have a delay of the default sample rate # of executions)
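
To make the idea concrete, here is a minimal sketch of head sampling decided once per invocation, with optional per-namespace rates layered on top. Everything named here (`InvocationSampler`, the namespace strings, the default rate) is hypothetical and only illustrates the shape of the design; it is not the MSBuild or OpenTelemetry API, and it leaves out the server-side configuration and the `.msbuild` folder persistence.

```python
import random

# Hypothetical values for illustration only; not taken from the design doc.
DEFAULT_RATE = 1 / 26_000
NAMESPACE_RATES = {
    "Example.HighPrio": 1.0,  # always sampled
    "Example.Noisy": 0.0,     # never sampled
}


class InvocationSampler:
    """Head sampling decided once per process (MSBuild.exe invocation / VS instance)."""

    def __init__(self, default_rate=DEFAULT_RATE, namespace_rates=None, rng=None):
        self.default_rate = default_rate
        self.namespace_rates = dict(
            NAMESPACE_RATES if namespace_rates is None else namespace_rates
        )
        # A single draw in [0, 1) per invocation. Every namespace compares its
        # own rate against the same draw, so an invocation sampled at a low
        # rate is also sampled for every namespace with a higher rate.
        self._draw = (rng or random.Random()).random()

    def rate_for(self, namespace):
        """Return the per-namespace rate, falling back to the default rate."""
        return self.namespace_rates.get(namespace, self.default_rate)

    def is_sampled(self, namespace):
        """True if telemetry for this namespace is collected in this invocation."""
        return self._draw < self.rate_for(namespace)


if __name__ == "__main__":
    sampler = InvocationSampler()
    for ns in ("Example.HighPrio", "Example.Noisy", "Example.Other"):
        print(ns, sampler.is_sampled(ns))
```

Using one draw per invocation rather than one per namespace keeps the sampled data consistent: a sampled build carries all of its lower-priority activities along with the high-priority ones.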
## Looking ahead

- Create a way of using a "HighPrioActivitySource" which would override sampling and initialize Collector in MSBuild.exe scenario/tracerprovider in VS.
More generally: a sample rate per Activity/namespace (higher, even always; or lower, even never)
## Uncertainties

- Configuring tail sampling in VS telemetry server side infrastructure to not overflow them with data.
- How much head sampling.
We can just ballpark some rates, or we can use a little statistics behind sample size determination: https://en.wikipedia.org/wiki/Sample_size_determination
E.g. for proportion estimation (of a fairly common occurrence in the builds), with a not very strict confidence level (let's say 95% is awesome for us now), margin of error (5% is very acceptable for us) and quite a high population size (let's estimate the # of total daily build events to be between 10M and 100M [while in fact much closer to the upper bound]), we would be very fine with a sampling rate of 1 in 26,000.
Sample table of sample sizes for a proportion hypothesis: https://www.research-advisors.com/images/subpage/SSTable.jpg
For rarer events (runaway builds, custom tasks etc.) we'd need to adjust appropriately to capture at least a couple hundred data points daily ... that should still allow for considerably small sampling rates and hence low impact on the observed builds.
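
The arithmetic behind that estimate can be sketched as follows. This assumes the standard sample size formula for a proportion with a finite population correction, a z-score of 1.96 for 95% confidence, and the worst-case proportion p = 0.5; the function name and the two population figures are only illustrative.

```python
import math


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Samples needed to estimate a proportion within +/- margin.

    z=1.96 corresponds to 95% confidence; p=0.5 is the worst case
    (largest variance), so the result is an upper bound for other p.
    """
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Finite population correction (negligible at these population sizes).
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    for daily_builds in (10_000_000, 100_000_000):
        n = sample_size(daily_builds)
        print(f"{daily_builds:,} builds/day: {n} samples, about 1 in {daily_builds // n:,}")
```

The required sample stays at roughly 385 builds regardless of population, so the 1 in 26,000 figure corresponds to the 10M lower bound; at 100M daily builds the same sample would need only about 1 in 260,000.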
Btw. this might also be a partial answer to some of the open questions below around perf: if we are not able to get the perf to be sufficient for regular executions, but it is still around the 'human noticeable' threshold (~100ms per various UX research), we might just choose to pay the cost in a very small number of cases.
Fixes #10947
Context
Writeup of proposed telemetry implementation based on experimentation in #11084