
Adding Metrics section to capabilities in understanding domain #1068

Open · wants to merge 9 commits into features/mslearn

Conversation

@KevDLR (Contributor) commented Oct 18, 2024

πŸ› οΈ Description

Adding a Metrics/KPI section to the capability to provide guidance on the Metric lens of the FinOps assessment.

Fixes
N/A

πŸ“‹ Checklist

πŸ”¬ How did you test this change?

  • 🀏 Lint tests
  • 🀞 PS -WhatIf / az validate
  • πŸ‘ Manually deployed + verified
  • πŸ’ͺ Unit tests
  • πŸ™Œ Integration tests

πŸ™‹β€β™€οΈ Do any of the following that apply?

  • 🚨 This is a breaking change.
  • 🀏 The change is less than 20 lines of code.

πŸ“‘ Did you update docs/changelog.md?

  • βœ… Updated changelog (required for dev PRs)
  • ➑️ Will add log in a future PR (feature branch PRs only)
  • ❎ Log not needed (small/internal change)

πŸ“– Did you update documentation?

  • βœ… Public docs in docs (required for dev)
  • βœ… Internal dev docs in src (required for dev)
  • ➑️ Will add docs in a future PR (feature branch PRs only)
  • ❎ Docs not needed (small/internal change)

@flanakin (Collaborator) left a comment:

This looks good. I haven't looked at it from a completeness perspective, but I think it's a great list! My comments are mostly around landing on the right way to bring metrics into the guide altogether.

@@ -108,6 +108,21 @@ At this point, you have a data pipeline and are ingesting data into a central da

<br>

## Data Ingestion Metrics
Collaborator:

Let's use the same headers across all files so we can link to them generically. Also note we should use sentence casing rather than title casing to align with the Microsoft Style Guide.

Suggested change:
- ## Data Ingestion Metrics
+ ## KPIs and metrics

| **Category** | **Definition** | **KPI** |
Collaborator:

Update all of the categories to be sentence cased to align to the Microsoft Style Guide.

| **Category** | **Definition** | **KPI** |
Collaborator:

Each KPI should include a formula. We may not be able to format this as a table.


| **Category** | **Definition** | **KPI** |
|----------|-----------|-----|
| Data Completeness | Measures the extent to which all required data fields are present in the dataset and tracks the overall data completeness trend over a specified period. | Percentage of data fields that are complete and the overall data completeness over time. |
Collaborator:

How feasible is it to measure this? I'm not pushing back. It sounds like the right thing to do, but do they have a way to actually measure it? How would we calculate this for them? Should we outline any potential challenges they may have in collecting this to give them a heads up? I'd hate for someone to take this list and say, "let's go track all these" and then realize there's no way to do it.
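For reference, a minimal sketch of how field-level completeness could be measured, assuming the ingested rows are accessible for scanning (the column names below are illustrative, not from this PR):

```python
from typing import Iterable

REQUIRED_FIELDS = ["BilledCost", "ChargePeriodStart", "ServiceName"]  # illustrative

def completeness_pct(rows: Iterable[dict], required: list[str] = REQUIRED_FIELDS) -> float:
    """Percentage of required fields that are populated across all rows."""
    total = 0
    populated = 0
    for row in rows:
        for field in required:
            total += 1
            if row.get(field) not in (None, ""):
                populated += 1
    return 100.0 * populated / total if total else 0.0

# Example: two rows, one missing ServiceName -> 5 of 6 required fields populated
rows = [
    {"BilledCost": 1.5, "ChargePeriodStart": "2024-10-01", "ServiceName": "Storage"},
    {"BilledCost": 2.0, "ChargePeriodStart": "2024-10-01", "ServiceName": ""},
]
print(f"{completeness_pct(rows):.1f}%")  # 83.3%
```

The trend would follow from recording this percentage per ingestion run and comparing across runs.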

| **Category** | **Definition** | **KPI** |
Collaborator:

Can you add each one of these into the backlog for adding to Power BI?

| Data Ingestion Frequency | Measures how often data is ingested into the system. | Number of data ingestion events per unit of time (daily, weekly, monthly, quarterly, annually). |
| Volume of Data Ingested | Measures the total volume of data ingested into the repository. | Total volume of data ingested into the repository. |
| Growth Rate | Measures the rate at which the volume of data ingested is increasing over time. | Percentage increase of total data volume in repository per unit of time. |
| Ingestion Latency | Measures the average time taken for data to be ingested into the repository and tracks the trend of this latency over a specified period. | Mean time of data ingestion latency and the latency trend over a specified period. |
Collaborator:

I like this one. A few thoughts:

  1. Do we need to call out that latency may differ by dataset?
  2. Do you intend to use "mean" time? Not average or percentile? All have merits, so just confirming.
  3. This can likely be split into multiple KPIs.
  4. Is latency trend a KPI or a visualization of a KPI over time? Not sure if visualizations need to be called out here unless we need to speak to the value of the visual. I'm open to either approach. Just thinking out loud to keep this simple. If we do keep it, it's probably a separate KPI that might be better if we can quantify a single number for it. Not sure πŸ€”

Contributor Author (KevDLR):

I have added per dataset in the Formula column, adjusted to average instead of mean time, and removed the trend component, as this is a result of the KPI.
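A minimal sketch of the per-dataset average latency, assuming each ingestion event records when the data became available at the source and when it landed in the repository (both field names are hypothetical):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical ingestion log: (dataset, source_time, ingested_time)
events = [
    ("costs", datetime(2024, 10, 18, 0, 0), datetime(2024, 10, 18, 2, 30)),
    ("costs", datetime(2024, 10, 19, 0, 0), datetime(2024, 10, 19, 1, 45)),
    ("prices", datetime(2024, 10, 18, 0, 0), datetime(2024, 10, 18, 6, 0)),
]

def avg_latency_by_dataset(events):
    """Average ingestion latency per dataset: mean of (ingested - source)."""
    latencies = defaultdict(list)
    for dataset, source_time, ingested_time in events:
        latencies[dataset].append(ingested_time - source_time)
    return {ds: sum(ls, timedelta()) / len(ls) for ds, ls in latencies.items()}

print(avg_latency_by_dataset(events))
# {'costs': timedelta(seconds=7650), 'prices': timedelta(seconds=21600)}
```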

| Historical Data Availability | Measures the lookback period of data that is ingested and available for analysis. | Span of historical data ingested. |
Collaborator:

This name needs some work, but I do like it. I've thought about this one as well. We need to know what data is missing so we can backfill it. Should this be bound to months with complete data over the retention/reporting period?

Contributor Author (KevDLR):

Maybe along the lines of "Historical data span"? (Other options include Data lookback period, Data retention period, Ingested data horizon, and Data history range.)

I think so because that'll ensure the KPI reflects the availability of fully usable historical data, providing a more accurate measure of data completeness and reliability over time. How do you suggest I modify?
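A sketch of binding the KPI to complete months, as discussed above; the per-month completeness flags are assumed inputs from a separate completeness check:

```python
# Hypothetical per-month completeness flags, oldest to newest
months = {
    "2024-05": True,
    "2024-06": True,
    "2024-07": False,  # gap that would need a backfill
    "2024-08": True,
    "2024-09": True,
    "2024-10": True,
}

def usable_history_months(months: dict[str, bool]) -> int:
    """Contiguous complete months, counting back from the most recent month."""
    span = 0
    for complete in reversed(list(months.values())):
        if not complete:
            break
        span += 1
    return span

print(usable_history_months(months))  # 3 (the 2024-07 gap limits the usable span)
```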

| Historical Data Availability | Measures the lookback period of data that is ingested and available for analysis. | Span of historical data ingested. |
Collaborator:

This brings up a question about how much people are using historical data. We should probably talk about the cost of each month of data compared to the usage of that data. If people aren't using it, then that's wasted money. That will also help them quantify the value of storing the historical data.

| **Category** | **Definition** | **KPI** |
Collaborator:

Can you think about the cost and carbon impact of each one of these? It may not apply everywhere. Anything that comes back to something that is metered, like data size or compute time.

| Investigation Time to Resolution | Measures the time taken to investigate and resolve data quality or availability issues and tracks the trend of this resolution time over a specified period. | Mean time to investigate and resolve data quality or availability issues, and the trend over time. |
Collaborator:

Similar comments about trends apply to this one. It's very interesting. This warrants its own backlog item to think through whether we have the right guidance to support it.
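A minimal sketch of the mean-time-to-resolution calculation, assuming data quality issues are logged with opened and resolved timestamps (hypothetical data):

```python
from datetime import datetime
from statistics import mean

# Hypothetical issue log: (opened, resolved)
issues = [
    (datetime(2024, 10, 1, 9, 0), datetime(2024, 10, 1, 15, 0)),   # 6 h
    (datetime(2024, 10, 3, 10, 0), datetime(2024, 10, 4, 10, 0)),  # 24 h
]

def mean_time_to_resolution_hours(issues) -> float:
    """Mean time (hours) from opening a data quality issue to resolving it."""
    return mean((resolved - opened).total_seconds() / 3600 for opened, resolved in issues)

print(f"{mean_time_to_resolution_hours(issues):.1f} h")  # 15.0 h
```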

@KevDLR KevDLR changed the title Adding Metrics section to Data Ingestion capability Adding Metrics section to capabilities Oct 18, 2024
@KevDLR KevDLR changed the title Adding Metrics section to capabilities Adding Metrics section to capabilities in understanding domain Oct 21, 2024
@KevDLR (Contributor Author) commented Oct 23, 2024

@KevDLR please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term β€œYou” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Microsoft"

@flanakin flanakin added this to the Guide - Build-out milestone Oct 31, 2024
@flanakin (Collaborator) left a comment:

I like where this is going. The main issue I see is that the formulas aren't formulas and we might need more context as to why they should care about tracking each KPI.


| **KPI** | **Definition** | **Formula** |
|--------------|----------------|---------|
| Cost allocated | Evaluates the extent to which cloud costs are allocated among organizational units. | Percentage of cloud cost allocated. |
Collaborator:

Make sure the formulas are actual formulas. This applies to all KPIs.

Also, avoid the use of "cloud". We need to remove that everywhere in the next Framework update so that will make the next update easier.

Suggested change:
- | Cost allocated | Evaluates the extent to which cloud costs are allocated among organizational units. | Percentage of cloud cost allocated. |
+ | Cost allocated | Evaluates the extent to which costs are allocated among organizational units. | {Allocated cost amount} / {Total cost} * 100 |
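A minimal sketch of the suggested formula, assuming each cost record can be tested for an allocation owner (the `team` field is illustrative):

```python
# Hypothetical cost records; a row counts as allocated when it carries
# an owning team (e.g., via a cost center tag)
costs = [
    {"cost": 120.0, "team": "web"},
    {"cost": 80.0,  "team": "data"},
    {"cost": 50.0,  "team": None},  # unallocated
]

def allocated_pct(costs) -> float:
    """{Allocated cost amount} / {Total cost} * 100, per the suggested formula."""
    total = sum(row["cost"] for row in costs)
    allocated = sum(row["cost"] for row in costs if row["team"])
    return 100.0 * allocated / total if total else 0.0

print(f"{allocated_pct(costs):.1f}%")  # 80.0%
```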

| Allocation granularity | Assesses the level of detail in cost allocation, from department to project scope. | Percentage of cost allocation defined across various scope levels (department, subscription, resource group, project, application). |
Collaborator:

What value is this supposed to be? Granularity sounds like an attribute and not a metric.

| Unallocated cloud costs | Measures the percentage of cloud costs that are not allocated to any specific project, team, or department. | Percentage of unallocated cloud costs. |
Collaborator:

nit: Do we need both this and allocated %? I'm not opposed to having both, but is it useful to have 83% next to 17%? Seeing one, it's obvious what the other is. Not sure which is more important, though. Maybe we want to describe both and let them choose. If that's the case, we might also want to include a justification column for why each metric is important. Thoughts?


To ensure effective resource allocation across the Azure environment, these KPIs provide a framework for evaluating the efficiency and accuracy of resource distribution.

| **KPI** | **Definition** | **Formula** |
Collaborator:

You don't need to bold a header. They're bolded based on the site styles.

Suggested change:
- | **KPI** | **Definition** | **Formula** |
+ | KPI | Definition | Formula |

| Allocation tagging strategy | Evaluates the implementation of a tagging strategy for cost allocation for each workload or business unit. | Percentage of cost allocation tagging strategy defined and implemented for each workload or business unit, and the percentage of untagged resources and associated costs. |
Collaborator:

Is this already implicitly covered by the allocated %?

| Investigative time | Measures the time required to analyze cloud usage and cost questions. | Average time to report on required cloud usage and costs details. |
| Tagging compliance | Measures the resource tagging compliance to facilitate accurate reporting and analytics. | Percentage of resources tagged and the compliance. |
| Spend awareness | Measures the awareness and accountability of cloud spend across all workloads, and personas. | Percentage of personas receiving cloud usage and cost reports. |
| Feedback pipelines | Evaluates feedback processes for stakeholders and core personas. | Automation capability to provide feedback on reports to the FinOps team. |
Collaborator:

I like what I think this is trying to cover: feedback on reports. If that's what it is, it probably needs to be reworded to be a little clearer. The KPI name should be a metric and the formula should be a formula. I'm wondering if there's a difference between new feedback in a period, resolved feedback in a period, and total active/unresolved feedback. We should also add this to the backlog as something for us to add to our reports.

| Adoption rate | Measures usage of the reporting and analytics tools. | Percentage of teams utilizing provided reports. |
Collaborator:

How would this be measured? Would DAU, WAU, and MAU be more achievable? I like the idea of tracking adoption, but that's just the first use and not the ongoing use. I wonder how many FinOps teams would really care to get into adoption, engagement, and retention. Although, the same could be asked about MAU, since it doesn't indicate success or give context on growth πŸ€”
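A sketch of an MAU-style measure, assuming report access logs with user and date (hypothetical data):

```python
from datetime import date, timedelta

# Hypothetical report access log: (user, access_date)
accesses = [
    ("alice", date(2024, 11, 5)),
    ("bob",   date(2024, 11, 20)),
    ("alice", date(2024, 11, 28)),
    ("carol", date(2024, 9, 2)),   # outside the 30-day window
]

def monthly_active_users(accesses, as_of: date) -> int:
    """Distinct users who opened a report in the trailing 30 days."""
    window_start = as_of - timedelta(days=30)
    return len({user for user, day in accesses if window_start <= day <= as_of})

print(monthly_active_users(accesses, as_of=date(2024, 11, 30)))  # 2
```

DAU and WAU would be the same calculation with 1-day and 7-day windows.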

| Data update frequency | Tracks how often report data is updated. | Time between data refreshes. |
Collaborator:

Wouldn't this belong in data ingestion?

| Data accuracy | Evaluates the accuracy of available reports. | Accuracy percentage of the reports. |
Collaborator:

Data ingestion? How is this measured?

| Report development | Measures the time to develop requested reports. | Average time to generate and provide a new or updated report to stakeholders. |
Collaborator:

How useful is this? Every report will have a different amount of complexity and, in theory, teams shouldn't be churning out new reports endlessly. I like the idea of it, but I suspect new reports would be rare once a team is established. They'll likely update existing ones more.

Labels
  • Needs: Attention πŸ‘‹ (Issue or PR needs to be reviewed by the author or it will be closed due to no activity)
  • Tool: FinOps guide (Implementing FinOps guide)
Projects
None yet

4 participants