-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor integration tests and add metrics coverage #2432
refactor integration tests and add metrics coverage #2432
Conversation
f85c127
to
35034f4
Compare
@cijothomas @lalitb let me know what you think about this; I sunk a bit of time into this this weekend to try organise the integration tests so they are a bit more "ordinary". If we're happy with this i'll fill in the gaps in the coverage off of here (as well as other comments i've not got to from the other branch) ! |
opentelemetry-otlp/tests/integration_test/expected/metrics/test_up_down_meter.json
Show resolved
Hide resolved
opentelemetry-otlp/tests/integration_test/otel-collector-config-2.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this refactor! Thanks for working on this.
(I also tested locally and confirmed that https://github.com/open-telemetry/opentelemetry-rust/pull/2431/files is solving the issue it is intending to fix)
Not marking explicit approval as the PR is marked draft. Feel free to make it review ready.
The reason for ignoring integration tests by default is that they can take a long time to run, so |
nit - Do you need this file, as it is bringing ~38K loc, this will increase the clone time for repo.? |
results: Vec<ResourceMetrics>, | ||
expected: Vec<ResourceMetrics>, | ||
results: Value, | ||
expected: Value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The earlier code was also testing the conversion of JSON to proto structs, while it seems the current code no longer does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grpc_build.rs adds the annotations for serialization/deserialization between json and proto structs, and this got tested in current setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to test that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See we comment in the big thing at the bottom - there's an issue with the roundtrip serialization of metrics, so I wrote an additional test that shows this, marked it as #[ignore]
, and raised an issue. I've switched this to serde types so that the integration test for metrics will actually fail if there's a diff; as it stood differences in the metric value would not be detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am totally fine to just get an ACK from Collector that it accepted our metrics/logs/traces. That will give instant value (to validate lot of stuff we are manually validating now, like is the shutdown going to panic or do its job etc.!)
Validating the actual content - Its important, but I consider it as non-blocking now, and can be added later too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear: we're getting an ACK from the collector, and parsing all the results back out from the copy written by the collector to its file outputs, and comparing them to our expectation - we're validating the data that comes out of the collector is what we sent it now ✅
The only "thing" is, we're just using Serde models to deserialize for that validation, not our own proto-derived ones, because of the aforementioned fields-going-missing issue that would make a proto-based-deserialization test succeed when data is actually lost.
So - i think this is in a pretty good state 🙏 !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to test that?
Yes it's important to test from opentelemetry-proto prospective that we can successfully deserialize the metrics data written by collector. This is a standalone crate, and also being used by users for consumers for collector. And the only way to test this reliably is with integration tests. This is not the blocker, but we should bring it back once the serde model is fixed for metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - Please add a TODO (with reference to the metrics serde issue), to be fixed eventually.
if it is executing in parallel to the main CI, and is taking less time than the longest CI (which I think is the windows ci), then l think we can let integration test run always. |
Yes, should be fine to keep enabled in CI. The concern was that is this fast enough to keep it enabled for |
609a158
to
d5dc9ef
Compare
684a70c
to
5707bfe
Compare
Hey both, thanks for the quick turnaround! Preemptive sorry-for-the-braindump - i'm rushing this comment out between meetings 😱
|
c78170f
to
4a59154
Compare
Got it. Yes I think it is best to keep this ignored, and only triggered from the integration test CI. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2432 +/- ##
=======================================
- Coverage 79.4% 76.7% -2.8%
=======================================
Files 122 122
Lines 21700 21700
=======================================
- Hits 17247 16657 -590
- Misses 4453 5043 +590 ☔ View full report in Codecov by Sentry. |
@cijothomas , I think this is in a pretty good state, except that I need to work out why the integration suite fails on CI and not locally 😱 I'll look again tomorrow. |
I am curious if we can enable tracing::fmt in the integration tests, and view internal logs? Not required in this PR, just sharing something that'd make our lives easier if things go wrong. |
Good idea! I’ll give it a try tomorrow. Certainly doesn’t hurt leaving it on for tests. |
f8044a2
to
90c2449
Compare
@cijothomas , this should be good to merge. I've rebased and squashed everything together to make a nicer merge history too. The outstanding issue yesterday with the integration tests was the "magic sleep" for the collector container to start; i've added a newer version of testcontainers so we can wait for the HTTP collector port to start answering instead, which should make things more robust. There are two outstanding issues:
|
90c2449
to
21174e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactor. Nicely done.
results: Vec<ResourceMetrics>, | ||
expected: Vec<ResourceMetrics>, | ||
results: Value, | ||
expected: Value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - Please add a TODO (with reference to the metrics serde issue), to be fixed eventually.
hey @lalitb , i've got a test over here and a link to the issue for the roundtripping the models: opentelemetry-rust/opentelemetry-otlp/tests/integration_test/tests/metrics.rs Lines 199 to 221 in 21174e8
I reckon, if we keep the MetricsAsserter stuff on Serde, and have separate "roundtrip the models" tests, then we won't accidentally miss cases where an integration test regresses and we don't notice it because of model mapping. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We can address some of the remaining issues in followups as needed.
fn init_tracing() { | ||
INIT_TRACING.call_once(|| { | ||
let subscriber = FmtSubscriber::builder() | ||
.with_max_level(tracing::Level::DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEBUG is giving a ton of noise from the networking libraries in the CI logs.. probably okay for now, we can revisit if we find it too much noise.
Yes agree, this ensures serialization and deserialization are validated independently. |
Hurrah! Thanks @lalitb @cijothomas . |
Fixes #2401
This is an alternate to #2424 , with a more of an opinionated take on how the integration tests could look. Essentially I am trying to make it into a more normal looking test suite, so it is easier to extend and reason about the results of.
integration_test.rs
into discrete unit teststest_utils.rs
and re-use it everywhereanyhow
to make error handling cleaner and easier to follow panic output when it happens#[ignore]
from all the integration tests - if we don't want them to run as part ofcargo test
, we can pass--lib
in our CI scriptstestcontainers
and added startup waits so that we don't have to sleep waiting for the container to start and potentially run over our startup time budgetThis saves a fair bit of duplication too, which is nice!
In metrics i've added test-per-meter and some supporting code to pluck the data out for each meter easily. IMHO this will make it easy to extend and easy to follow, rather than just having a enormous "total world" diff to pick through.
I've adapted traces/logs so each has a single normal unit test in its own file, but have not decomposed them or introduced more tests there, yet.
Changes
Please provide a brief description of the changes here.
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes