Unit test to reproduce data race when reloading TLS config #6213

Draft: wants to merge 4 commits into base: main

chahatsagarmain
Contributor

Which problem is this PR solving?

Description of the changes

  • Added a unit test to check for a data race

How was this change tested?

Checklist

Signed-off-by: chahatsagarmain <[email protected]>
@chahatsagarmain chahatsagarmain requested a review from a team as a code owner November 15, 2024 13:28

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.76%. Comparing base (2b7cf3a) to head (ee64cb4).
Report is 11 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (2b7cf3a) and HEAD (ee64cb4).

HEAD has 1 upload less than BASE

Flag       BASE (2b7cf3a)  HEAD (ee64cb4)
unittests  1               0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #6213       +/-   ##
===========================================
- Coverage   96.50%   48.76%   -47.75%     
===========================================
  Files         354      179      -175     
  Lines       20127    10803     -9324     
===========================================
- Hits        19424     5268    -14156     
- Misses        520     5092     +4572     
- Partials      183      443      +260     
Flag                     Coverage Δ
badger_v1                8.31% <ø> (ø)
badger_v2                1.67% <ø> (-0.01%) ⬇️
cassandra-4.x-v1         14.39% <ø> (ø)
cassandra-4.x-v2         1.61% <ø> (-0.01%) ⬇️
cassandra-5.x-v1         14.39% <ø> (ø)
cassandra-5.x-v2         1.61% <ø> (-0.01%) ⬇️
elasticsearch-6.x-v1     18.59% <ø> (ø)
elasticsearch-7.x-v1     18.68% <ø> (ø)
elasticsearch-8.x-v1     18.85% <ø> (ø)
elasticsearch-8.x-v2     1.66% <ø> (-0.02%) ⬇️
grpc_v1                  9.44% <ø> (-0.04%) ⬇️
grpc_v2                  6.97% <ø> (-0.04%) ⬇️
kafka-v1                 8.88% <ø> (ø)
kafka-v2                 1.67% <ø> (-0.01%) ⬇️
memory_v2                1.67% <ø> (+<0.01%) ⬆️
opensearch-1.x-v1        18.73% <ø> (ø)
opensearch-2.x-v1        18.72% <ø> (-0.01%) ⬇️
opensearch-2.x-v2        1.67% <ø> (-0.01%) ⬇️
tailsampling-processor   0.46% <ø> (-0.01%) ⬇️
unittests                ?


Signed-off-by: chahatsagarmain <[email protected]>
@chahatsagarmain chahatsagarmain changed the title from "Unit test to reproduce data race when reloading TLF config" to "Unit test to reproduce data race when reloading TLS config" Nov 16, 2024
@yurishkuro
Member

Have you observed this new test fail? It doesn't look like it's testing the race condition deterministically.

Signed-off-by: chahatsagarmain <[email protected]>
@chahatsagarmain
Contributor Author

chahatsagarmain commented Nov 18, 2024

> Have you observed this new test fail? It doesn't look like it's testing the race condition deterministically.

In the previous test, the race condition occurred because the mutex was not used. Adding the mutex resolved the issue. However, in the updated test, new TLS configurations are being applied, and this seems to be causing a data race during client connection, along with a TLS handshake error.

@yurishkuro
Member

So I see this test is failing on a detected race condition, but the stack traces indicate that the race is caused by the test itself (the Write part), not by the production code:

WARNING: DATA RACE
Write at 0x00c000138400 by goroutine 163:
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.TestCertificateRaceCondition.func4()
      /home/runner/work/jaeger/jaeger/pkg/config/tlscfg/options_test.go:392 +0xdc

Previous read at 0x00c000138400 by goroutine 162:
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.(*Options).Config()
      /home/runner/work/jaeger/jaeger/pkg/config/tlscfg/options.go:75 +0x676
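
In the trace above, the write comes from the test function and the read from `Options.Config()`, so the test is mutating a struct that another goroutine is reading. One way to avoid a test-induced race of this shape, assuming the reload under test is driven by files on disk, is to leave the struct untouched and change the file instead. The sketch below is illustrative only: `options`, `load`, and `replaceFile` are hypothetical stand-ins, not Jaeger's types, and the rename-based swap assumes a POSIX-like filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// options is a hypothetical stand-in for a TLS options struct; it is not
// Jaeger's tlscfg.Options.
type options struct {
	certPath string
}

// load only reads the struct, then reads the file it points to.
func (o *options) load() ([]byte, error) {
	return os.ReadFile(o.certPath)
}

// replaceFile writes to a temporary file and renames it over the target,
// so a concurrent reader sees either the old or the new content.
func replaceFile(path string, data []byte) error {
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// run changes the file while another goroutine loads it; no goroutine
// writes to the shared options struct.
func run() (string, error) {
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "cert.pem")
	if err := os.WriteFile(path, []byte("old"), 0o600); err != nil {
		return "", err
	}
	opts := &options{certPath: path}

	var wg sync.WaitGroup
	var writeErr, readErr error
	wg.Add(2)
	go func() { defer wg.Done(); writeErr = replaceFile(path, []byte("new")) }()
	go func() { defer wg.Done(); _, readErr = opts.load() }()
	wg.Wait()
	if writeErr != nil {
		return "", writeErr
	}
	if readErr != nil {
		return "", readErr
	}

	final, err := opts.load()
	return string(final), err
}

func main() {
	got, err := run()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

With this shape, any race the detector still reports would have to involve the code under test rather than a write issued by the test.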

@yurishkuro
Member

@chahatsagarmain An alternative to fixing the race condition in reloading is to stop using this package altogether. We have already migrated several endpoints to OTEL helpers, which handle TLS reloading differently internally (on a timeout rather than on file change). It would be interesting to see which parts of Jaeger still use the tlscfg package and switch them to OTEL helpers.

@chahatsagarmain
Contributor Author

chahatsagarmain commented Nov 21, 2024

@yurishkuro So I can use configtls from OTEL to replace the usage of the tlscfg package?
Also, tlscfg is still used in the collector and agent, and mostly in test files.

@yurishkuro
Member

Yes, that would be good. You can start small, e.g. can we remove the tlscfg dependency from cmd/es-rollover?

@chahatsagarmain chahatsagarmain marked this pull request as draft November 24, 2024 20:58
2 participants