Configuration change validation has false positives #80
Relies on etcd-io/raft#81. We don't want raft to validate conf changes, since that causes issues due to false positives. We are taking responsibility for carrying out only valid conf changes, as we always have. See also etcd-io/raft#80. Fixes cockroachdb#105797. Epic: CRDB-25287

Release note (bug fix): under rare circumstances, a replication change could get stuck when proposed near lease/leadership changes (and likely under overload), and the replica circuit breakers could trip. This problem has been addressed.

Note to editors: this time it's really addressed (fingers crossed); a previous attempt with an identical release note had to be reverted.
As noted in #81 (which adds a `DisableConfChangeValidation` option), the active config can lag behind the most recent one; I think we should fix this lag in raft itself. Relying solely on above-raft config change mechanisms (such as in CRDB) also smells like a footgun: config changes seem to fundamentally belong to raft. Currently we do a collection of tricks which make correctness reasoning hard.
Instead, I think we should just precisely track and know the current and prospective config from the explicit state, and explicitly error out on incorrect transitions. That is, when we commit a pending config change, it becomes part of the state. To do so, we would need to decouple config changes from the log (otherwise, again, we would need a "scan the log" step to find the next config, e.g. when entries are appended or the commit index is advanced). More generally, it would be nice to have separate state machines: raft itself, and the application-level state machine that it supports. There are more arguments for this decoupling as well.
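A minimal sketch of what such explicit tracking could look like, using hypothetical simplified types rather than anything from the raft package: the prospective config is part of the state, and new changes are validated against it instead of against a possibly stale applied config.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a simplified stand-in for a raft configuration.
type Config struct {
	Voters []uint64
	Joint  bool // true while in a joint (transitional) configuration
}

// ConfTracker keeps both the applied config and the prospective
// (committed-but-unapplied) one as explicit state.
type ConfTracker struct {
	applied Config
	pending *Config
}

// prospective is the config any new change must be checked against.
func (t *ConfTracker) prospective() Config {
	if t.pending != nil {
		return *t.pending
	}
	return t.applied
}

// Propose validates a change against the prospective config, so the check
// cannot lag behind log application, and errors out on invalid transitions.
func (t *ConfTracker) Propose(next Config) error {
	if t.prospective().Joint && next.Joint {
		return errors.New("must leave the current joint config first")
	}
	t.pending = &next
	return nil
}

// Apply promotes the pending config once its entry has been applied.
func (t *ConfTracker) Apply() {
	if t.pending != nil {
		t.applied, t.pending = *t.pending, nil
	}
}

func main() {
	t := &ConfTracker{applied: Config{Voters: []uint64{1, 2, 3}}}
	fmt.Println(t.Propose(Config{Voters: []uint64{1, 2, 3, 4}, Joint: true})) // <nil>
	fmt.Println(t.Propose(Config{Voters: []uint64{1, 2, 3, 4}, Joint: true})) // error
}
```

With this shape there is no "scan the log" step: the config a new proposal must be compatible with is always available as explicit state.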
DisableConfChangeValidation turns off propose-time verification of configuration changes against the currently active configuration of the raft instance. These checks are generally sensible (cannot leave a joint config unless in a joint config, et cetera), but they have false positives because the active configuration may not be the most recent configuration. This is because configurations are activated during log application, and even the leader can trail log application by an unbounded number of entries. Symmetrically, the mechanism has false negatives: because the check may not run against the "actual" config that will be the predecessor of the newly proposed one, the check may pass but the new config may be invalid when it is applied. In other words, the checks are best-effort. Users should *not* use this option unless they have a reliable mechanism (above raft) that serializes and verifies configuration changes.

In CockroachDB, we doubly hit the false positive case. Each proposal (including configuration changes) has a UUID which the leader tracks while the command is inflight. To detect that a command is no longer inflight, the leader has to see it apply (or needs some proof that it will never apply in the future). When a CockroachDB Replica loses leadership in the middle of a configuration change, a new leader may carry out additional changes that then take effect, while the old Replica is still trying to finalize its old, now incompatible, configuration change. What CockroachDB needs is to see the old proposal in the log so that it can be rejected at apply time, thus terminating the inflight status of the old configuration change. But since raft forwards the proposal to the leader, where it is rejected (since it's incompatible with the active, much newer, config), this would never happen, thus leaking an inflight proposal[^1].

The other case became obvious when we tried to work around the above by terminating the config change when we noticed that raft rejected it. Because the rejection is not made against a stable basis (it's against whatever config is active at the given time), legitimate configuration changes would be rejected frequently. In particular, this could occur in a way that would still result in the respective entries becoming applied later! This caused corruption in CockroachDB[^2] by leading to a divergence between what we thought the config was and what it actually was.

CockroachDB does have a higher-level protection mechanism (configuration changes are serialized through transactions reading from and writing to the same key). The intention is for CockroachDB to use this setting; however, it is likely that other systems are susceptible to similar issues (and perhaps unknowingly so). This setting is thus of general interest.

Touches #80.

[^1]: cockroachdb/cockroach#105797 (comment)
[^2]: cockroachdb/cockroach#106172 (comment)

Signed-off-by: Tobias Grieger <[email protected]>
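For reference, opting in might look roughly like the following sketch; the `go.etcd.io/raft/v3` import path and the surrounding config values are assumptions, and the option should only be set when an above-raft mechanism serializes and validates configuration changes, as described above.

```go
package main

import "go.etcd.io/raft/v3"

func newNode() raft.Node {
	storage := raft.NewMemoryStorage()
	c := &raft.Config{
		ID:              1,
		ElectionTick:    10,
		HeartbeatTick:   1,
		Storage:         storage,
		MaxSizePerMsg:   1 << 20,
		MaxInflightMsgs: 256,

		// Skip propose-time validation of conf changes. The application layer
		// is now responsible for only ever proposing valid transitions.
		DisableConfChangeValidation: true,
	}
	return raft.StartNode(c, []raft.Peer{{ID: 1}})
}

func main() {
	n := newNode()
	defer n.Stop()
}
```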
105368: backupccl: add unit tests for FileSSTSink r=rhu713 a=rhu713

Backfill unit tests for the basic functionality of FileSSTSink, with additional test cases involving inputs of keys with many entries in their revision history. Epic: CRDB-27758 Release note: None

105624: jobsprofiler: dump trace recording on job completion r=dt a=adityamaru

This change teaches the job resumer to fetch and write its trace recording before finishing its tracing span. These traces will be consumed by the job profiler bundle that is being introduced in #105384. These traces will be valuable in understanding a job's execution characteristics during each resumption, even if the job has reached a terminal state. Currently, this behaviour is opt-in and has been enabled for backup, restore, import and physical replication jobs. Informs: #102794 Release note: None

106515: DEPS: bump across etcd-io/raft#81 and disable conf change validation r=erikgrinaker a=tbg

We don't want raft to validate conf changes, since that causes issues due to false positives (the check is above raft, but needs to be below raft to always work correctly). We are taking responsibility for carrying out only valid conf changes, as we always have. See also etcd-io/raft#80. Fixes #105797. Epic: CRDB-25287 Release note (bug fix): under rare circumstances, a replication change could get stuck when proposed near lease/leadership changes (and likely under overload), and the replica circuit breakers could trip. This problem has been addressed. Note to editors: this time it's really addressed (fingers crossed); a previous attempt with an identical release note had to be reverted.

106939: changefeedccl: fix flake in TestParquetRows r=miretskiy a=jayshrivastava

changefeedccl: fix flake in TestParquetRows

Previously, this test would flake when rows were not emitted in the exact order they were inserted/modified. This change makes the test resilient to different ordering. Epic: None Fixes: #106911 Release note: None

---

util/parquet: make metadata transparent in tests

Previously, users of the library would need to explicitly call `NewWriterWithReaderMetadata()` to configure the parquet writer to add metadata required to use the reader utils in `pkg/util/parquet/testutils.go`. This led to a lot of unnecessary code duplication. This change moves the logic that decides whether metadata should be written into `NewWriter()`, so callers do not need to do the extra work. Epic: None Release note: None

Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
We currently validate configuration changes at proposal time (see raft/raft.go, lines 1224 to 1260 at 4abd9e9).
This is not very helpful, because it has false positives (refusing a config change that is actually allowed), though at least it currently doesn't have false negatives (because the set of false positives is sufficiently large 😄).

It could be more effective to either compare against the actual most recent config change in the log (including the unstable log), or to move the verification to apply time (i.e. erroring out conf changes that are illegal there).
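A rough sketch of the apply-time variant, with hypothetical simplified types rather than the actual raft code: because the check runs against the config that is truly the predecessor of the proposed change, it has neither false positives nor false negatives, and an illegal change can simply be rejected (applied as a no-op) at that point.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for a raft configuration and a configuration change.
type Config struct {
	Voters []uint64
	Joint  bool
}

type ConfChange struct {
	AddVoters  []uint64
	EnterJoint bool
	LeaveJoint bool
}

// applyConfChange runs at apply time, so `applied` is guaranteed to be the
// real predecessor of the change; illegal changes are rejected here instead
// of at propose time.
func applyConfChange(applied Config, cc ConfChange) (Config, error) {
	if cc.LeaveJoint && !applied.Joint {
		return applied, errors.New("not in a joint config")
	}
	if cc.EnterJoint && applied.Joint {
		return applied, errors.New("already in a joint config")
	}
	return Config{
		Voters: append(append([]uint64(nil), applied.Voters...), cc.AddVoters...),
		Joint:  (applied.Joint && !cc.LeaveJoint) || cc.EnterJoint,
	}, nil
}

func main() {
	cfg := Config{Voters: []uint64{1, 2, 3}}
	cfg, _ = applyConfChange(cfg, ConfChange{AddVoters: []uint64{4}, EnterJoint: true})
	_, err := applyConfChange(cfg, ConfChange{AddVoters: []uint64{5}, EnterJoint: true})
	fmt.Println(err) // already in a joint config
}
```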
See #81.