Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception priorities #11

Closed
mndstrmr opened this issue Sep 26, 2023 · 18 comments
Closed

Exception priorities #11

mndstrmr opened this issue Sep 26, 2023 · 18 comments

Comments

@mndstrmr
Copy link
Contributor

mndstrmr commented Sep 26, 2023

As pointed out to us before the exception priorities of CJAL, CJALR, memory instructions and instruction fetch errors are incorrect. I don't know if it is worth me going into too much detail here about each small issue here as there are many of them, most of which are best explained by simply comparing the SV and the Sail.

That said I am more happy to check any fixes as our verification setup does now check CSR writes (including for exceptions).

@kliuMsft
Copy link
Contributor

Acknowledged - this is a known weakness in RTL and I semi-intentionally deferred it till we have a good way to verify. Since we now have the ability to check I will try to fix that in the next week and we can revisit.

@kliuMsft
Copy link
Contributor

I have finally got the chance to clean things up on the exception priority side. Please see the latest commit (f3570e5). If we can start verifying RTL against sail it would be great, thanks.

@mndstrmr
Copy link
Contributor Author

Started verifying the fixes for memory instructions. Looks largely OK, but I've run into a couple of issues:

  1. In vio_cause_enc the bound_vio check should be last instead of first.
  2. Misaligned capability loads should throw E_Fetch_Addr_Align exceptions, instead of EXC_CHERI, with priority lower even than bounds checks.

@kliuMsft
Copy link
Contributor

kliuMsft commented Jan 21, 2024 via email

@kliuMsft
Copy link
Contributor

Checked in the fix (commit [5c37f9a]). Please verify, thanks.

@mndstrmr
Copy link
Contributor Author

Fix looks good, there's just one more problem, then I can am happy to close this issue:

In #12 I suggested that the is_cap case needed top_equal during bounds checking. This does lead to correct code in the sense that illegal memory loads are prevented and legal ones aren't. What I had not realised however is that it does lead to incorrect exception priorities (my apologies about that - I think you alluded to this in your response). In particular, in the Sail a bounds check failing has higher priority than a misalignment error, i.e. bounds checking should happen on the potentially misaligned address, not the aligned address, so that in the event that both errors occur simultaneously we get the bounds check error instead of the alignment error.

From line 838 in cheri_ex.sv, the below fix would close this issue:

    if (cheri_operator_i[CIS_SUBSET])
      top_chkaddr = rf_fullcap_b.top33;
    else if (cheri_operator_i[CJAL] | cheri_operator_i[CJALR])
      top_chkaddr = {base_chkaddr[31:1], 1'b0};
    else if (is_cap)  // CLC/CSC
      top_chkaddr = base_chkaddr + 33'd8; // Don't assume alignment
    else 
      top_chkaddr = base_chkaddr;

    if (cheri_operator_i[CSEAL] | cheri_operator_i[CUNSEAL]) begin
      top_bound  = rf_fullcap_b.top33;
      base_bound = rf_fullcap_b.base32;
    end else if (cheri_operator_i[CJAL]) begin
      top_bound  = {pcc_cap_i.top33[32:1], 1'b0};
      base_bound = pcc_cap_i.base32;
    end else if (cheri_operator_i[CJALR]) begin
      top_bound  = {rf_fullcap_a.top33[32:1], 1'b0};
      base_bound = rf_fullcap_a.base32;
    end else begin // Includes CLC/CSC
      top_bound  = rf_fullcap_a.top33;
      base_bound = rf_fullcap_a.base32;
    end

    top_vio   = (top_chkaddr  > top_bound);
    base_vio  = (base_chkaddr < base_bound);
    top_equal = (top_chkaddr == top_bound);

    if (debug_mode_i)
      addr_bound_vio = 1'b0;
    else if (is_cap | cheri_operator_i[CIS_SUBSET])
      addr_bound_vio = top_vio | base_vio;
    else if (cheri_operator_i[CJAL] | cheri_operator_i[CJALR])
      addr_bound_vio = top_vio | base_vio | top_equal;
    else if (cheri_operator_i[CSEAL] | cheri_operator_i[CUNSEAL])
      addr_bound_vio = top_vio | base_vio | top_equal;

@kliuMsft
Copy link
Contributor

@rmn30 , is there a particular reason the address bound violation has to take priority over alignment errors? This check is in the critical timing path for the load/store interface (since it has to gate the outgoing memory request), and I'd like to avoid additional adders..

@davidchisnall
Copy link
Collaborator

I would expect alignment errors to be higher priority because you may wish to trap and emulate on alignment failures.

@kliuMsft
Copy link
Contributor

kliuMsft commented Feb 5, 2024

@mndstrmr , the new RTL commit (ad90fc7) added separate logic (addr_bound_vio_ext) to address mcause/mtval encoding for this specific case. Please let me know if it verifies. thanks.

@mndstrmr
Copy link
Contributor Author

mndstrmr commented Apr 9, 2024

This looks good to me. There are a few more issues with exceptions that I have since found:

  1. When CSpecialRW receives an illegal register index it correctly throws an illegal instruction exception, but MTVAL is not set to the instruction which failed as in the Sail.
  2. If MTCC has base[8] = 1 and exponent 24 (a combination which is reachable by doing a CSetHigh, for instance) then when an exception is thrown MTCC is decompressed to a pcc_cap_t, and the top bit of the base is lost. This is fine, but if the PCC is later stored then the compressed bounds will have base[8] = 0 which is incorrect according to the Sail which always stores the compressed form. My fix is to add an additional 1 bit field to pcc_cap_t which stores what base[8] would be. This is written when doing a full2pcap and retrieved when doing a pcc2fullcap.
  3. is_cap_sentry (used by PVIO_SEAL for CJALR) operates on the compressed otype, but should operate on the decompressed otype.
  4. When a StoreCapImm runs, ls_addr_misaligned_only is set when perm_vio_vec[PVIO_ALIGN] && (cheri_err_cause == 0). If perm_vio[PVIO_SLC] is set (whether in debug mode or not) then the cheri_err_cause will be nonzero, and the exception will reported as a store local error instead of an alignment error, even in debug mode.

Thanks, and apologies for the issue dump.

@kliuMsft
Copy link
Contributor

Thanks, all good findings.

  1. In my test case with a illegal scr address, I am actually seeing both sail and RTL reports mcause=0x2 and mtval=0x0. Do you see different behavior?
  2. This has to do with the RTL decision to use a 32-bit representation (base32) for base in the expanded format (full_cap_t or pcc_cap_t), which seemed reasonable since bases are always inclusive. I agree that base[8] = 1, exp=24 is possible per encoding, however I don't think we could ever get such a cap with tag=1. E.g., csethigh always clears the tag. @rmn30, please share your thoughts.
  3. This is definitely a bug - will fix in RTL
  4. Not quite sure I understand what's being described here. PVIO_SLC is intended to clear the tag of the cap written to memory, but does NOT cause an exception. Thus cheri_err_cause really is a don't care.

@mndstrmr
Copy link
Contributor Author

  1. I see: There's a setting in the Sail (plat_mtval_has_illegal_inst_bits) which I was assuming was 1 because in your regular handling of illegal instructions (ibex_controller.sv line 735) you do set MTVAL to the instruction that was illegal. So I guess it's up to you which one you want, but they should be consistent with one another. It would be easier to set csr_mtval_o = 0 in ibex_controller.sv line 737.
  2. While yes the tag bit may or may not be set, a CGetHigh instruction will still see a difference between the Sail and the RTL.
  3. Sounds good
  4. An exception is raised anyway because perm_vio_vec[PVIO_ALIGN] is set, but then the exception type is not reported as an alignment exception because perm_vio[PVIO_SLC] is set.

I also have one more issue with CSpecialRW: If an illegal scr address is accessed while the PCC also doesn't have access_system_regs, currently the ASR exception is thrown instead of the illegal instruction exception. The Sail says it should be the other way around. The following fixes cheri_ex.sv line 1015:

if (cheri_operator_i[CCSR_RW] & cheri_wb_err_raw & illegal_scr_addr & cheri_exec_id_i)
    // cspecialrw trap, illegal addr, treated as illegal_insn
    cheri_wb_err_info_d = {3'h0, 1'b1, 12'h0};
else if (cheri_operator_i[CCSR_RW] & cheri_wb_err_raw & perm_vio & cheri_exec_id_i)
    // cspecialrw traps, PERM_SR
    cheri_wb_err_info_d = {5'h0, 1'b1, cheri_cs2_dec_i, cheri_err_cause};

@kliuMsft
Copy link
Contributor

Regarding #1, I think in our case we actually have
bool rv_mtval_has_illegal_inst_bits = false;
In RTL (ibex_controller.sv), it's actually handled at line 810 (basically I chose to decode illegal_scr adddress in cheri_ex rather than decoder, in an attempt to reduce fan-in to illegal_insn which goes to too many places).
cheri_wb_err_prio: begin
if (cheri_pmode_i) begin
if (cheri_wb_err_info_i[12]) begin // illegal SCR addr
exc_cause_o = EXC_CAUSE_ILLEGAL_INSN;
csr_mtval_o = {21'h0, cheri_wb_err_info_i[10:0]};
end else begin
exc_cause_o = EXC_CAUSE_CHERI_FAULT;
csr_mtval_o = {21'h0, cheri_wb_err_info_i[10:0]};
end
end
end
So I think RTL is actually matching sail..

I will to take a look at #4 and #5 later, it make take a little while since I'll be out of office for a few days.

@mndstrmr
Copy link
Contributor Author

mtval_has_illegal_inst_bits = 0 is fine, but in that case these lines also need to be updated to reflect that:

illegal_insn_prio: begin
exc_cause_o = EXC_CAUSE_ILLEGAL_INSN;
csr_mtval_o = instr_is_compressed_i ? {16'b0, instr_compressed_i} : instr_i;
end

Both that and the code at line 810 refer to the same Sail handle_illegal function, meaning they both need to do the same thing.

If the code at line 810 is to remain the same, the code at line 735 should look like the below:

illegal_insn_prio: begin
    exc_cause_o = EXC_CAUSE_ILLEGAL_INSN;
    csr_mtval_o = 0;
end

@kliuMsft
Copy link
Contributor

I see what you mean - yes we should make those 2 cases consistent. Likely will make line 737 to set mtval = 0 if pmode = 1. @rmn30, can you confirm we indeed what to go with mtval_has_illegal_inst_bits = 0?

@kliuMsft
Copy link
Contributor

@mndstrmr I have checked in RTL changes (30cac29) which hopefully fixes issue #1, #3, #4 and #5. As for #2, I have confirmed with @rmn30 that the base[8]=1, exp=24 is a don't care case (not reachable). Can we change the constraints to reflect that? Thanks.

@rmn30
Copy link

rmn30 commented Apr 16, 2024

Yes, it should not be possible to create a tagged capability with base[8]=1 and exp=24 (i.e. base=2**32, which is greater than the 32-bit base used in csetbounds). If an untagged capability with such base is installed in MTCC when an exception / interrupt is taken this will result in an unrecoverable exception loop, so we don't really care about this case.

@mndstrmr
Copy link
Contributor Author

This is all great, thanks. Easiest update I've made in a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants