iofu728 changed the title from "[Bug]: The evaluation code of scbench does not match the provided dataset." to "[Question]: The evaluation code of scbench does not match the provided dataset." on Dec 26, 2024
==== Evaluation scbench_repoqa_and_kv====
# examples: 88
Num eval examples: -1
Verbose: False
Max new tokens: {'scbench_repoqa': 1024, 'scbench_kv': 80}
Num of turns: 5
0it [00:00, ?it/s]# tokens before: 67598
# tokens after: 67598
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/myfile/MInference/scbench/run_scbench.py", line 397, in <module>
    pred = get_pred(
  File "/myfile/MInference/scbench/run_scbench.py", line 125, in get_pred
    outputs = model.test(
  File "/myfile/MInference/scbench/eval_utils.py", line 1246, in test
    max_length_per_turn = max_length[example["task"][idx]]
KeyError: 'multi_turn_kv'
The key 'multi_turn_kv' is not present in Max new tokens, which only contains 'scbench_repoqa' and 'scbench_kv'.
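For reference, a minimal workaround sketch, not the repository's fix: map the task names that appear in the dataset's example["task"] field onto the keys used in the Max new tokens dict before the lookup in eval_utils.py. The alias table and the default budget below are assumptions based only on the logs above.

```python
# Workaround sketch (assumed names, not the official fix): translate dataset
# task names like 'multi_turn_kv' into config keys like 'scbench_kv'.
TASK_ALIASES = {
    "multi_turn_kv": "scbench_kv",
    "multi_turn_passkey": "scbench_passkey",
}

def resolve_max_new_tokens(max_length: dict, task_name: str, default: int = 512) -> int:
    """Look up the per-turn token budget, tolerating mismatched task names."""
    key = TASK_ALIASES.get(task_name, task_name)
    return max_length.get(key, default)

# Instead of: max_length_per_turn = max_length[example["task"][idx]]
# one could:  max_length_per_turn = resolve_max_new_tokens(max_length, example["task"][idx])
```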
When I run the scbench_summary_with_needles task, the error is as follows; it is similar to the problem above:
==== Evaluation scbench_summary_with_needles====
# examples: 70
Num eval examples: -1
Verbose: False
Max new tokens: {'scbench_summary': 800, 'scbench_passkey': 15}
Num of turns: 5
0it [00:00, ?it/s]# tokens before: 98057
# tokens after: 97962
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/myfile/MInference/scbench/run_scbench.py", line 397, in <module>
    pred = get_pred(
  File "/myfile/MInference/scbench/run_scbench.py", line 125, in get_pred
    outputs = model.test(
  File "/myfile/MInference/scbench/eval_utils.py", line 1246, in test
    max_length_per_turn = max_length[example["task"][idx]]
KeyError: 'multi_turn_passkey'
Describe the bug
When I tested the "scbench_kv" task provided by SCBench, I encountered problems in compute_scores.py during evaluation.
The evaluation task names defined in compute_scores.py do not match the test task names of SCBench.
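To pin down the mismatch, one way is to dump the task names that actually appear in the downloaded data and compare them against the names the evaluation script expects. A diagnostic sketch, assuming the data is a local JSONL file with a per-example "task" field; the path and the expected-name set are placeholders, not values taken from the repository:

```python
import json

DATA_FILE = "data/scbench_repoqa_and_kv.jsonl"   # hypothetical local path
EXPECTED = {"scbench_repoqa", "scbench_kv"}      # keys from "Max new tokens" above

found = set()
with open(DATA_FILE) as f:
    for line in f:
        example = json.loads(line)
        tasks = example.get("task", [])
        # "task" appears to be a per-turn list for the mixed tasks (see the tracebacks above)
        found.update(tasks if isinstance(tasks, list) else [tasks])

print("task names in the dataset:", sorted(found))
print("task names in the config: ", sorted(EXPECTED))
print("missing from the config:  ", sorted(found - EXPECTED))
```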