
[Question]: The evaluation code of scbench does not match the provided dataset. #103

rainstorm12 opened this issue Dec 26, 2024 · 2 comments

rainstorm12 commented Dec 26, 2024

Describe the bug

When I tested the "scbench_kv" task provided by scbench, I encountered the following problem in the compute_scores.py file during evaluation.

[rank0]:   File "/myfile/MInference-main/scbench/compute_scores.py", line 365, in get_score_one
[rank0]:     assert task_name in NAME_TO_SCORE_GETTER, f"Invalid task name: {task_name}"
[rank0]: AssertionError: Invalid task name: scbench_kv

I found that the task names supported in the compute_scores.py file are as follows; they do not match the scbench task names.

def get_score_one(pred: str, label: str, task_name: str, model_name: str) -> float:
    """
    Computes the score for one prediction.
    Returns one float (zero and one for boolean values).
    """
    NAME_TO_SCORE_GETTER = {
        # Retrieve
        "kv_retrieval": get_score_one_kv_retrieval,
        "kv_retrieval_prefix": get_score_one_kv_retrieval,
        "kv_retrieval_both": get_score_one_kv_retrieval,
        "passkey": get_score_one_passkey,
        "number_string": get_score_one_number_string,
        # Code
        "code_run": get_score_one_code_run,
        "code_debug": get_score_one_code_debug,
        # Longbook
        "longdialogue_qa_eng": get_score_one_longdialogue_qa_eng,
        "longbook_qa_eng": get_score_one_longbook_qa_eng,
        "longbook_sum_eng": get_score_one_longbook_sum_eng,
        "longbook_choice_eng": get_score_one_longbook_choice_eng,
        "longbook_qa_chn": get_score_one_longbook_qa_chn,
        # Math
        "math_find": get_score_one_math_find,
        "math_calc": get_score_one_math_calc,
        # multi-turn native
        "multi_turn_summary": get_score_one_longbook_sum_eng,
        "multi_turn_vt": string_match_all,
        "multi_turn_many_shot": get_score_one_longdialogue_qa_eng,
        "multi_turn_kv_compressible": get_score_one_kv_retrieval,
    }
    assert task_name in NAME_TO_SCORE_GETTER, f"Invalid task name: {task_name}"
    score = NAME_TO_SCORE_GETTER[task_name](pred, label, model_name)
    return float(score)
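
As a minimal local workaround before updating, one could alias the dataset's scbench_* task names onto the scorer keys that the table above already defines. The mapping below is only a guess at the intended correspondence, not necessarily what the official fix does:

# Hypothetical alias table: map scbench_* task names from the dataset onto the
# existing scorer keys in NAME_TO_SCORE_GETTER. These pairs are assumptions for
# illustration, not the official mapping.
SCBENCH_TASK_ALIASES = {
    "scbench_kv": "kv_retrieval",
    "scbench_passkey": "passkey",
    "scbench_summary": "multi_turn_summary",
}

def get_score_one_aliased(pred: str, label: str, task_name: str, model_name: str) -> float:
    # Normalize the dataset task name to a known scorer key before dispatching.
    task_name = SCBENCH_TASK_ALIASES.get(task_name, task_name)
    return get_score_one(pred, label, task_name, model_name)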
rainstorm12 added the bug (Something isn't working) label on Dec 26, 2024
iofu728 self-assigned this on Dec 26, 2024
iofu728 added the question (Further information is requested) label and removed the bug label on Dec 26, 2024
iofu728 changed the title from "[Bug]: The evaluation code of scbench does not match the provided dataset." to "[Question]: The evaluation code of scbench does not match the provided dataset." on Dec 26, 2024

iofu728 (Contributor) commented Dec 26, 2024

Hi @rainstorm12, thank you for pointing out this issue.

We have already fixed it in #101.

Please fetch the updated code and let us know if you encounter any further problems!

git clone https://github.com/microsoft/MInference
pip install -e .

rainstorm12 (Author) commented

Thank you very much for your help! That solved my problem!
However, when I try the multi-task datasets, I encounter a new problem. My test.sh file is as follows:

python run_scbench.py \
    --task scbench_repoqa_and_kv \
    --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --data_dir ./data \
    --output_dir ./results \
    --rewrite \
    --attn_type minference \
    --kv_type dense \
    --use_chat_template \
    --trust_remote_code

and the error is as follows:

==== Evaluation scbench_repoqa_and_kv====
# examples: 88
Num eval examples: -1
Verbose: False
Max new tokens: {'scbench_repoqa': 1024, 'scbench_kv': 80}
Num of turns: 5
0it [00:00, ?it/s]# tokens before: 67598
# tokens after: 67598
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/myfile/MInference/scbench/run_scbench.py", line 397, in <module>
    pred = get_pred(
  File "/myfile/MInference/scbench/run_scbench.py", line 125, in get_pred
    outputs = model.test(
  File "/myfile/MInference/scbench/eval_utils.py", line 1246, in test
    max_length_per_turn = max_length[example["task"][idx]]
KeyError: 'multi_turn_kv'

It looks like the key 'multi_turn_kv' is not present in the Max new tokens dict.
When I run the scbench_summary_with_needles task, the error is as follows (similar to the problem above):

==== Evaluation scbench_summary_with_needles====
# examples: 70
Num eval examples: -1
Verbose: False
Max new tokens: {'scbench_summary': 800, 'scbench_passkey': 15}
Num of turns: 5
0it [00:00, ?it/s]# tokens before: 98057
# tokens after: 97962
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/myfile/MInference/scbench/run_scbench.py", line 397, in <module>
    pred = get_pred(
  File "/myfile/MInference/scbench/run_scbench.py", line 125, in get_pred
    outputs = model.test(
  File "/myfile/MInference/scbench/eval_utils.py", line 1246, in test
    max_length_per_turn = max_length[example["task"][idx]]
KeyError: 'multi_turn_passkey'
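
Both tracebacks fail on the same lookup in eval_utils.py: the per-turn max_length dict is keyed by the scbench_* names shown under "Max new tokens", while example["task"] holds names such as multi_turn_kv and multi_turn_passkey. As a temporary local patch, that lookup could be made tolerant of either naming scheme; the alias table and default below are hypothetical names introduced only for illustration:

# Hypothetical sketch of a tolerant lookup for the failing line in eval_utils.py.
# TASK_NAME_ALIASES and DEFAULT_MAX_NEW_TOKENS are assumptions for illustration,
# not identifiers from the repository.
TASK_NAME_ALIASES = {
    "multi_turn_kv": "scbench_kv",
    "multi_turn_passkey": "scbench_passkey",
}
DEFAULT_MAX_NEW_TOKENS = 1024

def max_new_tokens_for(max_length: dict, task: str) -> int:
    # Try the raw task name first, then a known alias, then fall back to a default.
    if task in max_length:
        return max_length[task]
    alias = TASK_NAME_ALIASES.get(task)
    if alias is not None and alias in max_length:
        return max_length[alias]
    return DEFAULT_MAX_NEW_TOKENS

# The failing line would then become:
# max_length_per_turn = max_new_tokens_for(max_length, example["task"][idx])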
