Clarification regarding how the accuracy.txt file is generated #1861

Open
arjunsuresh opened this issue Sep 27, 2024 · 7 comments

@arjunsuresh
Contributor

arjunsuresh commented Sep 27, 2024

The submission generation rules for inference say that the accuracy.txt file should be generated from the accuracy scripts. My interpretation is that one should run the reference accuracy scripts standalone on the logs from the accuracy run to obtain the accuracy.txt file, rather than writing accuracy.txt from within the implementation code. Is this the correct interpretation?

accuracy.txt # stdout of reference accuracy scripts
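
A minimal sketch of this interpretation, assuming the reference accuracy script prints its result to stdout as the rules snippet above indicates. The helper name `write_accuracy_txt` is hypothetical, and the command passed to it (script path and flags) depends on the benchmark; nothing here is taken from the MLPerf tooling itself.

```python
import subprocess
from pathlib import Path


def write_accuracy_txt(command, output_path):
    """Run `command` and save its stdout verbatim to `output_path`.

    `command` is a list such as [interpreter, script, *args]; the caller
    supplies it (assumption: the script reports accuracy on stdout).
    """
    # check=True raises CalledProcessError on a non-zero exit code,
    # so no accuracy.txt is written when the script fails.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    Path(output_path).write_text(result.stdout)
    return result.stdout
```

The file then contains exactly what the script printed, so anyone with the same accuracy log and dataset can rerun the same command and compare.
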
@arjunsuresh
Contributor Author

@psyhtest @ashwin @attafosu Can you please confirm?

@attafosu
Contributor

attafosu commented Oct 2, 2024

@arjunsuresh Yes, that's correct.

@psyhtest
Contributor

psyhtest commented Oct 2, 2024

I can think of a situation where an implementer refactors or integrates a reference script into their own script. For example, the reference script may hardcode /usr/bin/python3, while they may want to use /usr/local/bin/python3.8. In this case, we can probably require that no material changes be made during such refactoring or integration, but not that the reference script must always be run standalone?
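
For the interpreter example specifically, one option that leaves the reference script untouched is to name the interpreter explicitly when launching it, which bypasses the shebang line. This is only a sketch; `run_with_interpreter` is a hypothetical helper, not part of any reference implementation.

```python
import subprocess
import sys


def run_with_interpreter(script_path, script_args=(), interpreter=sys.executable):
    """Run an unmodified script under `interpreter` and return its stdout.

    Passing the interpreter explicitly means the script's shebang line
    (for example a hardcoded /usr/bin/python3) is never consulted.
    """
    command = [interpreter, str(script_path), *script_args]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling it with `interpreter="/usr/local/bin/python3.8"` would cover the case described above without editing the script.
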

@arjunsuresh
Contributor Author

Thank you @attafosu @psyhtest

@psyhtest yes, running the reference accuracy script standalone is fine I believe. But this is not that straightforward, as it often requires the original dataset, and so we do have some submissions where accuracy.txt is generated from the benchmark run itself without calling the reference script. We didn't see any accuracy issue when running the standalone script for those submissions, but I believe this should not be allowed.

@psyhtest
Contributor

psyhtest commented Oct 7, 2024

@arjunsuresh

But you admit that in some cases it may not be straightforward:

yes, running the reference accuracy script standalone is fine I believe.
But this is not that straightforward

So why would we disallow it in such cases?

@arjunsuresh
Contributor Author

@psyhtest I'm not saying we should disallow running the reference accuracy script in a custom way, say from within another Python file. But I don't think it is right to allow generating the accuracy.txt file by mimicking the actions of the reference script, because that makes it hard for other people to verify.
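
To illustrate the distinction, here is a rough sketch of running an unmodified script from within another Python file using the standard library's `runpy`, so the script's own code produces the output rather than a reimplementation. The helper name `run_script_in_process` is hypothetical.

```python
import contextlib
import io
import runpy
import sys


def run_script_in_process(script_path, script_args=()):
    """Execute an unmodified script as __main__ and return what it printed."""
    buffer = io.StringIO()
    saved_argv = sys.argv
    # The script sees the same argv it would see when launched directly.
    sys.argv = [str(script_path), *script_args]
    try:
        with contextlib.redirect_stdout(buffer):
            runpy.run_path(str(script_path), run_name="__main__")
    finally:
        sys.argv = saved_argv
    return buffer.getvalue()
```

Only output written through Python's `sys.stdout` is captured this way, and a script that calls `sys.exit()` will raise `SystemExit` in the caller.
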

We face this issue specifically when automating DLRMv2 submissions: to generate the accuracy.txt file we need the day23 Criteo dataset file, which cannot be downloaded in a non-interactive way. But if we were allowed to generate the accuracy.txt file from within the benchmark implementation, we possibly would not need this dataset file at all.

@mrmhodak
Contributor

@arjunsuresh to work on this
