Read-only assignment files with symlinks? #1636
Replies: 8 comments
-
Hi @trevorcampbell, I ran into the same issue for a 200-student intro to data science class. Since then, [...] In practice, I organize the computing environment as a Python package, and the data [...]
See e.g.: https://gitlab.u-psud.fr/L1InfoIntroScienceDonnees/ComputerLab/tree/master/binder
The downside is that you need to update the computing environment whenever [...]
The upside is that, if you discover an issue in the dataset during the class, you can [...]
I am very much interested in feedback about how this approach generalizes or not
-
We supply notebooks as pre-built Docker images, not specific to any class, course, or even university. My personal preference is that data is part of the assignment: you need the data for the autograder to work, so it should be part of the package. I would also point out that creating packages to share data is fine for experienced Python developers, but not really something the majority of tutors would be able to do (and it becomes more complex when the data is for R, Haskell, or Stata, all of which we have available). Having said that, one could take your "data as install package" and install using [...]
-
I really like these ideas. I think for the purposes of my course, this (or something like it) would solve the problem. I especially like that it doesn't require any change to nbgrader itself. Maybe worth including a discussion of this in the documentation -- "Including data and other files with your assignment" -- or something like that?
-
After some more thought: the above solutions would work well for most of our assignments. But we have some assignments where one of the key goals is to read CSV files from a local path. I suppose for those we could include a cell at the top that obtains the data (via package or some other source) and saves it locally. Then at the end of the notebook, we can include a cell that deletes the local copy. Though if the student finishes their work without a final call to that last cell, the local copy will be kept around and at least copied once into the [...]
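A minimal sketch of the "fetch at the top, delete at the bottom" pair of cells described above, assuming the data is reachable as an ordinary file somewhere on the system. The function names `stage_data` and `cleanup_data` are hypothetical and not part of nbgrader.

```python
import shutil
from pathlib import Path


def stage_data(source: Path, local_dir: Path) -> Path:
    """Copy one data file into a local working directory (first notebook cell)."""
    local_dir.mkdir(parents=True, exist_ok=True)
    target = local_dir / source.name
    if not target.exists():  # do not re-copy when the cell is run again
        shutil.copy(source, target)
    return target


def cleanup_data(local_dir: Path) -> None:
    """Delete the local working copy (last notebook cell); harmless if already gone."""
    shutil.rmtree(local_dir, ignore_errors=True)
```

As noted in the comment above, this only saves space if the cleanup cell actually runs before the assignment is submitted.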
-
There's always a compromise... but note that the autograde process won't run if the notebook looks for a data file that's not there, so depending on what you're doing or testing, that may be significant.
-
In my case, I have a separate filesystem for course/shared data. This is mounted into student and instructor Kubernetes pods (and autograding pods). The filesystems are served by an NFS server (along with everything except the software image). Since it's a network filesystem, it's also available in other places, protected with Unix permissions, and mounted read-only in student pods. This nicely separates data from software, OS, student data, etc.
-
The autograder indeed needs access to the same assets (software [...] I advocate for the latter, but of course it's a balance which very much [...]
Python packaging certainly can become hairy in complex [...]
from setuptools import setup  # setup() is provided by setuptools

setup(
    name="intro-science-donnees",
    version="0.1",
    # In English: "Virtual computer lab for the course
    # «L1 Math-Info: Introduction to Data Science»"
    description='Salle de TP virtuelle pour le cours '
                '«L1 Math-Info: Introduction à la Science des Données»',
    url='http://nicolas.thiery.name/Enseignement/IntroScienceDonnees/',
    author='Isabelle Guyon, Nicolas M. Thiéry et al.',
    author_email='[email protected]',
    packages=['intro_science_donnees'],
    # Ship everything under data/ (one or two levels deep) with the package
    package_data={'intro_science_donnees': ['data/*', 'data/*/*']},
)
Note that I am not advocating for using specifically Python packaging [...]
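Once data is shipped through `package_data` as in the `setup()` call above, a notebook can read it without knowing where the package was installed. The following is a minimal sketch using the standard library's `importlib.resources` (Python 3.9+); the helper name `read_packaged_text` is hypothetical.

```python
from importlib import resources


def read_packaged_text(package: str, relative_path: str) -> str:
    """Return the text of a data file shipped inside an installed package."""
    resource = resources.files(package)
    # Walk down one path component at a time, e.g. "data/sample.csv"
    for part in relative_path.split("/"):
        resource = resource / part
    return resource.read_text(encoding="utf-8")
```

For example, `read_packaged_text("intro_science_donnees", "data/example.csv")` would read a file from the package above, where `example.csv` is a made-up file name used only for illustration.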
-
By “local”, you mean relative? That is, you would specifically want the data to (appear to) be in a subdirectory [...]
If yes, one approach, assuming your operating system(s) support symbolic links, would be to add a cell at the beginning [...]
Just 2cts of course
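A minimal sketch of such a first cell, assuming the shared data lives in a directory that every student can read. The function name `link_data_dir` and both paths in the usage note are hypothetical.

```python
import os
from pathlib import Path


def link_data_dir(shared_dir: Path, link_path: Path) -> Path:
    """Make link_path a symbolic link to shared_dir unless something is already there."""
    if link_path.is_symlink() or link_path.exists():
        return link_path  # leave an existing link or directory untouched
    os.symlink(shared_dir, link_path, target_is_directory=True)
    return link_path
```

A notebook could then call `link_data_dir(Path("/srv/course/data"), Path("data"))` and read `data/some_file.csv` through a relative path, with only the link (not the data) living in the assignment folder.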
-
[If this is of interest I'd be happy to put together a PR!]
Hi!
We are using nbgrader for some of our courses in the UBC stats department. Something we've run into a number of times now -- nbgrader ends up copying assignment files around quite a few times (in the release folder, in the students' folders, in the submitted folder, in the autograded folder, and in the feedback folder). With a course of 100 students, you get around 400 copies of each file floating around. This becomes a problem when you want to include a large-ish read-only file with an assignment, and when there are restrictive user storage quotas -- and even worse, when all the assignments need to be collected on a single grader account.
For reference: a 10MB dataset file in a course with 300 students has caused our graders to exceed their filesystem quotas, which then causes all kinds of definitely-not-user-solvable problems in our grading pipeline (un/remounting misbehaving ZFS filesystems, killing unresponsive docker containers, rebooting VMs, etc).
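A back-of-the-envelope version of these numbers, as a rough sketch. It assumes one release copy plus one copy per student in each of the four per-student folders listed above, and that the submitted, autograded, and feedback copies all land on the grader account.

```python
def copies_per_file(n_students: int, per_student_folders: int = 4) -> int:
    """One release copy plus one copy per student in each per-student folder."""
    return 1 + per_student_folders * n_students


def grader_side_megabytes(n_students: int, file_mb: float, grader_folders: int = 3) -> float:
    """Space taken on the grader account by submitted, autograded, and feedback copies."""
    return grader_folders * n_students * file_mb


print(copies_per_file(100))            # 401
print(grader_side_megabytes(300, 10))  # 9000, i.e. about 9 GB for one 10MB file
```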
Q: Is there a reasonable way to reduce the number of redundant copies being made of "read-only" assignment files?
I think a reasonable idea (?) -- without mucking with any fundamental design elements of nbgrader -- would be to have a way of designating some assignment files as "read-only", and then instead of copying them around one would just make various symbolic links.
Specifically:
- In source/, the instructor can create a readonly.txt listing read-only files.
- In generate_assignment, the converter either creates the release/ copy and sets read-only permissions there, or perhaps even better the converter sets the source/ file permission to read-only and creates a symbolic link in release/.
- In release_assignment, fetch_assignment, autograde, and generate_feedback, just create a symbolic link to the release/ copy.
Thoughts?
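To make the proposal concrete, here is a rough sketch of a "copy or link" step driven by readonly.txt. It is not nbgrader code: the function names are hypothetical, and it assumes readonly.txt lists one file name per line and is not itself propagated.

```python
import os
import shutil
from pathlib import Path


def load_readonly_names(source_dir: Path) -> set[str]:
    """Read readonly.txt (one file name per line); empty set if it is absent."""
    listing = source_dir / "readonly.txt"
    if not listing.exists():
        return set()
    return {line.strip() for line in listing.read_text().splitlines() if line.strip()}


def copy_or_link(source_dir: Path, dest_dir: Path) -> None:
    """Link files listed in readonly.txt into dest_dir and copy everything else."""
    readonly = load_readonly_names(source_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for src in source_dir.iterdir():
        if not src.is_file() or src.name == "readonly.txt":
            continue
        dst = dest_dir / src.name
        if src.name in readonly:
            src.chmod(0o444)                # make the single real copy read-only
            os.symlink(src.resolve(), dst)  # link to it instead of duplicating it
        else:
            shutil.copy2(src, dst)          # ordinary files are copied as usual
```

The sketch handles one flat folder and raises if a link already exists at the destination. Whether students can follow a link into the instructor's tree depends on filesystem permissions, which it does not address.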
P.S. I tried to think of a solution that didn't use the annoying readonly.txt file, but I can't think of a good solution that automatically detects whether it's safe to use symbolic links or not for any particular file (students might modify files in their assignment, and you definitely don't want to let students mess with source/ or release/ or submitted/ via symbolic link).
A related Q: is there any reason to keep copies of non-ipynb files in submitted/, autograded/, or feedback/? If not, a very dumb option would just be to delete all non-ipynb files in submitted/ after fetching, and in autograded/ once the autograder finishes. This would also prevent copies in feedback/. We'd still be left with the student copies in their own work folders -- so still no really large files -- but this would solve the problem of blowing up the grader quota with moderately-sized files, and most systems would probably have a reasonable individual quota for each user so they'd be fine too.
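The "very dumb option" could look roughly like the sketch below, run over a submitted/ or autograded/ tree. The function name `prune_non_notebooks` is hypothetical and this is not an nbgrader feature. It deletes every non-notebook file it finds, so any such file the grading pipeline still relies on would have to be excluded first.

```python
from pathlib import Path


def prune_non_notebooks(folder: Path) -> list[Path]:
    """Delete every file under folder that is not a .ipynb; return what was removed."""
    removed = []
    # sorted() materializes the listing first, so deleting while looping is safe
    for path in sorted(folder.rglob("*")):
        if (path.is_file() or path.is_symlink()) and path.suffix != ".ipynb":
            path.unlink()
            removed.append(path)
    return removed
```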