Run evaluation in Docker #290

Open
rth opened this issue Nov 25, 2019 · 4 comments · May be fixed by #374

Comments

rth (Collaborator) commented Nov 25, 2019

I don't know if there is already an issue for it, but it would be good to run submissions in a Docker container. That would allow limiting the resources (CPU, memory) a submission can use and applying other restrictions (e.g. removing network access).

Step 1 of this could be to add another worker setup that runs the same conda worker, but inside Docker. One could mount the relevant folders with miniconda and the data. Very roughly, something like,

docker run --rm -v /home/user/ramp_deployment:/ramp_deployment -v /home/user/miniconda3:/miniconda3 ubuntu /miniconda3/bin/python start_worker_script.py

I think that by mounting the right folders, one might even be able to use default Docker images.
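To add the resource restrictions mentioned at the start, Docker's standard limit flags could be applied to such a command. A minimal sketch, reusing the paths above with illustrative limit values (the image and start_worker_script.py are placeholders):

```bash
# Run the conda worker in a container with capped resources and no network.
docker run --rm \
    --cpus=2 \
    --memory=4g \
    --network=none \
    -v /home/user/ramp_deployment:/ramp_deployment \
    -v /home/user/miniconda3:/miniconda3:ro \
    ubuntu \
    /miniconda3/bin/python start_worker_script.py
```

Mounting miniconda read-only (`:ro`) would also prevent a submission from tampering with the shared environment.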

This would help with resource limits, but not with access to the hidden test data. Since it is present on the filesystem, users can access it (and this is what is happening in the current teaching event we are doing with @massich and @mathurinm).

Step 2 would be to mount only the features of the hidden test set (i.e. without the target column) inside Docker, compute predictions, then score the final predictions in a separate Docker environment, so that the target column cannot, in principle, be accessed by users.
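Concretely, this could mean preparing a features-only copy of the test set and giving the prediction container access to that copy alone. A rough sketch under assumed file and column names (the `ramp-worker` image, `predict.py`, and `score.py` are all hypothetical):

```bash
# Strip the target column so only the features are visible in the container
# (file and column names are illustrative).
python -c "import pandas as pd; \
  pd.read_csv('test.csv').drop(columns=['target']) \
    .to_csv('test_features.csv', index=False)"

# The prediction container sees only the features and writes predictions out.
docker run --rm --network=none \
    -v $PWD/test_features.csv:/data/test.csv:ro \
    -v $PWD/predictions:/predictions \
    ramp-worker /miniconda3/bin/python predict.py

# A separate scoring environment gets the true targets and the saved predictions.
docker run --rm \
    -v $PWD/test.csv:/data/test.csv:ro \
    -v $PWD/predictions:/predictions:ro \
    ramp-worker /miniconda3/bin/python score.py
```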

@glemaitre please comment if I forgot something (I have not looked in detail at how the workers are implemented).

cc @maikia

rth (Collaborator, Author) commented Nov 25, 2019

cc @xNJL @guillaume-le-fur, who encountered this vulnerability during the last event. If you have other suggestions for improving things security-wise, don't hesitate.

glemaitre (Collaborator) commented

So we need to:

  • Run the submission within Docker and save the predictions to disk;
  • Then, something should compute the score. So we need to change ramp-test --submission, because it was in charge of doing both. I don't think this should be done by the dispatcher either: the dispatcher's job is only to manage workers and write to the DB. (See the sketch below.)
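A rough sketch of that split, assuming ramp-test's existing --save-output flag for persisting predictions (the separate scoring command does not exist yet and is purely illustrative):

```bash
# 1. Inside the container: train the submission and persist its predictions,
#    without computing any score against the hidden targets.
ramp-test --submission my_submission --save-output

# 2. Outside the container (but not in the dispatcher): a dedicated scoring
#    step loads the saved predictions plus the targets and writes the scores.
ramp-score --submission my_submission   # hypothetical command, for illustration
```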

glemaitre (Collaborator) commented

But I agree that already running the worker in Docker would remove some of the security issues.

kegl (Collaborator) commented Nov 26, 2019

Thumbs up for separating training and scoring, also for efficiency (scoring can take time, and right now it blocks the dispatcher). We'd need to restructure the ramp-workflow script, but it's relatively straightforward. They're done together now so that one can choose not to save and reload the predictions. Separating them will also help make ramp-workflow more readable.
