Run evaluation in Docker #290

Open
rth opened this issue Nov 25, 2019 · 4 comments · May be fixed by #374

Comments

rth (Collaborator) commented Nov 25, 2019

I don't know if there is already an issue for it, but it would be good to run submissions in a Docker container. That would allow limiting the resources (CPU, memory) a submission can use and applying other restrictions (e.g. removing network access).

Step 1 of this could be to add another worker setup that runs the same conda worker, but inside Docker. One could mount the relevant folders with miniconda and the data. Very roughly, something like,

docker run --rm -v /home/user/ramp_deployment:/ramp_deployment -v /home/user/miniconda3:/miniconda3 ubuntu /miniconda3/bin/python start_worker_script.py

I think that by mounting the right folders, one might even be able to use default Docker images.
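To add the resource restrictions mentioned at the start, Docker's standard limit flags could be applied to such a command. A minimal sketch, reusing the paths above with illustrative limit values (the image and start_worker_script.py are placeholders):

```bash
# Run the conda worker in a container with capped resources and no network.
docker run --rm \
    --cpus=2 \
    --memory=4g \
    --network=none \
    -v /home/user/ramp_deployment:/ramp_deployment \
    -v /home/user/miniconda3:/miniconda3:ro \
    ubuntu \
    /miniconda3/bin/python start_worker_script.py
```

Mounting miniconda read-only (`:ro`) would also prevent a submission from tampering with the shared environment.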

This would help with resource limits, but not with access to the hidden test data. Since it is present on the filesystem, users can access it (and this is what is happening in the current teaching event we are doing with @massich and @mathurinm).

Step 2 would be to mount only the features of the hidden test set (i.e. without the target column) inside Docker, compute predictions, then score the final predictions in a separate Docker environment, so that the target column cannot, in principle, be accessed by users.
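Concretely, this could mean preparing a features-only copy of the test set and giving the prediction container access to that copy alone. A rough sketch under assumed file and column names (the `ramp-worker` image, `predict.py`, and `score.py` are all hypothetical):

```bash
# Strip the target column so only the features are visible in the container
# (file and column names are illustrative).
python -c "import pandas as pd; \
  pd.read_csv('test.csv').drop(columns=['target']) \
    .to_csv('test_features.csv', index=False)"

# The prediction container sees only the features and writes predictions out.
docker run --rm --network=none \
    -v $PWD/test_features.csv:/data/test.csv:ro \
    -v $PWD/predictions:/predictions \
    ramp-worker /miniconda3/bin/python predict.py

# A separate scoring environment gets the true targets and the saved predictions.
docker run --rm \
    -v $PWD/test.csv:/data/test.csv:ro \
    -v $PWD/predictions:/predictions:ro \
    ramp-worker /miniconda3/bin/python score.py
```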

@glemaitre please comment if I forgot something (I have not looked in detail at how the workers are implemented).

cc @maikia

rth (Collaborator, Author) commented Nov 25, 2019

cc @xNJL @guillaume-le-fur, who encountered this vulnerability during the last event. If you have other suggestions for improving things security-wise, don't hesitate.

glemaitre (Collaborator) commented

So we need to:

  • Run the submission within Docker and save the predictions to disk;
  • Then, something should compute the score. So we need to change ramp-test --submission, because it was in charge of doing both. I don't think this should be done by the dispatcher either: the dispatcher's job is only to manage workers and write to the DB. (See the sketch below.)
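A rough sketch of that split, assuming ramp-test's existing --save-output flag for persisting predictions (the separate scoring command does not exist yet and is purely illustrative):

```bash
# 1. Inside the container: train the submission and persist its predictions,
#    without computing any score against the hidden targets.
ramp-test --submission my_submission --save-output

# 2. Outside the container (but not in the dispatcher): a dedicated scoring
#    step loads the saved predictions plus the targets and writes the scores.
ramp-score --submission my_submission   # hypothetical command, for illustration
```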

glemaitre (Collaborator) commented

But I agree that already running the worker in Docker would remove some of the security issues.

kegl (Collaborator) commented Nov 26, 2019

Thumbs up for separating training and scoring, also for efficiency (scoring can take time, and right now it blocks the dispatcher). We'd need to restructure the ramp-workflow script, but it's relatively straightforward. They're done together now so that one can choose not to save and reload the predictions. Separating them will also help make ramp-workflow more readable.
