In the near future, we might be faced with RAMP problems whose target dimension is too big to be handled by the existing workflow without making the database explode. A simple example is an image-to-image workflow. These problems need a huge training / testing sample, making each prediction equally big (on the order of a few GB), while the current total database size is 100 GB.
This leaves us with two options:
1. modify the database model and migrate it,
2. find a smart way of storing and scoring the predictions for these specific problems.
We would like to avoid option 1 for now if possible, so here is our take on option 2.
Since the target is a pixel-by-pixel prediction, we would sample the prediction, e.g., taking a random sub-grid of pixels on which to compute the score. To avoid cheating, we would use a different random sub-grid for the public and the backend datasets.
Practically, this would mean creating a specific SamplingScore class which uses a hash of the input dataset as a seed to generate the scoring grid, and then passes the grid to the scoring method in y_pred.
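To make the idea concrete, here is a minimal sketch of what such a class could look like. This is not an implementation proposal, just an illustration: the constructor arguments (`base_score`, `n_samples`) and the way the grid is applied are assumptions, and in the real workflow the grid would presumably travel inside `y_pred` rather than being applied here.

```python
import hashlib

import numpy as np


class SamplingScore:
    """Hypothetical sketch: score predictions on a random sub-grid of pixels,
    seeded by a hash of the input dataset so that the public and backend
    datasets get different grids."""

    def __init__(self, base_score, n_samples=10_000):
        # base_score: any scalar score function f(y_true, y_pred); assumed here.
        self.base_score = base_score
        self.n_samples = n_samples

    def _grid(self, X, n_pixels):
        # Derive a deterministic seed from the dataset contents: same dataset
        # -> same grid, different dataset (public vs. backend) -> different grid.
        digest = hashlib.sha256(np.ascontiguousarray(X).tobytes()).digest()
        seed = int.from_bytes(digest[:4], "little")
        rng = np.random.default_rng(seed)
        # Random sub-grid: indices of the pixels kept for scoring.
        size = min(self.n_samples, n_pixels)
        return rng.choice(n_pixels, size=size, replace=False)

    def __call__(self, X, y_true, y_pred):
        n = len(y_true)
        yt = y_true.reshape(n, -1)
        yp = y_pred.reshape(n, -1)
        idx = self._grid(X, yt.shape[1])
        # Only the sampled pixels enter the score, so only a small fraction
        # of each multi-GB prediction actually needs to be compared.
        return self.base_score(yt[:, idx], yp[:, idx])
```

For example, wrapping a pixel-wise RMSE in `SamplingScore(rmse, n_samples=1000)` would evaluate submissions on 1000 hashed-out pixels instead of the full image, while keeping the grid reproducible per dataset.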
This is a summary of a discussion we just had with @kegl, on which we'd like to have comments, opinions, and ideas from @jorisvandenbossche @glemaitre @agramfort.