hyperopt #176
The problem is that this makes the .py file not a valid Python file. It's not easy to debug. I would much prefer to have, as a contract, that all parameters are defined in the constructor, with get_params / set_params like sklearn. My 2c.
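For concreteness, here is a minimal sketch of that sklearn-style contract (the Classifier name, the hypers, and the pipeline are illustrative, not taken from an actual submission):

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class Classifier(BaseEstimator, ClassifierMixin):
    # Every hyperparameter is an explicit __init__ keyword argument, so
    # get_params / set_params come for free from BaseEstimator.
    def __init__(self, C=1.0, strategy='median'):
        self.C = C
        self.strategy = strategy

    def fit(self, X, y):
        self.pipe_ = make_pipeline(
            SimpleImputer(strategy=self.strategy),
            LogisticRegression(C=self.C),
        ).fit(X, y)
        return self

    def predict(self, X):
        return self.pipe_.predict(X)


clf = Classifier()
print(clf.get_params())   # {'C': 1.0, 'strategy': 'median'}
clf.set_params(C=0.1)     # a tuner can change hypers without editing the file
```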
Yes, that's an option. The problem is that it adds a constraint on what constitutes a submission. Some submissions have several classes; we even had elements with standalone functions. It also makes it Python-specific. The hyperopt engine would have to know about what's in the submission files, whereas with the jinja template it can remain agnostic. The last remark with a default value is to make it easy to convert the template into a valid Python file (if needed, e.g. for debugging). The typical workflow, in my mind, is that I first develop a submission, then test it with ramp_test.
Another option: we take a valid Python submission and textually mark the hypers.
Make people use a Bunch object where they write all their params at the top of the file:
params = Bunch(param1='bla', ...)
...
... params.param1 ...
I would really stick to valid Python code.
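A rough sketch of what that could look like in a submission (the get_estimator function and the parameter names are illustrative, not the actual titanic submission):

```python
from sklearn.utils import Bunch
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# All hypers live in one Bunch at the top of the file; the file stays valid
# Python, and a tuner only has to rewrite this single assignment.
params = Bunch(strategy='median', logistic_C=0.1)


def get_estimator():
    return make_pipeline(
        SimpleImputer(strategy=params.strategy),
        LogisticRegression(C=params.logistic_C),
    )
```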
OK, can you write the (pseudocode of the) hyperopt loop using this version? Below are the steps to fill; what we want to reuse is the existing testing machinery. In particular, how would you do step 3 with Bunches?
OK, my attempt. We could do this with dictionaries:
This would be a valid submission that can be tested by ramp_test.
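A rough illustration of the dictionary idea (the names and values here are hypothetical):

```python
# The hypers live in a plain dict named `params` at the top of the submission;
# everything below only reads from it, so the file is ordinary, testable Python.
params = {
    'strategy': 'median',
    'logistic_C': 0.1,
}

# A hyperopt engine can produce a new candidate submission simply by rewriting
# this block, e.g. replacing it with repr({'strategy': 'mean', ...}); the
# result is still a valid submission file.
```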
Make sure it's always called params and the problem is solved.
I've started testing an implementation of a solution like the one suggested by @agramfort. All you need to do is make the hyper-parameters explicitly appear as keyword arguments in your estimator's __init__. To make a submission out of the best parameters, one can ask the CLI tool for hyper-parameter tuning to pickle the best model and/or the best config. We can add an option to the submission script to set the hyper-parameter values from a config file. Forcing the use of explicit keyword arguments for the hyper-parameters to be optimized would only improve the readability of the code.
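For concreteness, a rough sketch of the tuning side under those assumptions (Classifier is the kind of estimator with explicit keyword arguments sketched above, and X, y are assumed to be the training data already loaded by the problem):

```python
import pickle
from itertools import product

from sklearn.model_selection import cross_val_score

# Values to try for each hyperparameter (names are hypothetical).
grid = {
    'C': [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0],
    'strategy': ['median', 'mean'],
}

best_score, best_config = -float('inf'), None
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    clf = Classifier(**config)          # hypers passed as explicit kwargs
    score = cross_val_score(clf, X, y, cv=5).mean()
    if score > best_score:
        best_score, best_config = score, config

# Pickle the best config so a submission can later be rebuilt from it.
with open('best_config.pkl', 'wb') as f:
    pickle.dump(best_config, f)
```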
I agree with @agramfort that it is good to keep it a Python file. And I also think it would be nice to re-use the sklearn pattern of setting parameters; however, that would need a rework of the ramp-workflow testing machinery to be able to work with parameters. So I understand the appeal of an approach that generates new submission files and just uses the existing testing machinery. And about some specified syntax that ramp can recognize and adapt: I am not sure there is any added value in using a python dict.
I think something like this is easier to parse and modify than the dictionary, and also very explicit (the naming of the comments is of course just an idea, it can be whatever we want). We could even combine the parameter defaults with the hyperopt range, if we want.
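A sketch of what such markers could look like (the ramp_hyper tag, the names, and the exact format are all hypothetical):

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# The value on the line is the default used by ramp_test; the trailing comment
# tells the hyperopt engine which hyper it is and which values to try.
imputer = SimpleImputer(
    strategy='median',  # ramp_hyper: strategy in ['median', 'mean']
)
clf = LogisticRegression(
    C=0.1,  # ramp_hyper: logistic_C in [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0]
)
```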
that's neat too
+1 to keep it as a Python file for debugging.
Technically, I would prefer to modify the testing machinery itself. So if engineering time is an issue, I am +1 for the solution proposed by @jorisvandenbossche.
In case it can be useful: in the sacred package you can define configuration parameters either by decorating a config function (which basically collects all variables in the local scope of the function), by a dictionary, or by a config file (JSON or YAML). (This might be overly complicated for what we want to achieve here.)
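For reference, roughly what that looks like with sacred (written from memory of its API, so the details should be double-checked):

```python
from sacred import Experiment

ex = Experiment('hyperopt_demo')


@ex.config
def config():
    # All local variables of this function are collected as config entries.
    strategy = 'median'
    logistic_C = 0.1

# Alternatives: ex.add_config({'strategy': 'mean'}) or ex.add_config('config.json')


@ex.automain
def main(strategy, logistic_C):
    # Config values are injected into the main function by name.
    print(strategy, logistic_C)
```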
One more argument for @jorisvandenbossche's solution: the list of values is very informative, since it tells the other data scientists what hypers were tried. In case of transfer (trying the model on another data set), hyperopt could be rerun on the same list of values without modifying anything. And since the list is in the submission file, it will be submitted to the server with the submission. If we put the list into a separate file, that file will not be automatically submitted to the server (unless we require it explicitly, which would make no sense for submissions that didn't use hyperopt), so the list of values would be lost.
I started a new branch.
I have a first random engine working. You can pull from the branch and try it in titanic. The run will create a new submission (so we don't overwrite the original one). Please check whether you are happy with the interface (how to specify hypers, the command line script, and the output of hyperopt). Here are some details about the interface and implementation:
We need to define the type and the value (default), as well as the values to try. There are of course a lot of tests and documentation still to be done, but please try the beta and let us know, especially if you have comments on the usage.
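To make those three pieces of information concrete, here is one hypothetical way a submission could declare them (a sketch, not necessarily the interface implemented in the branch):

```python
# Hypothetical sketch: one object per hyper, carrying its type, its default
# (used by a plain ramp_test run), and the values the engine should try.
class Hyperparameter:
    def __init__(self, dtype, default, values):
        self.dtype = dtype
        self.default = default
        self.values = values


# Declared at the top of the submission file:
logistic_C = Hyperparameter(
    dtype='float', default=1.0,
    values=[0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0])
strategy = Hyperparameter(
    dtype='str', default='median',
    values=['median', 'mean'])
```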
We'll be starting to add a hyperopt/automl feature to ramp-workflow. The goal is to make it easy to convert a submission into a template with hyperparameter placeholders, and to add a hyper-config file that defines the values to try for each hyperparameter. Then rampwf will interface with various hyperopt engines (implementations of grid search, random search, bayesopt), run the optimization, and output 1) a table of score(s) vs hyperparameter value combinations and 2) a valid submission where the placeholders are replaced by the best hyperparameter values. We are planning to use jinja, which means that, for example, a hard-coded hyperparameter value in the python code will be replaced by a jinja placeholder, and the config json file will specify the values [median, mean] for strategy and [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0] for logistic_C. The user will then call e.g. ramp_hyperopt --submission ... --strategy random. In addition, for each placeholder we will specify a default, so if ramp_test --submission is called on a templatized submission, it will use these default values.
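For illustration, a minimal sketch of the jinja mechanism described above (the estimator line is hypothetical, not taken from an actual submission):

```python
from jinja2 import Template

# In the original submission the hyper is hard-coded, e.g.:
#   clf = LogisticRegression(C=0.1)
# In the templatized submission the value becomes a jinja placeholder
# (which is why the file is no longer valid Python on its own):
template_source = "clf = LogisticRegression(C={{ logistic_C }})\n"

# The engine renders the template back into a valid submission line, either
# with a value picked by the hyperopt engine or with the declared default:
print(Template(template_source).render(logistic_C=0.1))
# -> clf = LogisticRegression(C=0.1)
```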