RandomForestClassifier slower than original sklearn #1050
Comments
Hi @matchyc, thanks for reporting. Just to give an update. The issue was reproduced and it depends on the
Hi @matchyc and @Innixma, a quick update. It looks like this option struggles with many-feature datasets. Running with Do you have the resources to give this a shot? I'm working on understanding this behavior in greater detail, and additional input from your side would be very helpful.
$ python rf_slow.py
Train optimized
Took 11.61 seconds
[16, 11, 31, 22, 18, 30, 11, 15, 29, 18, 17, 33, 29, 25, 29, 25, 24, 12, 29, 23, 31, 31, 27, 18, 13, 29, 20, 19, 29, 20, 12, 19, 25, 15, 38, 18, 11, 27, 30, 35, 31, 24, 17, 32, 31, 27, 23, 30, 21, 13, 3, 22, 20, 19, 23, 24, 22, 18, 8, 30, 24, 25, 10, 19, 19, 8, 11, 28, 21, 35, 23, 25, 13, 23, 24, 29, 31, 23, 22, 26, 23, 9, 13, 10, 27, 20, 29, 25, 24, 25, 17, 32, 20, 35, 24, 26, 31, 22, 25, 27, 30, 4, 33, 9, 22, 23, 34, 28, 23, 8, 22, 20, 25, 27, 17, 5, 25, 29, 18, 15, 23, 22, 23, 32, 28, 19, 30, 24, 5, 24, 27, 8, 29, 25, 13, 22, 25, 19, 19, 10, 24, 10, 25, 26, 23, 23, 24, 21, 30, 25, 31, 28, 30, 9, 14, 12, 33, 22, 13, 9, 21, 23, 3, 17, 8, 25, 17, 30, 24, 26, 23, 5, 23, 21, 36, 27, 13, 18, 13, 23, 26, 24, 7, 9, 17, 12, 27, 4, 14, 11, 4, 24, 30, 25, 17, 29, 25, 29, 33, 11, 39, 26, 20, 19, 32, 29, 15, 34, 20, 24, 23, 27, 27, 27, 30, 19, 26, 27, 15, 26, 14, 11, 31, 15, 27, 25, 26, 25, 20, 20, 25, 37, 34, 32, 39, 31, 21, 28, 10, 28, 20, 18, 31, 25, 9, 26, 11, 23, 22, 21, 4, 13, 29, 21, 37, 23, 17, 15, 33, 26, 25, 19, 26, 20, 32, 28, 19, 8, 13, 28, 34, 28, 25, 25, 21, 42, 17, 32, 9, 24, 9, 24, 22, 4, 22, 25, 21, 30, 19, 21, 23, 29, 24, 23, 20, 31, 17, 22, 24, 22]
Train stock
Took 29.54 seconds
[83, 65, 56, 78, 85, 57, 94, 65, 120, 68, 61, 95, 63, 67, 76, 65, 64, 79, 86, 72, 91, 52, 84, 90, 60, 58, 53, 74, 65, 68, 69, 66, 77, 79, 59, 70, 84, 66, 74, 61, 71, 73, 51, 68, 58, 83, 52, 87, 69, 69, 72, 97, 88, 82, 60, 63, 76, 69, 66, 70, 92, 61, 85, 74, 86, 112, 80, 58, 57, 73, 60, 63, 80, 86, 63, 126, 78, 69, 66, 73, 74, 64, 59, 78, 64, 72, 115, 65, 94, 88, 59, 81, 70, 64, 62, 73, 64, 71, 57, 85, 102, 62, 59, 91, 92, 71, 96, 81, 74, 70, 88, 72, 77, 78, 78, 60, 52, 64, 85, 71, 68, 98, 80, 74, 59, 87, 66, 75, 60, 95, 84, 85, 66, 56, 98, 93, 92, 75, 72, 65, 91, 56, 121, 71, 71, 77, 80, 89, 82, 67, 76, 84, 84, 79, 100, 67, 72, 72, 65, 67, 70, 76, 63, 74, 74, 81, 71, 72, 86, 77, 78, 109, 69, 71, 73, 75, 59, 74, 67, 80, 81, 57, 75, 73, 74, 73, 83, 97, 90, 85, 68, 72, 69, 89, 66, 70, 57, 135, 70, 70, 63, 83, 57, 69, 69, 74, 70, 76, 65, 60, 87, 80, 67, 66, 67, 136, 64, 73, 71, 100, 77, 81, 58, 73, 77, 83, 64, 63, 81, 98, 65, 74, 68, 64, 68, 60, 75, 79, 95, 76, 70, 74, 88, 66, 68, 74, 94, 67, 85, 75, 94, 74, 72, 92, 77, 71, 80, 66, 65, 70, 90, 74, 84, 54, 109, 71, 60, 66, 77, 71, 62, 85, 92, 92, 85, 86, 71, 104, 64, 67, 71, 67, 70, 82, 53, 85, 95, 66, 70, 79, 79, 62, 70, 62, 88, 64, 90, 57, 86, 62]
Training done.
Would be great to hear if you observe any other oddities when enabling memory saving mode. PS: @Innixma, the above example is training 300 estimators on the kddcup upselling data set. All in all, RAM consumption is about the same as stock scikit-learn, and about 15 GB in total.
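For anyone who wants to run a similar comparison: rf_slow.py itself is not included in this thread, so the following is only a minimal sketch of such a timing harness, not the script used above. The helper names `time_fit` and `compare`, the synthetic data from `make_classification`, and the chosen sizes are all assumptions for illustration; the kddcup upselling data is not loaded here.

```python
import time


def time_fit(make_estimator, X, y):
    """Build an estimator, fit it once, and return (estimator, seconds)."""
    estimator = make_estimator()
    start = time.perf_counter()
    estimator.fit(X, y)
    return estimator, time.perf_counter() - start


def compare(n_estimators=300):
    """Hypothetical comparison on synthetic data (assumption: not the thread's data).

    Requires scikit-learn and scikit-learn-intelex to be installed.
    """
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier as StockRF
    from sklearnex.ensemble import RandomForestClassifier as OptimizedRF

    # Many features relative to samples, since that is the case discussed above.
    X, y = make_classification(n_samples=5000, n_features=500, random_state=0)

    for label, cls in (("optimized", OptimizedRF), ("stock", StockRF)):
        _, seconds = time_fit(
            lambda: cls(n_estimators=n_estimators, n_jobs=-1, random_state=0),
            X,
            y,
        )
        print(f"Train {label}")
        print(f"Took {seconds:.2f} seconds")
```

Calling `compare()` prints one timing per implementation. A single run is noisy, so repeating it a few times and on the real data set would give a more trustworthy picture.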
@ahuber21 I'd be happy to test, assuming the issue of RandomForest predictive performance not aligning between scikit-learn and intelex has been resolved. If the predictive performance is identical given identical hyperparameters, then training/inference speedups would be meaningful and I'd try to find time to benchmark it, but I would like to confirm whether the accuracy delta has been fixed.
For example, it appears that users are still reporting model quality deltas for RF: #1090
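One way to check for such a delta, sketched under assumptions: fit both implementations with the same hyperparameters, predict on the same held-out set, and compare accuracies. The helpers `accuracy` and `accuracy_delta` below are hypothetical names and use only plain Python, so they work on lists or arrays of labels from either library.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def accuracy_delta(y_true, pred_a, pred_b):
    """Accuracy of model A minus accuracy of model B on the same labels."""
    return accuracy(y_true, pred_a) - accuracy(y_true, pred_b)


# Toy labels for illustration only (not results from either library).
y_true = [1, 0, 1, 1]
pred_a = [1, 0, 0, 1]
pred_b = [1, 0, 1, 1]
print(accuracy_delta(y_true, pred_a, pred_b))  # -0.25
```

Since random forests are randomized, bit-identical predictions across two implementations are probably not a realistic expectation; averaging the delta over several seeds is likely a fairer comparison.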
Fair point. I actually made quite a bit of progress in the meantime and I'm hoping to provide more details soon. Naturally, I will be testing the model accuracy after applying my changes, so I hope that I can comment on #1090 as well.
Same issue. Training a many features
@smith558 we're finalizing checks on uxlfoundation/oneDAL#2292. Please check the next release for an update. If the problem persists, please post a reproducer. Thanks.
Please reopen if the issue persists.
@syakov-intel, when is the next release planned?
The fix from uxlfoundation/oneDAL#2292 is included in 2023.1.1. Feel free to try it out.
Describe the bug
RandomForestClassifier slower than original sklearn
To Reproduce
Steps to reproduce the behavior:
Before Intel One API acceleration, time elapsed is: 117.54324022123 seconds
After Intel One API acceleration, time elapsed is: 131.16063022613525 seconds
code:
In the main function, there are MLP, RFClassifier, and DecisionTree. Only RFClassifier is supported by Intel sklearnex, but it still ran much slower than with the original sklearn package. I would expect it to be at least as fast.
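To put the two reported timings in proportion, here is a small sketch of the arithmetic. The helper name `relative_change` is an assumption for illustration; the numbers are the ones quoted in this issue.

```python
def relative_change(before_s, after_s):
    """Relative change in elapsed time; positive means slower, negative means faster."""
    return (after_s - before_s) / before_s


# Timings reported in this issue: stock sklearn vs. with Intel acceleration.
slowdown = relative_change(117.54324022123, 131.16063022613525)
print(f"{slowdown:+.1%}")  # +11.6%
```

So the accelerated run in this report took roughly 12% longer than stock scikit-learn.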
Environment: