[ACL] Stateless feature impacts winograd convolution performance #2324

Open
alvoron opened this issue Dec 28, 2024 · 3 comments
Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 sighting Suspicious library behavior. Should be promoted to a bug when confirmed

Comments

@alvoron
Contributor

alvoron commented Dec 28, 2024

The ACL stateless feature, integrated into oneDNN in recent releases, degrades Winograd convolution performance.
The performance issue has been reproduced on Ampere and Apple M2 Pro.

Several benchdnn reproducers:

benchdnn --max-ms-per-prb=3e3 --mode=P --conv --reset --allow-enum-tags-only=0 --engine=cpu --dir=FWD_I --alg=WINO --dt=f32:f32:f32 --stag=acdb --wtag=any --dtag=acdb --attr-scratchpad=user mb1_ic512oc512_ih7oh7kh3sh1dh0ph1_iw6ow6kw3sw1dw0pw1

benchdnn --max-ms-per-prb=3e3 --mode=P --conv --reset --allow-enum-tags-only=0 --engine=cpu --dir=FWD_I --alg=WINO --dt=f32:f32:f32 --stag=acdb --wtag=any --dtag=acdb --attr-scratchpad=user mb1_ic256oc256_ih14oh14kh3sh1dh0ph1_iw12ow12kw3sw1dw0pw1

benchdnn --max-ms-per-prb=3e3 --mode=P --conv --reset --allow-enum-tags-only=0 --engine=cpu --dir=FWD_I --alg=WINO --dt=f32:f32:f32 --stag=acdb --wtag=any --dtag=acdb --attr-scratchpad=user mb1_ic128oc128_ih28oh28kh3sh1dh0ph1_iw24ow24kw3sw1dw0pw1
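
For readers comparing the three problem shapes, here is a minimal sketch that splits the convolution descriptors from the commands above into named fields. The helper name `parse_conv_descriptor` is hypothetical and not part of benchdnn; the sketch assumes each field is a lowercase name followed by an integer, which holds for the three descriptors shown here.

```python
import re


def parse_conv_descriptor(desc: str) -> dict:
    """Split a conv problem descriptor into {field_name: int} pairs.

    Hypothetical helper for illustration only; it is not part of benchdnn.
    """
    # Each field is a run of lowercase letters followed by a run of digits,
    # e.g. "ic512" -> ("ic", 512). Underscores only separate groups of fields.
    return {name: int(value) for name, value in re.findall(r"([a-z]+)(\d+)", desc)}


# The three descriptors from the reproducers above, copied verbatim.
descriptors = [
    "mb1_ic512oc512_ih7oh7kh3sh1dh0ph1_iw6ow6kw3sw1dw0pw1",
    "mb1_ic256oc256_ih14oh14kh3sh1dh0ph1_iw12ow12kw3sw1dw0pw1",
    "mb1_ic128oc128_ih28oh28kh3sh1dh0ph1_iw24ow24kw3sw1dw0pw1",
]

for desc in descriptors:
    p = parse_conv_descriptor(desc)
    print(
        f"ic={p['ic']} oc={p['oc']} "
        f"input={p['ih']}x{p['iw']} kernel={p['kh']}x{p['kw']}"
    )
```

All three cases are 3x3 kernels with batch size 1; the channel count halves while the spatial size doubles from one case to the next.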

Without the stateless feature, ACL takes 0.39 ms / 0.24 ms / 0.23 ms, respectively, on Apple M2 Pro.
With the stateless feature (vanilla ACL 24.11.1), ACL takes 17.79 ms / 4.06 ms / 1.14 ms, respectively, on Apple M2 Pro.
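
To put the timings above in relative terms, the following short sketch computes the slowdown factor for each reproducer. The numbers are the ones reported above; the variable names and case labels are only illustrative.

```python
# Timings in milliseconds as reported above (Apple M2 Pro), in reproducer order.
without_stateless_ms = [0.39, 0.24, 0.23]
with_stateless_ms = [17.79, 4.06, 1.14]

# Illustrative labels for the three reproducers (channel count of each case).
cases = ["ic512", "ic256", "ic128"]

# Slowdown = time with the stateless feature / time without it.
slowdowns = [
    slow / fast for slow, fast in zip(with_stateless_ms, without_stateless_ms)
]

for case, factor in zip(cases, slowdowns):
    print(f"{case}: {factor:.1f}x slower")
```

The regression is largest for the case with the most channels and the smallest spatial size (roughly 46x), and shrinks to about 17x and 5x for the other two.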

To get ACL without the stateless feature, the following commits were reverted:

4d962e744 fix, build, docs: aarch64: mutex lock if the ACL kernel is not stateless
3b8f5cd35 cpu: aarch64: hot fix for aux tensor management of stateless gemm-conv and winograd conv without lock.
c5c10ad1c cpu: aarch64: hot fix for aux tensor management of  stateless gemm and winograd conv
6ff1ab13a cpu: aarch64: hot fix for segfault in cached winograd gradient convolution primitive
2cb58d7db cpu: aarch64: remove calls to acl_post_ops_t::create_resource
608d92fa4 cpu: aarch64 make binary ops use stateless ACL interface
16d6dd4fd cpu: aarch64: Enable stateless ACL depthwise convolution
d5a7e3055 cpu: aarch64: Upgrade matmul to use ACL stateless API
03db3e4ac cpu: aarch64: Call stateless ACL API from winograd convolution
7801ed18e cpu: aarch64: Enable ACL stateless API for gemm conv
2cfff2be2 cpu: aarch64: Enable ACL stateless API for indirect conv
@alvoron alvoron added the sighting Suspicious library behavior. Should be promoted to a bug when confirmed label Dec 28, 2024
@theComputeKid theComputeKid added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Dec 28, 2024
@theComputeKid
Member

This might be a known problem because we put in a workaround for winograd.

@alvoron
Contributor Author

alvoron commented Dec 30, 2024

@theComputeKid I checked the latest main (281d20d) and got the same numbers (17.8 / 4.06 / 1.13 ms).

@theComputeKid
Member

@alvoron Sorry, I only half typed my thoughts. What I meant was: when we converted the conv to stateless, we realised that segfaults were happening in ACL due to thread-safety issues. While we work on fixing those in ACL, we put a workaround into oneDNN where we reinit the object every time, and this workaround for the segfault is probably what is causing the performance issues.
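
The effect described in the comment above can be illustrated with a toy model. This is only a sketch of the general "configure once" versus "reinit every time" pattern; `ToyKernel`, `execute_cached`, and `execute_reinit` are hypothetical names and do not correspond to actual oneDNN or ACL code.

```python
class ToyKernel:
    """Hypothetical stand-in for a kernel object with a one-time setup step."""

    def __init__(self):
        self.configure_calls = 0

    def configure(self):
        # In a real kernel this is where setup work would happen;
        # here we only count how often it is invoked.
        self.configure_calls += 1

    def run(self):
        # Placeholder for the actual computation.
        pass


def execute_cached(num_executions: int) -> int:
    """Configure once, then run many times. Returns the setup count."""
    kernel = ToyKernel()
    kernel.configure()
    for _ in range(num_executions):
        kernel.run()
    return kernel.configure_calls


def execute_reinit(num_executions: int) -> int:
    """Create and configure a fresh object on every execution."""
    total_configure_calls = 0
    for _ in range(num_executions):
        kernel = ToyKernel()
        kernel.configure()
        kernel.run()
        total_configure_calls += kernel.configure_calls
    return total_configure_calls


print("setup calls, configured once:", execute_cached(1000))
print("setup calls, reinit each time:", execute_reinit(1000))
```

In this model the setup cost is paid once in the first pattern and on every execution in the second, so any expensive setup step ends up on the hot path.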
