In upgrading from Ray 2.24 to 2.39, I am running into:
ValueError: Failed to look up actor with name 'actor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
occasionally when calling Actor.options(get_or_create=True, max_restarts=-1, ...).remote(). This did not happen in 2.24. This only happens when max_restarts is non-zero.
I've managed to reproduce it, but the repro script does not exactly match our code, so I'm not 100% sure that fixing the repro script will also fix our code. I have not yet figured out a workaround for our actual code.
Versions / Dependencies
Python 3.11.9
Ray 2.37, 2.38 and 2.39 exhibit this behavior. 2.24 and 2.36 do not. This seems to have been introduced in Ray 2.37.
jfaust-fy added the bug and triage labels on Nov 22, 2024.
jfaust-fy changed the title from "[Core] Repeated calls to create an Actor with get_or_create=True and max_restarts != 1 can fail" to "[Core] Repeated calls to create an Actor with get_or_create=True and max_restarts != 0 can fail" on Nov 22, 2024.
What is your expected behavior for the script below? Should only the first Actor.options call create the actor, while all the following 999 calls to Actor.options retrieve the existing actor?
The expected behavior is somewhat difficult to define because there is no reference to the actor. As a result, the reference count will be 0, leading to the actor being destroyed.
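The script under discussion is not preserved in this thread. A minimal sketch of the pattern described above, assuming a trivial actor class and the name "actor" taken from the error message, might look like the following. The helper names `count_lookup_failures` and `run_with_ray` are hypothetical, and this is not the reporter's actual repro script.

```python
from typing import Callable


def count_lookup_failures(create: Callable[[], object], iterations: int = 1000) -> int:
    """Call `create` repeatedly, dropping each handle, and count ValueErrors."""
    failures = 0
    for _ in range(iterations):
        try:
            handle = create()
        except ValueError:
            # The failure reported in this issue surfaces as a ValueError.
            failures += 1
            continue
        # Drop the only reference, so the actor's reference count goes to 0.
        del handle
    return failures


def run_with_ray() -> int:
    """Run the loop against a real Ray cluster (requires Ray to be installed)."""
    import ray  # imported lazily so the helper above can be used without Ray

    ray.init()

    @ray.remote
    class Actor:  # hypothetical stand-in for the real actor class
        pass

    return count_lookup_failures(
        lambda: Actor.options(
            name="actor", get_or_create=True, max_restarts=-1
        ).remote()
    )
```

Calling `run_with_ray()` on an affected version would be expected to report a non-zero count only occasionally, since the failure is intermittent.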
@kevin85421 like I said, this is not exactly our use case. In our case we do have a reference to the Actor, but the reference goes away and is then re-created. We're seeing this in tests, where a test is fine when run on its own but fails when run right after a previous test: the previous test creates the Actor, uses it, and then all references disappear, and we get this exception when the subsequent test tries to create the Actor.
So I don't really have a good answer about what this test case should do - I guess I'd expect it to either create or retrieve the Actor in each iteration, depending on how async actor destruction is, and not to throw an exception.
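If the failure is a race between asynchronous actor destruction and the next get_or_create call (an assumption, not something confirmed in this thread), one possible stopgap is to retry the call a few times. The sketch below is untested against Ray itself; `get_or_create_with_retry`, `attempts`, and `delay_s` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get_or_create_with_retry(
    create: Callable[[], T], attempts: int = 5, delay_s: float = 0.1
) -> T:
    """Call `create`, retrying on ValueError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[ValueError] = None
    for attempt in range(attempts):
        try:
            return create()
        except ValueError as err:
            # Note: this also retries on unrelated ValueErrors from `create`.
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_s)
    assert last_error is not None
    raise last_error
```

With Ray, `create` could be something like `lambda: Actor.options(name="actor", get_or_create=True, max_restarts=-1).remote()`. This only papers over the symptom and does not address whatever changed in 2.37.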
jjyao added the P1 label and removed the triage label on Nov 25, 2024.
Reproduction script
Issue Severity
High: It blocks me from completing my task.