RSDK-9591 - Kill all lingering module process before exiting #4657

cheukt · 2024-12-27T16:13:22Z

This is part two of two PRs that will hopefully help with shutting down all module processes before viam-server exits. Part one is here

This is still a draft as I'm looking for thoughts and ideas around making this better.

Before doing this, I looked into assigning module processes to the same process group as viam-server and just killing the process group. However, each module and process is already assigned to its own unique process group, and we rely on that property to kill each module and process separately when necessary. Changing that behavior would be risky, so I did not pursue that path further.
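For context, a minimal Unix-only sketch of the per-process-group pattern described above. The helper names `startInOwnGroup` and `killGroup` are hypothetical and are not the actual goutils or rdk code; the sketch only assumes the standard `os/exec` and `syscall` packages.

```go
package main

import (
	"os/exec"
	"syscall"
)

// startInOwnGroup (hypothetical helper) starts a child as the leader of a
// new process group, so the group ID equals the child's PID. Unix only.
func startInOwnGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// killGroup (hypothetical helper) sends SIGKILL to every process in the
// child's group. A negative PID addresses the whole group, so other
// groups, including the server's own, are left alone.
func killGroup(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}
```

Because each child leads its own group, one module and its descendants can be killed without touching any other module. Moving everything into the server's group would give that up.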

We could kill each process in the mod manager directly using the exposed unixpid, but I figured we could do it within each managed process instead, so that we get support on Windows as well. It does mean I added Kill() to a few interfaces, but it will hopefully be extensible in case anything else needs killing.
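A rough sketch of how a Kill() call could propagate through interfaces, under stated assumptions: `Killer`, `KillFunc`, `managedProcess`, and `moduleManager` are hypothetical names for illustration, not the real types in goutils or rdk. The only library call relied on is `os.Process.Kill`, which works on both Unix and Windows.

```go
package main

import (
	"os/exec"
	"sync"
)

// Killer is a hypothetical interface for anything that owns an OS process.
type Killer interface {
	Kill()
}

// KillFunc adapts a plain function to Killer (handy for fakes in tests).
type KillFunc func()

func (f KillFunc) Kill() { f() }

// managedProcess is a hypothetical wrapper around a started child process.
type managedProcess struct {
	mu  sync.Mutex
	cmd *exec.Cmd
}

// Kill terminates the child without waiting for it to exit.
// It is a no-op if the process was never started.
func (p *managedProcess) Kill() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cmd == nil || p.cmd.Process == nil {
		return
	}
	// The error is ignored: the process may already have exited.
	_ = p.cmd.Process.Kill()
}

// moduleManager is a hypothetical owner of several Killers.
type moduleManager struct {
	mu      sync.Mutex
	modules map[string]Killer
}

// Kill forwards the call to every module, one after another.
func (m *moduleManager) Kill() {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, mod := range m.modules {
		mod.Kill()
	}
}
```

Note that `os.Process.Kill` only targets the single process, not its children, so a real implementation on Unix may still want to signal the process group.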

The idea is for a Kill() call to propagate from viam-server at the end of the 90s, and it should not block on anything if possible. Kill() does not care about the resource graph, only that we kill the processes and module processes spawned by the server. I did not do the killing in parallel, since the calls will not block. I can see things racing with Close(), but I think the mitigation is to make sure that kill/close are idempotent and do not panic if they overlap. This Kill() call happens in the same goroutine that eventually calls log.Fatal. Is that good enough for now, or should we create a different goroutine so that we can guarantee that viam-server exits by the 90s mark?
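To make the two open questions concrete, here is a minimal sketch, not the code in this PR. It assumes a hypothetical `proc` type whose `stopFn` does the actual termination, and a hypothetical `runWithDeadline` helper. The first part shows one way to make overlapping Kill()/Close() calls idempotent without blocking; the second shows running the kill in a separate goroutine so the caller can move on once a deadline passes.

```go
package main

import (
	"sync/atomic"
	"time"
)

// proc is a hypothetical stand-in for a managed process.
type proc struct {
	stopped atomic.Bool
	stopFn  func() // whatever actually terminates the OS process
}

// stop runs stopFn at most once. A second caller returns immediately
// instead of waiting for the first one to finish.
func (p *proc) stop() {
	if p.stopFn == nil || !p.stopped.CompareAndSwap(false, true) {
		return
	}
	p.stopFn()
}

// Kill and Close share the same guard, so they can overlap safely.
func (p *proc) Kill() { p.stop() }

func (p *proc) Close() error {
	p.stop()
	return nil
}

// runWithDeadline runs fn in its own goroutine and reports whether it
// finished before d elapsed. If it did not, fn keeps running in the
// background and the caller is free to exit anyway.
func runWithDeadline(fn func(), d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		fn()
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}
```

With something like this, the shutdown path could call `runWithDeadline(manager.Kill, someBudget)` and then exit regardless of the result. The trade-off of the compare-and-swap guard is that a late caller returns before the first caller's termination has necessarily completed.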

Ideas for testing? I've tested with a Python module and observed that the module process does get killed, but it would be good to test on setups where lingering module processes are actually happening.

kill

9e09821

cheukt requested review from dgottlieb and benjirewis December 27, 2024 16:13

viambot added the safe to test label Dec 27, 2024

cheukt mentioned this pull request Dec 27, 2024

RSDK-9591 - Add Kill to ManagedProcess viamrobotics/goutils#399

Draft
