Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[server] Method of handling websocket connections leaks goroutines #271

Open
crobert-1 opened this issue Apr 17, 2024 · 1 comment
Open

Comments

@crobert-1
Copy link
Member

Context

Hello, this was found as a result of investigation for open-telemetry/opentelemetry-collector-contrib#30931. Enabling goleak in the server directory of opamp-go shows the TestServerCallsHTTPMiddlewareOverWebsocket test is leaking. Here's the call stack for the goleak failure:

Stack trace

=== RUN   TestServerCallsHTTPMiddlewareOverWebsocket
--- PASS: TestServerCallsHTTPMiddlewareOverWebsocket (0.01s)
PASS
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 23 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
internal/poll.runtime_pollWait(0x4888de10, 0x72)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc0001e2780?, 0xc000236000?, 0x0)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0001e2780, {0xc000236000, 0x1000, 0x1000})
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:164 +0x27a
net.(*netFD).Read(0xc0001e2780, {0xc000236000?, 0x0?, 0x0?})
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/net/fd_posix.go:55 +0x25
net.(*conn).Read(0xc0001940c0, {0xc000236000?, 0x10?, 0x10?})
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/net/net.go:179 +0x45
bufio.(*Reader).fill(0xc0001a8300)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:110 +0x103
bufio.(*Reader).Peek(0xc0001a8300, 0x2)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:148 +0x53
github.com/gorilla/websocket.(*Conn).read(0xc0001dc2c0, 0x0?)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:378 +0x26
github.com/gorilla/websocket.(*Conn).advanceFrame(0xc0001dc2c0)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:824 +0x6d
github.com/gorilla/websocket.(*Conn).NextReader(0xc0001dc2c0)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:1034 +0x13e
github.com/gorilla/websocket.(*Conn).ReadMessage(0x193d0a0?)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:1120 +0x13
github.com/open-telemetry/opamp-go/server.(*server).handleWSConnection(0xc0001f00c0, {0x197e350, 0xc00028e0a0}, 0xc0001dc2c0, {0x197d918, 0xc0002b2018})
	/Users/crobert/dev/opamp-go/server/serverimpl.go:240 +0x1a5
created by github.com/open-telemetry/opamp-go/server.(*server).httpHandler in goroutine 22
	/Users/crobert/dev/opamp-go/server/serverimpl.go:210 +0x350
]

This shows the test itself is passing, but goleak detects leaked goroutines.

Analysis

I found the culprit to be this section:

// Block until the next message can be read.
mt, msgBytes, err := wsConn.ReadMessage()

As shown by the comment, this line will block until the connection can read a message or until it's closed. However, the close call is in the same goroutine, within a defer statement. This means the websocket will never be closed, and the goroutine is leaked.

The solution would be to have the wsConn.Close method be called when the reqCtx is cancelled, from what I can tell. This will allow the read call to stop blocking and properly finish running. Note that I don't have much opamp experience, so the end solution may be quite different than my suggestion.

@tttoad
Copy link

tttoad commented Dec 26, 2024

@crobert-1
I think the issue is here.

func (s *server) Stop(ctx context.Context) error {
if s.httpServer != nil {
defer func() { s.httpServer = nil }()
// This stops accepting new connections. TODO: close existing
// connections and wait them to be terminated.
return s.httpServer.Shutdown(ctx)
}
return nil
}

httpServer.Shutdown is called when server stops, but it doesn't close all connections, which will cause wsConn.ReadMessage() to fail to read an poll.ErrNetClosing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants