Envoy core changes for reverse connections #37368

Open: wants to merge 6 commits into main from reverse_conn_envoy_core

@basundhara-c commented Nov 26, 2024

Commit Message: This commit collates the Envoy core changes for reverse connections, described in this GitHub issue. A detailed description of reverse connection concepts and workflows is provided both in the GitHub issue and in the examples section.

Additional Description: This PR involves several working components, which are added as part of the following extensions:

Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: Basundhara Chakrabarty [email protected]
Co-authored-by: Arun Vasudevan [email protected]
Co-authored-by: Tejas Sangol [email protected]
Co-authored-by: Aditya Jaltade [email protected]

Hi @basundhara-c, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #37368 was opened by basundhara-c.

@basundhara-c force-pushed the reverse_conn_envoy_core branch from 00a138f to e8d32a5 November 26, 2024 20:57
Commit Message: This commit collates the Envoy core changes for reverse connections.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]

Signed-off-by: Basundhara Chakrabarty <[email protected]>
Co-authored-by: Arun Vasudevan <[email protected]>
Co-authored-by: Tejas Sangol <[email protected]>
Co-authored-by: Aditya Jaltade <[email protected]>
Contributor

@alyssawilk left a comment

I'm really glad you got this working! I've tossed in a few high-level comments, but I'm going to assign a first-pass reviewer for the rest.

/**
* @return the cluster manager pointer.
*/
virtual Upstream::ClusterManager* getClusterManager() PURE;
Contributor

I don't think cluster manager APIs belong in the dispatcher. It's worth finding a cleaner way to get the cluster manager you need.

Member

@basundhara-c, could you pass the singleton ClusterManager() from the server when creating the RCThreadLocalRegistry in your extension, so that the RCmanager has access to the thread_local_cluster() API and we don't need to use worker_.dispatcher().set/getClusterManager()?
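A minimal sketch of that suggestion, assuming the extension builds each worker's registry itself: the cluster manager is handed to the manager once, at construction, instead of being stored on and fetched from the dispatcher. ClusterManagerStub, ReverseConnectionManagerSketch, and makeRegistryManager are hypothetical stand-ins, not Envoy's Upstream::ClusterManager or the PR's classes.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for Upstream::ClusterManager; it only models a
// "does this cluster exist" lookup. Not Envoy's real interface.
class ClusterManagerStub {
public:
  void addCluster(const std::string& name) { clusters_.insert(name); }
  bool hasCluster(const std::string& name) const { return clusters_.count(name) > 0; }

private:
  std::set<std::string> clusters_;
};

// Hypothetical per-worker manager. The cluster manager is injected once at
// construction, so no Dispatcher::set/getClusterManager() pair is needed.
class ReverseConnectionManagerSketch {
public:
  explicit ReverseConnectionManagerSketch(ClusterManagerStub& cluster_manager)
      : cluster_manager_(cluster_manager) {}

  // Returns true if a reverse connection to `cluster` could be initiated.
  bool canInitiate(const std::string& cluster) const {
    assert(!cluster.empty() && "cluster name must be set");
    return cluster_manager_.hasCluster(cluster);
  }

private:
  // Owned by the server; assumed to outlive every worker.
  ClusterManagerStub& cluster_manager_;
};

// Hypothetical hook the extension would call while building each worker's
// thread-local registry, passing the server's singleton cluster manager.
std::unique_ptr<ReverseConnectionManagerSketch> makeRegistryManager(ClusterManagerStub& cm) {
  return std::make_unique<ReverseConnectionManagerSketch>(cm);
}
```

Because the reference is captured when the registry is built, code on the worker reaches the cluster manager through the manager it already holds, and the dispatcher interface stays unchanged.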

/**
* Provides filters access to the connection handler to save outgoing connections as
* incoming connections for reverse tunnels.
*/
virtual void setConnectionHandler(Network::ConnectionHandler* connection_handler) PURE;
Contributor

I don't see what this function has to do with the dispatcher - did you stash this here as a convenient way to get the handler local to your thread? I think you want to look into the thread local state (tls) getters and setters.
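The pattern being suggested is per-worker state fetched from a thread-local slot rather than stashed on the dispatcher. The sketch below is a simplified, self-contained analog of that idea, not Envoy's ThreadLocal API: ThreadLocalSlotSketch and WorkerHandlerState are hypothetical, and each thread that calls get() lazily receives its own instance from a factory.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical, simplified analog of a thread-local slot. Not Envoy's
// ThreadLocal API: each thread that calls get() lazily receives its own
// instance from the factory, keyed by std::thread::id. Assumes thread ids are
// not recycled while the slot is in use.
template <class T> class ThreadLocalSlotSketch {
public:
  explicit ThreadLocalSlotSketch(std::function<std::shared_ptr<T>()> factory)
      : factory_(std::move(factory)) {}

  // Returns the calling thread's instance, creating it on first use.
  std::shared_ptr<T> get() {
    std::lock_guard<std::mutex> lock(mu_);
    std::shared_ptr<T>& entry = per_thread_[std::this_thread::get_id()];
    if (!entry) {
      entry = factory_();
    }
    assert(entry != nullptr && "Thread local registry should not be null.");
    return entry;
  }

  // Number of threads that have touched the slot so far.
  std::size_t instanceCount() {
    std::lock_guard<std::mutex> lock(mu_);
    return per_thread_.size();
  }

private:
  std::function<std::shared_ptr<T>()> factory_;
  std::mutex mu_;
  std::map<std::thread::id, std::shared_ptr<T>> per_thread_;
};

// Hypothetical per-worker state that would otherwise be stashed on the
// dispatcher via setConnectionHandler().
struct WorkerHandlerState {
  int saved_connections = 0;
};
```

In Envoy itself a slot is typically allocated through the server's thread-local instance and initialized on every worker through its set() callback; the lazy, mutex-guarded map here only keeps the example short.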

@@ -19,6 +21,26 @@
namespace Envoy {
namespace Network {

// The thread local registry.
class LocalRevConnRegistry {
Contributor

I suspect you should be able to do this PR with minimal APIs in core Envoy, putting most of the code in your extension directory instead.

Author

@alyssawilk The main set of changes supports creating the ReverseConnectionManager and ReverseConnectionHandler thread-locally, using a thread-local registry. The ReverseConnectionManager and ReverseConnectionHandler need to come up right after the workers are created, because they accept reverse connections from other initiating Envoy instances. This is needed whenever the local Envoy has the reverse connections bootstrap extension enabled, even if the local Envoy is not initiating any reverse connections itself.

So, currently, we check whether the reverse connection bootstrap extension is enabled by checking whether its singleton is present. If it is, then right after the workers are created, we post a functor to each worker's dispatcher that creates the thread-local registries (creating the ReverseConnectionManager and ReverseConnectionHandler). This thread-local registry is then parked with the connection handler and accessed later wherever we need the ReverseConnectionManager and ReverseConnectionHandler: here, to initiate reverse conns; here, in the reverse conn filter, which accepts reverse conns; and in multiple other places where only the thread-local dispatcher is otherwise available to us. Therefore, I have defined abstract classes for LocalRevConnRegistry, RevConnRegistry, etc., so that these two entities can be stored in the form of a ThreadLocalRegistry and accessed later. We briefly touched upon this in this comment and on Slack. In summary:

  1. We have created two thread local entities which get initialized right after the workers are created. They need to be parked somewhere so that they can be accessed later in code paths where only the thread local dispatcher is available. We have parked them with the thread local connection handler and therefore added the abstract classes under Envoy::Network.
  2. If the above is not the best approach, what would you suggest?
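The startup sequence in point 1 (create the workers, then post a functor to each worker's dispatcher that builds that worker's registry) can be modeled with plain threads. WorkerLoopSketch is a hypothetical stand-in for a worker's event loop and its post(), and RevConnRegistrySketch stands in for the object that would own the ReverseConnectionManager and ReverseConnectionHandler; none of these are Envoy types.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the object owning a worker's reverse conn state.
struct RevConnRegistrySketch {
  std::thread::id created_on;
};

// Hypothetical stand-in for a worker event loop: one thread draining a queue
// of posted callbacks, loosely modeled on Dispatcher::post(). Not Envoy code.
class WorkerLoopSketch {
public:
  WorkerLoopSketch() : thread_([this] { run(); }) {}
  ~WorkerLoopSketch() {
    post(nullptr); // An empty callback acts as the stop sentinel.
    thread_.join();
  }

  void post(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(std::move(cb));
    }
    cv_.notify_one();
  }

  std::thread::id threadId() const { return thread_.get_id(); }

  // Written only from this worker's own thread, inside a posted callback.
  std::unique_ptr<RevConnRegistrySketch> registry;

private:
  void run() {
    for (;;) {
      std::function<void()> cb;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        cb = std::move(queue_.front());
        queue_.pop();
      }
      if (!cb) {
        return;
      }
      cb();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::thread thread_; // Declared last so the queue exists before run() starts.
};

// After the workers exist, post one functor per worker that builds that
// worker's registry on the worker thread, then wait until all have run.
void createRegistriesOnWorkers(std::vector<std::unique_ptr<WorkerLoopSketch>>& workers) {
  std::vector<std::future<void>> done;
  for (auto& worker : workers) {
    auto created = std::make_shared<std::promise<void>>();
    done.push_back(created->get_future());
    WorkerLoopSketch* w = worker.get();
    w->post([w, created] {
      w->registry = std::make_unique<RevConnRegistrySketch>(
          RevConnRegistrySketch{std::this_thread::get_id()});
      created->set_value();
    });
  }
  for (auto& f : done) {
    f.wait();
  }
  for (auto& worker : workers) {
    assert(worker->registry != nullptr);
  }
}
```

Waiting on the futures is a choice of this sketch: it guarantees every worker's registry exists before the caller moves on to accepting traffic.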

Member

> This thread local registry is then parked with the Connection Handler and is accessed later wherever we need access to the ReverseConnectionManager and ReverseConnectionHandler like here,to initiate reverse conns, here in the reverse conn filter, which accepts reverse conns and in multiple other places where only the thread local dispatcher is otherwise available to us

For filters, we can get the slot from serverFactoryContext via the Server::Configuration::FactoryContext& context during filter chain creation. Could you see whether this lets us avoid the setConnectionHandler API changes?
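A sketch of that suggestion, assuming the filter config factory can obtain a handle to the slot from the server context at config time: the factory captures the handle, and each filter resolves it when the filter chain is created on a worker. ServerContextSketch, handler_slot, ReverseConnHandlerSketch, and ReverseConnFilterSketch are hypothetical; in Envoy the analogous access would go through FactoryContext::serverFactoryContext().

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-worker handler state reachable through a slot.
struct ReverseConnHandlerSketch {
  std::vector<std::string> accepted_nodes;
};

// Hypothetical stand-in for the server-wide context seen at config time.
// `handler_slot` returns the calling worker's handler (an assumption).
struct ServerContextSketch {
  std::function<std::shared_ptr<ReverseConnHandlerSketch>()> handler_slot;
};

// Hypothetical filter: receives the handler at construction instead of
// asking the dispatcher or connection handler for it.
class ReverseConnFilterSketch {
public:
  explicit ReverseConnFilterSketch(std::shared_ptr<ReverseConnHandlerSketch> handler)
      : handler_(std::move(handler)) {
    assert(handler_ != nullptr && "Thread local reverse conn handler should not be null.");
  }

  // Called when a reverse connection initiation request is accepted.
  void onReverseConnRequest(const std::string& node_id) {
    handler_->accepted_nodes.push_back(node_id);
  }

private:
  std::shared_ptr<ReverseConnHandlerSketch> handler_;
};

using FilterFactoryCb = std::function<std::unique_ptr<ReverseConnFilterSketch>()>;

// Config time: capture the slot accessor once. Filter chain creation time
// (on a worker): resolve it to that worker's handler.
FilterFactoryCb createFilterFactory(const ServerContextSketch& context) {
  auto slot = context.handler_slot;
  return [slot] { return std::make_unique<ReverseConnFilterSketch>(slot()); };
}
```

Since the handler arrives through the filter's constructor, neither the dispatcher nor the connection handler needs a setConnectionHandler-style setter.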

Member

@botengyao left a comment

Thanks @basundhara-c, very interesting feature!
Here is a first pass to kick off the process.

"Thread local reverse conn registry should not be null.");
}

void ConnectionHandlerImpl::saveUpstreamConnection(Network::ConnectionSocketPtr&& upstream_socket,
Member

@botengyao Dec 4, 2024

We want to store the upstream socket in a dedicated REVERSE_CLUSTER cluster so that it can be reused when the public Envoy wants to establish connections to the on-prem one. And we also want to save the upstream connection here just for PING and other handshake processes.

The request flow is like:

  1. on-prem Envoy -> public Envoy
  2. Then the public Envoy stores the downstream socket in the REVERSE_CLUSTER,
  3. and also initializes the listener to do RPING.
  4. Then the service behind the public Envoy can send a request -> normal listener -> filter chain / router -> the stored socket in REVERSE_CLUSTER.

Am I understanding correctly?

Author

@botengyao yes, most of the above is correct. The request flow is as follows:

On-prem Envoy -> cloud Envoy

  1. The on-prem Envoy initiates reverse connections: it creates HTTP connections to the cloud Envoy and sends a reverse connection initiation request over them.
  2. This request is intercepted by the reverse conn HTTP filter, and the cloud Envoy stores the sockets with the ReverseConnectionHandler. The ReverseConnectionHandler periodically sends RPINGs over all such cached sockets.
  3. On the cloud Envoy, a REVERSE_CONNECTION cluster type is defined and used for all requests that need to be sent over a reverse connection. When a request arrives and this REVERSE_CONNECTION cluster is picked for the route, we interface with the ReverseConnectionHandler described above, get a cached socket, and send the request over it.
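Steps 2 and 3 can be modeled as a per-node cache of idle sockets: the filter adds sockets, the keepalive walks them, and the REVERSE_CONNECTION cluster takes one. This is a hypothetical sketch (ReverseConnHandlerCacheSketch is not the PR's ReverseConnectionHandler), sockets are plain integer IDs, and it assumes each cached socket is handed to at most one upstream connection.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of the cloud-side socket cache; sockets are integer IDs.
class ReverseConnHandlerCacheSketch {
public:
  // Step 2: the reverse conn HTTP filter accepted an initiation request from
  // `node_id`; keep the underlying socket instead of closing it.
  void cacheSocket(const std::string& node_id, int socket_id) {
    assert(socket_id >= 0 && "invalid socket");
    idle_[node_id].push_back(socket_id);
  }

  // Periodic keepalive: in the PR this is an RPING on each cached socket.
  // Here it only counts how many sockets would be pinged.
  std::size_t pingAll() const {
    std::size_t n = 0;
    for (const auto& entry : idle_) {
      n += entry.second.size();
    }
    return n;
  }

  // Step 3: a request routed to the REVERSE_CONNECTION cluster for `node_id`
  // takes the oldest cached socket, if one exists.
  std::optional<int> takeSocket(const std::string& node_id) {
    auto it = idle_.find(node_id);
    if (it == idle_.end() || it->second.empty()) {
      return std::nullopt;
    }
    int socket_id = it->second.front();
    it->second.pop_front();
    return socket_id;
  }

private:
  std::map<std::string, std::deque<int>> idle_;
};
```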

@@ -263,6 +278,10 @@ void ConnectionImpl::setDetectedCloseType(DetectedCloseType close_type) {
}

void ConnectionImpl::closeSocket(ConnectionEvent close_type) {
if (connection_reused_ || !ConnectionImpl::ioHandle().isOpen()) {
Member

If connection_reused_ is always true, how do we close the socket?
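One way to read the question: skipping the close in closeSocket() is only safe if connection_reused_ is set at the moment ownership of the socket moves elsewhere, and the new owner then becomes responsible for closing it. The sketch below models that contract with hypothetical types (FdSketch, ConnectionSketch, releaseForReuse); it is not the PR's ConnectionImpl.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical descriptor wrapper: closing is recorded, not performed.
struct FdSketch {
  int fd;
  bool closed = false;
};

class ConnectionSketch {
public:
  explicit ConnectionSketch(std::shared_ptr<FdSketch> fd) : fd_(std::move(fd)) {}

  // Hands the socket to a new owner (for example a reverse connection cache).
  // The reused flag is set only here, at the moment ownership moves.
  std::shared_ptr<FdSketch> releaseForReuse() {
    assert(fd_ != nullptr && "socket already released");
    connection_reused_ = true;
    return std::exchange(fd_, nullptr);
  }

  // Mirrors the shape of the guard under review: skip closing when the socket
  // was handed off or is already closed; otherwise close it here.
  void closeSocket() {
    if (connection_reused_ || fd_ == nullptr || fd_->closed) {
      return;
    }
    fd_->closed = true;
  }

private:
  std::shared_ptr<FdSketch> fd_;
  bool connection_reused_ = false;
};
```

Under this contract, a connection that never hands its socket off still closes it normally, and a handed-off socket is closed later by whoever holds the returned pointer.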

@alyssawilk added and removed the waiting label Dec 5, 2024
@basundhara-c
Author

@botengyao thanks a lot for the suggestion on obtaining the slot from the context! I am trying that out along with an attempt to move the code in extensions to contrib as much as possible and will be sharing the changes shortly!

…er to the RCManager

2. Deleting unwanted APIs and ReverseConnectionManager and Handler header files

Signed-off-by: Basundhara Chakrabarty <[email protected]>