Support default values in placeholders #5275

Open
wants to merge 10 commits into
base: master

Conversation

JensErat

Support variable expansion also in placeholders, like in {unsetPlaceholder:default value}. Implemented similarly to #3682, which provided defaults for environment variable expansion.

This PR seems to work fine already, but two or three minor points should be discussed (see the TODO in the commit).

  • {"json"\: "object"} : is this a case we need to handle? the behavior seems unspecified to me right now.
  • {"json": "object"}: defaulting to "object" seems appropriate to me here, can you confirm?
  • {"json": "object"\}: to me, this looks like a parser bug and should never have gone into the placeholder substitution code?

Closes #1793.

Jens Erat [email protected], Mercedes-Benz Tech Innovation GmbH, imprint

@CLAassistant

CLAassistant commented Dec 29, 2022

CLA assistant check
All committers have signed the CLA.

@francislavoie
Member

Oooh, hey! So I assume Mercedes-Benz is using Caddy? Awesome! We'd love to hear what it's helping you solve, if that's something you can share.

Thanks for working on this, it's definitely a tricky one.

The JSON issue is definitely valid. Users do write JSON strings in their configs occasionally. We have the backtick-delimited token syntax in the Caddyfile to make this easier, for example:

header Content-Type application/json
respond 200 `{"some": "json"}`

I'm thinking that a quick-and-dirty workaround for the replacer would be to reject replacing the default if the key (left side of :) contains a " anywhere. I don't expect that any valid placeholder name would contain a double-quote, but I think all valid JSON with a colon would need to use one.

We do have some placeholders that take an "argument", like {query.*} where you could index into the URL's query; I think it's very unlikely that a double-quote would be used for a URL query key (I think it's invalid in that particular case, would need to be URL encoded) with a default placeholder value. There might be other kinds of placeholders where that could make sense to have double quotes, but I can't think of any right now.
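
For illustration, the double-quote workaround could look roughly like the sketch below. `splitDefault` is a hypothetical helper, not the code from this PR; it assumes the text between the braces has already been extracted.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDefault is a hypothetical helper (not Caddy's actual code). It splits
// the text between braces into a key and an optional default at the first
// colon, but refuses to split when the key part contains a double quote, so
// JSON-like text such as {"json": "object"} is never treated as key:default.
func splitDefault(inner string) (key, def string, hasDefault bool) {
	k, d, found := strings.Cut(inner, ":")
	if !found || strings.Contains(k, `"`) {
		return inner, "", false
	}
	return k, d, true
}

func main() {
	inputs := []string{
		`unsetPlaceholder:default value`,
		`"json": "object"`,
		`http.request.uri`,
	}
	for _, in := range inputs {
		key, def, ok := splitDefault(in)
		fmt.Printf("%q -> key=%q default=%q hasDefault=%v\n", in, key, def, ok)
	}
}
```

Only the part left of the first colon is checked, so a default value may still contain quotes.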

@francislavoie francislavoie added the feature ⚙️ New feature or request label Dec 29, 2022
@francislavoie francislavoie added this to the v2.7.0 milestone Dec 29, 2022
@mholt
Member

mholt commented Dec 29, 2022

Oh, this is great -- thank you!

While I echo what Francis said, I want to add my 2 cents real quick.

So yeah, JSON in Caddy configs is somewhat common, and as Francis mentioned, the backtick string delimiters can be used to make this easier since it interprets those quotes within literally.

Additionally, it depends how the placeholders are evaluated in a particular context or place within the config. Many places use ReplaceAll(), which replaces all instances of {...} with a default/empty value if the placeholder is not known. We try to do this only in places where JSON is not expected or would not make sense. Some placeholders are evaluated using ReplaceKnown(), which leaves the "placeholder" untouched if the key is not recognized. This is useful for JSON which typically does not clash with real placeholder keys.
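
To make the difference concrete, here is a toy model of the two modes. It is not Caddy's replacer: the `replace` function, its `keepUnknown` flag and the `vars` map are made up for this sketch, and it ignores escapes and defaults entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// replace is a toy stand-in for the two behaviours described above.
// With keepUnknown=false it acts like a "replace all" mode (unknown
// placeholders become empty); with keepUnknown=true it acts like a
// "replace known" mode (unknown placeholders are left untouched).
func replace(s string, vars map[string]string, keepUnknown bool) string {
	var sb strings.Builder
	for len(s) > 0 {
		open := strings.IndexByte(s, '{')
		if open < 0 {
			break
		}
		end := strings.IndexByte(s[open:], '}')
		if end < 0 {
			break
		}
		end += open
		sb.WriteString(s[:open])
		key := s[open+1 : end]
		if v, ok := vars[key]; ok {
			sb.WriteString(v)
		} else if keepUnknown {
			sb.WriteString(s[open : end+1])
		}
		s = s[end+1:]
	}
	sb.WriteString(s)
	return sb.String()
}

func main() {
	vars := map[string]string{"host": "example.com"}
	in := `{host} {"a": 1}`
	fmt.Printf("%q\n", replace(in, vars, false))
	fmt.Printf("%q\n", replace(in, vars, true))
}
```

In the second mode the JSON object survives because its "key" is simply not a known placeholder.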

(I somewhat regret designing placeholders the way I have, sigh.)

Anyway, I think the double-quote workaround is good enough for now -- I don't think placeholders should use quotes within them. That way we don't need to worry about escaping colons.

@mholt mholt added the under review 🧐 Review is pending before merging label Dec 29, 2022
@JensErat
Author

JensErat commented Dec 29, 2022

I agree with checking for quoted keys, this should be somewhat easy to add.

We realized we'd actually love to support placeholders as default values, to be specific something similar to {request.uri.foo:{header.bar}} -- so use a specific request parameter, otherwise a header value. This is not implemented by the above code, anyway. In general, this is very easy to implement by applying a recursive lookup.

But trying to provide a first approach to this, I stumbled over a whole bunch of open questions around how braces are matched. An example is this (adapted from the unit tests, adding letters to make it easier to understand what's going on):

  • input: a{b{c}d (ie., unmatching braces)
  • current output: ad (brace between a and b searches for first match, consumes everything, replaces with empty string as unknown placeholder)
  • output of adapted code (not the current commit from this PR): a{bd (just consumed {c} which is also an unknown placeholder)
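
The two outcomes can be reproduced with a small sketch. Both functions are simplified illustrations written for this example (every placeholder is treated as unknown and dropped, escapes are ignored); neither is the code on master nor the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// dropFirstClose pairs each opening brace with the next closing brace,
// ignoring any braces in between, and drops the enclosed span as if it
// were an unknown placeholder replaced by the empty string.
func dropFirstClose(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '{' {
			sb.WriteByte(s[i])
			continue
		}
		end := strings.IndexByte(s[i:], '}')
		if end < 0 {
			sb.WriteString(s[i:]) // no closing brace left: keep the rest
			break
		}
		i += end // skip the whole "{...}" span
	}
	return sb.String()
}

// matchingClose returns the index of the brace that closes the one at
// s[open], counting nested braces, or -1 if it is never closed.
func matchingClose(s string, open int) int {
	depth := 0
	for i := open; i < len(s); i++ {
		switch s[i] {
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// dropBalanced only consumes spans whose braces are balanced; an opening
// brace without a matching close is emitted literally.
func dropBalanced(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '{' {
			if end := matchingClose(s, i); end >= 0 {
				i = end
				continue
			}
		}
		sb.WriteByte(s[i])
	}
	return sb.String()
}

func main() {
	fmt.Println(dropFirstClose("a{b{c}d")) // prints "ad"
	fmt.Println(dropBalanced("a{b{c}d"))   // prints "a{bd"
}
```

Note that this naive balanced variant rescans the rest of the input for every unmatched opening brace, so it is quadratic on input consisting only of opening braces.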

Can we make any assumptions about what's allowed as a placeholder key? ("Does not contain a quote" is already one of those.)

Also: how escapes get parsed is bugging me a little bit.

  • input: \\{\\}
  • current output: \\ (the first and third backslash)
  • my expectation: \ (and a consumed placeholder of key backslash)

I understand I'm doing lots of (sometimes different) interpretation of the semantics here, but I have the feeling there are no well-defined semantics at all.


  • What's your thoughts on allowing placeholders in defaults again? They'd be super useful for use cases as described above!
  • I guess this would also include something like {foo:{bar:{batz}}} -- comes for free with the recursion described above
  • I guess this even somewhat mixes well with environment variables, as these get replaced first (haven't tried yet, though)
  • What's your thoughts on the two special cases described?
  • Is there anything you would consider a good example of current "valid" uses (with JSON, ...), so we can add those as test cases?

@JensErat
Author

Regarding our use of caddy: I guess there's a bunch of uses over the entire company (we're large...). In our specific use case, caddy is embedded in an in-house authentication gateway solution, somewhat comparable to something like keycloak gatekeeper, but with several integrations for special user directories.

In another more boring use case we're using it for hosting some static web content, as it fits our CI/CD pipeline well which is mostly built around creating golang container images.

@mholt
Member

mholt commented Dec 29, 2022

  • input: a{b{c}d (ie., unmatching braces)
  • current output: ad (brace between a and b searches for first match, consumes everything, replaces with empty string as unknown placeholder)

I think this is what I'd expect: first open brace, should match until first close brace, regardless of braces inside; as currently placeholders can't be nested. (That may change, as we talk about using placeholders as defaults, and thus recursive parsing. We'll need a limit on the recursion I think!)

output of adapted code (not the current commit from this PR): a{bd (just consumed {c} which is also an unknown placeholder)

Do you mean this is the result of adapting the Caddyfile to JSON?

but I have the feeling there are no well-defined semantics at all.

You are right... I will readily admit the Caddyfile and placeholders are the weakest parts of my design 😅, so I really appreciate this feedback as I think we can make great improvements here! 👍

What's your thoughts on allowing placeholders in defaults again? They'd be super useful for use cases as described above!

This should be fine, yeah -- but let's set a limit on max recursion. Maybe 2?

I guess this would also include something like {foo:{bar:{batz}}} -- comes for free with the recursion described above

LGTM
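
As a rough sketch of what recursive defaults with a depth cap could look like: the names `replace`, `resolve` and `lookupFunc` are invented for this example, escapes and the quoted-key check are left out, and the cap of 2 is just the number floated above.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc reports the value for a key and whether it was found.
type lookupFunc func(key string) (string, bool)

// matchingClose returns the index of the brace closing the one at s[open],
// counting nested braces, or -1 if there is none.
func matchingClose(s string, open int) int {
	depth := 0
	for i := open; i < len(s); i++ {
		switch s[i] {
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// replace substitutes every balanced {key} or {key:default} span in s.
// depth is the number of default levels that may still be evaluated.
func replace(s string, lookup lookupFunc, depth int) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '{' {
			if end := matchingClose(s, i); end >= 0 {
				sb.WriteString(resolve(s[i+1:end], lookup, depth))
				i = end
				continue
			}
		}
		sb.WriteByte(s[i])
	}
	return sb.String()
}

// resolve looks up the key; if it is unknown and a default exists, the
// default is itself run through replace with one level less to spend.
func resolve(inner string, lookup lookupFunc, depth int) string {
	key, def, hasDefault := strings.Cut(inner, ":")
	if v, ok := lookup(key); ok {
		return v
	}
	if !hasDefault || depth <= 0 {
		return "" // unknown and no usable default: treat as empty
	}
	return replace(def, lookup, depth-1)
}

func main() {
	vars := map[string]string{"batz": "z", "host": "example.com"}
	lookup := func(k string) (string, bool) { v, ok := vars[k]; return v, ok }

	fmt.Printf("%q\n", replace("{foo:{bar:{batz}}}", lookup, 2))
	fmt.Printf("%q\n", replace("{foo:{bar:{batz}}}", lookup, 1))
	fmt.Printf("%q\n", replace("{redirect:https://{host}/login}", lookup, 2))
}
```

With a cap of 2 the innermost `{batz}` is still reached; with a cap of 1 the chain is cut off and the result is empty, which shows how quickly a small limit bites.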

I guess this even somewhat mixes well with environment variables, as these get replaced first (haven't tried yet, though)

Perhaps, yeah, since they're just a literal substitution (the {$ENV} form, anyway).

Is there anything that would consider a good example of current "valid" uses (with JSON, ...), to add those as test cases?

I think the tests with JSON objects which you've already added is about as good as it gets: other JSON syntax doesn't look like placeholders.

Very cool use cases by the way, we're thrilled that M-B is using Caddy! There's a couple other large companies using Caddy in a similar fashion: Stripe (an enterprise sponsor), and the other one I'm not at liberty to say which at the moment. You're in good company though :)

@JensErat
Author

JensErat commented Dec 29, 2022

I think this is what I'd expect: first open brace, should match until first close brace, regardless of braces inside; as currently placeholders can't be nested. (That may change, as we talk about using placeholders as defaults, and thus recursive parsing. We'll need a limit on the recursion I think!)

Exactly this is where I realized there are some issues with the current behavior. It's really hard to reasonably implement "recursive" defaulting like this. Implementation-wise, "greedy" nesting would be the simplest, and I think also the easiest to understand (and probably what most people would expect). I'm unsure how far this might break current use cases, although I have a pretty hard time trying to imagine any.

Do you mean this is the result of adapting the Caddyfile to JSON?

So no, this happened when I tried adapting for nested defaults.

This should be fine, yeah -- but let's set a limit on max recursion. Maybe 2?

The recursion doesn't add relevant new cost, and we already have the pretty special "100 unmatched parenthesis" limit. I'd rather align those, and allow a decent number of alternatives. If people start templating the Caddy configuration for some weird use cases, you'll quickly reach larger amounts of nested defaults that don't really hurt.

In fact: the code is not 100% complete yet (I pretty much rewrote the scanner/parser part of the function and was pretty surprised how few test cases I broke, still trying to adopt the behavior as much as possible), but I'm pretty sure that recursive parsing will not add new worst-case processing cost compared to the input length than we have already, but of course we add some function calls on the stack.

I think the tests with JSON objects which you've already added is about as good as it gets: other JSON syntax doesn't look like placeholders.

No kudos to me, they've already been there (and bugged me). ;) Test coverage actually is pretty decent already, otherwise I probably wouldn't have been brave enough to try and rewrite half of the function.


You're giving me some confidence that finishing the approach I'm currently testing is reasonable (tomorrow, it's already 7pm local time in Germany). I'll add the changes as a second commit on this PR, so we can easily undo if required.

If the code and approach seems fine, I'll add documentation updates, too.

@JensErat
Author

JensErat commented Dec 30, 2022

I pushed some updated code. I'd still consider this work in progress with respect to following:

  • "JSON recognition" (quotes in keys) not implemented yet, but this is kind of trivial
  • last two test cases need to be fixed/adjusted after implementing above
  • I completely removed the "too many nested braces" error handling. I guess this was introduced because lots of opening braces can result in O(n^2) runtime on the input length. I'm confident the rewritten braces recognition and slightly changed braces matching semantics (I don't see how anybody could reasonably have used them different anyway) enables us to have an optimization to skip lots of characters and get down to something like 2*O(n) without such erroring out. If the general approach looks fine, I'll either implement this optimization or otherwise reintroduce the open braces counter, if my assumption proves wrong.
  • some cleanup in code and commits
  • docs update

Currently (on master), we have a pretty weird semantics around handling escapes. We always change \{ to {, but only change \} to } if there is any opening brace (whether escaped or not). We do not support escaping escapes, so we cannot provide an output like \{ (input would be \\{). To be consistent (and have a kind of clean implementation) I changed the behavior here. I understand if this is a bad idea because we might break something for some users, but on the other hand the current behavior feels unexpected and more like a bug. What's your thoughts? I'm still only handling escapes for the special characters {}\.
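
A minimal sketch of the consistent rule described here, looking at unescaping in isolation (placeholder parsing is deliberately left out, and `unescape` is a name made up for this example):

```go
package main

import (
	"fmt"
	"strings"
)

// unescape drops a backslash when it precedes one of the special
// characters {, } or \, and keeps every other backslash as-is.
func unescape(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) && strings.IndexByte(`{}\`, s[i+1]) >= 0 {
			i++ // skip the backslash, keep the escaped character
		}
		sb.WriteByte(s[i])
	}
	return sb.String()
}

func main() {
	for _, in := range []string{`\{`, `\}`, `\\{`, `a\b`} {
		fmt.Printf("%q -> %q\n", in, unescape(in))
	}
}
```

Under this rule `\}` is unescaped regardless of whether an opening brace appears anywhere, and an escaped backslash makes an output like `\{` expressible.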

I'm sorry for bugging you with that many questions for such few lines of codes, but I'm really hesitating at just changing stuff that might be considered backwards incompatible.

@francislavoie
Member

francislavoie commented Dec 30, 2022

I think the "too many braces" thing was there because we had a fuzzer that exploded because of it.

@mohammed90 being our resident fuzzing expert, how do we re-run that fuzzer to make sure it still behaves okay? Edit: Mohammed is too busy to look into this now, but he linked me to #4170 which has the reproducer.

Regarding changing escape semantics, I agree the existing behaviour is messy so I think we should rip off the bandaid and change it even if it might cause breakages. We'll put a prominent breakage warning in the release notes.

No worries about the question, your attention to detail is very much appreciated! Thank you!

@mholt
Member

mholt commented Jan 2, 2023

@JensErat (Thanks for working on this; I meant to reply earlier, but got delayed with all the holiday things!)

I completely removed the "too many nested braces" error handling. I guess this was introduced because lots of opening braces can result in O(n^2) runtime on the input length.

Yeah; as Francis said, our fuzzer caught a problem there. I still question the value of that particular fuzz because our input is trusted (unless a glitchy script generated and deployed a ridiculous config).

If the general approach looks fine, I'll either implement this optimization or otherwise reintroduce the open braces counter, if my assumption proves wrong.

Sounds great to me! I never loved my patch anyway.

Currently (on master), we have a pretty weird semantics around handling escapes.

You're right, and again this was lazy design on my part.

We always change \{ to {, but only change \} to } if there is any opening brace (whether escaped or not).

Yeah, I figured we don't need to escape } unless we're inside a placeholder, since it's not significant outside a placeholder. (But the "whether escaped or not" thing might be a bug... hm.)

We do not support escaping escapes, so we cannot provide an output like \{ (input would be \\{).

Oops. 😇 I knew that was a limitation of my current implementation but didn't know if it was needed/useful.

To be consistent (and have a kind of clean implementation) I changed the behavior here. I understand if this is a bad idea because we might break something for some users, but on the other hand the current behavior feels unexpected and more like a bug. What's your thoughts?

This sounds like a bug fix to me. I think people expect escaping to work like you're describing. We should fix it.

I'm sorry for bugging you with that many questions for such few lines of codes, but I'm really hesitating at just changing stuff that might be considered backwards incompatible.

Not at all! This is a very welcomed contribution, and a refreshing update to some very old code. (I didn't rewrite this code with Caddy 2, so this is mostly original pre-v1 logic from ~7 years ago.)

PS. You might know this already, but I'm presenting to your company later this month about Caddy!

@JensErat
Author

JensErat commented Jan 4, 2023

I just completed the code. Getting all the bits together was a little more involved than originally expected and slightly more complex, and we definitely should have embedded a parser generator... The new replace function now mostly has these steps (earlier, these phases were a little more interleaved):

  • scan for placeholders (using the three scanner loops)
  • apply the primary placeholder
  • if no value found, try to (recursively) apply a default placeholder if it exists
  • handle unknowns
  • apply functions
  • handle empty values

We have lots of new tests now, ie. I also added results for ReplaceAll to cover those empty value handling branches. I guess the easiest approach in checking the new semantics is looking at the unit test inputs and expected outputs.

Checking out the benchmarks, my code is generally a little bit slower, but always by just a constant factor. I guess most of the extra compute time would already have been there if we had just fixed what was considered a bug above.

The one exception where it got much more expensive is the no_placeholder case, which is caused by the removed "if there is no brace, we can skip entirely" part (removed because of the unescaping). I guess we actually could do something like this, but we'd also have to check for backslashes using a regular expression or two scans. Do you think this is worth the effort for normal operations? The implementation would be kind of trivial.

If you agree to the changes, I'd continue with changes to the documentation (and get the linter to agree). If there's any feedback or request for changes (I'm sure there are still some rough edges with deeper Caddy knowledge applied than I have), I'm happy to follow up.


If the general approach looks fine, I'll either implement this optimization or otherwise reintroduce the open braces counter, if my assumption proves wrong.

Sounds great to me! I never loved my patch anyway.

Successfully added the optimization. It should reduce the runtime cost of searching for closing braces from something like O(n^2) for very weird input like {{{{{{{{{{{{{{{{{{{{ to O(n*m) (with n being the number of input characters and m the number of actual placeholders), at the expense of a few comparisons and a single new integer. This is a separate commit, given this is kind of weird logic. Although I pretty much had the idea of how to do this, I required a full sheet of paper to finally come up with the solution; should we add an explanation somewhere (would probably be a multiline comment if we want this)?

Yeah, I figured we don't need to escape } unless we're inside a placeholder, since it's not significant outside a placeholder. (But the "whether escaped or not" thing might be a bug... hm.)

Oops. 😇 I knew that was a limitation of my current implementation but didn't know if it was needed/useful.

I assume both are rather special cases with rather little use. But you never know. :) Honestly -- I just came up with those while putting together edge case unit tests...

PS. You might know this already, but I'm presenting to your company later this month about Caddy!

Would you mind giving me details in private mail (or ask your other contact at Mercedes-Benz to ping me)? Maybe I'll join in if I can make it.

@mholt
Member

mholt commented Jan 4, 2023

@JensErat Wow, this is impressive. Thank you very much - our hats off to you!

I'll start by replying to some comments, then I'll review the semantics in the tests and go from there.

Checking out the benchmarks, my code is generally a little bit slower, but always by just a constant factor. I guess most of the extra compute time would already have been there if we had just fixed what was considered a bug above.

Do you think it is a negligible constant? Placeholders eval is considered a hot path, so I just want to check :)

The one exception where it got much more expensive is the no_placeholder case, which is caused by the removed "if there is no brace, we can skip entirely" part (removed because of the unescaping). I guess we actually could do something like this, but we'd also have to check for backslashes using a regular expression or two scans. Do you think this is worth the effort for normal operations? The implementation would be kind of trivial.

Hmm. "No placeholders" is definitely the most common scenario: the vast majority of inputs that support placeholders don't use placeholders, so it's useful for that path to be as fast as possible. How much is the slowdown here?

(In my experience I find manual for loops to be faster than even simple regular expressions, but if you're benchmarking I'll trust that.)

Successfully added the optimization. It should reduce the runtime cost of searching for closing braces from something like O(n^2) for very weird input like {{{{{{{{{{{{{{{{{{{{ to O(n*m) (with n being the number of input characters and m the number of actual placeholders), at the expense of a few comparisons and a single new integer. This is a separate commit, given this is kind of weird logic. Although I pretty much had the idea of how to do this, I required a full sheet of paper to finally come up with the solution; should we add an explanation somewhere (would probably be a multiline comment if we want this)?

That sounds like a super valuable addition to the code! Yes, that'd be fantastic if you have the time. :D Then we won't lose all that work you put into the reasoning/logic.

Would you mind giving me details in private mail (or ask your other contact at Mercedes-Benz to ping me)? Maybe I'll join in if I can make it.

Yes, absolutely -- I'll send you an email.

@JensErat
Author

JensErat commented Jan 5, 2023

Integrating the modified code, we stumbled over an issue. We're trying to determine a redirect target based either on a custom header value or otherwise on the usual X-Forwarded-Host header, i.e. using a placeholder like this: {http.request.header.X-Redirect-Url:https://{http.request.header.X-Forwarded-Host}/.well-known/access/login}.

What happens now: if X-Redirect-Url is not set and the left hand side ("normal") placeholder is evaluated, we run into addHTTPVarsToReplacer:

if strings.HasPrefix(key, reqHeaderReplPrefix) {
        field := key[len(reqHeaderReplPrefix):]
        vals := req.Header[textproto.CanonicalMIMEHeaderKey(field)]
        // always return true, since the header field might
        // be present only in some requests
        return strings.Join(vals, ","), true
}

The issue is the return statement: no matter whether we actually do find a value, we always respond with "we found a value". The replacer code now has a hard time deciding whether to use the default or not (the empty string well might be the replaced content).

The comment doesn't really help me understand why we return true here. I guess this is some kind of legacy defaulting to be able to just discard placeholders of non-existing headers/query parameters/... (we now could do something like {http.request.header.X-Redirect-Url:}, but this probably is a pretty breaking change of course).

Maybe this is (or can be) already completely handled by replacer()'s treatUnknownAsEmpty flag?

@francislavoie
Member

francislavoie commented Jan 5, 2023

Hmm, you're right, that true + comment probably no longer makes sense after this refactor.

I think it was meant to make sure that ReplaceKnown does replace placeholders like {header.*} (Caddyfile placeholder shorthand) even if the header value doesn't exist, because otherwise let's say someone has a directive like respond "Value: {header.Foo}" then if Foo was not set, the response would literally be Value: {header.Foo} rather than Value: which "reveals information about the program" in a sense.

It's definitely tricky and subtle. Not sure what to say. I think it's probably best to make the change here and just document it, most notably mentioning that it could change behaviour in places where ReplaceKnown is used (from a quick search, it mainly seems like the respond and header directives, when a header placeholder is used). Probably low impact.

Edit: Oh, also ReplaceOnErr is affected it looks like. But looking at its usages, I don't think any of those places ever make sense to use a header placeholder. So that can be ignored I think.

@francislavoie
Member

I just realized that #5154 is related, regarding the discussion surrounding ReplaceKnown.

@francislavoie
Member

I just remembered we merged in a change in https://github.com/caddyserver/transform-encoder/blob/44f7460143b76360e74b5c3e8622afd6bc115bdf/formatencoder.go#L120 (it's a plugin, not in the main Caddy repo) which has similar default values support. I think that implementing it here might break it there? I'm not sure. We'll need to test that to see how it behaves. Hopefully it still just works and makes the : handling in that plugin obsolete.

@JensErat
Author

Do you think it is a negligible constant? Placeholders eval is considered a hot path, so I just want to check :)

Hmm. "No placeholders" is definitely the most common scenario: the vast majority of inputs that support placeholders don't use placeholders, so it's useful for that path to be as fast as possible. How much is the slowdown here?

By less than a factor of 2, I guess. But thinking about it again, maybe we actually can add a similar optimization again, but we'd have to check for { and \ instead of only { (if there are neither backslashes nor opening braces, we won't modify closing braces anyway). I'll have a look at how much this improves running on strings without placeholders; I guess it's worth the extra loop. I'll compare both approaches, regex and looping.
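
A sketch of such an early-return check, using a single standard-library scan rather than a regular expression. `needsProcessing` and `replaceWithFastPath` are names invented for this example, and the slow path is only a stand-in:

```go
package main

import (
	"fmt"
	"strings"
)

// needsProcessing reports whether s contains an opening brace or a
// backslash. If it contains neither, there is no placeholder to expand
// and no escape to strip, so the input can be returned unchanged.
func needsProcessing(s string) bool {
	return strings.ContainsAny(s, `{\`)
}

// replaceWithFastPath shows where the check would sit; slowPath stands
// in for the full scanner and is supplied by the caller.
func replaceWithFastPath(s string, slowPath func(string) string) string {
	if !needsProcessing(s) {
		return s
	}
	return slowPath(s)
}

func main() {
	upper := func(s string) string { return strings.ToUpper(s) } // dummy slow path
	fmt.Println(replaceWithFastPath("plain text }", upper))
	fmt.Println(replaceWithFastPath(`escaped \}`, upper))
}
```

A lone closing brace does not trigger the slow path, matching the observation that closing braces are never modified when neither of the other two characters is present.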

It's definitely tricky and subtle. Not sure what to say. I think it's probably best to make the change here and just document it, most notably mentioning that it could change behaviour in places where ReplaceKnown is used (from a quick search, it mainly seems like the respond and header directives, when a header placeholder is used). Probably low impact.

As an alternative (and less intrusive/breaking change) we might well pass a "default to empty" flag, which we can use (in replacer.go). This feels like adding a little more legacy, though. If we're already adding a kind of breaking change, maybe we should just get everything cleaned up. Changing

type ReplacerFunc func(key string) (any, bool)

to

type ReplacerFunc func(key string, defaultToEmpty bool) (any, bool)

and then use this to determine what to return, from (kindly ignore the different fields, I was lazy and copy-pasted from the directly following similar code blocks instead of getting the diff):

			if strings.HasPrefix(key, reqURIQueryReplPrefix) {
				vals := req.URL.Query()[key[len(reqURIQueryReplPrefix):]]
				// always return true, since the query param might
				// be present only in some requests
				return strings.Join(vals, ","), true
			}

to

			// request header fields
			if strings.HasPrefix(key, reqHeaderReplPrefix) {
				field := key[len(reqHeaderReplPrefix):]
				vals, found := req.Header[textproto.CanonicalMIMEHeaderKey(field)]
				return strings.Join(vals, ","), defaultToEmpty || found
			}

I'd toggle the boolean based on whether the variable has a default or not.

@francislavoie
Member

francislavoie commented Jan 10, 2023

Hmm, changing ReplacerFunc would be an API break instead of a config break. I lean towards a config break instead because if any plugins use ReplacerFunc then they could only work with Caddy versions before the one this is merged in, until they're updated, and then the new version of the plugin only works with Caddy versions including and after this is released. The risk of breaking configs is probably quite low compared to the possible plugin impact.

For sake of discussion, what would be acceptable as an API change is a new function to replace it called something like ReplacerDefaultFunc then we call the original deprecated, etc. But I don't think this is worth it because like you said it adds legacy etc.

@mholt
Member

mholt commented Jan 10, 2023

One trick for adding a function parameter in a non-breaking way is to make it variadic:

type ReplacerFunc func(key string, defaultToEmpty ...bool) (any, bool)

(or something like that)

Obviously this is not ideal, but if we are concerned about breakage, this gives us time to prepare any plugins that may be using this and forewarn them to support another parameter.
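
A self-contained sketch of the variadic idea follows. The `ReplacerFunc` type here is declared locally for the example, and `headerReplacer` with its plain map of headers is hypothetical, not Caddy code.

```go
package main

import (
	"fmt"
	"strings"
)

// ReplacerFunc is a local stand-in with an optional trailing flag.
type ReplacerFunc func(key string, defaultToEmpty ...bool) (any, bool)

// headerReplacer builds a ReplacerFunc over a simple header map. When the
// optional flag is passed and true, a missing header counts as "found"
// with an empty value; otherwise the map lookup result is passed through.
func headerReplacer(headers map[string][]string) ReplacerFunc {
	return func(key string, defaultToEmpty ...bool) (any, bool) {
		vals, found := headers[key]
		emptyOK := len(defaultToEmpty) > 0 && defaultToEmpty[0]
		return strings.Join(vals, ","), found || emptyOK
	}
}

func main() {
	f := headerReplacer(map[string][]string{"X-Foo": {"a", "b"}})

	v, ok := f("X-Foo") // old-style call without the flag still compiles
	fmt.Printf("%q %v\n", v, ok)

	v, ok = f("X-Bar")
	fmt.Printf("%q %v\n", v, ok)

	v, ok = f("X-Bar", true) // new-style call opting into "empty is fine"
	fmt.Printf("%q %v\n", v, ok)
}
```

One caveat: this keeps existing call sites compiling, but a function written with the old `func(key string) (any, bool)` signature is not assignable to the variadic type, so plugins that implement a replacer function would still need a change.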

Before we go further down this road though, I'd like to talk about that return statement again:

if strings.HasPrefix(key, reqHeaderReplPrefix) {
        field := key[len(reqHeaderReplPrefix):]
        vals := req.Header[textproto.CanonicalMIMEHeaderKey(field)]
        // always return true, since the header field might
        // be present only in some requests
        return strings.Join(vals, ","), true
}

So yes, the original intent there was to remove the placeholder in cases where the header isn't present; however, with further experience, I'm not sure if that's always the right behavior, especially if we are implementing default values. If we implement default values, we can always evaluate and remove/replace the placeholder, so we don't always have to return true.

If we change that to return false would that make things easier?

@francislavoie
Member

we can always evaluate and remove/replace the placeholder, so we don't always have to return true.

Of course this precludes invalid placeholder-like text, such as JSON, which should not be replaced at all.

Using JSON in header values (which are processed with ReplaceKnown) is a common usecase.

Just want to make sure that's covered. We probably should have a test for the header handler to make sure that continues to work, I think.

@JensErat
Author

I now:

  • resolved the linter issues
  • added the comment on why and how the optimization works
  • re-added the optimization for strings that do not contain placeholders, which definitely improved performance again. The assumption on regular expressions was pretty valid: When benchmarking an approach using regular expressions, the "return early" optimization was actually slowing down the entire code.
  • finally I changed the HTTP header, cookie, ... behavior to return false if reasonable. Please especially check the HTTP module changes with special care, these are pretty hard to test (even if there would have been tests for this, they would just resemble the same semantics). I removed one test case which doesn't feel reasonable any more.

Of course this precludes invalid placeholder-like text, such as JSON, which should not be replaced at all.

Using JSON in header values (which are processed with ReplaceKnown) is a common usecase.

Further above, we discussed just not applying defaulting to any placeholders where the key has quotes in it. This is implemented and tested here:

https://github.com/caddyserver/caddy/pull/5275/files#diff-ee2fe630c0af5186f990380915a1d4b9b2c15b24cb5bab9e13e87dafe700eb88R256-R257
https://github.com/caddyserver/caddy/pull/5275/files#diff-536dcb7bb2efdb54d66c54859f41d31aba8c0eefe173c9702c6565c4a0dce301R90-R94

Now it depends on which of the Replace...() functions is used.

If we change that to return false would that make things easier?

Just returning false would definitely not be the right thing. In most cases, it's just passing through the boolean result we get from the map lookup.

@mholt
Member

mholt commented Jan 17, 2023

@JensErat Thanks for the excellent work. I saw this earlier but was waiting for CI, but now it appears that CI is stuck. Does anyone know how to jiggle it? I don't even see controls to run them manually for this PR anymore.

This change is looking good. I'll review it as soon as I have a chance. Busy preparing a presentation for your company :)

@mohammed90
Member

it appears that CI is stuck. Does anyone know how to jiggle it? I don't even see controls to run them manually for this PR anymore.

Github Action workflows don't run if a conflict exists (see https://github.com/orgs/community/discussions/26304#discussioncomment-3251303). Resolving the conflict should trigger the CI run.

@mholt
Member

mholt commented Jan 26, 2023

Ah, thanks Mohammed.

@JensErat After a brief few days vacation, I've returned to the office and am ready to do a final review and merge this, whenever you're ready! 👍 You might be more qualified to resolve the merge conflicts. Let me know if I can help though.

@francislavoie
Member

Another thing I spotted, we might want to remove this Count condition in the vars matcher to allow recursive placeholders like {foo:{bar}}.

strings.Count(key, "{") == 1 {

There might be some similar code in other places. Searching for "{" might find some more.
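
To make the effect of that condition concrete, here is a small sketch. Only the `strings.Count(key, "{") == 1` part is quoted from the comment above; the surrounding prefix/suffix checks and the name `isSinglePlaceholder` are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isSinglePlaceholder is a hypothetical reconstruction of the check.
// It accepts a key only if it is wrapped in braces and contains exactly
// one opening brace, which rules out nested placeholders.
func isSinglePlaceholder(key string) bool {
	return strings.HasPrefix(key, "{") &&
		strings.HasSuffix(key, "}") &&
		strings.Count(key, "{") == 1
}

func main() {
	for _, key := range []string{"{foo}", "{foo:bar}", "{foo:{bar}}"} {
		fmt.Printf("%-12s single=%t\n", key, isSinglePlaceholder(key))
	}
}
```

A nested default such as `{foo:{bar}}` contains two opening braces, so a check of this form would treat it as a literal key rather than a placeholder.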

Support variable expansion also in placeholders, like in
`{unsetPlaceholder:default value}`. Implemented similarly to caddyserver#3682,
which provided defaults for environment variable expansion.

Closes caddyserver#1793.

Signed-off-by: Jens Erat <[email protected]>
If we have a string `\}`, we'd expect this to be interpreted as `}`, similar to how `\}\{` would be interpreted as `}{`. If we optimize early by searching for opening braces, we'd not strip the escape character here.

Signed-off-by: Jens Erat <[email protected]>
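
The escaping problem described in the commit message above can be reproduced with a toy example. Both function names are hypothetical and the code is far simpler than the real replacer; it only shows why an early return keyed on `{` skips the unescaping of `\}`.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeBraces drops a backslash that directly precedes a brace.
func unescapeBraces(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) && (s[i+1] == '{' || s[i+1] == '}') {
			continue // drop the escape character, keep the brace
		}
		sb.WriteByte(s[i])
	}
	return sb.String()
}

// withEarlyReturn skips all processing when the input contains no
// opening brace, which is the optimization under discussion.
func withEarlyReturn(s string) string {
	if !strings.Contains(s, "{") {
		return s
	}
	return unescapeBraces(s)
}

func main() {
	for _, in := range []string{`\}`, `\}\{`} {
		fmt.Printf("%q: full=%q early=%q\n", in, unescapeBraces(in), withEarlyReturn(in))
	}
}
```

For `\}` the early-return variant leaves the backslash in place, while `\}\{` is unescaped by both variants because it contains an opening brace.
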
The old parser logic was quite complicated and does not support nested placeholder evaluation for defaults, as it does not count closing braces.

I assume the given approach using two loops and switch statements is more readable; it also changes the brace-matching behavior to support nested defaulting, which is provided in another commit.

Signed-off-by: Jens Erat <[email protected]>
@JensErat
Author

JensErat commented Feb 8, 2023

I was pretty busy, I'm sorry for the late reply.

@francislavoie I have a somewhat hard time understanding how this code is used. It seems there is some relation to the cel matcher; but when I add a test case (which fails) and start debugging I don't yet see this code executed. So it seems we found like two more issues:

  • cel matcher needs to get defaulting-aware; I don't know this library yet, any thoughts here?
  • do we already have any unit tests for the vars.go#169 code you described above that could be extended for the placeholder defaults?

@francislavoie
Member

francislavoie commented Feb 8, 2023

Yeah, the CEL code is pretty arcane. Sorry about that.

It gets called here: https://github.com/mercedes-benz/caddy/blob/66aae7049addd406698b37ef2c8b878617e82981/modules/caddyhttp/celmatcher.go#L207

To explain what happens internally, we hook into the CEL compiler to transform {path} placeholders into a function that reads caddyPlaceholder(request, "path"), with a regexp replacement, and then we register caddyPlaceholder as a function inside the CEL context which uses the request input to the CEL expression and calls the replacer to get the actual value.

Note we use this regexp to match placeholders in CEL expressions (bottom of celmatcher.go):

placeholderRegexp    = regexp.MustCompile(`{([a-zA-Z][\w.-]+)}`)

This obviously won't match : because of [\w.-], and it won't handle nested {}. I'm not sure how we should handle that. The regexp will obviously need to be a lot smarter.

Edit: {([a-zA-Z][\w.:{}-]+)} might work, but I don't think it'll be enough to cover all the kinds of defaults. Should default values allow special chars? What about spaces etc?
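
A quick way to see the limitation is to run the quoted regexp against a few inputs. In this sketch the pattern is copied from the comment above, while the replacement template and the function name `rewritePlaceholders` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern as quoted above from the bottom of celmatcher.go.
var placeholderRegexp = regexp.MustCompile(`{([a-zA-Z][\w.-]+)}`)

// rewritePlaceholders turns each matched placeholder into a function
// call. The replacement template is an assumption for this sketch.
func rewritePlaceholders(expr string) string {
	return placeholderRegexp.ReplaceAllString(expr, `caddyPlaceholder(request, "${1}")`)
}

func main() {
	exprs := []string{
		`{path}.startsWith("/api")`,
		`{foo:default value}`,
		`{foo:{bar}}`,
	}
	for _, e := range exprs {
		fmt.Printf("%s\n  -> %s\n", e, rewritePlaceholders(e))
	}
}
```

The placeholder with a default is left untouched, and in the nested case only the inner `{bar}` is rewritten, which leaves a broken expression behind.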

It looks like repl.Get() doesn't support the default value logic right now, it seems like it only works through the replace() codepath. I guess that needs some more work.

@JensErat
Author

JensErat commented Feb 8, 2023

The default will allow pretty much anything, including escaped closing braces. I'm pretty sure this is nothing an arbitrarily complex/fancy regular expression will be able to support: we'll have to catch anything in braces and analyse it. I guess we can well support it; after all, a placeholder with a default will have a value at any time.

I'll look into that tomorrow.
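
One possible shape for "catching anything in braces" is a small scanner that tracks nesting depth and skips escaped characters. This is a sketch under assumptions: `findClosingBrace` is a hypothetical name, and the PR's actual parser may be structured differently.

```go
package main

import "fmt"

// findClosingBrace returns the index of the brace that closes the
// placeholder opening at s[start], or -1 if s[start] is not an opening
// brace or the braces are unbalanced. A backslash escapes the next byte.
func findClosingBrace(s string, start int) int {
	if start < 0 || start >= len(s) || s[start] != '{' {
		return -1
	}
	depth := 0
	for i := start; i < len(s); i++ {
		switch s[i] {
		case '\\':
			i++ // skip the escaped character
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

func main() {
	for _, in := range []string{`{foo:{bar}}`, `{a:x\}y} tail`, `{unclosed`} {
		fmt.Printf("%q -> %d\n", in, findClosingBrace(in, 0))
	}
}
```

The text between the opening brace and the returned index can then be analysed for a key and an optional default, including nested placeholders in the default.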

@mholt
Member

mholt commented Feb 24, 2023

@JensErat Oh hey, I just wanted to check in and see if I can help us get the tests passing and merge this in :)

@mholt
Member

mholt commented May 11, 2023

Hey Jens, I'm going to push a commit where I hope I resolved the merge conflict successfully through a merge. Apologies if this creates a mess after pushing 😆

@mholt
Member

mholt commented May 11, 2023

Oops, just kidding, I don't have permission to push to your fork.

I guess it's up to you to finish this up 🙏 Let me know how I can help!

@mholt mholt removed this from the v2.7.0 milestone May 11, 2023
@mholt
Member

mholt commented May 15, 2023

@JensErat I'll be tagging 2.7 beta 1 in the next day or so. I'm happy to get this change in that, or into a later beta release if needed. Are you still interested in finishing this?

@JensErat
Author

Hi, I'm so sorry for ghosting you here. I was pretty busy with some trips (out of home for 6.5/9 weeks in a row...) and currently preparing to relocate within my city. I definitely want to finish this. I guess most work should actually be done, but the integration in the expression library interrupted me. I guess some help with the library would help at speeding up the remaining tasks. Do you think we can have a chat/call on what's the best approach here?

I'll also rebase the branch soon and fix any conflicts.

@francislavoie
Member

No worries @JensErat thanks for following up! No rush, we just wanted to make sure we didn't lose you 😅

I can hop on a call as well to discuss (I can probably provide more context on the CEL stuff). I think Matt can help you schedule a time. I can make it pretty much anytime this week before Friday or next week after Monday.

@mholt
Member

mholt commented May 16, 2023

@JensErat That's awesome, hope you had a good time on your trips! And I know the busy-ness of moving, it's a lot of work. Hope it goes well for you!

Yeah, Francis is one of our primary CEL maintainers, and I am also happy to chat sometime. You can schedule a time with me at https://matt.chat :)

@francislavoie
Member

francislavoie commented Jan 13, 2024

@JensErat will you have time soon to finalize this? I'd love to get this finished up!

I'd be willing to rebase, but we're not able to because this PR is under your mercedes-benz fork which doesn't allow us to commit.

@mholt
Member

mholt commented Mar 7, 2024

I'll try to circle back to this change this year and see if I can finish it up :) (Or anyone else is welcome to I suppose.) We'll make sure Jens is credited

MrStanana pushed a commit to MrStanana/OpenATBP that referenced this pull request Mar 19, 2024
TODO: dynamic env vars with defaults in caddy.json. See caddyserver/caddy#5275
MrStanana added a commit to MrStanana/OpenATBP that referenced this pull request Mar 19, 2024
TODO: dynamic env vars with defaults in caddy.json. See caddyserver/caddy#5275
@mholt
Member

mholt commented Sep 26, 2024

Hey @JensErat , if the CEL stuff is a blocker, let's not worry about that for now. If this change is ready by your definition otherwise, I am happy to give this one last review and merge it in. (We could always add CEL support later.)

If I don't hear from you soon, I'll likely close this since I can't push to your branch. But we might salvage the changes and put them into our own branch and merge it in (with credit to you) if that's the case.

Labels
feature ⚙️ New feature or request under review 🧐 Review is pending before merging
Development

Successfully merging this pull request may close these issues.

Feature-Request: Placeholder with default value
5 participants