Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle event handler attributes #226

Open
lukewarlow opened this issue Apr 23, 2024 · 23 comments
Open

How to handle event handler attributes #226

lukewarlow opened this issue Apr 23, 2024 · 23 comments

Comments

@lukewarlow
Copy link
Contributor

Afaik the current plan for the sanitizer API is that there would be a list of all known attributes and from that you'd subtract an underlying blocklist of unsafe attributes, event handlers are tricky because there's lots of them. If we have a list of event handlers to remove it would be useful if it was defined separately from other attributes so that the trusted types specification could piggy back off of it.

Having said that all current ones follow the on* pattern, should we just carve out on* attributes as being unsafe and block them all? I'm no perf specialist but I imagine that would be faster to process (both WebKit and Chromium already have some paths that do the on prefix check for example).

@annevk
Copy link
Collaborator

annevk commented Apr 23, 2024

I think from a re-parsing perspective or letting something through in one browser that's blocked in another this would also be an improvement. Although it would also be a tiny bit of magic again that cannot currently be explained through syntax, but that seems reasonable.

@lukewarlow
Copy link
Contributor Author

Has their been any discussion about allowing stuff like regex in the configs? Not needed for a v1 but if we did allow some form of pattern based matching then it would solve the fact we can't currently define it in syntax.

@domenic
Copy link

domenic commented Apr 24, 2024

I worry that this restricts the space of future HTML attribute design in a way that's hard to understand. E.g. we could not use attributes named things like once="", only="", online="", ongoing="", onto="".

Maybe it would still be OK as long as we'd consider it a backward-compatible change to modify the sanitization algorithm in the future. E.g. today we'd strip all attributes starting with on, but in the future when we add an only="" attribute, we strip all attributes starting with on except we let only="" through.

@annevk
Copy link
Collaborator

annevk commented Apr 24, 2024

I think we do have the expectation that what is being sanitized and how will subtly change over time due to the HTML (including SVG/MathML) language evolving. New "safe" attributes getting safelisted seems in line with that.

@otherdaniel
Copy link
Collaborator

I'm not fond of this, and would prefer us to be precise with the event handler attributes.

With Trusted Types, we initially had a similar logic, but kept getting bug reports about sites using their own on* attributes that would get erroneously blocked by TT. So we eventually re-implemented this with the actual list of actual event handlers, which solves that set of problems. I'd prefer to not repeat that experience.

Examples:

Also, I don't quite understand why this would be needed at all: Our default-config -- where these matter -- are allow-lists, and in an allow-lists the event handlers are very much expected to be absent. That is, no on* attribute -- whether a browser-defined event handler or not -- should show up in any built-in config. So why would we care? Or why would it make a perf difference?

For the config pre-processing, we do need to decide whether an attribute is known or not -- otherwise one could never override the built-in default. That indeed requires a list of known attributes, including the known event handlers. But that's required only during config normalization; and that requires every single HTML attribute ever. I doubt the event handlers make that much of a difference there.

@annevk
Copy link
Collaborator

annevk commented Apr 24, 2024

Fair enough. List it is then.

@annevk
Copy link
Collaborator

annevk commented Nov 29, 2024

I do have some lingering worry here that each time a browser introduces a new event handler attribute, the sanitization won't be the same across browsers and this could result in some issues. How to square that with sites using their own on* attributes is unclear though.

@mozfreddyb
Copy link
Collaborator

If we solved this by disallowing and listing known event handlers in the "no XSS possible" mode and disallowing all handlers starting with "on" in the default, we'd basically allow browsers to bypass the default because a user might create a Sanitizer that allows a new onfoo attribute that the browser supports but Sanitizer is just not aware of yet.

What if we disallow all on* and hardcode exceptions for those known to be widely used? Seems also a bit icky imho.

What if we just disallow all on and tell people to use unsafe if they are unhappy with it? Luxury of designing a new API is that we can decide not just to support harmless on* attributes... (admittedly a bit dismissive of people's use cases)

@annevk
Copy link
Collaborator

annevk commented Nov 29, 2024

@cure53 how does DOMPurify deal with event handler attributes?

@cure53
Copy link
Collaborator

cure53 commented Nov 29, 2024

@cure53 how does DOMPurify deal with event handler attributes?

@annevk We work with an allow-list on which event handlers are simply not present. If folks decide they want to permit them, they can do so using the config, i.e. via ADD_ATTR.

Other than that, they receive no extra treatment, and get handled as any other attribute 🙂

@lukewarlow
Copy link
Contributor Author

lukewarlow commented Nov 29, 2024

What if we just disallow all "on*" and tell people to use unsafe if they are unhappy with it?

My personal thoughts are we should just do this. Google decided against it for trusted types (which now makes it difficult to change retrospecitvely) but I think we should just take the slight annoyance from certain websites and do it for sanitizer like should have been done for trusted types.

As you say worst case they can use unsafe if they really need their badly named attribute to work.

I think we (whatwg) should also agree to leave onXYZ to only event handlers and if someone wants some other attribute to start with 'on' we simply get them to rename it.

@otherdaniel
Copy link
Collaborator

As Luke said, Trusted Types initially blocked any "on*", but we got too many complaints about that and then changed to an actual list, which is derived from the IDLs.

Examples problems are:

I think Sanitizer has an easier time with this, because there's always the "Unsafe" variants. But that sure isn't pretty.

As it presently stands the defaults would be an allow list, where this isn't a problem because no event handler is going to be in the default allow list anyhow. But removeUnsafe, plus the implied removeUnsafe in safe versions, would make this observable.

@mozfreddyb
Copy link
Collaborator

A couple of the linked issues are not public. I hope we can get away with this by a) codifying that on is for event handlers and assumed unsafe and b) supporting this anyway with the unsafe methods :-)

I understand that we would be dismissive of existing code and use cases for custom attributes... 😕

@annevk
Copy link
Collaborator

annevk commented Dec 10, 2024

I'm a bit surprised that it seems the Trusted Types decision was made unilaterally by Chromium. Was there no discussion about this in a specification setting?

@mozfreddyb
Copy link
Collaborator

I mean, the alternative is to somewhere have a canonical list of event handler attributes but that seems tricky in two ways. First, it appears that there is no such list and the HTML spec might not be the best source to find all of them. Secondly, most implementations don't really know or can figure out at build time either.

@cure53
Copy link
Collaborator

cure53 commented Dec 10, 2024

I mean, the alternative is to somewhere have a canonical list of event handler attributes but that seems tricky in two ways. First, it appears that there is no such list and the HTML spec might not be the best source to find all of them. Secondly, most implementations don't really know or can figure out at build time either.

Historically, as far as I know, there was only one case where a non-event handler started with "on" and if I remember correctly, that was for the VML implementation in MSIE.

Other than that, I am not aware of any attributes starting with "on" that do not trigger JavaScript execution.

@mozfreddyb
Copy link
Collaborator

Well, see links above. Lots of frameworks do funky stuff with on= or one=, that do not lead to script execution (but may cause script gadget funkiness)

@lukewarlow
Copy link
Contributor Author

The current behaviour also makes it pretty hard to test trusted types coverage of event handlers (and would similarly be difficult for sanitizer). You can read from spec web idl but not all browsers implement all event handlers so that can show up failures that aren't failures. It also doesn't cover any non-standard event handlers, nor does it necessarily catch all existing ones if they're not on the IDL constructs you expect.

You can maybe loop through the element classes on the global and from their loop through their properties that start with 'on' and test against each of those, by converting the class name to a tag name creating the element and trying to set the attribute. Though just because the property exists on the prototype doesn't mean it actually executes script as a content attribute.

@otherdaniel
Copy link
Collaborator

I'm a bit surprised that it seems the Trusted Types decision was made unilaterally by Chromium. Was there no discussion about this in a specification setting?

The change was in May 2022, when no browser other than Chromium was implementing or interested in implementing Trusted Types. I don't think anyone else paid attention to the TT spec at that time.

I'd also like to clarify that this was an implementation change; not a spec change. The spec specified from the very beginning that TT would check "event handler content attributes". Which isn't correct spec language, but it surely doesn't mean "on*". Blocking "on*" was never allowed, per spec. (Old spec versions are preserved in https://github.com/w3c/trusted-types. I double checked for revisions going back to 2020.)

In the Chromium implementation I had taken a shortcut to just block "on*" because I didn't want to do the work to obtain a list of event handler names. Then, as TT was being picked up by users, we got a number of complaints about this. It became clear that my shortcut is untenable, and I got told to pretty please fix it. Which I did, and which is the change we're talking about here.

@otherdaniel
Copy link
Collaborator

The current behaviour also makes it pretty hard to test trusted types coverage of event handlers (and would similarly be difficult for sanitizer). You can read from spec web idl but not all browsers implement all event handlers so that can show up failures that aren't failures. It also doesn't cover any non-standard event handlers, nor does it necessarily catch all existing ones if they're not on the IDL constructs you expect.

I don't understand why this is hard to test. My usual test pattern is to pick a few block and a few non-block cases and test those. If we want to test exhaustively -- whatever that would even mean for the "on*" case -- I'd take a list of candidates and check whether they're event handlers to determine what the expected outcome under TT would be.

(Something like: div.setAttribute(probe, "return"); const expect_to_throw_with_tt = (typeof div[probe] == "function"); The underlying behaviour -- converting a string to a function, or not -- is spec mandated, I believe.)

I think for IDL tests, WPT runs tests against spec-derived baseline, and whoever does something differently will have to maintain a list of expectations themselves. That puts the maintenance burden on whoever does something different. That might also work.

I wish we could somehow re-use or hook into the webref / IDL tests. That seems like a fairly comprehensive and well-tested mechanism for testing APIs against a spec baseline. I'm admittedly not seeing an obvious way to do that.

I do understand the interop concern; but don't really understand the testing one.

@noamr
Copy link

noamr commented Dec 10, 2024

Can we rely on the globally exposed WebIDL interfaces somehow here? As in:

  • By default, in unsafe mode, reject all the on* properties that are listed in WebIDLs exposed on this global object
  • Allow some WebIDL keyword to opt-out of this behavior, e.g. [Safe] or [TrustMeThisIsNotAnEvent] or whatever if we ever want to introduce attributes like once
  • This wouldn't reject author-defined on*, because those don't come from WebIDL

@annevk
Copy link
Collaborator

annevk commented Dec 11, 2024

@otherdaniel FWIW, I have been paying attention since 2019 at least, from a cursory look at old issues in the relevant repository, but I did miss that the specification was already requiring that.

I've been trying to think of how to best decide on this issue. I think it comes down to how likely it is that sanitized content in one browser ends up being reflected in another. While that concern also impacts other features, event handler attributes are much more commonly introduced. Overall though that seems like a fairly minor concern and any website that does it that way will also run into other issues I suspect. So I think I'm leaning towards using the same backing list as Trusted Types.

@mozfreddyb
Copy link
Collaborator

We discussed this in our sanitizer meeting and found both approaches doable. However, we agreed on the following:

  • in the spec, use the lists of specified event handler attributes from HTML
  • in implementations, make sure that there is an exhaustive mechanism to get all the supported attributes on top of that
  • provide no defense-in-depth for authors that sanitize in one browser and then use the result in another browser, because we can only ever match the parsing/sanitization rules of the current browser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants