Masking service to allow logging private info #17075

Open
MikeAlhayek opened this issue Nov 26, 2024 · 7 comments

@MikeAlhayek
Member

Add a service and/or static methods that would allow us to mask and log private information like username, user ID, email, etc.

If the username is malhayek, log it as ma*****ek, using a preset number of * characters. If the username is too short, we can show only the first or last character. This would let us log information that could be helpful without logging it in plain text.
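
A minimal sketch of the masking rule described above, written in Python purely for illustration (Orchard Core itself is .NET, and no such API exists in the project as far as this thread shows). The function name `mask` and the parameters `keep` and `stars` are hypothetical. For short values, this sketch assumes we show only the first character.

```python
def mask(value: str, keep: int = 2, stars: int = 5) -> str:
    """Mask the middle of `value`, keeping `keep` characters at each end.

    Hypothetical helper: the number of asterisks is fixed (`stars`) so the
    masked output does not reveal the length of the original value.
    """
    if not value:
        return ""
    # Too short to keep both ends without exposing most of the value:
    # assumption for this sketch, show only the first character.
    if len(value) <= keep * 2:
        return value[0] + "*" * stars
    return value[:keep] + "*" * stars + value[-keep:]


print(mask("malhayek"))  # ma*****ek
print(mask("bob"))       # b*****
```

Using a fixed number of asterisks rather than one per hidden character is a design choice: it avoids leaking the length of the original value.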

@MikeAlhayek MikeAlhayek changed the title Loging private info Marking service to allow loging private info Nov 26, 2024
@hishamco hishamco changed the title Marking service to allow loging private info Marking service to allow logging private info Nov 26, 2024
@Piedone
Member

Piedone commented Nov 27, 2024

Why would we mask user IDs? Those are internal identifiers that only relate to personal information if you have access to the DB, or at least the admin area (at which point you have access to all private information anyway).

@gvkries
Contributor

gvkries commented Nov 27, 2024

Everything you can correlate to identify someone, even if you need additional data, is PII in GDPR. E.g. this applies to IP addresses as well. You can only identify a person with the help of the ISP, but that is enough. Maybe even if we hash or shorten parts of it, it is still possible to correlate back to the original user, so it may still be PII.
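
To illustrate the correlation concern, here is a small sketch (Python, for illustration only) of why a plain, unkeyed hash of a username can still be traced back by anyone who holds a list of candidate usernames. The function names `pseudonym` and `reidentify` are hypothetical, and SHA-256 truncated to 12 hex characters is just an assumed scheme, not something proposed in this thread.

```python
import hashlib
from typing import Iterable, Optional


def pseudonym(value: str) -> str:
    """Return a shortened SHA-256 hex digest of `value` (assumed scheme)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def reidentify(token: str, known_usernames: Iterable[str]) -> Optional[str]:
    """Recover the username behind `token` by hashing every candidate."""
    for name in known_usernames:
        if pseudonym(name) == token:
            return name
    return None


logged = pseudonym("malhayek")  # what would end up in the log
print(reidentify(logged, ["alice", "bob", "malhayek"]))  # malhayek
```

Because the hash is deterministic and unkeyed, the logged token remains linkable to a person with only the additional data of a user list.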

@Piedone
Member

Piedone commented Nov 27, 2024

Alright, but we still somehow need to know sometimes which user did something, just as Audit Trail records who the user was. E.g. for login failures not knowing the user can be prohibitive in debugging the issue.

I don't think it's actually a problem to store personal data in the logs on the level of user IDs/usernames for cases where it's necessary (and perhaps only at certain log levels like debug, on demand), provided you handle the logs (disclosure in the privacy policy, access control, ability to delete...) the same way you handle the SQL DB with all the other personal data, which you should be doing in any case. GDPR doesn't prohibit you from storing such data; it just sets out rules on how you should do it.

And at that point, masking becomes moot because if you mask properly, then you can't correlate the logs with users, but then it's useless to log the masked data in the first place. (Note that I'm not talking about logging truly sensitive data like passwords, birthdates, or anything like that, just basically anything that will allow you to know, even if with the help of the DB, who did something.)

@sebastienros
Member

How do you secure your logs? How are they stored and replicated, and who can access them? How long do you retain them? How do you delete them? If someone asks to remove their PII, would you even be able to remove it from the logs?

Someone should find some literature because I am sure we could argue in every way for days ;)

@Piedone
Member

Piedone commented Nov 28, 2024

These are all solved issues both with log files and Application Insights. I can't imagine other logging platforms can't do the same. If you follow GDPR (or any other similar law), and thus have proper access control for your DB with the ability to delete etc., then you can, and should, have the same for your logs (just as you have to have this for Media files, e-mails, and anything else).

Note though that I'm arguing for two things:

  • We don't need masking because if you want to log personal data, then you should be able to. It's possible in compliant ways even for production environments, but this can very well be a use case in non-production environments with test data too. For unwanted logging it could be useful to have full redaction, though this issue is not about that.
  • Where it's useful for troubleshooting, we should log user IDs (like here), on more granular log levels, perhaps only debug.
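
The second point can be sketched as follows, using Python's standard `logging` module for illustration only (Orchard Core uses .NET logging; the logger name `login` and the function `log_login_failure` are hypothetical). The generic message is always emitted at warning level, while the user ID appears only when debug logging is enabled.

```python
import logging

logger = logging.getLogger("login")


def log_login_failure(user_id: str) -> None:
    """Log a login failure; include the user ID only at debug level."""
    # Always safe to emit: contains no personal data.
    logger.warning("Login failed.")
    # Only emitted when the logger is configured for DEBUG, for example
    # temporarily while troubleshooting.
    logger.debug("Login failed for user ID %s.", user_id)
```

With the logger level at `WARNING`, only the first message is recorded; switching it to `DEBUG` on demand adds the line containing the user ID.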

@sebastienros
Member

Yes, it may be allowed if you know it is and you have all the means to do what you want. Does that mean OC can log it just because it might be allowed? There are tools to configure it (mentioned in the meeting; you can watch the recording), but we shouldn't log these "as-is".

@Piedone Piedone changed the title Marking service to allow logging private info Masking service to allow logging private info Dec 1, 2024
@sebastienros sebastienros added this to the backlog milestone Dec 5, 2024
Contributor

github-actions bot commented Dec 5, 2024

We triaged this issue and set the milestone according to the priority we think is appropriate (see the docs on how we triage and prioritize issues).

This indicates when the core team may start working on it. However, if you'd like to contribute, we'd warmly welcome you to do that anytime. See our guide on contributions here.

4 participants