What information must never appear in logs?
- by MainMa
I'm about to write the company guidelines about what must never appear in logs (trace of an application). In fact, some developers try to include as many information as possible in trace, making it risky to store those logs, and extremely dangerous to submit them, especially when the customer doesn't know this information is stored, because she never cared about this and never read documentation and/or warning messages.
For example, when dealing with files, some developers are tempted to trace the names of the files. For example before appending file name to a directory, if we trace everything on error, it will be easy to notice for example that the appended name is too long, and that the bug in the code was to forget to check for the length of the concatenated string. It is helpful, but this is sensitive data, and must never appear in logs.
In the same way:
Passwords,
IP addresses and network information (MAC address, host name, etc.)¹,
Database accesses,
Direct input from user and stored business data
must never appear in trace.
So what other types of information must be banished from the logs? Are there any guidelines already written which I can use?
¹ Obviously, I'm not talking about things as IIS or Apache logs. What I'm talking about is the sort of information which is collected with the only intent to debug the application itself, not to trace the activity of untrusted entities.
Edit: Thank you for your answers and your comments. Since my question is not too precise, I'll try to answer the questions asked in the comments:
What I'm doing with the logs?
The logs of the application may be stored in memory, which means either in plain on hard disk on localhost, in a database, again in plain, or in Windows Events. In every case, the concern is that those sources may not be safe enough. For example, when a customer runs an application and this application stores logs in plain text file in temp directory, anybody who has a physical access to the PC can read those logs.
The logs of the application may also be sent through internet. For example, if a customer has an issue with an application, we can ask her to run this application in full-trace mode and to send us the log file. Also, some application may sent automatically the crash report to us (and even if there are warnings about sensitive data, in most cases customers don't read them).
Am I talking about specific fields?
No. I'm working on general business applications only, so the only sensitive data is business data. There is nothing related to health or other fields covered by specific regulations. But thank you to talk about that, I probably should take a look about those fields for some clues about what I can include in guidelines.
Isn't it easier to encrypt the data?
No. It would make every application much more difficult, especially if we want to use C# diagnostics and TraceSource. It would also require to manage authorizations, which is not the easiest think to do. Finally, if we are talking about the logs submitted to us from a customer, we must be able to read the logs, but without having access to sensitive data. So technically, it's easier to never include sensitive information in logs at all and to never care about how and where those logs are stored.