Obfuscating logs before publishing



  • I am sometimes tempted to post entries from our logs on public fora like StackExchange to get help from the wider community in isolating root causes.

    However, I tend to refrain from doing so because the entries may reveal details about my employer. This is particularly the case for:

    • error messages in stack traces
    • email addresses, IP addresses, and algorithms named in email headers

    Do you obfuscate such details before publishing them?

    Can you recommend any automated obfuscators?



  • Whenever you publish data, you should anonymize it first. That applies not only to log files but also to e-mails, documents, and so on.

    You could try log-anon, an open-source, Python-based tool.

    This tool was designed to replace sensitive fields in customer's logs with anonymized values, while generating a lookup table. This is sometimes useful to comply with the following requirements:

    • Logs are stored for research or training purposes after the log analysis job has been completed, and no data that allows identification of the customer should be present.
    • Privacy laws (e.g. informatique et libertés) or regulations prevent the customer from handing over sensitive or personal data.

    Have a look at the examples; the tool looks promising. You should be able to write a JSON definition of everything you want to anonymize and reuse it across your log files. Be sure to manually check parts of the output before sending it out, since checking everything may be impossible for extremely large log files.
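
    To illustrate the general idea of replacing sensitive values while building a lookup table, here is a minimal sketch using only Python's standard `re` module. It is not log-anon's actual API or configuration format; the `PATTERNS` table, the `obfuscate` function, and the placeholder format are all hypothetical names chosen for this example, and the regular expressions are deliberately simple.

```python
import re

# Hypothetical patterns (assumption): extend these for the fields in your logs.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only
}


def obfuscate(lines, patterns=PATTERNS):
    """Replace sensitive values with stable placeholders.

    Returns the cleaned lines and a lookup table mapping each
    original value to the placeholder that replaced it.
    """
    lookup = {}  # original value -> placeholder

    def placeholder_for(kind, value):
        # Reuse the same placeholder for a repeated value so that
        # correlations between log lines survive the obfuscation.
        if value not in lookup:
            count = sum(1 for p in lookup.values() if p.startswith(f"<{kind}_"))
            lookup[value] = f"<{kind}_{count + 1}>"
        return lookup[value]

    cleaned = []
    for line in lines:
        for kind, pattern in patterns.items():
            line = pattern.sub(
                lambda m, k=kind: placeholder_for(k, m.group(0)), line
            )
        cleaned.append(line)
    return cleaned, lookup


if __name__ == "__main__":
    sample = [
        "login failed for alice@example.com from 10.1.2.3",
        "retry by alice@example.com from 10.1.2.4",
    ]
    cleaned, lookup = obfuscate(sample)
    for line in cleaned:
        print(line)
```

    The lookup table lets you map answers from the community back to the real values, so it must stay private. Patterns like these will miss anything they were not written for (hostnames, usernames in stack traces, internal class names), which is why a manual check of the output is still needed.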

