
MariaDB/PII

Sanitize the Wiki Data

Note: As of June 2024, the default section for new wikis is s5, but this may change in the future. If the wiki lives in a different section, adjust the socket paths below accordingly.

Go to the sanitarium hosts in both data centers and run the following command to clean up the data:

redact_sanitarium.sh -d $NEW_WIKI_NAME -S /run/mysqld/mysqld.s5.sock | mysql -S /run/mysqld/mysqld.s5.sock
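Because the default section may change, it can help to derive the socket path from the section name instead of typing s5 by hand. The following is a minimal sketch, not part of the official tooling: socket_for_section and sanitize_command are hypothetical helper names, and the sketch assumes every section follows the /run/mysqld/mysqld.<section>.sock pattern seen in the command above. It only prints the pipeline so it can be reviewed before being run.

```shell
# Hypothetical helper: build the socket path for a section, assuming the
# /run/mysqld/mysqld.<section>.sock pattern used for s5 holds elsewhere.
socket_for_section() {
    printf '/run/mysqld/mysqld.%s.sock\n' "$1"
}

# Hypothetical helper: print (do not run) the sanitization pipeline
# for a given wiki name and section.
sanitize_command() {
    wiki=$1
    sock=$(socket_for_section "$2")
    printf 'redact_sanitarium.sh -d %s -S %s | mysql -S %s\n' "$wiki" "$sock" "$sock"
}
```

Calling sanitize_command "$NEW_WIKI_NAME" s5 prints the same pipeline as the command above, with the wiki name filled in.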

Run the Private Data Check Script

Run the check_private_data.py script to identify which tables and columns need to be dropped:

check_private_data.py -S /run/mysqld/mysqld.s5.sock

Review the output. If it looks correct, pipe it directly to MySQL to drop the listed tables and columns:

check_private_data.py -S /run/mysqld/mysqld.s5.sock | mysql -S /run/mysqld/mysqld.s5.sock
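Piping directly re-runs the script, so what gets applied is a fresh run rather than the exact output that was reviewed. One possible alternative, sketched here under assumptions, is to capture the statements in a file, read the file, and then apply that same file. capture_statements is a hypothetical helper name, not part of the existing tooling.

```shell
# Hypothetical helper: run a command and store its output in a file,
# so the generated SQL can be read before anything is applied.
# Usage: capture_statements OUT_FILE COMMAND [ARGS...]
capture_statements() {
    out_file=$1
    shift
    "$@" > "$out_file"
}
```

With the helper defined, the review step might look like this (the /tmp path is only an example):

capture_statements /tmp/private_data.sql check_private_data.py -S /run/mysqld/mysqld.s5.sock
less /tmp/private_data.sql
mysql -S /run/mysqld/mysqld.s5.sock < /tmp/private_data.sql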

Verify Changes on Each clouddb Replica

Once all the steps on your side are complete, run check_private_data.py on each clouddb replica to confirm that no personally identifiable information (PII) is exposed:

check_private_data.py -S /path/to/socket
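When checking several replicas, a small wrapper can make a leftover finding harder to miss. The sketch below assumes that check_private_data.py prints nothing when there is nothing left to drop; that behaviour is an assumption and should be confirmed against the script before relying on it. output_is_empty is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if stdin contains no non-blank lines.
# Assumption: check_private_data.py prints nothing when no private data remains.
output_is_empty() {
    ! grep -q '[^[:space:]]'
}
```

It could then be used on each replica as:

check_private_data.py -S /path/to/socket | output_is_empty || echo "private data still present"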

Grant Permissions for SQL Views

Identify the clouddb hosts that serve the relevant database section (e.g., s5). Grant the labsdbuser role the required privileges by running:

GRANT SELECT, SHOW VIEW ON `NEW_WIKI_NAME_p`.* TO `labsdbuser`;

This needs to be done on all clouddb hosts that have the relevant database section.
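Since the same statement has to be run on several hosts, generating it from the wiki name avoids retyping it each time. This is a minimal sketch; grant_statement is a hypothetical helper name, and it simply substitutes the wiki name into the GRANT statement shown above.

```shell
# Hypothetical helper: print the GRANT statement for a wiki's view database.
# Single quotes keep the SQL backticks literal in the shell.
grant_statement() {
    printf 'GRANT SELECT, SHOW VIEW ON `%s_p`.* TO `labsdbuser`;\n' "$1"
}
```

On each clouddb host, the output can be reviewed and then piped to the client, for example: grant_statement "$NEW_WIKI_NAME" | mysql -S /path/to/socket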

Create the View Database

Create the view database on every clouddb host that serves the relevant database section:

CREATE DATABASE NEW_WIKI_NAME_p;
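Because the wiki name is inserted into SQL, a quick sanity check on the name before building the statement can catch typos. The sketch below assumes wiki database names consist only of lowercase letters, digits, and underscores; that rule and the helper name create_view_db_statement are assumptions made for illustration.

```shell
# Hypothetical helper: print the CREATE DATABASE statement for a wiki's
# view database, rejecting names outside the assumed character set.
# Assumption: wiki names use only lowercase letters, digits, and underscores.
create_view_db_statement() {
    case $1 in
        ''|*[!a-z0-9_]*)
            echo "invalid wiki name: $1" >&2
            return 1
            ;;
    esac
    printf 'CREATE DATABASE %s_p;\n' "$1"
}
```

A rejected name returns a non-zero status and prints nothing on standard output, so nothing reaches the client if the output is piped to mysql.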

Notify Relevant Teams

Once the database is sanitized and the view database is created, notify the cloud-services-team and relevant members that they can proceed with creating the views.

The clouddb hosts will not remain the only servers that need sanitization, as an-redacteddb1001 is scheduled to be productionized: https://orchestrator.wikimedia.org/web/cluster/alias/s1