Server Admin Log

2024-06-28

21:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
21:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
21:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:11 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:05 sukhe: sudo cumin -b11 "A:cp-text" 'run-puppet-agent'
20:29 sukhe: sudo cumin "A:cp-text" 'disable-puppet "CR 1050672"'
20:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1028.eqiad.wmnet with OS bookworm
20:20 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1029.eqiad.wmnet with OS bookworm
20:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
20:01 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
20:01 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1029.eqiad.wmnet with OS bookworm
19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1028.eqiad.wmnet with OS bookworm
19:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
19:36 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1029
19:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
19:35 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1029
19:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1028
19:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1028
19:31 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:31 jclark@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
19:31 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent --enable 'dont enable'": T368645
19:30 jclark@cumin1002: START - Cookbook sre.dns.netbox
18:22 sukhe: disable puppet on A:cp-text
18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
16:43 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2026.codfw.wmnet with OS bullseye
16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:19 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:03 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
16:03 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
15:45 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
15:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:25 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:21 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:21 andrewbogott: upgraded wikitech-static to 1_42 and php 8.3
15:14 hnowlan: homer 'cr*codfw*' commit 'T351074'
15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2027.codfw.wmnet with OS bullseye
15:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
15:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2028.codfw.wmnet with OS bullseye
15:10 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1027.eqiad.wmnet|wikikube-worker1028.eqiad.wmnet|wikikube-worker1029.eqiad.wmnet|wikikube-worker1030.eqiad.wmnet|wikikube-worker1031.eqiad.wmnet),cluster=kubernetes,service=kubesvc
15:10 claime: Pooling and uncordoning wikikube-worker1027.eqiad.wmnet,wikikube-worker1028.eqiad.wmnet,wikikube-worker1029.eqiad.wmnet,wikikube-worker1030.eqiad.wmnet,wikikube-worker1031.eqiad.wmnet - T351074
15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2029.codfw.wmnet with OS bullseye
15:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2025.codfw.wmnet with OS bullseye
15:06 jhathaway: mx-in1001 postfix mx testing complete
15:04 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:00 claime: homer 'cr*eqiad*' commit 'T351074'
14:56 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
14:54 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
14:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
14:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2029.codfw.wmnet with OS bullseye
14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2028.codfw.wmnet with OS bullseye
14:29 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2027.codfw.wmnet with OS bullseye
14:28 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2025.codfw.wmnet with OS bullseye
14:27 sukhe: sudo cumin "O:durum" "run-puppet-agent"
14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2330 to wikikube-worker2029
14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2029
14:26 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2029
14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:25 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
14:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:22 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb1002.eqiad.wmnet with OS bookworm
14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:22 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2330 to wikikube-worker2029
14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2308 to wikikube-worker2028
14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2028
14:22 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2028
14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
14:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:12 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
14:08 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:07 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2308 to wikikube-worker2028
14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2306 to wikikube-worker2027
14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2027
14:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:07 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2027
14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
14:06 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
14:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
14:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
14:04 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:03 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
14:01 jhathaway: ingressing email on mx-in1001, initial test 1hr
14:00 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:00 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2306 to wikikube-worker2027
13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2298 to wikikube-worker2025
13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2025
13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2025
13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
13:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
13:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2300 to wikikube-worker2026
13:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2300 to wikikube-worker2026
13:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
13:45 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2298 to wikikube-worker2025
13:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
13:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
13:42 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:42 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:42 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:42 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
13:42 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:42 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:41 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bookworm
13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
13:38 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
13:38 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
13:38 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
13:37 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host krb1002.eqiad.wmnet with OS bookworm
13:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
13:29 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
13:28 hnowlan: running `decommission` on 5 codfw api appservers
13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
13:24 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
13:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:12 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
13:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
13:01 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bookworm
12:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65547 and previous config saved to /var/cache/conftool/dbconfig/20240628-125926-marostegui.json
12:55 hashar@deploy1002: Finished deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291 (duration: 00m 09s)
12:55 hashar@deploy1002: Started deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291
12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
12:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
12:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
12:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
12:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
12:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65546 and previous config saved to /var/cache/conftool/dbconfig/20240628-124419-marostegui.json
12:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1004
12:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
12:21 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
12:18 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
12:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
12:15 jclark@cumin1002: START - Cookbook sre.dns.netbox
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65544 and previous config saved to /var/cache/conftool/dbconfig/20240628-121404-marostegui.json
12:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
12:05 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1031.eqiad.wmnet with OS bullseye
11:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
11:50 Dreamy_Jazz: Finished run on `medium.dblist`
11:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
11:45 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
11:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
11:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
11:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
11:38 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:29 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 44s)
11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:29 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:29 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
11:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1031.eqiad.wmnet with OS bullseye
11:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1031.eqiad.wmnet on all recursors
11:18 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1031.eqiad.wmnet on all recursors
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1450 to wikikube-worker1031
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1031
11:16 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1031
11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
11:16 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:14 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
11:13 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 25s)
11:13 Dreamy_Jazz: Running `foreachwikiindblist medium.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` for T366781. `medium.dblist` does not include `loginwiki` or `metawiki` (which are to be done later).
11:12 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
11:11 Dreamy_Jazz: `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` finished running
11:11 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1030.eqiad.wmnet on all recursors
11:11 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1030.eqiad.wmnet on all recursors
11:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1450 to wikikube-worker1031
11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1418 to wikikube-worker1030
11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1030
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:07 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
11:04 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1030
11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
11:02 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
11:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:00 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1418 to wikikube-worker1030
10:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
10:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1029.eqiad.wmnet on all recursors
10:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1029.eqiad.wmnet on all recursors
10:58 Dreamy_Jazz: Running `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
10:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1417 to wikikube-worker1029
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1029
10:56 Dreamy_Jazz: Stopped running script at `cawiki`
10:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1029
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1417 to wikikube-worker1029
10:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.eqiad.wmnet on all recursors
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.eqiad.wmnet on all recursors
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1413 to wikikube-worker1028
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
10:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
10:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1413 to wikikube-worker1028
10:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
10:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.eqiad.wmnet on all recursors
10:45 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.eqiad.wmnet on all recursors
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1412 to wikikube-worker1027
10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1027
10:44 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1027
10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
10:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1412 to wikikube-worker1027
10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
10:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve2007.codfw.wmnet
09:57 klausman@cumin2002: START - Cookbook sre.hosts.remove-downtime for ml-serve2007.codfw.wmnet
09:37 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:37 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:34 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65543 and previous config saved to /var/cache/conftool/dbconfig/20240628-082946-marostegui.json
08:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
08:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
07:54 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:52 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:45 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
00:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage

2024-06-27

23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
23:33 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
23:32 eileen: civicrm upgraded from 76c6fed8 to f9782670
23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65542 and previous config saved to /var/cache/conftool/dbconfig/20240627-231703-marostegui.json
23:05 Dreamy_Jazz: Running `foreachwikiindblist group0.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php` for T366781
23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65541 and previous config saved to /var/cache/conftool/dbconfig/20240627-230156-marostegui.json
22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65540 and previous config saved to /var/cache/conftool/dbconfig/20240627-224649-marostegui.json
22:44 eileen: civicrm upgraded from 7747a290 to 76c6fed8
22:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
22:43 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:33 eileen: civicrm upgraded from 3af41401 to 7747a290
22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65539 and previous config saved to /var/cache/conftool/dbconfig/20240627-223142-marostegui.json
22:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:19 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5024.eqsin.wmnet
22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:05 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
22:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
21:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:53 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1005
20:51 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1005
20:50 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:50 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
20:50 jhuneidi@deploy1002: Finished scap: Backport for gerrit:1050460 testwiki: Enable QuickSurveys (T368459) (duration: 14m 33s)
20:49 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
20:47 vriley@cumin1002: START - Cookbook sre.dns.netbox
20:44 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
20:40 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:40 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
20:38 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for gerrit:1050460 testwiki: Enable QuickSurveys (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:37 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:37 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:35 jhuneidi@deploy1002: Started scap: Backport for gerrit:1050460 testwiki: Enable QuickSurveys (T368459)
20:34 jhuneidi@deploy1002: Finished scap: Backport for gerrit:1050441 QuickSurveys: Add testing survey configuration (T368459) (duration: 14m 45s)
20:29 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
20:24 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:24 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:22 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for gerrit:1050441 QuickSurveys: Add testing survey configuration (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:20 jhuneidi@deploy1002: Started scap: Backport for gerrit:1050441 QuickSurveys: Add testing survey configuration (T368459)
20:17 jhuneidi@deploy1002: Finished scap: Backport for gerrit:1050432 Enable DiscussionTools permalinks on enwiki (T365974) (duration: 11m 09s)
20:16 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5023.eqsin.wmnet
20:11 jhuneidi@deploy1002: jhuneidi, kemayo: Continuing with sync
20:08 jhuneidi@deploy1002: jhuneidi, kemayo: Backport for gerrit:1050432 Enable DiscussionTools permalinks on enwiki (T365974) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 jhuneidi@deploy1002: Started scap: Backport for gerrit:1050432 Enable DiscussionTools permalinks on enwiki (T365974)
20:03 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:03 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
19:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
19:53 ottomata: deleted mw-page-content-change-enrich stuck jobmanager pod: kubectl -n mw-page-content-change-enrich delete pod flink-app-main-859d98c57b-zrgwk - T368667
19:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bookworm
19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye
19:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
19:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
19:23 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
19:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
19:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
19:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bookworm
19:07 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
18:36 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5022.eqsin.wmnet with OS bullseye
18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
18:19 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
18:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
18:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bookworm
18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.11 refs T366956
18:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bookworm
18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:10 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5022.eqsin.wmnet
18:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bookworm
18:08 ejegg: fundraising civicrm upgraded from 13a13f3a to 43fc2c89
18:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bookworm
17:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bookworm
17:51 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet
17:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
17:45 swfrench@deploy1002: Finished scap: Deploying securityContext changes for T362978 to main release (duration: 04m 09s)
17:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
17:41 swfrench@deploy1002: Started scap: Deploying securityContext changes for T362978 to main release
17:39 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
17:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:33 swfrench-wmf: canary deployments are healthy, slow-logs still produced, continuing with main deployments for T362978
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:26 hashar@deploy1002: Finished deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)" (duration: 00m 07s)
17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)"
17:26 hashar@deploy1002: deploy aborted: Revert Add image-diff JavaScript plugin (take 2) (duration: 00m 00s)
17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert Add image-diff JavaScript plugin (take 2)
17:23 swfrench@deploy1002: Finished scap: (no justification provided) (duration: 08m 03s)
17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
17:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bookworm
17:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bookworm
17:14 swfrench@deploy1002: Started scap: (no justification provided)
17:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291 (duration: 00m 07s)
17:13 hashar@deploy1002: Started deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291
17:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
17:06 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
16:55 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
16:54 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
16:50 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
16:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65537 and previous config saved to /var/cache/conftool/dbconfig/20240627-163635-arnaudb.json
16:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1022.eqiad.wmnet|wikikube-worker1023.eqiad.wmnet|wikikube-worker1024.eqiad.wmnet|wikikube-worker1025.eqiad.wmnet|wikikube-worker1026.eqiad.wmnet),cluster=kubernetes,service=kubesvc
16:35 claime: Pooling and uncordoning wikikube-worker1022.eqiad.wmnet,wikikube-worker1023.eqiad.wmnet,wikikube-worker1024.eqiad.wmnet,wikikube-worker1025.eqiad.wmnet,wikikube-worker1026.eqiad.wmnet - T351074
16:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
16:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
16:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65536 and previous config saved to /var/cache/conftool/dbconfig/20240627-162129-arnaudb.json
16:18 claime: homer 'cr*eqiad*' commit 'T351074'
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1026.eqiad.wmnet with OS bullseye
16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1025.eqiad.wmnet with OS bullseye
16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1023.eqiad.wmnet with OS bullseye
16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 50%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65535 and previous config saved to /var/cache/conftool/dbconfig/20240627-160624-arnaudb.json
16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1024.eqiad.wmnet with OS bullseye
16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1022.eqiad.wmnet with OS bullseye
16:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
16:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
16:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
15:58 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
15:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
15:56 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy2005
15:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy2005
15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 25%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65534 and previous config saved to /var/cache/conftool/dbconfig/20240627-155118-arnaudb.json
15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
15:43 hnowlan: restarted ferm on 8 failing k8s workers
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
15:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
15:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65533 and previous config saved to /var/cache/conftool/dbconfig/20240627-153613-arnaudb.json
15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1025.eqiad.wmnet with OS bullseye
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1025.eqiad.wmnet on all recursors
15:29 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1025.eqiad.wmnet on all recursors
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1373 to wikikube-worker1025
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1025
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1026.eqiad.wmnet with OS bullseye
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1026.eqiad.wmnet on all recursors
15:28 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1026.eqiad.wmnet on all recursors
15:27 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1025
15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1024.eqiad.wmnet with OS bullseye
15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1024.eqiad.wmnet on all recursors
15:27 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1024.eqiad.wmnet on all recursors
15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1023.eqiad.wmnet with OS bullseye
15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1023.eqiad.wmnet on all recursors
15:26 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1023.eqiad.wmnet on all recursors
15:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1022.eqiad.wmnet with OS bullseye
15:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1022.eqiad.wmnet on all recursors
15:25 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1022.eqiad.wmnet on all recursors
15:25 pmiazga: T367901 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=rowiki --logwiki=metawiki 'Rui_Filipe_Fernandes' '44_Gabriel'`
15:24 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
15:23 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
15:21 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1373 to wikikube-worker1025
15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1366 to wikikube-worker1024
15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 5%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65532 and previous config saved to /var/cache/conftool/dbconfig/20240627-152107-arnaudb.json
15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1024
15:20 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1024
15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
15:19 pmiazga: T368451 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Agustín_Antonio_Cardozo' 'Agustín_Cardozo_Cabrera'`
15:18 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
15:17 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1007.eqiad.wmnet
15:16 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1366 to wikikube-worker1024
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1404 to wikikube-worker1026
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1026
15:14 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1026
15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
15:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
15:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:10 cgoubert@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
15:10 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin" (duration: 00m 07s)
15:09 hashar@deploy1002: Started deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin"
15:09 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1373.eqiad.wmnet
15:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1373.eqiad.wmnet
15:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:08 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1404 to wikikube-worker1026
15:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291 (duration: 00m 07s)
15:04 hashar@deploy1002: Started deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291
15:03 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye
15:02 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master (duration: 02m 49s)
15:00 topranks: rebooting lsw1-e7-eqiad to upgrade JunOS on switch T365988
15:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1366.eqiad.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1366.eqiad.wmnet
15:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master
14:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master (duration: 03m 10s)
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1365 to wikikube-worker1023
14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1023
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
14:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
14:56 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master
14:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1023
14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
14:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
14:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
14:52 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update (duration: 00m 32s)
14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:52 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1365 to wikikube-worker1023
14:52 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update
14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1359 to wikikube-worker1022
14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1022
14:51 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: test deploy phab2002 (duration: 00m 34s)
14:50 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: test deploy phab2002
14:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1022
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
14:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
14:46 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
14:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1359 to wikikube-worker1022
14:43 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
14:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
14:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'T365988 - depool es1037', diff saved to https://phabricator.wikimedia.org/P65531 and previous config saved to /var/cache/conftool/dbconfig/20240627-143741-arnaudb.json
14:15 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
14:12 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye
13:54 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:1043043|CommonSettings: Mark REL1_42 as stable (T359850)]] (duration: 08m 10s)
13:48 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:46 urbanecm@deploy1002: Started scap: Backport for [[gerrit:1043043|CommonSettings: Mark REL1_42 as stable (T359850)]]
13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:1050309|ptwiki: Enable CommunityConfiguration (T368310)]] (duration: 08m 58s)
13:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
13:41 urbanecm@deploy1002: urbanecm: Continuing with sync
13:41 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=ptwiki --force` via mwdebug1001 (T368310)
13:40 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1003
13:39 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:1050309|ptwiki: Enable CommunityConfiguration (T368310)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host deploy1003
13:37 urbanecm@deploy1002: Started scap: Backport for [[gerrit:1050309|ptwiki: Enable CommunityConfiguration (T368310)]]
13:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:1042430|Enable local uploads for Gilaki Wikipedia (T364673)]], [[gerrit:1048855|[noop] Remove $wgRedirectScript, not used since MediaWiki 1.22]], [[gerrit:1048419|CommunityConfiguration: Log info and higher]] (duration: 10m 22s)
13:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:31 urbanecm@deploy1002: urbanecm, tgr, nmw03: Continuing with sync
13:28 urbanecm@deploy1002: urbanecm, tgr, nmw03: Backport for [[gerrit:1042430|Enable local uploads for Gilaki Wikipedia (T364673)]], [[gerrit:1048855|[noop] Remove $wgRedirectScript, not used since MediaWiki 1.22]], [[gerrit:1048419|CommunityConfiguration: Log info and higher]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:25 urbanecm@deploy1002: Started scap: Backport for [[gerrit:1042430|Enable local uploads for Gilaki Wikipedia (T364673)]], [[gerrit:1048855|[noop] Remove $wgRedirectScript, not used since MediaWiki 1.22]], [[gerrit:1048419|CommunityConfiguration: Log info and higher]]
13:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:1038742|[CheckUser] Stop writing old for event tables migration on all wikis (T360685)]], [[gerrit:1049970|testwiki: use shellbox-video for scaling video (T356241)]], [[gerrit:1049886|Add VK namespace alias to Azerbaijani Wikibooks (T368237)]] (duration: 16m 48s)
13:19 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Continuing with sync
13:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
13:10 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Backport for [[gerrit:1038742|[CheckUser] Stop writing old for event tables migration on all wikis (T360685)]], [[gerrit:1049970|testwiki: use shellbox-video for scaling video (T356241)]], [[gerrit:1049886|Add VK namespace alias to Azerbaijani Wikibooks (T368237)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:08 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
13:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:1038742|[CheckUser] Stop writing old for event tables migration on all wikis (T360685)]], [[gerrit:1049970|testwiki: use shellbox-video for scaling video (T356241)]], [[gerrit:1049886|Add VK namespace alias to Azerbaijani Wikibooks (T368237)]]
13:02 sukhe: A:dnsbox: remove 10.3.0.2/32 from /e/n/i
12:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65529 and previous config saved to /var/cache/conftool/dbconfig/20240627-125019-marostegui.json
12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
12:50 sukhe: sudo cumin 'A:dnsbox' 'rm /var/lib/dnsbox/ntp.state': remove obsolete ntp.state file
12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65528 and previous config saved to /var/cache/conftool/dbconfig/20240627-124957-marostegui.json
12:48 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
12:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
12:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
12:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
12:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
12:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65527 and previous config saved to /var/cache/conftool/dbconfig/20240627-123805-marostegui.json
12:22 jclark@cumin1002: START - Cookbook sre.dns.netbox
12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65524 and previous config saved to /var/cache/conftool/dbconfig/20240627-121942-marostegui.json
12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:12 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:12 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
12:12 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
12:11 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:10 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
12:08 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P65523 and previous config saved to /var/cache/conftool/dbconfig/20240627-120751-marostegui.json
12:07 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65522 and previous config saved to /var/cache/conftool/dbconfig/20240627-120435-marostegui.json
12:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
12:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
11:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
11:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
11:59 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
11:58 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
11:53 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65521 and previous config saved to /var/cache/conftool/dbconfig/20240627-115244-marostegui.json
11:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:49 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
11:42 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:41 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
11:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
11:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
11:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
11:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
11:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
11:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
11:35 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:35 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:34 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
11:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
11:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
11:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
11:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
11:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
11:29 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
11:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
11:25 cgoubert@deploy1002: Finished scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861 (duration: 06m 17s)
11:24 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1004.wikimedia.org with OS bookworm
11:19 cgoubert@deploy1002: Started scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861
11:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
11:03 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:51 claime: Deploying new prometheus-php-fpm-exporter, prometheus-apache-exporter to mw-on-k8s and shellbox - T283861
10:49 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2004.wikimedia.org
10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2004.wikimedia.org with OS bookworm
10:43 fabfur: re-enabling puppet on A:cp-text_ulsfo (reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050297) (T365718)
10:38 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
10:30 fabfur: correcting previous statement: puppet disabled just on A:cp-text_ulsfo
10:28 fabfur: disable puppet on A:cp-ulsfo to apply selectively https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050258 (T365718)
10:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:04 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
09:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
09:21 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2004.wikimedia.org with OS bookworm
09:14 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:13 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2004.wikimedia.org on all recursors
09:13 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2004.wikimedia.org on all recursors
09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:12 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:09 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
09:09 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2004.wikimedia.org
09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti1019.eqiad.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1019.eqiad.wmnet
08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test1004.wikimedia.org
08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test1004.wikimedia.org with OS bookworm
08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65518 and previous config saved to /var/cache/conftool/dbconfig/20240627-084043-marostegui.json
08:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
08:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1001.wikimedia.org
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1001.wikimedia.org
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2001.wikimedia.org
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:20 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65517 and previous config saved to /var/cache/conftool/dbconfig/20240627-081044-jynus.json
08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65516 and previous config saved to /var/cache/conftool/dbconfig/20240627-081016-jynus.json
08:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:59 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65515 and previous config saved to /var/cache/conftool/dbconfig/20240627-075944-jynus.json
07:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:57 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:56 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65514 and previous config saved to /var/cache/conftool/dbconfig/20240627-075620-jynus.json
07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65513 and previous config saved to /var/cache/conftool/dbconfig/20240627-075447-jynus.json
07:50 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:45 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65512 and previous config saved to /var/cache/conftool/dbconfig/20240627-074542-jynus.json
07:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:36 kartik@deploy1002: Finished scap: Backport for gerrit:1048393 Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) (duration: 08m 42s)
07:36 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
07:35 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:34 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1004.wikimedia.org on all recursors
07:34 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test1004.wikimedia.org on all recursors
07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2001.wikimedia.org
07:32 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:31 kartik@deploy1002: kcvelaga, kartik: Continuing with sync
07:30 kartik@deploy1002: kcvelaga, kartik: Backport for gerrit:1048393 Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:27 kartik@deploy1002: Started scap: Backport for gerrit:1048393 Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028)
07:24 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
07:24 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test1004.wikimedia.org
07:18 kartik@deploy1002: Finished scap: Backport for gerrit:1049898 Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) (duration: 14m 19s)
07:13 kartik@deploy1002: kartik: Continuing with sync
07:06 kartik@deploy1002: kartik: Backport for gerrit:1049898 Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 kartik@deploy1002: Started scap: Backport for gerrit:1049898 Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465)
06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1038 T368401', diff saved to https://phabricator.wikimedia.org/P65510 and previous config saved to /var/cache/conftool/dbconfig/20240627-064506-arnaudb.json
06:40 arnaudb@deploy1002: Finished scap: Backport for gerrit:1050096 Revert "mariadb: disable writes on es6" (duration: 07m 43s)
06:35 arnaudb@deploy1002: arnaudb: Continuing with sync
06:35 arnaudb@deploy1002: arnaudb: Backport for gerrit:1050096 Revert "mariadb: disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:32 arnaudb@deploy1002: Started scap: Backport for gerrit:1050096 Revert "mariadb: disable writes on es6"
06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1037 T368401', diff saved to https://phabricator.wikimedia.org/P65509 and previous config saved to /var/cache/conftool/dbconfig/20240627-062338-arnaudb.json
06:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T368401', diff saved to https://phabricator.wikimedia.org/P65508 and previous config saved to /var/cache/conftool/dbconfig/20240627-061639-arnaudb.json
06:15 arnaudb: Starting es6 eqiad failover from es1037 to es1038 - T368401
06:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T368401', diff saved to https://phabricator.wikimedia.org/P65507 and previous config saved to /var/cache/conftool/dbconfig/20240627-061055-arnaudb.json
06:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
06:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
06:09 arnaudb@deploy1002: Finished scap: Backport for gerrit:1049555 mariadb: disable writes on es6 (T368401) (duration: 08m 00s)
06:04 arnaudb@deploy1002: arnaudb: Continuing with sync
06:04 arnaudb@deploy1002: arnaudb: Backport for gerrit:1049555 mariadb: disable writes on es6 (T368401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:01 arnaudb@deploy1002: Started scap: Backport for gerrit:1049555 mariadb: disable writes on es6 (T368401)
03:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
03:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65506 and previous config saved to /var/cache/conftool/dbconfig/20240627-035544-marostegui.json
03:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65505 and previous config saved to /var/cache/conftool/dbconfig/20240627-034037-marostegui.json
03:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65504 and previous config saved to /var/cache/conftool/dbconfig/20240627-032530-marostegui.json
03:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65503 and previous config saved to /var/cache/conftool/dbconfig/20240627-031023-marostegui.json
00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65502 and previous config saved to /var/cache/conftool/dbconfig/20240627-005613-marostegui.json
00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
00:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65501 and previous config saved to /var/cache/conftool/dbconfig/20240627-005549-marostegui.json
00:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65500 and previous config saved to /var/cache/conftool/dbconfig/20240627-004042-marostegui.json
00:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65499 and previous config saved to /var/cache/conftool/dbconfig/20240627-002535-marostegui.json
00:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65498 and previous config saved to /var/cache/conftool/dbconfig/20240627-001028-marostegui.json

2024-06-26

23:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
23:26 mutante: people1004 - stopped confd which logs every 3 seconds that it can't find any templates (T356296)
23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
23:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65497 and previous config saved to /var/cache/conftool/dbconfig/20240626-231020-marostegui.json
23:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
23:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65496 and previous config saved to /var/cache/conftool/dbconfig/20240626-230958-marostegui.json
22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65495 and previous config saved to /var/cache/conftool/dbconfig/20240626-225451-marostegui.json
22:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
22:41 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5021.eqsin.wmnet
22:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65494 and previous config saved to /var/cache/conftool/dbconfig/20240626-223944-marostegui.json
22:26 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet
22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65493 and previous config saved to /var/cache/conftool/dbconfig/20240626-222434-marostegui.json
22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
21:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:40 cjming: end of UTC late backport window
21:38 cjming@deploy1002: Finished scap: Backport for gerrit:1050005 Homepage: don't load yesterdays edits on desktop (T368405) (duration: 08m 48s)
21:33 cjming@deploy1002: cjming, migr: Continuing with sync
21:32 cjming@deploy1002: cjming, migr: Backport for gerrit:1050005 Homepage: don't load yesterdays edits on desktop (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:29 cjming@deploy1002: Started scap: Backport for gerrit:1050005 Homepage: don't load yesterdays edits on desktop (T368405)
21:29 hashar: restarting CI Jenkins
21:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
21:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
21:05 cjming@deploy1002: Finished scap: Backport for gerrit:1050002 Homepage: log rendering time for each module and each wiki (T368405) (duration: 14m 01s)
20:59 eileen: config revision changed from 0b822cd3 to 994e7b81
20:57 cjming@deploy1002: cjming, migr: Continuing with sync
20:55 cjming@deploy1002: cjming, migr: Backport for gerrit:1050002 Homepage: log rendering time for each module and each wiki (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:51 cjming@deploy1002: Started scap: Backport for gerrit:1050002 Homepage: log rendering time for each module and each wiki (T368405)
20:50 jdrewniak@deploy1002: Finished scap: Backport for gerrit:1049972 Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) (duration: 08m 09s)
20:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for gerrit:1049972 Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:42 jdrewniak@deploy1002: Started scap: Backport for gerrit:1049972 Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583)
20:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: gerrit:1050007 Bumping portals to master (T128546) (duration: 06m 58s)
20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:1050007 Bumping portals to master (T128546) (duration: 07m 27s)
20:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet
20:21 cjming@deploy1002: Finished scap: Backport for gerrit:1049947 Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) (duration: 08m 46s)
20:15 cjming@deploy1002: cjming, kgraessle: Continuing with sync
20:14 cjming@deploy1002: cjming, kgraessle: Backport for gerrit:1049947 Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 cjming@deploy1002: Started scap: Backport for gerrit:1049947 Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969)
20:08 mutante: lists1001:/lib/systemd/system# rm wmf_auto_restart_apache2.* ; systemctl reset-failed - reaction to monitoring alert "FIRING: SystemdUnitFailed: wmf_auto_restart_apache2.service on lists1001:9100"
20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet
20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
19:48 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
19:40 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 02m 38s)
19:39 jhathaway@deploy1002: Started scap: (no justification provided)
19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
19:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
19:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:11 ottomata: re-enabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. /beacon/event/ redirects, which breaks the MediaWikiPingback usage - T238230
19:02 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
18:55 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
18:25 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65490 and previous config saved to /var/cache/conftool/dbconfig/20240626-182355-marostegui.json
18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65489 and previous config saved to /var/cache/conftool/dbconfig/20240626-182333-marostegui.json
18:23 sukhe@cumin1002: START - Cookbook sre.dns.netbox
18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
18:17 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
18:14 sukhe: # etcdctl --username root --endpoints https://conf1007.eqiad.wmnet:4001 rmdir /conftool/v1/pools/${site}/dnsbox/ntp: T366360
18:12 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet
18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65488 and previous config saved to /var/cache/conftool/dbconfig/20240626-180824-marostegui.json
18:07 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance. (duration: 00m 39s)
18:06 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance.
17:59 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent"
17:58 sukhe: sudo cumin -b1 -s30 "A:cp-text" "run-puppet-agent"
17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65487 and previous config saved to /var/cache/conftool/dbconfig/20240626-175317-marostegui.json
17:51 ottomata: disabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. Puppet is disabled on all cache text; will test a few at a time first. - T238230
17:46 sukhe: disable puppet in A:cp-text
17:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet
17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
17:39 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65486 and previous config saved to /var/cache/conftool/dbconfig/20240626-173810-marostegui.json
17:37 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 11s)
17:37 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
17:29 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34] (duration: 02m 54s)
17:26 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34]
17:26 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34] (duration: 04m 12s)
17:22 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34]
17:17 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
17:17 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
17:16 sukhe: re-enable puppet on A:cp-text
17:14 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1049982 Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), gerrit:1049989 Skip failing ForeignResourceStructureTest (T362425), gerrit:1049988 Skip failing ForeignResourceStructureTest (T362425), gerrit:1049984 Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) (duration: 08m 52s)
17:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
17:08 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:1049982 Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), gerrit:1049989 Skip failing ForeignResourceStructureTest (T362425), gerrit:1049988 Skip failing ForeignResourceStructureTest (T362425), gerrit:1049984 Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwd
17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
17:06 ladsgroup@deploy1002: Started scap: Backport for gerrit:1049982 Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), gerrit:1049989 Skip failing ForeignResourceStructureTest (T362425), gerrit:1049988 Skip failing ForeignResourceStructureTest (T362425), gerrit:1049984 Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)
17:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
17:01 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 09m 16s)
16:52 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
16:52 sukhe: disable puppet on A:cp-text
16:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
16:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
16:44 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
16:44 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
16:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
16:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
16:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
16:27 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 00m 29s)
16:27 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
16:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5018.eqsin.wmnet
16:15 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 33s)
16:14 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1048064'"
15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1049969'"
15:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
15:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
15:38 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:38 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
15:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:35 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
15:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
15:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
15:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
15:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
15:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:20 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
15:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:16 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
15:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:14 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1048064"'
15:12 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
15:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2002-dev.codfw.wmnet
15:07 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:06 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:04 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:00 taavi@cumin1002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
14:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2002-dev.codfw.wmnet
14:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2001-dev.codfw.wmnet
14:58 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
14:58 taavi@cumin1002: START - Cookbook sre.puppet.renew-cert for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
14:57 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
14:54 andrew@cumin1002: START - Cookbook sre.dns.netbox
14:48 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2001-dev.codfw.wmnet
14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
14:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
14:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1043812 Add shellbox-video vars/config, enable on beta (T356241) (duration: 08m 22s)
14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Continuing with sync
14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Backport for gerrit:1043812 Add shellbox-video vars/config, enable on beta (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1043812 Add shellbox-video vars/config, enable on beta (T356241)
14:21 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1049924 wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) (duration: 07m 57s)
14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for gerrit:1049924 wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1049924 wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010)
14:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:09 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
14:08 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1002: Finished scap: Backport for gerrit:1049857 CodeEditor.vue: add watcher for disabled state (T368504) (duration: 08m 00s)
14:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:06 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:06 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
14:05 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:05 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:04 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:04 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:02 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:02 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:01 jforrester@deploy1002: jforrester: Continuing with sync
14:01 jforrester@deploy1002: jforrester: Backport for gerrit:1049857 CodeEditor.vue: add watcher for disabled state (T368504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:01 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:01 claime: Deploying statsd-exporter for mw-api-int - T365265
14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
14:01 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:59 jforrester@deploy1002: Started scap: Backport for gerrit:1049857 CodeEditor.vue: add watcher for disabled state (T368504)
13:56 Lucas_WMDE: UTC afternoon backport+config window done (I might deploy a few more patches later out-of-window)
13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1049888|[u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522)]], [[gerrit:1049929|[arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532)]] (duration: 08m 49s)
13:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
13:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [[gerrit:1049888|[u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522)]], [[gerrit:1049929|[arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1049888|[u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522)]], [[gerrit:1049929|[arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532)]]
13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1048408|[ltwiki] Add a new 'rollbacker' usergroup (T367993)]] (duration: 08m 48s)
13:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [[gerrit:1048408|[ltwiki] Add a new 'rollbacker' usergroup (T367993)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1048408|[ltwiki] Add a new 'rollbacker' usergroup (T367993)]]
13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65481 and previous config saved to /var/cache/conftool/dbconfig/20240626-133239-marostegui.json
13:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
13:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65480 and previous config saved to /var/cache/conftool/dbconfig/20240626-133216-marostegui.json
13:31 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
13:30 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ mwscript-k8s namespaceDupes maiwiki -- --fix # T363667, 0 pages/links to fix, i.e. no-op
13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1049667 Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), gerrit:1031533 maiwiki: Remove 'CA' namespace alias (T363667) (duration: 10m 50s)
13:28 elukey@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
13:23 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for gerrit:1049667 Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), gerrit:1031533 maiwiki: Remove 'CA' namespace alias (T363667) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1049667 Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), gerrit:1031533 maiwiki: Remove 'CA' namespace alias (T363667)
13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65479 and previous config saved to /var/cache/conftool/dbconfig/20240626-131709-marostegui.json
13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1038741|[CheckUser] Stop writing old for event tables migration on group1 (T360685)]] (duration: 12m 09s)
13:13 elukey: reload nginx on registry* nodes (Docker registry) to pick up new logging changes
13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Backport for [[gerrit:1038741|[CheckUser] Stop writing old for event tables migration on group1 (T360685)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1038741|[CheckUser] Stop writing old for event tables migration on group1 (T360685)]]
13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65478 and previous config saved to /var/cache/conftool/dbconfig/20240626-130201-marostegui.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65476 and previous config saved to /var/cache/conftool/dbconfig/20240626-124654-marostegui.json
12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65471 and previous config saved to /var/cache/conftool/dbconfig/20240626-121158-marostegui.json
12:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
12:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65470 and previous config saved to /var/cache/conftool/dbconfig/20240626-121136-marostegui.json
11:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65469 and previous config saved to /var/cache/conftool/dbconfig/20240626-115628-marostegui.json
11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:52 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:52 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:51 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:51 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:43 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65468 and previous config saved to /var/cache/conftool/dbconfig/20240626-114121-marostegui.json
11:41 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:40 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:39 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:39 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:35 moritzm: installing emacs security updates
11:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:26 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:26 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65467 and previous config saved to /var/cache/conftool/dbconfig/20240626-112614-marostegui.json
11:24 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:23 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
11:19 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 fully T363812', diff saved to https://phabricator.wikimedia.org/P65466 and previous config saved to /var/cache/conftool/dbconfig/20240626-111934-jynus.json
11:14 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:13 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
11:07 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:07 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
10:39 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 at 50% T363812', diff saved to https://phabricator.wikimedia.org/P65465 and previous config saved to /var/cache/conftool/dbconfig/20240626-103933-jynus.json
10:25 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 after backup T363812', diff saved to https://phabricator.wikimedia.org/P65464 and previous config saved to /var/cache/conftool/dbconfig/20240626-102523-jynus.json
10:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:13 claime: enabling puppet on cp-text - T367949
10:04 claime: enabling puppet on cp4037 - T367949
10:02 claime: disabling puppet on cp-text - T367949
09:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:55 slyngs: Update idp.wikimedia.org to CAS 6.6.15.2 (T368503)
09:50 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:48 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
09:46 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
09:44 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:38 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
09:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1002.wikimedia.org with OS bookworm
08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65463 and previous config saved to /var/cache/conftool/dbconfig/20240626-085511-root.json
08:44 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
08:42 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
08:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
08:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419 (duration: 00m 43s)
08:39 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419
08:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65462 and previous config saved to /var/cache/conftool/dbconfig/20240626-083733-marostegui.json
08:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
08:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65461 and previous config saved to /var/cache/conftool/dbconfig/20240626-083711-marostegui.json
08:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419 (duration: 00m 48s)
08:31 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419
08:25 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1002.wikimedia.org with OS bookworm
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65460 and previous config saved to /var/cache/conftool/dbconfig/20240626-082204-marostegui.json
08:11 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65458 and previous config saved to /var/cache/conftool/dbconfig/20240626-081130-jynus.json
08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1023 as es5 master - this is a NOOP', diff saved to https://phabricator.wikimedia.org/P65457 and previous config saved to /var/cache/conftool/dbconfig/20240626-081014-marostegui.json
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65456 and previous config saved to /var/cache/conftool/dbconfig/20240626-080657-marostegui.json
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Fix weights for es2021 and es2024', diff saved to https://phabricator.wikimedia.org/P65455 and previous config saved to /var/cache/conftool/dbconfig/20240626-080649-marostegui.json
07:59 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1022 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65454 and previous config saved to /var/cache/conftool/dbconfig/20240626-075946-jynus.json
07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 100% load', diff saved to https://phabricator.wikimedia.org/P65453 and previous config saved to /var/cache/conftool/dbconfig/20240626-075428-jynus.json
07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65451 and previous config saved to /var/cache/conftool/dbconfig/20240626-075043-marostegui.json
07:44 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
07:33 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 50% load', diff saved to https://phabricator.wikimedia.org/P65449 and previous config saved to /var/cache/conftool/dbconfig/20240626-073304-jynus.json
07:28 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 with low load for warmup', diff saved to https://phabricator.wikimedia.org/P65448 and previous config saved to /var/cache/conftool/dbconfig/20240626-072810-jynus.json
07:03 moritzm: installing emacs security updates
06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 - running 10.11 with minium weight T365805', diff saved to https://phabricator.wikimedia.org/P65447 and previous config saved to /var/cache/conftool/dbconfig/20240626-065636-marostegui.json
06:52 marostegui: Enable slow query log on db2136 running 10.11 T365805
06:39 marostegui: Install mariadb 10.11 on s4 db2136 (depooled for now) T365805
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65446 and previous config saved to /var/cache/conftool/dbconfig/20240626-063109-root.json
06:01 marostegui: dbmaint eqiad Drop ipblocks in s1 T367632
05:59 marostegui: dbmaint eqiad Drop ipblocks in s3 T367632
05:57 marostegui: dbmaint eqiad Drop ipblocks in s4 T367632
05:39 ryankemper: [Elastic] `curl -s -X POST https://search.svc.eqiad.wmnet:9243/_cluster/reroute?retry_failed=true` did the trick. Shard initializing, cluster should be back to green soon enough
05:36 ryankemper: [Elastic] One unassigned shard; cluster status yellow. Not a big deal, looks like `shard has exceeded the maximum number of retries [5] on failed allocation attempts`, I'll try a manual `/_cluster/reroute?retry_failed=true`
05:01 marostegui: dbmaint eqiad Drop ipblocks in s5 T367632
04:53 marostegui: dbmaint eqiad Drop ipblocks in s2 T367632
04:51 marostegui: dbmaint eqiad Drop ipblocks in s8 T367632
03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65445 and previous config saved to /var/cache/conftool/dbconfig/20240626-033955-marostegui.json
03:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65444 and previous config saved to /var/cache/conftool/dbconfig/20240626-033933-marostegui.json
03:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65443 and previous config saved to /var/cache/conftool/dbconfig/20240626-032426-marostegui.json
03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65442 and previous config saved to /var/cache/conftool/dbconfig/20240626-030919-marostegui.json
02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65441 and previous config saved to /var/cache/conftool/dbconfig/20240626-025412-marostegui.json
00:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65440 and previous config saved to /var/cache/conftool/dbconfig/20240626-002103-marostegui.json
00:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
00:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65439 and previous config saved to /var/cache/conftool/dbconfig/20240626-002041-marostegui.json
00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65438 and previous config saved to /var/cache/conftool/dbconfig/20240626-000534-marostegui.json

2024-06-25

23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65437 and previous config saved to /var/cache/conftool/dbconfig/20240625-235027-marostegui.json
23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65436 and previous config saved to /var/cache/conftool/dbconfig/20240625-233520-marostegui.json
23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
23:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
23:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
22:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
22:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65435 and previous config saved to /var/cache/conftool/dbconfig/20240625-224249-marostegui.json
22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65434 and previous config saved to /var/cache/conftool/dbconfig/20240625-224226-marostegui.json
22:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65433 and previous config saved to /var/cache/conftool/dbconfig/20240625-222719-marostegui.json
22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65432 and previous config saved to /var/cache/conftool/dbconfig/20240625-221212-marostegui.json
22:10 bvibber: a webVideoTranscode job reported 'No space left on device' from a failed ffmpeg run on mw1446 recently
22:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
22:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65431 and previous config saved to /var/cache/conftool/dbconfig/20240625-215705-marostegui.json
21:47 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
20:44 cjming: end of UTC late backport window
20:41 cjming@deploy1002: Finished scap: Backport for gerrit:1043880 Cleanup: Remove wgNavigationTimingSurveyName (T367128) (duration: 08m 29s)
20:36 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
20:35 cjming@deploy1002: jdlrobson, cjming: Backport for gerrit:1043880 Cleanup: Remove wgNavigationTimingSurveyName (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:32 cjming@deploy1002: Started scap: Backport for gerrit:1043880 Cleanup: Remove wgNavigationTimingSurveyName (T367128)
20:31 cjming@deploy1002: Finished scap: Backport for gerrit:1041250 Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) (duration: 15m 04s)
20:26 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
20:19 cjming@deploy1002: jdlrobson, cjming: Backport for gerrit:1041250 Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 cjming@deploy1002: Started scap: Backport for gerrit:1041250 Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373)
20:14 cjming@deploy1002: Finished scap: Backport for gerrit:1049608 Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) (duration: 08m 36s)
20:11 Emperor: restart swift-proxy on ms-fe2010 ms-fe1011 T360913
20:09 cjming@deploy1002: cjming, bvibber: Continuing with sync
20:08 cjming@deploy1002: cjming, bvibber: Backport for gerrit:1049608 Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 cjming@deploy1002: Started scap: Backport for gerrit:1049608 Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433)
20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
20:01 hashar@deploy1002: Finished deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092 (duration: 00m 07s)
20:01 hashar@deploy1002: Started deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092
19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65430 and previous config saved to /var/cache/conftool/dbconfig/20240625-192947-marostegui.json
19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65429 and previous config saved to /var/cache/conftool/dbconfig/20240625-192910-marostegui.json
19:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
19:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
19:23 sukhe: re-enable puppet on lvs2011
19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65428 and previous config saved to /var/cache/conftool/dbconfig/20240625-191403-marostegui.json
18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65426 and previous config saved to /var/cache/conftool/dbconfig/20240625-185856-marostegui.json
18:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
18:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
18:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65425 and previous config saved to /var/cache/conftool/dbconfig/20240625-184349-marostegui.json
18:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
18:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
18:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
18:06 topranks: bringing up link from ssw1-a1-codfw to ssw1-d1-codfw T364095
17:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:55 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:51 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:44 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:43 brett: Re-re-pooling lvs2011 - T368165
17:37 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
17:36 brett: Depooling lvs2011 due to elevated socket/tcp errors - T368165
17:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
17:28 brett: Pooling lvs2011 - T368165
17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65424 and previous config saved to /var/cache/conftool/dbconfig/20240625-172502-marostegui.json
17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65423 and previous config saved to /var/cache/conftool/dbconfig/20240625-172440-marostegui.json
17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65422 and previous config saved to /var/cache/conftool/dbconfig/20240625-170933-marostegui.json
17:06 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:01 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65421 and previous config saved to /var/cache/conftool/dbconfig/20240625-165426-marostegui.json
16:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65420 and previous config saved to /var/cache/conftool/dbconfig/20240625-163919-marostegui.json
16:37 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65419 and previous config saved to /var/cache/conftool/dbconfig/20240625-163330-arnaudb.json
16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1437.eqiad.wmnet
16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1437.eqiad.wmnet
16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
16:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
16:23 bvibber: running requeueTranscodes for missing audio files on commons (mwmaint1002) cf T368364
16:23 claime: depooling mw1437
16:19 claime: cleaning up shellbox leftover files on mw1437.eqiad.wmnet
16:19 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65418 and previous config saved to /var/cache/conftool/dbconfig/20240625-161824-arnaudb.json
16:15 claime: Extending vg-srv on mw1437
16:10 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728 (duration: 00m 39s)
16:10 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728
16:09 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728 (duration: 00m 33s)
16:08 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728
16:05 brennen: silencing phabricator hosts prior to deploy
16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 50%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65417 and previous config saved to /var/cache/conftool/dbconfig/20240625-160318-arnaudb.json
15:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65415 and previous config saved to /var/cache/conftool/dbconfig/20240625-153307-arnaudb.json
15:31 Dreamy_Jazz: Ran `mwscript extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --wiki=testwiki` for T366781
15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
15:20 claime: Deploying statsd to mw-api-ext - T365265
15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65414 and previous config saved to /var/cache/conftool/dbconfig/20240625-151802-arnaudb.json
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392 (duration: 00m 50s)
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392
15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392 (duration: 00m 33s)
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:00 topranks: rebooting lsw1-e5-eqiad to upgrade JunOS on switch T365986
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
14:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
14:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T365986 - depool es1035', diff saved to https://phabricator.wikimedia.org/P65413 and previous config saved to /var/cache/conftool/dbconfig/20240625-145558-arnaudb.json
14:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
14:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:45 urbanecm@deploy1002: Finished scap: Backport for gerrit:1049538 WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 11m 45s)
14:40 urbanecm@deploy1002: urbanecm: Continuing with sync
14:40 urbanecm@deploy1002: urbanecm: Backport for gerrit:1049538 WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
14:35 sukhe: sudo cumin -b1 -s900 "A:dnsbox" "run-puppet-agent --enable 'rolling out CR 1049165' && systemctl restart ntp.service"
14:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
14:33 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-eqiad - T364383
14:33 urbanecm@deploy1002: Started scap: Backport for gerrit:1049538 WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
14:30 vgutierrez: disable puppet on A:cp-eqiad before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049570 - T364383
14:24 dcausse: re-indexing all wikidata entity schemas (T368010)
14:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:17 urbanecm@deploy1002: Finished scap: Backport for gerrit:1049534 Add change tag "Community Configuration" (T366989), gerrit:1049535 Add change tag "Community Configuration" (T366989), gerrit:1049539 WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 58m 28s)
14:15 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1049165"'
14:12 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:11 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
14:10 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
14:09 urbanecm@deploy1002: urbanecm: Continuing with sync
14:05 sukhe: restart pybal on lvs2014
14:02 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
14:02 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
14:01 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
14:01 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
14:00 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
14:00 urbanecm@deploy1002: urbanecm: Backport for gerrit:1049534 Add change tag "Community Configuration" (T366989), gerrit:1049535 Add change tag "Community Configuration" (T366989), gerrit:1049539 WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:59 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
13:59 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
13:54 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
13:51 sukhe: restart pybal on lvs1020
13:44 sukhe: disable puppet on A:lvs and A:codfw for CR 1049560
13:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for krb1002 - jclark@cumin1002"
13:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for krb1002 - jclark@cumin1002"
13:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:37 mvernon@cumin2002: conftool action : set/pooled=yes; selector: cluster=apus
13:36 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
13:36 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
13:29 vgutierrez: IPIP encapsulation enabled on ldap-ro.eqiad.wikimedia.org - T367861
13:26 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367861
13:18 urbanecm@deploy1002: Started scap: Backport for gerrit:1049534 Add change tag "Community Configuration" (T366989), gerrit:1049535 Add change tag "Community Configuration" (T366989), gerrit:1049539 WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
13:07 fabfur: temporarily disabled puppet on cp4037 to test benthos configuration (T367756)
12:51 cgoubert@deploy1002: Finished scap: Deploy udp2log rate-limiting - T365655 - T368098 (duration: 05m 49s)
12:46 cgoubert@deploy1002: Started scap: Deploy udp2log rate-limiting - T365655 - T368098
12:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:12 XioNoX: push NTP changes on pfw3
12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65411 and previous config saved to /var/cache/conftool/dbconfig/20240625-120926-marostegui.json
12:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
12:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:58 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-esams - T364383
11:56 vgutierrez: disable puppet on A:cp-esams before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049529 - T364383
11:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:40 marostegui: m2 dbmaint eqiad Stop db1217:3322 to clone db1228 T368374
10:12 jmm@deploy1002: Finished scap: (no justification provided) (duration: 03m 30s)
10:11 jmm@deploy1002: Started scap: (no justification provided)
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
09:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
09:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
09:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
09:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1228 from dbctl T368374', diff saved to https://phabricator.wikimedia.org/P65409 and previous config saved to /var/cache/conftool/dbconfig/20240625-093454-marostegui.json
09:34 slyngs: Switching idp-test.wikimedia.org to CAS 7
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1228 T368374', diff saved to https://phabricator.wikimedia.org/P65408 and previous config saved to /var/cache/conftool/dbconfig/20240625-093221-root.json
08:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
08:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
08:32 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2025', diff saved to https://phabricator.wikimedia.org/P65407 and previous config saved to /var/cache/conftool/dbconfig/20240625-083216-jynus.json
08:31 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
08:31 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
08:26 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2022', diff saved to https://phabricator.wikimedia.org/P65406 and previous config saved to /var/cache/conftool/dbconfig/20240625-082649-jynus.json
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
07:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
07:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65405 and previous config saved to /var/cache/conftool/dbconfig/20240625-071855-marostegui.json
07:14 marostegui: Optimize pagelinks on old s8 codfw master db2165 dbmaint T364069
07:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
07:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65404 and previous config saved to /var/cache/conftool/dbconfig/20240625-070348-marostegui.json
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2165 T368355', diff saved to https://phabricator.wikimedia.org/P65403 and previous config saved to /var/cache/conftool/dbconfig/20240625-070252-marostegui.json
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2161 to s8 primary T368355', diff saved to https://phabricator.wikimedia.org/P65402 and previous config saved to /var/cache/conftool/dbconfig/20240625-070127-marostegui.json
07:01 marostegui: Starting s8 codfw failover from db2165 to db2161 - T368355
07:00 arnaudb@deploy1002: Finished scap: Backport for gerrit:1049386 Revert "dbconfig: temporary disable writes on es7" (duration: 07m 47s)
06:55 arnaudb@deploy1002: arnaudb: Continuing with sync
06:55 arnaudb@deploy1002: arnaudb: Backport for gerrit:1049386 Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:54 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
06:52 arnaudb@deploy1002: Started scap: Backport for gerrit:1049386 Revert "dbconfig: temporary disable writes on es7"
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65401 and previous config saved to /var/cache/conftool/dbconfig/20240625-064841-marostegui.json
06:45 arnaudb@deploy1002: Sync cancelled.
06:45 arnaudb@deploy1002: arnaudb: Backport for gerrit:1049386 Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:42 arnaudb@deploy1002: Started scap: Backport for gerrit:1049386 Revert "dbconfig: temporary disable writes on es7"
06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'T368020', diff saved to https://phabricator.wikimedia.org/P65400 and previous config saved to /var/cache/conftool/dbconfig/20240625-064000-arnaudb.json
06:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2161 with weight 0 T368355', diff saved to https://phabricator.wikimedia.org/P65399 and previous config saved to /var/cache/conftool/dbconfig/20240625-063908-root.json
06:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
06:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T368020', diff saved to https://phabricator.wikimedia.org/P65398 and previous config saved to /var/cache/conftool/dbconfig/20240625-063453-arnaudb.json
06:33 arnaudb: Starting es7 eqiad failover from es1035 to es1039 - T368020
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65397 and previous config saved to /var/cache/conftool/dbconfig/20240625-063334-marostegui.json
06:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T368020', diff saved to https://phabricator.wikimedia.org/P65396 and previous config saved to /var/cache/conftool/dbconfig/20240625-062640-arnaudb.json
06:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
06:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
06:24 arnaudb@deploy1002: Finished scap: Backport for gerrit:1047910 dbconfig: temporary disable writes on es7 (T368020) (duration: 18m 47s)
06:19 arnaudb@deploy1002: arnaudb: Continuing with sync
06:17 arnaudb@deploy1002: arnaudb: Backport for gerrit:1047910 dbconfig: temporary disable writes on es7 (T368020) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:11 marostegui: Drop ipblocks from s7 T367632
06:05 arnaudb@deploy1002: Started scap: Backport for gerrit:1047910 dbconfig: temporary disable writes on es7 (T368020)
06:02 marostegui: Drop ipblocks from s6 T367632
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65395 and previous config saved to /var/cache/conftool/dbconfig/20240625-053312-marostegui.json
05:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
05:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65394 and previous config saved to /var/cache/conftool/dbconfig/20240625-053239-marostegui.json
05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.8 (duration: 00m 55s)
03:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.11 refs T366956 (duration: 52m 19s)
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.11 refs T366956
01:48 brett: Running authdns-update on dns1004 to pool eqsin - T365763
01:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=eqsin
01:40 brett: Removing downtime for cp[5017-5024] as nvme drives are installed and hosts back online - T365763
00:43 sukhe: [correction of command] sudo pkill ffmpeg: mw1438, high CPU usage, ffmpeg processes
00:43 sukhe: sudo pkill mpeg: mw1438, high CPU usage, ffmpeg processes
00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: T365763
00:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on 8 hosts with reason: T365763

2024-06-24

23:02 brett: Running authdns-update on dns1004 to depool eqsin - T365763
23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2003.codfw.wmnet
23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:57 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:53 cwhite@cumin2002: START - Cookbook sre.dns.netbox
22:46 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2003.codfw.wmnet
22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2002.codfw.wmnet
22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:41 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:38 cwhite@cumin2002: START - Cookbook sre.dns.netbox
22:26 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2002.codfw.wmnet
22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2001.codfw.wmnet
22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:24 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:18 cwhite@cumin2002: START - Cookbook sre.dns.netbox
22:11 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2001.codfw.wmnet
21:34 inflatador: bking@alert1001 uninstall deb pkg `ripgrep` T368107
21:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-redacteddb1001.eqiad.wmnet
21:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
20:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
20:36 inflatador: bking@alert1001 install `ripgrep` deb pkg T368107
20:22 ladsgroup@deploy1002: Synchronized php-1.43.0-wmf.10/includes/libs/rdbms/loadbalancer/LoadBalancer.php: (no justification provided) (duration: 11m 04s)
20:21 mutante: snapshot1017 - systemctl mask commonsrdf-dump; systemctl mask commonsjson-dump T368098
20:18 taavi: taavi@snapshot1017 ~ $ sudo systemctl stop commons*.service
20:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm
19:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:08 mutante: LDAP - added daphnesmit to group 'wmf' - Phabricator: added dsmit-wmf to WMF-NDA group T368140
19:02 sukhe: ms-fe1009: restart swift-proxy: T360913
18:59 mutante: ms-fe1011 - restarted swift-proxy
18:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
18:52 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for 15 hosts
18:50 eevans@cumin1002: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:50 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:50 sukhe: sudo cumin -s1 -b60 'ms-fe1010*,ms-fe1013*' 'systemctl restart swift-proxy'
18:50 mutante: ms-fe1010,ms-fe1013 - restart swift-proxy - T360913
18:48 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotate ChronologyProtector secret (duration: 11m 33s)
18:46 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:43 ladsgroup@deploy1002: ladsgroup: Continuing with sync
18:41 ladsgroup@deploy1002: ladsgroup: Rotate ChronologyProtector secret synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm
18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360913
18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360931
18:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
18:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
18:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
18:05 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
18:04 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
18:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:57 sukhe: restart pybal on lvs1019
17:56 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
17:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=eqiad
17:50 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
17:50 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
17:49 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
17:48 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:48 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:48 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:48 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
17:47 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
17:47 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
17:47 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:47 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:46 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:46 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:46 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:46 sbassett@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:46 sbassett@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
17:45 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
17:43 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=codfw
17:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bookworm
17:28 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:27 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:23 sukhe: restart pybal on lvs2013
17:20 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:19 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:18 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:13 sukhe: restart pybal on lvs1020 and lvs1019
17:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
16:51 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
16:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
16:48 sukhe: restart pybal on lvs1020
16:47 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
16:44 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
16:41 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
16:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bookworm
16:20 sukhe: restart pybal on lvs1020
16:01 dancy@deploy1002: Installation of scap version "4.89.0" completed for 1 hosts
16:00 dancy@deploy1002: Installing scap version "4.89.0" for 1 hosts
15:59 sukhe: restart pybal on lvs1020
15:59 dancy@deploy1002: Installing scap version "4.89.0" for 248 hosts
15:57 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:50 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:49 sukhe: restart pybal on lvs1020
15:43 vgutierrez: updated termination_state cache haproxy metrics, expect higher CD and CR rates - T367963
15:42 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:29 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
15:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
15:20 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
15:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
15:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
15:16 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
15:16 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
15:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
15:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
15:11 mvernon@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
15:11 claime: Enabling statsd-exporter on mw-jobrunner - T365265
15:11 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
15:09 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-drmrs - T364383
15:08 Emperor: enable/run puppet on eqiad lvs for apus LVS rollout T279621
15:08 Dreamy_Jazz: Afternoon UTC backport window done
15:08 dreamyjazz@deploy1002: Finished scap: Backport for gerrit:1010953 extension-list: Add IPReputation (T360067) (duration: 30m 37s)
15:07 vgutierrez: [fixed url] disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049198 - T364383
15:06 vgutierrez: disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049104 - T364383
15:02 sukhe: restart pybal on lvs2014
15:01 mvernon@cumin1002: END (ERROR) - Cookbook sre.loadbalancer.restart-pybal (exit_code=97) rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
15:00 dreamyjazz@deploy1002: kharlan, dreamyjazz: Continuing with sync
14:57 dreamyjazz@deploy1002: kharlan, dreamyjazz: Backport for gerrit:1010953 extension-list: Add IPReputation (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:56 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
14:53 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
14:52 Emperor: enable/run puppet on codfw lvs for apus LVS rollout T279621
14:49 Emperor: stop puppet on eqiad/codfw lvs prior to apus LVS rollout T279621
14:48 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
14:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
14:47 elukey: depool cp4052 to deploy a new version of glibc - T367978
14:47 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
14:37 dreamyjazz@deploy1002: Started scap: Backport for gerrit:1010953 extension-list: Add IPReputation (T360067)
14:34 urbanecm@deploy1002: Finished scap: Backport for gerrit:1049188 ptwiki: Undeploy CommunityConfiguration (T368121) (duration: 07m 31s)
14:32 mnz@deploy1002: Finished deploy [airflow-dags/research@b682892]: (no justification provided) (duration: 00m 31s)
14:31 mnz@deploy1002: Started deploy [airflow-dags/research@b682892]: (no justification provided)
14:27 urbanecm@deploy1002: Started scap: Backport for gerrit:1049188 ptwiki: Undeploy CommunityConfiguration (T368121)
14:27 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
14:17 mvernon@cumin1002: conftool action : set/pooled=yes:weight=40; selector: cluster=apus
14:13 urbanecm@deploy1002: Finished scap: Backport for gerrit:1049126 Growth: Enable CommunityConfiguration at idwiki (T366629), gerrit:1049127 Growth: Enable CommunityConfiguration on round 1 wikis (T368121), gerrit:1048443 AX Language selector entrypoint: Fix AX URL (T363183) (duration: 25m 37s)
14:07 urbanecm@deploy1002: kartik, urbanecm: Continuing with sync
14:03 sukhe: running homer in cr*{eqiad*,codfw*} to remove ntp.anycast.wmnet from policies/cr-labs: T366360
13:57 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
13:56 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
13:49 urbanecm@deploy1002: kartik, urbanecm: Backport for gerrit:1049126 Growth: Enable CommunityConfiguration at idwiki (T366629), gerrit:1049127 Growth: Enable CommunityConfiguration on round 1 wikis (T368121), gerrit:1048443 AX Language selector entrypoint: Fix AX URL (T363183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:47 urbanecm@deploy1002: Started scap: Backport for gerrit:1049126 Growth: Enable CommunityConfiguration at idwiki (T366629), gerrit:1049127 Growth: Enable CommunityConfiguration on round 1 wikis (T368121), gerrit:1048443 AX Language selector entrypoint: Fix AX URL (T363183)
13:47 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
13:44 urbanecm@deploy1002: Finished scap: Backport for gerrit:1048014 mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), gerrit:1039597 CommonSettings: Restore the original behaviour of Reference Previews (T366419), gerrit:1049153 [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) (duration: 35m 30s)
13:41 vgutierrez: disable puppet on A:cp-magru before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049178 - T364383
13:40 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-magru - T364383
13:40 elukey: [correction] depool cp4037 to deploy a new version of glibc - T367978
13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
13:39 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
13:39 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:37 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Continuing with sync
13:35 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
13:35 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
13:32 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4034.ulsfo.wmnet
13:31 elukey: depool cp4034 to deploy a new version of glibc - T367978
13:29 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:29 elukey: uploaded debmonitor-client_0.4.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
13:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:24 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Backport for gerrit:1048014 mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), gerrit:1039597 CommonSettings: Restore the original behaviour of Reference Previews (T366419), gerrit:1049153 [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:22 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 32s)
13:22 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
13:08 urbanecm@deploy1002: Started scap: Backport for gerrit:1048014 mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), gerrit:1039597 CommonSettings: Restore the original behaviour of Reference Previews (T366419), gerrit:1049153 [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258)
12:53 vgutierrez: IPIP encapsulation enabled on ldap-ro.codfw.wikimedia.org. - T367861
12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2012 - T367861
12:23 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:21 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:17 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,name=mw1364.eqiad.wmnet
12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,name=mw2276.codfw.wmnet
12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver,name=mw1398.eqiad.wmnet
12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,name=mw2299.codfw.wmnet
12:13 moritzm: installing pymysql security updates
12:11 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
12:10 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
12:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
12:09 claime: Downtiming all legacy api_appserver and appserver - T368058
12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=api_appserver
12:07 claime: Setting all legacy api_appservers to inactive - T368058
12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=appserver
12:06 claime: Setting all legacy appservers to inactive - T368058
12:05 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:04 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:01 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
11:59 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
11:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2360.codfw.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2360.codfw.wmnet
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2358.codfw.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2358.codfw.wmnet
11:52 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2339.codfw.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2339.codfw.wmnet
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1406.eqiad.wmnet
11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1406.eqiad.wmnet
11:51 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1403.eqiad.wmnet
11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1403.eqiad.wmnet
11:51 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
11:50 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
11:49 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
11:49 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
11:46 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
11:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
11:44 moritzm: installing php8.2 security updates
11:42 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
11:26 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:13 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:13 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
11:01 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
10:58 btullis@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
10:50 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
10:41 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=videoscaler
10:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1420.eqiad.wmnet
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1420.eqiad.wmnet
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1407.eqiad.wmnet
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1407.eqiad.wmnet
10:39 claime: pooling mw1420.eqiad.wmnet,mw1407.eqiad.wmnet as videoscalers - T368058
10:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1407.eqiad.wmnet with OS buster
10:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1420.eqiad.wmnet with OS buster
10:14 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
10:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
09:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
09:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
09:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
09:45 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
09:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
09:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1420.eqiad.wmnet with OS buster
09:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
09:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1407.eqiad.wmnet with OS buster
09:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
09:34 claime: Reimaging scap::proxies, mediawiki deployments may be unavailable - T368058
09:33 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
09:33 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
09:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:20 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for checker.tools.wmflabs.org
09:20 taavi@cumin1002: START - Cookbook sre.hosts.remove-downtime for checker.tools.wmflabs.org
09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
09:17 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
09:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65388 and previous config saved to /var/cache/conftool/dbconfig/20240624-050309-marostegui.json
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65387 and previous config saved to /var/cache/conftool/dbconfig/20240624-044802-marostegui.json
04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65386 and previous config saved to /var/cache/conftool/dbconfig/20240624-043254-marostegui.json
04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65385 and previous config saved to /var/cache/conftool/dbconfig/20240624-041747-marostegui.json
01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65384 and previous config saved to /var/cache/conftool/dbconfig/20240624-015859-marostegui.json
01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65383 and previous config saved to /var/cache/conftool/dbconfig/20240624-015836-marostegui.json
01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65382 and previous config saved to /var/cache/conftool/dbconfig/20240624-014329-marostegui.json
01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65381 and previous config saved to /var/cache/conftool/dbconfig/20240624-012822-marostegui.json
01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65380 and previous config saved to /var/cache/conftool/dbconfig/20240624-011315-marostegui.json

2024-06-23

22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65379 and previous config saved to /var/cache/conftool/dbconfig/20240623-225008-marostegui.json
22:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
22:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65378 and previous config saved to /var/cache/conftool/dbconfig/20240623-224946-marostegui.json
22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65377 and previous config saved to /var/cache/conftool/dbconfig/20240623-223439-marostegui.json
22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65376 and previous config saved to /var/cache/conftool/dbconfig/20240623-221932-marostegui.json
22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65375 and previous config saved to /var/cache/conftool/dbconfig/20240623-220426-marostegui.json
19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65374 and previous config saved to /var/cache/conftool/dbconfig/20240623-193306-marostegui.json
19:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65373 and previous config saved to /var/cache/conftool/dbconfig/20240623-193244-marostegui.json
19:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65372 and previous config saved to /var/cache/conftool/dbconfig/20240623-191737-marostegui.json
19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65371 and previous config saved to /var/cache/conftool/dbconfig/20240623-190230-marostegui.json
18:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65370 and previous config saved to /var/cache/conftool/dbconfig/20240623-184722-marostegui.json
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65369 and previous config saved to /var/cache/conftool/dbconfig/20240623-161243-marostegui.json
16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65368 and previous config saved to /var/cache/conftool/dbconfig/20240623-161221-marostegui.json
15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65367 and previous config saved to /var/cache/conftool/dbconfig/20240623-155714-marostegui.json
15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65366 and previous config saved to /var/cache/conftool/dbconfig/20240623-154207-marostegui.json
15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65365 and previous config saved to /var/cache/conftool/dbconfig/20240623-152700-marostegui.json
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65364 and previous config saved to /var/cache/conftool/dbconfig/20240623-124522-marostegui.json
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65363 and previous config saved to /var/cache/conftool/dbconfig/20240623-124459-marostegui.json
12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65362 and previous config saved to /var/cache/conftool/dbconfig/20240623-122952-marostegui.json
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65361 and previous config saved to /var/cache/conftool/dbconfig/20240623-121445-marostegui.json
11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65360 and previous config saved to /var/cache/conftool/dbconfig/20240623-115938-marostegui.json
11:06 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
11:06 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65359 and previous config saved to /var/cache/conftool/dbconfig/20240623-092833-marostegui.json
09:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
09:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65358 and previous config saved to /var/cache/conftool/dbconfig/20240623-092811-marostegui.json
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65357 and previous config saved to /var/cache/conftool/dbconfig/20240623-091304-marostegui.json
08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65356 and previous config saved to /var/cache/conftool/dbconfig/20240623-085757-marostegui.json
08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65355 and previous config saved to /var/cache/conftool/dbconfig/20240623-084250-marostegui.json
06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65354 and previous config saved to /var/cache/conftool/dbconfig/20240623-060520-marostegui.json
06:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
06:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
06:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
06:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-22

20:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
20:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
16:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
16:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65353 and previous config saved to /var/cache/conftool/dbconfig/20240622-161841-marostegui.json
16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65352 and previous config saved to /var/cache/conftool/dbconfig/20240622-160333-marostegui.json
15:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65351 and previous config saved to /var/cache/conftool/dbconfig/20240622-154826-marostegui.json
15:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65350 and previous config saved to /var/cache/conftool/dbconfig/20240622-153318-marostegui.json
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65349 and previous config saved to /var/cache/conftool/dbconfig/20240622-120437-marostegui.json
12:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
12:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65348 and previous config saved to /var/cache/conftool/dbconfig/20240622-120404-marostegui.json
11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65347 and previous config saved to /var/cache/conftool/dbconfig/20240622-114857-marostegui.json
11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65346 and previous config saved to /var/cache/conftool/dbconfig/20240622-113350-marostegui.json
11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65345 and previous config saved to /var/cache/conftool/dbconfig/20240622-111842-marostegui.json
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65344 and previous config saved to /var/cache/conftool/dbconfig/20240622-064802-marostegui.json
06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65343 and previous config saved to /var/cache/conftool/dbconfig/20240622-064739-marostegui.json
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65342 and previous config saved to /var/cache/conftool/dbconfig/20240622-063232-marostegui.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65341 and previous config saved to /var/cache/conftool/dbconfig/20240622-061725-marostegui.json
06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65340 and previous config saved to /var/cache/conftool/dbconfig/20240622-060216-marostegui.json
05:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
05:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65339 and previous config saved to /var/cache/conftool/dbconfig/20240622-015020-marostegui.json
01:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
01:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65338 and previous config saved to /var/cache/conftool/dbconfig/20240622-014958-marostegui.json
01:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65337 and previous config saved to /var/cache/conftool/dbconfig/20240622-013451-marostegui.json
01:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65336 and previous config saved to /var/cache/conftool/dbconfig/20240622-011943-marostegui.json
01:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65335 and previous config saved to /var/cache/conftool/dbconfig/20240622-010436-marostegui.json

2024-06-21

23:54 cwhite: delete remaining 2024.03 log indexes to make room on logstash eqiad and codfw T368180
23:43 brett@puppetmaster1001: dbctl commit (dc=all): 'set db1206 s1 weight to 1 - T368098', diff saved to https://phabricator.wikimedia.org/P65334 and previous config saved to /var/cache/conftool/dbconfig/20240621-234328-brett.json
23:28 brett: # dbctl instance db1206 set-weight 10 --section s1
21:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65333 and previous config saved to /var/cache/conftool/dbconfig/20240621-213503-marostegui.json
21:31 cwhite: restart apache2 on gerrit1003
21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65332 and previous config saved to /var/cache/conftool/dbconfig/20240621-211956-marostegui.json
21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65331 and previous config saved to /var/cache/conftool/dbconfig/20240621-210448-marostegui.json
20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65330 and previous config saved to /var/cache/conftool/dbconfig/20240621-204941-marostegui.json
20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65329 and previous config saved to /var/cache/conftool/dbconfig/20240621-203659-marostegui.json
20:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
20:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
20:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65328 and previous config saved to /var/cache/conftool/dbconfig/20240621-203636-marostegui.json
20:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65327 and previous config saved to /var/cache/conftool/dbconfig/20240621-202129-marostegui.json
20:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65326 and previous config saved to /var/cache/conftool/dbconfig/20240621-200622-marostegui.json
19:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65325 and previous config saved to /var/cache/conftool/dbconfig/20240621-195115-marostegui.json
19:43 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
16:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:59 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:41 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:41 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:40 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:39 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:37 elukey@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
15:37 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65322 and previous config saved to /var/cache/conftool/dbconfig/20240621-152038-marostegui.json
15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65321 and previous config saved to /var/cache/conftool/dbconfig/20240621-152011-marostegui.json
15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65319 and previous config saved to /var/cache/conftool/dbconfig/20240621-150504-marostegui.json
15:01 ejegg: fundraising civicrm upgraded from 8a0b5bea to 13a13f3a
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65318 and previous config saved to /var/cache/conftool/dbconfig/20240621-144957-marostegui.json
14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65317 and previous config saved to /var/cache/conftool/dbconfig/20240621-143450-marostegui.json
14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65314 and previous config saved to /var/cache/conftool/dbconfig/20240621-141050-marostegui.json
14:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
14:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65313 and previous config saved to /var/cache/conftool/dbconfig/20240621-141028-marostegui.json
13:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65312 and previous config saved to /var/cache/conftool/dbconfig/20240621-135521-marostegui.json
13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65309 and previous config saved to /var/cache/conftool/dbconfig/20240621-134013-marostegui.json
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65306 and previous config saved to /var/cache/conftool/dbconfig/20240621-132506-marostegui.json
13:21 btullis@deploy1002: Finished deploy [performance/asoranking@febfb9f]: (no justification provided) (duration: 00m 04s)
13:21 btullis@deploy1002: Started deploy [performance/asoranking@febfb9f]: (no justification provided)
13:08 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) shellbox-video.discovery.wmnet on all recursors
11:37 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache shellbox-video.discovery.wmnet on all recursors
11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65303 and previous config saved to /var/cache/conftool/dbconfig/20240621-110638-marostegui.json
11:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
10:57 Emperor: restart swift-proxy on ms-fe2011 ms-fe2012 T360913
10:56 Emperor: restart swift-proxy on ms-fe1010 T360913
10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.codfw.wmnet
10:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65302 and previous config saved to /var/cache/conftool/dbconfig/20240621-100554-marostegui.json
10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65301 and previous config saved to /var/cache/conftool/dbconfig/20240621-100531-marostegui.json
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65300 and previous config saved to /var/cache/conftool/dbconfig/20240621-095024-marostegui.json
09:45 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
09:45 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
09:41 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65299 and previous config saved to /var/cache/conftool/dbconfig/20240621-093517-marostegui.json
09:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65298 and previous config saved to /var/cache/conftool/dbconfig/20240621-092009-marostegui.json
09:16 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
09:14 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
09:02 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
08:57 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
08:56 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
08:47 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1053.eqiad.wmnet
08:41 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
08:39 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1053.eqiad.wmnet
08:14 vgutierrez: restarting logrotate.service on cp[3068,3070-3071].esams.wmnet
08:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
08:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
08:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
08:03 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
08:00 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
08:00 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
07:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65297 and previous config saved to /var/cache/conftool/dbconfig/20240621-075404-arnaudb.json
07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65296 and previous config saved to /var/cache/conftool/dbconfig/20240621-073858-arnaudb.json
07:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65295 and previous config saved to /var/cache/conftool/dbconfig/20240621-072353-arnaudb.json
07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65294 and previous config saved to /var/cache/conftool/dbconfig/20240621-070847-arnaudb.json
07:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool for debugging T368098', diff saved to https://phabricator.wikimedia.org/P65293 and previous config saved to /var/cache/conftool/dbconfig/20240621-070358-arnaudb.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65292 and previous config saved to /var/cache/conftool/dbconfig/20240621-045107-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
04:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65291 and previous config saved to /var/cache/conftool/dbconfig/20240621-045044-marostegui.json
04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65290 and previous config saved to /var/cache/conftool/dbconfig/20240621-044455-marostegui.json
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65289 and previous config saved to /var/cache/conftool/dbconfig/20240621-043537-marostegui.json
04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65288 and previous config saved to /var/cache/conftool/dbconfig/20240621-042948-marostegui.json
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65287 and previous config saved to /var/cache/conftool/dbconfig/20240621-042030-marostegui.json
04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65286 and previous config saved to /var/cache/conftool/dbconfig/20240621-041441-marostegui.json
04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65285 and previous config saved to /var/cache/conftool/dbconfig/20240621-040523-marostegui.json
03:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65284 and previous config saved to /var/cache/conftool/dbconfig/20240621-035934-marostegui.json
03:04 ejegg: fundraising civicrm upgraded from 2e1db811 to 8a0b5bea
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65283 and previous config saved to /var/cache/conftool/dbconfig/20240621-014545-marostegui.json
01:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65282 and previous config saved to /var/cache/conftool/dbconfig/20240621-014523-marostegui.json
01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65281 and previous config saved to /var/cache/conftool/dbconfig/20240621-013016-marostegui.json
01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65280 and previous config saved to /var/cache/conftool/dbconfig/20240621-011509-marostegui.json
01:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65279 and previous config saved to /var/cache/conftool/dbconfig/20240621-010002-marostegui.json
00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65278 and previous config saved to /var/cache/conftool/dbconfig/20240621-005237-ladsgroup.json
00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65277 and previous config saved to /var/cache/conftool/dbconfig/20240621-003730-ladsgroup.json
00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65276 and previous config saved to /var/cache/conftool/dbconfig/20240621-002223-ladsgroup.json
00:08 mutante: [cp3072:~] $ sudo systemctl start varnishkafka-webrequest.service
00:08 mutante: [cp3067:~] $ sudo systemctl start logrotate
00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65275 and previous config saved to /var/cache/conftool/dbconfig/20240621-000716-ladsgroup.json
00:00 sukhe: restarting haproxy on cp3068 and cp3072

2024-06-20

23:47 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 10m 12s)
23:36 zabe@deploy1002: Started scap: Update interwiki cache
23:35 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=btmwiki --cluster=all 2>&1 | tee /tmp/btmwiki.UpdateSearchIndexConfig.log # T368038
23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65274 and previous config saved to /var/cache/conftool/dbconfig/20240620-233346-marostegui.json
23:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
23:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65273 and previous config saved to /var/cache/conftool/dbconfig/20240620-233324-marostegui.json
23:33 zabe@deploy1002: Finished scap: Creating btmwiki (T368038) (duration: 12m 20s)
23:20 zabe@deploy1002: Started scap: Creating btmwiki (T368038)
23:20 zabe: create Wikipedia Mandailing # T368038
23:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65272 and previous config saved to /var/cache/conftool/dbconfig/20240620-231817-marostegui.json
23:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65271 and previous config saved to /var/cache/conftool/dbconfig/20240620-230310-marostegui.json
22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65270 and previous config saved to /var/cache/conftool/dbconfig/20240620-224803-marostegui.json
22:39 mutante: aphlict1002/aphlict2001 - systemctl stop aphlict_logrotate.timer (and .service); systemctl disable aphlict_logrotate.timer (and .service); systemctl daemon-reload; systemctl reset-failed T367960
22:33 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T361041 T363825 T366649 (duration: 09m 55s)
22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65269 and previous config saved to /var/cache/conftool/dbconfig/20240620-222909-marostegui.json
22:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65268 and previous config saved to /var/cache/conftool/dbconfig/20240620-222847-marostegui.json
22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65267 and previous config saved to /var/cache/conftool/dbconfig/20240620-221340-marostegui.json
21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65266 and previous config saved to /var/cache/conftool/dbconfig/20240620-215833-marostegui.json
21:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65265 and previous config saved to /var/cache/conftool/dbconfig/20240620-214326-marostegui.json
21:12 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
21:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
21:11 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
21:10 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
21:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
21:09 brett: Include ncmonitor 1.0.0 in wikimedia-bookworm apt repo
21:09 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
21:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
21:07 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
21:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
21:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
21:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
21:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
21:03 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
20:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
20:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
20:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
20:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
20:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
20:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
20:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
20:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
20:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
20:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
20:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
20:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
20:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
20:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
20:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
20:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
19:58 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
19:58 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:56 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:55 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:54 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
19:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
19:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
19:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1105* for T348977 - bking@cumin2002
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105* for T348977 - bking@cumin2002
19:18 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1105 for T348977 - bking@cumin2002
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105 for T348977 - bking@cumin2002
19:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2088.codfw.wmnet
19:01 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
18:58 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65263 and previous config saved to /var/cache/conftool/dbconfig/20240620-181635-marostegui.json
18:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65262 and previous config saved to /var/cache/conftool/dbconfig/20240620-181613-marostegui.json
18:06 inflatador: bking@an-airflow1007 install `ripgrep` deb pkg
18:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65261 and previous config saved to /var/cache/conftool/dbconfig/20240620-180104-marostegui.json
17:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
17:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65260 and previous config saved to /var/cache/conftool/dbconfig/20240620-174557-marostegui.json
17:44 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2088.codfw.wmnet
17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65259 and previous config saved to /var/cache/conftool/dbconfig/20240620-174125-ladsgroup.json
17:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65258 and previous config saved to /var/cache/conftool/dbconfig/20240620-173050-marostegui.json
17:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
17:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
17:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
17:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
16:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
16:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json
16:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1047955 Fix Special:Notifications (T368029) (duration: 12m 21s)
16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Continuing with sync
16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Backport for gerrit:1047955 Fix Special:Notifications (T368029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
16:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
16:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
16:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1047955 Fix Special:Notifications (T368029)
16:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json
16:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
16:01 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
16:00 claime: Repooling and uncordoning mw2282.codfw.wmnet following move - T361856
15:59 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
15:59 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:58 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:57 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2019.codfw.wmnet|wikikube-worker2020.codfw.wmnet|wikikube-worker2021.codfw.wmnet|wikikube-worker2022.codfw.wmnet|wikikube-worker2023.codfw.wmnet|wikikube-worker2024.codfw.wmnet),cluster=kubernetes,service=kubesvc
15:57 claime: Pooling and uncordoning wikikube-worker2019.codfw.wmnet,wikikube-worker2020.codfw.wmnet,wikikube-worker2021.codfw.wmnet,wikikube-worker2022.codfw.wmnet,wikikube-worker2023.codfw.wmnet,wikikube-worker2024.codfw.wmnet - T351074
15:55 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
15:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
15:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json
15:46 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
15:46 claime: homer 'cr*codfw*' commit 'T351074'
15:45 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl2002.codfw.wmnet
15:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2019.codfw.wmnet with OS bullseye
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2020.codfw.wmnet with OS bullseye
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2022.codfw.wmnet with OS bullseye
15:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json
15:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2024.codfw.wmnet with OS bullseye
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2023.codfw.wmnet with OS bullseye
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2405.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2405.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2404.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2404.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2403.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2403.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2400.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2400.codfw.wmnet
15:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2021.codfw.wmnet with OS bullseye
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
15:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json
15:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
15:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
15:06 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:04 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
15:02 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 04m 15s)
15:01 topranks: rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987
15:01 jhathaway@deploy1002: Started scap: (no justification provided)
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-f6-eqiad.mgmt
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lsw1-f6-eqiad.mgmt
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:56 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1018.eqiad.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1018.eqiad.wmnet
14:54 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:53 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 6 hosts
14:48 sukhe: homer "*" commit "rolling out NTP ACL change"
14:48 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2024.codfw.wmnet with OS bullseye
14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65248 and previous config saved to /var/cache/conftool/dbconfig/20240620-144750-arnaudb.json
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2023.codfw.wmnet with OS bullseye
14:47 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367511
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2022.codfw.wmnet with OS bullseye
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2021.codfw.wmnet with OS bullseye
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2020.codfw.wmnet with OS bullseye
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2364 to wikikube-worker2024
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2024
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2019.codfw.wmnet with OS bullseye
14:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2024
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65247 and previous config saved to /var/cache/conftool/dbconfig/20240620-144423-marostegui.json
14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65246 and previous config saved to /var/cache/conftool/dbconfig/20240620-144341-marostegui.json
14:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2364 to wikikube-worker2024
14:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
14:39 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2363 to wikikube-worker2023
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2023
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
14:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2023
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2324.codfw.wmnet
14:37 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2324.codfw.wmnet
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2323.codfw.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2323.codfw.wmnet
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1489.eqiad.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1489.eqiad.wmnet
14:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:35 sukhe: running authdns-update for CR 1047074
14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
14:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65245 and previous config saved to /var/cache/conftool/dbconfig/20240620-143244-arnaudb.json
14:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:32 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2363 to wikikube-worker2023
14:31 moritzm: imported python-pymysql 1.0.2-2~wmf11u2 to apt.wikimedia.org (merge of the security fix from DSA 5700 on top of our internal backport)
14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json
14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
14:30 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2362 to wikikube-worker2022
14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2022
14:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:28 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2022
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65243 and previous config saved to /var/cache/conftool/dbconfig/20240620-142834-marostegui.json
14:27 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
14:27 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:26 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
14:25 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:25 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:25 elukey@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
14:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2362 to wikikube-worker2022
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:24 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2360 to wikikube-worker2021
14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2021
14:21 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2021
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
14:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65242 and previous config saved to /var/cache/conftool/dbconfig/20240620-141739-arnaudb.json
14:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: IPIP migration
14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: IPIP migration
14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2360 to wikikube-worker2021
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2358 to wikikube-worker2020
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2020
14:15 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2020
14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65241 and previous config saved to /var/cache/conftool/dbconfig/20240620-141328-marostegui.json
14:13 elukey@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
14:13 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
14:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2358 to wikikube-worker2020
14:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
14:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P65240 and previous config saved to /var/cache/conftool/dbconfig/20240620-141010-root.json
14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2339 to wikikube-worker2019
14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2019
14:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2019
14:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
14:07 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
14:07 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2339 to wikikube-worker2019
14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65239 and previous config saved to /var/cache/conftool/dbconfig/20240620-140233-arnaudb.json
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65238 and previous config saved to /var/cache/conftool/dbconfig/20240620-135820-marostegui.json
13:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
13:56 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
13:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65237 and previous config saved to /var/cache/conftool/dbconfig/20240620-135610-marostegui.json
13:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
13:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65236 and previous config saved to /var/cache/conftool/dbconfig/20240620-135559-marostegui.json
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
13:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
13:54 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
13:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P65235 and previous config saved to /var/cache/conftool/dbconfig/20240620-135438-root.json
13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
13:53 claime: Depooling mw2339.codfw.wmnet,mw2358.codfw.wmnet,mw2360.codfw.wmnet,mw2362.codfw.wmnet,mw2363.codfw.wmnet,mw2364.codfw.wmnet for reimage to k8s - T351074
13:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
13:52 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
13:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:51 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65234 and previous config saved to /var/cache/conftool/dbconfig/20240620-134728-arnaudb.json
13:46 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65233 and previous config saved to /var/cache/conftool/dbconfig/20240620-134052-marostegui.json
13:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P65232 and previous config saved to /var/cache/conftool/dbconfig/20240620-133907-root.json
13:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
13:28 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
13:28 hashar@deploy1002: Finished deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2 (duration: 00m 06s)
13:28 hashar@deploy1002: Started deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2
13:27 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
13:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65231 and previous config saved to /var/cache/conftool/dbconfig/20240620-132545-marostegui.json
13:24 reedy@deploy1002: Synchronized wmf-config/: T368003 (duration: 10m 39s)
13:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:23 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P65230 and previous config saved to /var/cache/conftool/dbconfig/20240620-132335-root.json
13:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:22 elukey: upload dragonfly packages 1.0.6-2 to bookworm-wikimedia - T365253
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65228 and previous config saved to /var/cache/conftool/dbconfig/20240620-131038-marostegui.json
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65227 and previous config saved to /var/cache/conftool/dbconfig/20240620-131031-marostegui.json
13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65226 and previous config saved to /var/cache/conftool/dbconfig/20240620-130928-marostegui.json
13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P65225 and previous config saved to /var/cache/conftool/dbconfig/20240620-130804-root.json
13:07 sukhe: running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360
13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
13:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
13:00 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
12:54 sukhe: sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360
12:51 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: 1', diff saved to https://phabricator.wikimedia.org/P65223 and previous config saved to /var/cache/conftool/dbconfig/20240620-125139-root.json
12:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
12:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
12:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
12:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
12:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
12:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
12:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
12:04 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
11:52 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
11:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
11:47 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
11:41 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
11:38 XioNoX: merge netbox-extra CR1038869 - Fix lots of CI errors
11:33 jgiannelos@deploy1002: Finished deploy [restbase/deploy@f867c66]: (no justification provided) (duration: 30m 12s)
11:27 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
11:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
11:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
11:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
11:21 akosiaris: upgrade mathoid to 2024-06-18-233457-production T349118
11:20 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
11:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
11:03 jgiannelos@deploy1002: Started deploy [restbase/deploy@f867c66]: (no justification provided)
10:57 dreamyjazz@deploy1002: Finished scap: Backport for [[gerrit:1047942|[testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)]] (duration: 15m 03s)
10:48 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
10:44 dreamyjazz@deploy1002: dreamyjazz: Backport for [[gerrit:1047942|[testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:42 dreamyjazz@deploy1002: Started scap: Backport for [[gerrit:1047942|[testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)]]
10:41 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
10:37 dreamyjazz@deploy1002: Finished scap: Backport for [[gerrit:1047931|[testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)]] (duration: 13m 49s)
10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
10:31 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
10:31 claime: repooling and uncordoning mw2321.codfw.wmnet - T367862
10:31 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2321.codfw.wmnet
10:30 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2321.codfw.wmnet
10:28 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
10:25 dreamyjazz@deploy1002: dreamyjazz: Backport for [[gerrit:1047931|[testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:24 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:23 dreamyjazz@deploy1002: Started scap: Backport for [[gerrit:1047931|[testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)]]
10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
10:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:20 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
10:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
10:18 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:17 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:15 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
10:14 claime: Draining and depooling mw2321.codfw.wmnet to test 1047031 - T367862
10:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
10:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
10:04 claime: Running puppet on A:wikikube-worker
10:02 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
10:01 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
10:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
09:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
09:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
09:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
09:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
09:47 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
09:45 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:16 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php sysop_plwiki AramilFeraxa REDACTED --bureaucrat --sysop # T361041
08:57 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
08:50 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:49 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
08:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
08:33 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:23 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.10 refs T361404
08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
08:08 moritzm: reboot of irc1001 to nudge clients to re-connect to the new bullseye host T331702
08:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
08:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
07:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
07:53 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
07:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
07:04 moritzm: failover irc.wikimedia.org to the new Bullseye servers T331702
06:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
06:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
05:27 marostegui: Deploy schema change on old s7 eqiad master dbmaint (db1236) T364299
05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236 T367857', diff saved to https://phabricator.wikimedia.org/P65220 and previous config saved to /var/cache/conftool/dbconfig/20240620-052359-root.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T367857', diff saved to https://phabricator.wikimedia.org/P65219 and previous config saved to /var/cache/conftool/dbconfig/20240620-052253-marostegui.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T367857', diff saved to https://phabricator.wikimedia.org/P65218 and previous config saved to /var/cache/conftool/dbconfig/20240620-052230-marostegui.json
05:22 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T367857
05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T367857', diff saved to https://phabricator.wikimedia.org/P65217 and previous config saved to /var/cache/conftool/dbconfig/20240620-050428-marostegui.json
05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367856)', diff saved to https://phabricator.wikimedia.org/P65216 and previous config saved to /var/cache/conftool/dbconfig/20240620-022416-marostegui.json
02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65215 and previous config saved to /var/cache/conftool/dbconfig/20240620-022349-marostegui.json
02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65214 and previous config saved to /var/cache/conftool/dbconfig/20240620-020842-marostegui.json
01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65213 and previous config saved to /var/cache/conftool/dbconfig/20240620-015335-marostegui.json
01:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65212 and previous config saved to /var/cache/conftool/dbconfig/20240620-013827-marostegui.json

2024-06-19

23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php arbcom_itwiki Superpes15 REDACTED --bureaucrat --sysop
23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php u4cwiki Superpes15 REDACTED --bureaucrat --sysop
21:08 oblivian@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
21:08 oblivian@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
20:33 zabe@deploy1002: Finished scap: Backport for [[gerrit:1047562|[tlywiki] Change the logo and wordmark/tagline (T366431)]] (duration: 14m 41s)
20:24 zabe@deploy1002: superpes, zabe: Continuing with sync
20:23 zabe@deploy1002: superpes, zabe: Backport for [[gerrit:1047562|[tlywiki] Change the logo and wordmark/tagline (T366431)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 zabe@deploy1002: Started scap: Backport for [[gerrit:1047562|[tlywiki] Change the logo and wordmark/tagline (T366431)]]
19:08 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
19:05 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
18:54 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
18:51 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
18:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:48 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:35 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65211 and previous config saved to /var/cache/conftool/dbconfig/20240619-182922-marostegui.json
18:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65210 and previous config saved to /var/cache/conftool/dbconfig/20240619-182900-marostegui.json
18:21 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65209 and previous config saved to /var/cache/conftool/dbconfig/20240619-181353-marostegui.json
17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65208 and previous config saved to /var/cache/conftool/dbconfig/20240619-175846-marostegui.json
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65207 and previous config saved to /var/cache/conftool/dbconfig/20240619-174338-marostegui.json
17:21 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:21 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
17:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
17:13 kamila@cumin1002: START - Cookbook sre.dns.netbox
17:05 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1044.eqiad.wmnet with OS bookworm
17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
17:00 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:00 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
16:59 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
16:42 sukhe: sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360
16:27 claime: pooling and uncordoning mw2321.codfw.wmnet - T367702
16:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
16:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: service=(ntp-a|ntp-b|ntp-c)
16:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
16:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
16:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
16:03 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:55 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:51 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:50 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:46 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:46 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
15:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
15:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
15:32 taavi@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1042
15:32 taavi@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1042
15:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:22 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:16 sukhe: sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
15:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
15:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org,service=ntp-c
15:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:01 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
15:01 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
14:59 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for wikikube-worker2003.codfw.wmnet
14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
14:42 marostegui: Deploy schema change on s2 eqiad master dbmaint T364069
14:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
14:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
14:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bookworm
14:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
14:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
14:36 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
14:35 moritzm: installing nano security updates
14:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
14:24 moritzm: installing libvpx security updates
14:23 moritzm: installing pymysql security updates
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
14:19 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
14:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
14:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
14:12 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
14:11 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
14:10 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
14:09 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:09 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
14:09 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
14:08 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:07 taavi@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:07 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
13:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
13:54 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
13:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:53 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:51 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:50 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:49 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:48 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:42 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:41 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
13:41 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
13:35 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
13:35 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
13:32 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
13:32 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=ntp-a
13:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
13:31 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
13:28 sukhe: enable puppet on dns6001 to test CR 1046685
13:23 sukhe: sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360
13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mw2282.codfw.wmnet with reason: host move
13:21 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on mw2282.codfw.wmnet with reason: host move
13:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:17 kamila_: drained mw2282.codfw.wmnet for T361856
13:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
13:06 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
13:04 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: service=ntp-[abc]
13:04 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
12:52 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:51 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2011.codfw.wmnet|wikikube-worker2012.codfw.wmnet|wikikube-worker2013.codfw.wmnet|wikikube-worker2014.codfw.wmnet|wikikube-worker2017.codfw.wmnet|wikikube-worker2018.codfw.wmnet),cluster=kubernetes,service=kubesvc
12:40 claime: homer 'cr*codfw*' commit 'T351074'
12:38 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:38 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:37 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:37 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:35 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
12:34 klausman: Puppet management of install2004 restored, lpxelinux.0 also restored.
12:24 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:22 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:21 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:20 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:17 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
12:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
12:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
12:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
12:12 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
12:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
12:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
12:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
12:08 klausman: Will test-replace the PXE chainloader (/srv/tftpboot/lpxelinux.0) on install2003 with a newer version to see if it fixes the ldlinux.c32 error. Puppet will be disabled on that machine for the duration.
12:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
12:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
12:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:03 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65204 and previous config saved to /var/cache/conftool/dbconfig/20240619-120142-root.json
12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
12:01 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
12:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
12:00 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
11:57 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
11:57 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
11:50 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65203 and previous config saved to /var/cache/conftool/dbconfig/20240619-114636-root.json
11:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65201 and previous config saved to /var/cache/conftool/dbconfig/20240619-113131-root.json
11:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:18 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netbox-dev2003.codfw.wmnet
11:18 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netbox-dev2003.codfw.wmnet with OS bookworm
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65200 and previous config saved to /var/cache/conftool/dbconfig/20240619-111625-root.json
11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
11:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
11:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:08 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
11:07 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:07 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:06 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
11:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65199 and previous config saved to /var/cache/conftool/dbconfig/20240619-110120-root.json
10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65198 and previous config saved to /var/cache/conftool/dbconfig/20240619-104614-root.json
10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
10:40 jmm@deploy1002: Finished scap: (no justification provided) (duration: 04m 03s)
10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
10:36 jmm@deploy1002: Started scap: (no justification provided)
10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65197 and previous config saved to /var/cache/conftool/dbconfig/20240619-103109-root.json
10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65196 and previous config saved to /var/cache/conftool/dbconfig/20240619-102504-marostegui.json
10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2409 to wikikube-worker2018
10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
10:22 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
10:18 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:18 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2409 to wikikube-worker2018
10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2408 to wikikube-worker2017
10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
10:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
10:16 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65195 and previous config saved to /var/cache/conftool/dbconfig/20240619-101625-marostegui.json
10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2408 to wikikube-worker2017
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2405 to wikikube-worker2014
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
10:12 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
10:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
10:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2405 to wikikube-worker2014
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2404 to wikikube-worker2013
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
10:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
10:05 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
10:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65194 and previous config saved to /var/cache/conftool/dbconfig/20240619-100118-marostegui.json
10:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
10:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2404 to wikikube-worker2013
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2403 to wikikube-worker2012
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
09:55 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
09:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
09:51 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
09:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
09:47 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2403 to wikikube-worker2012
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2400 to wikikube-worker2011
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
09:46 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
09:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2400 to wikikube-worker2011
09:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin
09:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:22 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
09:15 claime: Depooling mw2400.codfw.wmnet,mw2403.codfw.wmnet,mw2404.codfw.wmnet,mw2405.codfw.wmnet,mw2408.codfw.wmnet,mw2409.codfw.wmnet for reimage - T351074
09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
09:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bookworm
09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5025.*} and A:cp
08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bookworm
08:58 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5025.*} and A:cp
08:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
08:54 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
08:52 fabfur: upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
08:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
08:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
08:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15830
08:38 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bookworm
08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
08:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1001.eqiad.wmnet with OS bookworm
08:18 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.10 refs T361404
08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
08:11 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
08:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:03 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:01 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox-dev2003.codfw.wmnet with OS bookworm
08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox-dev2003.codfw.wmnet on all recursors
07:59 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox-dev2003.codfw.wmnet on all recursors
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:57 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:54 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:54 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox-dev2003.codfw.wmnet
07:48 kartik@deploy1002: Finished scap: Backport for gerrit:1047382 igwiki: Enable MinT for Wikipedia readers (T363464) (duration: 18m 55s)
07:38 kartik@deploy1002: kartik: Continuing with sync
07:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
07:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
07:33 kartik@deploy1002: kartik: Backport for gerrit:1047382 igwiki: Enable MinT for Wikipedia readers (T363464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:29 kartik@deploy1002: Started scap: Backport for gerrit:1047382 igwiki: Enable MinT for Wikipedia readers (T363464)
07:22 kartik@deploy1002: Finished scap: Backport for gerrit:1047014 testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) (duration: 20m 12s)
07:20 marostegui: Deploy schema change on old s7 eqiad master db1160 dbmaint T364069
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65192 and previous config saved to /var/cache/conftool/dbconfig/20240619-071516-root.json
07:12 kartik@deploy1002: kartik: Continuing with sync
07:07 kartik@deploy1002: kartik: Backport for gerrit:1047014 testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 kartik@deploy1002: Started scap: Backport for gerrit:1047014 testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852)
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65191 and previous config saved to /var/cache/conftool/dbconfig/20240619-070010-root.json
06:52 jynus: stop db1240:s1, wipe and reimport db1240:s3 T367162
06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65190 and previous config saved to /var/cache/conftool/dbconfig/20240619-064505-root.json
06:40 XioNoX: merge Puppet "Prepare for netbox-dev" CR1047081
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65189 and previous config saved to /var/cache/conftool/dbconfig/20240619-063337-root.json
06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65188 and previous config saved to /var/cache/conftool/dbconfig/20240619-062959-root.json
06:21 _joe_: upgrading conftool everywhere T367919
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65187 and previous config saved to /var/cache/conftool/dbconfig/20240619-061831-root.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P65186 and previous config saved to /var/cache/conftool/dbconfig/20240619-061721-root.json
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65185 and previous config saved to /var/cache/conftool/dbconfig/20240619-061454-root.json
06:08 _joe_: uploaded newer python-conftool packages T367919
06:05 _joe_: deleting manually thirdparty/conda repositories from reprepro T364550
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65184 and previous config saved to /var/cache/conftool/dbconfig/20240619-060326-root.json
06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P65183 and previous config saved to /var/cache/conftool/dbconfig/20240619-060216-root.json
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65182 and previous config saved to /var/cache/conftool/dbconfig/20240619-055948-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65181 and previous config saved to /var/cache/conftool/dbconfig/20240619-054820-root.json
05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P65180 and previous config saved to /var/cache/conftool/dbconfig/20240619-054710-root.json
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65179 and previous config saved to /var/cache/conftool/dbconfig/20240619-054443-root.json
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65178 and previous config saved to /var/cache/conftool/dbconfig/20240619-054259-root.json
05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65177 and previous config saved to /var/cache/conftool/dbconfig/20240619-054214-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65176 and previous config saved to /var/cache/conftool/dbconfig/20240619-053315-root.json
05:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P65175 and previous config saved to /var/cache/conftool/dbconfig/20240619-053205-root.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65174 and previous config saved to /var/cache/conftool/dbconfig/20240619-052754-root.json
05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65173 and previous config saved to /var/cache/conftool/dbconfig/20240619-051809-root.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P65172 and previous config saved to /var/cache/conftool/dbconfig/20240619-051659-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65171 and previous config saved to /var/cache/conftool/dbconfig/20240619-051248-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P65170 and previous config saved to /var/cache/conftool/dbconfig/20240619-051233-root.json
05:10 marostegui@cumin1002: dbctl commit (dc=all): 'repool db1169', diff saved to https://phabricator.wikimedia.org/P65169 and previous config saved to /var/cache/conftool/dbconfig/20240619-051014-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'test depool db1169', diff saved to https://phabricator.wikimedia.org/P65168 and previous config saved to /var/cache/conftool/dbconfig/20240619-050951-marostegui.json

2024-06-18

23:22 jforrester@deploy1002: Finished scap: Backport for gerrit:1047077 Use isEnumType in selector and isCustomEnum for creating literals (T367159), gerrit:1047188 findAddedContentNeedingReference was removed accidentally (T367920) (duration: 17m 16s)
23:12 jforrester@deploy1002: jforrester, kemayo: Continuing with sync
23:10 jforrester@deploy1002: jforrester, kemayo: Backport for gerrit:1047077 Use isEnumType in selector and isCustomEnum for creating literals (T367159), gerrit:1047188 findAddedContentNeedingReference was removed accidentally (T367920) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:05 jforrester@deploy1002: Started scap: Backport for gerrit:1047077 Use isEnumType in selector and isCustomEnum for creating literals (T367159), gerrit:1047188 findAddedContentNeedingReference was removed accidentally (T367920)
22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:20 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:26 jdrewniak@deploy1002: Finished scap: Backport for gerrit:1046790 Improve responsive images and avoid for inline (T367463), gerrit:1047155 Fix codex link styles overriding other link styles (T367844) (duration: 16m 33s)
21:16 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
21:14 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for gerrit:1046790 Improve responsive images and avoid for inline (T367463), gerrit:1047155 Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 jdrewniak@deploy1002: Started scap: Backport for gerrit:1046790 Improve responsive images and avoid for inline (T367463), gerrit:1047155 Fix codex link styles overriding other link styles (T367844)
21:07 jdrewniak@deploy1002: Sync cancelled.
21:07 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for gerrit:1046790 Improve responsive images and avoid for inline (T367463), gerrit:1047155 Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 jdrewniak@deploy1002: Started scap: Backport for gerrit:1046790 Improve responsive images and avoid for inline (T367463), gerrit:1047155 Fix codex link styles overriding other link styles (T367844)
20:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
20:50 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:49 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:47 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:47 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:33 urbanecm@deploy1002: Finished scap: Backport for gerrit:1047099 cswiki: adding throttle rule, removing old throttle rule (T367858), gerrit:1047125 Deploy references edit check to phase 1 wikis (T361843), gerrit:1047131 Turn on Visual Editor collab beta feature on officewiki (duration: 18m 59s)
20:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:22 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Continuing with sync
20:18 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Backport for gerrit:1047099 cswiki: adding throttle rule, removing old throttle rule (T367858), gerrit:1047125 Deploy references edit check to phase 1 wikis (T361843), gerrit:1047131 Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:14 urbanecm@deploy1002: Started scap: Backport for gerrit:1047099 cswiki: adding throttle rule, removing old throttle rule (T367858), gerrit:1047125 Deploy references edit check to phase 1 wikis (T361843), gerrit:1047131 Turn on Visual Editor collab beta feature on officewiki
20:10 urbanecm@deploy1002: Sync cancelled.
20:10 urbanecm@deploy1002: urbanecm, superzerocool, kemayo: Backport for gerrit:1047099 cswiki: adding throttle rule, removing old throttle rule (T367858), gerrit:1047125 Deploy references edit check to phase 1 wikis (T361843), gerrit:1047131 Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:06 urbanecm@deploy1002: Started scap: Backport for gerrit:1047099 cswiki: adding throttle rule, removing old throttle rule (T367858), gerrit:1047125 Deploy references edit check to phase 1 wikis (T361843), gerrit:1047131 Turn on Visual Editor collab beta feature on officewiki
19:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:17 mutante: lists1001 - systemctl reset-failed - clean up systemd state due to units not found anymore after migration - disable puppet and then deploy gerrit:1047160 on lists to fix invalid unit name - T331706
18:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
18:44 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in esams for T365123
18:39 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in eqsin for T365123
18:33 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in drmrs for T365123
18:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
18:27 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in magru for T365123
18:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
18:17 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in ulsfo for T365123
18:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
18:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
17:37 swfrench-wmf: updated conftool to 3.0.0 on bullseye hosts in eqiad for T365123
17:35 swfrench-wmf: updated conftool to 3.0.0 on bookworm hosts in eqiad for T365123
17:34 swfrench-wmf: updated conftool to 3.0.0 on buster hosts in eqiad for T365123
17:21 cdanis: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323 T367894
17:16 swfrench-wmf: updated conftool to 3.0.0 on remaining bullseye hosts in codfw for T365123
17:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
17:14 swfrench-wmf: updated conftool to 3.0.0 on remaining bookworm hosts in codfw for T365123
17:12 swfrench-wmf: updated conftool to 3.0.0 on remaining buster hosts in codfw for T365123
16:42 swfrench-wmf: conftool on puppetmaster2001 updated to 3.0.0 for T365123
16:39 swfrench-wmf: validated dbctl 3.0.0 on cumin2002 (noop edit to note: on parsercache spare pc2014) for T365123
16:39 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
16:34 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
16:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
16:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
16:29 swfrench-wmf: conftool on cumin2002 updated to 3.0.0 for T365123
16:23 claime: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323
16:23 swfrench-wmf: depooled / pooled mw2441.codfw.wmnet to smoke-test python3-conftool for T365123
16:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json
16:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json
16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
15:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json
15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65164 and previous config saved to /var/cache/conftool/dbconfig/20240618-155000-marostegui.json
15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65163 and previous config saved to /var/cache/conftool/dbconfig/20240618-154938-marostegui.json
15:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2003
15:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2003
15:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:47 swfrench-wmf: included conftool 3.0.0 into buster/bullseye/bookworm-wikimedia on apt.w.o for T365123
15:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:46 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:45 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:44 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp
15:43 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:42 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp
15:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5030.*} and A:cp
15:39 fabfur: upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
15:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5030.*} and A:cp
15:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp3066.*} and A:cp
15:36 fabfur: upgrade haproxy to v2.8.10 on cp3066 (T367756)
15:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp3066.*} and A:cp
15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65161 and previous config saved to /var/cache/conftool/dbconfig/20240618-153430-marostegui.json
15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
15:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
15:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65158 and previous config saved to /var/cache/conftool/dbconfig/20240618-151923-marostegui.json
15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775 (duration: 00m 15s)
15:07 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775
15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775 (duration: 00m 47s)
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775
15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775 (duration: 00m 36s)
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775
15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65157 and previous config saved to /var/cache/conftool/dbconfig/20240618-150416-marostegui.json
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:00 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4f7d29a]: (no justification provided) (duration: 00m 28s)
15:00 topranks: rebooting lsw1-f7-eqiad to upgrade JunOS on switch T365984
15:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
15:00 mforns@deploy1002: Started deploy [airflow-dags/analytics@4f7d29a]: (no justification provided)
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
14:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
14:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
14:47 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
14:44 jynus: re-enable puppet on backup2002
14:40 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
14:40 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 depool - T365984', diff saved to https://phabricator.wikimedia.org/P65156 and previous config saved to /var/cache/conftool/dbconfig/20240618-143951-arnaudb.json
14:39 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
14:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
14:36 sukhe: enabling puppet and running puppet agent on cp4037
14:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
14:24 claime: trafficserver: move 100% of traffic to mw-on-k8s - T362323
14:23 btullis@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
14:09 swfrench-wmf: included conftool 3.0.0 into buster-wikimedia on apt.w.o for T365123
14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
14:03 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:51 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:51 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
13:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1002.eqiad.wmnet with OS bookworm
13:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
13:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
13:39 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
13:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes azwiktionary --fix # T367264; 7 pages fixed, 10 links fixed
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1047057 Add VL namespace alias to Azerbaijani Wiktionary (T367264) (duration: 16m 07s)
13:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
13:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
13:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
13:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for gerrit:1047057 Add VL namespace alias to Azerbaijani Wiktionary (T367264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1208.eqiad.wmnet
13:19 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1208.eqiad.wmnet
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1047057 Add VL namespace alias to Azerbaijani Wiktionary (T367264)
13:16 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
13:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
13:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
13:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
13:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: sync
13:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: sync
13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: sync
13:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: sync
13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
13:06 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
13:06 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
13:04 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
13:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
12:56 vgutierrez: rolling upgrade on A:cp-eqsin to fifo-log-demux 0.7.5 - T364383
12:53 vgutierrez: disable puppet on A:cp-eqsin before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047070 - T364383
12:52 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
12:51 marostegui: Deploy schema change on old s4 eqiad master db1160 dbmaint T364069
12:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P65155 and previous config saved to /var/cache/conftool/dbconfig/20240618-124945-root.json
12:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:47 fabfur: upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
12:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
12:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
12:22 moritzm: rebalance ganeti eqiad/D following reboots
12:15 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
12:15 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
12:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
12:04 topranks: adding Netbox-generated IPv6 DNS records for wikikube-worker, mw and parse hosts
12:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:59 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:58 effie: Slowly pointing mediawiki in eqiad to mw-mcrouter daemonset - T346690
11:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:50 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists.wikimedia.org on all recursors
11:50 eoghan@cumin1002: START - Cookbook sre.dns.wipe-cache lists.wikimedia.org on all recursors
11:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bookworm
11:42 marostegui: Delete ipblocks table on clouddb2002-dev (labtestwiki) T367632
11:40 marostegui: Rename ipblocks table on db1169 (enwiki) T367632
11:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
11:24 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
11:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
11:14 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
11:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
11:13 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
11:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65152 and previous config saved to /var/cache/conftool/dbconfig/20240618-111001-marostegui.json
11:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:09 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bookworm
11:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65151 and previous config saved to /var/cache/conftool/dbconfig/20240618-110939-marostegui.json
11:08 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
11:05 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
11:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
10:58 fabfur: cp3066 repooled and puppet enabled (T367756)
10:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65150 and previous config saved to /var/cache/conftool/dbconfig/20240618-105432-marostegui.json
10:48 marostegui: dbmaint codfw s2 deploy schema change T364069
10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65149 and previous config saved to /var/cache/conftool/dbconfig/20240618-103925-marostegui.json
10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:30 moritzm: upload openjdk-21 21.0.3+9-2~deb12u2 for bookworm/wikimedia (secondary rebuild on build2001 following the initial bootstrap build) https://phabricator.wikimedia.org/T367487
10:30 cgoubert@deploy1002: Finished scap: Deploy statsd exporter - T365265 (duration: 03m 39s)
10:27 cgoubert@deploy1002: Started scap: Deploy statsd exporter - T365265
10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65148 and previous config saved to /var/cache/conftool/dbconfig/20240618-102418-marostegui.json
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65147 and previous config saved to /var/cache/conftool/dbconfig/20240618-102130-root.json
10:14 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
10:14 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65146 and previous config saved to /var/cache/conftool/dbconfig/20240618-100624-root.json
10:05 fabfur: cp3066 currently depooled and puppet disabled for T367756
10:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
09:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1019.eqiad.wmnet|wikikube-worker1020.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet),cluster=kubernetes,service=kubesvc
09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65145 and previous config saved to /var/cache/conftool/dbconfig/20240618-095119-root.json
09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65144 and previous config saved to /var/cache/conftool/dbconfig/20240618-093614-root.json
09:27 moritzm: arm keyholder on acmechief2002
09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65143 and previous config saved to /var/cache/conftool/dbconfig/20240618-092108-root.json
09:13 moritzm: rebooting ganeti2029
09:10 marostegui: dbmaint eqiad s4 deploy schema change T367261
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65142 and previous config saved to /var/cache/conftool/dbconfig/20240618-090603-root.json
09:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
08:53 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.10 refs T361404
08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 depool to troubleshoot hardware issues', diff saved to https://phabricator.wikimedia.org/P65141 and previous config saved to /var/cache/conftool/dbconfig/20240618-085254-arnaudb.json
08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
08:51 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65140 and previous config saved to /var/cache/conftool/dbconfig/20240618-085057-root.json
08:45 hashar@deploy1002: Finished deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate (duration: 00m 07s)
08:45 hashar@deploy1002: Started deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate
08:43 fabfur: cp4037 currently depooled and puppet disabled for T367756
08:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
08:40 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
08:34 marostegui: dbmaint eqiad s6 deploy schema change on eqiad master T364069
08:29 XioNoX: deploy pfw policy update 1718644831 - T367796
07:56 moritzm: uploaded python-irc 8.5.3+dfsg-4+wmf1 to apt.wikimedia.org T331702
07:40 marostegui: dbmaint codfw s7 deploy schema change on codfw master T364069
07:33 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
07:31 kart_: Updated cxserver to 2024-06-13-045621-production (T364122, T138401)
07:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
07:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
07:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
07:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
07:26 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
07:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
07:20 kartik@deploy1002: Finished scap: Backport for gerrit:1046810 Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) (duration: 16m 36s)
07:15 marostegui: dbmaint eqiad s5 deploy schema change on primary master T364069
07:12 marostegui: dbmaint codfw s4 deploy schema change T367261
07:12 marostegui: dbmaint codfw s4 deploy schema change
07:11 kartik@deploy1002: kartik: Continuing with sync
07:09 kartik@deploy1002: kartik: Backport for gerrit:1046810 Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 kartik@deploy1002: Started scap: Backport for gerrit:1046810 Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838)
06:52 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
06:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65139 and previous config saved to /var/cache/conftool/dbconfig/20240618-060100-marostegui.json
06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65138 and previous config saved to /var/cache/conftool/dbconfig/20240618-060038-marostegui.json
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2102.codfw.wmnet
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
05:53 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
05:50 jynus@cumin2002: START - Cookbook sre.dns.netbox
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65137 and previous config saved to /var/cache/conftool/dbconfig/20240618-054531-marostegui.json
05:44 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2102.codfw.wmnet
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65136 and previous config saved to /var/cache/conftool/dbconfig/20240618-053024-marostegui.json
05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65135 and previous config saved to /var/cache/conftool/dbconfig/20240618-051517-marostegui.json
05:00 marostegui: dbmaint codfw s5 deploy schema change on db2213 T364299
04:57 marostegui: dbmaint eqiad s2 deploy schema change on db2207 T364299
04:54 marostegui: dbmaint eqiad s4 deploy schema change on db1160 T364299
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T367378', diff saved to https://phabricator.wikimedia.org/P65134 and previous config saved to /var/cache/conftool/dbconfig/20240618-044908-root.json
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary and set section read-write T367378', diff saved to https://phabricator.wikimedia.org/P65133 and previous config saved to /var/cache/conftool/dbconfig/20240618-044806-marostegui.json
04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T367378', diff saved to https://phabricator.wikimedia.org/P65132 and previous config saved to /var/cache/conftool/dbconfig/20240618-044747-marostegui.json
04:47 marostegui: Starting s4 eqiad failover from db1160 to db1238 - T367378
04:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T367378', diff saved to https://phabricator.wikimedia.org/P65131 and previous config saved to /var/cache/conftool/dbconfig/20240618-042054-marostegui.json
04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
04:02 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.7 (duration: 02m 50s)
04:01 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.10 refs T361404 (duration: 58m 57s)
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.10 refs T361404
01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65130 and previous config saved to /var/cache/conftool/dbconfig/20240618-013639-marostegui.json
01:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
01:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65129 and previous config saved to /var/cache/conftool/dbconfig/20240618-013616-marostegui.json
01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65128 and previous config saved to /var/cache/conftool/dbconfig/20240618-012109-marostegui.json
01:10 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65127 and previous config saved to /var/cache/conftool/dbconfig/20240618-010601-marostegui.json
00:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65126 and previous config saved to /var/cache/conftool/dbconfig/20240618-005054-marostegui.json
00:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
00:31 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65125 and previous config saved to /var/cache/conftool/dbconfig/20240618-002823-ladsgroup.json
00:18 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 14m 03s)
00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65124 and previous config saved to /var/cache/conftool/dbconfig/20240618-001316-ladsgroup.json
00:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
00:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
00:05 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=u4cwiki --cluster=all 2>&1 | tee /tmp/u4c.UpdateSearchIndexConfig.log # T366649
00:04 zabe@deploy1002: Started scap: Update interwiki cache
00:02 zabe@deploy1002: Finished scap: T366649 (duration: 15m 16s)
00:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye

2024-06-17

23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65123 and previous config saved to /var/cache/conftool/dbconfig/20240617-235809-ladsgroup.json
23:52 zabe@deploy1002: zabe: Continuing with sync
23:52 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
23:51 zabe@deploy1002: zabe: T366649 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=arbcom_itwiki --cluster=all 2>&1 | tee /tmp/arbcom_it.UpdateSearchIndexConfig.log # T363825
23:47 zabe@deploy1002: Started scap: T366649
23:46 zabe: Create an 'Universal Code of Conduct Coordinating Committee (U4C)' private wiki # T366649
23:44 zabe@deploy1002: Finished scap: T363825 (duration: 15m 00s)
23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65122 and previous config saved to /var/cache/conftool/dbconfig/20240617-234302-ladsgroup.json
23:34 zabe@deploy1002: zabe: Continuing with sync
23:34 zabe@deploy1002: zabe: T363825 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:29 zabe@deploy1002: Started scap: T363825
23:29 zabe: create private wiki for itwiki arbcom # T363825
23:23 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
23:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
22:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
22:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
22:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65121 and previous config saved to /var/cache/conftool/dbconfig/20240617-223010-ladsgroup.json
22:28 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
22:26 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
22:25 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
22:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65120 and previous config saved to /var/cache/conftool/dbconfig/20240617-221503-ladsgroup.json
22:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
22:11 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
22:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65119 and previous config saved to /var/cache/conftool/dbconfig/20240617-215956-ladsgroup.json
21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
21:55 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65118 and previous config saved to /var/cache/conftool/dbconfig/20240617-214449-ladsgroup.json
21:41 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
21:05 jforrester@deploy1002: Finished scap: Backport for gerrit:1046767 Fix styles for new heading HTML (T367468) (duration: 18m 57s)
20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65117 and previous config saved to /var/cache/conftool/dbconfig/20240617-205955-marostegui.json
20:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
20:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
20:55 jforrester@deploy1002: jforrester: Continuing with sync
20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
20:50 jforrester@deploy1002: jforrester: Backport for gerrit:1046767 Fix styles for new heading HTML (T367468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:50 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
20:46 jforrester@deploy1002: Started scap: Backport for gerrit:1046767 Fix styles for new heading HTML (T367468)
20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
20:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
20:07 jforrester@deploy1002: jforrester: Continuing with sync
20:07 jforrester@deploy1002: jforrester: Backport for gerrit:1041659 [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), gerrit:1039767 Add a note that you cannot change wgCategoryCollation easily (T362494 T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
20:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
20:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
20:02 jforrester@deploy1002: Started scap: Backport for gerrit:1041659 [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), gerrit:1039767 Add a note that you cannot change wgCategoryCollation easily (T362494 T366809)
19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65116 and previous config saved to /var/cache/conftool/dbconfig/20240617-195520-ladsgroup.json
19:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
19:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
19:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
19:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
19:15 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
18:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
18:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
18:42 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1027150 Change static footer icons to the new one (T256190), gerrit:1046750 Remove footer override (duration: 17m 12s)
18:36 ejegg: fundraising civicrm upgraded from 66acce1f to a25a359b
18:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
18:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
18:30 ladsgroup@deploy1002: ladsgroup, jforrester: Continuing with sync
18:29 ladsgroup@deploy1002: ladsgroup, jforrester: Backport for gerrit:1027150 Change static footer icons to the new one (T256190), gerrit:1046750 Remove footer override synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:24 ladsgroup@deploy1002: Started scap: Backport for gerrit:1027150 Change static footer icons to the new one (T256190), gerrit:1046750 Remove footer override
18:19 ladsgroup@deploy1002: Started scap: Backport for gerrit:1027150 Change static footer icons to the new one (T256190)
18:17 ejegg: standalone SmashPig upgraded from 1d1b770c to c8993ec6
18:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: sync
18:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: sync
18:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
18:10 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
18:09 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
18:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
18:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
18:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
18:07 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
18:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
18:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
18:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
18:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
18:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
18:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
18:02 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
18:01 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
18:00 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
17:58 ejegg: fundraising civicrm upgraded from aa127608 to 66acce1f
17:53 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
17:53 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
17:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
17:37 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
17:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
17:35 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
17:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
17:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
17:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
17:32 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
17:31 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
17:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
17:29 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
17:18 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet
17:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
17:16 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
17:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
17:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
17:05 claime: Pooling and uncordoning wikikube-worker1019.eqiad.wmnet,wikikube-worker1020.eqiad.wmnet,wikikube-worker1021.eqiad.wmnet - T351074
17:02 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
16:59 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
16:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1021.eqiad.wmnet with OS bullseye
16:58 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
16:58 claime: homer 'cr*eqiad*' commit 'T351074'
16:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
16:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
16:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
16:42 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 32s)
16:42 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
16:42 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
16:42 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
16:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
16:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
16:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
16:28 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
16:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
16:27 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
16:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
16:25 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
16:24 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
16:21 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
16:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
16:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
16:09 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:1046698 Bumping portals to master (T128546) (duration: 14m 13s)
16:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
16:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1019.eqiad.wmnet with OS bullseye
16:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
16:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
16:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:03 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:00 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
15:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
15:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:55 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:52 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:1046698 Bumping portals to master (T128546) (duration: 14m 41s)
15:50 topranks: rebooting cr2-eqdfw to upgrade JunOS T364092
15:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1021.eqiad.wmnet with OS bullseye
15:46 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1020.eqiad.wmnet with OS bullseye
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1489 to wikikube-worker1021
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1021
15:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1021
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
15:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
15:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1489 to wikikube-worker1021
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1447 to wikikube-worker1020
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1020
15:39 topranks: deactivate Transit and peering sessions on cr2-eqdfw in advance of power-supply change T366864
15:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1020
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
15:37 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:37 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:34 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1447 to wikikube-worker1020
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1444 to wikikube-worker1019
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1019
15:32 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1019
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:31 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
15:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2002.codfw.wmnet
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:28 fabfur: upgrading haproxy to 2.8.10 on cp4037 (T367756)
15:28 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.*} and A:cp
15:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.*} and A:cp
15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
15:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1444 to wikikube-worker1019
15:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
15:24 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:21 claime: Depooling mw1444.eqiad.wmnet,mw1447.eqiad.wmnet,mw1489.eqiad.wmnet for reimage - T351074
15:20 topranks: draining transport circuits in/out of eqdfw in advance of router power-supply work/upgrade T366864
15:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
15:17 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
15:16 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wikikube-ctrl2002.codfw.wmnet
15:16 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
15:10 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
15:03 claime: Repooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet pending fw upgrade - T351074
15:03 cgoubert@cumin1002: conftool action : set/weight=30:pooled=yes; selector: name=(mw1359.eqiad.wmnet|mw1364.eqiad.wmnet|mw1365.eqiad.wmnet|mw1412.eqiad.wmnet)
14:59 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:58 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
14:58 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
14:56 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
14:56 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
14:55 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
14:55 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Wikimedia Foundation/Legal/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Zabe" --reason "per request phab:T367216"
14:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
14:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:50 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement/Short" "Wikimedia Foundation/Legal/Committee appointments/Announcement/Short" "Zabe" --reason "per request phab:T367216"
14:48 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1412.eqiad.wmnet
14:48 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1412.eqiad.wmnet
14:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
14:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement" "Wikimedia Foundation/Legal/Committee appointments/Announcement" "Zabe" --reason "per request phab:T367216"
14:45 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1365.eqiad.wmnet
14:45 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1365.eqiad.wmnet
14:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:44 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2001.codfw.wmnet
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:43 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
14:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments" "Wikimedia Foundation/Legal/Committee appointments" "Zabe" --reason "per request phab:T367216"
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
14:39 joal@deploy1002: Finished deploy [airflow-dags/analytics@b682892]: (no justification provided) (duration: 00m 33s)
14:38 joal@deploy1002: Started deploy [airflow-dags/analytics@b682892]: (no justification provided)
14:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Tools and processes" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Tools and processes" "Zabe" --reason "per request phab:T367217"
14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:34 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources/What is a conduct warning" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources/What is a conduct warning" "Zabe" --reason "per request phab:T367217"
14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
14:31 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources" "Zabe" --reason "per request phab:T367217"
14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
14:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
14:28 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Legal agreement" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Legal agreement" "Zabe" --reason "per request phab:T367217"
14:27 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Brand Stewardship Report" "Wikimedia Foundation/Legal/Brand Stewardship Report" "Zabe" --reason "per request phab:T367216"
14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1359.eqiad.wmnet
14:23 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1359.eqiad.wmnet
14:23 taavi@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:22 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.eqiad.wmnet
14:21 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2001.codfw.wmnet
14:21 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Announcement/2023 OC and CRC appointments process" "Wikimedia Foundation/Legal/Announcement/2023 OC and CRC appointments process" "Zabe" --reason "per request phab:T367216"
14:18 claime: Depooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet for reimage - T351074
14:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
14:17 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
14:17 urbanecm@deploy1002: Finished scap: Backport for gerrit:1043784 Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) (duration: 15m 34s)
14:16 Amir1: killing updateMenteeData.php --wiki=enwiki --statsd --dbshard s1
14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
14:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
14:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/talkheader" "Wikimedia Foundation/Legal/2023 ToU updates/talkheader" "Zabe" --reason "per request phab:T367216"
14:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:07 taavi@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudvirt-wdqs1001.eqiad.wmnet
14:06 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Proposed update" "Wikimedia Foundation/Legal/2023 ToU updates/Proposed update" "Zabe" --reason "per request phab:T367216"
14:06 urbanecm@deploy1002: urbanecm: Continuing with sync
14:06 vgutierrez: rolling upgrade on A:cp-codfw to fifo-log-demux 0.7.5 - T364383
14:05 urbanecm@deploy1002: urbanecm: Backport for gerrit:1043784 Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Charter" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Charter" "Zabe" --reason "per request phab:T367217"
14:02 vgutierrez: disable puppet on A:cp-codfw before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046681 - T364383
14:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Call for applicants" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Call for applicants" "Zabe" --reason "per request phab:T367217"
14:01 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:01 urbanecm@deploy1002: Started scap: Backport for gerrit:1043784 Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895)
14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
14:00 urbanecm@deploy1002: Finished scap: Backport for gerrit:1046597 Backport all commits from master (T364895), gerrit:1046598 Check EntitySchemaIsRepo in more hook handlers (T363153) (duration: 16m 47s)
13:54 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
13:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
13:50 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Continuing with sync
13:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
13:48 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Backport for gerrit:1046597 Backport all commits from master (T364895), gerrit:1046598 Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:48 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:45 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
13:44 urbanecm@deploy1002: Started scap: Backport for gerrit:1046597 Backport all commits from master (T364895), gerrit:1046598 Check EntitySchemaIsRepo in more hook handlers (T363153)
13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:43 urbanecm@deploy1002: Sync cancelled.
13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:43 urbanecm@deploy1002: lucaswerkmeister-wmde, urbanecm: Backport for gerrit:1046597 Backport all commits from master (T364895), gerrit:1046598 Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65112 and previous config saved to /var/cache/conftool/dbconfig/20240617-133951-ladsgroup.json
13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:37 urbanecm@deploy1002: Started scap: Backport for gerrit:1046597 Backport all commits from master (T364895), gerrit:1046598 Check EntitySchemaIsRepo in more hook handlers (T363153)
13:34 claime: Drained and cordoned wikikube-ctrl2001.codfw.wmnet wikikube-ctrl2002.codfw.wmnet
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
13:33 claime: Uncordoned wikikube-ctrl2003.codfw.wmnet
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
13:25 urbanecm@deploy1002: Finished scap: Backport for gerrit:1046116 Enable subpages for the main namespace in sourceswiki (T367674), gerrit:1036613 CommunityConfiguration: set feedback url instead of bug tool (T363801) (duration: 23m 07s)
13:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
13:14 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
13:13 vgutierrez: rolling upgrade on A:cp-ulsfo to fifo-log-demux 0.7.5 - T364383
13:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65111 and previous config saved to /var/cache/conftool/dbconfig/20240617-131222-ladsgroup.json
13:10 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Continuing with sync
13:07 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Backport for gerrit:1046116 Enable subpages for the main namespace in sourceswiki (T367674), gerrit:1036613 CommunityConfiguration: set feedback url instead of bug tool (T363801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
13:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
13:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
13:03 vgutierrez: disable puppet on A:cp-ulsfo before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046665 - T364383
13:02 urbanecm@deploy1002: Started scap: Backport for gerrit:1046116 Enable subpages for the main namespace in sourceswiki (T367674), gerrit:1036613 CommunityConfiguration: set feedback url instead of bug tool (T363801)
12:59 joal@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: (no justification provided) (duration: 00m 03s)
12:59 joal@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: (no justification provided)
12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65110 and previous config saved to /var/cache/conftool/dbconfig/20240617-125715-ladsgroup.json
12:53 vgutierrez: upload fifo-log-demux 0.7.5 to apt.wm.o (bullseye-wikimedia)
12:47 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65109 and previous config saved to /var/cache/conftool/dbconfig/20240617-124207-ladsgroup.json
12:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
12:34 vgutierrez: upgrading HAProxy to version 2.8.10 on cp4051
12:34 vgutierrez: fetch HAProxy 2.8.10 into thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
12:28 jynus: restarting ms-backup100[12], backup1004-7,11
12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65108 and previous config saved to /var/cache/conftool/dbconfig/20240617-122700-ladsgroup.json
12:14 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2003.codfw.wmnet|wikikube-worker2004.codfw.wmnet|wikikube-worker2007.codfw.wmnet|wikikube-worker2008.codfw.wmnet|wikikube-worker2009.codfw.wmnet|wikikube-worker2010.codfw.wmnet),cluster=kubernetes,service=kubesvc
12:14 claime: pooling and uncordoning wikikube-worker2003.codfw.wmnet wikikube-worker2004.codfw.wmnet wikikube-worker2007.codfw.wmnet wikikube-worker2008.codfw.wmnet wikikube-worker2009.codfw.wmnet wikikube-worker2010.codfw.wmnet - T351074
12:09 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830
12:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
12:04 jynus: restart db1204, db1205
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
12:03 claime: homer 'cr*codfw*' commit 'T351074'
12:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: archiva
12:01 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
12:01 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2003.codfw.wmnet
11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
11:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: archiva
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2007.codfw.wmnet with OS bullseye
11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
11:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
11:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
11:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee" "Zabe" --reason "per request phab:T367217"
11:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
11:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
11:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Reminder" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Reminder" "Zabe" --reason "per request phab:T367216"
11:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
11:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Announcement" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Announcement" "Zabe" --reason "per request phab:T367216"
11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:23 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
11:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours" "Zabe" --reason "per request phab:T367216"
11:17 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/LandingCNTranslate" "Wikimedia Foundation/Legal/2023 ToU updates/LandingCNTranslate" "Zabe" --reason "per request phab:T367216"
11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
11:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
11:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
11:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
11:13 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
11:13 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
11:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/About" "Wikimedia Foundation/Legal/2023 ToU updates/About" "Zabe" --reason "per request phab:T367216"
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2329 to wikikube-worker2010
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
11:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
11:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
11:03 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates" "Wikimedia Foundation/Legal/2023 ToU updates" "Zabe" --reason "per request phab:T367216"
11:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:59 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:58 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2329 to wikikube-worker2010
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2328 to wikikube-worker2009
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
10:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
10:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2328 to wikikube-worker2009
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2327 to wikikube-worker2008
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
10:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
10:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
10:48 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2327 to wikikube-worker2008
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2326 to wikikube-worker2007
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
10:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
10:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2326 to wikikube-worker2007
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2324 to wikikube-worker2004
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2004
10:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2004
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
10:38 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
10:37 jynus: restarting ms-backup200[12], backup2004-7,11
10:35 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2324 to wikikube-worker2004
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2323 to wikikube-worker2003
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
10:34 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
10:34 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2003
10:34 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2003
10:33 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
10:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:31 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2323 to wikikube-worker2003
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65107 and previous config saved to /var/cache/conftool/dbconfig/20240617-102938-marostegui.json
10:26 jynus: restarting db2183, db2184
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
10:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65106 and previous config saved to /var/cache/conftool/dbconfig/20240617-101431-marostegui.json
10:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:10 kamila@cumin1002: START - Cookbook sre.dns.netbox
10:09 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
10:08 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage
10:01 claime: draining and cordoning mw2321 - T367702
10:01 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
10:01 taavi@deploy1002: Finished scap: Backport for gerrit:1041742 Stop loading OSM i18n (T161553) (duration: 34m 07s)
09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65104 and previous config saved to /var/cache/conftool/dbconfig/20240617-095924-marostegui.json
09:54 jayme@deploy1002: Finished deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1 (duration: 00m 24s)
09:53 jayme@deploy1002: Started deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1
09:52 jayme@deploy1002: Finished deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1 (duration: 00m 38s)
09:51 jayme@deploy1002: Started deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1
09:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:49 taavi@deploy1002: taavi: Continuing with sync
09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65103 and previous config saved to /var/cache/conftool/dbconfig/20240617-094926-marostegui.json
09:48 taavi@deploy1002: taavi: Backport for gerrit:1041742 Stop loading OSM i18n (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65102 and previous config saved to /var/cache/conftool/dbconfig/20240617-094417-marostegui.json
09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65101 and previous config saved to /var/cache/conftool/dbconfig/20240617-094034-marostegui.json
09:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
09:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65100 and previous config saved to /var/cache/conftool/dbconfig/20240617-093427-marostegui.json
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65099 and previous config saved to /var/cache/conftool/dbconfig/20240617-093419-marostegui.json
09:26 taavi@deploy1002: Started scap: Backport for gerrit:1041742 Stop loading OSM i18n (T161553)
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65098 and previous config saved to /var/cache/conftool/dbconfig/20240617-091920-marostegui.json
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65097 and previous config saved to /var/cache/conftool/dbconfig/20240617-091912-marostegui.json
09:05 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
09:04 _joe_: removed damaged AOF file for redis rdb1014-6379, resyncing with primary
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65096 and previous config saved to /var/cache/conftool/dbconfig/20240617-090413-marostegui.json
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65095 and previous config saved to /var/cache/conftool/dbconfig/20240617-090405-marostegui.json
09:01 urbanecm@deploy1002: Finished scap: Backport for gerrit:1046599 throttle: Fix exemption for ongoing course (duration: 25m 05s)
08:53 claime: hardcycling rdb1014
08:49 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet
08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65094 and previous config saved to /var/cache/conftool/dbconfig/20240617-084906-marostegui.json
08:40 claime: powercycling rdb1014
08:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65093 and previous config saved to /var/cache/conftool/dbconfig/20240617-083755-marostegui.json
08:36 urbanecm@deploy1002: Started scap: Backport for gerrit:1046599 throttle: Fix exemption for ongoing course
08:25 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65092 and previous config saved to /var/cache/conftool/dbconfig/20240617-082248-marostegui.json
08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65091 and previous config saved to /var/cache/conftool/dbconfig/20240617-080741-marostegui.json
07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65090 and previous config saved to /var/cache/conftool/dbconfig/20240617-075234-marostegui.json
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65089 and previous config saved to /var/cache/conftool/dbconfig/20240617-074542-marostegui.json
07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
07:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65088 and previous config saved to /var/cache/conftool/dbconfig/20240617-074530-marostegui.json
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65087 and previous config saved to /var/cache/conftool/dbconfig/20240617-073023-marostegui.json
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65086 and previous config saved to /var/cache/conftool/dbconfig/20240617-071516-marostegui.json
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65085 and previous config saved to /var/cache/conftool/dbconfig/20240617-070009-marostegui.json
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65084 and previous config saved to /var/cache/conftool/dbconfig/20240617-065647-ladsgroup.json
06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65083 and previous config saved to /var/cache/conftool/dbconfig/20240617-065625-ladsgroup.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65082 and previous config saved to /var/cache/conftool/dbconfig/20240617-065357-marostegui.json
06:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65081 and previous config saved to /var/cache/conftool/dbconfig/20240617-065335-marostegui.json
06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65080 and previous config saved to /var/cache/conftool/dbconfig/20240617-064118-ladsgroup.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65079 and previous config saved to /var/cache/conftool/dbconfig/20240617-063923-root.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65078 and previous config saved to /var/cache/conftool/dbconfig/20240617-063826-marostegui.json
06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65077 and previous config saved to /var/cache/conftool/dbconfig/20240617-062612-ladsgroup.json
06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65076 and previous config saved to /var/cache/conftool/dbconfig/20240617-062511-root.json
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65075 and previous config saved to /var/cache/conftool/dbconfig/20240617-062418-root.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65074 and previous config saved to /var/cache/conftool/dbconfig/20240617-062319-marostegui.json
06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65073 and previous config saved to /var/cache/conftool/dbconfig/20240617-061105-ladsgroup.json
06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65072 and previous config saved to /var/cache/conftool/dbconfig/20240617-061006-root.json
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65071 and previous config saved to /var/cache/conftool/dbconfig/20240617-060913-root.json
06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65070 and previous config saved to /var/cache/conftool/dbconfig/20240617-060812-marostegui.json
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65069 and previous config saved to /var/cache/conftool/dbconfig/20240617-060352-marostegui.json
06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65068 and previous config saved to /var/cache/conftool/dbconfig/20240617-060326-marostegui.json
05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65067 and previous config saved to /var/cache/conftool/dbconfig/20240617-055501-root.json
05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65066 and previous config saved to /var/cache/conftool/dbconfig/20240617-055407-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65065 and previous config saved to /var/cache/conftool/dbconfig/20240617-054819-marostegui.json
05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65064 and previous config saved to /var/cache/conftool/dbconfig/20240617-053955-root.json
05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65063 and previous config saved to /var/cache/conftool/dbconfig/20240617-053902-root.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65062 and previous config saved to /var/cache/conftool/dbconfig/20240617-053312-marostegui.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65061 and previous config saved to /var/cache/conftool/dbconfig/20240617-052450-root.json
05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65060 and previous config saved to /var/cache/conftool/dbconfig/20240617-052355-root.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65059 and previous config saved to /var/cache/conftool/dbconfig/20240617-051805-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65058 and previous config saved to /var/cache/conftool/dbconfig/20240617-050944-root.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65057 and previous config saved to /var/cache/conftool/dbconfig/20240617-050852-marostegui.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65056 and previous config saved to /var/cache/conftool/dbconfig/20240617-050849-root.json
05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65055 and previous config saved to /var/cache/conftool/dbconfig/20240617-050756-marostegui.json
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367261)', diff saved to https://phabricator.wikimedia.org/P65054 and previous config saved to /var/cache/conftool/dbconfig/20240617-050324-marostegui.json
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance

2024-06-16

22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65053 and previous config saved to /var/cache/conftool/dbconfig/20240616-221944-ladsgroup.json
22:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65052 and previous config saved to /var/cache/conftool/dbconfig/20240616-221921-ladsgroup.json
22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65051 and previous config saved to /var/cache/conftool/dbconfig/20240616-220414-ladsgroup.json
21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65050 and previous config saved to /var/cache/conftool/dbconfig/20240616-214907-ladsgroup.json
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65049 and previous config saved to /var/cache/conftool/dbconfig/20240616-213400-ladsgroup.json
14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65047 and previous config saved to /var/cache/conftool/dbconfig/20240616-140214-ladsgroup.json
14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65046 and previous config saved to /var/cache/conftool/dbconfig/20240616-140152-ladsgroup.json
13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65045 and previous config saved to /var/cache/conftool/dbconfig/20240616-134645-ladsgroup.json
13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65044 and previous config saved to /var/cache/conftool/dbconfig/20240616-133137-ladsgroup.json
13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65043 and previous config saved to /var/cache/conftool/dbconfig/20240616-131630-ladsgroup.json
05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65042 and previous config saved to /var/cache/conftool/dbconfig/20240616-055411-ladsgroup.json
05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65041 and previous config saved to /var/cache/conftool/dbconfig/20240616-055359-ladsgroup.json
05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65040 and previous config saved to /var/cache/conftool/dbconfig/20240616-053852-ladsgroup.json
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65039 and previous config saved to /var/cache/conftool/dbconfig/20240616-052345-ladsgroup.json
05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65038 and previous config saved to /var/cache/conftool/dbconfig/20240616-050838-ladsgroup.json
03:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
03:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65037 and previous config saved to /var/cache/conftool/dbconfig/20240616-032102-marostegui.json
03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65036 and previous config saved to /var/cache/conftool/dbconfig/20240616-030555-marostegui.json
02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65035 and previous config saved to /var/cache/conftool/dbconfig/20240616-025048-marostegui.json
02:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65034 and previous config saved to /var/cache/conftool/dbconfig/20240616-023541-marostegui.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65033 and previous config saved to /var/cache/conftool/dbconfig/20240616-000421-ladsgroup.json
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65032 and previous config saved to /var/cache/conftool/dbconfig/20240616-000343-ladsgroup.json

2024-06-15

23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65031 and previous config saved to /var/cache/conftool/dbconfig/20240615-234836-ladsgroup.json
23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65030 and previous config saved to /var/cache/conftool/dbconfig/20240615-233329-ladsgroup.json
23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65029 and previous config saved to /var/cache/conftool/dbconfig/20240615-231822-ladsgroup.json
21:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65028 and previous config saved to /var/cache/conftool/dbconfig/20240615-211811-marostegui.json
21:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
21:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
21:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65027 and previous config saved to /var/cache/conftool/dbconfig/20240615-211750-marostegui.json
21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65026 and previous config saved to /var/cache/conftool/dbconfig/20240615-210243-marostegui.json
20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65025 and previous config saved to /var/cache/conftool/dbconfig/20240615-204735-marostegui.json
20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65024 and previous config saved to /var/cache/conftool/dbconfig/20240615-203229-marostegui.json
16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65021 and previous config saved to /var/cache/conftool/dbconfig/20240615-163203-marostegui.json
16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65020 and previous config saved to /var/cache/conftool/dbconfig/20240615-161656-marostegui.json
16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65019 and previous config saved to /var/cache/conftool/dbconfig/20240615-160149-marostegui.json
11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65018 and previous config saved to /var/cache/conftool/dbconfig/20240615-115812-marostegui.json
11:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65017 and previous config saved to /var/cache/conftool/dbconfig/20240615-115750-marostegui.json
11:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65016 and previous config saved to /var/cache/conftool/dbconfig/20240615-114243-marostegui.json
11:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65015 and previous config saved to /var/cache/conftool/dbconfig/20240615-112736-marostegui.json
11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65014 and previous config saved to /var/cache/conftool/dbconfig/20240615-111229-marostegui.json
09:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65013 and previous config saved to /var/cache/conftool/dbconfig/20240615-092730-ladsgroup.json
09:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65012 and previous config saved to /var/cache/conftool/dbconfig/20240615-071215-marostegui.json
07:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
07:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65011 and previous config saved to /var/cache/conftool/dbconfig/20240615-071152-marostegui.json
06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65010 and previous config saved to /var/cache/conftool/dbconfig/20240615-065645-marostegui.json
06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65009 and previous config saved to /var/cache/conftool/dbconfig/20240615-064138-marostegui.json
06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65008 and previous config saved to /var/cache/conftool/dbconfig/20240615-062631-marostegui.json
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367261)', diff saved to https://phabricator.wikimedia.org/P65007 and previous config saved to /var/cache/conftool/dbconfig/20240615-061919-marostegui.json
06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65006 and previous config saved to /var/cache/conftool/dbconfig/20240615-061908-marostegui.json
06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65005 and previous config saved to /var/cache/conftool/dbconfig/20240615-060401-marostegui.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65004 and previous config saved to /var/cache/conftool/dbconfig/20240615-054854-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65003 and previous config saved to /var/cache/conftool/dbconfig/20240615-053346-marostegui.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65002 and previous config saved to /var/cache/conftool/dbconfig/20240615-050236-marostegui.json
05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P65001 and previous config saved to /var/cache/conftool/dbconfig/20240615-024019-ladsgroup.json
02:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65000 and previous config saved to /var/cache/conftool/dbconfig/20240615-023904-marostegui.json
02:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
02:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
02:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64999 and previous config saved to /var/cache/conftool/dbconfig/20240615-023842-marostegui.json
02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64998 and previous config saved to /var/cache/conftool/dbconfig/20240615-022512-ladsgroup.json
02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64997 and previous config saved to /var/cache/conftool/dbconfig/20240615-022335-marostegui.json
02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64996 and previous config saved to /var/cache/conftool/dbconfig/20240615-021005-ladsgroup.json
02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64995 and previous config saved to /var/cache/conftool/dbconfig/20240615-020827-marostegui.json
01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64994 and previous config saved to /var/cache/conftool/dbconfig/20240615-015458-ladsgroup.json
01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64993 and previous config saved to /var/cache/conftool/dbconfig/20240615-015320-marostegui.json

2024-06-14

23:09 mnz@deploy1002: Finished deploy [airflow-dags/research@ee5a291]: (no justification provided) (duration: 00m 30s)
23:09 mnz@deploy1002: Started deploy [airflow-dags/research@ee5a291]: (no justification provided)
22:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
22:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
22:33 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 31s)
22:33 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
22:27 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
22:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
22:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
22:02 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64992 and previous config saved to /var/cache/conftool/dbconfig/20240614-214910-marostegui.json
21:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
21:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
21:33 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet
21:33 Emperor: restart swift-proxy on ms-fe1010 T360913
21:31 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64991 and previous config saved to /var/cache/conftool/dbconfig/20240614-211239-ladsgroup.json
20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64990 and previous config saved to /var/cache/conftool/dbconfig/20240614-205731-ladsgroup.json
20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64989 and previous config saved to /var/cache/conftool/dbconfig/20240614-204224-ladsgroup.json
20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64988 and previous config saved to /var/cache/conftool/dbconfig/20240614-202717-ladsgroup.json
20:22 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4040.ulsfo.wmnet
20:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
19:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
19:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
19:27 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
19:27 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS bullseye
19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64987 and previous config saved to /var/cache/conftool/dbconfig/20240614-192643-ladsgroup.json
19:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
19:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
19:00 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
18:54 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4040.ulsfo.wmnet
17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
17:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
17:11 jdrewniak@deploy1002: Finished scap: Backport for gerrit:1043827 For now scope hatnote and infobox styles (T367462) (duration: 16m 06s)
17:01 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
16:31 jan_drewniak: starting friday backport for T367462 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/1043827
16:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
16:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
16:00 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
15:55 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
15:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
15:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64984 and previous config saved to /var/cache/conftool/dbconfig/20240614-153727-marostegui.json
15:37 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be1002.eqiad.wmnet with OS bookworm
15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:27 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4039.ulsfo.wmnet
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64982 and previous config saved to /var/cache/conftool/dbconfig/20240614-152220-marostegui.json
15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64981 and previous config saved to /var/cache/conftool/dbconfig/20240614-150713-marostegui.json
14:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
14:54 jynus: upgrade db1245 to mariadb 10.6 T360751
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64980 and previous config saved to /var/cache/conftool/dbconfig/20240614-145206-marostegui.json
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64979 and previous config saved to /var/cache/conftool/dbconfig/20240614-144925-marostegui.json
14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64978 and previous config saved to /var/cache/conftool/dbconfig/20240614-143418-marostegui.json
14:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
14:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64976 and previous config saved to /var/cache/conftool/dbconfig/20240614-141911-marostegui.json
14:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
14:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2002.codfw.wmnet with OS bookworm
14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
14:10 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64975 and previous config saved to /var/cache/conftool/dbconfig/20240614-140404-marostegui.json
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64974 and previous config saved to /var/cache/conftool/dbconfig/20240614-140125-marostegui.json
14:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
14:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
13:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
13:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64973 and previous config saved to /var/cache/conftool/dbconfig/20240614-135900-marostegui.json
13:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
13:52 jynus: restart db2139, db2141
13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
13:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
13:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
13:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64972 and previous config saved to /var/cache/conftool/dbconfig/20240614-134354-marostegui.json
13:41 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64971 and previous config saved to /var/cache/conftool/dbconfig/20240614-132847-marostegui.json
13:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2002.codfw.wmnet with OS bookworm
13:24 jynus: restart db1216, db1225, db1240, db1245
13:23 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
13:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
13:22 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
13:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2001.codfw.wmnet with OS bookworm
13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64970 and previous config saved to /var/cache/conftool/dbconfig/20240614-131339-marostegui.json
13:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64969 and previous config saved to /var/cache/conftool/dbconfig/20240614-131113-marostegui.json
13:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64968 and previous config saved to /var/cache/conftool/dbconfig/20240614-131051-marostegui.json
13:05 jynus: restart db1150, db1171
12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64967 and previous config saved to /var/cache/conftool/dbconfig/20240614-125543-marostegui.json
12:54 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
12:51 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
12:45 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
12:45 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
12:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64966 and previous config saved to /var/cache/conftool/dbconfig/20240614-124036-marostegui.json
12:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64964 and previous config saved to /var/cache/conftool/dbconfig/20240614-122530-marostegui.json
12:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64963 and previous config saved to /var/cache/conftool/dbconfig/20240614-122255-marostegui.json
12:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64962 and previous config saved to /var/cache/conftool/dbconfig/20240614-122233-marostegui.json
12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64961 and previous config saved to /var/cache/conftool/dbconfig/20240614-122210-ladsgroup.json
12:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64960 and previous config saved to /var/cache/conftool/dbconfig/20240614-120918-ladsgroup.json
12:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
12:08 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
12:08 fnegri@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host clouddb1018.eqiad.wmnet
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64959 and previous config saved to /var/cache/conftool/dbconfig/20240614-120727-marostegui.json
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64958 and previous config saved to /var/cache/conftool/dbconfig/20240614-120704-ladsgroup.json
12:01 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64957 and previous config saved to /var/cache/conftool/dbconfig/20240614-115411-ladsgroup.json
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64956 and previous config saved to /var/cache/conftool/dbconfig/20240614-115220-marostegui.json
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64955 and previous config saved to /var/cache/conftool/dbconfig/20240614-115159-ladsgroup.json
11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64954 and previous config saved to /var/cache/conftool/dbconfig/20240614-114002-ladsgroup.json
11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64953 and previous config saved to /var/cache/conftool/dbconfig/20240614-113904-ladsgroup.json
11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64952 and previous config saved to /var/cache/conftool/dbconfig/20240614-113712-marostegui.json
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint1001.eqiad.wmnet
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint1001.eqiad.wmnet with OS bookworm
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64951 and previous config saved to /var/cache/conftool/dbconfig/20240614-113654-ladsgroup.json
11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64950 and previous config saved to /var/cache/conftool/dbconfig/20240614-113325-marostegui.json
11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
11:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64949 and previous config saved to /var/cache/conftool/dbconfig/20240614-113303-marostegui.json
11:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
11:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64948 and previous config saved to /var/cache/conftool/dbconfig/20240614-112357-ladsgroup.json
11:21 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1018.eqiad.wmnet
11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
11:18 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
11:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64947 and previous config saved to /var/cache/conftool/dbconfig/20240614-111756-marostegui.json
11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
11:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:02 jynus: restart backup* hosts
11:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64946 and previous config saved to /var/cache/conftool/dbconfig/20240614-110249-marostegui.json
11:00 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
10:59 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
10:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
10:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
10:55 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
10:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
10:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
10:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
10:54 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
10:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
10:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
10:52 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64945 and previous config saved to /var/cache/conftool/dbconfig/20240614-104742-marostegui.json
10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64943 and previous config saved to /var/cache/conftool/dbconfig/20240614-104352-marostegui.json
10:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
10:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64942 and previous config saved to /var/cache/conftool/dbconfig/20240614-104330-marostegui.json
10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
10:33 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
10:30 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64941 and previous config saved to /var/cache/conftool/dbconfig/20240614-102823-marostegui.json
10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
10:25 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
10:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint1001.eqiad.wmnet with OS bookworm
10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64940 and previous config saved to /var/cache/conftool/dbconfig/20240614-101316-marostegui.json
10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64939 and previous config saved to /var/cache/conftool/dbconfig/20240614-095809-marostegui.json
09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64938 and previous config saved to /var/cache/conftool/dbconfig/20240614-095434-marostegui.json
09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64937 and previous config saved to /var/cache/conftool/dbconfig/20240614-095356-marostegui.json
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint1001.eqiad.wmnet on all recursors
09:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
09:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint1001.eqiad.wmnet on all recursors
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
09:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
09:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
09:43 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
09:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64936 and previous config saved to /var/cache/conftool/dbconfig/20240614-093849-marostegui.json
09:37 jynus: upgrade and restart dbprov[12]00[3456]
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64935 and previous config saved to /var/cache/conftool/dbconfig/20240614-093657-marostegui.json
09:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64934 and previous config saved to /var/cache/conftool/dbconfig/20240614-093634-marostegui.json
09:31 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
09:31 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint1001.eqiad.wmnet
09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
09:30 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
09:29 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
09:29 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
09:25 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
09:25 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
09:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64933 and previous config saved to /var/cache/conftool/dbconfig/20240614-092342-marostegui.json
09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
09:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
09:22 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64932 and previous config saved to /var/cache/conftool/dbconfig/20240614-092127-marostegui.json
09:14 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
09:13 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
09:10 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
09:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64931 and previous config saved to /var/cache/conftool/dbconfig/20240614-090835-marostegui.json
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint2001.codfw.wmnet
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint2001.codfw.wmnet with OS bookworm
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64930 and previous config saved to /var/cache/conftool/dbconfig/20240614-090620-marostegui.json
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64929 and previous config saved to /var/cache/conftool/dbconfig/20240614-090457-marostegui.json
09:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
09:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
09:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
09:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
09:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64928 and previous config saved to /var/cache/conftool/dbconfig/20240614-085817-marostegui.json
08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64927 and previous config saved to /var/cache/conftool/dbconfig/20240614-085113-marostegui.json
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
08:44 marostegui: dbmaint eqiad s8 deploy schema change T367261
08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64926 and previous config saved to /var/cache/conftool/dbconfig/20240614-084310-marostegui.json
08:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint2001.codfw.wmnet with OS bookworm
08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64925 and previous config saved to /var/cache/conftool/dbconfig/20240614-082803-marostegui.json
08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint2001.codfw.wmnet on all recursors
08:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint2001.codfw.wmnet on all recursors
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint2001.codfw.wmnet
08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
08:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
08:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
08:14 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:14 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
08:14 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64924 and previous config saved to /var/cache/conftool/dbconfig/20240614-081255-marostegui.json
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64923 and previous config saved to /var/cache/conftool/dbconfig/20240614-080938-marostegui.json
08:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64922 and previous config saved to /var/cache/conftool/dbconfig/20240614-080915-marostegui.json
08:03 marostegui: dbmaint codfw s8 deploy schema change T367261
07:56 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64921 and previous config saved to /var/cache/conftool/dbconfig/20240614-075408-marostegui.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64920 and previous config saved to /var/cache/conftool/dbconfig/20240614-073902-marostegui.json
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1003.eqiad.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64919 and previous config saved to /var/cache/conftool/dbconfig/20240614-072354-marostegui.json
07:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64918 and previous config saved to /var/cache/conftool/dbconfig/20240614-072034-marostegui.json
07:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
07:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64917 and previous config saved to /var/cache/conftool/dbconfig/20240614-072012-marostegui.json
07:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:17 marostegui: dbmaint eqiad s1 deploy schema change T367261
07:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1003.eqiad.wmnet
07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2003.codfw.wmnet
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
07:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64916 and previous config saved to /var/cache/conftool/dbconfig/20240614-070505-marostegui.json
06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
06:53 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2003.codfw.wmnet
06:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64915 and previous config saved to /var/cache/conftool/dbconfig/20240614-064958-marostegui.json
06:41 marostegui: dbmaint codfw s1 deploy schema change T367261
06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64914 and previous config saved to /var/cache/conftool/dbconfig/20240614-063451-marostegui.json
06:34 moritzm: rebalance ganeti/C in eqiad following reboots
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64913 and previous config saved to /var/cache/conftool/dbconfig/20240614-063138-marostegui.json
06:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64912 and previous config saved to /var/cache/conftool/dbconfig/20240614-063116-marostegui.json
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64911 and previous config saved to /var/cache/conftool/dbconfig/20240614-061609-marostegui.json
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64910 and previous config saved to /var/cache/conftool/dbconfig/20240614-060102-marostegui.json
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64909 and previous config saved to /var/cache/conftool/dbconfig/20240614-054555-marostegui.json
05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64908 and previous config saved to /var/cache/conftool/dbconfig/20240614-054041-marostegui.json
05:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64907 and previous config saved to /var/cache/conftool/dbconfig/20240614-054019-marostegui.json
05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64906 and previous config saved to /var/cache/conftool/dbconfig/20240614-053023-ladsgroup.json
05:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64905 and previous config saved to /var/cache/conftool/dbconfig/20240614-053001-ladsgroup.json
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64904 and previous config saved to /var/cache/conftool/dbconfig/20240614-052512-marostegui.json
05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64903 and previous config saved to /var/cache/conftool/dbconfig/20240614-051454-ladsgroup.json
05:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64902 and previous config saved to /var/cache/conftool/dbconfig/20240614-051005-marostegui.json
04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64901 and previous config saved to /var/cache/conftool/dbconfig/20240614-045947-ladsgroup.json
04:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64900 and previous config saved to /var/cache/conftool/dbconfig/20240614-045458-marostegui.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64899 and previous config saved to /var/cache/conftool/dbconfig/20240614-045129-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64898 and previous config saved to /var/cache/conftool/dbconfig/20240614-044840-marostegui.json
04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64897 and previous config saved to /var/cache/conftool/dbconfig/20240614-044440-ladsgroup.json
03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin
03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqsin
01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64896 and previous config saved to /var/cache/conftool/dbconfig/20240614-010717-ladsgroup.json
01:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
01:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance

2024-06-13

23:56 zabe@deploy1002: Finished scap: T361041, gerrit:1043311 Update interwiki cache (duration: 11m 07s)
23:48 foks: removing 7 files for legal compliance
23:45 zabe@deploy1002: Started scap: T361041, gerrit:1043311 Update interwiki cache
23:23 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=sysop_plwiki --cluster=all 2>&1 | tee /tmp/sysop_plwiki.UpdateSearchIndexConfig.log # T361041
23:20 zabe@deploy1002: Finished scap: T361041 (duration: 11m 36s)
23:17 foks: removing 9 files for legal compliance
23:08 zabe@deploy1002: Started scap: T361041
23:06 zabe@deploy1002: Sync cancelled.
23:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
23:01 zabe@deploy1002: zabe: T361041 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:59 zabe@deploy1002: Started scap: T361041
22:49 zabe: create plwiki sysop wiki # T361041
22:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
22:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
21:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
21:32 jsn@deploy1002: Finished scap: Backport for gerrit:1041699 Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 14m 18s)
21:23 jsn@deploy1002: jsn, kgraessle: Continuing with sync
21:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64894 and previous config saved to /var/cache/conftool/dbconfig/20240613-212230-marostegui.json
21:20 jsn@deploy1002: jsn, kgraessle: Backport for gerrit:1041699 Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:17 jsn@deploy1002: Started scap: Backport for gerrit:1041699 Deploy QuickSurvey for Automoderator patroller workstream survey (T362969)
21:15 jsn@deploy1002: Finished scap: Backport for gerrit:1043110 Look for iPadOS in user-agent, in addition to iOS. (T362723) (duration: 14m 11s)
21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64893 and previous config saved to /var/cache/conftool/dbconfig/20240613-210723-marostegui.json
21:07 jsn@deploy1002: dbrant, jsn: Continuing with sync
21:04 jsn@deploy1002: dbrant, jsn: Backport for gerrit:1043110 Look for iPadOS in user-agent, in addition to iOS. (T362723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:04 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqdfw (T367439)
21:03 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqord (T367439)
21:01 jsn@deploy1002: Started scap: Backport for gerrit:1043110 Look for iPadOS in user-agent, in addition to iOS. (T362723)
20:55 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
20:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64892 and previous config saved to /var/cache/conftool/dbconfig/20240613-205215-marostegui.json
20:50 cdobbins@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.eqsin.wmnet
20:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64891 and previous config saved to /var/cache/conftool/dbconfig/20240613-203708-marostegui.json
20:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:14 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:13 foks: removing 1 file for legal compliance
20:00 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1003.eqiad.wmnet
19:59 foks: removing 2 files for legal compliance
19:58 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1003.eqiad.wmnet
19:58 kamila@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1003.eqiad.wmnet
19:53 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
19:51 foks: removing 2 files for legal compliance
19:51 cdobbins@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
19:41 foks: removing 2 files for legal compliance
19:28 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
19:27 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
19:27 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
19:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
19:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
19:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64890 and previous config saved to /var/cache/conftool/dbconfig/20240613-184924-ladsgroup.json
18:36 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64889 and previous config saved to /var/cache/conftool/dbconfig/20240613-183417-ladsgroup.json
18:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.9 refs T361403
18:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:28 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64888 and previous config saved to /var/cache/conftool/dbconfig/20240613-181911-ladsgroup.json
18:17 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
18:16 brennen: 1.43.0-wmf.9 train (T361403): no current blockers, rolling to group2
18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64887 and previous config saved to /var/cache/conftool/dbconfig/20240613-180404-ladsgroup.json
17:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
17:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
17:33 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
17:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64886 and previous config saved to /var/cache/conftool/dbconfig/20240613-170602-marostegui.json
16:57 brennen@deploy1002: Finished scap: Backport for gerrit:1043126 Convert local function to arrow function to fix context (T367366) (duration: 16m 51s)
16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
16:43 brennen@deploy1002: jforrester, brennen: Backport for gerrit:1043126 Convert local function to arrow function to fix context (T367366) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
16:40 brennen@deploy1002: Started scap: Backport for gerrit:1043126 Convert local function to arrow function to fix context (T367366)
16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64884 and previous config saved to /var/cache/conftool/dbconfig/20240613-163547-marostegui.json
16:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
16:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
16:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
16:24 mutante: gitlab-replica.wikimedia.org - short downtime - renaming to gitlab-replica-a
16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64883 and previous config saved to /var/cache/conftool/dbconfig/20240613-162321-arnaudb.json
16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64882 and previous config saved to /var/cache/conftool/dbconfig/20240613-162040-marostegui.json
16:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
16:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
16:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64881 and previous config saved to /var/cache/conftool/dbconfig/20240613-161641-marostegui.json
16:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
16:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64880 and previous config saved to /var/cache/conftool/dbconfig/20240613-161617-marostegui.json
16:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
16:11 cdanis: gnt-node failover -f ganeti2028.codfw.wmnet
16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
16:09 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:08 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
16:08 cdanis: forcibly rebooted ganeti2028, drbd hung
16:08 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64878 and previous config saved to /var/cache/conftool/dbconfig/20240613-160816-arnaudb.json
16:07 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others (duration: 00m 22s)
16:06 ebernhardson@deploy1002: Started deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others
16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64877 and previous config saved to /var/cache/conftool/dbconfig/20240613-160453-marostegui.json
16:04 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
16:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64876 and previous config saved to /var/cache/conftool/dbconfig/20240613-160431-marostegui.json
16:04 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64875 and previous config saved to /var/cache/conftool/dbconfig/20240613-160110-marostegui.json
15:54 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64874 and previous config saved to /var/cache/conftool/dbconfig/20240613-155310-arnaudb.json
15:52 elukey: drop mediawiki-services-restbase docker images from the Docker Registry - T367427
15:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
15:50 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-fe2002.codfw.wmnet with OS bookworm
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64873 and previous config saved to /var/cache/conftool/dbconfig/20240613-154924-marostegui.json
15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64872 and previous config saved to /var/cache/conftool/dbconfig/20240613-154603-marostegui.json
15:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
15:42 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
15:41 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqsin
15:38 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin
15:38 ChrisDobbins901_: cdobbins@cumin1002 sudo -i cookbook sre.cdn.roll-reboot --alias 'cp-upload_eqsin' --batchsize 1 --reason T366555 --task-id T366555 --grace-sleep 5400
15:38 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64871 and previous config saved to /var/cache/conftool/dbconfig/20240613-153805-arnaudb.json
15:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
15:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
15:36 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2002.codfw.wmnet with OS bookworm
15:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
15:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
15:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64870 and previous config saved to /var/cache/conftool/dbconfig/20240613-153417-marostegui.json
15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64869 and previous config saved to /var/cache/conftool/dbconfig/20240613-153056-marostegui.json
15:28 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date # Ctrl+C – had slowed down, unnecessary work by this point; was at --start '["55914913"]'
15:28 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64868 and previous config saved to /var/cache/conftool/dbconfig/20240613-152748-marostegui.json
15:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
15:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
15:27 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:26 elukey: drop mediawiki-services-parsoid docker images from the Docker Registry - T367427
15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
15:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
15:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64867 and previous config saved to /var/cache/conftool/dbconfig/20240613-152420-marostegui.json
15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64866 and previous config saved to /var/cache/conftool/dbconfig/20240613-152300-arnaudb.json
15:22 elukey: drop eventgate-ci docker images from the Docker Registry
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64865 and previous config saved to /var/cache/conftool/dbconfig/20240613-151910-marostegui.json
15:15 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64864 and previous config saved to /var/cache/conftool/dbconfig/20240613-150913-marostegui.json
15:08 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:05 volans: upgrading spicerack on cumin1002 to v8.6.0
15:04 topranks: rebooting lsw1-f6-codfw to upgrade JunOS on switch T365983
15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
15:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64863 and previous config saved to /var/cache/conftool/dbconfig/20240613-150332-ladsgroup.json
15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
15:03 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
15:01 cdanis@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:01 cdanis@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:00 cdanis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:59 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:59 cdanis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:59 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:59 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:57 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:57 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64862 and previous config saved to /var/cache/conftool/dbconfig/20240613-145406-marostegui.json
14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 depool ahead of T365983', diff saved to https://phabricator.wikimedia.org/P64861 and previous config saved to /var/cache/conftool/dbconfig/20240613-145035-arnaudb.json
14:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
14:49 moritzm: rebalance ganeti/B in eqiad following reboots
14:49 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
14:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64860 and previous config saved to /var/cache/conftool/dbconfig/20240613-144825-ladsgroup.json
14:47 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:46 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
14:45 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
14:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
14:44 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
14:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
14:41 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762 (duration: 00m 05s)
14:41 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762
14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64859 and previous config saved to /var/cache/conftool/dbconfig/20240613-143859-marostegui.json
14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64858 and previous config saved to /var/cache/conftool/dbconfig/20240613-143554-marostegui.json
14:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
14:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64857 and previous config saved to /var/cache/conftool/dbconfig/20240613-143531-marostegui.json
14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64856 and previous config saved to /var/cache/conftool/dbconfig/20240613-143318-ladsgroup.json
14:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762 (duration: 00m 07s)
14:32 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762
14:27 bblack: authdns-update for https://gerrit.wikimedia.org/r/1042490 (remaps some Facebook ranges to codfw+eqiad)
14:24 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
14:21 cgoubert@deploy1002: Finished scap: Change mwapi listener to mw-api-int - T333120 (duration: 06m 47s)
14:21 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64855 and previous config saved to /var/cache/conftool/dbconfig/20240613-142024-marostegui.json
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64854 and previous config saved to /var/cache/conftool/dbconfig/20240613-141810-ladsgroup.json
14:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
14:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
14:15 cgoubert@deploy1002: Started scap: Change mwapi listener to mw-api-int - T333120
14:05 Lucas_WMDE: UTC afternoon backport+config window done
14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1042208 Load EntitySchema on Test Wikidata clients (T363153) (duration: 14m 14s)
14:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64853 and previous config saved to /var/cache/conftool/dbconfig/20240613-140517-marostegui.json
14:03 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
14:00 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
14:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
13:59 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
13:59 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
13:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync
13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64852 and previous config saved to /var/cache/conftool/dbconfig/20240613-135523-ladsgroup.json
13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: sync
13:55 claime: roll-restarting shellbox-constraints
13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for gerrit:1042208 Load EntitySchema on Test Wikidata clients (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1042208 Load EntitySchema on Test Wikidata clients (T363153)
13:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64851 and previous config saved to /var/cache/conftool/dbconfig/20240613-135010-marostegui.json
13:48 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
13:47 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64850 and previous config saved to /var/cache/conftool/dbconfig/20240613-134701-marostegui.json
13:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
13:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
13:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
13:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64849 and previous config saved to /var/cache/conftool/dbconfig/20240613-134639-marostegui.json
13:45 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1042997 [svwikt] Add a temporary logo for the 100.000 pages (T364247) (duration: 13m 24s)
13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P64848 and previous config saved to /var/cache/conftool/dbconfig/20240613-134456-ladsgroup.json
13:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
13:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64847 and previous config saved to /var/cache/conftool/dbconfig/20240613-134017-ladsgroup.json
13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Continuing with sync
13:34 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Backport for gerrit:1042997 [svwikt] Add a temporary logo for the 100.000 pages (T364247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:33 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:33 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1042997 [svwikt] Add a temporary logo for the 100.000 pages (T364247)
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64846 and previous config saved to /var/cache/conftool/dbconfig/20240613-133132-marostegui.json
13:30 volans: upgrading spicerack on cumin2002 to v8.6.0
13:26 moritzm: installing pillow security updates
13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64845 and previous config saved to /var/cache/conftool/dbconfig/20240613-132512-ladsgroup.json
13:18 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
13:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64844 and previous config saved to /var/cache/conftool/dbconfig/20240613-131746-ladsgroup.json
13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64843 and previous config saved to /var/cache/conftool/dbconfig/20240613-131625-marostegui.json
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64842 and previous config saved to /var/cache/conftool/dbconfig/20240613-131006-ladsgroup.json
13:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
13:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
13:06 moritzm: installing pillow security updates
13:03 jmm@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64841 and previous config saved to /var/cache/conftool/dbconfig/20240613-130117-marostegui.json
12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64840 and previous config saved to /var/cache/conftool/dbconfig/20240613-125700-marostegui.json
12:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
12:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
12:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64839 and previous config saved to /var/cache/conftool/dbconfig/20240613-125648-marostegui.json
12:52 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
12:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
12:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
12:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64838 and previous config saved to /var/cache/conftool/dbconfig/20240613-124141-marostegui.json
12:39 elukey: reset BIOS/BMC to factory default on sretest1001 - T365372
12:30 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
12:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64837 and previous config saved to /var/cache/conftool/dbconfig/20240613-122634-marostegui.json
12:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
12:26 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
12:21 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1043006 Temporarily bump circuit breaking threshold to 350 (duration: 12m 13s)
12:20 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:17 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:12 ladsgroup@deploy1002: ladsgroup: Continuing with sync
12:12 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:1043006 Temporarily bump circuit breaking threshold to 350 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64836 and previous config saved to /var/cache/conftool/dbconfig/20240613-121127-marostegui.json
12:09 ladsgroup@deploy1002: Started scap: Backport for gerrit:1043006 Temporarily bump circuit breaking threshold to 350
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64835 and previous config saved to /var/cache/conftool/dbconfig/20240613-120711-marostegui.json
12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
12:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64834 and previous config saved to /var/cache/conftool/dbconfig/20240613-120644-marostegui.json
11:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
11:57 fabfur: enabling puppet && repool cp4037 (T360454)
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64832 and previous config saved to /var/cache/conftool/dbconfig/20240613-115137-marostegui.json
11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64831 and previous config saved to /var/cache/conftool/dbconfig/20240613-113630-marostegui.json
11:35 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:29 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:28 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
11:22 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64830 and previous config saved to /var/cache/conftool/dbconfig/20240613-112122-marostegui.json
11:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
11:19 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64829 and previous config saved to /var/cache/conftool/dbconfig/20240613-111706-marostegui.json
11:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64828 and previous config saved to /var/cache/conftool/dbconfig/20240613-111655-ladsgroup.json
11:16 moritzm: installing pillow security updates
11:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
11:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64827 and previous config saved to /var/cache/conftool/dbconfig/20240613-111642-marostegui.json
11:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64826 and previous config saved to /var/cache/conftool/dbconfig/20240613-111633-ladsgroup.json
11:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
11:09 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
11:08 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:08 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64825 and previous config saved to /var/cache/conftool/dbconfig/20240613-110135-marostegui.json
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64824 and previous config saved to /var/cache/conftool/dbconfig/20240613-110126-ladsgroup.json
10:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
10:55 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
10:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
10:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:46 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64823 and previous config saved to /var/cache/conftool/dbconfig/20240613-104628-marostegui.json
10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64822 and previous config saved to /var/cache/conftool/dbconfig/20240613-104619-ladsgroup.json
10:43 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
10:42 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
10:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2010.codfw.wmnet
10:41 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
10:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:34 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2010.codfw.wmnet
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2009.codfw.wmnet
10:33 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64821 and previous config saved to /var/cache/conftool/dbconfig/20240613-103120-marostegui.json
10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64820 and previous config saved to /var/cache/conftool/dbconfig/20240613-103111-ladsgroup.json
10:31 cmooney@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
10:30 cmooney@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2009.codfw.wmnet
10:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2008.codfw.wmnet
10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64819 and previous config saved to /var/cache/conftool/dbconfig/20240613-102659-marostegui.json
10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2287-2290].codfw.wmnet
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:23 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
10:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
10:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2008.codfw.wmnet
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2007.codfw.wmnet
10:21 hashar: Gerrit upgrade completed
10:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64818 and previous config saved to /var/cache/conftool/dbconfig/20240613-102016-marostegui.json
10:20 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2007.codfw.wmnet
10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2006.codfw.wmnet
10:10 fabfur: cp4037 depooled && puppet disable to profile benthos configuration (T360454)
10:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2006.codfw.wmnet
10:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135 (duration: 00m 06s)
10:08 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135
10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2287-2290].codfw.wmnet
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2281,2283-2286].codfw.wmnet
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64816 and previous config saved to /var/cache/conftool/dbconfig/20240613-100509-marostegui.json
10:04 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 (duration: 00m 08s)
10:04 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1
10:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2003.codfw.wmnet
10:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:01 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
09:59 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1010.eqiad.wmnet
09:53 kamila@cumin1002: START - Cookbook sre.dns.netbox
09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1010.eqiad.wmnet
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1009.eqiad.wmnet
09:50 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.eqiad.wmnet
09:50 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.eqiad.wmnet
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64815 and previous config saved to /var/cache/conftool/dbconfig/20240613-095002-marostegui.json
09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2281,2283-2286].codfw.wmnet
09:46 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2003.codfw.wmnet
09:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1009.eqiad.wmnet
09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1008.eqiad.wmnet
09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1008.eqiad.wmnet
09:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1007.eqiad.wmnet
09:39 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
09:38 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64814 and previous config saved to /var/cache/conftool/dbconfig/20240613-093455-marostegui.json
09:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1007.eqiad.wmnet
09:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1006.eqiad.wmnet
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64813 and previous config saved to /var/cache/conftool/dbconfig/20240613-093158-marostegui.json
09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
09:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64812 and previous config saved to /var/cache/conftool/dbconfig/20240613-093136-marostegui.json
09:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1006.eqiad.wmnet
09:22 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64811 and previous config saved to /var/cache/conftool/dbconfig/20240613-091629-marostegui.json
09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64810 and previous config saved to /var/cache/conftool/dbconfig/20240613-091200-arnaudb.json
09:07 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
09:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64809 and previous config saved to /var/cache/conftool/dbconfig/20240613-090122-marostegui.json
08:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64808 and previous config saved to /var/cache/conftool/dbconfig/20240613-085654-arnaudb.json
08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64807 and previous config saved to /var/cache/conftool/dbconfig/20240613-084615-marostegui.json
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64806 and previous config saved to /var/cache/conftool/dbconfig/20240613-084310-marostegui.json
08:43 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
08:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64805 and previous config saved to /var/cache/conftool/dbconfig/20240613-084248-marostegui.json
08:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64804 and previous config saved to /var/cache/conftool/dbconfig/20240613-084149-arnaudb.json
08:37 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:36 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:30 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:29 kart_: Updated MinT to 2024-06-12-111204-production (T363563)
08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64803 and previous config saved to /var/cache/conftool/dbconfig/20240613-082741-marostegui.json
08:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64802 and previous config saved to /var/cache/conftool/dbconfig/20240613-082643-arnaudb.json
08:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
08:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
08:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64801 and previous config saved to /var/cache/conftool/dbconfig/20240613-081234-marostegui.json
08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64800 and previous config saved to /var/cache/conftool/dbconfig/20240613-081138-arnaudb.json
08:11 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
08:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2125.codfw.wmnet with reason: index issue
08:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2125.codfw.wmnet with reason: index issue
08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'index error depool db2125', diff saved to https://phabricator.wikimedia.org/P64799 and previous config saved to /var/cache/conftool/dbconfig/20240613-080624-arnaudb.json
08:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:59 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64798 and previous config saved to /var/cache/conftool/dbconfig/20240613-075727-marostegui.json
07:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64797 and previous config saved to /var/cache/conftool/dbconfig/20240613-075500-root.json
07:54 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64796 and previous config saved to /var/cache/conftool/dbconfig/20240613-075420-marostegui.json
07:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64795 and previous config saved to /var/cache/conftool/dbconfig/20240613-075358-marostegui.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64794 and previous config saved to /var/cache/conftool/dbconfig/20240613-073955-root.json
07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64793 and previous config saved to /var/cache/conftool/dbconfig/20240613-073851-marostegui.json
07:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64792 and previous config saved to /var/cache/conftool/dbconfig/20240613-072450-root.json
07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64791 and previous config saved to /var/cache/conftool/dbconfig/20240613-072344-marostegui.json
07:21 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64790 and previous config saved to /var/cache/conftool/dbconfig/20240613-070944-root.json
07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64789 and previous config saved to /var/cache/conftool/dbconfig/20240613-070837-marostegui.json
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64788 and previous config saved to /var/cache/conftool/dbconfig/20240613-070531-marostegui.json
07:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
07:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64787 and previous config saved to /var/cache/conftool/dbconfig/20240613-070509-marostegui.json
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64786 and previous config saved to /var/cache/conftool/dbconfig/20240613-065439-root.json
06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64785 and previous config saved to /var/cache/conftool/dbconfig/20240613-065002-marostegui.json
06:42 moritzm: rebalance ganeti clusters in eqiad following reboots
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64784 and previous config saved to /var/cache/conftool/dbconfig/20240613-063934-root.json
06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64783 and previous config saved to /var/cache/conftool/dbconfig/20240613-063455-marostegui.json
06:27 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64782 and previous config saved to /var/cache/conftool/dbconfig/20240613-061948-marostegui.json
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64781 and previous config saved to /var/cache/conftool/dbconfig/20240613-061636-marostegui.json
06:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64780 and previous config saved to /var/cache/conftool/dbconfig/20240613-061613-marostegui.json
06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64779 and previous config saved to /var/cache/conftool/dbconfig/20240613-060927-ladsgroup.json
06:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
06:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64778 and previous config saved to /var/cache/conftool/dbconfig/20240613-060905-ladsgroup.json
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64777 and previous config saved to /var/cache/conftool/dbconfig/20240613-060107-marostegui.json
05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64776 and previous config saved to /var/cache/conftool/dbconfig/20240613-055747-marostegui.json
05:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
05:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64775 and previous config saved to /var/cache/conftool/dbconfig/20240613-055725-marostegui.json
05:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64774 and previous config saved to /var/cache/conftool/dbconfig/20240613-055358-ladsgroup.json
05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64773 and previous config saved to /var/cache/conftool/dbconfig/20240613-054600-marostegui.json
05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64772 and previous config saved to /var/cache/conftool/dbconfig/20240613-054218-marostegui.json
05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64771 and previous config saved to /var/cache/conftool/dbconfig/20240613-053851-ladsgroup.json
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64770 and previous config saved to /var/cache/conftool/dbconfig/20240613-053052-marostegui.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64769 and previous config saved to /var/cache/conftool/dbconfig/20240613-052746-marostegui.json
05:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
05:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64768 and previous config saved to /var/cache/conftool/dbconfig/20240613-052723-marostegui.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64767 and previous config saved to /var/cache/conftool/dbconfig/20240613-052711-marostegui.json
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64766 and previous config saved to /var/cache/conftool/dbconfig/20240613-052344-ladsgroup.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64765 and previous config saved to /var/cache/conftool/dbconfig/20240613-051216-marostegui.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64764 and previous config saved to /var/cache/conftool/dbconfig/20240613-051204-marostegui.json
04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64763 and previous config saved to /var/cache/conftool/dbconfig/20240613-045709-marostegui.json
04:55 marostegui: dbmaint eqiad s5 deploy schema change on db1230 T364299
04:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
04:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1230 T367146', diff saved to https://phabricator.wikimedia.org/P64762 and previous config saved to /var/cache/conftool/dbconfig/20240613-045254-root.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary and set section read-write T367146', diff saved to https://phabricator.wikimedia.org/P64761 and previous config saved to /var/cache/conftool/dbconfig/20240613-045141-root.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T367146', diff saved to https://phabricator.wikimedia.org/P64760 and previous config saved to /var/cache/conftool/dbconfig/20240613-045121-root.json
04:51 marostegui: Starting s5 eqiad failover from db1230 to db1183 - T367146
04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64759 and previous config saved to /var/cache/conftool/dbconfig/20240613-044201-marostegui.json
04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64758 and previous config saved to /var/cache/conftool/dbconfig/20240613-043848-marostegui.json
04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
04:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T367146', diff saved to https://phabricator.wikimedia.org/P64757 and previous config saved to /var/cache/conftool/dbconfig/20240613-043239-root.json
04:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
00:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64756 and previous config saved to /var/cache/conftool/dbconfig/20240613-004247-marostegui.json
00:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
00:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64755 and previous config saved to /var/cache/conftool/dbconfig/20240613-003507-ladsgroup.json
00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64754 and previous config saved to /var/cache/conftool/dbconfig/20240613-003444-ladsgroup.json
00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64753 and previous config saved to /var/cache/conftool/dbconfig/20240613-001937-ladsgroup.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64752 and previous config saved to /var/cache/conftool/dbconfig/20240613-000430-ladsgroup.json

2024-06-12

23:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64751 and previous config saved to /var/cache/conftool/dbconfig/20240612-234923-ladsgroup.json
22:17 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
22:13 krinkle@deploy1002: Finished scap: Backport for gerrit:891733 Move etcd.php from wmf-config/ to src/ (T308932) (duration: 13m 42s)
22:10 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
22:08 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
22:06 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
22:04 krinkle@deploy1002: krinkle: Continuing with sync
22:04 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
22:03 krinkle@deploy1002: krinkle: Backport for gerrit:891733 Move etcd.php from wmf-config/ to src/ (T308932) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:59 krinkle@deploy1002: Started scap: Backport for gerrit:891733 Move etcd.php from wmf-config/ to src/ (T308932)
21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
21:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
21:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
21:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
21:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
21:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
21:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
21:33 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
21:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
21:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: sync
21:31 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: sync
21:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
21:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
21:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
21:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
21:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
21:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
21:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
21:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
21:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
21:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
21:22 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
21:21 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
21:20 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
21:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
21:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
21:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
21:17 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
21:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
21:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
21:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
21:05 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
21:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
20:53 cjming: end of UTC late backport window
20:52 cjming@deploy1002: Finished scap: Backport for gerrit:1041674 Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) (duration: 12m 52s)
20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
20:44 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
20:42 cjming@deploy1002: cjming, jdlrobson: Backport for gerrit:1041674 Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:39 cjming@deploy1002: Started scap: Backport for gerrit:1041674 Don't squish images in non-responsive skins e.g. Vector 2010 (T113101)
20:29 cjming@deploy1002: Finished scap: Backport for gerrit:1041748 Disable quick surveys using deprecated configuration (T367128) (duration: 11m 59s)
20:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64750 and previous config saved to /var/cache/conftool/dbconfig/20240612-202233-marostegui.json
20:21 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
20:19 cjming@deploy1002: jdlrobson, cjming: Backport for gerrit:1041748 Disable quick surveys using deprecated configuration (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:17 cjming@deploy1002: Started scap: Backport for gerrit:1041748 Disable quick surveys using deprecated configuration (T367128)
20:10 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw
20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64749 and previous config saved to /var/cache/conftool/dbconfig/20240612-200726-marostegui.json
20:00 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
19:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
19:58 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.9 refs T361403
19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64748 and previous config saved to /var/cache/conftool/dbconfig/20240612-195219-marostegui.json
19:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155 (duration: 00m 07s)
19:49 hashar@deploy1002: Started deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155
19:48 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
19:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
19:48 brennen: 1.43.0-wmf.9 train (T361403): blockers (hopefully) resolved, rolling to group1
19:46 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
19:45 brennen@deploy1002: Finished scap: Backport for gerrit:1042343 Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) (duration: 13m 06s)
19:45 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
19:43 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
19:43 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
19:41 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
19:40 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
19:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
19:39 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
19:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
19:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
19:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
19:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64747 and previous config saved to /var/cache/conftool/dbconfig/20240612-193712-marostegui.json
19:36 brennen@deploy1002: brennen: Continuing with sync
19:36 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
19:36 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
19:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
19:35 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
19:35 brennen@deploy1002: brennen: Backport for gerrit:1042343 Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
19:34 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
19:34 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
19:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
19:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
19:32 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
19:32 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
19:32 brennen@deploy1002: Started scap: Backport for gerrit:1042343 Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153)
19:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
19:31 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
19:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
19:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
19:30 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
19:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
19:29 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
19:29 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
19:28 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
19:27 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
19:26 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
19:25 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64746 and previous config saved to /var/cache/conftool/dbconfig/20240612-192327-marostegui.json
19:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
19:23 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
19:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64745 and previous config saved to /var/cache/conftool/dbconfig/20240612-192303-marostegui.json
19:22 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
19:22 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:22 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:19 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
19:19 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
19:18 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
19:17 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
19:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
19:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
19:11 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
19:10 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
19:09 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:08 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64744 and previous config saved to /var/cache/conftool/dbconfig/20240612-190755-marostegui.json
19:06 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:06 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:03 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:02 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:02 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
19:02 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
18:59 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:59 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:59 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
18:58 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
18:58 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:57 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:55 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64742 and previous config saved to /var/cache/conftool/dbconfig/20240612-185248-marostegui.json
18:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:49 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
18:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
18:42 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
18:41 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:40 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
18:40 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:39 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
18:39 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64741 and previous config saved to /var/cache/conftool/dbconfig/20240612-183741-marostegui.json
18:24 ejegg: fundraising civicrm upgraded from 955166d1 to 76857844
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64740 and previous config saved to /var/cache/conftool/dbconfig/20240612-182343-marostegui.json
18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64739 and previous config saved to /var/cache/conftool/dbconfig/20240612-182321-marostegui.json
18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64738 and previous config saved to /var/cache/conftool/dbconfig/20240612-180814-marostegui.json
18:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
18:01 brennen: 1.43.0-wmf.9 train (T361403): currently blocked on T367334, holding at group0 until resolved.
17:59 mutante: gitlab-replica-old - downtime, renaming to gitlab-replica-b
17:58 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
17:58 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
17:58 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:57 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
17:57 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
17:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64737 and previous config saved to /var/cache/conftool/dbconfig/20240612-175306-marostegui.json
17:52 brett: authdns-update run on dns1004 (T364891)
17:51 brett: Repool ulsfo as A:cp-text nvme upgrades are complete (T364891)
17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:39 brett: Remove downtime of cache_text/cp text servers in ulsfo - T364891
17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64736 and previous config saved to /var/cache/conftool/dbconfig/20240612-173759-marostegui.json
17:30 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=ulsfo
17:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:25 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:24 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:24 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64735 and previous config saved to /var/cache/conftool/dbconfig/20240612-172406-marostegui.json
17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
17:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
17:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64734 and previous config saved to /var/cache/conftool/dbconfig/20240612-172344-marostegui.json
17:13 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
17:13 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:10 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
17:09 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
17:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:sessionstore
17:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64733 and previous config saved to /var/cache/conftool/dbconfig/20240612-170837-marostegui.json
16:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64732 and previous config saved to /var/cache/conftool/dbconfig/20240612-165329-marostegui.json
16:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
16:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64730 and previous config saved to /var/cache/conftool/dbconfig/20240612-162426-marostegui.json
16:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
16:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64729 and previous config saved to /var/cache/conftool/dbconfig/20240612-162403-marostegui.json
16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64728 and previous config saved to /var/cache/conftool/dbconfig/20240612-162134-ladsgroup.json
16:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64727 and previous config saved to /var/cache/conftool/dbconfig/20240612-162110-ladsgroup.json
16:20 brett: cumin 'A:cp-text and A:ulsfo' 'systemctl poweroff' - T364891
16:19 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
16:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 8 hosts with reason: T364891
16:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 8 hosts with reason: T364891
16:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
16:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:17 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
16:13 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
16:11 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 03m 19s)
16:10 jhathaway@deploy1002: Started scap: (no justification provided)
16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64726 and previous config saved to /var/cache/conftool/dbconfig/20240612-160856-marostegui.json
16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64725 and previous config saved to /var/cache/conftool/dbconfig/20240612-160603-ladsgroup.json
16:05 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:sessionstore
16:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:55 otto@deploy1002: Finished scap: Backport for gerrit:1041115 Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) (duration: 12m 19s)
15:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64724 and previous config saved to /var/cache/conftool/dbconfig/20240612-155349-marostegui.json
15:53 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64723 and previous config saved to /var/cache/conftool/dbconfig/20240612-155056-ladsgroup.json
15:47 otto@deploy1002: otto: Continuing with sync
15:46 otto@deploy1002: otto: Backport for gerrit:1041115 Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:43 otto@deploy1002: Started scap: Backport for gerrit:1041115 Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817)
15:42 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64722 and previous config saved to /var/cache/conftool/dbconfig/20240612-153842-marostegui.json
15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64721 and previous config saved to /var/cache/conftool/dbconfig/20240612-153549-ladsgroup.json
15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
15:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
15:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
15:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
15:28 denisse@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=logstash,name=eqiad
15:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
15:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
15:25 volans: uploaded spicerack_8.6.0 to apt.wikimedia.org bullseye-wikimedia
15:25 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
15:24 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64720 and previous config saved to /var/cache/conftool/dbconfig/20240612-152403-marostegui.json
15:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64719 and previous config saved to /var/cache/conftool/dbconfig/20240612-152351-marostegui.json
15:23 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
15:12 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
15:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64718 and previous config saved to /var/cache/conftool/dbconfig/20240612-150844-marostegui.json
15:02 cdanis: T364907 💙cdanis@apt1002.wikimedia.org ~ 🕚☕ sudo -i reprepro --keepunreferencedfiles includedeb bullseye-wikimedia ~/otelcol-contrib_0.102.0_linux_amd64.deb
15:02 brett: authdns-update run on dns1004
15:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:00 brett: Depooling ulsfo in preparation for A:cp-text downtime/poweroff for nvme upgrades (T364891)
15:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1042283 Revert "Only register EntitySchema namespace when feature is enabled", gerrit:1042284 Revert "Allow loading EntitySchema on client (only) wikis" (duration: 12m 36s)
14:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64717 and previous config saved to /var/cache/conftool/dbconfig/20240612-145337-marostegui.json
14:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for gerrit:1042283 Revert "Only register EntitySchema namespace when feature is enabled", gerrit:1042284 Revert "Allow loading EntitySchema on client (only) wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-eqiad
14:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1042283 Revert "Only register EntitySchema namespace when feature is enabled", gerrit:1042284 Revert "Allow loading EntitySchema on client (only) wikis"
14:46 oblivian@deploy1002: Finished scap: Backport for gerrit:1041656 Use the statsd-exporter service where available (T365265) (duration: 12m 05s)
14:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
14:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64716 and previous config saved to /var/cache/conftool/dbconfig/20240612-143830-marostegui.json
14:38 oblivian@deploy1002: oblivian: Continuing with sync
14:37 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:37 oblivian@deploy1002: oblivian: Backport for gerrit:1041656 Use the statsd-exporter service where available (T365265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:36 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:35 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:35 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
14:34 moritzm: failover ganeti master in eqiad to ganeti1028
14:34 oblivian@deploy1002: Started scap: Backport for gerrit:1041656 Use the statsd-exporter service where available (T365265)
14:34 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
14:31 moritzm: installing gst-plugins-base1.0 security updates
14:31 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:31 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:29 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
14:29 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
14:28 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:27 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:27 claime: trafficserver: move 95% of traffic to mw-on-k8s
14:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1041677 Allow loading EntitySchema on client (only) wikis (T363153), gerrit:1041679 Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 32s)
14:27 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
14:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64715 and previous config saved to /var/cache/conftool/dbconfig/20240612-142412-marostegui.json
14:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64714 and previous config saved to /var/cache/conftool/dbconfig/20240612-142335-marostegui.json
14:22 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
14:21 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
14:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
14:20 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
14:19 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
14:19 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
14:19 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
14:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for gerrit:1041677 Allow loading EntitySchema on client (only) wikis (T363153), gerrit:1041679 Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1020.eqiad.wmnet
14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1041677 Allow loading EntitySchema on client (only) wikis (T363153), gerrit:1041679 Only register EntitySchema namespace when feature is enabled (T363153)
14:10 moritzm: installing libarchive security updates
14:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64713 and previous config saved to /var/cache/conftool/dbconfig/20240612-140827-marostegui.json
14:07 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1020.eqiad.wmnet
14:02 vgutierrez: repool text@esams with IPIP encapsulation enabled - T366466
14:02 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
14:00 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
13:55 dcausse@deploy1002: Finished deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh) (duration: 00m 13s)
13:54 dcausse@deploy1002: Started deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh)
13:53 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 - T366466
13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64712 and previous config saved to /var/cache/conftool/dbconfig/20240612-135319-marostegui.json
13:49 fabfur: depooled cp4037 to test benthos/haproxy configuration (T365718)
13:48 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
13:48 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
13:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
13:46 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-eqiad
13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
13:45 claime: Starting kafka-main reboots in eqiad
13:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
13:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64710 and previous config saved to /var/cache/conftool/dbconfig/20240612-134414-marostegui.json
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
13:39 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64709 and previous config saved to /var/cache/conftool/dbconfig/20240612-133812-marostegui.json
13:38 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:37 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:36 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
13:36 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:35 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:34 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1010.eqiad.wmnet
13:34 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1010.eqiad.wmnet
13:34 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:34 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
13:30 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
13:30 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64708 and previous config saved to /var/cache/conftool/dbconfig/20240612-132907-marostegui.json
13:28 sukhe: add ntp-[abc].anycast.wmnet: 10.3.0.[5-7]/32: T366360
13:28 sukhe@cumin1002: START - Cookbook sre.dns.netbox
13:26 vgutierrez: depool text@esams before enabling IPIP encapsulation - T366466
13:26 dcausse@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023 (duration: 00m 14s)
13:25 dcausse@deploy1002: Started deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023
13:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64707 and previous config saved to /var/cache/conftool/dbconfig/20240612-132351-marostegui.json
13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
13:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1041678 Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 15s)
13:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
13:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
13:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64706 and previous config saved to /var/cache/conftool/dbconfig/20240612-131400-marostegui.json
13:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
13:13 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for gerrit:1041678 Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1041678 Only register EntitySchema namespace when feature is enabled (T363153)
13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64705 and previous config saved to /var/cache/conftool/dbconfig/20240612-130232-root.json
13:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64704 and previous config saved to /var/cache/conftool/dbconfig/20240612-125853-marostegui.json
12:58 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1042197 override circuit breaking threshold for ES hosts (duration: 16m 34s)
12:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
12:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
12:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64703 and previous config saved to /var/cache/conftool/dbconfig/20240612-124727-root.json
12:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64702 and previous config saved to /var/cache/conftool/dbconfig/20240612-124456-marostegui.json
12:44 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:1042197 override circuit breaking threshold for ES hosts synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:42 ladsgroup@deploy1002: Started scap: Backport for gerrit:1042197 override circuit breaking threshold for ES hosts
12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
12:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64701 and previous config saved to /var/cache/conftool/dbconfig/20240612-123222-root.json
12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64700 and previous config saved to /var/cache/conftool/dbconfig/20240612-122948-marostegui.json
12:29 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
12:29 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
12:28 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
12:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
12:24 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
12:18 Emperor: restart swift-proxy on ms-fe1013 T360913
12:17 Emperor: restart swift-proxy on ms-fe2011 ms-fe2014 T360913
12:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64699 and previous config saved to /var/cache/conftool/dbconfig/20240612-121716-root.json
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64698 and previous config saved to /var/cache/conftool/dbconfig/20240612-121441-marostegui.json
12:14 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
12:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
12:13 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
12:13 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
12:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
12:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
12:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
12:10 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
12:10 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
12:05 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
12:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64697 and previous config saved to /var/cache/conftool/dbconfig/20240612-120211-root.json
12:00 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64696 and previous config saved to /var/cache/conftool/dbconfig/20240612-115934-marostegui.json
11:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
11:59 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
11:58 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
11:57 claime: Manual restart of dump_cloud_ip_ranges.service on A:puppetserver and A:puppetmaster
11:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
11:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
11:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
11:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
11:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
11:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
11:53 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
11:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
11:52 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64695 and previous config saved to /var/cache/conftool/dbconfig/20240612-115143-marostegui.json
11:51 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
11:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64693 and previous config saved to /var/cache/conftool/dbconfig/20240612-115103-marostegui.json
11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
11:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
11:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64692 and previous config saved to /var/cache/conftool/dbconfig/20240612-114705-root.json
11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
11:45 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
11:45 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
11:45 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
11:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1191', diff saved to https://phabricator.wikimedia.org/P64691 and previous config saved to /var/cache/conftool/dbconfig/20240612-114410-root.json
11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
11:38 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
11:37 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
11:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
11:35 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64690 and previous config saved to /var/cache/conftool/dbconfig/20240612-113556-marostegui.json
11:35 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
11:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
11:31 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:30 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64689 and previous config saved to /var/cache/conftool/dbconfig/20240612-112048-marostegui.json
11:14 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:14 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:10 moritzm: rebalance ganeti cluster in eqsin following reboots
11:08 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
11:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1041697 EntitySchemaSlotViewRenderer: Fix Phan failure (duration: 12m 10s)
11:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64688 and previous config saved to /var/cache/conftool/dbconfig/20240612-110541-marostegui.json
11:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety" "Zabe" --reason "per request phab:T367217"
11:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1003.eqiad.wmnet
11:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
11:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department" "Wikimedia Foundation/Legal" "Zabe" --reason "per request phab:T367216"
11:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
10:58 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for gerrit:1041697 EntitySchemaSlotViewRenderer: Fix Phan failure synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:57 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/Conversation hours and Events" "Wikimedia Foundation/Legal/Global Advocacy/Conversation hours and Events" "Zabe" --reason "per request phab:T367219"
10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64687 and previous config saved to /var/cache/conftool/dbconfig/20240612-105615-marostegui.json
10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
10:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1041697 EntitySchemaSlotViewRenderer: Fix Phan failure
10:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
10:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64686 and previous config saved to /var/cache/conftool/dbconfig/20240612-105554-marostegui.json
10:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
10:54 kamila@cumin1002: START - Cookbook sre.dns.netbox
10:53 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/About" "Wikimedia Foundation/Legal/Global Advocacy/About" "Zabe" --reason "per request phab:T367219"
10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
10:52 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
10:48 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1003.eqiad.wmnet
10:46 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1003.eqiad.wmnet
10:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy" "Wikimedia Foundation/Legal/Global Advocacy" "Zabe" --reason "per request phab:T367219"
10:41 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1019.eqiad.wmnet
10:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64685 and previous config saved to /var/cache/conftool/dbconfig/20240612-104047-marostegui.json
10:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
10:27 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1019.eqiad.wmnet
10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64684 and previous config saved to /var/cache/conftool/dbconfig/20240612-102540-marostegui.json
10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
10:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
10:24 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
10:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
10:23 godog: remove MediaWiki.jawiki.GrowthExperiments.NewcomerTask.update_.* from graphite hosts - T362633
10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
10:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
10:19 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Zabe" --reason "per request phab:T365837"
10:17 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
10:14 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
10:14 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
10:10 claime: Depooling mw2281.codfw.wmnet,mw22[83-90].codfw.wmnet for decommission - T367275
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64683 and previous config saved to /var/cache/conftool/dbconfig/20240612-101032-marostegui.json
10:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
10:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
10:07 zabe: zabe@mwmaint1002:~$ foreachwikiindblist 'all - s4' refreshImageMetadata.php --mime image/webp # T364680
09:48 fabfur: disabling puppet on cp4037 to test benthos configuration (T360454)
09:47 fabfur: disabling puppet on cp4037 to test benthos configuration
09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64680 and previous config saved to /var/cache/conftool/dbconfig/20240612-094738-marostegui.json
09:47 _joe_: running dump_cloud_ip_ranges on puppetmaster1001 to test fixed script
09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
09:33 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64679 and previous config saved to /var/cache/conftool/dbconfig/20240612-093231-marostegui.json
09:32 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64678 and previous config saved to /var/cache/conftool/dbconfig/20240612-091724-marostegui.json
09:11 moritzm: failover ganeti cluster for eqsin to ganeti5004
09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64677 and previous config saved to /var/cache/conftool/dbconfig/20240612-090959-marostegui.json
09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64676 and previous config saved to /var/cache/conftool/dbconfig/20240612-090937-marostegui.json
09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64675 and previous config saved to /var/cache/conftool/dbconfig/20240612-090834-ladsgroup.json
09:06 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
09:04 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date
09:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64674 and previous config saved to /var/cache/conftool/dbconfig/20240612-090435-ladsgroup.json
09:04 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date # Ctrl+C, had become very slow, trying restart
08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64673 and previous config saved to /var/cache/conftool/dbconfig/20240612-085430-marostegui.json
08:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64672 and previous config saved to /var/cache/conftool/dbconfig/20240612-085329-ladsgroup.json
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
08:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64671 and previous config saved to /var/cache/conftool/dbconfig/20240612-084929-ladsgroup.json
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
08:42 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
08:42 zabe: zabe@mwmaint1002:~$ mwscript refreshImageMetadata.php commonswiki --mime image/webp # T364680
08:39 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mike Pham out of all services on: 2200 hosts
08:39 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64670 and previous config saved to /var/cache/conftool/dbconfig/20240612-083923-marostegui.json
08:38 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging Mike Pham out of all services on: 2200 hosts
08:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64669 and previous config saved to /var/cache/conftool/dbconfig/20240612-083824-ladsgroup.json
08:36 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12703' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12703 --new-data-type external-id --summary 'phabricator:T367174T367174' # succeeded
08:35 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12583' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12583 --new-data-type external-id --summary 'phabricator:T367174T367174' # succeeded
08:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64668 and previous config saved to /var/cache/conftool/dbconfig/20240612-083424-ladsgroup.json
08:28 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
08:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P64667 and previous config saved to /var/cache/conftool/dbconfig/20240612-082702-marostegui.json
08:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw
08:26 fabfur: start rebooting all cp-upload_codfw hosts for T366555 (spaced 1.5 hrs)
08:25 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
08:25 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
08:25 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64666 and previous config saved to /var/cache/conftool/dbconfig/20240612-082415-marostegui.json
08:24 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64665 and previous config saved to /var/cache/conftool/dbconfig/20240612-082318-ladsgroup.json
08:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
08:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64664 and previous config saved to /var/cache/conftool/dbconfig/20240612-081918-ladsgroup.json
08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
08:17 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64663 and previous config saved to /var/cache/conftool/dbconfig/20240612-081643-root.json
08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64662 and previous config saved to /var/cache/conftool/dbconfig/20240612-081551-marostegui.json
08:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
08:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
08:15 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:15 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:12 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:12 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64661 and previous config saved to /var/cache/conftool/dbconfig/20240612-081158-ladsgroup.json
08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
08:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
08:09 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
07:36 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
07:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
07:20 marostegui: dbmaint optimize pagelinks on old s6 codfw master db2214 T364069
07:16 kartik@deploy1002: Finished scap: Backport for gerrit:1041899 Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) (duration: 13m 11s)
07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T367262', diff saved to https://phabricator.wikimedia.org/P64660 and previous config saved to /var/cache/conftool/dbconfig/20240612-071340-root.json
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T367262', diff saved to https://phabricator.wikimedia.org/P64659 and previous config saved to /var/cache/conftool/dbconfig/20240612-071158-root.json
07:06 kartik@deploy1002: kartik: Continuing with sync
07:05 kartik@deploy1002: kartik: Backport for gerrit:1041899 Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 marostegui: Starting s6 codfw failover from db2214 to db2129 - T367262
07:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64658 and previous config saved to /var/cache/conftool/dbconfig/20240612-070302-marostegui.json
07:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
07:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:02 kartik@deploy1002: Started scap: Backport for gerrit:1041899 Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356)
07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64657 and previous config saved to /var/cache/conftool/dbconfig/20240612-070240-marostegui.json
07:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
06:55 moritzm: remove ganeti1019 from eqiad cluster T367071
06:54 moritzm: rebalance ganeti clusters in codfw following reboots
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64656 and previous config saved to /var/cache/conftool/dbconfig/20240612-064733-marostegui.json
06:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
06:43 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T367262', diff saved to https://phabricator.wikimedia.org/P64655 and previous config saved to /var/cache/conftool/dbconfig/20240612-064200-root.json
06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
06:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:38 hashar@deploy1002: Finished deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550 (duration: 00m 07s)
06:38 hashar@deploy1002: Started deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64654 and previous config saved to /var/cache/conftool/dbconfig/20240612-063225-marostegui.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64653 and previous config saved to /var/cache/conftool/dbconfig/20240612-061718-marostegui.json
05:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:17 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
05:17 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
05:17 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
05:16 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
05:16 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
05:16 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64652 and previous config saved to /var/cache/conftool/dbconfig/20240612-005420-marostegui.json
00:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
00:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
00:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64651 and previous config saved to /var/cache/conftool/dbconfig/20240612-005347-marostegui.json
00:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw
00:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64650 and previous config saved to /var/cache/conftool/dbconfig/20240612-003840-marostegui.json
00:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64649 and previous config saved to /var/cache/conftool/dbconfig/20240612-002332-marostegui.json
00:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64648 and previous config saved to /var/cache/conftool/dbconfig/20240612-000825-marostegui.json

2024-06-11

23:45 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
23:45 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
22:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
22:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-codfw
21:56 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1041297 Fix Linker::makeExternalLink build failures (T367127) (duration: 12m 33s)
21:51 ejegg: fundraising civicrm upgraded from 7252b1b9 to f7855d25
21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Continuing with sync
21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Backport for gerrit:1041297 Fix Linker::makeExternalLink build failures (T367127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:44 ladsgroup@deploy1002: Started scap: Backport for gerrit:1041297 Fix Linker::makeExternalLink build failures (T367127)
21:42 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1041698 Reduce the threshold for section wide circuit breaking to 300 (duration: 12m 08s)
21:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
21:32 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:1041698 Reduce the threshold for section wide circuit breaking to 300 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:30 ladsgroup@deploy1002: Started scap: Backport for gerrit:1041698 Reduce the threshold for section wide circuit breaking to 300
21:27 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1038899 [zghwiki] Add patroller and autopatrolled groups (T357411) (duration: 11m 53s)
21:18 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
21:18 ladsgroup@deploy1002: pppery, ladsgroup: Backport for gerrit:1038899 [zghwiki] Add patroller and autopatrolled groups (T357411) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 ladsgroup@deploy1002: Started scap: Backport for gerrit:1038899 [zghwiki] Add patroller and autopatrolled groups (T357411)
21:15 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1041769 Stop writing to the old pagelinks columns of s2 (T352010) (duration: 12m 02s)
21:06 ladsgroup@deploy1002: ladsgroup: Continuing with sync
21:05 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:1041769 Stop writing to the old pagelinks columns of s2 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 ladsgroup@deploy1002: Started scap: Backport for gerrit:1041769 Stop writing to the old pagelinks columns of s2 (T352010)
21:01 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1041311 Avoid wrapping floated tables using computed styles (T366314) (duration: 14m 28s)
20:52 ejegg: re-enabled fundraising scheduled jobs
20:52 ladsgroup@deploy1002: jdlrobson, ladsgroup: Continuing with sync
20:49 ladsgroup@deploy1002: jdlrobson, ladsgroup: Backport for gerrit:1041311 Avoid wrapping floated tables using computed styles (T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:46 ladsgroup@deploy1002: Started scap: Backport for gerrit:1041311 Avoid wrapping floated tables using computed styles (T366314)
20:46 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1031459 Drop unused config, enable responsive tables on group 0 (T301212 T366314) (duration: 14m 18s)
20:36 ladsgroup@deploy1002: ladsgroup, jdlrobson: Continuing with sync
20:34 ladsgroup@deploy1002: ladsgroup, jdlrobson: Backport for gerrit:1031459 Drop unused config, enable responsive tables on group 0 (T301212 T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:31 ladsgroup@deploy1002: Started scap: Backport for gerrit:1031459 Drop unused config, enable responsive tables on group 0 (T301212 T366314)
20:30 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1038901 [ptwikinews] Set atom feed link (T356003), gerrit:1038897 [jawikinews] Set $wgArticleCountMethod to any (T364189) (duration: 12m 52s)
20:21 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
20:20 ladsgroup@deploy1002: pppery, ladsgroup: Backport for gerrit:1038901 [ptwikinews] Set atom feed link (T356003), gerrit:1038897 [jawikinews] Set $wgArticleCountMethod to any (T364189) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:17 ladsgroup@deploy1002: Started scap: Backport for gerrit:1038901 [ptwikinews] Set atom feed link (T356003), gerrit:1038897 [jawikinews] Set $wgArticleCountMethod to any (T364189)
20:16 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1041249 MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) (duration: 12m 54s)
20:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-codfw
20:07 ladsgroup@deploy1002: ladsgroup, pppery: Continuing with sync
20:06 ladsgroup@deploy1002: ladsgroup, pppery: Backport for gerrit:1041249 MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 ladsgroup@deploy1002: Started scap: Backport for gerrit:1041249 MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994)
19:38 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
19:38 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
19:33 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64646 and previous config saved to /var/cache/conftool/dbconfig/20240611-192403-ladsgroup.json
19:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64645 and previous config saved to /var/cache/conftool/dbconfig/20240611-190855-ladsgroup.json
18:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64644 and previous config saved to /var/cache/conftool/dbconfig/20240611-185348-ladsgroup.json
18:46 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:44 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:41 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64643 and previous config saved to /var/cache/conftool/dbconfig/20240611-183841-ladsgroup.json
18:37 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:22 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.9 refs T361403
18:19 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:19 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64642 and previous config saved to /var/cache/conftool/dbconfig/20240611-181526-marostegui.json
18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
18:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64641 and previous config saved to /var/cache/conftool/dbconfig/20240611-181448-marostegui.json
18:10 brennen: 1.43.0-wmf.9 train (T361403): no blockers, rolling to group0
18:08 ejegg: stopped fundraising scheduled jobs
17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64640 and previous config saved to /var/cache/conftool/dbconfig/20240611-175941-marostegui.json
17:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:56 taavi@deploy1002: Finished scap: Backport for gerrit:1038750 wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) (duration: 12m 00s)
17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:47 taavi@deploy1002: taavi: Continuing with sync
17:47 taavi@deploy1002: taavi: Backport for gerrit:1038750 wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64639 and previous config saved to /var/cache/conftool/dbconfig/20240611-174434-marostegui.json
17:44 taavi@deploy1002: Started scap: Backport for gerrit:1038750 wikitech: Stop loading OpenStackManager (T161553 T338477 T359544)
17:37 rzl@deploy1002: Finished scap: (no justification provided) (duration: 11m 40s)
17:33 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'enable-puppet T366649'
17:33 rzl@deploy1002: rzl: Continuing with sync
17:30 rzl@deploy1002: rzl: (no justification provided) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64638 and previous config saved to /var/cache/conftool/dbconfig/20240611-172928-marostegui.json
17:26 rzl@deploy1002: Started scap: (no justification provided)
17:14 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'disable-puppet T366649'
17:11 ejegg: fundraising civicrm upgraded from ebfbad86 to 7252b1b9
17:09 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:09 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:09 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
17:08 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:08 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
17:04 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:04 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
17:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
16:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:56 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
16:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
16:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
16:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:36 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:35 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:35 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:33 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
16:31 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1013.eqiad.wmnet|wikikube-worker1014.eqiad.wmnet|wikikube-worker1017.eqiad.wmnet|wikikube-worker1018.eqiad.wmnet),cluster=kubernetes,service=kubesvc
16:31 claime: pool and uncordon wikikube-worker1013.eqiad.wmnet,wikikube-worker1014.eqiad.wmnet,wikikube-worker1017.eqiad.wmnet,wikikube-worker1018.eqiad.wmnet - T351074
16:31 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
16:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:28 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:27 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.eqiad.wmnet
16:26 kamila@cumin1002: START - Cookbook sre.dns.netbox
16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64637 and previous config saved to /var/cache/conftool/dbconfig/20240611-162154-arnaudb.json
16:21 claime: homer 'cr*eqiad*' commit 'T351074'
16:16 elukey: manual run of docker-report-k8s on build2001 (some failed results)
16:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1017.eqiad.wmnet with OS bullseye
16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1018.eqiad.wmnet with OS bullseye
16:07 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64636 and previous config saved to /var/cache/conftool/dbconfig/20240611-160649-arnaudb.json
16:06 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1014.eqiad.wmnet with OS bullseye
16:05 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
16:05 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
16:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
16:04 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
16:03 claime: roll restarting eventgate-main eqiad
16:00 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64635 and previous config saved to /var/cache/conftool/dbconfig/20240611-155143-arnaudb.json
15:51 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
15:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply
15:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
14:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1013.eqiad.wmnet with OS bullseye
14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1013.eqiad.wmnet on all recursors
14:52 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1013.eqiad.wmnet on all recursors
14:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014
14:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
14:51 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1403 to wikikube-worker1014.eqiad.wmnet
14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014.eqiad.wmnet
14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1402 to wikikube-worker1013
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1013
14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 depool T365982', diff saved to https://phabricator.wikimedia.org/P64631 and previous config saved to /var/cache/conftool/dbconfig/20240611-144624-arnaudb.json
14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1013
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
14:45 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
14:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1002.eqiad.wmnet
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
14:42 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
14:38 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
14:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1402 to wikikube-worker1013
14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
14:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
14:29 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1002.eqiad.wmnet
14:29 claime: depooling mw1402 mw1403 mw1406 mw1411 for reimage to k8s - T351074
14:29 Lucas_WMDE: UTC afternoon backport+config window done
14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1041320 Enable Vector appearance menu & larger font-size on wikipedias (T362148) (duration: 19m 08s)
14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
14:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
14:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
14:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Continuing with sync
14:18 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1002.eqiad.wmnet
14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
14:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Backport for gerrit:1041320 Enable Vector appearance menu & larger font-size on wikipedias (T362148) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1041320 Enable Vector appearance menu & larger font-size on wikipedias (T362148)
14:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
14:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1041096 Enable CampaignEvents on swahili wikipedia (T366502) (duration: 14m 40s)
14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
14:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
14:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1017.eqiad.wmnet
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
13:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Continuing with sync
13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Backport for gerrit:1041096 Enable CampaignEvents on swahili wikipedia (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1041096 Enable CampaignEvents on swahili wikipedia (T366502)
13:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for gerrit:1041094 Configures the necessary user rights for CampaignEvents on swahili (T366502) (duration: 44m 51s)
13:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1007.eqiad.wmnet
13:50 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:50 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
13:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
13:48 btullis@cumin1002: START - Cookbook sre.dns.netbox
13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
13:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
13:45 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
13:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
13:45 vgutierrez: rolling switch from tcp-mss-clamper to ferm based MSS clamping on A:ncredir - T365689
13:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
13:42 jiji@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-eqiad
13:40 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1007.eqiad.wmnet
13:40 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1006.eqiad.wmnet
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
13:36 vgutierrez: repool ncredir6001 - T365689
13:36 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
13:33 moritzm: failover ganeti cluster for esams01 to ganeti3005
13:32 moritzm: failover ganeti cluster for esams02 to ganeti3006
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64630 and previous config saved to /var/cache/conftool/dbconfig/20240611-132043-ladsgroup.json
13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Continuing with sync
13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
13:15 vgutierrez: depool ncredir6001 - T365689
13:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
13:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Backport for gerrit:1041094 Configures the necessary user rights for CampaignEvents on swahili (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
13:09 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:07 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw
13:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
13:06 vgutierrez: disable puppet on A:ncredir before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035724 - T365689
13:06 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for gerrit:1041094 Configures the necessary user rights for CampaignEvents on swahili (T366502)
13:06 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:06 btullis@cumin1002: START - Cookbook sre.dns.netbox
13:06 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64629 and previous config saved to /var/cache/conftool/dbconfig/20240611-130535-ladsgroup.json
13:04 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
13:03 vgutierrez: repool text@eqiad with IPIP encapsulation enabled - T366466
13:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:01 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
12:59 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1006.eqiad.wmnet
12:53 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1016.eqiad.wmnet
12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64628 and previous config saved to /var/cache/conftool/dbconfig/20240611-125028-ladsgroup.json
12:50 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 - T366466
12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64627 and previous config saved to /var/cache/conftool/dbconfig/20240611-123521-ladsgroup.json
12:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64626 and previous config saved to /var/cache/conftool/dbconfig/20240611-123046-ladsgroup.json
12:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
12:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
12:26 fabfur: cancelled previous command (text@eqiad is going to be depooled at the same time)
12:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
12:23 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
12:19 vgutierrez: depool text@eqiad before enabling IPIP encapsulation - T366466
12:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
12:14 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
12:13 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
12:13 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
12:11 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
12:10 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
12:09 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64625 and previous config saved to /var/cache/conftool/dbconfig/20240611-120710-ladsgroup.json
12:07 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
12:06 claime: Finished kafka-main reboots in codfw
12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-codfw
12:05 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
12:05 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
12:04 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1005.eqiad.wmnet
12:04 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:04 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
12:04 moritzm: rebalance ganeti cluster in ulsfo following reboots
12:04 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
12:03 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
12:02 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
11:57 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
11:55 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
11:55 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
11:55 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
11:54 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64624 and previous config saved to /var/cache/conftool/dbconfig/20240611-115203-ladsgroup.json
11:51 jayme: removed similar-users deployments from all k8s clusters - T345274
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64621 and previous config saved to /var/cache/conftool/dbconfig/20240611-113656-ladsgroup.json
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64620 and previous config saved to /var/cache/conftool/dbconfig/20240611-113452-marostegui.json
11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64619 and previous config saved to /var/cache/conftool/dbconfig/20240611-113430-marostegui.json
11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64618 and previous config saved to /var/cache/conftool/dbconfig/20240611-113121-root.json
11:29 moritzm: failover ganeti master in ulsfo to ganeti4008
11:27 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
11:26 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:24 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
11:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64617 and previous config saved to /var/cache/conftool/dbconfig/20240611-112149-ladsgroup.json
11:21 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64616 and previous config saved to /var/cache/conftool/dbconfig/20240611-111922-marostegui.json
11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64615 and previous config saved to /var/cache/conftool/dbconfig/20240611-111616-root.json
11:15 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:13 jayme: removing similar-users service - T345274
11:12 btullis@cumin1002: START - Cookbook sre.dns.netbox
11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
11:07 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
11:06 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-codfw
11:05 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:05 claime: Starting kafka-main reboots in codfw
11:04 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1004.eqiad.wmnet
11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64614 and previous config saved to /var/cache/conftool/dbconfig/20240611-110414-marostegui.json
11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
10:57 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
10:57 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64613 and previous config saved to /var/cache/conftool/dbconfig/20240611-104908-marostegui.json
10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364069
10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364299
10:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
10:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
10:45 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
10:45 claime: move 90% of traffic to mw-on-k8s - T362323
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123 T367145', diff saved to https://phabricator.wikimedia.org/P64612 and previous config saved to /var/cache/conftool/dbconfig/20240611-104336-root.json
10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T367145', diff saved to https://phabricator.wikimedia.org/P64611 and previous config saved to /var/cache/conftool/dbconfig/20240611-104232-root.json
10:42 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
10:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
10:42 marostegui: Starting s5 codfw failover from db2123 to db2213 - T367145
10:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
10:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
10:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
10:32 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T367145', diff saved to https://phabricator.wikimedia.org/P64610 and previous config saved to /var/cache/conftool/dbconfig/20240611-102900-root.json
10:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2213 with weight 0 T367145', diff saved to https://phabricator.wikimedia.org/P64609 and previous config saved to /var/cache/conftool/dbconfig/20240611-102820-root.json
10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
10:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
10:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64608 and previous config saved to /var/cache/conftool/dbconfig/20240611-102444-ladsgroup.json
10:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64607 and previous config saved to /var/cache/conftool/dbconfig/20240611-102125-ladsgroup.json
10:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
10:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
10:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
10:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
10:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
10:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
10:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
10:16 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
10:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1014.eqiad.wmnet
10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
10:14 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
10:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64606 and previous config saved to /var/cache/conftool/dbconfig/20240611-101400-arnaudb.json
10:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
10:10 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
10:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
10:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
10:08 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
10:08 jayme@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
10:07 brouberol@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:06 brouberol@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
10:04 brouberol@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:04 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1014.eqiad.wmnet
10:03 brouberol@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:02 brouberol@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:01 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
10:01 brouberol@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:01 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:00 brouberol@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:00 sukhe: [end] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru: T346722
09:59 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:59 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:59 sukhe: [start] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru
09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64605 and previous config saved to /var/cache/conftool/dbconfig/20240611-095853-arnaudb.json
09:58 brouberol@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:58 brouberol@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:57 brouberol@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:57 brouberol@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
09:45 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
09:44 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
09:44 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
09:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64604 and previous config saved to /var/cache/conftool/dbconfig/20240611-094347-arnaudb.json
09:43 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
09:42 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
09:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
09:42 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
09:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
09:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet
09:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
09:35 moritzm: rebalance ganeti clusters in codfw following reboots
09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64603 and previous config saved to /var/cache/conftool/dbconfig/20240611-092839-arnaudb.json
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64602 and previous config saved to /var/cache/conftool/dbconfig/20240611-092504-arnaudb.json
09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
09:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
09:16 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
09:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
09:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
08:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:53 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
08:47 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:46 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
08:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
08:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
08:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
08:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
08:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date
08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
08:33 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.ulsfo.wmnet
08:32 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
08:31 marostegui: Install 10.11 on db1153 (non used x2 replica) T365805
08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
08:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
08:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
08:31 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
08:30 marostegui: Install 10.11 on db1153 (non used x2 replica)
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64600 and previous config saved to /var/cache/conftool/dbconfig/20240611-081314-root.json
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
08:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64599 and previous config saved to /var/cache/conftool/dbconfig/20240611-075809-root.json
07:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
07:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
07:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
07:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64598 and previous config saved to /var/cache/conftool/dbconfig/20240611-074304-root.json
07:40 kart_: Updated MinT to 2024-06-11-052620-production (T364122, T346226, T357548)
07:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64597 and previous config saved to /var/cache/conftool/dbconfig/20240611-074009-root.json
07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:36 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
07:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64596 and previous config saved to /var/cache/conftool/dbconfig/20240611-072758-root.json
07:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64595 and previous config saved to /var/cache/conftool/dbconfig/20240611-072504-root.json
07:18 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64594 and previous config saved to /var/cache/conftool/dbconfig/20240611-071253-root.json
07:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64593 and previous config saved to /var/cache/conftool/dbconfig/20240611-070958-root.json
07:05 arnaudb@deploy1002: Finished scap: Backport for gerrit:1041401 Revert "dbconfig: temporary disable writes on es6" (duration: 11m 36s)
07:02 moritzm: failover ganeti master in codfw to ganeti2020
06:57 arnaudb@deploy1002: arnaudb: Continuing with sync
06:56 arnaudb@deploy1002: arnaudb: Backport for gerrit:1041401 Revert "dbconfig: temporary disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64592 and previous config saved to /var/cache/conftool/dbconfig/20240611-065453-root.json
06:54 arnaudb@deploy1002: Started scap: Backport for gerrit:1041401 Revert "dbconfig: temporary disable writes on es6"
06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64591 and previous config saved to /var/cache/conftool/dbconfig/20240611-064041-arnaudb.json
06:40 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe (duration: 15m 33s)
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64590 and previous config saved to /var/cache/conftool/dbconfig/20240611-063947-root.json
06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64589 and previous config saved to /var/cache/conftool/dbconfig/20240611-063903-arnaudb.json
06:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1037 to es6 primary T367055', diff saved to https://phabricator.wikimedia.org/P64588 and previous config saved to /var/cache/conftool/dbconfig/20240611-063109-arnaudb.json
06:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:30 arnaudb: Starting es6 eqiad failover from es1038 to es1037 - T367055
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64587 and previous config saved to /var/cache/conftool/dbconfig/20240611-062441-root.json
06:24 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe
06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1037 with weight 0 T367055', diff saved to https://phabricator.wikimedia.org/P64586 and previous config saved to /var/cache/conftool/dbconfig/20240611-062353-arnaudb.json
06:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
06:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
06:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64585 and previous config saved to /var/cache/conftool/dbconfig/20240611-061413-root.json
06:12 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
06:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64584 and previous config saved to /var/cache/conftool/dbconfig/20240611-060935-root.json
06:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:07 arnaudb@deploy1002: Finished scap: Backport for gerrit:1041107 dbconfig: temporary disable writes on es6 (T367055) (duration: 15m 42s)
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64583 and previous config saved to /var/cache/conftool/dbconfig/20240611-055907-root.json
05:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
05:58 arnaudb@deploy1002: arnaudb: Continuing with sync
05:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
05:58 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1233', diff saved to https://phabricator.wikimedia.org/P64582 and previous config saved to /var/cache/conftool/dbconfig/20240611-055816-arnaudb.json
05:56 arnaudb@deploy1002: arnaudb: Backport for gerrit:1041107 dbconfig: temporary disable writes on es6 (T367055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:51 arnaudb@deploy1002: Started scap: Backport for gerrit:1041107 dbconfig: temporary disable writes on es6 (T367055)
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64581 and previous config saved to /var/cache/conftool/dbconfig/20240611-054401-root.json
05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64580 and previous config saved to /var/cache/conftool/dbconfig/20240611-052856-root.json
05:24 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364069
05:22 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364299
05:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
05:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T367140', diff saved to https://phabricator.wikimedia.org/P64579 and previous config saved to /var/cache/conftool/dbconfig/20240611-052101-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write T367140', diff saved to https://phabricator.wikimedia.org/P64578 and previous config saved to /var/cache/conftool/dbconfig/20240611-052000-root.json
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T367140', diff saved to https://phabricator.wikimedia.org/P64577 and previous config saved to /var/cache/conftool/dbconfig/20240611-051941-root.json
05:19 marostegui: Starting s3 eqiad failover from db1223 to db1157 - T367140
05:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64576 and previous config saved to /var/cache/conftool/dbconfig/20240611-051351-root.json
05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1157 with weight 0 T367140', diff saved to https://phabricator.wikimedia.org/P64575 and previous config saved to /var/cache/conftool/dbconfig/20240611-050351-root.json
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64574 and previous config saved to /var/cache/conftool/dbconfig/20240611-045845-root.json
04:57 marostegui: dbmaint eqiad s2 deploy schema change on db1222 T364299
04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
04:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1222 T366687', diff saved to https://phabricator.wikimedia.org/P64573 and previous config saved to /var/cache/conftool/dbconfig/20240611-045447-root.json
04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1162 to s2 primary and set section read-write T366687', diff saved to https://phabricator.wikimedia.org/P64572 and previous config saved to /var/cache/conftool/dbconfig/20240611-045359-root.json
04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T366687', diff saved to https://phabricator.wikimedia.org/P64571 and previous config saved to /var/cache/conftool/dbconfig/20240611-045341-root.json
04:53 marostegui: Starting s2 eqiad failover from db1222 to db1162 - T366687
04:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64570 and previous config saved to /var/cache/conftool/dbconfig/20240611-044616-marostegui.json
04:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
04:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64569 and previous config saved to /var/cache/conftool/dbconfig/20240611-044339-root.json
04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1162 with weight 0 T366687', diff saved to https://phabricator.wikimedia.org/P64568 and previous config saved to /var/cache/conftool/dbconfig/20240611-043333-marostegui.json
04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
04:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64567 and previous config saved to /var/cache/conftool/dbconfig/20240611-041938-ladsgroup.json
04:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64566 and previous config saved to /var/cache/conftool/dbconfig/20240611-040432-ladsgroup.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.6 (duration: 01m 05s)
04:00 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.9 refs T361403 (duration: 57m 19s)
03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64565 and previous config saved to /var/cache/conftool/dbconfig/20240611-034925-ladsgroup.json
03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64564 and previous config saved to /var/cache/conftool/dbconfig/20240611-033418-ladsgroup.json
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.9 refs T361403
00:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad

2024-06-10

23:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
23:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
22:30 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:30 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:28 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:27 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:25 reedy@deploy1002: Synchronized wmf-config/: sync interwiki lists (duration: 10m 07s)
22:14 reedy@deploy1002: Synchronized langlist-labs: Add fr and bn (duration: 14m 29s)
21:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64563 and previous config saved to /var/cache/conftool/dbconfig/20240610-215622-marostegui.json
21:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64562 and previous config saved to /var/cache/conftool/dbconfig/20240610-214115-marostegui.json
21:27 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64561 and previous config saved to /var/cache/conftool/dbconfig/20240610-212608-marostegui.json
21:19 ejegg: fundraising python tools upgraded from 8c98b674 to c51f6e62
21:19 ejegg: Standalone SmashPig upgraded from edf573bb to 1d1b770c
21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64560 and previous config saved to /var/cache/conftool/dbconfig/20240610-211101-marostegui.json
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64559 and previous config saved to /var/cache/conftool/dbconfig/20240610-204622-ladsgroup.json
20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
20:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64558 and previous config saved to /var/cache/conftool/dbconfig/20240610-204600-ladsgroup.json
20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64557 and previous config saved to /var/cache/conftool/dbconfig/20240610-203053-ladsgroup.json
20:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
20:30 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64556 and previous config saved to /var/cache/conftool/dbconfig/20240610-201546-ladsgroup.json
20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64555 and previous config saved to /var/cache/conftool/dbconfig/20240610-200039-ladsgroup.json
19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64554 and previous config saved to /var/cache/conftool/dbconfig/20240610-195826-marostegui.json
19:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
19:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64553 and previous config saved to /var/cache/conftool/dbconfig/20240610-195804-marostegui.json
19:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64552 and previous config saved to /var/cache/conftool/dbconfig/20240610-194256-marostegui.json
19:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64551 and previous config saved to /var/cache/conftool/dbconfig/20240610-192749-marostegui.json
19:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
19:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64550 and previous config saved to /var/cache/conftool/dbconfig/20240610-191242-marostegui.json
19:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
19:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
18:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:50 amastilovic@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:50 amastilovic@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
17:47 amastilovic@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:46 amastilovic@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64547 and previous config saved to /var/cache/conftool/dbconfig/20240610-174349-marostegui.json
17:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
17:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64546 and previous config saved to /var/cache/conftool/dbconfig/20240610-174327-marostegui.json
17:37 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:36 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:30 dancy@deploy1002: Installation of scap version "4.87.0" completed for 285 hosts
17:29 amastilovic@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:29 amastilovic@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64545 and previous config saved to /var/cache/conftool/dbconfig/20240610-172820-marostegui.json
17:25 dancy@deploy1002: Installing scap version "4.87.0" for 285 hosts
17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64544 and previous config saved to /var/cache/conftool/dbconfig/20240610-171313-marostegui.json
17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
16:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64543 and previous config saved to /var/cache/conftool/dbconfig/20240610-165806-marostegui.json
16:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:21 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:20 marostegui: Drop flaggedpage_pending from s1 T365568
16:05 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕛☕ sudo cumin -b 8 '*.codfw.wmnet and C:geoip::data::puppet%fetch_ipinfo_dbs=true' 'sha512sum /usr/share/GeoIPInfo/GeoLite2-ASN.mmdb || run-puppet-agent'
16:01 cdanis: 💙cdanis@puppetserver2001.codfw.wmnet ~ 🕛☕ sudo systemctl restart sync-puppet-volatile
16:00 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:47 marostegui: Drop flaggedpage_pending from s3 T365568
15:46 marostegui: Drop flaggedpage_pending from s5 T365568
15:43 marostegui: Drop flaggedpage_pending from s2 T365568
15:42 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:41 godog: bounce benthos@mw_accesslog_metrics.service on centrallog hosts
15:41 marostegui: Drop flaggedpage_pending from s7 T365568
15:40 marostegui: Drop flaggedpage_pending from s6 T365568
15:34 ladsgroup@deploy1002: Synchronized portals: (no justification provided) (duration: 11m 20s)
15:31 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
15:31 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
15:29 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
15:22 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: (no justification provided) (duration: 10m 28s)
15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
15:05 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4046.ulsfo.wmnet
15:04 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:1041091|errorpages: Add dark mode support]] (duration: 17m 15s)
15:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
15:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:01 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4046.ulsfo.wmnet
15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
14:59 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
14:58 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
14:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
14:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
14:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
14:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
14:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
14:55 ladsgroup@deploy1002: ladsgroup and ebrahim: Continuing with sync
14:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:54 ladsgroup@deploy1002: ladsgroup and ebrahim: Backport for [[gerrit:1041091|errorpages: Add dark mode support]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
14:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:53 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
14:52 moritzm: powercycling ganeti1019, stuck on reboot
14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
14:52 ChrisDobbins901_: sudo -i cookbook sre.hosts.reboot-single -r 'Kernel upgrade' 'P{cp4046.*}'
14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
14:51 cdobbins@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4046.ulsfo.wmnet
14:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:51 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
14:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:50 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
14:50 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
14:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
14:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
14:48 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
14:47 urandom: aqs1010: restarting cassandra to apply upgrade to Java 11 — T350567
14:47 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:1041091|errorpages: Add dark mode support]]
14:46 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4046.ulsfo.wmnet
14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:45 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
14:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64539 and previous config saved to /var/cache/conftool/dbconfig/20240610-144501-marostegui.json
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64538 and previous config saved to /var/cache/conftool/dbconfig/20240610-144439-marostegui.json
14:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
14:43 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:43 swfrench@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
14:42 swfrench@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:41 swfrench@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
14:41 swfrench@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
14:39 swfrench@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:38 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
14:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
14:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
14:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
14:31 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2023.codfw.wmnet
14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64537 and previous config saved to /var/cache/conftool/dbconfig/20240610-142931-marostegui.json
14:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config-next: apply
14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config-next: apply
14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64536 and previous config saved to /var/cache/conftool/dbconfig/20240610-141422-marostegui.json
14:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64535 and previous config saved to /var/cache/conftool/dbconfig/20240610-135914-marostegui.json
13:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
13:57 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1107 for T348977 - bking@cumin2002
13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107 for T348977 - bking@cumin2002
13:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
13:49 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
13:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
13:47 taavi@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
13:46 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
13:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
13:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
13:36 elukey: move recommendation-api on wikikube to prometheus metrics (offboarded from statsd) - T205870
13:36 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
13:35 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
13:30 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364069
13:29 taavi: taavi@mw1447 ~ $ sudo /usr/local/sbin/restart-php-fpm-all php7.4-fpm 9223372 # leftover from me restarting LVS during deployment
13:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
13:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
13:27 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
13:26 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64534 and previous config saved to /var/cache/conftool/dbconfig/20240610-132619-ladsgroup.json
13:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
13:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
13:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
13:20 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:1041044|[huwiki] Add "suppressredirect" user right to editor user group (T366438)]] (duration: 15m 05s)
13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4047.ulsfo.wmnet
13:18 taavi@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=99) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:18 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
13:11 taavi: restarting eqiad low-traffic LVS for https://gerrit.wikimedia.org/r/c/operations/puppet/+/941459
13:11 ladsgroup@deploy1002: ladsgroup and gergesshamon: Continuing with sync
13:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
13:10 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4047.ulsfo.wmnet
13:09 fabfur: rebooting cp4047 (T366555)
13:09 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:08 ladsgroup@deploy1002: ladsgroup and gergesshamon: Backport for [[gerrit:1041044|[huwiki] Add "suppressredirect" user right to editor user group (T366438)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
13:05 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:1041044|[huwiki] Add "suppressredirect" user right to editor user group (T366438)]]
13:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:04 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
13:01 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:58 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:58 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:55 fabfur: repooling text@drmrs (IPIP encapsulation enabled) (T366466)
12:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
12:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
12:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:48 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
12:46 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
12:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
12:44 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
12:43 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
12:41 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
12:40 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
12:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64532 and previous config saved to /var/cache/conftool/dbconfig/20240610-122847-arnaudb.json
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
12:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
12:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
12:15 oblivian@deploy1002: Finished scap: Deploying change to base mediawiki image (take 2) (duration: 22m 39s)
12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64531 and previous config saved to /var/cache/conftool/dbconfig/20240610-121341-arnaudb.json
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
11:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64530 and previous config saved to /var/cache/conftool/dbconfig/20240610-115834-arnaudb.json
11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
11:53 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image (take 2)
11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64528 and previous config saved to /var/cache/conftool/dbconfig/20240610-114957-marostegui.json
11:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
11:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64527 and previous config saved to /var/cache/conftool/dbconfig/20240610-114934-marostegui.json
11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
11:44 oblivian@deploy1002: sync-world aborted: Deploying change to base mediawiki image (duration: 10m 21s)
11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64526 and previous config saved to /var/cache/conftool/dbconfig/20240610-114329-arnaudb.json
11:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
11:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
11:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
11:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
11:34 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64525 and previous config saved to /var/cache/conftool/dbconfig/20240610-113426-marostegui.json
11:34 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments (duration: 10m 22s)
11:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
11:29 fabfur: restarting pybal on lvs6003,lvs6001 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
11:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64524 and previous config saved to /var/cache/conftool/dbconfig/20240610-112821-arnaudb.json
11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
11:26 fabfur: enabling && running puppet on A:lvs-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
11:23 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments
11:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64523 and previous config saved to /var/cache/conftool/dbconfig/20240610-111917-marostegui.json
11:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64522 and previous config saved to /var/cache/conftool/dbconfig/20240610-111315-arnaudb.json
10:47 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 1%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64519 and previous config saved to /var/cache/conftool/dbconfig/20240610-104303-arnaudb.json
10:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
10:41 fabfur: depooling text@drmrs to apply IPIP encapsulation patches (T366466)
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
10:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
10:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
10:25 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2204 T367019', diff saved to https://phabricator.wikimedia.org/P64518 and previous config saved to /var/cache/conftool/dbconfig/20240610-102511-arnaudb.json
10:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
10:21 claime: repooled all active/active mediawiki services from codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=api-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-ro,name=codfw
10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
10:08 claime: depooled all active/active mediawiki services from codfw
10:08 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw
10:07 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw
10:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
10:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
10:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:02 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-int-ro,name=codfw
10:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=codfw
10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-web-ro,name=codfw
10:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
09:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
09:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 26 hosts with reason: Issue from T367019
09:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 26 hosts with reason: Issue from T367019
09:54 arnaudb@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 5:00:00 on 870 hosts with reason: Issue from T367019
09:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 870 hosts with reason: Issue from T367019
09:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
09:53 jayme@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
09:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
09:37 godog: roll upgrade prometheus-statsd-exporter to baremetal - T302373
09:34 taavi@deploy1002: Finished scap: Backport for gerrit:1040222 Reapply "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 17s)
09:25 taavi@deploy1002: taavi: Continuing with sync
09:25 taavi@deploy1002: taavi: Backport for gerrit:1040222 Reapply "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:24 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:24 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:22 taavi@deploy1002: Started scap: Backport for gerrit:1040222 Reapply "wikitech: Replace OSM class in Gerrit blocking hook"
09:22 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
09:22 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64517 and previous config saved to /var/cache/conftool/dbconfig/20240610-091631-marostegui.json
09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64516 and previous config saved to /var/cache/conftool/dbconfig/20240610-091606-marostegui.json
09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2207 to s2 primary T367019', diff saved to https://phabricator.wikimedia.org/P64515 and previous config saved to /var/cache/conftool/dbconfig/20240610-091506-arnaudb.json
09:14 arnaudb: Starting s2 codfw failover from db2204 to db2207 - T367019
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
09:01 godog: upload prometheus-statsd-exporter 0.26.1-1 to apt - T302373
09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64514 and previous config saved to /var/cache/conftool/dbconfig/20240610-090058-marostegui.json
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64513 and previous config saved to /var/cache/conftool/dbconfig/20240610-085721-arnaudb.json
08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
08:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64512 and previous config saved to /var/cache/conftool/dbconfig/20240610-085548-arnaudb.json
08:54 godog: upgrade prometheus-statsd-exporter on webperf - T302373
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
08:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
08:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet
08:47 cmooney@cumin1002: START - Cookbook sre.dns.netbox
08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64511 and previous config saved to /var/cache/conftool/dbconfig/20240610-084550-marostegui.json
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64510 and previous config saved to /var/cache/conftool/dbconfig/20240610-084042-arnaudb.json
08:39 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4048.ulsfo.wmnet
08:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping1004.eqiad.wmnet with OS bookworm
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping1004.eqiad.wmnet with reason: host reimage
08:14 kostajh: UTC morning deploys done
08:13 kharlan@deploy1002: Finished scap: Backport for gerrit:1038723 IPInfo: Switch to using GeoLite2 data (T361884) (duration: 14m 07s)
08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64507 and previous config saved to /var/cache/conftool/dbconfig/20240610-081030-arnaudb.json
08:04 kharlan@deploy1002: kharlan: Continuing with sync
08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
08:02 kharlan@deploy1002: kharlan: Backport for gerrit:1038723 IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:59 kharlan@deploy1002: Started scap: Backport for gerrit:1038723 IPInfo: Switch to using GeoLite2 data (T361884)
07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
07:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
07:57 kharlan@deploy1002: kharlan: Backport for gerrit:1038723 IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64506 and previous config saved to /var/cache/conftool/dbconfig/20240610-075524-arnaudb.json
07:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
07:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
07:53 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
07:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64505 and previous config saved to /var/cache/conftool/dbconfig/20240610-075056-root.json
07:50 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
07:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2207.codfw.wmnet
07:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
07:43 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2207.codfw.wmnet
07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2207 maintenance', diff saved to https://phabricator.wikimedia.org/P64504 and previous config saved to /var/cache/conftool/dbconfig/20240610-074157-arnaudb.json
07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
07:41 kharlan@deploy1002: Started scap: Backport for gerrit:1038723 IPInfo: Switch to using GeoLite2 data (T361884)
07:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Revert db2207 with weight 500 T367019', diff saved to https://phabricator.wikimedia.org/P64503 and previous config saved to /var/cache/conftool/dbconfig/20240610-073838-arnaudb.json
07:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
07:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1010.eqiad.wmnet
07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
07:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64502 and previous config saved to /var/cache/conftool/dbconfig/20240610-073549-root.json
07:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
07:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
07:33 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
07:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
07:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
07:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
07:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
07:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
07:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64501 and previous config saved to /var/cache/conftool/dbconfig/20240610-072043-root.json
07:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
07:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
07:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64500 and previous config saved to /var/cache/conftool/dbconfig/20240610-070537-root.json
07:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64499 and previous config saved to /var/cache/conftool/dbconfig/20240610-070249-marostegui.json
07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64498 and previous config saved to /var/cache/conftool/dbconfig/20240610-070224-marostegui.json
07:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
06:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64497 and previous config saved to /var/cache/conftool/dbconfig/20240610-065640-ladsgroup.json
06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64496 and previous config saved to /var/cache/conftool/dbconfig/20240610-065031-root.json
06:47 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364299
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64495 and previous config saved to /var/cache/conftool/dbconfig/20240610-064716-marostegui.json
06:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64494 and previous config saved to /var/cache/conftool/dbconfig/20240610-064132-ladsgroup.json
06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64493 and previous config saved to /var/cache/conftool/dbconfig/20240610-063912-arnaudb.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2140 T367017', diff saved to https://phabricator.wikimedia.org/P64492 and previous config saved to /var/cache/conftool/dbconfig/20240610-063904-root.json
06:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2179 to s4 primary T367017', diff saved to https://phabricator.wikimedia.org/P64491 and previous config saved to /var/cache/conftool/dbconfig/20240610-063830-root.json
06:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
06:38 marostegui: Starting s4 codfw failover from db2140 to db2179 - T367017
06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64490 and previous config saved to /var/cache/conftool/dbconfig/20240610-063524-root.json
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64489 and previous config saved to /var/cache/conftool/dbconfig/20240610-063208-marostegui.json
06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64488 and previous config saved to /var/cache/conftool/dbconfig/20240610-062624-ladsgroup.json
06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64487 and previous config saved to /var/cache/conftool/dbconfig/20240610-062017-root.json
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2179 from API/vslow/dump T367017', diff saved to https://phabricator.wikimedia.org/P64486 and previous config saved to /var/cache/conftool/dbconfig/20240610-061939-root.json
06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2179 with weight 0 T367017', diff saved to https://phabricator.wikimedia.org/P64485 and previous config saved to /var/cache/conftool/dbconfig/20240610-061849-root.json
06:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64484 and previous config saved to /var/cache/conftool/dbconfig/20240610-061658-marostegui.json
06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64483 and previous config saved to /var/cache/conftool/dbconfig/20240610-061116-ladsgroup.json
05:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64482 and previous config saved to /var/cache/conftool/dbconfig/20240610-052941-ladsgroup.json
05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64481 and previous config saved to /var/cache/conftool/dbconfig/20240610-051432-ladsgroup.json
05:13 marostegui: dbmaint codfw s7 deploy schema change on db2218 T364299
05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T366875', diff saved to https://phabricator.wikimedia.org/P64480 and previous config saved to /var/cache/conftool/dbconfig/20240610-050738-root.json
05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2121 to s7 primary T366875', diff saved to https://phabricator.wikimedia.org/P64479 and previous config saved to /var/cache/conftool/dbconfig/20240610-050637-marostegui.json
05:06 marostegui: Starting s7 codfw failover from db2218 to db2121 - T366875
04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64478 and previous config saved to /var/cache/conftool/dbconfig/20240610-045922-ladsgroup.json
04:52 kart_: Updated Apertium to 2024-06-07-143238-production (T356252)
04:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
04:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
04:44 marostegui: Rename flaggedpage_pending in s5 T365568
04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64477 and previous config saved to /var/cache/conftool/dbconfig/20240610-044414-ladsgroup.json
04:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
04:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
04:37 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2121 from API/vslow/dump T366875', diff saved to https://phabricator.wikimedia.org/P64476 and previous config saved to /var/cache/conftool/dbconfig/20240610-043741-root.json
04:37 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2121 with weight 0 T366875', diff saved to https://phabricator.wikimedia.org/P64475 and previous config saved to /var/cache/conftool/dbconfig/20240610-043649-root.json
04:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64474 and previous config saved to /var/cache/conftool/dbconfig/20240610-043615-marostegui.json
04:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-09

23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64473 and previous config saved to /var/cache/conftool/dbconfig/20240609-234110-ladsgroup.json
23:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64472 and previous config saved to /var/cache/conftool/dbconfig/20240609-234047-ladsgroup.json
23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64471 and previous config saved to /var/cache/conftool/dbconfig/20240609-232921-ladsgroup.json
23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64470 and previous config saved to /var/cache/conftool/dbconfig/20240609-232539-ladsgroup.json
23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64469 and previous config saved to /var/cache/conftool/dbconfig/20240609-231413-ladsgroup.json
23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64468 and previous config saved to /var/cache/conftool/dbconfig/20240609-231031-ladsgroup.json
22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64467 and previous config saved to /var/cache/conftool/dbconfig/20240609-225905-ladsgroup.json
22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64466 and previous config saved to /var/cache/conftool/dbconfig/20240609-225523-ladsgroup.json
22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64465 and previous config saved to /var/cache/conftool/dbconfig/20240609-224357-ladsgroup.json
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64464 and previous config saved to /var/cache/conftool/dbconfig/20240609-192428-ladsgroup.json
19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64463 and previous config saved to /var/cache/conftool/dbconfig/20240609-192404-ladsgroup.json
19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64462 and previous config saved to /var/cache/conftool/dbconfig/20240609-190856-ladsgroup.json
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64461 and previous config saved to /var/cache/conftool/dbconfig/20240609-185347-ladsgroup.json
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64460 and previous config saved to /var/cache/conftool/dbconfig/20240609-183839-ladsgroup.json
16:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64459 and previous config saved to /var/cache/conftool/dbconfig/20240609-160621-marostegui.json
15:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64458 and previous config saved to /var/cache/conftool/dbconfig/20240609-155113-marostegui.json
15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64457 and previous config saved to /var/cache/conftool/dbconfig/20240609-153605-marostegui.json
15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64456 and previous config saved to /var/cache/conftool/dbconfig/20240609-152057-marostegui.json
15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64455 and previous config saved to /var/cache/conftool/dbconfig/20240609-152020-ladsgroup.json
15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64454 and previous config saved to /var/cache/conftool/dbconfig/20240609-151956-ladsgroup.json
15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64453 and previous config saved to /var/cache/conftool/dbconfig/20240609-150448-ladsgroup.json
14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64452 and previous config saved to /var/cache/conftool/dbconfig/20240609-144940-ladsgroup.json
14:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64451 and previous config saved to /var/cache/conftool/dbconfig/20240609-143432-ladsgroup.json
14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64450 and previous config saved to /var/cache/conftool/dbconfig/20240609-143128-ladsgroup.json
14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64449 and previous config saved to /var/cache/conftool/dbconfig/20240609-143105-ladsgroup.json
14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64448 and previous config saved to /var/cache/conftool/dbconfig/20240609-143032-marostegui.json
14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64447 and previous config saved to /var/cache/conftool/dbconfig/20240609-141557-ladsgroup.json
14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64446 and previous config saved to /var/cache/conftool/dbconfig/20240609-141524-marostegui.json
14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64445 and previous config saved to /var/cache/conftool/dbconfig/20240609-140049-ladsgroup.json
14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64444 and previous config saved to /var/cache/conftool/dbconfig/20240609-140016-marostegui.json
13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64443 and previous config saved to /var/cache/conftool/dbconfig/20240609-134541-ladsgroup.json
13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64442 and previous config saved to /var/cache/conftool/dbconfig/20240609-134508-marostegui.json
12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64441 and previous config saved to /var/cache/conftool/dbconfig/20240609-120817-marostegui.json
12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64440 and previous config saved to /var/cache/conftool/dbconfig/20240609-120753-marostegui.json
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64439 and previous config saved to /var/cache/conftool/dbconfig/20240609-120400-marostegui.json
12:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
12:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64438 and previous config saved to /var/cache/conftool/dbconfig/20240609-115245-marostegui.json
11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64437 and previous config saved to /var/cache/conftool/dbconfig/20240609-113737-marostegui.json
11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64436 and previous config saved to /var/cache/conftool/dbconfig/20240609-112229-marostegui.json
11:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64435 and previous config saved to /var/cache/conftool/dbconfig/20240609-111945-ladsgroup.json
11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64434 and previous config saved to /var/cache/conftool/dbconfig/20240609-110437-ladsgroup.json
10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64433 and previous config saved to /var/cache/conftool/dbconfig/20240609-104929-ladsgroup.json
10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64432 and previous config saved to /var/cache/conftool/dbconfig/20240609-103421-ladsgroup.json
09:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64431 and previous config saved to /var/cache/conftool/dbconfig/20240609-095854-marostegui.json
09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64430 and previous config saved to /var/cache/conftool/dbconfig/20240609-094346-marostegui.json
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64429 and previous config saved to /var/cache/conftool/dbconfig/20240609-092837-marostegui.json
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64428 and previous config saved to /var/cache/conftool/dbconfig/20240609-091329-marostegui.json
08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64427 and previous config saved to /var/cache/conftool/dbconfig/20240609-080149-marostegui.json
08:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
08:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64426 and previous config saved to /var/cache/conftool/dbconfig/20240609-080125-marostegui.json
07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64425 and previous config saved to /var/cache/conftool/dbconfig/20240609-075533-marostegui.json
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64424 and previous config saved to /var/cache/conftool/dbconfig/20240609-074617-marostegui.json
07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64423 and previous config saved to /var/cache/conftool/dbconfig/20240609-073109-marostegui.json
07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64422 and previous config saved to /var/cache/conftool/dbconfig/20240609-071601-marostegui.json
06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64421 and previous config saved to /var/cache/conftool/dbconfig/20240609-064733-ladsgroup.json
06:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
06:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64420 and previous config saved to /var/cache/conftool/dbconfig/20240609-064709-ladsgroup.json
06:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64419 and previous config saved to /var/cache/conftool/dbconfig/20240609-063607-ladsgroup.json
06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64418 and previous config saved to /var/cache/conftool/dbconfig/20240609-063543-ladsgroup.json
06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64417 and previous config saved to /var/cache/conftool/dbconfig/20240609-063201-ladsgroup.json
06:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64416 and previous config saved to /var/cache/conftool/dbconfig/20240609-062033-ladsgroup.json
06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64415 and previous config saved to /var/cache/conftool/dbconfig/20240609-061653-ladsgroup.json
06:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64414 and previous config saved to /var/cache/conftool/dbconfig/20240609-060525-ladsgroup.json
06:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64413 and previous config saved to /var/cache/conftool/dbconfig/20240609-060146-ladsgroup.json
05:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64412 and previous config saved to /var/cache/conftool/dbconfig/20240609-055017-ladsgroup.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64411 and previous config saved to /var/cache/conftool/dbconfig/20240609-054833-marostegui.json
05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64410 and previous config saved to /var/cache/conftool/dbconfig/20240609-054809-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64409 and previous config saved to /var/cache/conftool/dbconfig/20240609-053301-marostegui.json
05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64408 and previous config saved to /var/cache/conftool/dbconfig/20240609-052358-ladsgroup.json
05:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64407 and previous config saved to /var/cache/conftool/dbconfig/20240609-052334-ladsgroup.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64406 and previous config saved to /var/cache/conftool/dbconfig/20240609-051753-marostegui.json
05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64405 and previous config saved to /var/cache/conftool/dbconfig/20240609-050826-ladsgroup.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64404 and previous config saved to /var/cache/conftool/dbconfig/20240609-050245-marostegui.json
04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64403 and previous config saved to /var/cache/conftool/dbconfig/20240609-045319-ladsgroup.json
04:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64402 and previous config saved to /var/cache/conftool/dbconfig/20240609-043811-ladsgroup.json
02:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64401 and previous config saved to /var/cache/conftool/dbconfig/20240609-025921-marostegui.json
02:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
02:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
02:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64400 and previous config saved to /var/cache/conftool/dbconfig/20240609-025856-marostegui.json
02:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64399 and previous config saved to /var/cache/conftool/dbconfig/20240609-024349-marostegui.json
02:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64398 and previous config saved to /var/cache/conftool/dbconfig/20240609-022840-marostegui.json
02:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64397 and previous config saved to /var/cache/conftool/dbconfig/20240609-021333-marostegui.json
02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64396 and previous config saved to /var/cache/conftool/dbconfig/20240609-020120-ladsgroup.json
02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
02:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
01:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64395 and previous config saved to /var/cache/conftool/dbconfig/20240609-012432-marostegui.json
01:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64394 and previous config saved to /var/cache/conftool/dbconfig/20240609-010922-marostegui.json
00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64393 and previous config saved to /var/cache/conftool/dbconfig/20240609-005414-marostegui.json
00:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64392 and previous config saved to /var/cache/conftool/dbconfig/20240609-003906-marostegui.json
00:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64391 and previous config saved to /var/cache/conftool/dbconfig/20240609-000718-marostegui.json
00:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64390 and previous config saved to /var/cache/conftool/dbconfig/20240609-000640-marostegui.json

2024-06-08

23:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64389 and previous config saved to /var/cache/conftool/dbconfig/20240608-235132-marostegui.json
23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64388 and previous config saved to /var/cache/conftool/dbconfig/20240608-233623-marostegui.json
23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64387 and previous config saved to /var/cache/conftool/dbconfig/20240608-232115-marostegui.json
22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64386 and previous config saved to /var/cache/conftool/dbconfig/20240608-222832-ladsgroup.json
22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64385 and previous config saved to /var/cache/conftool/dbconfig/20240608-222808-ladsgroup.json
22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64384 and previous config saved to /var/cache/conftool/dbconfig/20240608-221259-ladsgroup.json
21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64383 and previous config saved to /var/cache/conftool/dbconfig/20240608-215751-ladsgroup.json
21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64382 and previous config saved to /var/cache/conftool/dbconfig/20240608-214243-ladsgroup.json
21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64381 and previous config saved to /var/cache/conftool/dbconfig/20240608-212701-marostegui.json
21:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
21:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64380 and previous config saved to /var/cache/conftool/dbconfig/20240608-212637-marostegui.json
21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64379 and previous config saved to /var/cache/conftool/dbconfig/20240608-211527-marostegui.json
21:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
21:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64378 and previous config saved to /var/cache/conftool/dbconfig/20240608-211503-marostegui.json
21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64377 and previous config saved to /var/cache/conftool/dbconfig/20240608-211128-marostegui.json
20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64376 and previous config saved to /var/cache/conftool/dbconfig/20240608-205955-marostegui.json
20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64375 and previous config saved to /var/cache/conftool/dbconfig/20240608-205618-marostegui.json
20:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64374 and previous config saved to /var/cache/conftool/dbconfig/20240608-204447-marostegui.json
20:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64373 and previous config saved to /var/cache/conftool/dbconfig/20240608-204106-marostegui.json
20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64372 and previous config saved to /var/cache/conftool/dbconfig/20240608-202939-marostegui.json
20:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64371 and previous config saved to /var/cache/conftool/dbconfig/20240608-202016-ladsgroup.json
20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
20:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64370 and previous config saved to /var/cache/conftool/dbconfig/20240608-201948-ladsgroup.json
20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64369 and previous config saved to /var/cache/conftool/dbconfig/20240608-200440-ladsgroup.json
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64368 and previous config saved to /var/cache/conftool/dbconfig/20240608-194932-ladsgroup.json
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64367 and previous config saved to /var/cache/conftool/dbconfig/20240608-193424-ladsgroup.json
18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64366 and previous config saved to /var/cache/conftool/dbconfig/20240608-182811-ladsgroup.json
18:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
18:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64365 and previous config saved to /var/cache/conftool/dbconfig/20240608-182747-ladsgroup.json
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64364 and previous config saved to /var/cache/conftool/dbconfig/20240608-181559-marostegui.json
18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64363 and previous config saved to /var/cache/conftool/dbconfig/20240608-181536-marostegui.json
18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64362 and previous config saved to /var/cache/conftool/dbconfig/20240608-181238-ladsgroup.json
18:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64361 and previous config saved to /var/cache/conftool/dbconfig/20240608-180027-marostegui.json
17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64360 and previous config saved to /var/cache/conftool/dbconfig/20240608-175730-ladsgroup.json
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64359 and previous config saved to /var/cache/conftool/dbconfig/20240608-174519-marostegui.json
17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64358 and previous config saved to /var/cache/conftool/dbconfig/20240608-174222-ladsgroup.json
17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64357 and previous config saved to /var/cache/conftool/dbconfig/20240608-173011-marostegui.json
17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64356 and previous config saved to /var/cache/conftool/dbconfig/20240608-171628-marostegui.json
17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
17:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64355 and previous config saved to /var/cache/conftool/dbconfig/20240608-152142-marostegui.json
15:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
15:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
14:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
14:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64354 and previous config saved to /var/cache/conftool/dbconfig/20240608-144229-marostegui.json
14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64353 and previous config saved to /var/cache/conftool/dbconfig/20240608-142721-marostegui.json
14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64352 and previous config saved to /var/cache/conftool/dbconfig/20240608-141514-ladsgroup.json
14:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64351 and previous config saved to /var/cache/conftool/dbconfig/20240608-141450-ladsgroup.json
14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64350 and previous config saved to /var/cache/conftool/dbconfig/20240608-141212-marostegui.json
13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64349 and previous config saved to /var/cache/conftool/dbconfig/20240608-135942-ladsgroup.json
13:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64348 and previous config saved to /var/cache/conftool/dbconfig/20240608-135704-marostegui.json
13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64347 and previous config saved to /var/cache/conftool/dbconfig/20240608-134434-ladsgroup.json
13:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64346 and previous config saved to /var/cache/conftool/dbconfig/20240608-134110-ladsgroup.json
13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64345 and previous config saved to /var/cache/conftool/dbconfig/20240608-132926-ladsgroup.json
13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64344 and previous config saved to /var/cache/conftool/dbconfig/20240608-132602-ladsgroup.json
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64343 and previous config saved to /var/cache/conftool/dbconfig/20240608-131054-ladsgroup.json
12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64342 and previous config saved to /var/cache/conftool/dbconfig/20240608-125546-ladsgroup.json
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64341 and previous config saved to /var/cache/conftool/dbconfig/20240608-113928-ladsgroup.json
11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64340 and previous config saved to /var/cache/conftool/dbconfig/20240608-113905-ladsgroup.json
11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64339 and previous config saved to /var/cache/conftool/dbconfig/20240608-112357-ladsgroup.json
11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64338 and previous config saved to /var/cache/conftool/dbconfig/20240608-110849-ladsgroup.json
10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64337 and previous config saved to /var/cache/conftool/dbconfig/20240608-105341-ladsgroup.json
10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64336 and previous config saved to /var/cache/conftool/dbconfig/20240608-105032-marostegui.json
10:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
10:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64335 and previous config saved to /var/cache/conftool/dbconfig/20240608-105008-marostegui.json
10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64334 and previous config saved to /var/cache/conftool/dbconfig/20240608-103501-marostegui.json
10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64333 and previous config saved to /var/cache/conftool/dbconfig/20240608-101953-marostegui.json
10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64332 and previous config saved to /var/cache/conftool/dbconfig/20240608-100443-marostegui.json
06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64331 and previous config saved to /var/cache/conftool/dbconfig/20240608-064353-marostegui.json
06:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
06:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64330 and previous config saved to /var/cache/conftool/dbconfig/20240608-064328-marostegui.json
06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64329 and previous config saved to /var/cache/conftool/dbconfig/20240608-062820-marostegui.json
06:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64328 and previous config saved to /var/cache/conftool/dbconfig/20240608-061313-marostegui.json
05:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64327 and previous config saved to /var/cache/conftool/dbconfig/20240608-055804-marostegui.json
05:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64326 and previous config saved to /var/cache/conftool/dbconfig/20240608-054609-ladsgroup.json
05:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
05:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64325 and previous config saved to /var/cache/conftool/dbconfig/20240608-054545-ladsgroup.json
05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64324 and previous config saved to /var/cache/conftool/dbconfig/20240608-053037-ladsgroup.json
05:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64323 and previous config saved to /var/cache/conftool/dbconfig/20240608-052817-ladsgroup.json
05:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
05:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
05:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64322 and previous config saved to /var/cache/conftool/dbconfig/20240608-052753-ladsgroup.json
05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64321 and previous config saved to /var/cache/conftool/dbconfig/20240608-051529-ladsgroup.json
05:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64320 and previous config saved to /var/cache/conftool/dbconfig/20240608-051244-ladsgroup.json
05:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64319 and previous config saved to /var/cache/conftool/dbconfig/20240608-050021-ladsgroup.json
04:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64318 and previous config saved to /var/cache/conftool/dbconfig/20240608-045736-ladsgroup.json
04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64317 and previous config saved to /var/cache/conftool/dbconfig/20240608-044228-ladsgroup.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64316 and previous config saved to /var/cache/conftool/dbconfig/20240608-024534-ladsgroup.json
02:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64315 and previous config saved to /var/cache/conftool/dbconfig/20240608-024511-ladsgroup.json
02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64314 and previous config saved to /var/cache/conftool/dbconfig/20240608-024455-marostegui.json
02:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
02:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64313 and previous config saved to /var/cache/conftool/dbconfig/20240608-024431-marostegui.json
02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64312 and previous config saved to /var/cache/conftool/dbconfig/20240608-023735-ladsgroup.json
02:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
02:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64311 and previous config saved to /var/cache/conftool/dbconfig/20240608-023711-ladsgroup.json
02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64310 and previous config saved to /var/cache/conftool/dbconfig/20240608-023003-ladsgroup.json
02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64309 and previous config saved to /var/cache/conftool/dbconfig/20240608-022923-marostegui.json
02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64308 and previous config saved to /var/cache/conftool/dbconfig/20240608-022203-ladsgroup.json
02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64307 and previous config saved to /var/cache/conftool/dbconfig/20240608-021455-ladsgroup.json
02:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64306 and previous config saved to /var/cache/conftool/dbconfig/20240608-021415-marostegui.json
02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64305 and previous config saved to /var/cache/conftool/dbconfig/20240608-020655-ladsgroup.json
01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64304 and previous config saved to /var/cache/conftool/dbconfig/20240608-015947-ladsgroup.json
01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64303 and previous config saved to /var/cache/conftool/dbconfig/20240608-015906-marostegui.json
01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64302 and previous config saved to /var/cache/conftool/dbconfig/20240608-015147-ladsgroup.json

2024-06-07

22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64301 and previous config saved to /var/cache/conftool/dbconfig/20240607-224306-marostegui.json
22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64300 and previous config saved to /var/cache/conftool/dbconfig/20240607-224242-marostegui.json
22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64299 and previous config saved to /var/cache/conftool/dbconfig/20240607-223300-marostegui.json
22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64298 and previous config saved to /var/cache/conftool/dbconfig/20240607-222734-marostegui.json
22:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64297 and previous config saved to /var/cache/conftool/dbconfig/20240607-221752-marostegui.json
22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64296 and previous config saved to /var/cache/conftool/dbconfig/20240607-221224-marostegui.json
22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64295 and previous config saved to /var/cache/conftool/dbconfig/20240607-220244-marostegui.json
21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64294 and previous config saved to /var/cache/conftool/dbconfig/20240607-215716-marostegui.json
21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64293 and previous config saved to /var/cache/conftool/dbconfig/20240607-214736-marostegui.json
21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64292 and previous config saved to /var/cache/conftool/dbconfig/20240607-211842-ladsgroup.json
21:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
21:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64291 and previous config saved to /var/cache/conftool/dbconfig/20240607-211818-ladsgroup.json
21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64290 and previous config saved to /var/cache/conftool/dbconfig/20240607-210310-ladsgroup.json
20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64289 and previous config saved to /var/cache/conftool/dbconfig/20240607-204801-ladsgroup.json
20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64288 and previous config saved to /var/cache/conftool/dbconfig/20240607-203253-ladsgroup.json
19:42 dduvall@deploy1002: Finished scap: Backport for gerrit:1040081mediawiki.diff: Fix color regression and also use one more token (T366845) (duration: 16m 10s)
19:33 dduvall@deploy1002: dduvall: Continuing with sync
19:28 dduvall@deploy1002: dduvall: Backport for gerrit:1040081mediawiki.diff: Fix color regression and also use one more token (T366845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:26 dduvall@deploy1002: Started scap: Backport for gerrit:1040081mediawiki.diff: Fix color regression and also use one more token (T366845)
19:25 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
19:25 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
19:07 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:06 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64287 and previous config saved to /var/cache/conftool/dbconfig/20240607-184232-marostegui.json
18:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
18:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64286 and previous config saved to /var/cache/conftool/dbconfig/20240607-184208-marostegui.json
18:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64285 and previous config saved to /var/cache/conftool/dbconfig/20240607-182700-marostegui.json
18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64284 and previous config saved to /var/cache/conftool/dbconfig/20240607-181151-marostegui.json
18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64283 and previous config saved to /var/cache/conftool/dbconfig/20240607-181021-ladsgroup.json
18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64282 and previous config saved to /var/cache/conftool/dbconfig/20240607-180958-ladsgroup.json
17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64281 and previous config saved to /var/cache/conftool/dbconfig/20240607-175643-marostegui.json
17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64280 and previous config saved to /var/cache/conftool/dbconfig/20240607-175450-ladsgroup.json
17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64279 and previous config saved to /var/cache/conftool/dbconfig/20240607-173942-ladsgroup.json
17:31 topranks: resetting line card 1/0 on cr2-codfw to enable new 100G link to ssw1-d8-codfw T364095
17:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:24 topranks: re-route traffic from cr2-eqord away from circuit to cr2-codfw to allow for line card reset T364095
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64278 and previous config saved to /var/cache/conftool/dbconfig/20240607-172432-ladsgroup.json
17:23 topranks: disable IP transit to Lumen AS3356 from cr2-codfw to allow line card reset T364095
17:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64277 and previous config saved to /var/cache/conftool/dbconfig/20240607-170634-ladsgroup.json
17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64276 and previous config saved to /var/cache/conftool/dbconfig/20240607-170555-ladsgroup.json
16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64275 and previous config saved to /var/cache/conftool/dbconfig/20240607-165616-marostegui.json
16:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64274 and previous config saved to /var/cache/conftool/dbconfig/20240607-165533-marostegui.json
16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64273 and previous config saved to /var/cache/conftool/dbconfig/20240607-165047-ladsgroup.json
16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64272 and previous config saved to /var/cache/conftool/dbconfig/20240607-164025-marostegui.json
16:38 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4048.ulsfo.wmnet
16:36 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64271 and previous config saved to /var/cache/conftool/dbconfig/20240607-163539-ladsgroup.json
16:32 topranks: enabling new transport circuit from cr1-drmrs to cr2-eqiad T343385
16:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64270 and previous config saved to /var/cache/conftool/dbconfig/20240607-162516-marostegui.json
16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64269 and previous config saved to /var/cache/conftool/dbconfig/20240607-162031-ladsgroup.json
16:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64268 and previous config saved to /var/cache/conftool/dbconfig/20240607-161007-marostegui.json
16:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transport eqiad drmrs - cmooney@cumin1002"
16:06 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
16:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transport eqiad drmrs - cmooney@cumin1002"
16:05 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
16:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
15:59 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
15:52 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
15:45 sukhe@cumin1002: START - Cookbook sre.dns.netbox
15:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:35 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:23 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:14 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
15:10 topranks: disabling netbox service on primary netbox server netbox1001 to restore db from backup
15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64267 and previous config saved to /var/cache/conftool/dbconfig/20240607-145937-ladsgroup.json
14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64266 and previous config saved to /var/cache/conftool/dbconfig/20240607-145913-ladsgroup.json
14:55 topranks: enabling port et-1/0/2 for 100G mode on cr2-codfw T364095
14:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
14:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
14:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64265 and previous config saved to /var/cache/conftool/dbconfig/20240607-144404-ladsgroup.json
14:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
14:39 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:39 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:38 jhathaway@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:38 jhathaway@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:37 jhathaway@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:37 jhathaway@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64264 and previous config saved to /var/cache/conftool/dbconfig/20240607-142856-ladsgroup.json
14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64263 and previous config saved to /var/cache/conftool/dbconfig/20240607-141349-ladsgroup.json
14:02 Emperor: restart swift-proxy on ms-fe1009 ms-fe1011 ms-fe1012 ms-fe1014 T360913
13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64262 and previous config saved to /var/cache/conftool/dbconfig/20240607-132342-marostegui.json
13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64261 and previous config saved to /var/cache/conftool/dbconfig/20240607-132319-marostegui.json
13:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64260 and previous config saved to /var/cache/conftool/dbconfig/20240607-130811-marostegui.json
13:05 moritzm: uploaded wmf-laptop 1.0.0 to component/wmf-laptop for bookworm-wikimedia
13:04 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:04 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:01 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
13:01 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64259 and previous config saved to /var/cache/conftool/dbconfig/20240607-125303-marostegui.json
12:49 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64258 and previous config saved to /var/cache/conftool/dbconfig/20240607-124641-ladsgroup.json
12:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
12:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64257 and previous config saved to /var/cache/conftool/dbconfig/20240607-124616-ladsgroup.json
12:44 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
12:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
12:41 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
12:40 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
12:38 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:38 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64256 and previous config saved to /var/cache/conftool/dbconfig/20240607-123754-marostegui.json
12:33 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:31 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64255 and previous config saved to /var/cache/conftool/dbconfig/20240607-123108-ladsgroup.json
12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64254 and previous config saved to /var/cache/conftool/dbconfig/20240607-122413-marostegui.json
12:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
12:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64253 and previous config saved to /var/cache/conftool/dbconfig/20240607-122349-marostegui.json
12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64252 and previous config saved to /var/cache/conftool/dbconfig/20240607-121559-ladsgroup.json
12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64251 and previous config saved to /var/cache/conftool/dbconfig/20240607-120841-marostegui.json
12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
12:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
12:07 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
12:07 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
12:07 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
12:01 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64250 and previous config saved to /var/cache/conftool/dbconfig/20240607-120051-ladsgroup.json
11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64249 and previous config saved to /var/cache/conftool/dbconfig/20240607-115333-marostegui.json
11:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
11:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1013.eqiad.wmnet
11:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64248 and previous config saved to /var/cache/conftool/dbconfig/20240607-113824-marostegui.json
11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
11:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
11:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
11:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
11:05 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
11:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
11:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
11:00 jelto@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64246 and previous config saved to /var/cache/conftool/dbconfig/20240607-110025-ladsgroup.json
11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64245 and previous config saved to /var/cache/conftool/dbconfig/20240607-110000-ladsgroup.json
10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
10:50 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64244 and previous config saved to /var/cache/conftool/dbconfig/20240607-104452-ladsgroup.json
10:33 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64243 and previous config saved to /var/cache/conftool/dbconfig/20240607-102944-ladsgroup.json
10:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64242 and previous config saved to /var/cache/conftool/dbconfig/20240607-101436-ladsgroup.json
10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
09:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
09:54 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
09:54 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
09:54 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:53 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:52 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:52 moritzm: powercycle pki1002
09:43 jynus: upgrading and restarting db1239 T360751
09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
09:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
09:30 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:25 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:24 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
09:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64241 and previous config saved to /var/cache/conftool/dbconfig/20240607-091849-ladsgroup.json
09:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
09:11 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
09:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
09:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:03 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
08:51 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2099.codfw.wmnet
08:51 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:51 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:50 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:49 taavi: import opentofu 1.7.2 to apt.wikimedia.org T365696
08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
08:48 jynus: reboot dbprov1001,1002,2001,2002
08:46 jynus@cumin1002: START - Cookbook sre.dns.netbox
08:41 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2099.codfw.wmnet
08:40 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2098.codfw.wmnet
08:40 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:39 jynus@cumin1002: START - Cookbook sre.dns.netbox
08:39 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2097.codfw.wmnet
08:39 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:39 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:37 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:35 jynus@cumin1002: START - Cookbook sre.dns.netbox
08:19 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet
08:19 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
08:15 jynus: deleted from zarcillo db2097, db2098, db2099 T362802 T366877 T362883
08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
08:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64239 and previous config saved to /var/cache/conftool/dbconfig/20240607-075742-marostegui.json
07:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
07:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
07:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
07:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
07:44 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bookworm
07:19 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
07:19 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
07:12 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
07:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bookworm
06:51 moritzm: reimaging bast1003 to bookworm
06:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
06:34 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
06:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
05:15 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
04:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64238 and previous config saved to /var/cache/conftool/dbconfig/20240607-043343-ladsgroup.json
04:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
04:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64237 and previous config saved to /var/cache/conftool/dbconfig/20240607-043320-ladsgroup.json
04:23 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64236 and previous config saved to /var/cache/conftool/dbconfig/20240607-041812-ladsgroup.json
04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64235 and previous config saved to /var/cache/conftool/dbconfig/20240607-040302-ladsgroup.json
04:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:01 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64234 and previous config saved to /var/cache/conftool/dbconfig/20240607-034755-ladsgroup.json
03:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64233 and previous config saved to /var/cache/conftool/dbconfig/20240607-033141-ladsgroup.json
03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64232 and previous config saved to /var/cache/conftool/dbconfig/20240607-033118-ladsgroup.json
03:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64231 and previous config saved to /var/cache/conftool/dbconfig/20240607-032809-marostegui.json
03:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
03:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64230 and previous config saved to /var/cache/conftool/dbconfig/20240607-032746-marostegui.json
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64229 and previous config saved to /var/cache/conftool/dbconfig/20240607-031610-ladsgroup.json
03:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64228 and previous config saved to /var/cache/conftool/dbconfig/20240607-031238-marostegui.json
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64227 and previous config saved to /var/cache/conftool/dbconfig/20240607-030102-ladsgroup.json
02:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64226 and previous config saved to /var/cache/conftool/dbconfig/20240607-025729-marostegui.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64225 and previous config saved to /var/cache/conftool/dbconfig/20240607-024554-ladsgroup.json
02:44 ejegg: fundraising civicrm upgraded from 757f8528 to ebfbad86
02:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64224 and previous config saved to /var/cache/conftool/dbconfig/20240607-024221-marostegui.json
02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64223 and previous config saved to /var/cache/conftool/dbconfig/20240607-021501-ladsgroup.json
02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64222 and previous config saved to /var/cache/conftool/dbconfig/20240607-021418-ladsgroup.json
01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64221 and previous config saved to /var/cache/conftool/dbconfig/20240607-015910-ladsgroup.json
01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64220 and previous config saved to /var/cache/conftool/dbconfig/20240607-014403-ladsgroup.json
afk: fundraising civicrm upgraded from 286bd2b8 to 757f8528
01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64219 and previous config saved to /var/cache/conftool/dbconfig/20240607-012855-ladsgroup.json
01:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
01:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64218 and previous config saved to /var/cache/conftool/dbconfig/20240607-011438-ladsgroup.json
00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64217 and previous config saved to /var/cache/conftool/dbconfig/20240607-005930-ladsgroup.json
00:55 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
00:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64216 and previous config saved to /var/cache/conftool/dbconfig/20240607-004423-ladsgroup.json
00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64215 and previous config saved to /var/cache/conftool/dbconfig/20240607-002915-ladsgroup.json
00:23 bd808@deploy1002: Finished scap: Backport for gerrit:1039866 Revert "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 24s)
00:15 bd808@deploy1002: bd808 and trainbranchbot: Continuing with sync
00:14 bd808@deploy1002: bd808 and trainbranchbot: Backport for gerrit:1039866 Revert "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:12 bd808@deploy1002: Started scap: Backport for gerrit:1039866 Revert "wikitech: Replace OSM class in Gerrit blocking hook"

2024-06-06

23:32 bd808@deploy1002: Finished scap: Backport for gerrit:1038749 wikitech: Replace OSM class in Gerrit blocking hook (T161553) (duration: 11m 24s)
23:23 bd808@deploy1002: taavi and bd808: Continuing with sync
23:23 bd808@deploy1002: taavi and bd808: Backport for gerrit:1038749 wikitech: Replace OSM class in Gerrit blocking hook (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:20 bd808@deploy1002: Started scap: Backport for gerrit:1038749 wikitech: Replace OSM class in Gerrit blocking hook (T161553)
23:16 bd808@deploy1002: Finished scap: Backport for gerrit:1039307 wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) (duration: 12m 01s)
23:07 bd808@deploy1002: bd808: Continuing with sync
23:06 bd808@deploy1002: bd808: Backport for gerrit:1039307 wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:04 bd808@deploy1002: Started scap: Backport for gerrit:1039307 wikitech: Update Phabricator Conduit calls to disable/enable users (T366587)
21:46 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
21:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
21:10 jdrewniak@deploy1002: Finished scap: Backport for gerrit:1038876 Disable font size options on specified pages for all wikis (T366625) (duration: 12m 50s)
21:01 jdrewniak@deploy1002: jdrewniak and toyofuku: Continuing with sync
21:00 jdrewniak@deploy1002: jdrewniak and toyofuku: Backport for gerrit:1038876 Disable font size options on specified pages for all wikis (T366625) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:57 jdrewniak@deploy1002: Started scap: Backport for gerrit:1038876 Disable font size options on specified pages for all wikis (T366625)
20:54 urbanecm@deploy1002: Finished scap: Backport for gerrit:1038701 testwiki: Enable CommunityConfiguration (T360954) (duration: 12m 09s)
20:50 urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=testwiki # T360954
20:46 urbanecm@deploy1002: urbanecm: Continuing with sync
20:44 urbanecm@deploy1002: urbanecm: Backport for gerrit:1038701 testwiki: Enable CommunityConfiguration (T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:42 urbanecm@deploy1002: Started scap: Backport for gerrit:1038701 testwiki: Enable CommunityConfiguration (T360954)
20:41 urbanecm@deploy1002: Finished scap: Backport for gerrit:1039729 [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), gerrit:1038843 Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), gerrit:1038714 Drop logging level for unsupported providers to DEBUG (T366519 T360954) (duration: 19m 42s)
20:33 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Continuing with sync
20:32 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:31 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:29 ejegg: fundraising civicrm upgraded from 71ed6bed to 286bd2b8
20:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:24 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Backport for gerrit:1039729 [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), gerrit:1038843 Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), gerrit:1038714 Drop logging level for unsupported providers to DEBUG (T366519 T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki
20:22 urbanecm@deploy1002: Started scap: Backport for gerrit:1039729 [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), gerrit:1038843 Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), gerrit:1038714 Drop logging level for unsupported providers to DEBUG (T366519 T360954)
20:21 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:20 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:20 urbanecm@deploy1002: Finished scap: Backport for gerrit:1031174 Assign applychangetags right to group "all" on plwiktionary (T363638), gerrit:1038886 InitialiseSettings: Enable AutoModerator on trwiki (T362622), gerrit:1038388 InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) (duration: 14m 10s)
20:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:18 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:13 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:11 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Continuing with sync
20:08 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Backport for gerrit:1031174 Assign applychangetags right to group "all" on plwiktionary (T363638), gerrit:1038886 InitialiseSettings: Enable AutoModerator on trwiki (T362622), gerrit:1038388 InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki
20:06 urbanecm@deploy1002: Started scap: Backport for gerrit:1031174 Assign applychangetags right to group "all" on plwiktionary (T363638), gerrit:1038886 InitialiseSettings: Enable AutoModerator on trwiki (T362622), gerrit:1038388 InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969)
20:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
19:31 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707. (duration: 00m 26s)
19:30 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707.
18:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.8 refs T361402
18:17 thcipriani@deploy1002: Finished deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided) (duration: 00m 43s)
18:17 thcipriani@deploy1002: Started deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided)
17:57 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:57 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
17:56 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
17:48 topranks: re-enabling pybal on lvs1017 after cable move T366361
17:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64211 and previous config saved to /var/cache/conftool/dbconfig/20240606-173121-marostegui.json
17:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
17:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
17:26 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
17:26 topranks: disabling pybal on lvs1017 to move traffic to lvs1020 in advance of cable move T366361
17:26 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
17:23 topranks: re-enabling pybal on lvs1018 after cable move T366361
17:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:15 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64210 and previous config saved to /var/cache/conftool/dbconfig/20240606-171359-ladsgroup.json
17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64209 and previous config saved to /var/cache/conftool/dbconfig/20240606-171336-ladsgroup.json
17:11 topranks: disabling pybal on lvs1018 to move traffic to lvs1020 in advance of cable move T366361
17:11 topranks: re-enabling pybal on lvs1019 after cable move T366361
16:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64208 and previous config saved to /var/cache/conftool/dbconfig/20240606-165828-ladsgroup.json
16:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
16:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
16:50 topranks: disabling pybal on lvs1019 to move traffic to lvs1020 in advance of cable move T366361
16:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64207 and previous config saved to /var/cache/conftool/dbconfig/20240606-164320-ladsgroup.json
16:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64206 and previous config saved to /var/cache/conftool/dbconfig/20240606-162812-ladsgroup.json
16:28 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: (no justification provided) (duration: 00m 05s)
16:28 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: (no justification provided)
16:25 dancy@deploy1002: Installation of scap version "4.86.1" completed for 285 hosts
16:25 dancy@deploy1002: Installing scap version "4.86.1" for 285 hosts
16:24 dancy@deploy1002: Installing scap version "4.86.1" for 286 hosts
16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64205 and previous config saved to /var/cache/conftool/dbconfig/20240606-161338-ladsgroup.json
16:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64204 and previous config saved to /var/cache/conftool/dbconfig/20240606-161312-ladsgroup.json
16:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
16:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64203 and previous config saved to /var/cache/conftool/dbconfig/20240606-160028-ladsgroup.json
16:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
16:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64202 and previous config saved to /var/cache/conftool/dbconfig/20240606-160004-ladsgroup.json
15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64201 and previous config saved to /var/cache/conftool/dbconfig/20240606-155804-ladsgroup.json
15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64199 and previous config saved to /var/cache/conftool/dbconfig/20240606-154457-ladsgroup.json
15:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
15:42 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64198 and previous config saved to /var/cache/conftool/dbconfig/20240606-154255-ladsgroup.json
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64197 and previous config saved to /var/cache/conftool/dbconfig/20240606-154028-ladsgroup.json
15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
15:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64196 and previous config saved to /var/cache/conftool/dbconfig/20240606-154004-ladsgroup.json
15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
15:38 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
15:37 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64195 and previous config saved to /var/cache/conftool/dbconfig/20240606-153730-arnaudb.json
15:29 topranks: rebooting ssw1-f1-eqiad to install new JunOS release T366361
15:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64194 and previous config saved to /var/cache/conftool/dbconfig/20240606-152949-ladsgroup.json
15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64193 and previous config saved to /var/cache/conftool/dbconfig/20240606-152747-ladsgroup.json
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64192 and previous config saved to /var/cache/conftool/dbconfig/20240606-152456-ladsgroup.json
15:23 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
15:23 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:23 jforrester@deploy1002: Finished scap: Backport for gerrit:1039746 Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) (duration: 13m 58s)
15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64191 and previous config saved to /var/cache/conftool/dbconfig/20240606-152222-arnaudb.json
15:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:18 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64190 and previous config saved to /var/cache/conftool/dbconfig/20240606-151440-ladsgroup.json
15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:12 jforrester@deploy1002: jforrester: Continuing with sync
15:11 jforrester@deploy1002: jforrester: Backport for gerrit:1039746 Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64189 and previous config saved to /var/cache/conftool/dbconfig/20240606-150948-ladsgroup.json
15:09 jforrester@deploy1002: Started scap: Backport for gerrit:1039746 Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809)
15:08 jforrester@deploy1002: Finished scap: Backport for gerrit:1038828 Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) (duration: 12m 05s)
15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64188 and previous config saved to /var/cache/conftool/dbconfig/20240606-150714-arnaudb.json
15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
14:59 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
14:59 jforrester@deploy1002: jforrester: Continuing with sync
14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
14:58 jforrester@deploy1002: jforrester: Backport for gerrit:1038828 Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:58 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:56 topranks: disable ssw1-f1-eqiad leaf-facing ports in advance of upgrade T366361
14:56 jforrester@deploy1002: Started scap: Backport for gerrit:1038828 Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782)
14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64187 and previous config saved to /var/cache/conftool/dbconfig/20240606-145440-ladsgroup.json
14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64186 and previous config saved to /var/cache/conftool/dbconfig/20240606-145205-arnaudb.json
14:51 elukey: kill sessionstore pod running on mw1390.eqiad.wmnet (no dedicated='kask' taint)
14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64185 and previous config saved to /var/cache/conftool/dbconfig/20240606-144943-arnaudb.json
14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
14:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
14:43 sukhe: sudo cumin -b1 -s60 'A:cp and A:eqsin' 'run-puppet-agent --enable "merging CR 1038881"'
14:25 TheresNoTime: close UTC afternoon backport window
14:18 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 10s)
14:18 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
14:17 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 09s)
14:17 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
14:17 samtar@deploy1002: Finished scap: Backport for gerrit:1037006 commonswiki: Enable numeric wgCategoryCollation (T362494), gerrit:1037505 Add project namespace alias for Azerbaijani Wikisource (T365966) (duration: 12m 58s)
14:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
14:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
14:14 topranks: disabling BGP on cr2-eqiad towards ssw1-f1-eqiad prior to upgrade of ssw later T366361
14:14 ChrisDobbins901_: sudo cumin 'A:cp and A:eqsin' 'disable-puppet "merging CR 1038881"'
14:08 samtar@deploy1002: samtar and anzx and nmw03: Continuing with sync
14:07 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
14:06 samtar@deploy1002: samtar and anzx and nmw03: Backport for gerrit:1037006 commonswiki: Enable numeric wgCategoryCollation (T362494), gerrit:1037505 Add project namespace alias for Azerbaijani Wikisource (T365966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:06 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
14:05 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
14:04 samtar@deploy1002: Started scap: Backport for gerrit:1037006 commonswiki: Enable numeric wgCategoryCollation (T362494), gerrit:1037505 Add project namespace alias for Azerbaijani Wikisource (T365966)
14:02 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
14:00 kartik@deploy1002: Finished scap: Backport for gerrit:1039571 CX: Fix translation container max width for large screens (T366374) (duration: 13m 11s)
13:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
13:56 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4050.ulsfo.wmnet
13:52 kartik@deploy1002: kartik: Continuing with sync
13:50 kartik@deploy1002: kartik: Backport for gerrit:1039571 CX: Fix translation container max width for large screens (T366374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
13:47 kartik@deploy1002: Started scap: Backport for gerrit:1039571 CX: Fix translation container max width for large screens (T366374)
13:46 samtar@deploy1002: Finished scap: Backport for gerrit:1039612 [mswiktionary] Change the default Sitename value to Wikikamus (T366549) (duration: 16m 05s)
13:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
13:44 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
13:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
13:37 samtar@deploy1002: samtar and gergesshamon: Continuing with sync
13:32 samtar@deploy1002: samtar and gergesshamon: Backport for gerrit:1039612 [mswiktionary] Change the default Sitename value to Wikikamus (T366549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 samtar@deploy1002: Started scap: Backport for gerrit:1039612 [mswiktionary] Change the default Sitename value to Wikikamus (T366549)
13:28 samtar@deploy1002: Finished scap: Backport for gerrit:1038862 Activate campaignEvents extension on Igbo wiki. (T363199) (duration: 14m 07s)
13:19 samtar@deploy1002: mhorsey and samtar: Continuing with sync
13:16 samtar@deploy1002: mhorsey and samtar: Backport for gerrit:1038862 Activate campaignEvents extension on Igbo wiki. (T363199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:15 samtar@deploy1002: Started scap: Backport for gerrit:1038862 Activate campaignEvents extension on Igbo wiki. (T363199)
13:11 taavi: taavi@deploy1002 ~ $ sudo kill 32174 # kill forgotten scap sync-world process
13:08 klausman@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
12:57 vgutierrez: repool text@codfw with IPIP encapsulation enabled - T366466
12:56 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
12:56 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2011 - T366466
12:44 topranks: disabling PyBal on lvs1019 to allow for cable move T366361
12:40 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
12:39 topranks: rebooting ssw1-e1-eqiad to upgrade JunOS
12:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4051.ulsfo.wmnet
12:33 topranks: disabling BGP to ssw1-e1-eqiad from cr1-eqiad in advance of upgrade T366361
12:33 vgutierrez: depool text@codfw before enabling IPIP encapsulation - T366466
12:29 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet
12:28 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4051.ulsfo.wmnet
12:25 topranks: disabling PyBal on lvs1018 to allow for cable move T366361
12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
12:25 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
12:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1017.eqiad.wmnet
12:24 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1017.eqiad.wmnet
12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
12:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
11:56 topranks: disabling PyBal on lvs1017 to allow for cable move T366361
11:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
11:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
11:27 effie: kicking off k8s eqiad restarts - T366555
11:25 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
11:09 klausman@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
11:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
10:47 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
10:45 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
10:45 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
10:43 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
10:41 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
10:41 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
10:40 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
10:40 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
10:38 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
10:37 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
10:35 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
10:27 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
10:26 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
10:11 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64181 and previous config saved to /var/cache/conftool/dbconfig/20240606-100747-arnaudb.json
09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64180 and previous config saved to /var/cache/conftool/dbconfig/20240606-095240-arnaudb.json
09:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64179 and previous config saved to /var/cache/conftool/dbconfig/20240606-095053-marostegui.json
09:47 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2004.codfw.wmnet
09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64178 and previous config saved to /var/cache/conftool/dbconfig/20240606-093734-arnaudb.json
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64177 and previous config saved to /var/cache/conftool/dbconfig/20240606-093545-marostegui.json
09:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2003.codfw.wmnet
09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64176 and previous config saved to /var/cache/conftool/dbconfig/20240606-092228-arnaudb.json
09:22 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:20 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64175 and previous config saved to /var/cache/conftool/dbconfig/20240606-092037-marostegui.json
09:20 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
09:17 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
09:17 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:15 stevemunene@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:13 stevemunene@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:12 stevemunene@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:11 stevemunene@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:08 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1004.eqiad.wmnet
09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64174 and previous config saved to /var/cache/conftool/dbconfig/20240606-090722-arnaudb.json
09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64173 and previous config saved to /var/cache/conftool/dbconfig/20240606-090529-marostegui.json
09:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet
09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
08:57 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
08:56 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
08:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64172 and previous config saved to /var/cache/conftool/dbconfig/20240606-085216-arnaudb.json
08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
08:50 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1031.eqiad.wmnet
08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1003.eqiad.wmnet
08:44 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
08:44 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
08:40 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet
08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
08:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64171 and previous config saved to /var/cache/conftool/dbconfig/20240606-083710-arnaudb.json
08:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
08:35 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:35 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
08:19 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64167 and previous config saved to /var/cache/conftool/dbconfig/20240606-081753-marostegui.json
08:14 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:14 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64166 and previous config saved to /var/cache/conftool/dbconfig/20240606-081412-ladsgroup.json
08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64165 and previous config saved to /var/cache/conftool/dbconfig/20240606-080245-marostegui.json
08:02 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
08:01 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1001.eqiad.wmnet
08:00 urbanecm@deploy1002: Started scap: Backport for gerrit:1039287 Add throttle exception for an upcoming workshop (T366748)
07:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64164 and previous config saved to /var/cache/conftool/dbconfig/20240606-075904-ladsgroup.json
07:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64163 and previous config saved to /var/cache/conftool/dbconfig/20240606-074737-marostegui.json
07:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64162 and previous config saved to /var/cache/conftool/dbconfig/20240606-074356-ladsgroup.json
07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64161 and previous config saved to /var/cache/conftool/dbconfig/20240606-073229-marostegui.json
07:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
07:06 hashar: Restarting Gerrit
07:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64160 and previous config saved to /var/cache/conftool/dbconfig/20240606-070558-ladsgroup.json
07:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
07:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
06:56 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1034.eqiad.wmnet
06:49 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1034.eqiad.wmnet
05:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
05:21 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
05:20 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
05:04 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
05:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64159 and previous config saved to /var/cache/conftool/dbconfig/20240606-041714-marostegui.json
04:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
04:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
04:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64158 and previous config saved to /var/cache/conftool/dbconfig/20240606-041650-marostegui.json
04:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64157 and previous config saved to /var/cache/conftool/dbconfig/20240606-040142-marostegui.json
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64156 and previous config saved to /var/cache/conftool/dbconfig/20240606-034732-ladsgroup.json
03:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64155 and previous config saved to /var/cache/conftool/dbconfig/20240606-034709-ladsgroup.json
03:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64154 and previous config saved to /var/cache/conftool/dbconfig/20240606-034635-marostegui.json
03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64153 and previous config saved to /var/cache/conftool/dbconfig/20240606-033201-ladsgroup.json
03:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64152 and previous config saved to /var/cache/conftool/dbconfig/20240606-033125-marostegui.json
03:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64151 and previous config saved to /var/cache/conftool/dbconfig/20240606-032907-ladsgroup.json
03:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
03:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64150 and previous config saved to /var/cache/conftool/dbconfig/20240606-032844-ladsgroup.json
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64149 and previous config saved to /var/cache/conftool/dbconfig/20240606-031653-ladsgroup.json
03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64148 and previous config saved to /var/cache/conftool/dbconfig/20240606-031336-ladsgroup.json
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64147 and previous config saved to /var/cache/conftool/dbconfig/20240606-030145-ladsgroup.json
02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64146 and previous config saved to /var/cache/conftool/dbconfig/20240606-025828-ladsgroup.json
02:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64145 and previous config saved to /var/cache/conftool/dbconfig/20240606-024321-ladsgroup.json
01:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64144 and previous config saved to /var/cache/conftool/dbconfig/20240606-012208-marostegui.json
01:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
01:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64143 and previous config saved to /var/cache/conftool/dbconfig/20240606-012144-marostegui.json
01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64142 and previous config saved to /var/cache/conftool/dbconfig/20240606-010636-marostegui.json
00:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64141 and previous config saved to /var/cache/conftool/dbconfig/20240606-005128-marostegui.json
00:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64140 and previous config saved to /var/cache/conftool/dbconfig/20240606-003620-marostegui.json
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64139 and previous config saved to /var/cache/conftool/dbconfig/20240606-003232-marostegui.json
00:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
00:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64138 and previous config saved to /var/cache/conftool/dbconfig/20240606-003208-marostegui.json
00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64137 and previous config saved to /var/cache/conftool/dbconfig/20240606-001700-marostegui.json
00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64136 and previous config saved to /var/cache/conftool/dbconfig/20240606-000151-marostegui.json

2024-06-05

23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64135 and previous config saved to /var/cache/conftool/dbconfig/20240605-234643-marostegui.json
23:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64134 and previous config saved to /var/cache/conftool/dbconfig/20240605-232926-ladsgroup.json
23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
22:54 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
22:50 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
22:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
22:03 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
21:43 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
21:42 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
21:42 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
21:36 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
21:18 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
21:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in2001.wikimedia.org
21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in2001.wikimedia.org with OS bookworm
20:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
20:42 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64133 and previous config saved to /var/cache/conftool/dbconfig/20240605-202949-marostegui.json
20:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
20:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
20:26 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in2001.wikimedia.org with OS bookworm
20:26 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:25 urbanecm@deploy1002: Finished scap: Backport for gerrit:1038740 [CheckUser] Stop writing old for event tables migration on group0 (T360685), gerrit:1038882 Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), gerrit:1035473 [Beta] Enable CommunityConfiguration extension in all wikis (T364892) (duration: 22m 04s)
20:25 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in2001.wikimedia.org on all recursors
20:25 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in2001.wikimedia.org on all recursors
20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:24 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:21 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
20:21 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in2001.wikimedia.org
20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in1001.wikimedia.org
20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in1001.wikimedia.org with OS bookworm
20:16 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Continuing with sync
20:12 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:06 ejegg: payments-wiki upgraded from c255fda8 to 82a5e588
20:06 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Backport for gerrit:1038740 [CheckUser] Stop writing old for event tables migration on group0 (T360685), gerrit:1038882 Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), gerrit:1035473 [Beta] Enable CommunityConfiguration extension in all wikis (T364892) synced to the testservers (https://wikitech.wikimedia.org/wiki/M
20:03 urbanecm@deploy1002: Started scap: Backport for gerrit:1038740 [CheckUser] Stop writing old for event tables migration on group0 (T360685), gerrit:1038882 Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), gerrit:1035473 [Beta] Enable CommunityConfiguration extension in all wikis (T364892)
20:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
19:57 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in1001.wikimedia.org with OS bookworm
19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:44 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in1001.wikimedia.org on all recursors
19:43 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in1001.wikimedia.org on all recursors
19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:38 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:36 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
19:36 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in1001.wikimedia.org
19:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
19:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
18:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
18:53 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.8 refs T361402
18:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64132 and previous config saved to /var/cache/conftool/dbconfig/20240605-184250-ladsgroup.json
18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64131 and previous config saved to /var/cache/conftool/dbconfig/20240605-182742-ladsgroup.json
18:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64130 and previous config saved to /var/cache/conftool/dbconfig/20240605-181234-ladsgroup.json
18:12 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
18:11 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
18:07 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
18:06 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64129 and previous config saved to /var/cache/conftool/dbconfig/20240605-175725-ladsgroup.json
17:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64128 and previous config saved to /var/cache/conftool/dbconfig/20240605-175503-ladsgroup.json
17:50 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
17:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
17:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
17:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64127 and previous config saved to /var/cache/conftool/dbconfig/20240605-174724-marostegui.json
17:42 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] (duration: 12m 19s)
17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64126 and previous config saved to /var/cache/conftool/dbconfig/20240605-173954-ladsgroup.json
17:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
17:32 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json
17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
17:29 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]]
17:27 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
17:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json
17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json
17:13 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
17:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:10 jhathaway: phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox
17:10 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet
17:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json
17:06 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
17:06 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet
17:06 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
17:05 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
17:05 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
17:04 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json
16:56 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
16:56 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
16:56 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
16:54 mutante: downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785
16:53 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
16:53 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
16:52 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] (duration: 18m 23s)
16:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64120 and previous config saved to /var/cache/conftool/dbconfig/20240605-164635-ladsgroup.json
16:46 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
16:45 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:43 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Continuing with sync
16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
16:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:36 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:34 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]]
16:32 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64119 and previous config saved to /var/cache/conftool/dbconfig/20240605-163129-ladsgroup.json
16:20 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:18 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1032.eqiad.wmnet
16:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64118 and previous config saved to /var/cache/conftool/dbconfig/20240605-161622-ladsgroup.json
16:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:15 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:14 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:12 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1032.eqiad.wmnet
16:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:10 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:08 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:01 aokoth@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:01 aokoth@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:01 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64117 and previous config saved to /var/cache/conftool/dbconfig/20240605-160116-ladsgroup.json
15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64116 and previous config saved to /var/cache/conftool/dbconfig/20240605-155955-ladsgroup.json
15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:59 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
15:58 aokoth@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:58 aokoth@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:57 aokoth@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:56 aokoth@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
15:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1081.eqiad.wmnet
15:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64115 and previous config saved to /var/cache/conftool/dbconfig/20240605-155023-ladsgroup.json
15:46 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
15:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
15:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
15:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
15:37 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
15:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
15:34 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
15:32 moritzm: rebalancing drmrs Ganeti clusters
15:30 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
15:29 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
15:28 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
15:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2078.codfw.wmnet
15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ping1004.eqiad.wmnet
15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ping1004.eqiad.wmnet with OS bookworm
15:24 sukhe@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
15:21 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
15:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
15:19 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2078.codfw.wmnet
15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
15:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
15:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
15:10 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
15:10 kamila@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-ctrl1001']
15:09 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
15:09 jnuche@deploy1002: Installation of scap version "4.86.0" completed for 285 hosts
15:08 jnuche@deploy1002: Installing scap version "4.86.0" for 285 hosts
15:07 jnuche@deploy1002: Installing scap version "4.86.0" for 286 hosts
15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64114 and previous config saved to /var/cache/conftool/dbconfig/20240605-150605-marostegui.json
15:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
15:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64113 and previous config saved to /var/cache/conftool/dbconfig/20240605-150542-marostegui.json
15:05 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:04 vgutierrez: repool text@eqsin with IPIP encapsulation enabled - T366466
15:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
15:01 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:59 cwhite@deploy1002: Finished scap: Backport for [[gerrit:1039212|MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657)]] (duration: 12m 32s)
14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64112 and previous config saved to /var/cache/conftool/dbconfig/20240605-145757-ladsgroup.json
14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64111 and previous config saved to /var/cache/conftool/dbconfig/20240605-145735-ladsgroup.json
14:55 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
14:55 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
14:55 vgutierrez: rolling restart of pybal on lvs5006 and lvs5004 - T366466
14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64110 and previous config saved to /var/cache/conftool/dbconfig/20240605-145034-marostegui.json
14:50 cwhite@deploy1002: matmarex and cwhite: Continuing with sync
14:49 cwhite@deploy1002: matmarex and cwhite: Backport for [[gerrit:1039212|MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
14:46 cwhite@deploy1002: Started scap: Backport for [[gerrit:1039212|MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657)]]
14:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
14:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64109 and previous config saved to /var/cache/conftool/dbconfig/20240605-144227-ladsgroup.json
14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64108 and previous config saved to /var/cache/conftool/dbconfig/20240605-143526-marostegui.json
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
14:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
14:28 vgutierrez: depool text@eqsin before enabling IPIP encapsulation - T366466
14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64107 and previous config saved to /var/cache/conftool/dbconfig/20240605-142718-ladsgroup.json
14:23 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
14:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
14:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64106 and previous config saved to /var/cache/conftool/dbconfig/20240605-142018-marostegui.json
14:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
14:15 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
14:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64105 and previous config saved to /var/cache/conftool/dbconfig/20240605-141210-ladsgroup.json
14:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
14:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
14:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
14:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
14:02 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:02 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:00 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
14:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1004.eqiad.wmnet on all recursors
14:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1004.eqiad.wmnet on all recursors
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
13:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
13:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
13:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
13:51 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
13:48 inflatador: bking@an-db1001 install python3-psycopg2 pkg T363001
13:48 daniel@deploy1002: Finished scap: Backport for [[gerrit:1038688|Set LinterParseOnDerivedDataUpdate to false (T361013)]] (duration: 17m 50s)
13:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:48 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1004.eqiad.wmnet
13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
13:47 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
13:46 elukey: factory reset for sretest1001 to test the new provision cookbook - T365372
13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
13:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
13:45 inflatador: bking@an-db1001 install acl pkg T363001
13:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
13:40 daniel@deploy1002: daniel: Continuing with sync
13:39 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
13:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
13:37 filippo@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1005.eqiad.wmnet
13:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
13:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
13:35 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
13:34 daniel@deploy1002: daniel: Backport for [[gerrit:1038688|Set LinterParseOnDerivedDataUpdate to false (T361013)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:30 daniel@deploy1002: Started scap: Backport for [[gerrit:1038688|Set LinterParseOnDerivedDataUpdate to false (T361013)]]
13:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
13:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
13:27 elukey: systemctl reset-failed prometheus-redis-exporter@6380.service redis-instance-tcp_6380.service on netbox[12]002 + apt-get purge of redis-server and prometheus-redis-exporter packages to clean up stale configs (no local redis is used)
13:27 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
13:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
13:26 dreamyjazz@deploy1002: Finished scap: Backport for [[gerrit:1038839|Follow-up: Don't run interact with block buttons if they don't exist (T329493)]] (duration: 11m 39s)
13:25 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
13:21 fabfur: enable magru DC after applying IPIP encapsulation patches (T366466)
13:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
13:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
13:17 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
13:17 dreamyjazz@deploy1002: dreamyjazz: Backport for [[gerrit:1038839|Follow-up: Don't run interact with block buttons if they don't exist (T329493)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64104 and previous config saved to /var/cache/conftool/dbconfig/20240605-131647-marostegui.json
13:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64103 and previous config saved to /var/cache/conftool/dbconfig/20240605-131623-marostegui.json
13:14 dreamyjazz@deploy1002: Started scap: Backport for gerrit:1038839 Follow-up: Don't run interact with block buttons if they don't exist (T329493)
13:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
13:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
13:13 dreamyjazz@deploy1002: Finished scap: Backport for [[gerrit:1013386|[CheckUser] Stop writing old for event table migration on testwiki (T360686)]] (duration: 19m 13s)
13:10 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker
13:06 fabfur: restarting pybal on lvs7001/lvs7003 to apply IPIP conf (T366466)
13:04 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
13:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
13:02 dreamyjazz@deploy1002: dreamyjazz: Backport for [[gerrit:1013386|[CheckUser] Stop writing old for event table migration on testwiki (T360686)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64102 and previous config saved to /var/cache/conftool/dbconfig/20240605-130115-marostegui.json
12:56 elukey@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker
12:55 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
12:53 dreamyjazz@deploy1002: Started scap: Backport for [[gerrit:1013386|[CheckUser] Stop writing old for event table migration on testwiki (T360686)]]
12:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2004.codfw.wmnet
12:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping2004.codfw.wmnet with OS bookworm
12:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
12:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
12:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
12:49 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1246 T363119', diff saved to https://phabricator.wikimedia.org/P64101 and previous config saved to /var/cache/conftool/dbconfig/20240605-124918-arnaudb.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64100 and previous config saved to /var/cache/conftool/dbconfig/20240605-124607-marostegui.json
12:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
12:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
12:43 moritzm: failover ganeti masters in drmrs
12:40 cgoubert@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-codfw
12:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
12:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
12:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
12:35 fabfur: disabling puppet on A:cp-text to test IPIP encapsulation on magru (T366466)
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
12:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64099 and previous config saved to /var/cache/conftool/dbconfig/20240605-123059-marostegui.json
12:29 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
12:26 fabfur: disabling magru DC to apply IPIP encapsulation patches (T366466)
12:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
12:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
12:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
12:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
12:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
12:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
12:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
12:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
12:15 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping2004.codfw.wmnet with OS bookworm
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
12:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
12:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2004.codfw.wmnet on all recursors
12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2004.codfw.wmnet on all recursors
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
12:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
12:08 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
12:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2004.codfw.wmnet
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
12:04 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
12:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
12:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
12:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
11:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
11:52 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
11:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
11:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
11:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
11:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
11:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
11:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
11:39 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1008.eqiad.wmnet|wikikube-worker1009.eqiad.wmnet|wikikube-worker1010.eqiad.wmnet|wikikube-worker1011.eqiad.wmnet|wikikube-worker1012.eqiad.wmnet),cluster=kubernetes,service=kubesvc
11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
11:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
11:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
11:37 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
11:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
11:31 hnowlan: running homer to configure bgp on 5 new k8s workers
11:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
11:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
11:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
11:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
11:17 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
11:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
11:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
11:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
11:03 claime: restarted send_tile_invalidations.service on maps1009
11:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64098 and previous config saved to /var/cache/conftool/dbconfig/20240605-110303-ladsgroup.json
10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
10:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64097 and previous config saved to /var/cache/conftool/dbconfig/20240605-105400-root.json
10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
10:53 hnowlan@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
10:52 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
10:52 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
10:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64096 and previous config saved to /var/cache/conftool/dbconfig/20240605-104757-ladsgroup.json
10:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
10:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
10:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
10:39 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64094 and previous config saved to /var/cache/conftool/dbconfig/20240605-103854-root.json
10:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
10:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1012.eqiad.wmnet with OS bullseye
10:35 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
10:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1010.eqiad.wmnet with OS bullseye
10:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64093 and previous config saved to /var/cache/conftool/dbconfig/20240605-103251-ladsgroup.json
10:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
10:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1008.eqiad.wmnet with OS bullseye
10:30 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64091 and previous config saved to /var/cache/conftool/dbconfig/20240605-102348-root.json
10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64090 and previous config saved to /var/cache/conftool/dbconfig/20240605-102252-ladsgroup.json
10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
10:22 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
10:22 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
10:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
10:21 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
10:18 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
10:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64088 and previous config saved to /var/cache/conftool/dbconfig/20240605-101744-ladsgroup.json
10:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bookworm
10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64087 and previous config saved to /var/cache/conftool/dbconfig/20240605-101521-ladsgroup.json
10:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
10:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
10:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
10:13 dcaro@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1031.eqiad.wmnet
10:13 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
10:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1152 back to x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64086 and previous config saved to /var/cache/conftool/dbconfig/20240605-101019-root.json
10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64085 and previous config saved to /var/cache/conftool/dbconfig/20240605-100842-root.json
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64084 and previous config saved to /var/cache/conftool/dbconfig/20240605-100810-root.json
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64083 and previous config saved to /var/cache/conftool/dbconfig/20240605-100117-root.json
10:00 fabfur: disabling puppet on cp4037 to test Benthos performance (T358109)
10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1012.eqiad.wmnet with OS bullseye
10:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
10:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
09:59 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc
09:58 claime: pooling and uncordoning wikikube-worker1001 - T351074
09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1456 to wikikube-worker1012
09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1012
09:56 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1010.eqiad.wmnet with OS bullseye
09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1008.eqiad.wmnet with OS bullseye
09:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
09:55 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
09:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1012
09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
09:53 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64082 and previous config saved to /var/cache/conftool/dbconfig/20240605-095336-root.json
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64081 and previous config saved to /var/cache/conftool/dbconfig/20240605-095303-root.json
09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1428 to wikikube-worker1011
09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1011
09:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
09:51 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
09:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
09:50 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1011
09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
09:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
09:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64080 and previous config saved to /var/cache/conftool/dbconfig/20240605-094611-root.json
09:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1428 to wikikube-worker1011
09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1456 to wikikube-worker1012
09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1410 to wikikube-worker1010
09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1010
09:44 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:44 claime: homer 'cr*eqiad*' commit 'T351074'
09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1001.eqiad.wmnet with OS bullseye
09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1401 to wikikube-worker1009
09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1009
09:42 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1010
09:42 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:41 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1009
09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
09:41 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1400 to wikikube-worker1008
09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1008
09:39 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
09:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
09:38 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1008
09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
09:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64079 and previous config saved to /var/cache/conftool/dbconfig/20240605-093830-root.json
09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64078 and previous config saved to /var/cache/conftool/dbconfig/20240605-093757-root.json
09:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bookworm
09:35 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1151 to temp x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64077 and previous config saved to /var/cache/conftool/dbconfig/20240605-093507-root.json
09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
09:33 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010
09:33 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
09:31 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1410 to wikikube-worker1010.eqiad.wmnet
09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010.eqiad.wmnet
09:31 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1030.eqiad.wmnet
09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1401 to wikikube-worker1009
09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64076 and previous config saved to /var/cache/conftool/dbconfig/20240605-093105-root.json
09:30 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:30 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008
09:29 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1400 to wikikube-worker1008.eqiad.wmnet
09:29 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008.eqiad.wmnet
09:26 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1030.eqiad.wmnet
09:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
09:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
09:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
09:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bookworm
09:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64075 and previous config saved to /var/cache/conftool/dbconfig/20240605-092324-root.json
09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64074 and previous config saved to /var/cache/conftool/dbconfig/20240605-092251-root.json
09:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
09:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
09:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
09:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64073 and previous config saved to /var/cache/conftool/dbconfig/20240605-091559-root.json
09:15 brouberol@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
09:11 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
09:11 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64072 and previous config saved to /var/cache/conftool/dbconfig/20240605-090745-root.json
09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
09:06 brouberol@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
09:02 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1001.eqiad.wmnet with OS bullseye
09:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
09:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1001.eqiad.wmnet on all recursors
09:01 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1001.eqiad.wmnet on all recursors
09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4052.ulsfo.wmnet
09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64071 and previous config saved to /var/cache/conftool/dbconfig/20240605-090053-root.json
09:00 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4044.ulsfo.wmnet
08:58 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
08:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
08:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
08:54 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
08:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
08:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/ipoid: apply
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
08:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64070 and previous config saved to /var/cache/conftool/dbconfig/20240605-085239-root.json
08:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4052.ulsfo.wmnet
08:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
08:50 fabfur@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp4044.ulsfo.wmnet
08:50 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
08:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64069 and previous config saved to /var/cache/conftool/dbconfig/20240605-084547-root.json
08:45 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1029.eqiad.wmnet
08:45 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
08:44 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
08:44 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bookworm
08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1227', diff saved to https://phabricator.wikimedia.org/P64068 and previous config saved to /var/cache/conftool/dbconfig/20240605-084211-root.json
08:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
08:37 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64067 and previous config saved to /var/cache/conftool/dbconfig/20240605-083733-root.json
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1358 to wikikube-worker1001
08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1001
08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
08:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1186.eqiad.wmnet with OS bookworm
08:18 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64063 and previous config saved to /var/cache/conftool/dbconfig/20240605-081755-marostegui.json
08:14 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
08:08 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1027.eqiad.wmnet
08:07 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64062 and previous config saved to /var/cache/conftool/dbconfig/20240605-080247-marostegui.json
08:01 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1026.eqiad.wmnet
08:00 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1025.eqiad.wmnet
08:00 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
07:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
07:54 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1025.eqiad.wmnet
07:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64061 and previous config saved to /var/cache/conftool/dbconfig/20240605-074739-marostegui.json
07:45 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
07:38 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
07:38 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
07:37 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
07:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1186', diff saved to https://phabricator.wikimedia.org/P64060 and previous config saved to /var/cache/conftool/dbconfig/20240605-073024-root.json
07:28 marostegui: dbmaint codfw s2 deploy schema change on db2207 T364299
07:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
07:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T366038', diff saved to https://phabricator.wikimedia.org/P64059 and previous config saved to /var/cache/conftool/dbconfig/20240605-072509-root.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T366038', diff saved to https://phabricator.wikimedia.org/P64058 and previous config saved to /var/cache/conftool/dbconfig/20240605-072427-marostegui.json
07:24 marostegui: Starting s2 codfw failover from db2207 to db2204 - T366038
07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T366038', diff saved to https://phabricator.wikimedia.org/P64057 and previous config saved to /var/cache/conftool/dbconfig/20240605-070758-root.json
07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64056 and previous config saved to /var/cache/conftool/dbconfig/20240605-044418-marostegui.json
04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64055 and previous config saved to /var/cache/conftool/dbconfig/20240605-044355-marostegui.json
04:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64054 and previous config saved to /var/cache/conftool/dbconfig/20240605-042847-marostegui.json
04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64053 and previous config saved to /var/cache/conftool/dbconfig/20240605-041339-marostegui.json
04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64052 and previous config saved to /var/cache/conftool/dbconfig/20240605-041306-marostegui.json
04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64051 and previous config saved to /var/cache/conftool/dbconfig/20240605-041227-marostegui.json
03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64050 and previous config saved to /var/cache/conftool/dbconfig/20240605-035855-ladsgroup.json
03:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
03:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64049 and previous config saved to /var/cache/conftool/dbconfig/20240605-035832-ladsgroup.json
03:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64048 and previous config saved to /var/cache/conftool/dbconfig/20240605-035831-marostegui.json
03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64047 and previous config saved to /var/cache/conftool/dbconfig/20240605-035719-marostegui.json
03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64046 and previous config saved to /var/cache/conftool/dbconfig/20240605-034326-ladsgroup.json
03:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64045 and previous config saved to /var/cache/conftool/dbconfig/20240605-034212-marostegui.json
03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64044 and previous config saved to /var/cache/conftool/dbconfig/20240605-032817-ladsgroup.json
03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64043 and previous config saved to /var/cache/conftool/dbconfig/20240605-032704-marostegui.json
03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64042 and previous config saved to /var/cache/conftool/dbconfig/20240605-031310-ladsgroup.json
02:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64041 and previous config saved to /var/cache/conftool/dbconfig/20240605-023423-ladsgroup.json
02:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
02:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance

2024-06-04

23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64040 and previous config saved to /var/cache/conftool/dbconfig/20240604-234228-marostegui.json
23:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
23:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
23:15 tzatziki: removing one file for legal compliance
23:09 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
23:09 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
22:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:47 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:47 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:47 tzatziki: removing one file for legal compliance
22:46 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:46 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:36 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:36 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:36 mutante: CI - (integration.wikimedia.org) short downtime for maintenance
22:35 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:35 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:29 tzatziki: removing two files for legal compliance
22:16 tzatziki: removing three files for legal compliance
22:08 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:00 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
21:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:41 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:33 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:33 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:33 urbanecm@deploy1002: Finished scap: Backport for gerrit:1038444 Disable font size options on specified pages for most wikis (T366334) (duration: 15m 10s)
21:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:24 urbanecm@deploy1002: toyofuku and urbanecm: Continuing with sync
21:21 urbanecm@deploy1002: toyofuku and urbanecm: Backport for gerrit:1038444 Disable font size options on specified pages for most wikis (T366334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:18 urbanecm@deploy1002: Started scap: Backport for gerrit:1038444 Disable font size options on specified pages for most wikis (T366334)
21:10 tgr@deploy1002: Finished scap: Backport for gerrit:1037929 multiversion: Support beta for upload hostname check, gerrit:1037930 multiversion: Add tests for MWMultiVersion::getMediaWiki() (duration: 16m 33s)
21:07 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
21:06 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
21:01 tgr@deploy1002: tgr: Continuing with sync
20:58 tgr@deploy1002: tgr: Backport for gerrit:1037929 multiversion: Support beta for upload hostname check, gerrit:1037930 multiversion: Add tests for MWMultiVersion::getMediaWiki() synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:56 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
20:53 tgr@deploy1002: Started scap: Backport for gerrit:1037929 multiversion: Support beta for upload hostname check, gerrit:1037930 multiversion: Add tests for MWMultiVersion::getMediaWiki()
20:52 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
20:47 tgr@deploy1002: Finished scap: Backport for gerrit:1035749 beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) (duration: 13m 12s)
20:39 tgr@deploy1002: tgr and pmiazga: Continuing with sync
20:37 tgr@deploy1002: tgr and pmiazga: Backport for gerrit:1035749 beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:34 tgr@deploy1002: Started scap: Backport for gerrit:1035749 beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281)
20:28 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1037945 [pawiki] Enable wgMinervaEnableSiteNotice (T366434) (duration: 13m 24s)
20:27 jhathaway: vacuuming pcc db
20:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64039 and previous config saved to /var/cache/conftool/dbconfig/20240604-202554-marostegui.json
20:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
20:22 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
20:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
20:21 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
20:19 ladsgroup@deploy1002: pppery and ladsgroup: Continuing with sync
20:17 ladsgroup@deploy1002: pppery and ladsgroup: Backport for gerrit:1037945 [pawiki] Enable wgMinervaEnableSiteNotice (T366434) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:15 ladsgroup@deploy1002: Started scap: Backport for gerrit:1037945 [pawiki] Enable wgMinervaEnableSiteNotice (T366434)
20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64038 and previous config saved to /var/cache/conftool/dbconfig/20240604-201047-marostegui.json
20:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:59 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
19:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64037 and previous config saved to /var/cache/conftool/dbconfig/20240604-195539-marostegui.json
19:49 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:49 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:47 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
19:44 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64036 and previous config saved to /var/cache/conftool/dbconfig/20240604-194031-marostegui.json
19:38 mutante: https://gerrit-replica.wikimedia.org - short downtime for maintenance
19:38 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
19:38 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
19:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:37 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
19:37 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:37 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
19:36 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:33 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
19:32 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
19:16 mutante: releases.wikimedia.org - short downtime for maintenance
19:14 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
19:13 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
19:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
19:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64035 and previous config saved to /var/cache/conftool/dbconfig/20240604-190931-marostegui.json
19:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64034 and previous config saved to /var/cache/conftool/dbconfig/20240604-190906-marostegui.json
19:06 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
19:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: 0.3.142 (duration: 12m 53s)
18:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64033 and previous config saved to /var/cache/conftool/dbconfig/20240604-185358-marostegui.json
18:48 ryankemper: [WDQS Deploy] Forgot to run the command to set git hash to tip of origin/master so deploy was a partial no-op. Re-rolling...
18:47 ryankemper@deploy1002: Started deploy [wdqs/wdqs@43b966f]: 0.3.142
18:46 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@143ca33]: 0.3.142 (duration: 02m 02s)
18:45 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.142` on canary `wdqs1016`; proceeding to rest of fleet
18:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@143ca33]: 0.3.142
18:41 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.142`. Pre-deploy tests passing on canary `wdqs1016`
18:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64032 and previous config saved to /var/cache/conftool/dbconfig/20240604-183850-marostegui.json
18:35 mutante: aphlict - (phab realtime notifications) - reboots
18:30 mutante: doc.wikimedia.org - very short downtime for maintenance
18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
18:26 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.8 refs T361402
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64031 and previous config saved to /var/cache/conftool/dbconfig/20240604-182342-marostegui.json
18:15 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7014*} and A:cp
17:54 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7014*} and A:cp
17:53 sukhe: sudo cumin 'A:cp-upload and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
17:51 sukhe: sudo cumin 'A:cp-text and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
17:49 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7002*} and A:cp
17:39 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7002*} and A:cp
17:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:22 sukhe: sudo cumin 'A:cp and A:magru' 'run-puppet-agent'
17:15 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:15 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
17:14 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
17:11 kamila@cumin1002: START - Cookbook sre.dns.netbox
16:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp700[12].magru.wmnet,service=(cdn|ats-be)
16:52 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:51 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:41 elukey: delete other 2 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
16:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
16:31 elukey: delete 3 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
16:29 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1010.eqiad.wmnet
16:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64028 and previous config saved to /var/cache/conftool/dbconfig/20240604-162241-root.json
16:22 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7002.magru.wmnet
16:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7001.magru.wmnet
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64025 and previous config saved to /var/cache/conftool/dbconfig/20240604-161233-marostegui.json
16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64024 and previous config saved to /var/cache/conftool/dbconfig/20240604-161210-marostegui.json
16:11 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7002.magru.wmnet
16:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7001.magru.wmnet
16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
16:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1013.eqiad.wmnet
16:09 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
16:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
16:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64023 and previous config saved to /var/cache/conftool/dbconfig/20240604-160735-root.json
16:05 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
16:05 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
16:04 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
16:04 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab2002.codfw.wmnet
16:04 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
16:02 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1005.eqiad.wmnet
16:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
16:00 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
15:59 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
15:58 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab2002.codfw.wmnet
15:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1004.eqiad.wmnet
15:57 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1003.eqiad.wmnet
15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64022 and previous config saved to /var/cache/conftool/dbconfig/20240604-155701-marostegui.json
15:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64021 and previous config saved to /var/cache/conftool/dbconfig/20240604-155629-ladsgroup.json
15:55 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1013.eqiad.wmnet
15:53 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1003.eqiad.wmnet
15:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
15:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
15:52 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1004.eqiad.wmnet
15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64020 and previous config saved to /var/cache/conftool/dbconfig/20240604-155228-root.json
15:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1003.eqiad.wmnet
15:52 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
15:51 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1002.eqiad.wmnet
15:51 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s1
15:48 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2003.codfw.wmnet
15:47 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1002.eqiad.wmnet
15:47 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1003.eqiad.wmnet
15:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
15:47 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1001.eqiad.wmnet
15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
15:44 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host miscweb2003.codfw.wmnet
15:43 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
15:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:43 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM aux-k8s-etcd1001.eqiad.wmnet
15:42 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
15:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:42 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2001.codfw.wmnet
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64019 and previous config saved to /var/cache/conftool/dbconfig/20240604-154153-marostegui.json
15:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_magru
15:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2001.codfw.wmnet
15:37 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet
15:37 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(0|3|5).codfw.wmnet,cluster=kubernetes,service=kubesvc
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64018 and previous config saved to /var/cache/conftool/dbconfig/20240604-153722-root.json
15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2030,2033,2035].codfw.wmnet
15:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1002.eqiad.wmnet
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes[2030,2033,2035].codfw.wmnet
15:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
15:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
15:31 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1002.eqiad.wmnet
15:31 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet
15:29 tchin@deploy1002: Finished deploy [airflow-dags/analytics_test@a279784]: (no justification provided) (duration: 00m 10s)
15:29 tchin@deploy1002: Started deploy [airflow-dags/analytics_test@a279784]: (no justification provided)
15:29 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
15:27 tchin@deploy1002: Finished deploy [airflow-dags/analytics@a279784]: (no justification provided) (duration: 00m 27s)
15:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps (duration: 00m 27s)
15:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:27 tchin@deploy1002: Started deploy [airflow-dags/analytics@a279784]: (no justification provided)
15:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:27 dcausse@deploy1002: Started deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps
15:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64017 and previous config saved to /var/cache/conftool/dbconfig/20240604-152644-marostegui.json
15:25 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1001.eqiad.wmnet
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64015 and previous config saved to /var/cache/conftool/dbconfig/20240604-152216-root.json
15:22 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1001.eqiad.wmnet
15:21 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
15:21 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
15:19 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:19 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
15:18 elukey@cumin1002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM aux-k8s-ctrl1001.eqiad.wmnet
15:18 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
15:18 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:16 ejegg: fundraising civicrm upgraded from 44900b8c to 71ed6bed
15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
15:15 ejegg: payments-wiki upgraded from 0174d89c to c255fda8
15:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:12 dancy@deploy1002: Installation of scap version "4.85.0" completed for 294 hosts
15:11 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
15:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
15:11 dancy@deploy1002: Installing scap version "4.85.0" for 294 hosts
15:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:09 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64014 and previous config saved to /var/cache/conftool/dbconfig/20240604-150835-ladsgroup.json
15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:08 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64013 and previous config saved to /var/cache/conftool/dbconfig/20240604-150710-root.json
15:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3066*} and A:cp
15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605 (duration: 00m 32s)
15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605
15:03 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
15:03 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
15:03 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605 (duration: 00m 33s)
15:02 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605
15:02 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
15:02 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
14:55 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
14:55 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3066*} and A:cp
14:53 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:52 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64012 and previous config saved to /var/cache/conftool/dbconfig/20240604-145203-root.json
14:49 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
14:48 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp4045*} and A:cp
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
14:48 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
14:46 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(1|4).codfw.wmnet,cluster=kubernetes,service=kubesvc
14:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:38 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp4045*} and A:cp
14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7003.magru.wmnet
14:27 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7003.magru.wmnet
14:22 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-codfw
14:14 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
14:14 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:14 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:10 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:06 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:02 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7002.magru.wmnet
14:00 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
13:59 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7002.magru.wmnet
13:59 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
13:46 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
13:42 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
13:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
13:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
13:35 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64009 and previous config saved to /var/cache/conftool/dbconfig/20240604-133250-ladsgroup.json
13:29 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
13:29 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
13:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
13:24 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
13:23 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
13:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
13:20 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
13:17 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
13:17 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
13:14 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7001.magru.wmnet
13:12 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
13:11 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7001.magru.wmnet
13:10 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
13:08 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
13:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
13:05 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
13:05 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
13:03 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
13:02 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
13:00 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
12:57 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
12:56 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
12:53 brouberol@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
12:53 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
12:48 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
12:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
12:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64008 and previous config saved to /var/cache/conftool/dbconfig/20240604-124432-ladsgroup.json
12:43 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
12:32 brouberol@cumin2002: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64007 and previous config saved to /var/cache/conftool/dbconfig/20240604-122924-ladsgroup.json
12:29 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
12:28 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database dtpwiki (T365229)
12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64006 and previous config saved to /var/cache/conftool/dbconfig/20240604-122602-root.json
12:22 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
12:17 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-eqiad
12:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
12:15 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
12:14 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64005 and previous config saved to /var/cache/conftool/dbconfig/20240604-121415-ladsgroup.json
12:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
12:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
12:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64004 and previous config saved to /var/cache/conftool/dbconfig/20240604-121056-root.json
12:08 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-codfw
12:02 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database dtpwiki (T365229)
11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64003 and previous config saved to /var/cache/conftool/dbconfig/20240604-115907-ladsgroup.json
11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64002 and previous config saved to /var/cache/conftool/dbconfig/20240604-115549-root.json
11:54 hnowlan: depooling 3 api appservers and 2 appservers in advance of reimaging
11:50 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-eqiad
11:44 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-codfw
11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64001 and previous config saved to /var/cache/conftool/dbconfig/20240604-114157-marostegui.json
11:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64000 and previous config saved to /var/cache/conftool/dbconfig/20240604-114043-root.json
11:39 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
11:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63999 and previous config saved to /var/cache/conftool/dbconfig/20240604-112537-root.json
11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
11:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63998 and previous config saved to /var/cache/conftool/dbconfig/20240604-111031-root.json
11:06 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
11:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
10:59 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
10:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
10:57 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
10:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63996 and previous config saved to /var/cache/conftool/dbconfig/20240604-105525-root.json
10:54 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
10:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
10:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
10:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
10:50 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
10:49 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
10:48 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
10:45 marostegui: dbmaint codfw s1 deploy schema change on db2203 T364299
10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T366552', diff saved to https://phabricator.wikimedia.org/P63995 and previous config saved to /var/cache/conftool/dbconfig/20240604-104337-root.json
10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T366552', diff saved to https://phabricator.wikimedia.org/P63994 and previous config saved to /var/cache/conftool/dbconfig/20240604-104241-root.json
10:42 marostegui: Starting s1 codfw failover from db2203 to db2212 - T366552
10:42 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper
10:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS bookworm
10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
10:28 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided) (duration: 01m 12s)
10:27 hashar: Upgrading releases Jenkins instances # T366008
10:27 hashar@deploy1002: Started deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided)
10:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper
10:23 claime: Migrating votewiki to mw-on-k8s - T362323
10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
10:20 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
10:16 hashar: Upgrading CI Jenkins # T366008
10:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
10:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
10:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2002-dev.codfw.wmnet
10:09 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
10:08 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T364299
10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper_monitor
10:07 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
10:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
10:04 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb2002-dev.codfw.wmnet
10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T366552', diff saved to https://phabricator.wikimedia.org/P63993 and previous config saved to /var/cache/conftool/dbconfig/20240604-100024-root.json
10:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
09:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
09:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS bookworm
09:58 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
09:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
09:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
09:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
09:53 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
09:48 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
09:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
09:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3003.wikimedia.org
09:42 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
09:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
09:40 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet
09:40 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
09:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper_monitor
09:38 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
09:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
09:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3003.wikimedia.org
09:36 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
09:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
09:34 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2008-dev.codfw.wmnet
09:34 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet
09:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
09:33 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
09:30 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
09:29 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
09:29 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2007-dev.codfw.wmnet
09:27 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
09:27 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
09:27 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2001.codfw.wmnet
09:22 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
09:22 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
09:21 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
09:21 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
09:21 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2001.codfw.wmnet
09:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1001.eqiad.wmnet
09:15 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
09:15 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
09:14 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
09:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1001.eqiad.wmnet
09:08 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
09:08 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
09:01 moritzm: imported python3-xapian-haystack 2.1.1-1+deb12u1 to bookworm-wikimedia (already lined up for the next Bookworm point release to address https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1066136 and needed for the update of the Mailman servers T331706)
08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7001.wikimedia.org
08:54 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P63992 and previous config saved to /var/cache/conftool/dbconfig/20240604-085205-marostegui.json
08:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7001.wikimedia.org
08:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63991 and previous config saved to /var/cache/conftool/dbconfig/20240604-085141-marostegui.json
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1003.wikimedia.org
08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1003.wikimedia.org
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P63990 and previous config saved to /var/cache/conftool/dbconfig/20240604-084428-root.json
08:40 kostajh: UTC morning deploys done
08:38 kharlan@deploy1002: Finished scap: Backport for gerrit:1038634 IPReputationHooks: Bump schema version (T354597) (duration: 15m 45s)
08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P63989 and previous config saved to /var/cache/conftool/dbconfig/20240604-083633-marostegui.json
08:19 kharlan@deploy1002: Finished scap: Backport for gerrit:1038633 IPReputationHooks: Bump schema version (T354597) (duration: 14m 08s)
08:10 kharlan@deploy1002: kharlan: Continuing with sync
08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63986 and previous config saved to /var/cache/conftool/dbconfig/20240604-080846-marostegui.json
08:08 kharlan@deploy1002: kharlan: Backport for gerrit:1038633 IPReputationHooks: Bump schema version (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63985 and previous config saved to /var/cache/conftool/dbconfig/20240604-080617-marostegui.json
08:06 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
08:05 kharlan@deploy1002: Started scap: Backport for gerrit:1038633 IPReputationHooks: Bump schema version (T354597)
08:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
08:01 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
07:57 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
07:56 hashar: Restarting Gerrit for Java 17 upgrade # T364342
07:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17 - T364342 (duration: 00m 03s)
07:56 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17 - T364342
07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63984 and previous config saved to /var/cache/conftool/dbconfig/20240604-075338-marostegui.json
07:47 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17 - T364342 (duration: 00m 05s)
07:46 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17 - T364342
07:42 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2002.codfw.wmnet with OS bookworm
07:42 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bookworm
07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63983 and previous config saved to /var/cache/conftool/dbconfig/20240604-073830-marostegui.json
07:27 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T356166
07:15 moritzm: installing intel-microcode updates on bullseye
07:10 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T355609
07:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
07:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
07:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1184.eqiad.wmnet with OS bookworm
06:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
06:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1184.eqiad.wmnet with OS bookworm
06:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
06:14 marostegui: Rename table flaggedpage_pending on db1185 (s5 eqiad dbmaint) - T365568
06:09 arnaudb@cumin1002: dbctl commit (dc=all): ' fix api db1163 vs db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63982 and previous config saved to /var/cache/conftool/dbconfig/20240604-060925-arnaudb.json
06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'API db1163 T366259', diff saved to https://phabricator.wikimedia.org/P63981 and previous config saved to /var/cache/conftool/dbconfig/20240604-060747-arnaudb.json
06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63980 and previous config saved to /var/cache/conftool/dbconfig/20240604-060703-arnaudb.json
06:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T366259', diff saved to https://phabricator.wikimedia.org/P63979 and previous config saved to /var/cache/conftool/dbconfig/20240604-060324-arnaudb.json
06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T366259', diff saved to https://phabricator.wikimedia.org/P63978 and previous config saved to /var/cache/conftool/dbconfig/20240604-060208-arnaudb.json
06:01 arnaudb: Starting s1 eqiad failover from db1184 to db1163 - T366259
05:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T366259', diff saved to https://phabricator.wikimedia.org/P63977 and previous config saved to /var/cache/conftool/dbconfig/20240604-052803-arnaudb.json
05:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
05:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
04:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P63976 and previous config saved to /var/cache/conftool/dbconfig/20240604-042011-ladsgroup.json
04:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.5 (duration: 00m 57s)
03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63975 and previous config saved to /var/cache/conftool/dbconfig/20240604-035703-marostegui.json
03:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.8 refs T361402 (duration: 53m 47s)
03:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
03:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63974 and previous config saved to /var/cache/conftool/dbconfig/20240604-035640-marostegui.json
03:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63973 and previous config saved to /var/cache/conftool/dbconfig/20240604-034132-marostegui.json
03:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63972 and previous config saved to /var/cache/conftool/dbconfig/20240604-032625-marostegui.json
03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63971 and previous config saved to /var/cache/conftool/dbconfig/20240604-031117-marostegui.json
03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63970 and previous config saved to /var/cache/conftool/dbconfig/20240604-030906-marostegui.json
03:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
03:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.8 refs T361402
00:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
00:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
00:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63969 and previous config saved to /var/cache/conftool/dbconfig/20240604-002119-ladsgroup.json
00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63968 and previous config saved to /var/cache/conftool/dbconfig/20240604-000612-ladsgroup.json

2024-06-03

23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63967 and previous config saved to /var/cache/conftool/dbconfig/20240603-235104-ladsgroup.json
23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63966 and previous config saved to /var/cache/conftool/dbconfig/20240603-233555-ladsgroup.json
23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Extension:DynamicPageList (Wikimedia)" "Extension:DynamicPageList" "Zabe" --reason "per request phab:T366488"
23:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
23:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
23:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63965 and previous config saved to /var/cache/conftool/dbconfig/20240603-231424-marostegui.json
22:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63963 and previous config saved to /var/cache/conftool/dbconfig/20240603-225916-marostegui.json
22:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63962 and previous config saved to /var/cache/conftool/dbconfig/20240603-224408-marostegui.json
22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63961 and previous config saved to /var/cache/conftool/dbconfig/20240603-222900-marostegui.json
22:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63960 and previous config saved to /var/cache/conftool/dbconfig/20240603-222607-marostegui.json
22:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63959 and previous config saved to /var/cache/conftool/dbconfig/20240603-222524-marostegui.json
22:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63958 and previous config saved to /var/cache/conftool/dbconfig/20240603-221016-marostegui.json
21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63957 and previous config saved to /var/cache/conftool/dbconfig/20240603-215508-marostegui.json
21:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63956 and previous config saved to /var/cache/conftool/dbconfig/20240603-214000-marostegui.json
21:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63955 and previous config saved to /var/cache/conftool/dbconfig/20240603-212040-ladsgroup.json
21:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63954 and previous config saved to /var/cache/conftool/dbconfig/20240603-211312-root.json
21:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63953 and previous config saved to /var/cache/conftool/dbconfig/20240603-210532-ladsgroup.json
20:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63952 and previous config saved to /var/cache/conftool/dbconfig/20240603-205806-root.json
20:51 urbanecm@deploy1002: Finished scap: Backport for gerrit:1037600 Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), gerrit:1038424 Enable night theme on pages which have no color contrast issues (T366370) (duration: 14m 57s)
20:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63951 and previous config saved to /var/cache/conftool/dbconfig/20240603-205024-ladsgroup.json
20:43 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
20:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63950 and previous config saved to /var/cache/conftool/dbconfig/20240603-204300-root.json
20:39 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for gerrit:1037600 Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), gerrit:1038424 Enable night theme on pages which have no color contrast issues (T366370) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:36 urbanecm@deploy1002: Started scap: Backport for gerrit:1037600 Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), gerrit:1038424 Enable night theme on pages which have no color contrast issues (T366370)
20:36 urbanecm@deploy1002: Finished scap: Backport for gerrit:1034882 EventLogging: Enable IP reputation logging (T354597), [[gerrit:1037897|[trwiki] Allow translator group to publish translation only in Extension:ContentTranslation]], [[gerrit:1038247|[trwiki] Reducing count edits ip and newbie per minute (T330811)]] (duration: 30m 14s)
20:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63949 and previous config saved to /var/cache/conftool/dbconfig/20240603-203514-ladsgroup.json
20:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63948 and previous config saved to /var/cache/conftool/dbconfig/20240603-202754-root.json
20:27 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Continuing with sync
20:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63947 and previous config saved to /var/cache/conftool/dbconfig/20240603-201248-root.json
20:10 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Backport for gerrit:1034882 EventLogging: Enable IP reputation logging (T354597), [[gerrit:1037897|[trwiki] Allow translator group to publish translation only in Extension:ContentTranslation]], [[gerrit:1038247|[trwiki] Reducing count edits ip and newbie per minute (T330811)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 urbanecm@deploy1002: Started scap: Backport for gerrit:1034882 EventLogging: Enable IP reputation logging (T354597), [[gerrit:1037897|[trwiki] Allow translator group to publish translation only in Extension:ContentTranslation]], [[gerrit:1038247|[trwiki] Reducing count edits ip and newbie per minute (T330811)]]
19:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63946 and previous config saved to /var/cache/conftool/dbconfig/20240603-195742-root.json
19:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63945 and previous config saved to /var/cache/conftool/dbconfig/20240603-194236-root.json
18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63944 and previous config saved to /var/cache/conftool/dbconfig/20240603-183029-marostegui.json
18:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
18:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63943 and previous config saved to /var/cache/conftool/dbconfig/20240603-183006-marostegui.json
18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63942 and previous config saved to /var/cache/conftool/dbconfig/20240603-181459-marostegui.json
17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63941 and previous config saved to /var/cache/conftool/dbconfig/20240603-175951-marostegui.json
17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63940 and previous config saved to /var/cache/conftool/dbconfig/20240603-174442-marostegui.json
17:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1002.eqiad.wmnet|wikikube-worker1003.eqiad.wmnet|wikikube-worker1007.eqiad.wmnet|wikikube-worker1004.eqiad.wmnet),cluster=kubernetes,service=kubesvc
17:27 claime: Pooling and uncordoning wikikube-worker1002.eqiad.wmnet,wikikube-worker1003.eqiad.wmnet,wikikube-worker1007.eqiad.wmnet,wikikube-worker1004.eqiad.wmnet - T351074
17:19 claime: homer 'cr*eqiad*' commit 'T351074'
17:18 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T351074'
17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T35107
17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
17:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
17:15 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
16:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
16:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
16:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
16:18 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1003.eqiad.wmnet with OS bullseye
15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1004.eqiad.wmnet with OS bullseye
15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1002.eqiad.wmnet with OS bullseye
15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P63939 and previous config saved to /var/cache/conftool/dbconfig/20240603-155048-root.json
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
15:43 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342 (duration: 00m 05s)
15:43 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342
15:42 jhathaway: deploying more restrictive SPF & DMARC settings for wikipedia.org
15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-c2-codfw.mgmt.codfw.wmnet
15:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
15:30 dancy@deploy1002: sync-world aborted: testing (duration: 00m 00s)
15:30 dancy@deploy1002: Started scap: testing
15:27 dancy@mwmaint1002: scap failed: FileNotFoundError [Errno 2] No such file or directory: '/etc/helmfile-defaults/mediawiki-deployments.yaml' (duration: 00m 00s)
15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1004.eqiad.wmnet with OS bullseye
15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1003.eqiad.wmnet with OS bullseye
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1002.eqiad.wmnet with OS bullseye
15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
15:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
15:03 dancy@deploy1002: Installing scap version "4.84.0" for 297 hosts
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1490 to wikikube-worker1007
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1007
15:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:00 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-c2-codfw.mgmt.codfw.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1007
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
14:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
14:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
14:57 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
14:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1490 to wikikube-worker1007
14:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 08s)
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1443 to wikikube-worker1004
14:54 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1004
14:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1004
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
14:53 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
14:53 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342
14:52 Dreamy_Jazz: Afternoon UTC backport window done
14:52 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
14:51 dreamyjazz@deploy1002: Finished scap: Backport for gerrit:1038310 Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) (duration: 12m 04s)
14:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1443 to wikikube-worker1004
14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1427 to wikikube-worker1003
14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1003
14:43 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
14:42 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1003
14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
14:41 dreamyjazz@deploy1002: dreamyjazz: Backport for gerrit:1038310 Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:41 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
14:39 dreamyjazz@deploy1002: Started scap: Backport for gerrit:1038310 Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473)
14:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1427 to wikikube-worker1003
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1426 to wikikube-worker1002
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1002
14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1002
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
14:34 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
14:33 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
14:33 vgutierrez: repool text@ulsfo with IPIP encapsulation enabled - T366466
14:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1012.eqiad.wmnet with OS bullseye
14:31 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
14:31 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
14:30 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
14:30 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2001.codfw.wmnet with OS bookworm
14:29 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:29 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1426 to wikikube-worker1002
14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1010.eqiad.wmnet with OS bullseye
14:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bookworm
14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1358 to wikikube-worker1001
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1358 to wikikube-worker1001
14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:12 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
14:09 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
14:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
14:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
14:05 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
14:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
14:01 tgr@deploy1002: Finished scap: Backport for [[gerrit:1037896|[trwiki] Create translator group (T356440)]] (duration: 23m 15s)
13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
13:58 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008 - T366466
13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63937 and previous config saved to /var/cache/conftool/dbconfig/20240603-135634-ladsgroup.json
13:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63936 and previous config saved to /var/cache/conftool/dbconfig/20240603-135612-ladsgroup.json
13:54 vgutierrez: re-enable puppet on "A:cp-text_ulsfo" - T366466
13:50 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2001.codfw.wmnet with OS bookworm
13:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bookworm
13:49 vgutierrez: re-enable puppet on "A:cp-text and not A:cp-text_ulsfo" - T366466
13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1012.eqiad.wmnet with OS bullseye
13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1010.eqiad.wmnet with OS bullseye
13:44 tgr@deploy1002: gergesshamon and tgr: Continuing with sync
13:41 vgutierrez: disable puppet on A:cp-text before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1038294/ - T366466
13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63935 and previous config saved to /var/cache/conftool/dbconfig/20240603-134104-ladsgroup.json
13:40 tgr@deploy1002: gergesshamon and tgr: Backport for [[gerrit:1037896|[trwiki] Create translator group (T356440)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 tgr@deploy1002: Started scap: Backport for [[gerrit:1037896|[trwiki] Create translator group (T356440)]]
13:36 vgutierrez: depool text@ulsfo before enabling IPIP encapsulation - T366466
13:32 tgr@deploy1002: Finished scap: Backport for [[gerrit:1035726|[Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892)]], [[gerrit:1036313|[multiversion] Add 'manage-dblist init-labs' subcommand]], [[gerrit:1037887|[arwiki] add ipblock-exempt to bot group (T366404)]] (duration: 19m 07s)
13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63934 and previous config saved to /var/cache/conftool/dbconfig/20240603-132556-ladsgroup.json
13:23 tgr@deploy1002: sgimeno and gergesshamon and tgr: Continuing with sync
13:20 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS bookworm
13:16 tgr@deploy1002: sgimeno and gergesshamon and tgr: Backport for [[gerrit:1035726|[Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892)]], [[gerrit:1036313|[multiversion] Add 'manage-dblist init-labs' subcommand]], [[gerrit:1037887|[arwiki] add ipblock-exempt to bot group (T366404)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:13 tgr@deploy1002: Started scap: Backport for [[gerrit:1035726|[Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892)]], [[gerrit:1036313|[multiversion] Add 'manage-dblist init-labs' subcommand]], [[gerrit:1037887|[arwiki] add ipblock-exempt to bot group (T366404)]]
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63933 and previous config saved to /var/cache/conftool/dbconfig/20240603-131048-ladsgroup.json
13:08 moritzm: uploaded intel-microcode 3.20240312.1~deb11u1 to apt.wikimedia.org (import from bullseye-proposed-updates, to be coupled with forthcoming reboots)
13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
13:03 Emperor: depool moss-fe2001 with a view to returning it to apus T279621
13:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
13:02 Emperor: depool moss-fe1001 with a view to returning it to apus T279621
13:00 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
12:55 Emperor: depool/restart swift-proxy/repool ms-fe10{09,11,12,14} due to rising connection failures T360913
12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63932 and previous config saved to /var/cache/conftool/dbconfig/20240603-124628-marostegui.json
12:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
12:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63931 and previous config saved to /var/cache/conftool/dbconfig/20240603-124605-marostegui.json
12:45 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bookworm
12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1002.eqiad.wmnet with OS bookworm
12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63930 and previous config saved to /var/cache/conftool/dbconfig/20240603-123057-marostegui.json
12:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
12:20 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63929 and previous config saved to /var/cache/conftool/dbconfig/20240603-121549-marostegui.json
12:06 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1002.eqiad.wmnet with OS bookworm
12:03 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1037942 Enable numeric sorting for Persian (T329440) (duration: 12m 07s)
12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63928 and previous config saved to /var/cache/conftool/dbconfig/20240603-120041-marostegui.json
11:54 ladsgroup@deploy1002: ebrahim and ladsgroup: Continuing with sync
11:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:53 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:53 ladsgroup@deploy1002: ebrahim and ladsgroup: Backport for gerrit:1037942 Enable numeric sorting for Persian (T329440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:51 ladsgroup@deploy1002: Started scap: Backport for gerrit:1037942 Enable numeric sorting for Persian (T329440)
11:35 effie: restart memcached on mc1050 and mc2050
11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63927 and previous config saved to /var/cache/conftool/dbconfig/20240603-113447-ladsgroup.json
11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:27 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:26 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bookworm
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host snapshot1013.eqiad.wmnet
11:07 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
11:04 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63926 and previous config saved to /var/cache/conftool/dbconfig/20240603-105416-marostegui.json
10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host snapshot1013.eqiad.wmnet
10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63925 and previous config saved to /var/cache/conftool/dbconfig/20240603-105352-marostegui.json
10:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bookworm
10:41 moritzm: installing linux 5.10.218 security updates
10:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1038.eqiad.wmnet with OS bookworm
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63924 and previous config saved to /var/cache/conftool/dbconfig/20240603-103844-marostegui.json
10:29 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1013.eqiad.wmnet with OS bullseye
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63923 and previous config saved to /var/cache/conftool/dbconfig/20240603-102335-marostegui.json
10:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
10:18 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63922 and previous config saved to /var/cache/conftool/dbconfig/20240603-100827-marostegui.json
10:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bookworm
10:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
09:58 ladsgroup@deploy1002: Finished scap: Backport for gerrit:1038243 Stop writing to the old pagelinks columns in s8 (T352010) (duration: 18m 39s)
09:57 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
09:56 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
09:49 jiji@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host mc-gp2001.codfw.wmnet with OS bookworm
09:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
09:43 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1013.eqiad.wmnet with OS bullseye
09:42 ladsgroup@deploy1002: ladsgroup: Backport for gerrit:1038243 Stop writing to the old pagelinks columns in s8 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bookworm
09:40 ladsgroup@deploy1002: Started scap: Backport for gerrit:1038243 Stop writing to the old pagelinks columns in s8 (T352010)
09:31 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
09:29 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
09:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
09:22 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
09:10 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2001.codfw.wmnet with OS bookworm
09:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bookworm
09:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:08 jiji@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc1039.eqiad.wmnet']
08:49 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2002.codfw.wmnet with OS bookworm
08:45 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1003.eqiad.wmnet with OS bookworm
08:15 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887 (duration: 00m 05s)
08:15 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887
08:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bookworm
08:09 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2002.codfw.wmnet with OS bookworm
08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887 (duration: 00m 05s)
08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887
08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887 (duration: 00m 08s)
08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887
08:04 jiji@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc1039.eqiad.wmnet']
07:32 kartik@deploy1002: Finished scap: Backport for gerrit:1037949 testwiki: Fix language for nan in Section Translation (duration: 28m 37s)
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63920 and previous config saved to /var/cache/conftool/dbconfig/20240603-072513-marostegui.json
07:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
07:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63919 and previous config saved to /var/cache/conftool/dbconfig/20240603-072450-marostegui.json
07:22 kartik@deploy1002: kartik: Continuing with sync
07:18 kartik@deploy1002: kartik: Backport for gerrit:1037949 testwiki: Fix language for nan in Section Translation synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63918 and previous config saved to /var/cache/conftool/dbconfig/20240603-070942-marostegui.json
07:04 kartik@deploy1002: Started scap: Backport for gerrit:1037949 testwiki: Fix language for nan in Section Translation
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63917 and previous config saved to /var/cache/conftool/dbconfig/20240603-065434-marostegui.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63916 and previous config saved to /var/cache/conftool/dbconfig/20240603-063925-marostegui.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63915 and previous config saved to /var/cache/conftool/dbconfig/20240603-063814-marostegui.json
06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63914 and previous config saved to /var/cache/conftool/dbconfig/20240603-063735-marostegui.json
06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63913 and previous config saved to /var/cache/conftool/dbconfig/20240603-062227-marostegui.json
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63912 and previous config saved to /var/cache/conftool/dbconfig/20240603-061956-root.json
06:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63911 and previous config saved to /var/cache/conftool/dbconfig/20240603-060719-marostegui.json
06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63910 and previous config saved to /var/cache/conftool/dbconfig/20240603-060450-root.json
05:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63909 and previous config saved to /var/cache/conftool/dbconfig/20240603-055210-marostegui.json
05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63908 and previous config saved to /var/cache/conftool/dbconfig/20240603-054944-root.json
05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63907 and previous config saved to /var/cache/conftool/dbconfig/20240603-053438-root.json
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63906 and previous config saved to /var/cache/conftool/dbconfig/20240603-051932-root.json
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63905 and previous config saved to /var/cache/conftool/dbconfig/20240603-050424-root.json
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63904 and previous config saved to /var/cache/conftool/dbconfig/20240603-044918-root.json
04:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63903 and previous config saved to /var/cache/conftool/dbconfig/20240603-011839-marostegui.json
01:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
01:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63902 and previous config saved to /var/cache/conftool/dbconfig/20240603-011813-marostegui.json
01:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
01:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63901 and previous config saved to /var/cache/conftool/dbconfig/20240603-010925-ladsgroup.json
01:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63900 and previous config saved to /var/cache/conftool/dbconfig/20240603-010305-marostegui.json
01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63899 and previous config saved to /var/cache/conftool/dbconfig/20240603-005415-ladsgroup.json
00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63898 and previous config saved to /var/cache/conftool/dbconfig/20240603-004757-marostegui.json
00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63897 and previous config saved to /var/cache/conftool/dbconfig/20240603-003907-ladsgroup.json
00:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63896 and previous config saved to /var/cache/conftool/dbconfig/20240603-003247-marostegui.json
00:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63895 and previous config saved to /var/cache/conftool/dbconfig/20240603-002359-ladsgroup.json
00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-02

23:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63894 and previous config saved to /var/cache/conftool/dbconfig/20240602-232847-marostegui.json
23:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
23:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
23:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
20:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:47 taavi@cumin1002: dbctl commit (dc=all): 'depool db1213', diff saved to https://phabricator.wikimedia.org/P63893 and previous config saved to /var/cache/conftool/dbconfig/20240602-204719-taavi.json
20:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63892 and previous config saved to /var/cache/conftool/dbconfig/20240602-200046-marostegui.json
20:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63891 and previous config saved to /var/cache/conftool/dbconfig/20240602-200021-marostegui.json
20:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63890 and previous config saved to /var/cache/conftool/dbconfig/20240602-194514-marostegui.json
19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63889 and previous config saved to /var/cache/conftool/dbconfig/20240602-193006-marostegui.json
19:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63888 and previous config saved to /var/cache/conftool/dbconfig/20240602-191458-marostegui.json
19:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63887 and previous config saved to /var/cache/conftool/dbconfig/20240602-185215-ladsgroup.json
18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
18:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63886 and previous config saved to /var/cache/conftool/dbconfig/20240602-185151-ladsgroup.json
18:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63885 and previous config saved to /var/cache/conftool/dbconfig/20240602-183643-ladsgroup.json
18:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63884 and previous config saved to /var/cache/conftool/dbconfig/20240602-182135-ladsgroup.json
18:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63883 and previous config saved to /var/cache/conftool/dbconfig/20240602-180627-ladsgroup.json
18:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63882 and previous config saved to /var/cache/conftool/dbconfig/20240602-144924-marostegui.json
14:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
14:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63881 and previous config saved to /var/cache/conftool/dbconfig/20240602-144900-marostegui.json
14:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63880 and previous config saved to /var/cache/conftool/dbconfig/20240602-143352-marostegui.json
14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63879 and previous config saved to /var/cache/conftool/dbconfig/20240602-141843-marostegui.json
14:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63878 and previous config saved to /var/cache/conftool/dbconfig/20240602-141139-root.json
14:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63877 and previous config saved to /var/cache/conftool/dbconfig/20240602-140334-marostegui.json
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63876 and previous config saved to /var/cache/conftool/dbconfig/20240602-135632-root.json
13:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63875 and previous config saved to /var/cache/conftool/dbconfig/20240602-134126-root.json
13:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63874 and previous config saved to /var/cache/conftool/dbconfig/20240602-132620-root.json
13:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63873 and previous config saved to /var/cache/conftool/dbconfig/20240602-131114-root.json
13:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63872 and previous config saved to /var/cache/conftool/dbconfig/20240602-125608-root.json
12:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63871 and previous config saved to /var/cache/conftool/dbconfig/20240602-124102-root.json
12:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63870 and previous config saved to /var/cache/conftool/dbconfig/20240602-120033-ladsgroup.json
12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63869 and previous config saved to /var/cache/conftool/dbconfig/20240602-120010-ladsgroup.json
11:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63868 and previous config saved to /var/cache/conftool/dbconfig/20240602-114503-ladsgroup.json
11:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63867 and previous config saved to /var/cache/conftool/dbconfig/20240602-112955-ladsgroup.json
11:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63866 and previous config saved to /var/cache/conftool/dbconfig/20240602-112512-marostegui.json
11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63865 and previous config saved to /var/cache/conftool/dbconfig/20240602-111447-ladsgroup.json
11:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63864 and previous config saved to /var/cache/conftool/dbconfig/20240602-111004-marostegui.json
10:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63863 and previous config saved to /var/cache/conftool/dbconfig/20240602-105456-marostegui.json
10:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63862 and previous config saved to /var/cache/conftool/dbconfig/20240602-103948-marostegui.json
10:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63861 and previous config saved to /var/cache/conftool/dbconfig/20240602-091021-marostegui.json
09:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63860 and previous config saved to /var/cache/conftool/dbconfig/20240602-090941-marostegui.json
09:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63859 and previous config saved to /var/cache/conftool/dbconfig/20240602-085433-marostegui.json
08:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63858 and previous config saved to /var/cache/conftool/dbconfig/20240602-083925-marostegui.json
08:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P63856 and previous config saved to /var/cache/conftool/dbconfig/20240602-072956-root.json
07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63855 and previous config saved to /var/cache/conftool/dbconfig/20240602-033618-marostegui.json
03:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
03:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63854 and previous config saved to /var/cache/conftool/dbconfig/20240602-033555-marostegui.json
03:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63853 and previous config saved to /var/cache/conftool/dbconfig/20240602-032047-marostegui.json
03:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63852 and previous config saved to /var/cache/conftool/dbconfig/20240602-030539-marostegui.json
03:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63851 and previous config saved to /var/cache/conftool/dbconfig/20240602-025039-ladsgroup.json
02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63850 and previous config saved to /var/cache/conftool/dbconfig/20240602-025031-marostegui.json
02:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
02:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63849 and previous config saved to /var/cache/conftool/dbconfig/20240602-025015-ladsgroup.json
02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63848 and previous config saved to /var/cache/conftool/dbconfig/20240602-023507-ladsgroup.json
02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63847 and previous config saved to /var/cache/conftool/dbconfig/20240602-022710-marostegui.json
02:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
02:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63846 and previous config saved to /var/cache/conftool/dbconfig/20240602-022646-marostegui.json
02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63845 and previous config saved to /var/cache/conftool/dbconfig/20240602-021959-ladsgroup.json
02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63844 and previous config saved to /var/cache/conftool/dbconfig/20240602-021137-marostegui.json
02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63843 and previous config saved to /var/cache/conftool/dbconfig/20240602-020451-ladsgroup.json
02:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63842 and previous config saved to /var/cache/conftool/dbconfig/20240602-015629-marostegui.json
01:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63841 and previous config saved to /var/cache/conftool/dbconfig/20240602-014121-marostegui.json
01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-01

23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63839 and previous config saved to /var/cache/conftool/dbconfig/20240601-215534-marostegui.json
21:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
21:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
21:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63838 and previous config saved to /var/cache/conftool/dbconfig/20240601-201053-ladsgroup.json
20:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
20:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63837 and previous config saved to /var/cache/conftool/dbconfig/20240601-201029-ladsgroup.json
19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63836 and previous config saved to /var/cache/conftool/dbconfig/20240601-195521-ladsgroup.json
19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63835 and previous config saved to /var/cache/conftool/dbconfig/20240601-194013-ladsgroup.json
19:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63834 and previous config saved to /var/cache/conftool/dbconfig/20240601-192505-ladsgroup.json
19:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63833 and previous config saved to /var/cache/conftool/dbconfig/20240601-174133-marostegui.json
17:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63832 and previous config saved to /var/cache/conftool/dbconfig/20240601-172625-marostegui.json
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63831 and previous config saved to /var/cache/conftool/dbconfig/20240601-172455-marostegui.json
17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63830 and previous config saved to /var/cache/conftool/dbconfig/20240601-172432-marostegui.json
17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63829 and previous config saved to /var/cache/conftool/dbconfig/20240601-171116-marostegui.json
17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63828 and previous config saved to /var/cache/conftool/dbconfig/20240601-170924-marostegui.json
17:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63827 and previous config saved to /var/cache/conftool/dbconfig/20240601-165609-marostegui.json
16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63826 and previous config saved to /var/cache/conftool/dbconfig/20240601-165416-marostegui.json
16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63825 and previous config saved to /var/cache/conftool/dbconfig/20240601-163907-marostegui.json
16:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1010.eqiad.wmnet with OS bullseye
13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
13:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63824 and previous config saved to /var/cache/conftool/dbconfig/20240601-125216-marostegui.json
12:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63823 and previous config saved to /var/cache/conftool/dbconfig/20240601-125152-marostegui.json
12:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63822 and previous config saved to /var/cache/conftool/dbconfig/20240601-123644-marostegui.json
12:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63821 and previous config saved to /var/cache/conftool/dbconfig/20240601-122136-marostegui.json
12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63820 and previous config saved to /var/cache/conftool/dbconfig/20240601-120628-marostegui.json
12:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
11:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63819 and previous config saved to /var/cache/conftool/dbconfig/20240601-095545-ladsgroup.json
09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:36 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
07:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:20 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
07:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63818 and previous config saved to /var/cache/conftool/dbconfig/20240601-071723-marostegui.json
07:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
07:17 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
07:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63817 and previous config saved to /var/cache/conftool/dbconfig/20240601-071700-marostegui.json
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63816 and previous config saved to /var/cache/conftool/dbconfig/20240601-070211-marostegui.json
07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63815 and previous config saved to /var/cache/conftool/dbconfig/20240601-070151-marostegui.json
07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
06:59 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
06:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63814 and previous config saved to /var/cache/conftool/dbconfig/20240601-064643-marostegui.json
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63813 and previous config saved to /var/cache/conftool/dbconfig/20240601-063135-marostegui.json
06:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
04:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
04:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:44 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:44 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:39 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:34 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:19 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:17 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:50 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:50 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63812 and previous config saved to /var/cache/conftool/dbconfig/20240601-021256-marostegui.json
02:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
02:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63811 and previous config saved to /var/cache/conftool/dbconfig/20240601-021233-marostegui.json
02:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63810 and previous config saved to /var/cache/conftool/dbconfig/20240601-015725-marostegui.json
01:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:51 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63809 and previous config saved to /var/cache/conftool/dbconfig/20240601-014216-marostegui.json
01:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:32 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:32 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63808 and previous config saved to /var/cache/conftool/dbconfig/20240601-012708-marostegui.json
01:26 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:26 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:22 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:22 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:20 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:20 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63807 and previous config saved to /var/cache/conftool/dbconfig/20240601-010959-ladsgroup.json
01:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63806 and previous config saved to /var/cache/conftool/dbconfig/20240601-005451-ladsgroup.json
00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63805 and previous config saved to /var/cache/conftool/dbconfig/20240601-003943-ladsgroup.json
00:38 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:38 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63804 and previous config saved to /var/cache/conftool/dbconfig/20240601-002435-ladsgroup.json
00:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:11 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:11 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:09 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:09 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
