# Rapid Updates

Rapid Updates delivers SUSE RPM security updates to Cray Systems Management (CSM)
Non-Compute Node (NCN) images. These updates are intended to be applied out of band
from CSM's official releases and are supported on CSM installations starting with
1.5.

- [Overview](#overview)
- [Preparation](#preparation)
- [Installation](#installation)
    - [Collect system information](#collect-system-information)
    - [Configure rapid updates to build new NCN images](#configure-rapid-updates-to-build-new-ncn-images)
    - [Configure rapid updates to update running NCNs](#configure-rapid-updates-to-update-running-ncns)
    - [Execute the rapid updates playbook](#execute-the-rapid-updates-playbook)
- [Boot new NCN images (if new images were built)](#boot-new-ncn-images-if-new-images-were-built)
    - [Rebuild nodes](#rebuild-nodes)
        - [Rebuild storage nodes](#rebuild-storage-nodes)
        - [Rebuild worker nodes](#rebuild-worker-nodes)
        - [Rebuild master nodes](#rebuild-master-nodes)

## Overview

Rapid Updates performs the following core functions:

- Creates an RPM repository in Nexus
- Creates a CFS Ansible repository in Gitea named "csm-security"

Optionally, the Rapid Updates tooling will also:

- Create backups of existing CFS configurations to the current local directory
- Create new CFS configurations that incorporate the new "csm-security" layer
- Execute CFS jobs to build new images that include the security updates
- Execute CFS jobs to update running NCNs with the security updates

## Preparation

1. (`workstation#`) Set the target server.

   ```bash
   server=<ncn-m001 target system name or IP address>
   ```

1. (`workstation#`) Check that the `ceph-rbd` mount for CSM is present.

   Ensure that `/etc/cray/upgrade/csm/` is mounted using the command below
   (the output should indicate `ceph` as the type):

   ```bash
   ssh root@$server 'mount | grep /etc/cray/upgrade/csm'
   ```

1. (`workstation#`) Create the `csm-security` upgrade directory.

   ```bash
   ssh root@$server 'mkdir -p /etc/cray/upgrade/csm/csm-security'
   ```

1. (`workstation#`) Upload the tarball to `ncn-m001` on the target system.

   ```bash
   scp security-update-00000003-1.tar.gz \
     root@$server:/etc/cray/upgrade/csm/csm-security/
   ```

1. (`workstation#`) SSH into `ncn-m001`.

   ```bash
   ssh root@$server
   ```

1. (`ncn-m001#`) Unpack the tarball.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/
   ```

   ```bash
   tar -xvzf security-update-00000003-1.tar.gz
   ```

   ```bash
   cd SUSE-15.6-security-00000003-1/
   ```

   Unless noted otherwise, run subsequent steps from this directory.

1. (`ncn-m001#`) Review the patch-manifest file.

   The `patch-manifest` file contains a list of SUSE patch names included in this
   security update.

   ```bash
   more patch-manifest
   ```

1. (`ncn-m001#`) Review the RPMs that will be installed by this security update.

   RPMs are listed under the "rpms" key for each node type.

   ```bash
   more cfs/vars/cfs-vars.yml
   ```

   The `cfs/vars/cfs-vars.yml` YAML file controls which packages will be targeted
   by this rapid update.

   ***Important notes:***

   - The `csm-security` CFS layer excludes Kernel packages during node personalization;
     **Kernel packages are only installed during image customization**.
   - Image customization will include these packages and all previously installed
     security updates for the matching SUSE version (15.6).
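
The mount check in the preparation steps can also be scripted so that a missing or
unexpected mount is reported explicitly. The following is a minimal sketch: the
function name `check_mount_type` is hypothetical, and it assumes the usual Linux
`mount` output format (`<source> on <mount point> type <fstype> (<options>)`).

```bash
# Hypothetical helper: reads `mount` output on stdin and succeeds only if
# the given mount point is listed with the expected filesystem type.
check_mount_type() {
  local mount_point="${1%/}"   # tolerate a trailing slash
  local expected_type="$2"
  awk -v mp="$mount_point" -v fs="$expected_type" '
    $2 == "on" && $3 == mp && $4 == "type" && $5 == fs { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Example, run from the workstation with $server set as in the first step:
#   ssh root@$server mount | check_mount_type /etc/cray/upgrade/csm ceph \
#     && echo "mount OK" || echo "mount missing or not of type ceph"
```

Matching on whole fields avoids false positives from other mounts whose paths merely
start with `/etc/cray/upgrade/csm`.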

## Installation

At a minimum, the Rapid Updates playbook will perform the following:

- Create and populate a new `csm-security` repository in Nexus with the packages included in the update
- Push CFS Ansible content to the `csm-security` Git repository

Rapid Updates supports the following workflows:

- Upload-only: perform the uploads and repository/config pushes now, and build
  images later. No configuration is required. Proceed to
  [Execute the rapid updates playbook](#execute-the-rapid-updates-playbook).
- Build new NCN images: collect system info, configure image customization, and
  run the playbook to build new images.
- Update running NCNs: collect system info, configure node personalization
  and run the playbook to update selected NCNs.

If you are building new NCN images or updating running NCNs, proceed to
[Collect system information](#collect-system-information).

### Collect system information

1. (`ncn-m001#`) Collect system information.

   This step creates a file named `system-info.json` in the current directory.
   The file contains information required for the rapid update process, such as the installed
   CSM versions, IMS image IDs, CFS configuration names, hostnames, and xnames.
   The information in `system-info.json` is specific to your system.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   scripts/collect-system-info.sh
   ```

1. (`ncn-m001#`) Review `system-info.json`.

   This step will aid in determining which IMS images and which CFS
   configurations will be used to build new NCN images and which running
   nodes will be updated.

   `system-info.json` can be viewed by executing the following command:

   ```bash
   more system-info.json
   ```

   The following example will differ from your `system-info.json` file, which is
   site-specific and contains the IMS image IDs, hostnames, xnames, and CFS configurations
   **currently in use** by the system's NCNs. The file also contains the installed CSM
   versions and the original image IDs for the "secure-kubernetes" and "secure-storage-ceph" images.

   > ***NOTE*** If this CSM security update contains Kernel packages, the original CSM image IDs
   > (found in the product catalog) must be used. Failure to do so may cause Kernel module
   > failures in additional products (e.g., DVS).

   CSM security updates that do not include a Kernel may use the image IDs in the **ims_image_id** key
   as the base images.

   The CFS configurations named in the **current_cfs_config** key will be cloned and modified to include the security updates.

   `system-info.json` abbreviated example:

   ```json
   {
     "management_nodes": [
       {
         "shasta_role": "ncn-master",
         "nodes": [
           {
             "ims_image_id": "3e0ba0a2-2884-494a-bd9f-60b493fe9ed6",
             "hostname": "ncn-m001",
             "xname": "x3000c0s1b0n0",
             "current_cfs_config": "management-release-25.1-1929745"
           }
         ]
       },
       {
         "shasta_role": "ncn-storage",
         "nodes": [
           {
             "ims_image_id": "c5e412e1-1803-4c7c-94b5-daaae3c464fa",
             "hostname": "ncn-s001",
             "xname": "x3000c0s7b0n0",
             "current_cfs_config": "storage-release-25.1-1929745"
           }
         ]
       },
       {
         "shasta_role": "ncn-worker",
         "nodes": [
           {
             "ims_image_id": "2c2e0235-36e8-4e2b-a7f8-5ba7622c1fbb",
             "hostname": "ncn-w001",
             "xname": "x3000c0s13b0n0",
             "current_cfs_config": "management-release-25.1-1929745"
           }
         ]
       }
     ],
     "csm_versions": {
       "1.6.0": {
         "images": {
           "secure-kubernetes-6.2.30-x86_64.squashfs": {
             "id": "ddfb32a0-b93e-4429-9558-935df61970ef"
           },
           "secure-storage-ceph-6.2.30-x86_64.squashfs": {
             "id": "7b4911ef-e637-4e18-8181-ffe029095ed7"
           }
         }
       }
     }
   }
   ```

   The next step depends on the desired path for the CSM security updates:

   - To build new images that include the security updates, proceed to [Configure rapid updates to build new NCN images](#configure-rapid-updates-to-build-new-ncn-images)
   - To update running nodes with the security updates, proceed to [Configure rapid updates to update running NCNs](#configure-rapid-updates-to-update-running-ncns)
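
To compare nodes side by side, the per-node fields in `system-info.json` can be flattened
into one row per node. This is a sketch that assumes the JSON layout of the abbreviated
example above and that `jq` is available on the node (it is used elsewhere in this
procedure); `list_ncns` is a hypothetical name.

```bash
# Hypothetical helper: print one tab-separated row per NCN with the columns
# role, hostname, xname, current CFS configuration, IMS image ID.
list_ncns() {
  local info_file="${1:-system-info.json}"
  jq -r '
    .management_nodes[]
    | .shasta_role as $role
    | .nodes[]
    | [$role, .hostname, .xname, .current_cfs_config, .ims_image_id]
    | @tsv
  ' "$info_file"
}

# Example: list_ncns system-info.json | column -t
```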

### Configure rapid updates to build new NCN images

The following steps will guide you through the process of configuring rapid updates to add CFS
configurations and build new NCN images. You may build all images at once or you may build
new images for each NCN type (master, worker, storage) separately. The
`pb_security_install.yml` playbook can be run multiple times.

1. (`ncn-m001#`) Edit the **vars/cfs-session-vars.yml** configuration file.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   vim vars/cfs-session-vars.yml
   ```

   Enable image customization for the images you would like to build.

   ```yaml
   cfs:
     update_cfs_configurations: true
     image_customization_master:
       run: true
     image_customization_worker:
       run: true
     image_customization_storage:
       run: true
   ```

   The variables in the following example are system-specific. Refer to `system-info.json` to determine the
   appropriate values.

   The CFS configuration names and IMS image IDs listed below are examples; please consult the system administrator for
   assistance in determining the appropriate values for the targeted nodes.

   Example configuration:

   ```yaml
   cfs:
     update_cfs_configurations: true
     ansible_passthrough: ""
     ansible_verbosity: 2
     node_personalization:
       run: false
       limit:
         - ""
     image_customization_master:
       run: true
       cfs_configurations:
         - "management-main-2113495"
       target:
         definition: "image"
         groups:
           - name: "Management_Master"
             members:
               - "3ea3bf16-0e70-49c7-af62-b827876cc6e6"
     image_customization_worker:
       run: true
       cfs_configurations:
         - "management-main-2113495"
       target:
         definition: "image"
         groups:
           - name: "Management_Worker"
             members:
               - "8c3047bf-51fd-4c51-aef8-178536c89563"
     image_customization_storage:
       run: true
       cfs_configurations:
         - "storage-main-2113495"
       target:
         definition: "image"
         groups:
           - name: "Management_Storage"
             members:
               - "2485079f-8a2e-410d-8dc1-51efe21ba6b8"
   ```

   To continue with the image building process, proceed to [Execute the rapid updates playbook](#execute-the-rapid-updates-playbook).
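
The IMS image IDs in the examples above are UUIDs, so a quick format check can catch
copy-and-paste mistakes before the values go into `vars/cfs-session-vars.yml`. The sketch
below is an illustration only: `is_uuid` and `check_image_ids` are hypothetical names, and
the check assumes lowercase hexadecimal IDs as shown in this document.

```bash
# Hypothetical helper: succeeds if the argument looks like a lowercase UUID.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Report every argument that is not a well-formed image ID.
# Returns non-zero if at least one argument fails the check.
check_image_ids() {
  local id bad=0
  for id in "$@"; do
    if ! is_uuid "$id"; then
      echo "not a valid image ID: ${id}" >&2
      bad=1
    fi
  done
  return "$bad"
}

# Example:
#   check_image_ids "3ea3bf16-0e70-49c7-af62-b827876cc6e6" \
#                   "8c3047bf-51fd-4c51-aef8-178536c89563"
```

A format check does not prove that an image exists in IMS; the IDs still have to be
confirmed against `system-info.json` or the product catalog.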

### Configure rapid updates to update running NCNs

This section includes instructions for updating running NCNs without rebooting.

> ***NOTE*** If this security update includes Kernel packages, please refer to
> [Configure rapid updates to build new NCN images](#configure-rapid-updates-to-build-new-ncn-images).

1. (`ncn-m001#`) Edit the **vars/cfs-session-vars.yml** configuration file.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   vim vars/cfs-session-vars.yml
   ```

   One or more xnames can be specified in the `node_personalization.limit` list.
   The xnames must match one or more xnames from `system-info.json`.

   Example config for updating running nodes:

   ```yaml
   cfs:
     update_cfs_configurations: true
     node_personalization:
       run: true
       limit:
         - "x3000c0s4b0n0"
         - "x3000c0s7b0n0"
         - "x3000c0s1b0n0"
   ```

   To continue updating running nodes, proceed to [Execute the rapid updates playbook](#execute-the-rapid-updates-playbook).
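
Because every xname in `node_personalization.limit` must appear in `system-info.json`, the
list can be checked before the playbook is run. The following sketch assumes the JSON
layout shown in [Collect system information](#collect-system-information) and that `jq` is
available; `xname_in_system_info` is a hypothetical name.

```bash
# Hypothetical helper: succeeds if the xname appears in system-info.json.
xname_in_system_info() {
  local xname="$1" info_file="${2:-system-info.json}"
  # jq -e sets the exit status from the result: 0 for true, 1 for false.
  jq -e --arg x "$xname" \
    'any(.management_nodes[].nodes[]; .xname == $x)' "$info_file" >/dev/null
}

# Example: check each xname intended for node_personalization.limit.
#   for x in x3000c0s4b0n0 x3000c0s7b0n0 x3000c0s1b0n0; do
#     xname_in_system_info "$x" || echo "unknown xname: $x"
#   done
```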

### Execute the rapid updates playbook

The behavior of the `pb_security_install.yml` playbook is determined by the configuration
performed in the previous steps.

1. (`ncn-m001#`) Execute the rapid updates playbook.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   ./pb_security_install.yml
   ```

1. (`ncn-m001#`) Review the playbook output.

   - If `vars/cfs-session-vars.yml` was not configured to build new images or
     update running nodes, none of the following output will be present and none of the
     following steps are relevant. The RPMs delivered by this security update have been
     pushed to Nexus and the CFS Ansible content has been pushed to the `csm-security`
     Git repository.

   - If `vars/cfs-session-vars.yml` was configured to build new images or update
     running nodes, the playbook will print the name of each CFS session it created when
     it completes.

   Example playbook output:

   ```text
   TASK [Show current env file]
   ok: [localhost] =>
     msg: |-
       env/csm-security.env:

       MASTER_SESSION_NAME=security-master-fw6vjb
       WORKER_SESSION_NAME=security-worker-jj6po2
       STORAGE_SESSION_NAME=security-storage-jhb76g
       MASTER_CFS_CONFIG=management-release-25.1-1929745-2
       WORKER_CFS_CONFIG=management-release-25.1-1929745-2
       STORAGE_CFS_CONFIG=storage-release-25.1-1929745-2
       NODE_PERSONALIZATION_SESSION_NAME=csm-security-s9wh4z
       MASTER_IMAGE_ID=17b92b82-b00b-4038-baff-f881055fb737
       STORAGE_IMAGE_ID=03d25dcc-90ee-49f2-b8c9-455546090bec
       WORKER_IMAGE_ID=60a1f955-a79a-45cf-a19d-1bef7150b53d

   TASK [Show created CFS sessions]
   ok: [localhost] =>
     msg: security-storage-jhb76g
   ```

   **Note:** If `vars/cfs-session-vars.yml` was configured to build new images,
   the resulting image names in IMS will have the CFS session name as a suffix.

1. (`ncn-m001#`) Load session variables.

   The playbook writes the CFS configuration names and CFS session names to `env/csm-security.env`.
   Source this file to load the variables into your shell.

   ```bash
   source env/csm-security.env
   ```

1. (`ncn-m001#`) Check the status of the CFS sessions created by the playbook.

   The CFS sessions will take some time to complete. The playbook will not wait
   for the CFS sessions to complete before exiting. You can check the status of
   the CFS sessions by executing the following commands.

   ```bash
   cray cfs sessions describe "${MASTER_SESSION_NAME}" --format json | jq -r '.status.session'
   cray cfs sessions describe "${WORKER_SESSION_NAME}" --format json | jq -r '.status.session'
   cray cfs sessions describe "${STORAGE_SESSION_NAME}" --format json | jq -r '.status.session'
   ```

   Proceed when the `status` is `complete` and `succeeded` is `true`.

   **You must wait for the CFS sessions to complete successfully before proceeding to the next step.**

1. (`ncn-m001#`) Get the new IMS image IDs and set required variables.

   After the CFS sessions complete, the new IMS image IDs will be available in the CFS session information.
   Set the following image IDs for the images you built and append them to `env/csm-security.env` for later use.

   Master image ID:

   ```bash
   MASTER_IMAGE_ID=$(cray cfs sessions describe "${MASTER_SESSION_NAME}" --format json | jq -r '.status.artifacts[0].result_id')
   echo "MASTER_IMAGE_ID=${MASTER_IMAGE_ID}" >> env/csm-security.env
   ```

   Worker image ID:

   ```bash
   WORKER_IMAGE_ID=$(cray cfs sessions describe "${WORKER_SESSION_NAME}" --format json | jq -r '.status.artifacts[0].result_id')
   echo "WORKER_IMAGE_ID=${WORKER_IMAGE_ID}" >> env/csm-security.env
   ```

   Storage image ID:

   ```bash
   STORAGE_IMAGE_ID=$(cray cfs sessions describe "${STORAGE_SESSION_NAME}" --format json | jq -r '.status.artifacts[0].result_id')
   echo "STORAGE_IMAGE_ID=${STORAGE_IMAGE_ID}" >> env/csm-security.env
   ```
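
Because the playbook does not wait for the CFS sessions, the status check above can be
wrapped in a polling loop. This is a sketch under stated assumptions: the function names
are hypothetical, the field paths `.status.session.status` and `.status.session.succeeded`
are inferred from the status step above, and the 30-second interval and 120-poll limit
are arbitrary defaults.

```bash
# Hypothetical wrappers around the status command used in the steps above.
get_session_status() {
  cray cfs sessions describe "$1" --format json | jq -r '.status.session.status'
}
get_session_succeeded() {
  cray cfs sessions describe "$1" --format json | jq -r '.status.session.succeeded'
}

# Poll until the session reports "complete"; return 0 only if it succeeded.
# Arguments: session name, seconds between polls, maximum number of polls.
wait_for_session() {
  local name="$1" interval="${2:-30}" max_polls="${3:-120}" i=0
  while [ "$i" -lt "$max_polls" ]; do
    if [ "$(get_session_status "$name")" = "complete" ]; then
      if [ "$(get_session_succeeded "$name")" = "true" ]; then
        return 0
      fi
      echo "CFS session ${name} completed but did not succeed" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for CFS session ${name}" >&2
  return 1
}

# Example:
#   wait_for_session "${STORAGE_SESSION_NAME}" && echo "storage session succeeded"
```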

## Boot new NCN images (if new images were built)

If the rapid update process was configured to build new images, the new NCN images can now be booted.

**Note:** Rebuilding NCNs will reboot them and may impact
workloads. Plan a maintenance window accordingly.

### Rebuild nodes

The following rebuild process should be performed to boot into the new images built in the steps above.

#### Rebuild storage nodes

If you wish to monitor the boot output of each node you're upgrading, perform
the steps outlined in
**/usr/share/doc/csm/operations/conman/Establish_a_Serial_Connection_to_NCNs.md**
in a separate terminal before each node you upgrade.

The following steps should be performed on every storage node, **one at a time**.

1. (`ncn-m001#`) Source the environment file.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   source env/csm-security.env
   ```

1. (`ncn-m001#`) Set the storage node hostname for the node that will be booted into a new image.

   ```bash
   TARGET_NCN=ncn-s00X
   ```

1. (`ncn-m001#`) Obtain an API token for use in subsequent steps.

   ```bash
   export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
   -d client_id=admin-client \
   -d client_secret=$(kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d) \
   https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
   ```

1. (`ncn-m001#`) Look up the xname of the target node.

   ```bash
   TARGET_XNAME=$(curl -s -k -H "Authorization: Bearer ${TOKEN}" "https://api-gw-service-nmn.local/apis/sls/v1/search/hardware?extra_properties.Role=Management" | \
        jq -r ".[] | select(.ExtraProperties.Aliases[] | contains(\"$TARGET_NCN\")) | .Xname")
   ```

1. (`ncn-m001#`) Set the `rd.live.dir` boot parameter to the new storage image ID.

   ```bash
   csi handoff bss-update-param --set rd.live.dir="${STORAGE_IMAGE_ID}" --limit "${TARGET_XNAME}"
   ```

1. (`ncn-m001#`) Upgrade the storage node to the new image.

   This step will reboot the storage node into the new image.

   ```bash
   /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh "${TARGET_NCN}" \
   --upgrade \
   --image-id "${STORAGE_IMAGE_ID}" \
   --desired-cfs-conf "${STORAGE_CFS_CONFIG}"
   ```

1. (`ncn-m001#`) Once the upgrade has succeeded, remove old images on the storage node.

   ```bash
   ssh "${TARGET_NCN}" '/srv/cray/scripts/metal/cleanup-live-images.sh -y -o'
   ```

1. (`ncn-m001#`) Validate the health of the various subsystems.

   ```bash
   /opt/cray/tests/install/ncn/automated/ncn-healthcheck-storage
   ```

1. (`ncn-m001#`) Repeat the above steps for each storage node.

   [Rebuild storage nodes](#rebuild-storage-nodes)
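
The storage steps above rely on several shell variables being set in the same session. If
one of them is empty, for example because the environment file was not sourced, the
commands receive an empty argument. A small guard such as the following sketch can be run
first; `require_vars` is a hypothetical name and not part of the CSM tooling.

```bash
# Hypothetical guard: succeeds only if every named variable is set and non-empty.
# Pass variable names, not values.
require_vars() {
  local _name _value _missing=0
  for _name in "$@"; do
    # Indirect lookup of the variable named by $_name (POSIX-compatible).
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "required variable ${_name} is not set" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Example, before setting rd.live.dir and running the upgrade script:
#   require_vars TARGET_NCN TARGET_XNAME STORAGE_IMAGE_ID STORAGE_CFS_CONFIG \
#     && echo "all variables set"
```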

#### Rebuild worker nodes

If you wish to monitor the boot output of each node you're upgrading, perform
the steps outlined in
**/usr/share/doc/csm/operations/conman/Establish_a_Serial_Connection_to_NCNs.md**
in a separate terminal before each node you upgrade.

The following steps should be performed on every worker node **one at a time**.

1. (`ncn-m001#`) Source the environment file.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   source env/csm-security.env
   ```

1. (`ncn-m001#`) Set the worker node hostname for the node that will be booted into a new image.

   ```bash
   TARGET_NCN=ncn-w00X
   ```

1. (`ncn-m001#`) Obtain an API token for use in subsequent steps.

   ```bash
   export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
   -d client_id=admin-client \
   -d client_secret=$(kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d) \
   https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
   ```

1. (`ncn-m001#`) Look up the xname of the target node.

   ```bash
   TARGET_XNAME=$(curl -s -k -H "Authorization: Bearer ${TOKEN}" "https://api-gw-service-nmn.local/apis/sls/v1/search/hardware?extra_properties.Role=Management" | \
        jq -r ".[] | select(.ExtraProperties.Aliases[] | contains(\"$TARGET_NCN\")) | .Xname")
   ```

1. (`ncn-m001#`) Upgrade the worker node to the new image.

   This step will reboot the worker node into the new image.

   ```bash
   /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh "${TARGET_NCN}" \
   --upgrade \
   --image-id "${WORKER_IMAGE_ID}" \
   --desired-cfs-conf "${WORKER_CFS_CONFIG}"
   ```

1. (`ncn-m001#`) Validate the health of the various subsystems.

   ```bash
   /opt/cray/tests/install/ncn/automated/ncn-healthcheck-worker
   ```

1. (`ncn-m001#`) Repeat the above steps for each worker node.

   [Rebuild worker nodes](#rebuild-worker-nodes)
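
The SLS lookup above selects aliases with the `jq` `contains` filter, which is a substring
match on strings, so it is worth confirming that `TARGET_XNAME` holds exactly one xname
before continuing. The sketch below assumes node xnames have the form shown in the
examples in this document (such as `x3000c0s13b0n0`); `is_single_xname` is a hypothetical
name.

```bash
# Hypothetical check: succeeds only if the argument is exactly one line and
# that line has the node xname form x<N>c<N>s<N>b<N>n<N>.
is_single_xname() {
  local lines
  lines=$(printf '%s\n' "$1" | wc -l | tr -d ' ')
  [ "$lines" -eq 1 ] &&
    printf '%s\n' "$1" | grep -Eq '^x[0-9]+c[0-9]+s[0-9]+b[0-9]+n[0-9]+$'
}

# Example:
#   is_single_xname "${TARGET_XNAME}" || echo "unexpected TARGET_XNAME: '${TARGET_XNAME}'"
```

An empty result (for example, after an expired token) and a multi-line result (more than
one alias matched) both fail the check.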

#### Rebuild master nodes

If you wish to monitor the boot output of each node you're upgrading, perform
the steps outlined in
**/usr/share/doc/csm/operations/conman/Establish_a_Serial_Connection_to_NCNs.md**
in a separate terminal before each node you upgrade.

> **`NOTES:`**
>
> - Perform these steps from `ncn-m002` when rebuilding `ncn-m001`.
> - Perform these steps from `ncn-m001` when rebuilding master nodes other than `ncn-m001`.
> - Rebuild master nodes one at a time.

1. (`ncn-m#`) Source the environment file.

   ```bash
   cd /etc/cray/upgrade/csm/csm-security/SUSE-15.6-security-00000003-1
   ```

   ```bash
   source env/csm-security.env
   ```

1. (`ncn-m#`) Set the master node hostname for the node that will be booted into a new image.

   ```bash
   TARGET_NCN=ncn-m00X
   ```

1. (`ncn-m#`) Obtain an API token for use in subsequent steps.

   ```bash
   export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
   -d client_id=admin-client \
   -d client_secret=$(kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d) \
   https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
   ```

1. (`ncn-m#`) Look up the xname of the target node.

   ```bash
   TARGET_XNAME=$(curl -s -k -H "Authorization: Bearer ${TOKEN}" "https://api-gw-service-nmn.local/apis/sls/v1/search/hardware?extra_properties.Role=Management" | \
        jq -r ".[] | select(.ExtraProperties.Aliases[] | contains(\"$TARGET_NCN\")) | .Xname")
   ```

1. (`ncn-m#`) Update BSS `bootparameters` to include the new image IDs for the target master node.

   ```bash
   /usr/share/doc/csm/scripts/operations/node_management/assign-ncn-images.sh -p "${MASTER_IMAGE_ID}" "${TARGET_XNAME}"
   ```

1. (`ncn-m#`) Update the desired CFS configuration.

   ```bash
   cray cfs components update "${TARGET_XNAME}" --desired-config "${MASTER_CFS_CONFIG}"
   ```

1. (`ncn-m#`) Master node rebuilds require environment variables `CSM_RELEASE` and
   `CSM_ARTI_DIR` to be set on the node where the rebuild script is executed.

   Set `CSM_RELEASE` and `CSM_ARTI_DIR`. Replace `1.6.0` with the correct CSM release:

   ```bash
   export CSM_RELEASE=1.6.0
   export CSM_ARTI_DIR="/etc/cray/upgrade/csm/csm-${CSM_RELEASE}"
   ```

   > **`NOTES:`**
   >
   > - If `/etc/cray/upgrade/csm/` is empty, create it, then download and extract
   >   the CSM tarball there.
   > - Update `CSM_ARTI_DIR` with the directory created above.

1. (`ncn-m#`) Rebuild the desired master node.

   If `ncn-m001` is being rebuilt, all of the steps above should have been performed on `ncn-m002`.
   Confirm that all of the above steps were performed on `ncn-m002` before running the following
   script.

   ```bash
   /usr/share/doc/csm/upgrade/scripts/rebuild/ncn-rebuild-master-nodes.sh "${TARGET_NCN}"
   ```

   Repeat [Rebuild master nodes](#rebuild-master-nodes) for each master node.
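
The rule in the notes at the top of this section (rebuild `ncn-m001` from `ncn-m002`, and
every other master node from `ncn-m001`) can be written as a small lookup. This is only an
illustration of the rule; `rebuild_host_for` is a hypothetical name.

```bash
# Hypothetical helper: print the node the rebuild steps should be run from
# for a given target master node.
rebuild_host_for() {
  case "$1" in
    ncn-m001) echo "ncn-m002" ;;
    ncn-m*)   echo "ncn-m001" ;;
    *)
      echo "not a master node: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   echo "Run the rebuild of ${TARGET_NCN} from $(rebuild_host_for "${TARGET_NCN}")"
```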
