mirror of https://github.com/xcat2/xcat-core.git synced 2026-05-17 19:57:18 +00:00
Commit Graph

26999 Commits

Author SHA1 Message Date
Vinícius Ferrão 86f6a12264 fix: set IPMI name-only lookup bit in RAKP1 to match ipmitool
Set bit 4 (0x10) of the requested privilege byte in RAKP Message 1
for name-only user lookup, matching ipmitool behavior. Use the same
value consistently in all HMAC calculations (RAKP2 verification,
RAKP3 auth code, SIK derivation).

Without this, some BMCs fail user lookup with "Unauthorized name"
even though the credentials are correct.

Ref: #7511
2026-05-06 01:25:55 -03:00
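
The bit manipulation described in commit 86f6a12264 can be sketched as follows. This is a minimal Python illustration, not the xCAT Perl code: the function name and the example privilege level are assumptions, and only the bit value 0x10 comes from the commit message.

```python
# Bit 4 of the requested-privilege byte selects name-only user lookup
# (value taken from the commit message).
NAME_ONLY_LOOKUP = 0x10


def requested_privilege_byte(privilege_level: int, name_only: bool = True) -> int:
    """Combine a 4-bit privilege level with the name-only lookup flag.

    Hypothetical helper: the real code builds this byte inline.
    """
    if not 0 <= privilege_level <= 0x0F:
        raise ValueError("privilege level must fit in the low four bits")
    return privilege_level | (NAME_ONLY_LOOKUP if name_only else 0)


# Compute the byte once and reuse the same value in RAKP1 and in every
# later HMAC input (RAKP2 verification, RAKP3 auth code, SIK derivation).
priv_byte = requested_privilege_byte(0x04)  # 0x04 is an illustrative level
```

Computing the byte in one place is what keeps the HMAC inputs consistent with what was actually sent in RAKP Message 1.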
Vinícius Ferrão 2bcdc52f92 fix: accept RMCP message tag 0 from OpenBMC with session ID correlation
OpenBMC-based BMCs return message tag 0 in RAKP2/RAKP4 instead of
echoing the tag from the request. xCAT rejected these as stale
responses and retried indefinitely until timeout.

Accept tag 0 but verify the remote console session ID in the response
matches our current sidm. This prevents stale retries from corrupting
session state while allowing OpenBMC responses through.

Applied to got_rmcp_response, got_rakp2, and got_rakp4.

Ref: #7511
2026-05-06 01:25:09 -03:00
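
The acceptance rule from commit 2bcdc52f92 might look like the sketch below. It is written in Python for illustration; the function and parameter names are hypothetical, and it assumes that a response whose tag matches the request is accepted as before.

```python
def accept_response(resp_tag: int, resp_console_sid: int,
                    expected_tag: int, current_sidm: int) -> bool:
    """Decide whether an RMCP+ response should be processed.

    Assumption: a matching tag is sufficient on its own, as it was
    before the change described in the commit.
    """
    if resp_tag == expected_tag:
        return True
    # OpenBMC workaround: tag 0 is tolerated only when the remote console
    # session ID in the response matches our current session ID.
    if resp_tag == 0:
        return resp_console_sid == current_sidm
    # Any other tag is treated as a stale response and dropped.
    return False
```

The session ID comparison is what stops a late reply from an earlier attempt, which would carry a different session ID, from being mistaken for the current exchange.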
Vinícius Ferrão cb2a6b3f3c fix: reject IPMI packets with invalid CBC padding instead of crashing
cbc_pad in decrypt mode reads the last byte as the pad count, then
calls splice(@block, 0 - $count). If decrypted data is corrupt, the
pad count can exceed the array size, crashing with "Modification of
non-creatable array value attempted, subscript -16".

Return empty string on invalid padding so the caller treats it as a
decryption failure rather than accepting corrupted data as a valid
IPMI response.

Ref: #7511
2026-05-06 01:23:10 -03:00
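
The padding check from commit cb2a6b3f3c can be illustrated with a short Python sketch. It assumes the decrypted payload ends with the pad bytes followed by a single pad-length byte; the function name is hypothetical and this is not the actual `cbc_pad` Perl code.

```python
def strip_cbc_pad(plaintext: bytes) -> bytes:
    """Remove the trailing pad and pad-length byte from decrypted data.

    Returns b"" when the pad count is impossible, so the caller can treat
    the packet as a decryption failure (the behavior the commit describes).
    """
    if not plaintext:
        return b""
    count = plaintext[-1]   # last byte is the pad count
    body = plaintext[:-1]   # everything before the count byte
    if count > len(body):
        # Corrupt data: the pad claims more bytes than exist.
        return b""
    return body[:len(body) - count]
```

Slicing with `len(body) - count` rather than `-count` keeps the zero-padding case correct, since `body[:-0]` would return an empty result.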
Markus Hilger b006975b54 Merge pull request #7551 from VersatusHPC/fix/sles-legacy-validation
fix: restore legacy SLES provisioning paths
2026-05-06 02:49:34 +02:00
Vinícius Ferrão 9f33b19214 fix: restore legacy SLES provisioning paths 2026-05-05 17:09:37 -03:00
Markus Hilger 5b12eecd40 Merge pull request #7548 from VersatusHPC/fix/update-cuda-docs
docs: update NVIDIA CUDA documentation for modern OS support
2026-05-05 09:30:21 +02:00
Markus Hilger 2618b532c5 Merge pull request #7547 from VersatusHPC/fix/dhcp-dynamic-range-overlap
fix: errors out when node IP overlaps DHCP dynamic range
2026-05-05 09:28:23 +02:00
Markus Hilger 472529e046 Merge pull request #7546 from VersatusHPC/fix/remove-hardcoded-quiet
fix: move kernel quiet flag from hardcoded to osimage default
2026-05-05 09:26:26 +02:00
Vinícius Ferrão 60820b1abe docs: update NVIDIA CUDA documentation for modern OS support
The CUDA docs were frozen at CUDA 9.2 / RHEL 7.5 / Ubuntu 14.04 since
2019. Update to cover all currently supported OS and architecture
combinations (EL 7-10, Ubuntu 20.04-24.04, x86_64/ppc64le/sbsa).

Consolidate the version-specific repo and osimage pages into generic
guides that use placeholder variables, reducing 7 files to 2 while
covering more OS versions. Both online (direct NVIDIA repo URL) and
offline (dnf download / apt download mirroring) workflows are
documented.

All NVIDIA repository URLs validated against
developer.download.nvidia.com/compute/cuda/repos/ and confirmed
accessible with valid repodata.

Addresses #7373
2026-05-05 02:32:09 -03:00
Vinícius Ferrão bb8dd525da Error when node IP overlaps DHCP dynamic range
Previously, makedhcp warned but still created host entries without
a static IP reservation when a node's address fell inside the
dynamic range. The node would silently get a random IP from the
pool instead of its configured address.

Now errors and skips the node on all four DHCP paths (ISC v4/v6,
Kea v4/v6) with a clear message telling the admin to move the IP
outside the range or adjust the dynamic range.

This makes ISC DHCP and Kea behavior consistent and aligns with
xCAT's design: the dynamic range is for hardware discovery,
known nodes should have static IPs outside it.

Closes #6539
2026-05-05 00:48:30 -03:00
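
The overlap check introduced in commit bb8dd525da can be sketched in Python with the standard `ipaddress` module. The function names, the node dictionary, and the exact error wording are assumptions for illustration; makedhcp itself is Perl.

```python
import ipaddress


def ip_in_dynamic_range(node_ip: str, range_start: str, range_end: str) -> bool:
    """Return True if node_ip lies inside [range_start, range_end]."""
    ip = ipaddress.ip_address(node_ip)
    lo = ipaddress.ip_address(range_start)
    hi = ipaddress.ip_address(range_end)
    # Addresses of different IP versions cannot be ordered, so they never overlap.
    if not (ip.version == lo.version == hi.version):
        return False
    return lo <= ip <= hi


def split_nodes(nodes: dict, range_start: str, range_end: str):
    """Separate nodes that may get a reservation from nodes to skip with an error."""
    accepted, errors = [], []
    for name, ip in nodes.items():
        if ip_in_dynamic_range(ip, range_start, range_end):
            errors.append(
                f"{name}: IP {ip} is inside the dynamic range "
                f"{range_start}-{range_end}; move the IP outside the range "
                f"or adjust the dynamic range"
            )
        else:
            accepted.append(name)
    return accepted, errors
```

Skipping the node instead of writing a host entry without a reservation avoids the silent fallback to a pool address that the commit message describes.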
Vinícius Ferrão 03a16dd081 fix: move kernel quiet flag from hardcoded to osimage default
The quiet kernel parameter was hardcoded in anaconda.pm and sles.pm,
making it impossible for admins to get verbose boot output without
editing plugin source code. The existing addkcmdline mechanism
(bootparams and linuximage tables) only appends to the kernel command
line, so there was no way to remove quiet.

Move quiet out of the plugin kcmdline construction and into the
linuximage.addkcmdline default set during copycds osimage creation.
Admins who want verbose boot for debugging can now remove it per
osimage:

    chdef -t osimage <image> addkcmdline=""

New osimages get addkcmdline="quiet" by default. Existing osimages
with a custom addkcmdline are not overwritten on re-run of copycds.

Genesis/discovery boot (mknb.pm) is unchanged as it does not use
osimage definitions.

Addresses #6916
2026-05-04 22:58:52 -03:00
Markus Hilger a51a4d7710 Merge pull request #7543 from VersatusHPC/fix/systemd-xcatd
feat: Use systemd instead of legacy initscripts
2026-05-05 01:38:37 +02:00
Markus Hilger e65b968000 Merge pull request #7545 from VersatusHPC/fix/nodeset-empty-master-ip
fix: fail nodeset when MASTER_IP cannot be resolved
2026-05-05 01:34:25 +02:00
Vinícius Ferrão bfbc48c698 fix: fail nodeset when MASTER_IP cannot be resolved
Template.pm silently continued rendering kickstart templates when
getipaddr() failed to resolve the master hostname, producing
kickstarts with an empty MASTER_IP. Nodes would install successfully
but fail on first reboot when post.xcat and xcatinstallpost tried
to contact the master, timing out after 90 retries with:

    the network between the node and  is not ready

Postage.pm (mypostscript generation) already checks for this and
returns a clear error. Apply the same pattern in Template.pm so
nodeset fails immediately with a descriptive message instead of
producing a broken kickstart.

Fixes #7544
2026-05-04 18:52:13 -03:00
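
The fail-fast pattern from commit bfbc48c698 can be sketched as below. This is a Python illustration with hypothetical names; the resolver is passed in as a callable standing in for `getipaddr()`, which is assumed to return an empty or undefined value on failure.

```python
class TemplateError(Exception):
    """Raised when a template cannot be rendered safely."""


def master_ip_or_fail(master: str, resolve) -> str:
    """Resolve the master hostname or abort before any template is rendered.

    `resolve` is a stand-in for the real resolver and should return a
    falsy value (None or "") when the name cannot be resolved.
    """
    ip = resolve(master)
    if not ip:
        # Fail immediately instead of rendering a kickstart with an empty MASTER_IP.
        raise TemplateError(f"cannot resolve master '{master}' to an IP address")
    return ip
```

Raising at render time moves the failure from the node's first reboot, 90 retries later, to the moment the admin runs nodeset.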
Vinícius Ferrão 7897f30bfe Modernize xcatd service packaging 2026-05-04 18:13:23 -03:00
Markus Hilger d5831828d6 Merge pull request #7533 from VersatusHPC/fix/opensuse-leap-support
feat: add openSUSE Leap 15 and SLES 15 provisioning support
2026-05-04 17:20:59 +02:00
Vinícius Ferrão 88da644249 Merge pull request #7532 from VersatusHPC/fix/el10-netboot-dhcp-client
fix: use NetworkManager for EL10 netboot DHCP instead of dhclient
2026-05-04 17:20:11 +02:00
Markus Hilger c7915645b3 Merge pull request #7541 from VersatusHPC/fix/ipmi-rspconfig-set-readback
Improve rspconfig SET readback and fix backupgateway SET target
2026-05-04 17:19:38 +02:00
Markus Hilger 679bed8926 Merge pull request #7542 from VersatusHPC/fix/apache-disable-directory-indexing
fix: disable Apache directory indexing on /install and /tftpboot
2026-05-04 17:18:39 +02:00
Markus Hilger 2bdb0d4d02 Merge pull request #7540 from VersatusHPC/fix/remove-docker-lifecycle
fix: remove Docker container lifecycle management (dead code since 2016)
2026-05-04 17:15:58 +02:00
Vinícius Ferrão 5035697e9b fix: disable Apache directory indexing on /install and /tftpboot
The default xCAT Apache configuration shipped with Options Indexes
enabled for the /install and /tftpboot directories. This allowed
unauthenticated users to browse directory listings, disclosing the
full tree of postscripts, boot files, and (in production deployments)
potentially kickstart files with password hashes, custom scripts with
embedded credentials, and cluster topology details.

Replace Options Indexes with -Indexes in all four shipped Apache config
files (MN and SN, Apache 2.2 and 2.4 variants). Direct file access
by known path continues to work, so all provisioning workflows are
unaffected. Directory browsing for /xcat-doc is preserved as it
contains only public documentation.

Additionally, add an Apache hardening guide documenting recommended
permissions for sensitive directories under /install, network binding
best practices, and IP-based access control options.

Addresses #7450
2026-05-03 23:01:01 -03:00
Vinícius Ferrão d71c7f7ac6 Improve rspconfig SET readback and fix backupgateway SET target
On some BMCs (notably Supermicro), a GET immediately after SET
returns the old value until the BMC applies the change. This made
rspconfig output misleading for network setting operations.

- Store the canonical SET value after normalization and compare
  with the GET readback for ip, netmask, gateway, and backupgateway.
  When they differ, annotate the output:
  "BMC Gateway: 10.20.0.1 (requested 10.20.0.254, not yet reflected)"
- Consolidate ip/netmask/gateway/backupgateway display into one block
- Fix backupgateway SET: was routed through the gateway branch
  writing parameter 0x0C instead of 0x0E. Now has its own branch
  writing the correct IPMI parameter.
- ip=dhcp is unaffected (separate code path, never stores a value)

Tested on Supermicro IPMI BMC (10.20.0.51).

Fixes #3445
2026-05-03 21:01:42 -03:00
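
The readback annotation from commit d71c7f7ac6 could be sketched like this in Python. The function name and the dictionary are hypothetical; the parameter numbers and the output format are taken from the commit message, and both values are assumed to be normalized already.

```python
# IPMI LAN configuration parameter numbers named in the commit message.
LAN_PARAM = {
    "gateway": 0x0C,
    "backupgateway": 0x0E,  # previously written as 0x0C by mistake
}


def format_readback(label: str, readback: str, requested=None) -> str:
    """Format a GET result, flagging a SET value the BMC has not applied yet."""
    if requested is not None and requested != readback:
        return f"{label}: {readback} (requested {requested}, not yet reflected)"
    return f"{label}: {readback}"
```

For example, `format_readback("BMC Gateway", "10.20.0.1", "10.20.0.254")` produces the annotated line quoted in the commit message, while a matching readback prints without the annotation.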
Markus Hilger ddd7f8da3f Merge pull request #7539 from VersatusHPC/fix/ipmi-vlan-disable
fix: IPMI VLAN disable
2026-05-03 20:10:47 +02:00
Markus Hilger 1c132aab49 Merge pull request #7538 from VersatusHPC/feat/openbmc-rspconfig-user-snmp
feat: add OpenBMC rspconfig user and alert support
2026-05-03 20:09:35 +02:00
Vinícius Ferrão 4165b26a04 fix: remove Docker container lifecycle management (dead code since 2016)
Docker container lifecycle management (mgt=docker, mkdocker, rmdocker,
lsdocker) was added in 2015-2016 as an experiment targeting Docker API
v1.22 on Ubuntu only. Documentation and man pages were deliberately
removed in 2019 (PRs #6222 and #6324) with the original developer's
approval, noting that "the interface of Docker has become very simple
right now, so there is no value for xCAT to offer such functions."

The plugin was still being shipped but has had no functional code changes
since April 2016, was never listed as a valid mgt value in Schema.pm,
and no user ever filed an issue about it.

Removed:
- xCAT-server/lib/xcat/plugins/docker.pm (1,142 lines)
- xCAT/postscripts/setupdockerhost
- xCAT-server/share/xcat/scripts/setup-dockerhost-cert.sh
- xCAT-test/autotest/testcase/dockercommand/ (test cases)
- Docker attribute definitions in Schema.pm
- Client symlinks (mkdocker, rmdocker, lsdocker)
- Usage entries and dockerhost cert handling in credentials.pm
- Docker attribute documentation in man7 pages

The "Running xCAT in Docker" documentation (dockerized_xcat/) is
retained as it documents containerizing xCAT itself, not the removed
mgt=docker feature.

Closes #7518
2026-05-03 12:11:33 -03:00
Vinícius Ferrão 2fa7fca1ad Allow rspconfig to disable VLAN on IPMI BMCs
rspconfig vlan= only accepted values 1-4096 with no way to disable
VLAN tagging. Users had to resort to raw IPMI commands to clear a
stale VLAN after ip=dhcp.

- Accept vlan=off/disable/disabled to clear VLAN tagging via
  standard IPMI parameter 0x14 with the enable bit unset
- Fix valid range from 1-4096 to 1-4094 (IEEE 802.1Q)
- Use strict digit matching to reject malformed inputs

To clear VLAN after a DHCP reset: rspconfig <node> vlan=off

Tested on Supermicro IPMI BMC (10.20.0.51).

Partially addresses #3725
2026-05-03 12:04:21 -03:00
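
The input handling from commit 2fa7fca1ad can be sketched in Python as follows. The function names are hypothetical. The two-byte layout in `vlan_param_bytes` reflects my reading of the IPMI LAN parameter 0x14 definition (low eight bits of the ID, then an enable bit in bit 7 plus the high four bits) and should be checked against the specification.

```python
import re

DISABLE_WORDS = ("off", "disable", "disabled")


def parse_vlan(value: str):
    """Return None to disable tagging, or a VLAN ID in 1..4094."""
    if value in DISABLE_WORDS:
        return None
    # Strict digit matching: rejects inputs such as "12a", "-5" or " 7".
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError(f"invalid vlan value: {value!r}")
    vid = int(value)
    if not 1 <= vid <= 4094:  # IEEE 802.1Q usable range
        raise ValueError(f"vlan must be 1-4094, got {vid}")
    return vid


def vlan_param_bytes(vid) -> bytes:
    """Encode the data bytes for LAN parameter 0x14 (assumed layout)."""
    if vid is None:
        return bytes([0x00, 0x00])  # enable bit unset: tagging disabled
    return bytes([vid & 0xFF, 0x80 | (vid >> 8)])
```

`[0-9]+` is used instead of `\d+` because Python's `\d` also matches non-ASCII digits.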
Vinícius Ferrão 40977b717f Fix alert handler precedence and tighten input matching in setnetinfo
Two pre-existing bugs in the alert on/off conditions:

1. Operator precedence: 'and' with 'or' without parens caused any
   subcommand with argument matching /^en/ or /^dis/ to silently
   trigger the alert handler.

2. Loose prefix matching: /^en/ and /^dis/ accepted typos like
   "enterprise" or "discover". Replace with exact token matching
   while preserving the "en"/"dis" abbreviations used by snmpmon.pm.
2026-05-03 12:04:07 -03:00
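
Both bugs fixed in commit 40977b717f have close Python analogues, since `and` also binds tighter than `or` there. The sketch below is illustrative only: the expression shape, the function names, and the accepted token sets are assumptions, not the actual setnetinfo code.

```python
def buggy_is_alert_toggle(subcommand: str, arg: str) -> bool:
    # Parsed as (A and B) or C, so the last test ignores the subcommand.
    return subcommand == "alert" and arg.startswith("en") or arg.startswith("dis")


# Assumed token sets; "en" and "dis" are the abbreviations the commit preserves.
ENABLE_TOKENS = {"on", "en", "enable"}
DISABLE_TOKENS = {"off", "dis", "disable"}


def is_alert_toggle(subcommand: str, arg: str) -> bool:
    # Exact token matching, with the subcommand test applied to every token.
    return subcommand == "alert" and arg in (ENABLE_TOKENS | DISABLE_TOKENS)
```

With the buggy form, an unrelated subcommand whose argument starts with "dis" reaches the alert handler; the fixed form rejects it, along with typos such as "enterprise".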
Vinícius Ferrão 260ce4420d Add OpenBMC rspconfig user and alert support 2026-05-03 01:34:56 -03:00
Markus Hilger 0d4182c7d9 Merge pull request #7536 from VersatusHPC/fix/noderange-fork-stale-cache
fix: invalidate NodeRange caches inherited across fork
2026-05-03 02:41:55 +02:00
Vinícius Ferrão d455b82b1a fix: silent failure with no site master attribute (#7537)
* Fix silent failure when site.master is not set (#6157)

Hardware control commands (rpower, rinv, etc.) silently return no output
and exit 0 when site.master is empty. The original fix (#6074) was
reverted (#6158) because it warned per-node with the wrong hostname.

Check once in plugin_command before dispatching to plugins, so the error
appears exactly once with the correct command name.

* Also reject empty site.master, not only undef
2026-05-03 02:39:04 +02:00
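
The single up-front check from commit d455b82b1a might be sketched as below. This is a Python illustration with hypothetical names and message text; `site` stands in for the site table as a plain dictionary.

```python
def site_master_error(site: dict, command: str):
    """Return an error message if site.master is unset or empty, else None.

    Intended to be called once before dispatching to plugins, so the
    error appears a single time with the command name.
    """
    master = site.get("master")
    if master is None or master == "":
        return f"{command}: site.master is not set"
    return None
```

Checking both the missing and the empty case covers the second bullet of the commit, which rejects an empty value as well as an undefined one.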
Vinícius Ferrão a6145b402b Merge pull request #7534 from VersatusHPC/fix/el10-bios-stateful-biosboot
fix: add EL10 BIOS boot partition
2026-05-03 02:36:39 +02:00
Markus Hilger b1b0ca0396 Merge pull request #7535 from VersatusHPC/fix/plugin-error-message
fix: misleading plugin error message
2026-05-03 02:35:33 +02:00
Vinícius Ferrão f139904c3e fix: invalidate NodeRange caches inherited across fork
xcatd forks child processes to handle plugin requests. The child
inherits NodeRange.pm's module-level caches (@allnodeset, %allgrphash,
@grplist) with their timestamps from the parent. If the parent had
populated these caches within the past 5 seconds, the child reuses
stale data that does not reflect database changes committed by other
requests that completed between cache population and the fork.

This causes non-deterministic failures in group-definition regression
tests (chdef_group, mkdef_group, rmdef_group) where lsdef -s runs
noderange expansion inside the forked plugin process and hits the
inherited stale cache that predates the mkdef -t group commit.

Track the PID at cache-build time and treat any cache built by a
different PID as expired, forcing a fresh database read in children.
2026-05-02 19:27:10 -03:00
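
The PID-tracking idea from commit f139904c3e can be sketched in Python. The class and its loader callback are hypothetical; the 5-second lifetime comes from the commit message, and NodeRange.pm itself uses module-level variables rather than an object.

```python
import os
import time


class ForkAwareCache:
    """Time-limited cache that a forked child never reuses."""

    def __init__(self, loader, ttl: float = 5.0):
        self._loader = loader   # callable that performs the fresh read
        self._ttl = ttl
        self._loaded = False
        self._value = None
        self._stamp = 0.0
        self._pid = None

    def get(self):
        now = time.monotonic()
        fresh = (
            self._loaded
            and self._pid == os.getpid()      # built by this very process
            and now - self._stamp < self._ttl
        )
        if not fresh:
            self._value = self._loader()
            self._stamp = now
            self._pid = os.getpid()           # remember who built the cache
            self._loaded = True
        return self._value
```

A child created by fork inherits `_pid` from its parent, so the comparison with `os.getpid()` fails on the first `get()` and forces a reload regardless of how recent the timestamp is.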
Vinícius Ferrão b10865c5d4 Keep plugin bug label for XS crashes without $@
The else branch handles a rare case where XS libraries (Sys::Virt,
Net::SNMP) crash without setting $@. This IS a plugin bug, so keep
that label and the debug hint. Only the common case (die with $@)
gets the clean passthrough.
2026-05-02 17:09:54 -03:00
Vinícius Ferrão 34406828b9 Pass through actual error instead of generic "plugin bug" message
When a plugin dies during request processing, xcatd wrapped the error
in a misleading "plugin bug" message that hid the real cause (e.g.
"No space left on device"). Now passes through the actual error from
the eval, making the output useful for any failure, not just disk full.

Fixes #2719
2026-05-02 17:06:18 -03:00
Vinícius Ferrão 5aa1cda179 feat: add openSUSE Leap 15 provisioning support 2026-05-02 16:57:46 -03:00
Vinícius Ferrão 1f9173f07a Fix some EL9 and EL10 provisioning gaps (#7530)
* Fix EL9 and EL10 provisioning gaps
2026-05-02 04:27:54 +02:00
Markus Hilger d168dcad30 Merge pull request #7529 from VersatusHPC/fix/ubuntu-2604-support
Add Ubuntu 26.04 provisioning support
2026-05-01 23:25:09 +02:00
Vinícius Ferrão 023beff053 Add Ubuntu 26.04 provisioning support 2026-05-01 11:13:45 -03:00
Markus Hilger 10c13a3635 Merge pull request #7528 from VersatusHPC/fix/ubuntu-lts-provisioning-clean
fix: improve Ubuntu LTS provisioning support
2026-05-01 01:39:11 +02:00
Markus Hilger 568f1b8a44 Merge pull request #7523 from VersatusHPC/fix/makentp-insecure-config
fix: harden makentp generated ntp.conf
2026-04-30 10:22:50 +02:00
Vinícius Ferrão 1babd7b0e4 fix: improve Ubuntu LTS provisioning support 2026-04-29 18:19:12 -03:00
Markus Hilger d7748b6e3a Merge pull request #7525 from VersatusHPC/kea-uefi-reservation-policy
Fix Kea UEFI reservation boot policy
2026-04-29 11:30:11 +02:00
Markus Hilger 733d076127 Merge pull request #7527 from VersatusHPC/fix/copycds-strip-alternate-suffix
fix: strip redundant alternate suffix from RHEL distnames in copycds
2026-04-29 11:29:26 +02:00
Markus Hilger 40de13dab8 Merge pull request #7526 from VersatusHPC/bump-actions-checkout-v6
Bump actions/checkout v4 to v6 for Node.js 24 compatibility
2026-04-29 11:28:07 +02:00
Vinícius Ferrão ee26cf3f8f fix: strip redundant alternate suffix from RHEL distnames in copycds
RHEL 7 shipped ppc64le ISOs under the "alternate" label, causing
copycds to create distro paths like rhels7.6-alternate/ppc64le.
This mismatched osver() which returns rhels7.6 since /etc/os-release
has no knowledge of the alternate designation.

The architecture (ppc64le vs ppc64) already differentiates the
builds, making the alternate suffix redundant. Strip it during
auto-detection so copycds paths match osver() output.

Fixes #5593
2026-04-28 17:06:32 -03:00
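
The suffix stripping from commit ee26cf3f8f amounts to a small string operation, sketched here in Python. The function name is hypothetical, and the exact suffix text is inferred from the path example in the commit message.

```python
ALTERNATE_SUFFIX = "-alternate"


def strip_alternate(distname: str) -> str:
    """Drop a trailing "-alternate" so the distro name matches osver() output."""
    if distname.endswith(ALTERNATE_SUFFIX):
        return distname[: -len(ALTERNATE_SUFFIX)]
    return distname
```

For example, `strip_alternate("rhels7.6-alternate")` yields `rhels7.6`, while names without the suffix pass through unchanged.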
Vinícius Ferrão 8d6eb04daf Bump actions/checkout v4 to v6 for Node.js 24 compatibility 2026-04-28 16:02:41 -03:00
Vinícius Ferrão a716e8ff90 Fix DHCP CI package layout 2026-04-28 15:47:42 -03:00
Vinícius Ferrão 2c7fa228e7 retrigger CI 2026-04-28 11:37:03 -03:00
Vinícius Ferrão 0f606615b3 Fix Kea UEFI reservation boot policy 2026-04-28 03:35:53 -03:00