mirror of https://github.com/xcat2/confluent.git synced 2026-05-08 09:40:13 +00:00
Commit Graph

1063 Commits

Author SHA1 Message Date
Jarrod Johnson f587539c2a Fix missing ubuntu diskless content 2026-05-01 12:13:24 -04:00
Jarrod Johnson d10e49ed0d Bring chrony fixes to other scripts 2026-04-30 11:17:01 -04:00
Jarrod Johnson 98cbd7581a Fix diskless profiles for chrony.conf modification 2026-04-30 10:44:28 -04:00
Timothy Middelkoop a3f40e2982 Fix el8/el9 hook paths corrupted by symlinked el10 in aarch64 spec
In confluent_osdeploy-aarch64.spec.tmpl, el10 was created as a symlink
to el8, so the subsequent `mv el10/initramfs/usr el10/initramfs/var`
inadvertently renamed el8's usr directory, leaving el8 and el9 (also
symlinked to el8) with hooks at var/lib/dracut/hooks/ instead of
usr/lib/dracut/hooks/. Rocky 9 dracut never found the hooks and dropped
to the emergency shell on all aarch64 nodes.

Use `cp -a el8 el10` as the x86_64 spec already does, so the rename
only affects the el10 copy.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Timothy Middelkoop <tmiddelkoop@internet2.edu>
2026-04-29 16:36:23 -05:00
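The failure described in the commit above can be reproduced outside the spec file. The following is a minimal sketch in Python, not the spec's own shell: the helper names (`el8_usr_survives`, `via_symlink`, `via_copy`) are hypothetical, and `shutil.copytree` is only a rough stand-in for `cp -a`. It builds an `el8/initramfs/usr` tree, derives `el10` either as a symlink or as a copy, performs the rename, and reports whether el8's `usr` directory is still there.

```python
import os
import shutil
import tempfile


def el8_usr_survives(root, make_el10):
    """Build el8, derive el10 with make_el10, rename el10's usr to var,
    and report whether el8/initramfs/usr still exists afterwards."""
    el8 = os.path.join(root, "el8")
    el10 = os.path.join(root, "el10")
    os.makedirs(os.path.join(el8, "initramfs", "usr"))
    make_el10(el8, el10)
    # Equivalent of: mv el10/initramfs/usr el10/initramfs/var
    os.rename(os.path.join(el10, "initramfs", "usr"),
              os.path.join(el10, "initramfs", "var"))
    return os.path.isdir(os.path.join(el8, "initramfs", "usr"))


def via_symlink(src, dst):
    # el10 -> el8: any path below el10 resolves into el8 itself
    os.symlink(os.path.basename(src), dst)


def via_copy(src, dst):
    # Rough stand-in for `cp -a el8 el10`: an independent tree
    shutil.copytree(src, dst, symlinks=True)


if __name__ == "__main__":
    for maker in (via_symlink, via_copy):
        with tempfile.TemporaryDirectory() as root:
            print(maker.__name__, el8_usr_survives(root, maker))
```

With the symlink, the rename resolves through `el10` into `el8`, so el8's `usr` disappears. With the copy, only the el10 tree changes.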
Jarrod Johnson 069338baf3 Write to stdout as binary
This allows better redirection.

In Python 3, binary data must be written to sys.stdout.buffer. Handle AttributeError for the unlikely event of a Python 2 based node being deployed.
2026-04-28 08:16:05 -04:00
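The pattern named in the commit above can be sketched as follows. This is an illustration under assumptions rather than the committed code: the function name `write_binary` and its optional `stream` parameter are hypothetical, added so the behavior can be exercised without touching the real stdout.

```python
import sys


def write_binary(data, stream=None):
    """Write bytes to a text stream's underlying binary buffer.

    On Python 3, sys.stdout is a text stream and raw bytes go to
    sys.stdout.buffer. A stream without a .buffer attribute (as on
    Python 2) raises AttributeError, so fall back to the stream itself.
    """
    stream = sys.stdout if stream is None else stream
    try:
        out = stream.buffer
    except AttributeError:
        out = stream
    out.write(data)
    out.flush()


if __name__ == "__main__":
    write_binary(b"raw bytes, safe to redirect to a file\n")
```

Writing bytes directly avoids any text encoding step, which is what makes redirecting arbitrary binary content to a file reliable.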
Jarrod Johnson d97eba787d Fix mistake in spec file 2026-04-24 09:29:57 -04:00
Jarrod Johnson 260443c1d6 Add Ubuntu 26.04 2026-04-24 08:35:27 -04:00
xu_ren_xian f269200004 Handle confluent= boot arg and IPv4 NIC autodetect
Add support for a confluent=<host> kernel argument in init-premount: configure networking, flush interfaces, autodetect the primary NIC (saved to /tmp/autodetectnic), verify TLS connectivity to the provided server, call the whoami endpoint over TLS to obtain the node name, and write results to /custom-installation/confluent/confluent.info (with fallback to copernicus on failure).

Also update casper-bottom logic to handle IPv4 manager addresses: for IPv6 the manager is still bracketed and scoped interface resolved as before; for IPv4 the script now uses the previously detected NIC (/tmp/autodetectnic) or falls back to an `ip route get <mgr>` lookup to determine DEVICE. This ensures routed IPv4 deployments work correctly.
2026-04-23 23:23:26 +08:00
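Two pieces of the logic described in the commit above lend themselves to a short sketch: pulling the `confluent=<host>` value out of a kernel command line, and bracketing the manager address only when it is IPv6. The actual scripts are shell; this Python version is an illustration, and the function names `confluent_arg` and `bracket_if_ipv6` are hypothetical.

```python
import ipaddress


def confluent_arg(cmdline):
    """Return the value of confluent=<host> from a kernel command line,
    or None if the argument is absent."""
    for token in cmdline.split():
        if token.startswith("confluent="):
            return token.split("=", 1)[1]
    return None


def bracket_if_ipv6(mgr):
    """Wrap an IPv6 literal (optionally with a %scope suffix) in
    brackets; leave IPv4 addresses and hostnames untouched."""
    addr = mgr.split("%", 1)[0]  # drop any scope before parsing
    try:
        if ipaddress.ip_address(addr).version == 6:
            return "[" + mgr + "]"
    except ValueError:
        pass  # not an IP literal, treat as a hostname
    return mgr


if __name__ == "__main__":
    mgr = confluent_arg("quiet confluent=10.1.2.3 console=ttyS0")
    print(bracket_if_ipv6(mgr))
```

For an IPv4 manager there is no scope suffix to name the interface, which is why the commit falls back to the autodetected NIC or an `ip route get <mgr>` lookup.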
Jarrod Johnson a123165712 Improve error when unknown user specified in syncfiles 2026-04-02 15:29:31 -04:00
Jarrod Johnson b91b10552c EL10 doesn't do setgid keysign
chmod 600 instead
2026-03-25 12:59:40 -04:00
Jarrod Johnson 779b07d2c2 Only try to use ssh_keys if it exists
EL10 changed from using ssh_keys and setgid to just using setuid root instead.
2026-03-25 12:56:16 -04:00
Jarrod Johnson 9b00fe5521 Don't try to open a file that doesn't exist 2026-03-17 13:03:18 -04:00
Jarrod Johnson 13a6444541 Fix incorrectly matching older versions as 'el10' 2026-03-17 12:58:04 -04:00
Jarrod Johnson 21c9158491 Carry forward some dns attributes into a bond 2026-01-21 15:12:23 -05:00
Jarrod Johnson 61d7a49163 Revert "Fallback to filename for PE format kernels"
This reverts commit a0a5887214.
2026-01-15 14:29:31 -05:00
Jarrod Johnson a0a5887214 Fallback to filename for PE format kernels
Some ARM64 kernels ship as EFI executables, but it's
not obvious how to extract version numbers from those properly.
2026-01-15 13:27:21 -05:00
Jarrod Johnson a4229fc58d Change name to index in apiclient
confignet was using the index for ipv4
2025-12-12 11:18:33 -05:00
Jarrod Johnson 31c1a865dc Update confignet to match apiclient changes 2025-12-12 09:30:56 -05:00
Jarrod Johnson d7577a04a7 Fix ESXi compatibility of apiclient
apiclient was using Linux-specific network information.

Change to libc getifaddrs for better cross-platform compatibility.
2025-12-11 08:46:19 -05:00
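As a rough illustration of the approach named in the commit above, libc's `getifaddrs` can be reached from Python through `ctypes`. This sketch is not apiclient's code: it only lists interface names, the function name `interface_names` is hypothetical, and the structure layout past the first three fields follows the Linux/glibc definition, which is an assumption that may not hold on other platforms.

```python
import ctypes
import ctypes.util


class ifaddrs(ctypes.Structure):
    pass


# Layout per Linux <ifaddrs.h>; only the first three fields are read here.
ifaddrs._fields_ = [
    ("ifa_next", ctypes.POINTER(ifaddrs)),
    ("ifa_name", ctypes.c_char_p),
    ("ifa_flags", ctypes.c_uint),
    ("ifa_addr", ctypes.c_void_p),
    ("ifa_netmask", ctypes.c_void_p),
    ("ifa_ifu", ctypes.c_void_p),
    ("ifa_data", ctypes.c_void_p),
]


def interface_names():
    """Return the sorted, de-duplicated interface names from getifaddrs."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.getifaddrs.argtypes = [ctypes.POINTER(ctypes.POINTER(ifaddrs))]
    libc.getifaddrs.restype = ctypes.c_int
    libc.freeifaddrs.argtypes = [ctypes.POINTER(ifaddrs)]
    libc.freeifaddrs.restype = None

    head = ctypes.POINTER(ifaddrs)()
    if libc.getifaddrs(ctypes.byref(head)) != 0:
        raise OSError(ctypes.get_errno(), "getifaddrs failed")
    names = set()
    try:
        cur = head
        while cur:  # a NULL pointer is falsy
            names.add(cur.contents.ifa_name.decode())
            cur = cur.contents.ifa_next
    finally:
        libc.freeifaddrs(head)
    return sorted(names)


if __name__ == "__main__":
    print(interface_names())
```

Going through libc avoids depending on Linux-only sources such as files under `/sys` or `/proc`, which is the portability benefit the commit message points at.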
Jarrod Johnson b72d6c9cfc Fix typo 2025-12-10 14:14:14 -05:00
Jarrod Johnson 523c93dfc3 Tolerate more network circumstances in bluefield deploy
If the networking didn't come up well, the 'functions' routines would not be able to handle it.

Switch to using apiclient which is designed specifically to handle less cooperative
initial network conditions.
2025-12-09 08:49:27 -05:00
Jarrod Johnson 2464e0ff4f Fix location of the apiclient common resource 2025-12-02 14:35:50 -05:00
Jarrod Johnson 3cbac38d57 Also autoconsole when exactly one serial port is detected at all. 2025-11-25 11:53:50 -05:00
Jarrod Johnson 224f349053 Extend autocons to more use cases
If SPCR comes up blank, see if there is one and exactly one serial port with carrier detect.

Failing that, give DMI a chance to indicate a preference. For now this covers just SuperMicro, since they have the most inconsistent carrier detect behavior but almost always consider ttyS1 to be the answer.
2025-11-25 11:51:07 -05:00
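The fallback order described in the commit above can be sketched as a small decision function. This is an illustration only: the name `pick_console`, its parameters, and the case-insensitive vendor match are assumptions, and how the real code gathers SPCR, carrier detect, and DMI data is not shown.

```python
def pick_console(spcr_port, carrier_ports, dmi_vendor):
    """Choose a serial console, or None if nothing is conclusive.

    spcr_port     -- port named by the ACPI SPCR table, or None/"" if blank
    carrier_ports -- serial ports currently reporting carrier detect
    dmi_vendor    -- system vendor string from DMI, or None
    """
    if spcr_port:
        return spcr_port
    # SPCR was blank: accept carrier detect only if it is unambiguous
    if len(carrier_ports) == 1:
        return carrier_ports[0]
    # Last resort: vendor preference (only SuperMicro for now)
    if dmi_vendor and "supermicro" in dmi_vendor.lower():
        return "ttyS1"
    return None


if __name__ == "__main__":
    print(pick_console(None, ["ttyS0", "ttyS1"], "Supermicro"))
```

Requiring exactly one candidate at the carrier detect step keeps the heuristic from guessing when several ports look alive.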
Jarrod Johnson a3b768c70f Draft bluefield deployment facilities 2025-11-20 16:44:24 -05:00
Jarrod Johnson 041008a524 Remove redundant el10 initramfs fixup 2025-11-19 15:37:29 -05:00
Jarrod Johnson 100944490c Fix potentially uninitialized curridx 2025-11-17 15:07:17 -05:00
Jarrod Johnson d20c5ac6eb Move handling of the loop directio straight to onboot
There were difficulties in the devfs after
boot, just let the full system handle it.
2025-11-13 15:33:04 -05:00
Jarrod Johnson 4484216198 Fix issues with the tethered memory optimizations 2025-11-13 15:24:26 -05:00
Jarrod Johnson e1efd6a9c5 Implement new 'uncompressed' image method
This allows the FS to just live, uncompressed, in cache.

This is generally a bad idea, however:

- In a hypothetically super-tuned diskless image, the lack of double-cache can offset the lack of compression
- The image will have supreme read performance
- It will have the most deterministic memory behavior
2025-11-13 14:39:53 -05:00
Jarrod Johnson 58d5209595 Port tethered improvements to EL8 2025-11-13 14:35:18 -05:00
Jarrod Johnson 53c918042a Remove double-caching in tethered diskless
By default, the squashfs file was being cached as well as the contents after extraction.

This is superfluous pressure on the cache of the OS.

However, it does help keep the image afloat through 'confignet', so
leave it on until onboot completes, then reclaim cache and disable further caching.
2025-11-13 14:28:25 -05:00
Jarrod Johnson 20292cdfd0 Do not let diskless.conf persist into EL9 diskless images
It fouls the kdump run that builds the kdump image.
2025-11-07 13:22:21 -05:00
Jarrod Johnson 21155d2091 Bring untethered changes to el10 diskless 2025-11-04 11:17:28 -05:00
Jarrod Johnson 6c0d7ea60e Simplify end untethered el9 diskless environment
Rather than treat both as the same, since untethered has everything up front anyway, go ahead and extract the filesystem.

This makes the mount look more straightforward and makes it so deletion of files from the image also frees RAM.
2025-11-04 11:14:52 -05:00
Jarrod Johnson 36687069aa Fix ESXi8 deployment
The changes for getinstalldisk assumed functionality in ESXi9. Target older functional level for our purposes.

Also expand the fallback to cover cases where the disk interrogation fails.
2025-10-21 11:11:52 -04:00
Jarrod Johnson 11ff2dabfc Clean up kickstart networking
Try to apply hostname through localcli, since
hostname is unsupported through net if dhcp.

Also more affirmatively indicate dhcp.
2025-10-17 10:00:38 -04:00
Jarrod Johnson f9351484a4 Add fallback if getinstalldisk detects no preferred disks 2025-10-17 09:32:33 -04:00
Jarrod Johnson b22c17208a Stop preferring HWE for now
The HWE has some missing hardware support, ironically...
2025-10-16 18:30:46 -04:00
Jarrod Johnson a43d7e11e2 Implement an esxi getinstalldisk 2025-10-15 10:43:36 -04:00
Jarrod Johnson 2d29813320 Store device for future use in ubuntu deployment 2025-10-02 14:28:46 -04:00
Jarrod Johnson a9d15de156 Rework Ubuntu identity image DHCP bringup
The stock Ubuntu approach was inadequate.  It would DHCP out every nic and take the fastest result, and no going back.

Now the CDC nic can frequently win that race.

First, rmmod cdc_ether, as a scenario that is completely right out.

But beyond that, let Ubuntu have one shot at multi-nic bringup.  Beyond that, maintain a list of all link-up devices.

If the check should fail, then start doing one nic at a time, cycling through them.

Also, the openssl s_client timeout is painfully slow; use a subshell and kill to speed things up.
2025-10-02 10:55:43 -04:00
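The idea of cutting a slow probe short, which the commit above does in shell with a subshell and kill, has a close analogue in Python's `subprocess.run` with a `timeout`. This is a sketch of that analogue, not the deployed script; the function name `run_with_deadline` is hypothetical.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd and return its exit status, or None if it had to be
    killed for exceeding the deadline."""
    try:
        done = subprocess.run(cmd,
                              stdin=subprocess.DEVNULL,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising
        return None
    return done.returncode
```

A caller could wrap a connectivity probe such as `["openssl", "s_client", "-connect", "deployer.example:443"]` (the host here is a placeholder) and treat `None` the same as a failed check, moving on to the next NIC sooner.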
Jarrod Johnson a4ba92a2e7 Retry network bringup
ESXi may be slow in being ready for network bringup. Workaround
by retrying.
2025-10-01 13:08:17 -04:00
Jarrod Johnson 6938bba2d3 Have confignet pause until connectivity restored
If we are reconfiguring network for a diskless node, wait for
things to settle back in before continuing.
2025-09-26 13:42:29 -04:00
Jarrod Johnson 871685ea20 Correct missing closure of if 2025-09-25 15:49:25 -04:00
Jarrod Johnson a480cc73df Add connectivity check to esxi ident bringup
If using the identity image bringup
with dhcp, be more careful about waiting
for connectivity before proceeding.
2025-09-25 15:29:33 -04:00
Jarrod Johnson 39eb32df38 Test connection on net cfg apply
When network configuration is applied, wait until we
can reach the deployment server again before exiting.

This should make us more robust against various potential delays after
changing the nature of network interfaces.
2025-09-25 15:18:18 -04:00
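The wait-until-reachable behavior described in the commit above can be sketched as a bounded retry loop. This is a minimal illustration assuming a plain TCP connect is an adequate reachability check; the function name `wait_for_server` and its parameters and defaults are hypothetical, and the real scripts may test connectivity differently.

```python
import socket
import time


def wait_for_server(host, port, attempts=30, delay=1.0, timeout=2.0):
    """Return True once a TCP connection to (host, port) succeeds,
    or False after the given number of failed attempts."""
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Link may still be settling (DHCP, bonding, and so on)
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```

Bounding both the per-attempt timeout and the number of attempts keeps a node from hanging forever while still tolerating the delays that follow an interface reconfiguration.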
Jarrod Johnson f66093680b Attempt to loop on reconfiguring networking
This may induce DHCP to be retried
2025-09-25 10:08:05 -04:00
Jarrod Johnson d7879bad5b Improve robustness of Ubuntu net bringup
If using DHCP, have the loop to validate connectivity repeat.
2025-09-19 15:44:55 -04:00
Jarrod Johnson 8911193aca Implement a test with retry for basic communication
confuesbox is likely to be a very early utility, and the relevant network is at high risk of being merely 'partially' up.
2025-09-19 11:50:12 -04:00