Please review design for automated server installs

Hi all,

A feature we want to implement in time for 20.04 for the new server installer is a way of doing an automated install. I’ve written up a wiki page of documentation for this as-yet-unimplemented feature at https://wiki.ubuntu.com/FoundationsTeam/AutomatedServerInstalls and I’d love any thoughts and comments you have, as replies to this post, mails to the ubuntu-server mailing list or however you want to get them to me.

Cheers,
mwh

5 Likes

Thanks for asking for comments on this intended replacement for preseeding. This looks like it could work well, I think.

I have, however, not (yet?) compared this automation mechanism against the myriad of features debian-installer (d-i) preseeding supports, so I may have missed important features that are lacking in the current draft of autoinstalls. When the live server installer was introduced it was lacking some much-requested features which were available in d-i. Scripted installations should allow at least the same flexibility as interactive installers, and ideally more, so care needs to be taken not to miss features that many of those currently using d-i would expect to see. (That said, overcomplicating configurations, and enabling configuration during installation which could just as well be done after booting into the installed system, should be prevented, too.) Only then will current d-i preseed users embrace autoinstalls and actually want to spend the additional person-hours required to migrate to them.

My thoughts on the documentation:

when the answer to a question is not present in a preseed, d-i stops and asks the user for input. autoinstalls are not like this: if there is any autoinstall config at all, the installer takes the default for any unanswered question (and fails if there is no default).

It would be nice to have a switch which, if set, prevents the use of defaults, causing the installer to fail if settings are not provided in an autoinstall file. This could help debugging, preventing undesired defaults from being used without the administrator noticing. Autoinstalls could log (to a serial console? to an ssh shell?) the settings that were not found in the autoinstall file (and where defaults were assumed).
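
To illustrate what I mean (the key names here are only my guesses based on the draft, not actual spec, and the strict switch does not exist), a config like this would only pin down the identity section, and everything else would silently fall back to defaults:

    version: 1
    strict: true            # hypothetical "fail instead of defaulting" switch
    identity:
      hostname: myserver
      username: admin
      password: "<crypted password hash>"
    # no keyboard, network or storage sections: without something like the
    # strict switch, the installer would quietly use its defaults for them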

Generally, much thought should go into how installation errors resulting from missing or incorrect data in autoinstall files can be prevented, and, if prevention is not possible, how critical failures can be presented to the user in a way that is easy to access and easy to understand / interpret. Ideally this would be a presentation format that is understandable by both humans and machines (large installations on heterogeneous systems will need to analyze and handle individual autoinstall failures which occur only due to hardware issues). This is an aspect where preseeded d-i has been lacking, and where autoinstall could shine.

It would also be desirable to have every manual server installation produce an autoinstall.yml file based on the choices the user made during the installation, stored on the target file system (e.g. at /var/log/installer/${hostname}_autoinstall.yml), so that reinstalling this server as an autoinstall will be easy, and clones can also be created with minimal modifications.
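
Roughly, I would expect the generated file to capture the choices like this (the path and key names are only illustrative):

    # hypothetical /var/log/installer/myserver_autoinstall.yml written out
    # at the end of a manual install (identity, network and so on would be
    # captured in the same way)
    version: 1
    locale: en_US.UTF-8
    keyboard:
      layout: us
    storage:
      layout:
        name: lvm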

1 Like
  • I assume that, by default, the installer log will be copied into /var/log/installer/ at the end of an install. This may happen only on success but, obviously, a log would be a critical resource on a failure. Docs should make this clear. It is probable that a new top-level key (say, log) and options will be needed.

  • it would be nice if the packages key were to be a mapping, with package-specific options available (and, perhaps, d-i preseeds). This would make it (I think) simpler to find a specific package and its options.
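
Purely as an illustration of the second point (neither form is an existing syntax, just how I picture the difference):

    # flat list, as I read the current draft:
    packages:
      - postfix
      - nginx

    # vs. a mapping with per-package options / debconf preseeds:
    packages:
      postfix:
        debconf:
          - "postfix postfix/main_mailer_type select Internet Site"
      nginx: {}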

2 Likes
  • The early / late scripts… what context/data will they have access to… will the early ones be able to mutate the autoinstall configuration? Will the late ones receive a list of successes/failures, the final state of the install, etc.?
  • Can these scripts be interactive? I worked in a few places that like to ask for user data during kickstart/preseed… we did that by switching the VT during the installer stage, reading user input, writing fragments of the kickstart config, and switching back after.
  • Perhaps the network node (and possibly the installer code) can use netplan and netplan.yml syntax directly to reduce code maintenance and provide a consistent configuration experience (rough sketch after this list)? https://netplan.io/examples
  • Consider what may functionally overlap with cloud-init and possibly re-use code or delegate certain tasks. It would be really nice if the admin could simply delegate everything that is possible to a cloud-init configuration.
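
On the netplan point, what I am picturing is the network node simply being literal netplan syntax, e.g. (a sketch only; I don’t know whether the current draft allows this):

    network:
      version: 2
      renderer: networkd
      ethernets:
        enp3s0:
          dhcp4: true
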
1 Like

Thanks for your comments!

Well, it’s certainly a goal that anything you can do interactively in the live server installer you can also do in the autoinstaller, and there will in practice be things you can do in autoinstalls that you can’t do interactively (e.g. curtin supports bcache, so you will be able to put bcache config into an autoinstall file, but there is no support for bcache in the subiquity UI yet).

On the other hand “everything you can do with d-i” is not something I’m even sure it’s possible to enumerate, never mind replicate. Part of the point of these discussions is to find out which things people actually care about.

Right. At the end of the day early/late commands provide a great deal of rope with which to do whatever you like to yourself.

This is one of those things that cuts both ways. One of the things people hate about d-i is having to figure out which questions a new release of Ubuntu adds, and so which things you have to add to a preseed to make the install truly non-interactive again. And I just can’t see a way in which forcing someone installing a headless server to say something about keyboard configuration makes any sense.

On the other hand, some way of seeing which defaults you are getting is clearly a useful thing. It could be a flag you set during install, it could be some kind of linter you run on a development system, or both I guess.

Where logs go is clearly something missing in my documentation. For a successful install, logs are already and will continue to be copied to the target system. I have this plan that when the install media is a USB stick we should create a partition on it and write the logs there. For the network case (which needs lots of details hammering out) we should certainly support logging to a tty, and I guess some kind of ssh thing and rsyslog? Anything else?

Ah OK this is kind of what I just said too. Curtin supports reporting progress to an arbitrary URL: https://curtin.readthedocs.io/en/latest/topics/reporting.html (which in practice is MAAS), maybe the live-server should support something like that as well?
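
For reference, curtin’s reporting config looks roughly like this (see the link above for the details; whether live-server would use exactly the same keys is an open question):

    reporting:
      hook:
        type: webhook
        endpoint: http://example.com/endpoint/path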

This is already part of the plan :slight_smile:

1 Like

Yes, this happens already.

Agreed. Does what I said in reply to tomreyn make sense to you here?

I’m not sure I understand. Do you mean some way of intermingling the proposed debconf-set-selections and packages config items?

Excellent questions. It hadn’t occurred to me but supporting mutation of the config seems reasonably straightforward. We could just document that the config file is at /autoinstall.yaml when the early commands run and that it will be re-read after they complete.
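
So something like this would become possible (a sketch only; the exact path and re-read behaviour is what I’d document, not anything implemented today):

    early_commands:
      # fill in a value computed at install time; subiquity would re-read
      # /autoinstall.yaml once the early commands finish
      - sed -i "s/HOSTNAME_PLACEHOLDER/server-$(date +%s)/" /autoinstall.yaml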

For the late commands, it feels to me that they should probably only run in the case the install was successful, but that maybe one should be able to have “error_commands” or something like that.
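
Something along these lines, maybe (pure sketch, none of this exists yet):

    late_commands:
      # only run if the install succeeded
      - echo "autoinstall finished" > /target/var/log/installer/autoinstall-done
    error_commands:
      # only run if the install failed; stash the logs somewhere useful
      - tar -C /var/log -czf /installer-logs.tar.gz installer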

Can you talk in more general terms about what you were doing here? My current thinking is that life will be simpler if an install either is automated or it isn’t. If you want to ask questions subiquity doesn’t already ask, maybe there is another way we can support that.

It does and will. Did the docs not make that clear?

Yes, someone else mentioned this idea too. Several of the keys in the current proposed syntax are actually used to produce cloud-init config, but we could document that these are ignored if you supply your own config. (Although some of cloud-init’s default behaviours are a bit unhelpful for a bare metal install so maybe some kind of merging of configs is needed anyway…)

I’ve updated the docs to add error_commands and reporting sections to the config, made some clarifications to the descriptions of early_commands and late_commands and added something about cloud-init config to ‘possible future directions’

It does and will. Did the docs not make that clear?

That’s great! I did not see the typical renderer: or ethernets: node in the example and I missed the link to the full configuration document.

Can you talk in more general terms about what you were doing here? My current thinking is that life will be simpler if an install either is automated or it isn’t. If you want to ask questions subiquity doesn’t already ask, maybe there is another way we can support that.

There are still many shops that do static IP allocation. We wanted as much of the installation as possible to be automatic so we had consistent installations, but we were not using DHCP or dynamic DNS. We had different groups responsible for IP address allocation, host naming / DNS setup and VM shell creation, and a final group (the Linux team) that did the OS installation. The desire was to enforce that the IP / DNS had to be set up before the OS was installed, so the script asked the user to enter the FQDN. The script then did a dig against DNS and configured the IP address with the result, basically eliminating user error from configuring the interface differently than DNS, and blocking the user from moving forward until the required pre-requisites were completed. It was nice to be able to do all this through the virtual console rather than jumping around between many tools to satisfy a large number of pre-requisites. We also had the exact same scripting in place for all the Linux OSes we supported, so the installation workflow was identical.

Without the ability to interactively prompt, and with the autoinstall using defaults or failing instead of prompting for missing information, I believe we would have been stuck having some other script pre-generate a unique yml file and somehow link that file to the installation process dynamically…

1 Like

Ah OK thanks, I’ll make the example more interesting (and, uh, actually correct).

Thanks for the example. It seems to me a lot of effort to go to just to avoid running DHCP, but I guess everyone has their reasons! I guess it’s easy enough to say that early_commands will run with the same controlling terminal as subiquity[1] and before all the terminal changes have been made. If we also add the flag to ask questions when an answer is missing, your script could then write the desired network config to either /etc/netplan or the autoinstall file. It would be hard to match the UI of subiquity without some plugin-style interface I don’t want to design or commit to at this stage, but it could just work.
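
Very roughly, I’m imagining your script ending up as something like this (all the names, paths and behaviour here are assumptions, not a committed design):

    early_commands:
      # prompt on the installer's controlling terminal, look the name up
      # in DNS, and write a static netplan config for the target
      - |
        printf "FQDN for this host: " > /dev/tty
        read fqdn < /dev/tty
        ip=$(dig +short "$fqdn")
        printf 'network:\n  version: 2\n  ethernets:\n    enp3s0:\n      addresses: [%s/24]\n' "$ip" > /etc/netplan/00-static.yaml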

[1] A wrinkle here is that if there are serial lines configured, a subiquity process runs on each of them too. Which one should do the autoinstall?

+1 for rsyslog. This aligns with similar functionality in kickstart and can handle early errors where the target filesystem doesn’t exist yet. Another option worth considering is logging to a specially named serial device (for VMs), see https://fedoraproject.org/wiki/Anaconda/Logging#Remote_logging_via_virtio or also just logging to the console and not having any tui running.

This is a useful change and along with the logging above addresses concerns I initially had about the lack of handling for error cases in the script.

One use case that hasn’t been discussed so far, and that I’m particularly interested in, is the ability to install in an isolated or offline network environment, yet have the resulting image be correctly configured to work with e.g. DHCP on boot and have the default repositories set up.

When trying to do this with preseeding, the installation environment bleeds over into the runtime environment so e.g. if you install without a network the network then isn’t automatically set up in the target, or if you use repositories on a provisioning network the target expects to use those same repositories on boot, which isn’t what I’d like.

My main motivation for that use case is to have somewhat reproducible images which are built in an isolated environment and then deployed separately instead of e.g. installing with an internet connection and the installation image having updates installed during the installation, resulting in a different image each time you run the installation script. If this use case can be supported that would be great.

1 Like

Well I would argue that if you’re using an installer in a VM you’re doing things a bit wrong. Why not use a cloud image?

or also just logging to the console and not having any tui running.

This is my current half-baked plan, yes.

mwhudson:

I’ve updated the docs to add error_commands and reporting sections to the config, made some clarifications to the descriptions of early_commands and late_commands and added something about cloud-init config to ‘possible future directions’

This is a useful change and along with the logging above addresses concerns I initially had about the lack of handling for error cases in the script.

TBH I hadn’t considered the error cases nearly as much as I should have. Score one for asking for feedback!

One use case that hasn’t been discussed so far, and that I’m particularly interested in, is the ability to install in an isolated or offline network environment, yet have the resulting image be correctly configured to work with e.g. DHCP on boot and have the default repositories set up.

So this should just work™ with the way things are put together in subiquity. The default is to run DHCP on all wired interfaces, but only to configure it in the target system for interfaces that got a response; this can be overridden.

When trying to do this with preseeding, the installation environment bleeds over into the runtime environment so e.g. if you install without a network the network then isn’t automatically set up in the target, or if you use repositories on a provisioning network the target expects to use those same repositories on boot, which isn’t what I’d like.

So the part about repositories isn’t supported yet – if there is no network at install time, the install process runs with just the repository on the install media available – but it would be fairly simple to add (after all, I had to implement a way to have different apt config for install time to implement the current behaviour). And I guess you can always reconfigure the repository in a late_command.
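
i.e. something like this (a sketch; both the path into the target and whether sed is really the best tool here are open questions):

    late_commands:
      # point the installed system back at the default archive instead of
      # the provisioning network's mirror
      - sed -i 's|http://mirror.provisioning.example/ubuntu|http://archive.ubuntu.com/ubuntu|' /target/etc/apt/sources.list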

My main motivation for that use case is to have somewhat reproducible images which are built in an isolated environment and then deployed separately instead of e.g. installing with an internet connection and the installation image having updates installed during the installation, resulting in a different image each time you run the installation script. If this use case can be supported that would be great.

Makes some sense, although in general we do want people to install at least security updates!

A couple quick comments. I agree with a couple others that I usually need a semi-automated install, answering a few questions here or there that are unique to each installation (static IPs as was mentioned but also disk layout and formatting). I’ve used d-i preseeding in this manner quite successfully up to this point. Would there be a way for the new server installer to support a semi-automated mode like this?

Is it required that a user identity be created during installation? I normally allow root login and don’t populate user accounts until later using configuration management with specific UIDs, GIDs, etc.

Well, anything is possible… If you’re not providing values for the networking and disk layout, what are you providing answers for? Proxy/default repository I guess? Would it be OK to be able to change the default here vs providing a value and not asking the question?

In any case, I’ll keep this use case in mind. I don’t see why it would be particularly difficult to support.

Is it required that a user identity be created during installation? I normally allow root login and don’t populate user accounts until later using configuration management with specific UIDs, GIDs, etc.

Without thinking about this terribly hard, I think I’d prefer to handle this use case by allowing the admin to provide a complete cloud-init config for themselves. Would that make sense for you?

Hi all, I’ve made some updates to the spec to allow providing cloud-init config, and semi-interactive installs. I think looking at https://wiki.ubuntu.com/FoundationsTeam/AutomatedServerInstalls?action=diff&rev2=37&rev1=33 and https://wiki.ubuntu.com/FoundationsTeam/AutomatedServerInstalls/ConfigReference?action=diff&rev2=14&rev1=10 should give a sense of what I’ve changed.
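
To give a rough flavour of the two additions (treat the key names below as illustrative; the wiki pages linked above are the source of truth):

    version: 1
    # ask interactively for just these sections, automate everything else
    interactive-sections:
      - network
      - storage
    # cloud-init config passed through to the installed system, e.g. to
    # skip creating a user and allow root instead
    user-data:
      disable_root: false
      users: []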

I checked out the diffs and the changes look like they’ll meet our needs. It does seem strange, though, to use cloud-init just to avoid having to specify a user identity. But as long as there is a way to do it, I’m happy.

Yes. There are, obviously, advantages and drawbacks to such an approach but, in general, I find that having all related configuration data in a “standard” space helps more than hinders.

This makes sense I guess. Can you give some examples of how you think this might look?

Apologies for the late reply.

Heh, a fair point, but there are a few reasons why you might want to install in a VM environment.

For example, if you want a different filesystem layout/filesystem choice. That’s like a line or two in an installer, more effort with a cloud image/tarball. A use case for that would be e.g. creating a few Ubuntu VMs with different filesystems for benchmarking purposes.

Another slightly oddball use case is installing an image in a VM to be used on a bare metal machine. Installing on bare metal takes a while, mostly due to big servers taking minutes to get to the point where they can even boot Linux. Instead you could e.g. install in a VM environment with an iSCSI disk, then boot the bare metal machine with the disk you used in the VM. It’s a bit crazy, but it does work as long as you’re careful about your disk/network setup in the VM being not too dissimilar from bare metal.

And finally, just from a testing/development point of view surely doing a VM install makes sense?

Understood, a late_command is a viable alternative; it’s just one less thing to do if the installer takes care of it.

Totally, the default should definitely be to have a fully updated system with all security updates applied. It’s really just about having the option not to do that.

Apologies for the late reply.

No worries, I haven’t done any actual implementation work on this yet.

Well I would argue that if you’re using an installer in a VM you’re doing things a bit wrong. Why not use a cloud image?

Heh, a fair point, but there are a few reasons why you might want to install in a VM environment.

Oh sure, I don’t want to say that it’s not a supported thing to do or anything like that. But in general, installing to a VM is a slightly strange way of going about things, so I’m not sure support for VM-only things like logging via a virtio char device is really worth it (that example is also probably not very much effort so maybe it would be!)

For example, if you want a different filesystem layout/filesystem choice. That’s like a line or two in an installer, more effort with a cloud image/tarball. A use case for that would be e.g. creating a few Ubuntu VMs with different filesystems for benchmarking purposes.

Another slightly oddball use case is installing an image in a VM to be used on a bare metal machine. Installing on bare metal takes a while, mostly due to big servers taking minutes to get to the point where they can even boot Linux. Instead you could e.g. install in a VM environment with an iSCSI disk, then boot the bare metal machine with the disk you used in the VM. It’s a bit crazy, but it does work as long as you’re careful about your disk/network setup in the VM being not too dissimilar from bare metal.

I guess these make some kind of sense :slight_smile: Neither necessarily seems like a use case for automated installs though…

And finally, just from a testing/development point of view surely doing a VM install makes sense?

Oh yes, I certainly do this a lot!