Refreshing storage in Anaconda

[Image: anaconda-cola. Remixed version of bottles by Tomas Arad on OpenClipArt.]
You must be thinking: “What do you mean by refreshing storage? I didn’t think you could drink storage?” No, sad to say, this blog post isn’t about the type of refreshment you get from a crisp cold glass of Anaconda Cola (yum!). It’s about manually refreshing anaconda’s view of the storage configuration it’s working with. Why would you want to do that?
Let’s step back a second first. The custom partitioning tool in Anaconda (Fedora’s installer) still has a lot of rough edges in its design in Fedora 18 GA. The Anaconda team has been putting a lot of work into improving it in their Fedora 19 branch. We’ve got a ton of bugs, thoughtful (and some not so thoughtful 🙂 ) comments from forums and blog posts, and some preliminary anaconda usability test data! (You’ll be hearing a lot more about that last bit soon, don’t worry! 🙂 ) The team has pored over all of this information and has had a number of brainstorming sessions and discussions on how to address the identified issues: over IRC, on the mailing list, in bugs directly, and in person.

The problems to solve

Right now, anaconda scans for storage devices once, when it first starts up. If you plug in or otherwise connect additional storage devices afterwards (or disconnect devices that were connected initially), anaconda doesn’t know to rescan and pick up the new information. The only way for it to learn about the change is to rescan the disks. Rescanning, however, erases any configuration you’ve created in the UI, so constantly rescanning the disks is a pretty terrible idea (for other reasons too). If you create an LVM volume group using two disks that you then unplug, for example, and no longer have enough space for the configuration you built on top of it, really all anaconda can do is erase that configuration. Bottom line: folks who decide that they don’t have enough disk space to do what they want and want to swap in a new device have to restart anaconda to get that device recognized.
Another bit of feedback we got is that the custom partitioning UI doesn’t yet let you do a few things that folks want to do. For example, you currently can’t reserve free space within an LVM volume group, and you can’t encrypt individual logical volumes – you can only encrypt an entire volume group in one go. These are all issues on the plate of stuff to be addressed – but until they are, how can users work around any bugs that might be present in the custom partitioning system, or get more control than the user interface allows?
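To make the rescan problem a bit more concrete, here’s a toy sketch in Python. It is not anaconda’s code: the `StorageModel` class, its `queue` and `rescan` methods, and the device names are all made up for illustration. It just shows why a rescan ends up throwing away whatever the user queued up, since those actions were planned against the old device list.

```python
from dataclasses import dataclass, field


@dataclass
class StorageModel:
    """Hypothetical stand-in for an installer's in-memory view of storage."""
    devices: set = field(default_factory=set)    # names found by the last scan
    queued: list = field(default_factory=list)   # (action, device) pairs from the UI

    def queue(self, action, device):
        """Record a planned change; nothing is written to disk here."""
        if device not in self.devices:
            raise ValueError(f"unknown device: {device}")
        self.queued.append((action, device))

    def rescan(self, probe):
        """Replace the device list with a fresh probe and drop queued work.

        Queued actions were planned against the old device list, so the
        simplest safe choice is to discard all of them.
        """
        dropped = len(self.queued)
        self.devices = set(probe())
        self.queued.clear()
        return dropped


model = StorageModel(devices={"sda", "sdb"})
model.queue("create lvmvg fedora", "sda")
model.queue("create lvmvg fedora", "sdb")

# The user unplugs sdb and plugs in sdc, then asks for a rescan.
dropped = model.rescan(lambda: ["sda", "sdc"])
print(dropped, sorted(model.devices), model.queued)
```

A smarter model could try to keep the actions that still make sense after the rescan, but that is exactly the hard part: a volume group spanning a disk that vanished has no obvious repair.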

Working in the tty (or terminal) for More Configuration Control

So, I believe this discussion came up at DevConf last week (a lot of the anaconda team was present at the conference 🙂 ). The proposed answer for the latter problem was to allow users to simply work with the command-line storage tools – stuff like fdisk, parted, mdadm, lvm, etc. The catch: if you pop on over to tty2 and start working on the disks using command-line utilities, anaconda has no way to know that the disks changed out from underneath it. (For live installs, rather than going to tty2, you can just pop open a terminal from the live desktop and do the same thing.)
Now, you could, if you did some elaborate command-line disk modifications on tty2 while anaconda was running, run ‘restart-anaconda’ from the very same command line to get anaconda to restart. This would result in the disks being rescanned (remember, it always scans for disks when it first starts), so all the changes you made on the command line would be picked up and recognized by anaconda. However, you’d lose any other configuration you may have completed on any of anaconda’s other screens and you’d have to re-do it all – boohiss.

Letting the users rescan for disks

So the idea that came out of DevConf to address this was to allow users to simply refresh the disks without having to restart anaconda. Chris grabbed a whiteboard yesterday, and along with Ryan and me, we had a discussion about how we should actually make this possible.
Ryan suggested that since the users are on a command-line anyway, we could make a command – something like ‘anaconda-rescan-disks’ – that would rescan the disks for them and not have to involve the user interface at all. That wouldn’t help users who just plugged a disk in and didn’t have any expertise/interest in manual command-line configuration, though. There’s another case where a user might want to rescan the disks – if they fiddled around in the custom UI, weren’t happy with what they came up with, and wanted to just reset everything back to how it was. So at least, for that case, we’d probably need a UI button. One other thing is that if it was a command line command, how would the users discover it? (Some folks tried this go-to-the-tty-and-muck-around trick of working with the storage using command line utilities already in f18, but didn’t know about the restart-anaconda command.) So we decided this function should probably have a UI presence.

Whiteboarding the design

[Image: whiteboard sketch]
This is the whiteboard we came out of the discussion with. Let’s walk through a bit of the thought process and challenges here, since I don’t think they’re immediately obvious by looking at the whiteboard:

Modifying the custom partitioning screen to accommodate the rescan button

When you add a new feature like this to an existing UI, you’ve gotta figure out how the users actually get to it. Since rescanning for storage is something we want to limit to more experienced users, we felt the main rescan button would best live in the custom partitioning UI. We’ve tried to make the custom partitioning UI the place for folks to go who have more expertise about storage and know how they want to configure their disks.
You can see the custom partitioning screen (AKA “custom p,” word) in the upper right corner of the whiteboard:
[Image: custom partitioning screen, detail of the whiteboard sketch]
We’ve very scientifically devised two personas for this screen, right on the whiteboard, and created a button on the custom partitioning screen for each one’s needs:

  • There’s now a ‘refresh’ button on the button bar of the left-hand-side pane of the custom partitioning screen. This pane is where the detected partitions on the user’s storage device are shown, so we felt that a refresh button that looked like it applied to the partitions would be a logical place to put a button to refresh information about them. The persona ‘kernelguys’ is who we were thinking about when we placed this button – I think it was initially kernel developers who brought up the desire to be able to refresh the storage at DevConf. You could think of ‘kernelguys’ as a pretty good euphemism for ‘really smart people who understand hardware and know what they’re doing.’
  • There’s a ‘reset’ button in the lower right of the custom partitioning screen. The ‘persona’ drawn out here as wanting to use this button is ‘oops ppl.’ These are folks who went through the custom partitioning UI, decided they didn’t like what they did, and want to start over with a clean slate. This button will clear all the customizations they made on this screen and reset it so they can start with a fresh autopart scheme and build on top of that.

So there are some other changes on this screen you may have noticed:

  • We’ve killed the ‘finish partitioning’ button in the lower right corner of the screen. The dichotomy between having dismissal buttons in the lower right and in the upper left is a big point of user confusion we’ve heard loud and clear. We had already removed the ‘continue’ button from the lower right corner of the ‘Disk Selection’ screen that comes right before custom partitioning, but we forgot to remove it on the custom partitioning screen itself. Since we needed a place to put the reset button, this seems like as good a time as any to get rid of the continue button there. When users want to leave this screen, they can click ‘Done’ in the upper left corner. [I know there are probably those of you out there shaking your head because you want the continue buttons in the lower right and you don’t want any buttons in the upper left. This is familiar, and comforting, right? I’m going to explain at the bottom of this post why I don’t think that’s a good idea, and I hope it’ll make some sense to you.]
  • When you leave this screen via the ‘Done’ button in the upper left, you’re now taken to a summary screen. We’ve gotten a lot of feedback from folks who aren’t comfortable that they understand exactly what custom partitioning is going to do and who want finer-grained details on what exactly is going to happen to their disks. A common use case here is, ‘OMG I *love* my partition XYZ. Please, please, just don’t touch partition XYZ. Do whatever you want to the others – I want assurances partition XYZ hasn’t a hair on its head ruffled!’ (Okay, I know partitions aren’t hairy. You get the gist.) Anyway, based on that feedback, adamw had the idea to create a summary screen. So that’s what we’ve done – any time you exit the custom partitioning screen you’ll get this summary.

Okay, now that we’ve gone through the whiteboard version, here’s the Inkscape version of this mockup:
[Mockup: 01-custom-part_rescan-disks]

The summary screen

So as previously mentioned, rather orthogonal to the cases where users need to refresh their storage devices, it was suggested we have a summary screen showing all the changes you’ve queued up via your work in the custom partitioning UI so you’ll know exactly what’s going on.
Let’s talk about this a little bit. So ideally, I think, in the land of unicorns and magical elves, we could have a system of interactive diagrams for both the physical and logical layout you’ve configured and let you click through it to check your work. We don’t really have the time needed to implement something like this and get all the fixes and other usability updates done in the UI right now. We do have the storage.log, which lists out all of the storage actions the user’s queued up in execution order. It looks something like this:

    03:58:40,526 DEBUG storage: action: [6] Destroy Format ext4 filesystem on lvmlv fedora-root (id 17)
    03:58:40,526 DEBUG storage: action: [7] Destroy Device lvmlv fedora-root (id 17)
    03:58:40,526 DEBUG storage: action: [4] Destroy Format ext4 filesystem on lvmlv fedora-home (id 16)
    03:58:40,526 DEBUG storage: action: [5] Destroy Device lvmlv fedora-home (id 16)
    03:58:40,526 DEBUG storage: action: [2] Destroy Format swap on lvmlv fedora-swap (id 15)
    03:58:40,526 DEBUG storage: action: [3] Destroy Device lvmlv fedora-swap (id 15)
    03:58:40,527 DEBUG storage: action: [8] Destroy Device lvmvg fedora (id 13)
    03:58:40,527 DEBUG storage: action: [9] Destroy Format lvmpv on partition IMSM00_0p2 (id 12)
    03:58:40,527 DEBUG storage: action: [10] Destroy Device partition IMSM00_0p2 (id 12)
    03:58:40,527 DEBUG storage: action: [0] Destroy Format ext4 filesystem on partition IMSM00_0p1 (id 11)
    03:58:40,527 DEBUG storage: action: [1] Destroy Device partition IMSM00_0p1 (id 11)
    03:58:40,527 DEBUG storage: action: [11] Destroy Format msdos disklabel on mdbiosraidarray IMSM00_0 (id 10)
    03:58:40,527 DEBUG storage: action: [12] Create Format msdos disklabel on mdbiosraidarray IMSM00_0 (id 10)
    03:58:40,527 DEBUG storage: action: [15] Create Device partition IMSM00_0p1 (id 19)
    03:58:40,527 DEBUG storage: action: [16] Create Format ext4 filesystem mounted at /boot on partition IMSM00_0p1 (id 19)
    03:58:40,527 DEBUG storage: action: [13] Create Device partition IMSM00_0p2 (id 18)
    03:58:40,527 DEBUG storage: action: [14] Create Format lvmpv on partition IMSM00_0p2 (id 18)
    03:58:40,527 DEBUG storage: action: [17] Create Device lvmvg fedora (id 20)
    03:58:40,527 DEBUG storage: action: [22] Create Device lvmlv fedora-swap (id 23)
    03:58:40,527 DEBUG storage: action: [23] Create Format swap on lvmlv fedora-swap (id 23)
    03:58:40,528 DEBUG storage: action: [20] Create Device lvmlv fedora-home (id 22)
    03:58:40,528 DEBUG storage: action: [21] Create Format ext4 filesystem mounted at /home on lvmlv fedora-home (id 22)
    03:58:40,528 DEBUG storage: action: [18] Create Device lvmlv fedora-root (id 21)
    03:58:40,528 DEBUG storage: action: [19] Create Format ext4 filesystem mounted at / on lvmlv fedora-root (id 21)

We didn’t want this to turn into a multi-week project (it’s not as high priority as other stuff on the team’s plate), but we still wanted something that could serve well enough now and be improved over time. So what we decided to do was basically clean up the storage log into a format that users can sort by whatever criteria they’re interested in. Going with the idea that there’s going to be a Nervous Nelly wanting to make sure her /home partition isn’t touched at all: she could sort by affected partition name in alphabetical order, look for home, confirm it’s not going to be touched, and hopefully feel a little bit better. 🙂
We had a bit of a back-and-forth about which buttons we should have here and what those button labels should say. Originally, as you can see in the whiteboard sketch, we had three buttons – one is labeled ‘reset’ in the whiteboard sketch. We thought about giving users the option – should they review the summary and become horrified – to undo all the custom partitioning things they did, reset the partitions to the autopart configuration, and allow the user to retreat back to the main menu. I decided to drop that button though. I’ll tell you why: with dialogs like this, I think the majority case is you have two buttons – one affirmative, one negative. When you put three buttons in the mix, the dialog becomes a bit more unfamiliar and a little bit less intuitive and will take more time and effort on the user’s part to interpret. I also don’t think it would be so common for a user to opt into custom partitioning, muck with it, hate what they did, and give up and just do autopart. Even if they did – we just added a reset button to the custom partitioning screen, so they don’t need this summary dialog to reset what they did. They can cancel out of this dialog, hit reset, and hit done and be done with it. That path is a little bit longer, but it should simplify the dialog for who I think are the vast majority of users who will go through this path.
The other thing we worried about is what the two remaining button labels should say. If the left button said ‘Cancel,’ it’s not super-clear where you’re going to be whisked off to if you hit it. So we decided to make it a long label (sorry my German-speaking friends!!) and have it say: ‘Cancel & Return to Custom Partitioning.’ For the affirmative button, the label text there is a bit tricky. It first said, ‘Commit Changes.’ Chris very astutely pointed out that label is a lie – we don’t actually commit anything to disk until you hit ‘Begin Installation’ on the main menu screen. I had suggested ‘Queue Changes’ at that point, but it felt really wrong when it was on the screen – it seemed to make the screen more complex again, and would require more concentration on the user’s part to figure out what was going on since ‘Queue’ isn’t a simple affirmative verb. Brian suggested ‘Apply Changes,’ and that seemed accurate enough and affirmative enough that we went with it.
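For the curious, here’s roughly what “clean up the log and sort it by partition name” could look like, as a small Python sketch. This isn’t how anaconda does it: the regular expression is only inferred from the log excerpt above (the real log may have variations it doesn’t handle), and the names `Action`, `parse_actions`, and `touching` are made up for this example.

```python
import re
from collections import namedtuple

# One queued storage action, as read from a log line.
Action = namedtuple("Action", "order verb kind target detail")

# Pattern inferred from the sample lines in this post; an assumption, not a spec.
ACTION_RE = re.compile(
    r"action: \[(?P<order>\d+)\] (?P<verb>Create|Destroy) "
    r"(?P<kind>Format|Device) (?P<detail>.*) \(id \d+\)$"
)


def parse_actions(lines):
    """Turn storage.log action lines into Action records, skipping other lines."""
    actions = []
    for line in lines:
        match = ACTION_RE.search(line.strip())
        if match is None:
            continue
        detail = match.group("detail")
        # In the sample lines, the device name is the last word before "(id N)".
        target = detail.split()[-1]
        actions.append(Action(int(match.group("order")), match.group("verb"),
                              match.group("kind"), target, detail))
    return actions


def touching(actions, name):
    """Return the actions whose target device is exactly `name`."""
    return [a for a in actions if a.target == name]


sample = [
    "03:58:40,526 DEBUG storage: action: [6] Destroy Format ext4 filesystem on lvmlv fedora-root (id 17)",
    "03:58:40,526 DEBUG storage: action: [5] Destroy Device lvmlv fedora-home (id 16)",
    "03:58:40,527 DEBUG storage: action: [15] Create Device partition IMSM00_0p1 (id 19)",
]

actions = parse_actions(sample)
# Sort alphabetically by affected device (ignoring case), then by action number.
by_device = sorted(actions, key=lambda a: (a.target.lower(), a.order))
for a in by_device:
    print(f"{a.target:12} {a.verb:8} {a.kind:7} {a.detail}")

home_actions = touching(actions, "fedora-home")
swap_actions = touching(actions, "fedora-swap")
```

With something like this, Nervous Nelly’s question becomes “is the list for my partition empty?”, which is a lot easier to answer than reading the raw log in execution order.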
[Mockup: 06-summary-changes]

Okay, so how do you refresh your disks?

We got a little sidetracked with the summary screen. What happens when you press the ‘refresh’ button in the left-hand-side of the custom partitioning screen? Since it’s a destructive action (you lose basically all configuration you did in the custom partitioning screen when we refresh the disks) and because it’s not a cancel-able action and can take a while (I don’t know all of the technical details here, but once we initiate a disk scan we can’t stop it until it finishes), we decided to make the button pop up a lightbox dialog to prepare the users as to what’s going to happen before the scan starts. Here it is on the whiteboard:
[Image: rescan dialog, detail of the whiteboard sketch]
So I modified this a little bit from what the whiteboard says. We came up with four discrete blocks of text to go on this screen when working at the whiteboard:

  • You can go to a tty and tweak knobs.
  • All UI changes you made to storage will be dropped.
  • You will have to reselect the disks for installation & will go to disk selection after this is complete.
  • You can plug disks in or out at this point.

After some discussion, we decided to drop that first point – folks who know how to use lvm on the command line, for example, don’t need instructions telling them how going to a tty works or to open up a terminal. They already know. We can probably document this in the user documentation guide, but contextual documentation for this – which can be a dangerous thing to do if you do things wrong – is probably not the right call so we dropped that text.
The text about UI changes getting dropped was done up as a warning label, and the note about going to disk selection afterwards is in the ‘finished scanning’ message. Here’s how this screen came out in the Inkscape mockups (these are a progression):
[Mockup 1 of 4: 02-rescan-disks_before]
[Mockup 2 of 4: 03-rescan-disks_progress1]
[Mockup 3 of 4: 04-rescan-disks_progress2]
[Mockup 4 of 4: 05-rescan-disks_finish]
Oh yeah, there’s one thing I forgot to mention about the third screen above! We came up with this principle, when we first decided to redesign anaconda’s UI to the hub and spoke model, that we would try to never trap users in a spoke and always let them leave. The tricky part here is that you can’t cancel a disk scan mid-scan. So what we decided to do – if the rescan takes a certain amount of time (maybe 5-10 seconds), we’ll pop up a link that will let the users bail out and go back to the main menu / hub and do other stuff while the scan completes. They’ll still need to revisit the storage spoke – it will have a warning message notifying them the scan has completed and they need to go back and check on it – but they won’t be trapped at the least.
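Here’s a minimal Python sketch of that “don’t trap the user” idea, under a few assumptions: the scan runs in a background thread, it can’t be interrupted, and the UI only waits a limited time before offering a way out. The function name `start_scan`, the `bailout_after` parameter, and the fake `slow_scan` are hypothetical, and the timings are shortened so the example runs quickly.

```python
import threading
import time


def start_scan(scan, bailout_after):
    """Run `scan` in the background and wait up to `bailout_after` seconds.

    Returns (done, finished_in_time). `done` is an Event that is set once the
    scan ends. If `finished_in_time` is False, the scan is still running and
    the caller can offer a link back to the hub; the scan is never cancelled.
    """
    done = threading.Event()

    def worker():
        try:
            scan()
        finally:
            done.set()  # signal completion even if the scan raised

    threading.Thread(target=worker, daemon=True).start()
    finished_in_time = done.wait(timeout=bailout_after)
    return done, finished_in_time


def slow_scan():
    time.sleep(0.3)  # stand-in for a real disk scan


done, finished_in_time = start_scan(slow_scan, bailout_after=0.05)
if not finished_in_time:
    print("Scan still running: offer a link back to the main menu")

done.wait()  # later on, the storage spoke learns that the scan completed
print("Scan complete: ask the user to revisit the storage spoke")
```

The point of the design is that the timeout only changes what the user is offered, not what the scan does: the scan always runs to completion, and the warning on the storage spoke covers the case where the user wandered off in the meantime.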
You can also see in these mockups we had a bit more button heartburn. Since the affirmative button on this screen doesn’t actually do anything – it’s the ‘Rescan Disks’ button that does something – we decided to go with a simple ‘OK’ label. Normally I try to avoid ‘OK’ buttons since they don’t really tell you what’s happening. In this case, though, the user is literally reading the screen, seeing the progress made, and acknowledging it – that’s it. ‘OK’ doesn’t take any action, so we figured it was ‘OK’ to use here. Ha, ha, ha.

A note about buttons in the upper left vs buttons in the lower right

We’re following a hub and spoke model with anaconda rather than a linear wizard-based model. What this means is that the movement between a hub and a spoke is not a left-to-right linear movement – it’s a down and up movement. You move down from the main menu into a spoke, and then you move back up to the main menu when you’re done with the spoke. This is why I don’t think it makes sense to have the buttons to dismiss a screen in the lower right for spokes. We were inconsistent in the first version of the anaconda UI – the storage spoke was basically a linear wizard spoke inside a hub and spoke model, which I’m afraid was crossing the streams a bit too much. In newer builds of anaconda you’ll note that the storage spoke is now one screen, and going to custom partitioning is sort of a separate spoke only accessible from the storage selection screen. So far, working with the UI redone in this manner does feel a lot more natural. I’m hoping over time as folks get used to the installer being hub and spoke we’ll get a better feel for how well this works.

Well…

I hope this all makes sense. Let us know what you think (politely and respectfully, please 🙂 ) in the comments.

15 Comments

  1. Olivier

    Why not use udev events to know when disks have changed instead of forcing the user to manually force a refresh? I’m not sure why you need to rescan everything since recent Linux kernel will tell you exactly what has changed.

    • mairin

      We could probably do that for disk plug-in/plug-out detection – that’s a question for the devs – but for users running commands in the terminal outside of the UI, we can’t automatically detect whether or not there’s now a 4-disk RAID array set up from the command line, if that makes sense?

      • Olivier

        I believe you actually can figure that out, as it creates a kernel object which is then exposed through udev. You definitely should get the installer people in contact with Kay (the udev maintainer, also at Red Hat).

  2. Lennart

    Uh, oh. This is just wrong. There’s really no point at all for having a UI for “Refresh”.
    libudev will deliver signals whenever a device is removed or added or when its metrics change. When udisks was created the folks involved carefully made sure that events are generated automatically when fdisk changes the partition table and similar triggers.
    Really, having a manual “Refresh” is technically wrong and bad UI too: you really should not add UI where the system can just figure things out on its own anyway.
    The whole idea of “rescanning” what is found, rather than subscribing to hotplug events is really broken, anaconda shouldn’t do that. It’s the same brokenness that makes LVM so utterly broken.
    Lennart

    • mairin

      Well, hm, if we ditch the refresh UI and make it autodetect devices, we still need a mechanism to reload the storage configuration for the users doing stuff in the tty/terminal – but maybe at that point it could be a command line command they run?

      • We do have to make sure Lennart and Olivier are right, too. Anaconda supports a *lot* of storage technologies; we’d need to be sure that all possible changes to all supported storage devices can be detected without needing a manual rescan.
        I do worry that the ‘rescan’ option introduces yet more complexity to the UI for a genuine but not _huge_ benefit – you _can_ always just reboot – but that’s just me being a negative nancy, it looks like you figured out a lot of the hairy stuff.
        My internal proofreader wants to get out that you ‘pore over’ information (not ‘pour over’, in the blog post text) and that changes take ‘effect’ not ‘affect’ (“Refresh #2” screenshot).
        Thanks for the article, great stuff as always.

        • Nick Coghlan

          Thanks for the write-up, very interesting.
          As far as the observation that manual refresh shouldn’t be necessary goes, I think it makes sense to hit the problem with the “manual refresh” button in the short term, and then over time try to update Anaconda to correctly register for storage hotplug events and pick them up automatically. That way there’s always a fallback option that covers any cases where the event detection falls short.
          There’s merit in having a manual refresh button anyway – even if Anaconda is collecting the events behind the screen, it can be useful to display a discreet notification saying “storage details have changed, hit Refresh to see them” and delay the screen update until the user says “let me see the changes”.
          And Adam already unleashed his inner copy editor, so I don’t have to (I only noticed the s/affect/effect/ though).

        • Lennart

          Adam, storage technologies which do not generate udev events are broken. And these udev events are hardly a new thing, and in the context of udisks most (all?) storage systems got fixed for this where that was necessary.
          Really, this isn’t new stuff, Anaconda would not be the first software relying on these events, and hence things should just work.
          And even if there was still some broken stuff out there, we’d better invest the bit of engineering to fix that where that’s necessary, rather than invest ill-directed engineering in adding features to the anaconda UI that really shouldn’t exist in the first place.

      • Lennart

        Mairin, tools like fdisk and suchlike already do generate the necessary events. Really, there is *no* need at all for manual reloads of any kind whatsoever. For everything we have udev events. You get them when things are repartitioned, when devices are added/removed, on media change, and everything else.
        Please, forget about the manual refresh, it’s a very bad, unnecessary idea.

        • mairin

          Lennart, this isn’t something that’s going to be there forever, it’s just a temporary stopgap until the storage library (blivet) sends out the events (which it apparently is not sending out now.) That work is slated to be done but not in the timeframe we need to address rescanning the disks after manual manipulation in. Once blivet gets the support it needs to send out the udev events then we’ll pull the refresh dialog and button out (except for the reset button, which I think is still needed. And the summary dialog, that also will stay.)
          I definitely agree this is not ideal, and I really hate having users do manual things when the computer should be able to just automatically take care of it for you, but at the same time we do have to deal with the reality of the stuff on the development team’s plate and their priorities and what we can do in the time we have until the next release, if that makes sense. There are a lot more egregious and higher-priority bugs and missing features that need to be worked on and finished before the udev work can happen.
          So yeah, this is a dirty UI hack, and sometimes it sucks to have to make a more pragmatic call and do something like this, but there are definitely plans on the table to do this right, we just wanted to make sure users could get around the issue in the meantime as that work happens.
          I hope this makes sense :-/ even if you don’t agree with it.

  3. Gary Scarborough

    May I suggest that somewhere in the install process, possibly on the “install destination” screen, that you list the current install size of the packages selected? It would make it easier to judge partitioning to know how much space you are going to need.

  4. Pingback: Interview: The Fedora Project’s Máirín Duffy | Fedora Magazine

  5. My problem with Anaconda is the combination of GPT partition table and BTRFS not well supported. Maybe it’s fixed in the meantime, I am using CentOS7 and here it works either with fdisk partition table or not at all. No idea why.
