Command Line Interfaces

Yesterday I saw this thread on Twitter by @thingskatedid about Kitty, a featureful terminal emulator.

I use the terminal a lot. My normal working environment is:

  • herbstluftwm
  • emacs
  • firefox
  • thunderbird
  • Lots and lots of terminal windows

Up until now, I'd been using alacritty as my terminal of choice, after frustrations with getting xterm to handle Unicode properly, but I've moved over to kitty now - as far as I can tell it's a superset of alacritty, at least for the features I actually use.

So, why was I excited about kitty? Having proper graphics in the shell is a tiny step closer towards what I'd really like to have, but perhaps as far as can be done without ripping up a whole lot more infrastructure... Let me explain.

Wouldn't it be cool if graphical apps could just use escape sequences to switch to doing graphics through standard input and output? Currently, when you start X applications, they look for an environment variable called DISPLAY and find within it the connection details they need to open a socket. They then leave their terminal input and output unused, or just send debug logging to it. The thing is, there's no reason to have two separate systems for terminal versus graphical interaction, and it creates a lot of complexity. If I ssh to another machine, ssh has to pass my terminal interface over, and also do extra work to forward the X protocol, if apps I start on the remote machine are to find their way to my user interface.
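To make the status quo concrete: DISPLAY packs those connection details into a little string of the form host:display.screen, and every X client starts by picking that apart. A minimal sketch of the parsing (the helper function is mine, purely for illustration):

```python
import os

def parse_display(display):
    """Split an X11 DISPLAY string like 'host:0.1' into its parts.

    Returns (host, display_number, screen_number); an empty host
    means "connect to the local server over a Unix socket".
    """
    host, _, rest = display.rpartition(":")
    number, _, screen = rest.partition(".")
    return host, int(number), int(screen) if screen else 0

# Roughly what an X client does at startup before opening its socket:
host, num, screen = parse_display(os.environ.get("DISPLAY", ":0"))
```

Note how none of this involves the terminal the app was started from - the terminal and the display are entirely separate channels.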

That's more work for programmers other than me, but it affects me too - if I start an X application from a terminal, that terminal sits there blocked, providing an unneeded and unused terminal interface to an app that has fired up a separate X connection to control a separate window.

A terminal emulator could provide complete control of the contents of its window via appropriate terminal escape sequences, including the mouse and clipboard. It could even allow opening popup windows and dialogs! If X toolkits supported this when they detected a compatible terminal, then I wouldn't need ssh X11 forwarding!
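Terminals already do a limited version of this for the mouse, via xterm's escape sequences: a DECSET pair turns reporting on, and each click then arrives on standard input as an SGR-encoded report. A rough sketch of that existing mechanism:

```python
import re

# xterm-style sequences that real terminals already honour today:
ENABLE_MOUSE  = "\x1b[?1000h\x1b[?1006h"  # report clicks, SGR encoding
DISABLE_MOUSE = "\x1b[?1006l\x1b[?1000l"

SGR_MOUSE = re.compile(r"\x1b\[<(\d+);(\d+);(\d+)([Mm])")

def parse_mouse_report(data):
    """Decode one SGR-encoded mouse report, e.g. ESC [ < 0 ; 10 ; 5 M.

    Returns (button, column, row, pressed); columns and rows are 1-based.
    A final 'M' means button down, 'm' means button up.
    """
    m = SGR_MOUSE.match(data)
    if not m:
        return None
    button, col, row, final = m.groups()
    return int(button), int(col), int(row), final == "M"
```

The proposal here is essentially: extend this family of sequences until it covers pixels, popups and the clipboard as well.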

(For extra neatness and simplicity, this same interface can be provided by the OS graphics drivers to the full-screen console - so that the windowing system is just a program that runs on a graphical console and provides a bunch of smaller virtual graphical consoles in windows, so you can run the windowing system inside a window (handy if you're developing it)... But that would be a big change for little return now; it's a bit too late.)

This would simplify a bunch of infrastructure, but most importantly, it would end the sharp barrier between "graphical apps" and "terminal apps". Apps could easily be a bit of both, blending graphics and textual interaction without needing to be a graphical app with a whole load of terminal emulation logic built into themselves.

Terminal apps fall into three categories:

  • Ones that just read in or write out some data. You can use them in shell pipelines, or you can interact with them directly.
  • Ones that are interactive yet still fit the terminal paradigm naturally, because they're all about a dialogue with the user: print a prompt, let the user enter something, print some results, repeat.
  • Ones that take control of the window, but aren't graphical apps purely because they chose not to be. The irssi IRC client I mentioned before, tools like top, emacs run in a terminal, etc.

The first group benefits from a combined terminal if what they input or output happens to be graphical in nature. That's what kitty is being used for in Kate's Twitter thread; she's running commands whose output is hard to display as text but very readable as a graphical object in the terminal. We're already winning at that, although kitty's support is rather primitive - basically compositing bitmaps only, rather than things like drawing lines and filling rectangles; nonetheless, it's sufficient for some simple but useful cases.

The second group can benefit, too. I could write apps that write graphics directly as part of an interactive process, using kitty's protocol for displaying images. I can add that capability to existing apps without rewriting them as graphical apps.
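Kitty's graphics protocol wraps base64-encoded PNG data in APC escape sequences, so "printing a picture" really is just writing bytes to standard output. A sketch of building such a sequence (the control keys a=T and f=100 and the 4096-byte chunking come from kitty's documented protocol; the helper function itself is my own):

```python
import base64

def kitty_show_png(png_bytes, chunk=4096):
    """Build the kitty graphics-protocol escape sequences that display
    a PNG inline (a=T: transmit and display, f=100: PNG format).

    The base64 payload is split into chunks of at most 4096 bytes,
    with m=1 on every chunk except the last, as the protocol requires.
    """
    payload = base64.standard_b64encode(png_bytes)
    out = []
    first = True
    while payload:
        head, payload = payload[:chunk], payload[chunk:]
        keys = "a=T,f=100," if first else ""
        keys += "m=1" if payload else "m=0"
        out.append(b"\x1b_G" + keys.encode() + b";" + head + b"\x1b\\")
        first = False
    return b"".join(out)

# e.g.: sys.stdout.buffer.write(kitty_show_png(open("plot.png", "rb").read()))
```

Any existing command-line tool could adopt this with a few lines of output code, without becoming an X client.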

And the final category doesn't need to exist any more, because it's no longer easier or more convenient to write a textual app than a graphical app (well, assuming that graphical terminal sequences become widely supported, even in the hardware console drivers in the OS, which is a big if, but you know what I mean). Existing textual apps could integrate more and more graphical stuff with time and become graphical apps if it made sense for them to do so.

Going higher level

But this is still a bit too low-level for a good user interface system. We don't really want apps all interacting with the user interface system at the level of keypresses, mouse events, pixels and characters. Moving from textual to graphical isn't that big of a change, really; it would be cool to go further.

CLIM, the Common Lisp Interface Manager, was a user interface toolkit based around arbitrary objects (and it's now making a comeback). This means that kitty-style interaction that outputs graphics just happens by outputting an object that's graphical instead of a string, but it also means the system knows what type things are - objects like pathnames and URLs and numbers on the screen maintain their identity, rather than just being a bit of text or graphics that, at best, might be recognised via a regular expression. So when a command expects a pathname as an argument, the system can know this as the command is being typed, and automatically highlight pathnames onscreen for selection with a single mouse click. A command might expect an image as an argument, and the system will then know to offer a file picker or an interface to take a picture from a camera.
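The core idea can be sketched in a few lines (this is a toy Python illustration of the concept, not CLIM's actual API): everything drawn on screen remembers the object it presents, not just its printed text, so the UI can later find every on-screen object of the type a command wants.

```python
from pathlib import Path

class Screen:
    """Toy model of CLIM-style "presentations"."""

    def __init__(self):
        self.presentations = []   # (displayed text, underlying object) pairs

    def present(self, obj):
        # Display the object, but keep hold of the object itself.
        self.presentations.append((str(obj), obj))

    def selectable(self, wanted_type):
        # When a command wants, say, a pathname, the UI can highlight
        # every on-screen object of that type for one-click selection.
        return [obj for _, obj in self.presentations
                if isinstance(obj, wanted_type)]

screen = Screen()
screen.present("some log output")
screen.present(Path("/etc/hosts"))
screen.present(42)

paths = screen.selectable(Path)  # only the pathname is offered
```

A plain terminal throws the right-hand side of those pairs away the moment it renders the text; keeping it is what makes the rest possible.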

Ideally, I'd like even fully graphical non-interactive applications to expose their notion of "command" to the user interface system. Most apps have an internal system of commands, and let you configure menus/toolbars/key bindings by mapping to these internal "commands", and maybe even let you create macros by combining commands, and so on. In my opinion, it sucks that apps have to do that individually, even if they use common toolkit libraries to each do it in their own process space without rewriting all the code. Why can't apps expose an object/command model to the user interface system, so that I can record and replay macros across apps? I have a keyboard with Copy and Paste keys on it, but I need to bind those keys individually in every app or they won't work - but if the user interface system could directly access the command system of apps, then I could bind those keys to the Copy and Paste actions globally across all apps that had those commands. If an app is expecting a pathname to be input, then valid pathnames should be selectable across all apps.

Plus, exposing the command interface would mean that assistive technologies can be built into the user interface system rather than individual apps, which would be a great boon for accessibility!

Use cases for building user interfaces with ARGON are a long way off - but I'm taking notes...


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales