Why the big rewrite?
January 23rd, 2024
Last week was my first week at Zed. I joined right as the team was preparing for this week's release, which they referred to as "Zed 2"—big things in the air. The release marks the end of the team's multi-month rewrite of Zed's UI framework GPUI from version 1 to version 2. All hands on deck, everybody fixing the last bugs, polishing Zed, and I, the newcomer, had so many questions: why the big rewrite? how did you pull this off? what does it get you? how did you organize it?
Luckily Nathan, Max, and Antonio—Zed's three co-founders—were happy to sit down with me and answer my questions.
What follows is an editorialized transcript of an hour-long conversation we had about the rewrite from GPUI 1 to 2, what the risks were, why GPUI 1 had to be changed, how the team was organized around it, and why beauty in code matters. I tried to preserve intent and meaning as much as possible, while getting rid of the uhms, the yeahs, the half-sentence questions, the backtracking, and the over-talking that make up a reflective and in-depth conversation.
(You can watch the full conversation on our YouTube channel.)
Thorsten: The upcoming Zed release is a big rewrite. You switched from GPUI 1 to GPUI 2. The question I have is: why the big rewrite? It sounds like it was a big, multi-month thing, and we've all heard one shouldn't ever do the big rewrite. So why did you?
Nathan: I agree. I've heard that, I know that. But I'm a very rewrite-driven programmer, I guess. It's a big part of my process.
When you've lived with a system long enough, it's very difficult to escape the fundamental DNA that's baked into that system: I would fix A if not for B, and I would fix B if I'd already fixed A. Things become almost deadlocked. Decisions become deadlocked on other decisions.
First of all, the central lesson that I learned early in my career is that people often underestimate what they can achieve incrementally. I'll just say that. The original sin of most developers is not realizing that you can increment your way there. I certainly still have a tendency to run afoul of that, to underestimate what can be achieved incrementally.
But I also think there's a tremendous power, after you've really learned a lot about a problem through building V1, to set yourself free in a single Gordian-knot-slicing chop of all of those constraints and encumbrances that are deadlocked on each other and set everything free, liquify the system as it were, and then reclarify the core ideas that are there.
I think we took on a ton of risk and we may have gotten to slightly better places with GPUI1 in less time by doing it incrementally, but I don't think we could have gotten to as good a place as we are now. We would have spent too much time working around things that didn't even need to be in our way to begin with. That, in my mind, is the virtue of a rewrite.
Thorsten: What made you start this rewrite?
Nathan: It dawned on me slowly. It began as a very specific solution to a very specific problem, and that problem involved Derek, a contract designer that we brought on. He came in saying, "I love to get my hands dirty. I love to build UI." He was working with the Tailwind CSS team. He's a really good hands-on carpenter of design, and he just couldn't get any traction in the old framework.
We'd had the same issues with Nate. He felt that he could do things in Figma, but then he had to try to get other engineers to implement them. I think that was really creating communication bottlenecks and just weird organizational dynamics.
It dawned on me that this is a technical problem. That was the beginning of it. Oftentimes, I think a failure mode of engineers is that they try to apply technical solutions to human problems. But in this case, I really do think it was a human problem that just didn't need to exist if a more effective technical solution were in place.
Nate has a different skill set than me. I couldn't whip out a beautiful looking site or UI design as fast as he could, although I think I can get by. And Nate can get by in code, he's learning more and more every day, but we weren't making that easy for him.
Antonio: And for other engineers too, right? I remember anytime any of us had to touch UI code, it was always: oh my God, I hate this, why is this so frustrating? There was so much friction. The whole team was feeling it, not just Nate.
Nathan: We felt paralyzed, I think.
Thorsten: I've only seen the newest version, GPUI2. It's this Flexbox-inspired DSL with divs and .child and a model & element framework that manages state. What did GPUI1 look like? What was so painful about it?
Max: It was inspired by something completely different from HTML. It was based on the Flutter model of layout. It had a different, a little bit more restricted model of how elements are laid out. There was a single layout pass up and down the tree where parent nodes pass constraints to their children and then children decide solely based on that what their concrete size is going to be. It's strictly less powerful than the constraint-solver-based way with which Flexbox works.
Nathan came up with this adaptation of Flutter's model that we got by with for a while. One problem was you had to adopt that mental model of constraints coming down, sizes coming up.
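The constraints-down, sizes-up model Max describes can be sketched in a few lines of Rust. This is a minimal illustration under our own assumptions: Constraints, Size, Element, FixedBox, and Column are hypothetical names, not GPUI1's actual types. A parent hands each child a range of allowed sizes, the child picks a concrete size from that range alone, and the parent combines what comes back in a single pass.

```rust
// Hypothetical types for illustration only; not GPUI1's actual API.

/// What a parent passes down: the range of sizes a child may take.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Constraints {
    min_width: f32,
    max_width: f32,
    min_height: f32,
    max_height: f32,
}

/// What a child passes back up: its concrete size.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Size {
    width: f32,
    height: f32,
}

impl Constraints {
    /// Clamp a desired size into the allowed range.
    fn constrain(&self, desired: Size) -> Size {
        Size {
            width: desired.width.clamp(self.min_width, self.max_width),
            height: desired.height.clamp(self.min_height, self.max_height),
        }
    }
}

trait Element {
    /// One pass: receive constraints, return a concrete size.
    fn layout(&mut self, constraints: Constraints) -> Size;
}

/// A leaf with a preferred size; it decides solely from the constraints.
struct FixedBox {
    preferred: Size,
}

impl Element for FixedBox {
    fn layout(&mut self, constraints: Constraints) -> Size {
        constraints.constrain(self.preferred)
    }
}

/// Stacks children vertically: constraints go down, sizes come up.
struct Column {
    children: Vec<Box<dyn Element>>,
}

impl Element for Column {
    fn layout(&mut self, constraints: Constraints) -> Size {
        let mut width: f32 = 0.0;
        let mut height: f32 = 0.0;
        for child in &mut self.children {
            // Each child may use at most the height that is still left.
            let child_constraints = Constraints {
                min_width: 0.0,
                max_width: constraints.max_width,
                min_height: 0.0,
                max_height: (constraints.max_height - height).max(0.0),
            };
            let size = child.layout(child_constraints);
            width = width.max(size.width);
            height += size.height;
        }
        constraints.constrain(Size { width, height })
    }
}
```

With a 40-unit height budget, a column holding a 20-tall and a 30-tall child squeezes the second child to 20. The child has no way to negotiate, because it only ever sees the constraints it was handed.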
But the other problem was our solution for trying to make the UI development workflow tractable with Rust compile times. We had this combination of Rust code for defining the element tree and we had what we called themes at the time, which were not exactly themes, but giant JSON blobs that contained every single property of every element.
We would load them at runtime. We had a TypeScript file that described every property of every element, and we used some amount of TypeScript abstraction at that layer to reduce duplication and have some idea of common styles. We would then pre-compile it all to JSON, which the app loaded at runtime. We thought that this would allow us to tweak the UI at runtime and have elements resize.
But what it ended up doing was creating this split between the way the UI was expressed in the Rust code that sets up the elements and the TypeScript code that had to apply styles to every element. It ended up kind of not working out as we had hoped in terms of being able to dynamically style things. It was too complicated.
In theory, I thought it sounded good. We thought we could make it so that you could have UI themes and some developer could come write a new JSON blob that would completely restyle every element in the app, like define something like the Material theme for Zed all in JSON and everything would be configured at runtime.
But it ended up being just bad in practice. And everyone on the team felt that. Loading things at runtime didn't make us faster at developing.
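To make the split concrete, here is a minimal sketch, assuming the theme blob has already been parsed into a flat map from property paths to numbers. The key names and the load_theme and style_value functions are hypothetical, not Zed's actual theme format. The Rust element code can only refer to a property by string, so a misspelled key still compiles and only surfaces at runtime as a missing value.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the Rust/theme split; not Zed's real theme format.

/// Stand-in for the runtime-loaded JSON blob, already parsed into a flat map.
fn load_theme() -> HashMap<String, f32> {
    let mut theme = HashMap::new();
    theme.insert("panel.entry.padding".to_string(), 4.0);
    theme.insert("panel.entry.height".to_string(), 22.0);
    theme
}

/// Element code has to know the exact key. The compiler cannot check it,
/// so a typo only shows up as `None` when the UI is actually built.
fn style_value(theme: &HashMap<String, f32>, key: &str) -> Option<f32> {
    theme.get(key).copied()
}
```

Expressing styles as typed Rust data instead would turn the same typo into a compile error.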
So I remember, prior to Nathan starting on the development of GPUI2, my experience was that we kind of needed to start over with our assumptions of what the workflow for building UIs in Rust should be.
Thorsten: Who proposed the rewrite? Did someone say "let's do a rewrite" and everybody went "let's go! this sounds good!" Or was there skepticism?
Max: I think we were at a certain point in the development of Zed where we had already developed a system and then did a lot of work for a while that didn't involve lots of buttons and forms and UI and stuff. It sort of sat at rest for a long time.
And Nate, the designer, got used to the situation and we hired a bunch of people and they went and did work that didn't involve new UI. So the problem was there but it just didn't matter for a long time.
But then, all of a sudden, we wanted to build channels and different collaboration features. And everyone suddenly said, "hey, how do I even do... how do you guys do this?" And the answer was, "We don't have a good flow."
There were multiple concrete instances of this problem. And I think Nathan, who had created this initially, began to feel, "Everyone's asking me how to do this. And I haven't had a chance to solve it the way I want to yet. So I don't know."
Nathan: For what it's worth, I alone did not create the theme disaster. That was a group sin. [group laughs]
But I was responsible for the layout system. And, to be real, I kind of cargo-culted it from Flutter. Raph Levien gave me the idea to look at Flutter with one of his posts many years ago. But I never fully had my head wrapped around the Flutter way of thinking. And it came from a very different language as well, Dart.
I don't know, but I think a lot of how Flutter works was based on an assumption of a quick-to-compile language like Dart. Whereas we were much more in a scenario like the web. Anyway, sorry, I think I got off on a little bit of a tangent there. I just didn't want to claim sole responsibility for the mess that was our main thing.
Antonio: It didn't even start with "oh, let's do a rewrite." Right, Nathan? It was more just like, "I'm going to prototype this thing because I have this idea for how to implement it. I'm really excited about how Tailwind does UI and has this different way of thinking about UI."
I think it started as experimentation and prototyping. We had GPUI 1 and then there was GPUI 2 and at some point there was even a GPUI 3.
The original thought was to start incrementally, because a lot of what GPUI does is not just the UI part; a lot of it involves managing the state of the application. So the thought was to take only a small part of that and rethink it: just how we render stuff.
But as things went along, the friction that we were talking about before, this problem of "okay, I just want to change the UI, but it's all entangled with the rest of the framework," came up again, and that's how we ended up with the rewrite.
But to me the process of the rewrite was very incremental.
Thorsten: As far as I understand it, two or three weeks ago when switching the codebase to GPUI2 you essentially deleted the GPUI1 folder and renamed the GPUI2 folder. That means you had the two versions existing at the same time. How did you approach this? Did you build GPUI2 and flip back and forth with a feature flag or what was the process?
Max: Yeah, we had two build targets that were complete copies of the same thing. In many cases, there'd be a crate that had two versions of it that were 90% the same: one that compiled against GPUI1 and one that compiled against GPUI2.
And the process was that as soon as we started to migrate a component from one version to the other (for example, the project_panel crate became project_panel2), we would stop writing new features on the old one. It was frozen, unless we had to do bug fixes, which we would apply in both versions.
Nathan: The rationale behind this approach was that it would give us the opportunity to continue to share code. As it turned out, it was the most expedient process to just clone the crate and let go of the old code.
But if things took longer than we'd hoped, or for any reason at all, we had these two parallel systems that could freely intermingle code. We could add a third crate, for example, that shared a bunch of commonality if we needed to. It just preserved a lot of optionality, which we felt we needed when making a high-risk move like this rewrite.
There was incrementality also early in the game, which is worth discussing. The GPUI2 crate for a while was taking GPUI1 and importing almost all of it. We basically got a prototype of GPUI2 running on top of GPUI1. GPUI1 also had a certain way in which an element was defined and we actually made a GPUI2 element a valid GPUI1 element. There definitely were a lot of holes, but it at least allowed Nate to start playing with the new framework and validate quickly that this actually was a place worth going to.
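Making a GPUI2 element a valid GPUI1 element is essentially an adapter. The sketch below is a hypothetical illustration, not the real GPUI traits, which were far more involved: OldElement and NewElement stand in for the two frameworks' element interfaces, and a Compat wrapper implements the old interface in terms of the new one, so new-style elements can sit inside an old-style tree.

```rust
// Hypothetical traits for illustration; not GPUI1's or GPUI2's real API.

/// The existing framework's element interface.
trait OldElement {
    fn paint(&self, output: &mut Vec<String>);
}

/// The new framework's element interface, with a different shape.
trait NewElement {
    fn render(&self) -> String;
}

/// Adapter: any new-style element becomes a valid old-style element.
struct Compat<E>(E);

impl<E: NewElement> OldElement for Compat<E> {
    fn paint(&self, output: &mut Vec<String>) {
        output.push(self.0.render());
    }
}

/// An old-style container that can hold children of either generation.
struct OldStack {
    children: Vec<Box<dyn OldElement>>,
}

impl OldElement for OldStack {
    fn paint(&self, output: &mut Vec<String>) {
        for child in &self.children {
            child.paint(output);
        }
    }
}

/// An element written against the old interface.
struct OldLabel(&'static str);

impl OldElement for OldLabel {
    fn paint(&self, output: &mut Vec<String>) {
        output.push(self.0.to_string());
    }
}

/// An element written against the new interface.
struct NewButton {
    label: &'static str,
}

impl NewElement for NewButton {
    fn render(&self) -> String {
        format!("[{}]", self.label)
    }
}
```

Because the wrapper lives on the new side, the old framework needs no changes, which is what lets a prototype run on top of the existing system while it still has holes.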
Thorsten: What's the timeline of this rewrite? You started on this when? In the middle of last year and it got merged three weeks ago, right? I remember when we first talked you were all in Italy and this was already in progress, right?
Nathan: Yeah, that was late October. The whole team started being involved in it when we got together in Italy in the last week of October.
But I spent the week prior with Antonio, and I spent the week prior to that sprinting my ass off to get things as far as I could get so that when Antonio and I were together and had that focused time, I really wanted to be in a place... I mean, I don't know, I thought the framework was close then, but it really wasn't at all. Antonio and I, we worked really long hours that week too. The goal was getting this thing ready so that the whole team could build on it. And, you know, we did.
But I think we could have taken on less. The view stuff was the screaming problem that just had to be solved, the way the UI elements and the layout and all that stuff worked.
Then there were other problems. GPUI 1 was originally designed to spit JSON at an Electron app. The very original UI of Zed at the very, very beginning was an Electron app that spun up a Rust binary. We experimented with a couple of different approaches. One was just running a subprocess. Then, much earlier, we literally had Rust embedded as a library that we were talking to via Node, via the V8 embedding APIs. You remember that? That was years and years ago.
Thorsten: When was this? This is pre Zed-the-company, right?
Nathan: Yeah, that was like the xray era. So that was 2018.
Thorsten: Wow.
Nathan: Yeah, there were a lot of assumptions that we were like spitting JSON at this Electron layer.
And because of this original design constraint, it was as if the model layer was weirdly aware of the presentation layer. You're sort of managing the state of this other process, these pools of JSON that need to go into and out of scope, etc. It was designed for that and then adapted.
And the directionality of ownership would get really weird. So I wanted to really start from a clean sheet and say that all the arrows point in the same direction now. Let's not have model code be aware of view code. It wasn't quite that gross, but that was another thing that felt really important to fix.
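A minimal sketch of "all the arrows point in the same direction," assuming plain Rc and RefCell ownership rather than GPUI's actual entity system; FileList and FileListView are hypothetical names. The model type mentions no view types at all; the view holds a handle to the model and derives its output from it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical sketch; not GPUI's real model/view machinery.

/// Model layer: plain application state with no knowledge of any view.
struct FileList {
    paths: Vec<String>,
}

impl FileList {
    fn add(&mut self, path: &str) {
        self.paths.push(path.to_string());
    }
}

/// View layer: depends on the model, never the other way around.
struct FileListView {
    model: Rc<RefCell<FileList>>,
}

impl FileListView {
    /// Presentation is computed from the model's current state.
    fn render(&self) -> String {
        self.model.borrow().paths.join("\n")
    }
}
```

The model can be created, mutated, and tested with no view present, and any number of views can read it without the model changing.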
Another thing: this was the first Rust code that I really wrote, honestly. We did a little bit of Rust early on with the CRDTs, but in that world it was just a very different problem. And I've somehow become obsessed with giving everything a type, but all this GPUI 1 code we were using had usizes all over the place when we should have had a dedicated type. We had Pathfinder's geometry types, because GPUI went through an era where we rendered all the 2D graphics with Pathfinder, and so we had all these kind of strange vector types. We didn't have fundamental types that were being passed around everywhere. We didn't have the notion of pixels in the type system; it was all f32s, etc. We could have lived with it, but while we were doing this... it's kind of how you get yourself to buy the most expensive variant of something: "Well, while I'm doing that, I might as well do that." It just kind of crept up. Yeah.
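Here is a small sketch of the kind of typing Nathan describes. Pixels and RowIndex are hypothetical newtypes written for illustration, not GPUI's actual definitions. Wrapping the raw f32 and usize means the signature of row_top documents its units, and a call like row_top(3, 20.0) is rejected by the compiler instead of silently mixing units.

```rust
use std::ops::Add;

// Hypothetical newtypes for illustration; not GPUI's actual definitions.

/// A length in pixels. A bare f32 can no longer be passed where
/// pixels are expected.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Pixels(f32);

/// A dedicated index type instead of a bare usize.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RowIndex(usize);

impl Add for Pixels {
    type Output = Pixels;

    fn add(self, other: Pixels) -> Pixels {
        Pixels(self.0 + other.0)
    }
}

/// Vertical offset of a row; the signature states the units involved.
fn row_top(row: RowIndex, line_height: Pixels) -> Pixels {
    Pixels(row.0 as f32 * line_height.0)
}
```

The wrappers cost nothing at runtime; the benefit is that unit mix-ups become type errors.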
Antonio: We keep saying it was risky, but I think it was never risky in the sense of, oh, what we're doing doesn't make sense. We weren't worried that what we were doing wouldn't make sense, or that we couldn't do it. It was more an organizational risk. We have this other branch of the code: how do we adapt it and have the whole team work on the transition?
Nathan: And the risk too is: how long is this going to take? How many bugs are going to go out? And I don't even know if we know the answer to that quite yet.
Thorsten: What was your estimate of how long this would take?
Nathan: The initial estimate was that we would have it done by the end of that team summit in Italy, at the end of October. That was what we were hoping, which was insane, right? So it ended up, I guess, taking two more months than that initial estimate. But as we got closer to that first date, while it was still fairly early on, we retrenched and set the estimate to the end of the year. And we did hit that estimate.
I knew if it went much longer than that, then we shouldn't do it. Yeah, that's kind of how I felt.
Thorsten: Antonio, you said it wasn't a question of whether you can do it or not, it was more about how long will it take and other factors.
Antonio: Yeah. You have a team of people working on this code base and everybody has to coordinate with each other. And we also had to tell everyone how to migrate: clone this, create this, use this version. How do you orchestrate that process?
I feel that the summit [the team meeting in Italy] was really instrumental to that. And the ability to use Zed for collaboration was key to being really productive during the summit. We had groups of three or four people on the same working copy, changing and copying different files. It was also really good timing-wise, because it was a really good moment to share context on the assumptions of this new framework and what the big ideas were, and to just get on the same page. And after the summit, I feel like, I don't know, I think things went pretty, pretty smoothly.
Nathan: I feel like Zed almost enabled a totally different way of working. Zed plus Rust. You could actually have four people in a working copy, and be boiling down, or burning off hundreds or even thousands of compile errors.
We were together in real time, with a lot of raw work to do fixing all these errors, but it could all be done in parallel. So there was this sort of hyperthreading going on, as opposed to the typical process.
Thorsten: That's funny. That's what Antonio and I did this week. When we had to make a bunch of renames, he said to me: I'll start at the bottom, you start at the top. And I thought that was really nice.
Nathan: Yeah, it didn't feel that nice until the, you know, the pasta. I just kept eating pasta. Pasta really solved the pain. But yes, it felt invigorating and everybody learned about the new framework. And it set us up to then separate and work on it independently.
Thorsten: Looking back, the three of you, would you say it was a success?
Antonio: I would say 100%. Yes, it was stressful, I don't know if it was a success on my health, or anybody's health. I don't know if we should mention this. But I'm happy where the code is at. So I would say yes.
Max: I'm happy where the code is at too. There were some super stressful moments; it was kind of like throwing a bomb into the team and the code base. But what's cool about that is it gave everybody this formative experience to gel. Everybody had a hand in rebuilding a lot of things that we hadn't touched for a while.
I think that, obviously, if you had perfect knowledge, you probably could have plotted a more efficient course to solve the problems that we set out to solve. But you also could have done worse in a lot of ways. You could have not done anything, and we could have continued to iterate on something that was just harder than it needed to be. Or we could have, like Nathan said, inefficiently and gradually, one PR at a time, stepped our way here. And that would have had challenges too.
I think it 100% is a success in that the risk is now over. We're going to stabilize it. We got it working now. And we solved the problem we set out to solve, which is that the views were painful to build and now they're not. It took a lot of blood, sweat and tears. But none of us could have at the beginning said, here's a better way to solve this problem in shorter time.
Nathan: Yeah, it was a success.
Max: I remember having one-on-ones with different people on the team at the beginning of the rewrite. Back then we told everybody, "hey, building UI is a little bit hard, we need to make some changes. Nathan's working on a whole new kind of element system." At the time, that was the way I thought it was going to be: we were going to change the way elements worked in GPUI. And even that felt like a really big undertaking at the time.
So in one-on-ones people asked how that was going to work if we changed the way all elements work. But I've worked with Nathan and Antonio long enough to have seen how this goes, including conversations with people telling us "you should be more incremental." It's happened in the past. I remember on Atom, Nathan getting a vision to rewrite the way the text editor worked, from concept A to concept B, and I would just have to say to other people, which is kind of a hard thing to say as an engineering manager, that this is not typically the way we do things, but that I would never bet against the combination of Nathan having a creative vision for the ideal way it should be and Antonio coming in and, no matter what's broken, making it work. I would never bet against that working.
So I would just have to say, yes, this is not typical, but trust me like I trust them.
Nathan: Thanks for vouching for us.
Antonio: Very nice words, Max.
Nathan: But there was one moment where you were like, guys, this fucking sucks. [laughter] But I can't thank you enough, both of you, for being willing to do crazy shit that's potentially ill-advised, because, I don't know, I've had my blunders over the years.
Thorsten: You fixed the problem of views being hard to work with, are there other things that are now easier after the rewrite?
Nathan: It's a whole lot easier to blog about than it would have been. That's one thing, and it's important, because this is really the foundation of the entire system.
If anybody's going to contribute to Zed, they're going to need to understand the thinking that went into the system. And what better way to understand the thinking than to present the thinking clearly in source code that we explain?
The old system was just sort of looking at the underlying concepts through, you know, a hazy filter. A lot of noise mixed up with the core ideas that were kind of in my head about how things should work.
Now it's expressed clearly, not perfectly, nothing's perfect, but a lot more clearly. So that's one thing. It's going to be easier to learn, easier to teach, and I think easier to work on. We have to port to two other platforms still. So I think that matters.
Thorsten: Nathan, you mentioned this before, that the goal was always to make Zed open source and you also said you wanted to do the rewrite before we go open source. So was this part of the motivation? First we have to get the code base to this state, and then we can go open source?
Nathan: I think Max mentioned it was almost like dropping a bomb in the middle of the team. You know, you could drop a bomb in the middle of a small team that's all together in a room.
Thorsten: And you hit all of them, yeah.
Nathan: Right. Oh, God. Yeah, that's a little dark. [laughter]
Thorsten: Yeah, maybe.
Max: New metaphor.
Nathan: You can give a small team in a room acid, but you can't give it to the entire open source community.
So that was another thing. There's an opportunity here to get a lot of benefits and get things in order and it's going to be really difficult to do that after open sourcing. It just would have been too much chaos, I think, to introduce.
I'm not sure we can ever do anything quite like this again. Hopefully we'll never need to.
Thorsten: Max, Antonio, do you want to add something here? Do you think it's important to clean up the codebase before going open source? There's a lot of great, successful projects where even the maintainers would say it isn't a perfect code base.
Antonio: I think it's important. For example, what we started doing is documenting functions in GPUI. I have a hard time imagining doing that in the old code base. Now I feel like anybody on the team could look at the code in GPUI and just document it. And I think that's important for open source.
The other thing is that for the vision we want to achieve, as a company, I think we need to be a really good open source project and we need to get all the help we can get from people. Because we're a pretty ambitious project and I don't think we can just do it by ourselves. We need everybody to participate in this effort, so I really think it's valuable.
Max: I think you have a point that open source code doesn't mean the code has to be perfect. I mean, is it perfect now? I would have open sourced it and stood behind it before and if there were weird things about it, I would have said, yeah, that's, that's the way it is right now. I think that's fine. You don't have to have your code base in line with your imaginary ideal before you open source, but it was something we wanted to do anyway. And I think there was an efficiency to it: if we're going open source soon, and we're going to rewrite to solve this problem soon too, then let's do the rewrite first, then open source, not the other way around.
Nathan: Open source takes energy and effort and engagement.
The one other thing I just want to say is that I'm just kind of unapologetically a perfectionist. I mean, I actually do spend time apologizing for it and I wouldn't say perfection matters to me but beauty matters to me.
We've been working on this shit a really long time. I worked way too hard to just not care about beauty. We're putting the code out there, so now it is part of what I'm expressing in the world. If I were in it for the money, I would have started a crypto coin in like, 2014 or whatever. The whole reason for doing this is really, more than anything, about self expression. And the code is a part of that expression.
Now, I think there's a failure mode of that where people become obsessed with the code for its own sake and all they do is admire how many monads they have or whatever.
But for the code to be beautiful and crafted, I think that's a valid thing. As long as it doesn't become a petty god.