Vintage post: Project Cascadia Tech Stack Investigation
Important
What follows here are notes from a tech stack investigation I did circa 2018, when we were first starting to look at building a new terminal for Windows. These are my unfiltered notes on what kind of tech stack we should use to build it.
What we didn't consider at the time (and what later made this debate much simpler) was that we could just re-use the entire console text buffer, renderer, and parser, all together. At the time this was written, those components were more tightly coupled with the rest of the console.
These notes are being shared because they might be interesting to someone. A curious artifact from a moment in time long since past. We did end up using a C++WinRT native application, using XAML Islands.
| Stack | Idle, empty buffer | Idle, full buffer | Scrolling | Worst-case |
|---|---|---|---|---|
| VT# (WPF + no renderer) | 34MB | 93.7MB | Peaks at ~100MB, mean 95MB | |
| VT# (WPF + GlyphRun) | 35.9-38MB | 84.6-95MB | see above | |
| VT# buffer + UWP | unavailable | | | |
| xtermjs + UWP | 46.3MB | 69-73.1MB | 102-117MB, mean 110MB | |
| xtermjs + WPF | 28(+29)=57MB [1] | 37(+48.4)=86MB | 33.4-34(+101)=135MB | |
| Prototype benchmarks | | | | |
| WPF + DX + native buffer [4] | 22.4MB | | | |
| VT# + optimal buffer [5] | 17.1MB | 19.6MB | 29.2MB | |
| Other benchmarks | | | | |
| conhost + gdi | 6.8MB | 6.8MB | 6.8MB | 22.5MB |
| C++winrt CoreWindow [3] | 2.0MB | | | |
| C++winrt Xaml app | 11.5MB->9.8 | | | |
| CX UWP with nothing | 10.0MB | | | |
| C# UWP with nothing | 10.8MB | | | |
| UWP with webview | 21.6MB | | | |
| WPF with nothing | 24.3MB->14.9 [2] | | | |
| WPF with webview | 25.2+16.8=42MB | | | |
[1]: A WPF WebView Control uses an out-of-proc web server for its content, so there's a bunch of memory it's consuming external to its process tree.
[2]: The empty WPF app started at 24MB, then when I came back from lunch it was down to 14. Presumably the garbage collector ran? Note that VT# with the GlyphRun renderer sitting empty did not seem to repro this memory decrease.
[3]: A CoreWindow by itself doesn't give us any XAML, only a DX surface. So it's not really relevant, as we'd need to implement all of the Fluent features ourselves.
[4]: These prototypes used a fake char[] to emulate a hypothetical implementation. The assumption was made that the buffer was (9000 lines * 80 cells per row * 11B per cell). This is a very rough estimate of what a real buffer would use. A more optimal solution would not need 11B per cell, nor 80 cells in every row (except in the worst case), though there will probably be additional overhead from introducing helper containers and other std types.
[5]: This removes the existing VT# buffer implementation in favor of an "optimal" 9000x80x11 char buffer. This would be an effective lower bound on the footprint of the VT# implementation, if the buffer were implemented as just a block of bytes, without any data structures to abstract the implementation.
[6]: This is about the same as [5]. It represents the worst-case scenario for the buffer, where every single row is totally filled with 80 cells with different attributes. This is the value that should be compared to the [4] entries, which represent the worst-case buffer scenario in C++.
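The "Nx over conhost" multipliers quoted further down are ratios of a stack's commit to conhost's 6.8MB idle figure. A minimal sketch of that arithmetic, using numbers copied from the table; the names `CONHOST_IDLE_MB`, `MEASURED_MB`, and `times_conhost` are illustrative and not from the original notes, and where the table gives a range the upper end is assumed.

```python
# Conhost's idle commit from the "conhost+gdi" row of the table (MB).
CONHOST_IDLE_MB = 6.8

# (idle empty buffer, idle full buffer) in MB, copied from the table.
# Where the table gives a range, the upper end is used (an assumption).
MEASURED_MB = {
    "VT# (WPF, no renderer)": (34.0, 93.7),
    "xtermjs+UWP": (46.3, 73.1),
}


def times_conhost(commit_mb):
    """Return how many times larger a commit figure is than conhost's idle commit."""
    return commit_mb / CONHOST_IDLE_MB


for stack, (empty_mb, full_mb) in MEASURED_MB.items():
    print(f"{stack}: {times_conhost(empty_mb):.1f}x empty, "
          f"{times_conhost(full_mb):.1f}x full")
```

For the VT# WPF row this works out to roughly 5x and 14x, which is where the "5x,14x" figure comes from.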
This is using our hackathon implementation with some refactoring. Not optimized, but works well enough.
- Can be used in WPF and UWPs
- Work on renderer, parser might contribute back to conhost.
- Buffer could be greatly optimized from current state.
- optimal empty buffer is only 17MB (x2.5)
- optimal full worst-case buffer is only 29MB (on par with conhost)
- Entire project is in a single language (C#)
- C# will be faster for long-term dev work, more external developer excitement
- Work might not contribute back to conhost.
- VT adapter is incomplete (good enough for conpty, not enough for ssh)
- Buffer, Core in general is incomplete
- Initial buffer implementation leaves much to be desired
- in WPF: 5x,14x increased memory over conhost (empty, full buffer)
- Could another render head potentially save us some memory?
- Presumably not: even without any GlyphRuns, the implementation as-is was already bigger than xtermjs
- Would it be possible to use the DxEngine as a renderer for this option?
- Would enable reusability of some inbox components
- Might be a perf penalty to pinvoking for each Engine call (for each `StartPaint`, `InvalidateRect`, `PaintBufferLine`, etc.)
- in UWP: adding a webview adds 11MB
- in UWP: adding a webview with xterm.js adds 35MB (at an empty buffer)
- in WPF: adding a webview adds 17MB (plus at least 16.8MB in Desktop App Web Viewer)
- in WPF: adding a webview with xterm.js adds 32.7MB (at an empty buffer)
- is the WPF doing scary out-of-proc magic?
- YES. Each WPF Web View adds 16.3MB of out-of-proc commit in Desktop App Web Viewer.
- UWP does not seem to have this OOP server.
UWP and WPF web views seem comparable in size.
- Existing developer community
- Existing implementation, test coverage for core
- Complete VT implementation
- Contributions to xterm.js benefit many 3rd parties
- Can be used in WPF and UWPs
- Work on xterm.js won't contribute back to conhost
- Need to work in the js ecosystem
- console devs have minimal experience in this area
- Daniel Imms would help transition, own JS bits
- in UWP: 7x,10x increased over conhost (empty, full buffer)
- in WPF: 10x,12x increased memory over conhost (empty, full buffer)
- Debugging JS to C# issues will be painful
- No amount of optimization on our side could reduce the webview footprint (roughly 20MB)
- Unknown C# to JS throughput
- How do I translate the window size into a buffer size? Resizing is always tricky, but now only JS knows how many characters fit in the window. We'll have to do some magic to figure that out I think.
- What kind of overhead is there sending data from C# to JS? I haven't been able to measure this.
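One way the window-size question could be approached is to have the JS side report its measured cell size to the host, which then divides the window's pixel size by it. A rough sketch of that division only; `grid_size` and its parameters are hypothetical names, and how the cell metrics actually get from JS to C# is exactly the open question above.

```python
import math


def grid_size(width_px, height_px, cell_width_px, cell_height_px):
    """Convert a window size in pixels to (columns, rows) of whole cells.

    The cell metrics are assumed to be reported by whichever layer
    measures the font (here, hypothetically, the JS renderer).
    """
    if cell_width_px <= 0 or cell_height_px <= 0:
        raise ValueError("cell metrics must be positive")
    # Partial cells at the right/bottom edge are dropped; never go below 1x1.
    cols = max(1, math.floor(width_px / cell_width_px))
    rows = max(1, math.floor(height_px / cell_height_px))
    return cols, rows


cols, rows = grid_size(800, 600, 10, 20)
```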
- Work will contribute back to inbox console
- Console team is already familiar with c++
- minimal memory footprint
- (WPF,UWP) = (3x, 2x) over conhost
- Can be used in WPF and UWPs with some hassle
- no existing implementation, need to start from scratch
- Renderer and Parser are done, but buffer, adapter, connection, ux, uia, tests will need to be written
- complicated renderer interop/ux layer
- The "Core" component is now a DX render engine + the Terminal Core, and each UX layer has different ways of embedding that DX component. If you wanted to implement a C# renderer, you'd have to create a winrt wrapper around the Terminal
- Each user input is a pinvoke into the Core's UX layer (not terrible)
- WpfDxInterop was abandoned circa 2015. Support would have to come from us.
How do we effectively abstract DX across UWP and WPF?
- WpfDxInterop uses an `IDXGIResource` in the `Image`'s `OnRender` event
- UWP uses the Composition APIs, or a SwapChainPanel or a SurfaceImageHost, which each have different ways of interacting with DX
- The SwapChainPanel has a SwapChain. For perf reasons, we are limited to 4 swap chains per app. Could we theoretically have one SwapChain for multiple panes?
Is 20MB of overhead for the webview worth the kickstart we get on development?
Hypothetically we could improve the VT# buffer. At empty, with xterm.js we're already at (57,46)MB (wpf, uwp respectively).
How valuable is reusability of the existing components (DxRenderer, Parser) vs the speed of development and relative simplicity of a pure-managed solution?
While working on these benchmarks, I used the following as the math on what an "optimal" buffer layout might be like. It consists of TerminalColors that are 4B, and TextRuns that are 11B total. In the worst case, each cell in a row has different attributes, requiring one text run per cell.
4B for TerminalColor
00drIIII - isDefault, isRgb, ColorTableIndex
RRRRRRRR - Red
GGGGGGGG - Green
BBBBBBBB - Blue
11B for a TextRun
00000uib - isUnderlined, isItalic, isBold
TerminalColor - foreground
TerminalColor - background
LLLLLLLL - Length
LLLLLLLL - Length
worst-case bound buffer: [9000 * (80 * 11)] B = 7,920,000 B ≈ 7734KB for its buffer
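The layout above can be checked with a small sketch that packs the two records with Python's `struct` module and recomputes the worst-case size. The helper names (`pack_color`, `pack_run`, `worst_case_bytes`) and the little-endian byte order are assumptions made for illustration; the notes only specify the field sizes and bit positions.

```python
import struct

# TerminalColor: one flags byte (00drIIII) followed by R, G, B = 4 bytes.
TERMINAL_COLOR = struct.Struct("<BBBB")
# TextRun: one flags byte (00000uib), fg (4B), bg (4B), 16-bit length = 11 bytes.
# "<" disables alignment padding, so the size is exactly the sum of the fields.
TEXT_RUN = struct.Struct("<B4s4sH")


def pack_color(is_default, is_rgb, table_index, r=0, g=0, b=0):
    """Pack a color: bit 5 = isDefault, bit 4 = isRgb, low 4 bits = table index."""
    flags = (int(is_default) << 5) | (int(is_rgb) << 4) | (table_index & 0x0F)
    return TERMINAL_COLOR.pack(flags, r, g, b)


def pack_run(fg, bg, length, underlined=False, italic=False, bold=False):
    """Pack a run: bit 2 = isUnderlined, bit 1 = isItalic, bit 0 = isBold."""
    flags = (int(underlined) << 2) | (int(italic) << 1) | int(bold)
    return TEXT_RUN.pack(flags, fg, bg, length)


def worst_case_bytes(rows=9000, cols=80):
    """One run per cell in every row, as in the worst-case bound."""
    return rows * cols * TEXT_RUN.size


default_fg = pack_color(is_default=True, is_rgb=False, table_index=0)
red_bg = pack_color(is_default=False, is_rgb=True, table_index=0, r=255)
run = pack_run(default_fg, red_bg, length=1, bold=True)
```

With the default 9000 rows and 80 columns, `worst_case_bytes()` is 7,920,000 bytes, or about 7734KB.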
It's possible that this could be further optimized. Talking with Daniel Imms, he suggested that we'd have a separate map of index->attributes, where attributes is a (fg,bg,meta) struct, which means each run would only need an index into that map. The worst case would then no longer be every cell in a row having different colors, but every cell in the ENTIRE BUFFER having a different pairing of colors, which is a wildly less likely scenario.
However, I didn't really want to implement that a bunch of times, so I used one that's more similar to conhost's for benchmarking's sake.
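For reference, the index->attributes idea can be sketched in a few lines. This is only an illustration of the interning scheme described above, not what was benchmarked; `AttributeTable`, `intern`, and `lookup` are hypothetical names.

```python
class AttributeTable:
    """Interns (fg, bg, meta) triples so each run stores only a small index."""

    def __init__(self):
        self._index_of = {}  # (fg, bg, meta) -> index
        self._attrs = []     # index -> (fg, bg, meta)

    def intern(self, fg, bg, meta):
        """Return the index for this attribute triple, adding it if new."""
        key = (fg, bg, meta)
        index = self._index_of.get(key)
        if index is None:
            index = len(self._attrs)
            self._attrs.append(key)
            self._index_of[key] = index
        return index

    def lookup(self, index):
        """Return the (fg, bg, meta) triple stored at this index."""
        return self._attrs[index]

    def __len__(self):
        return len(self._attrs)


table = AttributeTable()
# Each run is (attribute index, length) instead of carrying fg/bg itself.
runs = [
    (table.intern("white", "black", 0), 40),
    (table.intern("red", "black", 1), 10),
    (table.intern("white", "black", 0), 30),  # reuses the first entry
]
```

The table only grows when a never-before-seen combination appears, so its size depends on the number of distinct attribute combinations in the whole buffer rather than on the number of runs.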