Talk: “When Virtual DOM Diffing Is a Little Too Clever” at ComponentsConf 2019

I spoke at ComponentsConf 2019 in Melbourne on 10 September 2019.

From the conference site: “ComponentsConf is an Australian conference for front-end and full-stack developers with exclusive focus on JavaScript frameworks where world-class experts share unique insights.”

Here are the slides from my talk:

A loose transcript

I’m a coder and the creator of a typing practice app called “Typey Type for Stenographers”, which I made in React for budding courtroom reporters and mechanical keyboard enthusiasts to learn how to type over 200 words per minute.

Now, there’s a proud tradition of blind stenographers, so I wanted to create an app that was particularly accessible to people with visual impairments. One day I found myself down a rabbit hole trying to create a great experience when presenting material for steno students to type.

An Italian stenographer who uses a screen reader reported a weird bug to me: seemingly random letters were being dropped from words they were supposed to type. Virtual DOMs give us an excellent developer authoring experience, but their behaviour can have surprising implications. We’ll get back to that bug later.

Virtual DOM

First up, the virtual DOM (VDOM). This is the concept of an ideal or “virtual” representation of the user interface that’s kept in memory and synced with the “real” DOM by a library such as ReactDOM. In React, the syncing process is called reconciliation. I’ll focus on React today, but there’s similar change detection in Vue and an Incremental DOM in Angular, as well as similar limitations across each.

The DOM or Document Object Model includes elements such as paragraph tags that may contain nodes such as #text nodes and #comment nodes. Here you can see that I’ve arbitrarily split a whole piece of text across 2 text nodes. This can happen for any number of reasons.

In the console of your developer tools you can use .childNodes to see the live NodeList of that paragraph’s child nodes, which shows that we do in fact have 2 text nodes and a comment node inside this 1 paragraph element.
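The slide’s exact markup isn’t reproduced here, but here’s a minimal sketch of how a paragraph can end up like that; the id and node contents are my own illustration:

// Given a paragraph already on the page:
// <p id="phrase">The estimated time of arrival</p>
const phrase = document.querySelector("#phrase");

// Append a comment node and a second text node
phrase.appendChild(document.createComment("a comment"));
phrase.appendChild(document.createTextNode(" is 2.30pm."));

// The live NodeList shows 2 text nodes and a comment node
console.log(phrase.childNodes); // NodeList(3) [text, comment, text]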

Now in a virtual DOM, when you render a JSX element, the entire tree and every single object in it gets updated. This can be fast because nothing gets drawn on-screen for the virtual DOM. React then works out the differences between the virtual and actual DOM (diffing), then efficiently takes the steps needed to manipulate the DOM to achieve the intended virtual DOM state:
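Here’s a rough sketch of that render-everything-then-diff flow; the ticking clock example is illustrative, not from the talk:

import React from "react";
import ReactDOM from "react-dom";

function tick() {
  // Describe the entire UI again on every tick…
  const element = (
    <p>The time is {new Date().toLocaleTimeString()}.</p>
  );
  // …but React's diff only touches the one text node that changed
  ReactDOM.render(element, document.getElementById("root"));
}

setInterval(tick, 1000);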

Screen readers

A screen reader is a type of assistive technology that you might use to interpret and navigate a web page. A screen reader such as JAWS, NVDA, or VoiceOver may read a web page’s contents aloud to the user, which is especially useful for people with visual impairments, such as our Italian stenographer, who might otherwise have difficulty reading your page. For this button, you might hear, “Start typing, button. You are currently on a button. To click this button, press Control-Option-Space.”

ARIA live regions

ARIA Live regions are one useful tool in your accessibility toolbox to improve the experience for people using screen readers.

While a screen reader has focus on one part of a page, such as the button, you might want to add content elsewhere on the page, such as an alert notifying the user of new content or an error, or you might update content elsewhere, like stock prices, train schedule times, or the number of tickets left. A live region may be used to announce that dynamically updated content without changing the screen reader’s focus. The live region is often specified using an aria-live attribute, usually on a div, and mostly with a value of polite or assertive:

<div aria-live="polite">
  {message}
</div>

Polite means the screen reader will wait politely for an appropriate time to start the announcement such as when you’ve stopped typing. This is often what you want.

Announcing dynamic changes

When you’re announcing dynamic changes, for live regions to work, they must already be present in the document so the browser and screen reader know they’re there. Then any newly added dynamic content will be announced.

Correctly rendering ARIA live regions (this could work)

<div aria-live="polite"
  {this.state.message}
</div>
<Schedule>
  {this.state.value}
</Schedule>
>

Failing to render ARIA live regions (this won’t work!)

<Schedule>
  {newMessage ?
    <div aria-live="polite">
      {this.state.message}
    </div>
  : null}
</Schedule>

If you add a new aria-live region already filled with content to a page, it won’t be announced, because the browser and screen reader weren’t monitoring it for changes. This means you can’t conditionally render the div as needed. You need a persistent region on the page, then add your content to it.
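In React, that might look something like this sketch, where the component shape, state, and handler names are my own illustration:

class ScheduleStatus extends React.Component {
  state = { message: "" };

  handleDelay = () => {
    // The empty live region is already in the document, so this
    // update is announced without moving the screen reader's focus
    this.setState({ message: "There's been a delay." });
  };

  render() {
    return (
      <React.Fragment>
        {/* Rendered unconditionally, even while empty */}
        <div aria-live="polite">{this.state.message}</div>
        <button onClick={this.handleDelay}>Simulate a delay</button>
      </React.Fragment>
    );
  }
}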

aria-atomic="true"

If you change content in a live region, only the changed parts will be announced, unless you add aria-atomic="true" to re-read the entire region when content changes:

<div aria-live="polite" aria-atomic="true">
  {message}
</div>

For example, if a live region without aria-atomic says, “The estimated time of arrival is 2.30pm.” and you change the time to…

<div aria-live="polite">
  The estimated time of arrival is 2.30pm.
</div>

… 2.40pm and add a note…

<div aria-live="polite">
  The estimated time of arrival is 2.40pm.
</div>

… “There’s been a delay.”, instead of just reading the changed parts (“2.40pm” and “There’s been a delay.”)…

<div aria-live="polite">
  The estimated time of arrival is 2.40pm.
  There's been a delay.
</div>

…an atomic live region would say, “The estimated time of arrival is 2.40pm. There’s been a delay.”

<div aria-live="polite" aria-atomic="true">
  The estimated time of arrival is 2.40pm.
  There's been a delay.
</div>

Virtual DOM + aria-atomic

When a virtual DOM gets involved with aria-atomic, something different happens.

<div aria-live="polite" aria-atomic="true">
  The estimated time of arrival is {time}.
  {delayMessage}
</div>

The reconciliation process efficiently changes the text node directly. Earlier we saw text nodes in developer tools; React changes those in place without rewriting the whole element. This can confuse a screen reader. For example, an NVDA user of your React app may hear the content in an atomic live region announced repeatedly, once for each character range that’s changed in your live region.

<div aria-live="polite" aria-atomic="true">
  The estimated time of arrival is {time}. {delayMessage}
</div>

We’re changing 2 character ranges in this case, so you might hear this entire message announced 2 or more times. Without aria-atomic, the NVDA user might hear only the changed characters in each range, so they might hear “4”, “There’s been a delay.”

Double announcers

To work around this, you might skip aria-atomic and instead use double announcer regions. You put not 1, but 2 live regions on your page:

<div aria-live="polite">
  {message1}
</div>
<div aria-live="polite">
  {message2}
</div>

They both start empty:

<div aria-live="polite">
</div>

<div aria-live="polite">
</div>

To announce something, you add the content to the first live region:

<div aria-live="polite">
  The estimated time of arrival is 2.30pm.
</div>

<div aria-live="polite">
</div>

To announce something else, you clear that region and add content to the second region:

<div aria-live="polite">
</div>

<div aria-live="polite">
  The estimated time of arrival is 2.40pm.
  There's been a delay.
</div>

To announce a third message, you clear the second region and add content to the first region again, and continue alternating like this.

<div aria-live="polite">
  Cancelled. Find alternative transport.
</div>
<div aria-live="polite">
</div>

This mimics aria-atomic behaviour and gives you full authoring control over the announcements. This approach is handy for any live region that will be updated more than once.
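Here’s a minimal React sketch of the pattern; the component and method names are my own, not from a published package:

class DoubleAnnouncer extends React.Component {
  state = { message1: "", message2: "", useFirstRegion: true };

  announce = (message) => {
    // Fill one region and clear the other, alternating each time,
    // so every announcement is new content added to a live region
    // the screen reader is already monitoring
    this.setState((prev) => ({
      message1: prev.useFirstRegion ? message : "",
      message2: prev.useFirstRegion ? "" : message,
      useFirstRegion: !prev.useFirstRegion,
    }));
  };

  render() {
    return (
      <React.Fragment>
        <div aria-live="polite">{this.state.message1}</div>
        <div aria-live="polite">{this.state.message2}</div>
      </React.Fragment>
    );
  }
}

A parent component could hold a ref to this announcer and call announce each time there’s a new message.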

The double announcer approach is also used in these packages:

Vue was inspired by React and React was inspired by Angular, which is a great reminder to see what we can learn from other tools and frameworks.

Keep testing

My recommendation is that you keep testing. When you’re thinking about changing frameworks or tools, such as moving from jQuery to React, even a direct port that ought to behave the same might have unexpected consequences that directly impact your users’ experience, so test all your changes with a variety of real users on their real devices.

Even if you diligently replace all your divs…

<div className="list">
  <div className="list-item"></div>
  <div className="list-item"></div>
</div>

… with the right semantic elements and best practices, they might still not work as expected in a virtual DOM world.

<ul>
  <li></li>
  <li></li>
</ul>

Beyond ARIA live regions, you might encounter issues with CSS animations. These CSS animations are written without React in mind:

.my-node-enter {
  opacity: 1;
  transition: opacity 200ms;
}
.my-node-exit {
  opacity: 0;
  transition: opacity 200ms;
}

You might then need to restructure them to work with React transitions:

.my-node-enter {
  opacity: 0;
}
.my-node-enter-active {
  opacity: 1;
  transition: opacity 200ms;
}
.my-node-exit {
  opacity: 1;
}
.my-node-exit-active {
  opacity: 0;
  transition: opacity 200ms;
}
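Those enter and exit class names follow the convention used by react-transition-group’s CSSTransition component. Here’s a usage sketch, where the showMessage state is illustrative:

import { CSSTransition } from "react-transition-group";

<CSSTransition
  in={this.state.showMessage}
  timeout={200}
  classNames="my-node"
  unmountOnExit
>
  <p>The estimated time of arrival is 2.40pm.</p>
</CSSTransition>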

Even if you diligently build and test something as a reusable component, that doesn’t mean it will work in production. For example, if you have a design system with a notification component that includes well-tested double announcer regions, it might still fail in an app if the API invites misuse, such as when you add the announcer component to the page at the same time as adding the message to the live region:

{showNotification ?
  <NotifyWithDoubleAnnouncer>
    {message}
  </NotifyWithDoubleAnnouncer>
  : null }
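A less error-prone shape keeps the announcer regions mounted and varies only the message; a sketch, still assuming the hypothetical NotifyWithDoubleAnnouncer API:

{/* Always mounted; only the message comes and goes */}
<NotifyWithDoubleAnnouncer>
  {showNotification ? message : ""}
</NotifyWithDoubleAnnouncer>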

Web Speech API: speech synthesis

Outside of double announcer regions or careful page design, for a small niche of features, you might use speech synthesis via the Web Speech API. This approach won’t be appropriate for a lot of announcements, but just maybe it helps you or gets you excited about this new API.

For the speech synthesis part of the Web Speech API, support is not too shabby in modern browsers, but you need to be mindful of cross-browser behaviour and text-to-speech engines available across operating systems.

Getting back to my steno typing app: the missing letters were initially being dropped because I wasn’t using aria-atomic, which led me to learn about double announcer regions. Then, using the Web Speech API’s speech synthesis, I added a feature. For each new word that you need to type, I create a speech synthesis “utterance” for the word and call the speak method with it on the window’s speechSynthesis object. This produces synthesised speech for each word as I supply it using a JavaScript API, which is closer to playing an audio file than using a screen reader:

// Feature-detect: not every browser exposes speech synthesis
if (window.speechSynthesis && window.SpeechSynthesisUtterance) {
  let synth = window.speechSynthesis;
  // Create an utterance for the word and speak it
  let utterThis = new SpeechSynthesisUtterance("notevole");
  synth.speak(utterThis);
}
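Since “notevole” is an Italian word, you might also hint at the language so the speech engine can pick a suitable voice. The lang property is part of the SpeechSynthesisUtterance interface, though which voices are actually available varies by browser and operating system:

// Hint the utterance's language with a BCP 47 tag
utterThis.lang = "it-IT";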

This lets you practice typing with live dictated text. For court reporters and real-time captioners, it’s really valuable to practice typing spoken words. Because of the bug, this web speech dictation became a feature in my app, not just a workaround.

Constraints and creativity

Constraints lead to creativity. The next time you are disheartened by cross-browser, cross-device compatibility bugs, I hope you think of this example where my users came away with an extra feature and we came away with a deeper understanding of Virtual DOMs and accessibility.


To learn more:

In case you’re wondering about the font families used in the slides:

Got questions?

If all of this interests you, I suggest subscribing to the design & dev topics of the newsletter.