Upcoming Changes in R 4.2.1 on Windows - The R Blog (2024)

R 4.2.1 is scheduled to be released next week with a number of Windows-specific fixes. All Windows R users currently using R 4.2.0 should upgrade to R 4.2.1. This text has more details on some of the fixes.

R 4.2.0 on Windows came with a significant improvement. It uses UTF-8 as the native encoding and for that it switched to the Universal C Runtime (UCRT). This in turn required creating a new R toolchain for Windows and re-building R, R packages and all (statically linked) dependencies with it (Rtools42, more details on the transition).

Using UTF-8 as the native encoding significantly reduces the number of encoding conversion issues when working with characters not representable in the encoding normally used by Windows, e.g. problems with Asian characters on systems running in Europe, the Americas or anywhere else where Latin scripts are used.
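As a rough way to see this from within R, the sketch below reports whether the running session uses UTF-8 as its native encoding and whether a string with non-Latin characters survives a conversion to the native encoding. It uses base R only; the helper name `describe_native_encoding` is made up for illustration.

```r
# Hypothetical helper (not part of R): summarise the native encoding of
# the current session using base R only.
describe_native_encoding <- function() {
  info <- l10n_info()
  list(
    utf8      = isTRUE(info[["UTF-8"]]),   # is the native encoding UTF-8?
    multibyte = isTRUE(info[["MBCS"]]),    # is it a multi-byte encoding?
    ctype     = Sys.getlocale("LC_CTYPE")
  )
}

enc <- describe_native_encoding()
str(enc)

# Two CJK characters, written with \u escapes so this source stays ASCII.
x <- "\u4f60\u597d"
y <- enc2native(x)

# Expected to hold when the native encoding is UTF-8; in a session with
# another native encoding the characters may not be representable.
round_trip_ok <- identical(enc2utf8(y), x)
print(round_trip_ok)
```

In a session where `enc$utf8` is `FALSE`, the round trip can fail, which is the kind of conversion issue the switch to UTF-8 is meant to avoid.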

R 4.2.0 has been regularly tested with CRAN and Bioconductor packages before the release, but several issues not covered by automated R/package testing and missed by the limited manual testing have been found by users after the release. Thanks to users who reported issues via R bugzilla, R-devel mailing list, R-help mailing list as well as private messages, soon after the R 4.2.0 release, these issues were fixed for R 4.2.1. Moreover, the good news is that no major issues with the rather significant transition to UTF-8/UCRT have been found to this date.

It would be nice to get more help from the R community volunteers with testing R before releases, as detailed in a blog post from April 2021. As far as I can tell from when we are receiving bug reports, this is still not happening much. Such testing doesn’t have to be only “manual”, a lot of interactive testing in principle can be automated as well, but in either case that requires effort and time that would have to be contributed.

Clipboard connection support in R on Windows (see ?connection and search for “clipboard”) was rewritten in R 4.2.0 to use the Unicode (UTF16-LE) Windows API interface to fix encoding issues (PR#18267). Unfortunately, there was an error in computing offsets in the connection stream which resulted in a bug observed during consecutive writes (PR#18332), fixed in R 4.2.1. This only impacted programmatic access to the clipboard via the R connections API.

It was a rather embarrassing omission of a pair of parentheses and apparently I was only testing the original bug fix using a single write operation, not multiple. While fixing the bug with consecutive writes, I also found and fixed a spurious warning about an ignored encoding argument, which is a by-product of internal conversions to/from UTF16-LE inside the connections code.
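A minimal sketch of the kind of check that exercises consecutive writes might look as follows. The helper `write_chunks` is hypothetical and this is not the test used for R itself. Because the clipboard is a user-wide device, the sketch touches it only in an interactive Windows session and otherwise falls back to a temporary file.

```r
# Hypothetical helper: write several chunks through one connection,
# close it, and read everything back.
write_chunks <- function(target, chunks) {
  con <- file(target, open = "w")
  for (chunk in chunks) {
    writeLines(chunk, con)   # consecutive writes on the same connection
  }
  close(con)                 # clipboard output is transferred on close/flush
  readLines(target)
}

# Use the real clipboard only on Windows and only interactively; note that
# this overwrites whatever the user currently has on the clipboard.
use_clipboard <- .Platform$OS.type == "windows" && interactive()
target <- if (use_clipboard) "clipboard" else tempfile()

chunks <- c("first line", "second line", "third line")
result <- write_chunks(target, chunks)
print(identical(result, chunks))
```

With a single write the offset error would not show up, which is why the sketch writes more than one chunk before closing the connection.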

Clipboard connection testing is for good reasons not allowed in automated CRAN package checks (as the clipboard is a user/system-wide device, regarded the same as the user’s home file space, see CRAN Repository Policy), so the issue could not have been found that way.

Another issue found after the release was with the R Sys.getlocale function attempting to query an unsupported locale category on Windows. The function is documented to accept also LC_MESSAGES, LC_PAPER and LC_MEASUREMENT categories on Windows, even though they are not supported there; Sys.getlocale returns an empty string.

The implementation used to call the C runtime function setlocale to obtain the locale information even for LC_MESSAGES, and that worked in the past. But it no longer does with UCRT when invalid parameter handlers are enabled (see Parameter Validation in MSDN).

By default, MinGW-W64 and hence applications built using Rtools42 disable the invalid parameter handlers, so we have never run into that during automated CRAN and Bioconductor package checking, nor during manual testing using the “normal” builds. But, if R is embedded in an application built using Microsoft compilers, the invalid parameter handlers may be enabled by default and may terminate/crash R.

This has only been found after the R 4.2.0 release inside RStudio, which had the handlers enabled. It was reported that rJava crashed during initialization, because it was using Sys.getlocale to query the LC_MESSAGES locale category.

The getlocale implementation has been fixed in R 4.2.1 not to query the unsupported locale categories. In addition, R-devel has been extended to optionally enable these handlers for checking (via _R_WIN_CHECK_INVALID_PARAMETERS_), and CRAN package checks were run using this setting. Luckily, only a few packages have been affected. One package triggered the invalid parameter handler by accidentally closing a handle twice, so attempting to close an invalid handle.
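To see which locale categories a given system actually reports, one could query them one by one, as in the sketch below. It uses only the category names that Sys.getlocale documents; the helper `query_locale` is hypothetical and maps any error to `NA` as a precaution.

```r
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC",
                "LC_TIME", "LC_MESSAGES", "LC_PAPER", "LC_MEASUREMENT")

# Hypothetical helper: query one category, mapping any error to NA.
query_locale <- function(category) {
  tryCatch(Sys.getlocale(category), error = function(e) NA_character_)
}

locales <- vapply(categories, query_locale, character(1))
print(locales)

# Categories for which no locale information was returned; on Windows
# these are expected to include LC_MESSAGES, LC_PAPER and LC_MEASUREMENT.
unsupported <- names(locales)[is.na(locales) | locales == ""]
print(unsupported)
```

Code that depends on such a category, as rJava did with LC_MESSAGES, should be prepared to receive an empty string.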

As usual, checking all CRAN packages is not only a service to the package maintainers, but also serves as a check for R itself.

Perhaps surprisingly, a number of users have found issues in Rgui after the R 4.2.0 release. This shows that Rgui is still actively used, and not only directly, but also as an interactive R console window connected to and controlled by other applications (Dasher, Tinn-R).

Problems with the transition to UTF-8 were somewhat surprising to me as Rgui has been designed as a Unicode application and, using the GraphApp library, written to support Unicode characters not representable in the native/ANSI Windows encoding. Rgui has limitations in supporting non-BMP characters, but that was not the issue here. GraphApp, at least the version included and customized in the R distribution, has two very distinct modes of operation: “Unicode” and “non-Unicode” windows. Both modes support working with characters not representable in the native/ANSI Windows encoding.

However, by default, “non-Unicode” windows are used in a single-byte locale (the native/ANSI) and in some contexts are also used by accident even in a multi-byte locale (due to initialization/bootstrapping issues). Hence, Windows systems of R users of languages using single-byte encoding have always been using “non-Unicode” GraphApp windows, and it wasn’t discovered/reported that the “Unicode” windows lacked some features and had some bugs. As R 4.2.0 switched to UTF-8, a multi-byte locale, Rgui started using “Unicode” GraphApp windows and these issues popped up. The reports were from users from Europe and South America.

One of the consequences was that the accent keys (dead keys) almost didn’t work. Some were not supported at all and some couldn’t be typed without combining them with the next character. The reported cases (and a number of additional ones I found while debugging) have been fixed. However, handling of these characters, at least in the form done in GraphApp “Unicode” windows, is very language-specific and depends on keyboard layouts. It is hence definitely not impossible that some accents via dead keys still will not work: in that case, the best course of action is to use copy-paste (or some other input method common for the specific language) as a workaround, and report a bug. As a last resort, non-ASCII characters in string literals can be represented using \u and \U escapes.
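As a small illustration of the last-resort option, the following sketch writes accented and non-BMP characters using escapes only, so that nothing beyond ASCII has to be typed. The variable names are arbitrary.

```r
# "cafe" with an acute accent on the final e (U+00E9), typed as an escape.
s1 <- "caf\u00e9"

# A non-BMP character (U+1F600) needs the eight-digit \U form.
s2 <- "\U0001F600"

print(nchar(s1))       # number of characters, not bytes
print(utf8ToInt(s1))   # Unicode code points of the string
print(Encoding(s1))

# The same string built from code points, as a cross-check.
from_codepoints <- intToUtf8(c(0x63, 0x61, 0x66, 0xE9))
print(identical(s1, from_codepoints))
```

The escapes are interpreted by the R parser, so they work the same way whether the code is typed into the console, pasted, or sourced from a file.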

GraphApp “Unicode” windows are internally designed differently and respond to different Windows API messages. Hence, injection of text via SendInput, as used in Dasher, didn’t work. Luckily this has still been fixable and is fixed in R 4.2.1. Tinn-R used WM_CHAR messages instead, and they stopped working as well. This seems unfixable without bigger changes to GraphApp, because the “Unicode” windows are simply designed to handle related messages differently, but Tinn-R luckily can switch to SendInput, which is also a better way to do text injection, even though it also has limitations (more details here). If there are other similar applications that used WM_CHAR or WM_KEYDOWN/WM_KEYUP messages, the best/simplest course of action is to switch to SendInput. Switching to embedding may be more flexible and reliable in the long term, but requires a much higher investment.

Rgui has a “Script Editor”, which is implemented using a RichEdit control (part of Windows). GraphApp has been using the ANSI (*A) interface to the control, so one would expect that it should work with UTF-8 as it worked before with whatever was the ANSI encoding (even double-byte). However, it turned out that the RichEdit20A version of the control does not: it was not possible to copy and execute a line of R code which contained non-ASCII characters (characters were received in the ANSI encoding, not respecting that Rgui opted for UTF-8 in its manifest). However, the RichEdit20W version of the control accepts UTF-8 properly, even using the ANSI (*A) interface. If any expert on these things is reading these lines, I would be happy for a review of the current code or for an explanation, as this doesn’t seem to be documented.

Rgui has also experienced a significant performance regression of txtProgressBar. The progress bar is based on carriage return characters and repeated rewriting of the previous state. Rgui has a not very efficient way of implementing these: it remembers the full history of the line, interpreting the carriage returns only on redraws. While redrawing a line, Rgui computes the width of each character. So, every update of the progress bar adds to the work to be done on the next redraw, and even previous lines shown in the window have to be redrawn, so, if one runs the progress bar several times, the performance overheads are increasing.
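For reference, a typical use of the progress bar looks like the sketch below; each call to setTxtProgressBar is one of the updates described above. The wrapper `run_with_progress` and its `file` argument are illustrative only (the latter makes it possible to redirect the output away from the console).

```r
# Hypothetical wrapper: run n units of work, updating a text progress
# bar after each one, and return the final value of the bar.
run_with_progress <- function(n, file = "") {
  pb <- txtProgressBar(min = 0, max = n, style = 3, file = file)
  on.exit(close(pb))
  for (i in seq_len(n)) {
    # ... one unit of real work would go here ...
    setTxtProgressBar(pb, i)   # rewrites the bar in place
  }
  getTxtProgressBar(pb)
}

final_value <- run_with_progress(20)
```

Running such a loop several times in the same Rgui window under R 4.2.0 is the scenario in which the slowdown was observed.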

This has only been detected in R 4.2.0 running in UTF-8, because UTF-8 is a multi-byte locale and a different code path to compute the character widths has been used. It turns out that this code, contributed a long time ago to R, had a bug in caching a locale identifier, so it was re-computed on every character, plus an optimization for ASCII characters (relevant for the progress bar) accidentally only took place after the broken caching. Fixing this old performance bug in R fixed this performance regression in Rgui and potentially will improve performance also on other systems where R is built to use the internal width calculation.
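To make the role of the ASCII optimization concrete, the sketch below builds one line resembling a text progress bar and checks that it consists of ASCII characters only, then asks R for its display width. The helper `progress_line` is hypothetical, and `nchar(type = "width")` is used here merely as the R-level way to obtain display widths; the sketch does not reproduce the C code in Rgui.

```r
# Hypothetical helper: build one ASCII-only progress line, loosely
# resembling what a text progress bar prints after a carriage return.
progress_line <- function(done, total = 50L) {
  paste0("  |", strrep("=", done), strrep(" ", total - done), "|")
}

line <- progress_line(20L)

# Every character is ASCII, so a fast path for ASCII applies to all of them.
is_ascii <- all(utf8ToInt(line) < 128L)

# Display width in columns, as computed by R.
width <- nchar(line, type = "width")

print(is_ascii)
print(width)
```

Since every character of such a line is ASCII, an ASCII shortcut that only runs after an expensive per-character step gives no benefit, which matches the behaviour described above.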

Rtools42 have been updated and the official build of R 4.2.1 (at the time of this writing R-4.2.1 release candidate) will be built using version 5253.

Compared to version 5168 used to build R 4.2.0, there is now also the tidy tool for checking HTML in packages and a number of libraries have been updated, from which R itself and then all CRAN packages using those would benefit: 15 of those are used by R and recommended packages, see a complete list for details. All CRAN packages have been tested (and where needed updated) for the new versions. Note that CRAN packages are required to use libraries from Rtools when those are available (CRAN Repository Policy has more details).

For a summary of additional updates in R 4.2.1, see the NEWS file of the R-patched branch and look for “Changes in R 4.2.0 patched” (when still before the release) or for “Changes in R 4.2.1” (when after the release).


