Bringing a little tidy magic to creating flowcharts in R, how data.table is addressing recent shifts in R's C APIs, and another showcase of R's visualization prowess in the realm of brain imaging.
Episode Links
- This week's curator: Ryo Nakagawara - @[email protected] (Mastodon) & @rbyryo.bsky.social (Bluesky) & @R_by_Ryo (X/Twitter)
- Flowcharts made easy with the package {flowchart}
- Use of non-API entry points in data.table
- Intro to working with volume and surface brain data
- Entire issue available at rweekly.org/2025-W04
- flowchart - R package for drawing participant flow diagrams directly from a dataframe using tidyverse https://bruigtp.github.io/flowchart
- Mermaid - Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown https://github.com/mermaid-js/mermaid
- DiagrammeR - Graph and network visualization using tabular data in R https://rich-iannone.github.io/DiagrammeR/
- RNifti https://github.com/jonclayden/RNifti
- gifti https://github.com/muschellij2/gifti
- CRAN Cookbook https://r-consortium.org/posts/user-friendly-technical-cookbook-style-cran-guide-for-new-r-programmers-ready/
- 2024 Posit Year in Review https://posit.co/blog/2024-posit-year-in-review/
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- Secrets Abound (Matoya's Cave) - Final Fantasy: Random Encounter - Midgarian Sky - https://encounter.ocremix.org/
- Succumb to the Wilderness - Wild Arms: ARMed and DANGerous - Level 99 - https://armed.ocremix.org/
[00:00:03]
Eric Nantz:
Hello, friends. We are back with episode 192 of the R Weekly Highlights podcast. This is the weekly show where we talk about the awesome highlights and additional resources that are shared in this week's R Weekly issue. My name is Eric Nantz, and I'm happy you joined us from wherever you are around the world. Hopefully, staying warm depending on where you are in the world because it is frigid over here in my humble abode here. But I'm warming up with this recording, and, of course, keeping me all warm and fuzzy in terms of, you know, cohosting is my awesome cohost, Mike Thomas. Mike, how are you doing today? Doing well, Eric. Yeah. Thankfully, these highlights are hot,
[00:00:37] Mike Thomas:
because in Connecticut, it is just as cold as I'm sure it is in Michigan right now. It's pretty out here, though. We got some nice snow.
[00:00:46] Eric Nantz:
Yeah. That's true. It hasn't all melted yet here. And when the kids see the sun, like, I don't want the snow to melt. They're like, it's not gonna melt at 0 degrees, buddy. It's not. No. Not here anyway. So but as you said, we got some fun, hot topics to talk about in the highlights this week. And, of course, this is a community project. Right? So we've got our curator of the week. This time it was Ryo Nakagawara, who is one of our OGs in the curator space of R Weekly. And as always, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world with your pull requests and suggestions.
And we lead off with a visualization style of package that definitely has a lot of utility in terms of the scope of it, especially in my industry. And we're gonna dive into it here. And this post is actually coming to us from the R Works blog, and it's a guest post by Pau Satorra. Hopefully, I said that right. He is a biostatistician. And in this post, he introduces a package that he's created for the R community, now on CRAN, called flowchart. The name should make it pretty intuitive of what it does: it helps you create flowcharts in R. Now you may be thinking, on this show and in other presentations or resources about creating flowcharts, there's a lot of different ways of doing this. Right? Especially in the realm of HTML style outputs.
We've been using, I know myself and Mike, I believe, have been using frameworks like Mermaid.js within our Quarto or R Markdown documents. So there's definitely ways of creating flowcharts there. I also was a heavy user of the DiagrammeR package from long ago. That was helping me out quite a bit with creating maybe not necessarily flowcharts, but definitely things like decision tree outputs and, you know, choose your own adventure kind of layouts. But what flowchart brings differently than the rest of those is, in essence, a very tidy interface to make all this happen.
So let's dive into this a little bit from the post. So first, we will have, of course, a link to the package in the episode show notes here. But as I mentioned, it is on CRAN, so it's all just an install.packages("flowchart") away. And it actually, you know, requires you to bring your own data. So for the case study in this example, there is a built-in dataset from a publicly available clinical trial set of results called safo, and it is actually about the journey of patients throughout the life cycle of a trial. When I say journey, I'm thinking in terms of what is the result of their status, and this could be that they are randomized to the trial, i.e. they get one of the treatment assignments, or they discontinue after the randomization for various reasons, or they end up completing it. There are many other nuances in this, but this isn't a clinical trial podcast, so I'll stop there. But the data is built into the package.
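If you want to follow along at home, here is a minimal sketch of getting set up; the comment about the data layout is paraphrasing the package vignette, so double check the column names there:

```r
# flowchart is on CRAN, and the safo clinical trial dataset ships with the package
install.packages("flowchart")
library(flowchart)

# One row per patient, plus yes/no style flags for each milestone in the trial
str(safo)
```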
And, basically, in order to register this data to be available for a flowchart, you start off with feeding this dataset into a function called as_fc(). And this is basically gonna turn your data frame into a list object with 2 components here. One of which is the original data going into it, and you may be wondering, what does the data format look like? In the case of this example, each row is what looks to be a patient with a unique ID, and then the columns are the different kind of flags in terms of where they're at in the trial and what their status is. So, again, there is a vignette that describes this data set in more detail, but it's basically a bunch of yes or no type variables for what happened to that patient in the trial, whether they, you know, met the inclusion criteria, whether they had chronic heart failure or whatnot. Again, you can take a look at the data in the episode show notes.
So once you feed this into that as_fc(), now you may be wondering, what do we do with this? Well, you can just simply draw a very bare bones flowchart of one cell with one function called fc_draw(), where if you feed in that original object, you're just gonna get a box with an optional label of your choosing if you want. And this time, it has, like, all 925 records in one box saying that these are all the patients inside. Well, that's boring. Right? Well, let's start actually having some flow in this flowchart. Right? So that's where the tidy interface kinda comes in here where you can feed in this dataset.
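A minimal sketch of those first two steps, assuming the bundled safo data; the label text here is our own illustration rather than necessarily the exact wording from the post:

```r
library(flowchart)

# Register the data frame as a flowchart object, then draw the single
# starting box containing all 925 patients
safo |>
  as_fc(label = "Patients assessed for eligibility") |>
  fc_draw()
```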
Again, make it a flowchart object with as_fc(), and then pipe that further to an fc_filter() call. And this is where you can perform what looks to be like dplyr manipulation with its filter statement. And in this case, in this first example, we want a simple filter to determine if the patients were randomized or not. Now there is no column for whether they are randomized or not, but there is a grouping column, which in essence acts like that because it determines what treatment group they were assigned to. If that column is missing, it means they weren't randomized. So in the case of this example, the filter is rather that the group variable is not missing. So it's an exclamation point, is.na(group), and then you can give it a label.
And then you can also show who did not meet that filter. And that's gonna be automatically labeled in a box called excluded. Then when you draw that, you get that original box of the 925 patients, but then there are 2 arrows going away. One arrow goes to the right, and it has this excluded box. And then the arrow going down has another box that the author labels randomized, which has now 215 patients. So, obviously, not many patients made it to that randomization step, but this is a very similar format to what we do in a lot of our clinical trial reports, where we get to what's called the disposition section, which shows the flow of the patients that meet certain criteria and who end up actually completing the trial.
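And a sketch of that randomization filter as just described; show_exc = TRUE is what adds the excluded box off to the side:

```r
library(flowchart)

safo |>
  as_fc(label = "Patients assessed for eligibility") |>
  # Keep patients with a non-missing treatment group, i.e. the randomized ones,
  # and show everyone else in an automatically labeled "Excluded" box
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |>
  fc_draw()
```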
So this looks quite familiar to me, but you can do a lot more than just that single filter. Right? At that next step where it shows those randomized patients, you can now split that into different boxes as well. You might call them parallel boxes, and you can use a function called fc_split(). You give it the variable that determines the grouping of that split. In this case, it is simply group, and that's now gonna partition that randomized group into 2 boxes for the 2 different treatment groups. Again, pretty straightforward, pretty neat tidy interface here, and you can do even more manipulations with that fc_filter() applied to those, you know, middle boxes that we just created for the treatment groups. And you can read the example more in the post here, but, again, it's really just using the fc_filter() function.
And then you'll see these boxes kind of in parallel chains or, you might say, trails going down, but the boxes are all parallel next to each other for the equivalent kind of steps. So in essence, the flowchart looks pretty darn polished already. And again, with a tidy interface, I think this is a great package to put in your toolbox if you just want something quick to the point with a familiar tidyverse kind of piping syntax, and you could feed this into whatever document you choose. I could see this going into a Quarto document or R Markdown, you know, whatever have you, whether it's HTML or PDF format.
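Putting the pieces together, a rough sketch of the full pipeline with the split by treatment group added on the end:

```r
library(flowchart)

safo |>
  as_fc(label = "Patients assessed for eligibility") |>
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |>
  # Partition the randomized box into one box per treatment group
  fc_split(group) |>
  fc_draw()
```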
It looks like it's gonna output these flowcharts as image files, perhaps, although I haven't tested that myself. But it is definitely an interesting paradigm if you know that your data going into it is fairly straightforward, which it is in this example. And you may not necessarily need the additional customization that you get with frameworks like Mermaid.js and whatnot or DiagrammeR or some of these other packages from the past. So, again, you might find a great use case for this. As for me, as usual, I like having choice in the way I construct these flowcharts. There could be cases where this might not quite fit your needs. If you have more, you know, customized kind of directions of the flow, maybe things kind of feed back to an above step. In that case, maybe Mermaid.js is a better fit for you. But like I said, for this kind of flowchart where there's a pretty predefined start and stopping point, or you might say finish, and kinda the trail of where this flowchart goes, I think Pau's package could be a great fit for your toolbox.
[00:10:09] Mike Thomas:
I agree, Eric. Yeah. I make flowcharts literally every day. They're the way that I communicate with both my team and our clients about sort of the end to end process that we're going to undergo to get them to the solution, because that's how we bridge the gap. And, you know, a few years ago, there weren't a lot of great flowcharting tools that integrated well with version control. You know, we used some tools like Lucidchart and Visio, but you really had to, like, export those as PDFs or maybe host them somewhere that folks could go take a look at, but not scriptable, not easily versionable.
And nowadays, you know, as you said, there are better options, Mermaid.js being one of them, the DiagrammeR package being another one. But I'm really impressed with this flowchart package here. You know, it's really easy and simple syntax to get started, very tidy friendly for developing these flowcharts. And when you look at it on the surface, especially in some of these examples, a lot of these functions are really just taking one argument. And there doesn't appear to be a lot of customization, but that's actually because there are a ton of other arguments that have default parameters that can be changed if you want to. You know, originally, I thought that this was a package that, you know, was just very simple syntax that made a lot of decisions for you. And with the default parameters, they do, but you still have a lot of control over all sorts of different types of things, like the direction of the line within that fc_filter() function, whether or not you wanna kick those filtered observations out to a flowchart node on the right or on the left, you know, font styling, font size, font color, things like that, rounding for the number of digits that are gonna get displayed, the background color of the node itself. So if you actually take a look at the reference page of the package's pkgdown site, which is what I'm taking a look at, and click into some of these functions, you'll realize that there are a ton of arguments behind the scenes that are almost ggplot-like in terms of the amount of control that you have over each element in your flowchart. So I'm pretty impressed with some of the new features and, you know, some of the interesting functions that they have in here that I'm not sure I've seen anywhere else, like fc_merge() and fc_stack(), which allow you to actually combine 2 different flowcharts either horizontally or vertically. I thought that's pretty interesting. Maybe it could help in your workflows depending on how you're sort of modularizing your code. So really impressed.
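For the combining functions Mike mentions, here is a rough sketch that assumes fc_merge() and fc_stack() take a list of flowchart objects; check the reference page on the pkgdown site for the exact arguments and for the styling options he describes:

```r
library(flowchart)

fc1 <- safo |>
  as_fc(label = "Assessed for eligibility")

fc2 <- safo |>
  as_fc(label = "Assessed for eligibility") |>
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE)

# Combine two flowcharts side by side (horizontally) or one on top of the other
list(fc1, fc2) |> fc_merge() |> fc_draw()
list(fc1, fc2) |> fc_stack() |> fc_draw()
```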
Honestly, just on sort of a side note, I really like the hex logo as well. I think it's really cool. And I'm excited to start to play around with flowchart as well because I had not come across it until today. So great way to start off the highlights this week.
[00:13:05] Eric Nantz:
Yeah. I can see a lot of convenience here. A lot of ways to just get that chart done, like I said, for pretty straightforward datasets. And, yeah, I'm gonna show this to a couple of colleagues here as we're thinking about ways of using R in more of the document generation space, especially for these more, I'll try to be polite here, rigid sets of documents that we have to do in my industry. We're slowly trying to feed R into these things. And in fact, there is a section that we often have in what's called our analysis data reviewer's guide where we talk about kind of the flow of how the programming works, where you go from dataset to program and then to output.
Perhaps flowchart could be useful in that too. So I've got something to share with some colleagues, I think, later on today. So, yeah. Credit to Pau for sharing this package with us, and, yeah, choice is good as they say. Pray for the day when our industry
[00:14:01] Mike Thomas:
finally lets us do dynamic documentation.
[00:14:03] Eric Nantz:
You can't have everything, Mike. Alright. And our next highlight here, we're gonna shift gears quite a bit because we're gonna get really in the weeds technically here, but with a pretty fundamental issue that I think has affected quite a few package authors in recent months and maybe even the recent year, year and a half, having to do with best practices and recommendations for authoring packages that involve more than just, you know, new R code in the package itself. In particular, we're gonna talk about what you wanna do when you extend a package with another language, mainly the C language, in your next R package, and some of the learnings that have been shared from a very influential package in this space.
So we are talking about the latest blog post from the data.table community blog, which has been featured quite a bit in last year's highlights. This post comes to us from Ivan Krylov, and he leads off with the tagline about the use of non-API entry points in data.table. Now it's amazing. In 2025, I think when most people think of APIs, they're thinking of those web APIs. Right? No. No. No. We're not talking about that here. API is actually a historical term in software development. We are talking about ways you can interface with the language in different constructs or different perspectives.
And in particular, we are talking about the API that the R language itself exposes to package authors via its integrations with the C language. So really setting the stage here: since the beginning of R itself, there has been, you know, the canonical reference if you wanna build something on top of R, which ideally is a package, or perhaps you're even gonna contribute to the language itself, and that is the Writing R Extensions reference. This can be found directly on the R Project homepage, and this is what the CRAN maintainers are using as reference for any new package that's coming into the R ecosystem.
Certainly, there are a lot of automated checks in place to make sure a lot of the principles in the extensions manual are met. But in particular, why we're talking about this in this post is that within this manual, there are, in essence, entry points that are defined by the R maintainers to interface with the C API of R itself. And in particular, there are 4 categories that you'll find in this manual. First is literally called API, and you can think of these as the ones that are documented, they're declared for use, and they will only be changed if the R maintainers end up deprecating that particular API call.
Then you get to the next 3. There is the public designation: these are exposed for use, you know, by R package developers, although they aren't really documented and they could change without you knowing it. So you could think of this as like maybe in a package you have a function that is technically there, but you don't export it to the user with user facing documentation. But like any package in R, you can look at or use any function in the package with the namespace prefix, you know, the three colons, and the function name.
So again, some call that off label use. Your terminology may be different. There is another category called private. These are used for building R, and, yes, they are exported, but they're not declared in the header files of R itself. And they say point blank: do not use these in any package. Do not. No. None at all. And then you get to hidden. This one really is peculiar to me. They are entry points that are sometimes possible to use, but they're not exported. But I think it kinda goes by the name of it. You probably don't wanna touch those. So, historically, there's been no consternation from the R maintainers or the R package authors about the entry points designated as API: all good. Right? Should be able to use those.
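As a rough R-level analogy for that "exposed but undocumented" idea, the ::: operator reaches into a package's non-exported objects. The package and helper named in the first comment below are hypothetical, but the namespace inspection itself is runnable:

```r
# Hypothetical off-label call into a package's internals:
# somePackage:::internal_helper()

# You can see how much a package keeps internal (not exported) like so:
internal_only <- setdiff(
  ls(getNamespace("stats")),      # everything defined in the namespace
  getNamespaceExports("stats")    # the documented, exported surface
)
length(internal_only)
head(internal_only)
```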
However, there has been a bit of discourse around the use of the public ones because they're not documented, they're not forbidden by R CMD check, and they've been there for a while. However, there has been a little bit of, you know, modification to the language itself, where, to be able to use some of these, there were what you might call escape hatches, like a define called USE_RINTERNALS that was used by package authors in the past to kind of get around maybe some potential issues. Well, that escape hatch or loophole was kind of closed in recent versions of R.
And then the number of non API blessed calls grew a little bit in between package or between R versions. And, also, another, you know, discussion on the R development list is where is the framework or the header of the library called alt rep fit into this, which got a lot of great press in recent years in the r community about being a more optimized way of operating on vectors. And, in fact, I I was had the pleasure of speaking with, Gabe Becker numerous times who was influential in getting alt rep into the language itself, although it was certainly labeled experimental in those times.
So fast forward a little bit, but there's been a little confusion about, like, which of these API calls, you know, are really ready for package authors and whatnot. Luke Tierney on the R Core team has actually worked on a program to try programmatically describing these exported headers, these exported symbols, to be able to, you know, give a little more clarity into what package authors can use. And he's, you know, found 2 additional categories as a result of this. Experimental, which I think sounds a little more intuitive. These are, you know, entry points that are there, but they're in the early stages, so there might be some caution to use them because they could change in a newer version.
So be prepared to adapt, basically. And then there's one called embedding, and this is meant for those who wanna create what are called new front ends to the language itself. But for now, they're keeping it separate. There isn't a lot of traction on whether to use those or not. And then now R CMD check has been beefed up a little bit to make sure that it is checking for any calls by a package that are using these non-API entry points, i.e. those that moved from the API designation to some of these other ones. And it looks like data.table was on the receiving end of some of these checks in recent upgrades.
And so the next part of this post dives into, as a result of these checks, what the data.table authors are doing to be compliant with kind of this reorganization of the C API entry points that data.table has been relying on for years and years. Again, some of these escape hatches are being patched up, and I've actually seen discussion on Mastodon, you know, in the R group, from people like coolbutuseless, Mike FC you might know him as, about some of the adventures he's had with trying to use C API entry points in some of the packages he's been dealing with, and R CMD check issues and whatnot, but it looks like data.table has been looking at this quite a bit.
So I'm not gonna read all these verbatim because there are a lot of corrections being made in data.table to use some of these either updated or newer C API entry points. There are some that are quite interesting where they've got solutions in place, and they link to every pull request that fixes these issues in each of these sections. Some of which is looking at, you know, comparison of calls and pairlists, as I call it, and which entry point they were using in the past versus what entry point they're using now, looking at how strings can be composed as C arrays, refactoring certain reference counts, dealing with encoding in string variables, and growing vectors so that it doesn't destroy your memory. There are some new entry points for that as well that you can read about.
And then it gets pretty interesting because there is more, especially getting back into the ALTREP framework. Apparently, there is, some might say, confusion into where ALTREP fits in all this and which parts of ALTREP should be exposed in a way that a package author is not gonna get dinged in R CMD check. There is a lot of narrative in this, and this actually does speak to how you grow vectors and do some other type checking. So I thought ALTREP was kind of all ready to go. I'm not saying it's not ready to go, but apparently there is some refactoring that needs to be done, starting with R 4.3, in terms of how you grow these vectors with the ALTREP framework.
So this post talks about the common methods in ALTREP and other common, you know, interactions with this and the C libraries. And there will have to be some refactoring in data.table to use some of these newer recommendations for ALTREP. And like I said, growing these vectors, growing these table sizes, doing things like fast matching of strings. And this is the one section where things are not fixed yet. There is a lot of refactoring that needs to be done by the data.table authors to comply with some of these new entry points and some of these newer, you know, recommended approaches for using ALTREP.
And there is even more going on here with some other attribute setting and dealing with missing values, where they are very transparent: they're not sure how to fix some of these yet in light of these new API calls, or these API calls being shifted. Again, this is an extremely technical deep dive. I, for one, have never authored a package that deals with C, so I don't have a lot of firsthand experience with these checks. Although I've seen, again, some conversation about this on social media and the R-devel channels and whatnot. But if you ever wanna know how the authors of a very important, large scale package like data.table are dealing with some of these newer approaches that the R team is recommending for these API entry points, boy, this post is for you. There is a lot to digest here. Again, I can't possibly do it justice in this particular highlight. But I think it's important to have things like this as a reference so that it's not just so mysterious to you as a package author if you get dinged by an R CMD check about these API calls. I'm wondering how another team would approach this. This is a very technical deep dive into how you can approach it. And as I said, some of these are not fixed yet. There is obviously still time in between releases to get compliant with these newer calls, so I'm sure data.table is gonna find a way.
But we're all humans after all. Right? It's not always a snap of a finger to get into these newer ways of calling these entry points. So we're getting into the internals of data.table quite a bit, but more importantly, also looking at how they're dealing with this new world, if you will, of using C within a package in the R community. Yeah. That's a lot. But, again, really recommended reading if you find yourself in this space.
[00:27:13] Mike Thomas:
Yeah, Eric. This one is very technical as you mentioned, but I think it's great to have a really technical blog post like this. And it may seem really niche, but I guarantee you it's going to help someone else out there who's probably going to run into the same situation with their R package, where they leveraged, you know, this kind of API interface into sort of the underpinnings of the C code behind R to accomplish something, and they're realizing maybe now that, you know, CRAN is going to start to complain about that. And, you know, as much as we might have mixed feelings about CRAN, and the checks that they enforce can be stressful to us sometimes. Like, I did see a Bluesky post recently. I don't know what they're called, toots, tweets.
But somebody had, you know, passed 6 checks, I guess, on the different types of operating systems that get checked on CRAN, and then the 7th was Windows, and it failed. Like, that hasn't happened before. Right. My goodness. And, obviously, that's the worst feeling in the world. But if we really take the time to step back and think about how open source software, and I guess most software in general, is just, you know, software stacked on top of one another over and over and over. And if we're going far enough down the R rabbit hole, right, that's C. And not to throw stones, but it's a little scary to me that, you know, something like CRAN doesn't exist in other languages. You know, I'm thinking about the Python ecosystem, and I think it's pretty easy to submit a package to PyPI. And I don't know if they require you to have, you know, any unit tests at all. Not that R necessarily requires you to have any unit tests, but at least they're going to try to build your package, right, and let you know if anything is breaking. And, you know, as you make changes and updates to that package, it'll rerun it and, you know, rerun a lot of those tests, and those tests are getting updated for things like this. Right? Newer versions of R and newer guidelines and guardrails that we have to adhere to to make sure that your package has the best chance of working on everyone's computer. Right? And I think that goes a long way to, you know, at least provide some infrastructure that's going to appease, you know, auditors.
You know, I don't think the SAS community is ever going to be happy with us, and they'll point to situations like this about why their software is more stable or better than open source. But I think you and I could talk for about 10 hours about why that's not the case. You know, but it's really interesting, and I'm very appreciative of blogs like this that really take the time to walk through all the decision points, you know, sort of everything that was laid out in front of them and what they were up against and why they made the decisions that they did to try to troubleshoot this particular issue.
And I'm also grateful to not have to understand any of this. You know, I'm being a little facetious, and I certainly understand that it's all C under the hood, but the folks that have really taken the time to understand, you know, the bridge between these two different languages to build these higher level, right, programming interfaces for folks like us that make it easier to work with, you know, it's incredible. You know, I think it's why the R language and the Python language as well, you know, are as popular as they are, because the syntax and the APIs, not to use a buzzword here, that have been developed, you know, make it very accessible to a wide audience. And, you know, one last note here. I guess it's pretty crazy to think about how old data.table is.
2006 was the first CRAN release. The oldest version of dplyr released on CRAN, at least from what I can see on the pkgdown site, is 2014. So 8 years later, still a decade old, but we're going on 2 decades of data.table. And it's definitely been a package that was transformative for the R community. So great to see it still thriving, and, you know, the folks that work on that project are at the cutting edge, you know, of a lot of what's going on in the open source data science ecosystem. So hats off to them, and great blog post.
[00:31:46] Eric Nantz:
Yep. It stands the test of time, and that's an understatement to say the least, that it has that history and it's been that influential in this community. And, again, not all of this was despair. Right? I mean, for many of those points mentioned early in the post, it was simply changing the name of an API header call or whatnot. And it was straightforward in the documentation which to change it to. And again, credit in the post for having all the links to the various pull requests that fix these. So Ivan did a tremendous job of being transparent, of showing the fix at a high level and then pointing to the actual code that does the fixing. I love that. I can't wait to dive into that a bit further. But again, it calls out that, like anything in open source, it's not always a quick fix to everything. So I will be keeping an eye on what's happening with those ALTREP style header calls where there are new wrappers that need to be made in this in-between world of the current version of R and R version 4.5 or later, which is due out, I believe, this year. So, as usual, with anything, developing a highly influential production grade package or app, you gotta think about backward compatibility. Right? So that's the journey they're on, and, yeah, we'll be very interested to see where it goes. And in the cases where they don't know the best fix yet, I hope that the community can help them out too and that there will be a transparent dialogue for that. But data.table's group of authors have been on the cutting edge for many, many years.
I'm so thankful that they got that recent grant to put resources like this blog together and their various presentations that they've had at the conferences. So it's great to kinda get a lens into all the innovations they've been thinking about, you know, now in the public domain like we get to see here on our humble little R Weekly project. So we're not gonna talk about C again for the rest of this podcast. We've had enough C, of course. We're gonna go back to some visualization, with a very important type of visualization in the world of health, especially of a very important organ in our bodies that we're relying on every single day for obvious reasons.
So it's one thing to talk about, you know, how your brains work. Right? But anytime we're trying to diagnose issues with our humble little organs inside our craniums, up in our skulls, you often turn to, you know, visualizations, i.e. scans, of your brain tissue to perhaps diagnose issues or find ways that maybe a treatment is affecting certain parts of your brain, if you will. Typically, this is done via MRI scans. And just like anything, the R community has stepped up with ways you can bring these visualizations into R itself for further analysis.
And our last highlight for today is a great tutorial on some of the issues and ways that you can import and analyze this type of highly complex visual data. This post is coming to us from Jo Etzel, who is a staff scientist at the Cognitive Control and Psychopathology Laboratory at Washington University in Saint Louis. That's a mouthful, but she definitely is a subject matter expert in this field from what I can tell here. And she has written multiple tutorials in the past. In fact, she's constructed these with knitr, which is a great way to use, again, reproducible analysis for tutorials.
And she's addressing some of the points that she had talked about in working with 2 different types of quantities in these brain images. One is the volume and the other is the surface of the brain visualizations. So first, she talks about the volumes. And, just like anything in the real world in physics, we have the three-dimensional, you know, perspective here. Right? And when you get these MRI scans, you get three-dimensional coordinates if you feed this into some of the more standard software to actually visualize the readings from these MRI scanners.
And you see some example images here looking at some off the shelf software where you look at, on the right side, the three-dimensional layout of the brain itself, and then you get more of a 2 dimensional representation via the different perspectives. So all this data is readily available from these image formats once you import it via this great package called RNifti, R-N-I-F-T-I, if you wanna look that up after, or follow the link in the show notes. There are, you know, very handy ways to import that image file. I believe these are actually zipped archives of these images, and you'll get a lot of different attributes of the different pixel dimensions, especially in the three-dimensional space, which you can use to help visualize this and perform additional processing.
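A minimal sketch of the volume side, assuming the RNifti reader and a hypothetical file name rather than the exact code from Jo's tutorial:

```r
library(RNifti)

# Read a (possibly gzipped) NIfTI volume; the file name here is hypothetical
img <- readNifti("anatomical.nii.gz")

dim(img)      # voxel counts in the three spatial directions
pixdim(img)   # voxel sizes, useful for mapping voxels back to millimetres

# Quick base-R look at a middle axial slice
image(img[, , dim(img)[3] %/% 2], col = gray.colors(64), asp = 1)
```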
So that can be very important if you're looking at different areas of the brain and trying to see the coordinates and the different representations of those. So this package can help you figure out all those different orientations, all the sizes of those areas. And, again, off the shelf software that can be used to visualize this is readily available, but R itself, again, gives you a nice way to plot this in your R session as well. But, again, it's not just the volume perspective. It's also the surface perspective, and this is where you can do some really handy things like looking within your brain at the cortex, kind of this almost winding pipe inside your brain, in different regions, to see maybe where some areas are getting a little more, you know, condensed. Maybe they're getting plugged. Maybe there's an anomaly in the image there.
But these types of surface visualizations require a different type of format for visualization. It is called GIFTI. I've never seen this in my day to day work, but it helps consolidate the image data into what's called pairs, kind of representing both the left and the right side of the brain in those coordinates. And she links again to some previous tutorials that she's authored to import these files into R as well via a package called gifti. Again, freely available. We'll have links to that in the show notes as well, where you can then interrogate this surface imaging, you know, data and be able to get different dimensional representations via, like, the locations, the maybe triangle type dimensions.
And again, you can plot these as well, so you can get a visualization of the different hemispheres of the brain, not unlike the hemispheres of a globe. Right? You have the left and the right, and then you can flip that around, do different color ranging depending on the intensity of the different areas of these images. So you get kind of that heat map like structure for the left and the right. Maybe some areas that are having an issue are more brightly colored than others. And, again, you get the code right here in this post for how you can define these regions and define the different visualization for how you can distinguish those from the other areas and maybe the more normal representation.
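And a similar sketch for the surface side, assuming the gifti package's readgii() reader and a hypothetical pair of left/right hemisphere files; Jo's tutorial has the actual plotting code for the colored hemisphere views:

```r
library(gifti)

# GIFTI surface data usually comes as a left/right hemisphere pair;
# these file names are hypothetical
surf_left  <- readgii("sub-01.L.midthickness.surf.gii")
surf_right <- readgii("sub-01.R.midthickness.surf.gii")

# Inspect the top-level structure (vertex coordinates, triangle indices, etc.)
str(surf_left, max.level = 1)
```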
So it is great in the world of bioinformatics, in the world of other, you know, health data, when we're working on treatments that are trying to help deficiencies or maybe areas in the brain that are getting, you know, affected by diseases. The one that comes to mind immediately is all the research being done in Alzheimer's disease, where they're looking at things like the amount of plaque in the brain that's impacting tissue, as a hypothesis to try to slow the cognitive decline of patients as they're dealing with that debilitating disease.
But the first step, right, is to see what you got. So this great post by Jo looks at the different packages that let you import this data, quantify these different regions, and maybe point those out via an additional visualization. It looks really top notch. So if you're in the space of visualizing these readings such as MRIs, this is a wonderful post to kind of show you what is possible here. And again, with links to really dive into it further with these great packages, like I mentioned, RNifti, as well as the gifti package. Yeah. Really great stuff here.
[00:41:01] Mike Thomas:
Yeah, Eric. And this is just super cool, and it shows us just how fantastic the graphics capabilities are in R. And there were a few publications referenced in Jo's blog post that make me think about doing reproducible science, and how impactful this type of work is. And we can create these publication ready visualizations programmatically based upon the data. And not only can we, but in my opinion, I think we have to. We must. My only other takeaway here is that I need to see this somehow integrated with the rayrender package for interactive 3D visualizations of the brain and the different hemispheres.
So shout out to Tyler Morgan-Wall, the author of the rayrender package. If you're listening, you know, no pressure, but it would be pretty cool. We don't nerd snipe on this show, do we? Never. It's usually me putting the pressure on myself or you doing the same for yourself. So it's about time that we just start calling some other people out.
[00:42:05] Eric Nantz:
Alright. Well, if you wanna see more material like that and more, well, guess what? There is a lot more to this particular issue. As always, R Weekly is jam packed with additional packages, great tutorials, and great resources. We'll take a couple of minutes for our additional finds here. And we were just talking about those contributing via add-on packages to the R community in our data.table discussion. Well, in terms of contributing to the language itself, we have covered a lot of great initiatives to bring in developers that are wanting to contribute to R itself in a friendly, you know, open way, whether it's these meetups or these hackathon type dev sessions with the R Forwards group and whatnot.
Well, another great resource that's being developed as we speak, and really taking, you know, it to the next level, is what's called the CRAN Cookbook. We'll have a link to this from the R Consortium blog in the show notes, of course, and this is meant to be a more user friendly yet technical, you know, recipe type book, which is gonna help those new to the R language who want to contribute to the ecosystem itself. And it really is great for those that are dealing with issues submitting their packages to CRAN and the different issues that they can come across.
These could be just about, you know, formatting your package's metadata with the DESCRIPTION file. It could be about the documentation of your functions and, of course, the code itself. So I don't think it's gonna get into all the weeds of those C header issues that we talked about. But, nonetheless, I think this is a great companion to have with, say, the R Packages reference authored by Hadley Wickham and Jenny Bryan as you're thinking about, you know, getting that submission to CRAN and some of the things that might blindside you if you're not careful. It's a great and accessible way to look at how you might, you know, get around those issues and how to solve them in a way that gets your package on CRAN. So I know this effort has been in the works for quite a while. It's great to see this really maturing and how it's being used by the CRAN team itself and where they're going forward with it. So, yeah, credit to the team, Jasmine Daly, Beni Altmann, and others involved with that project.
[00:44:29] Mike Thomas:
And, Mike, what did you find? Shout out Jasmine Daly, Shiny developer in Connecticut. Heck yeah. Yeah. Gotta love that. A bunch of great stuff. You know, one blog that I found, which was just sort of really nice to reflect on, was from Isabella Velásquez over on the Posit team. It's the 2024 Posit Year in Review. A little trip down memory lane of all that Posit worked on in the last year. And, you know, a lot, obviously, around their R packages for interfacing with LLMs, like elmer, you know, Shiny Assistant, shinychat, pal, you know, as well as things out of the Quarto ecosystem, including Quarto dashboards being a big one.
Obviously, all sorts of stuff coming out of the Python ecosystem on both the R and Python, or excuse me, out of the Shiny world on both the R and Python sides of the equation there. Some great advancements from tidymodels in survival analysis that were really impactful to our team, as well as a bunch of others across, you know, webR. I know that's one that, you know, impacted you quite a bit in 2024. So it was just nice taking some time to do that reflection on, you know, all of the work and investment from Posit and the other folks that contributed to projects that Posit maintains.
Shout out myself with one small contribution to httr2 in the latest release, just yesterday. So thank you. It's, I think, 2 words in the function documentation for the roxygen comments, but we'll take what we can get. I was on the list. So, thanks, Hadley, for including me among, I guess, 70 other folks who contributed to that latest release of httr2. But it's cool to all collaborate together in the open, and I think that's all I'm trying to say here. And it was nice to walk through a lot of these projects that have impacted me and my team, you know, in 2024 and beyond. Yeah.
[00:46:32] Eric Nantz:
Excellent. And you're on the score sheet, as they say. They can't take that away from you. That is awesome stuff. Congratulations on that. Yeah. It's amazing the breadth of contributions in this space. And, certainly, you know, AI was a focus for them with their awesome innovations of elmer and mall and the Shiny Assistant, which I'm really a big fan of now. I was one of the skeptics on that, so it's great to see them doing it and doing it responsibly. So credit to the team on that. But, no, they're not just resting on those innovations. As you said, the webR stuff really is jiving. It's really getting a lot of traction, and I can't wait to see where we take that effort in 2025.
And when I say we, it's more like what George Stagg comes up with, and I'm just a very shameless consumer of it, but I love the stuff that he comes up with. So lots of great stuff here. There's never a dull moment with the Posit team. And never a dull moment in the rest of the issue, as we say: lots of great resources that Ryo has put together for us. But, of course, as I said, this is a community effort, and we cannot do this alone. So one of the ways that we keep this project going is through your contributions. You're just a pull request away from getting your name as a future contributor to R Weekly itself.
A great blog post, maybe a new package that you authored or you discovered, there's so many opportunities for it. Head to rweekly.org. You're gonna find a link to get your pull request up there right in the top right corner. We have a handy draft template for you to follow. Again, leveraging GitHub for the win on that. And our curator of the week will be glad to get it in for you. And, of course, we love hearing from you. And we did hear from one of our more devoted listeners that, apparently, I do not pronounce names well. And even though I practice it, I got called out for it. So, I'm gonna get it right this time.
Nicola Rennie. Sorry for butchering her name all these months in the previous highlight podcasts. Thank you, Mike. Not you, Mike. Mike Smith, for calling me out on that. I need to be honest with it. So feedback warranted, and I may have to have a little cookie jar of, like, funding where I send a nickel every time I butcher her name in the future. Hopefully, never again. Nonetheless. Okay. We love hearing from you, and the ways you can do that are through the contact page in the episode show notes as well as on social media. I am rpodcast.bsky.social on Bluesky, I believe, is how to call it. Again, this is still not natural to me yet. I'll get there. I'm also on Mastodon with [email protected], and I'm on LinkedIn. Search my name, and you'll find me there.
And, Mike, hopefully, you don't have a hard time with butchered names, so we're gonna find you.
[00:49:18] Mike Thomas:
You can find me, I think, primarily on Bluesky nowadays at mike-thomas.bsky.social, or on Mastodon, [email protected]. Or, probably even better, on LinkedIn: if you search Ketchbrook Analytics, K E T C H B R O O K, you can see what I'm up to lately.
[00:49:42] Eric Nantz:
Awesome stuff. And a little quick shout out to good friends of mine from the R community, Jon Harmon and Jonathan Sidi, because I've been using, in some of this R Weekly infrastructure I'm building, some of the packages they've created to interact with, interface with, the Slack API of all things. So it's been pretty fun learning there. And, again, httr2 is involved in some of that as well. So it all comes full circle in this fancy schmancy calendar thing I'm making. So always learning all the time. So shout out to those 2 for making some really elegant packages to interface with an API of a framework that seemed really cryptic to me at the time. But now it's starting to demystify a little bit. Alright. Well, we'll close up shop here for episode 192 of R Weekly Highlights, and we'll be back with another episode of R Weekly Highlights next week.
Hello, friends. We are back with a 292 of the Our Weekly Highlights podcast. This is the weekly show where we talk about the awesome highlights and additional resources that are shared at this week's Our Weekly Issue. My name is Eric Nantz, and I'm happy you join us from wherever you are around the world. Hopefully, staying warm depending on where you are in the world because it is frigid over here in my humbly abode here. But I'm warming up with this recording, and, of course, keeping me all warm and fuzzy in terms of, you know, cohosting is my awesome cohost, Mike Thomas. Mike, how are you doing today? Doing well, Eric. Yeah. Thankfully, these highlights are hot,
[00:00:37] Mike Thomas:
because in Connecticut, it is just as cold as I'm sure it is in Michigan, right now. So it's pretty out, though, here. We got some we got some nice snow.
[00:00:46] Eric Nantz:
Yeah. That's true. It hasn't all melted yet here. And when the kids see the sun, like, I don't want the snow to melt. They're like, it's not gonna melt at 0 degrees, buddy. It's not. No. Not here anyway. So but as you said, we got some fun, hot topics to talk about in the highlights this week. And, of course, this is a community project. Right? So we've got our curator of the week. This time was Ryo Nakakawara, who is one of our OGs in the curator space of our weekly. And as always, he had tremendous help from our fellow our weekly team members and contributors like all of you around the world with your poll requests and suggestions.
And we lead off with a visualization style of a package that definitely has a lot of utility in terms of the scope of it, especially in my industry. And we're gonna dive into it here. And this post is actually coming to us from the RPost blog, and it's a guest post by Paul Satorra. Hopefully, I said that right. He is a bio statistician. And in this post, he talks about introducing a package that he's created for the art community, now on CRAN, called flowchart. The name should make it pretty intuitive of what it does, and it helps you create flowcharts of R. Now you may be thinking, and this show and other, you know, other, presentations or resources, creating flowcharts, there's a lot of different ways of doing this. Right? Especially in the in the realm of HTML style outputs.
We we've been using I know myself and Mike, I believe, have been using frameworks like Mermaid. Js within our quarter or our markdown documents. So there's definitely ways of creating flowcharts there. I also was a heavy user of the diagrammer package from long ago. That was, helping me out quite a bit with creating may not be necessary flowcharts, but definitely things like decision tree outputs and and, you know, choose your own adventure kind of layouts. But what flowchart brings differently than the rest of those is, in essence, a very tidy interface to make all this happen.
So let's dive into this a little bit from the post. So first, we will have, of course, a link to the the package and the episode show notes here. But as I mentioned, it is on CRAN, so it's all just an install dot packages flowchart away. And it actually, you know, requires you to bring your own data. So for the case study in this example, there is a built in dataset from a publicly available clinical trial set of results called saffo, and it is actually about the journey of patients throughout the life cycle of a trial. When I say journey, I'm thinking in terms of what is the result of their status, and this could be that they are randomized to the trial, I. E. They get one of the treatment assignments, or they discontinue after the randomization for a various reason, or they end up completing it. There are many other nuances in this, but this isn't a clinical trial podcast, so I'll I'll stop there. But the the data is built into the package.
And, basically, in order to register this data to be available for a flowchart, you start off with feeding this dataset name into a function called as underscore fc. And this is basically gonna turn your data frame into a list object with 2 components here. One of which is the original data going into it, and you may be wondering what does the data format look like. In the case of this example, each observe each row is what looks to be a patient with a rant with a unique ID, and then the columns are the different kind of flags in terms of where they're at in the in the trial and what the statuses is. So, again, there is a vignette that describes this data set in more detail, but it's basically a bunch of yes or no type variables for what happened to that patient in the trial, whether they they were, you know, in they met the inclusion criteria, whether they had chronic heart failure or whatnot. Again, you can take a look at the data in in the episode show notes.
So once you feed this in that he has underscore fc, now you may be wondering what do we do with this? Well, you can just simply draw a very bare bones flowchart or one cell with one function called fc_draw, where if you feed in that original data or that original object, you're just gonna get a box with, an optional label of your choosing if you want. And this time, it has, like, all 925 records in one box saying that these are all the the patients inside. Well, that's that's boring. Right? Well, let's start actually having some flow in this flowchart. Right? So that's where the tidy interface kinda comes in here where you can feed in this dataset.
Again, make it a flowchart object with as underscore fc, and then pipe that further to an fc_filter object. And this is where you can perform what looks to be like dplyr manipulation with its filter statement. And in this case, in this first example, we want a simple filter to determine if the patients were randomized or not. Now there is no column for whether they are randomized or not, but there is a grouping column, which in essence acts like that because it determines what treatment group they were assigned to. If that column is missing, it means they weren't randomized in it. So in the case of this example, the filter for rather that group variable is not missing. So an exclamation point is dotna group, and then you can give it a label.
And then you can also show who were not meeting that filter. And that's gonna be automatically labeled in a box called excluded. Then when you draw that, then you get that original box of the 925 patients, but then there are 2 arrows going away. One arrow goes to the right, and it has this excluded box. And then the arrow going down has another box that, the author's label is randomized, which has now 215 patients. So, obviously, not many patients made to that randomization step, but this is a very similar format that we do in a lot of our clinical trial reports, and we get to what's called the disposition section where it shows the flow of the patients that meet certain criteria and who end up actually completing the trial.
So this look quite familiar to me, but you can do a lot more than just that single filter. Right? You can also, at that next step where it show those randomized patients, you can now split that into different boxes as well. You might call parallel boxes, and you can use a function called fc_split. You give it the group, the variable that determines the grouping of that split. In this case, it is simply group, and that's now gonna partition that randomized group into 2 boxes of the 2 different treatment groups. Again, pretty straightforward, pretty neat tidy interface here, and you can do even more manipulations with that fc filter applied to those, you know, middle boxes that we just created at the treatment groups. And you can read the example more in the post here, but, again, it's really just using the fc_filterfunction.
And then you'll see these boxes in parallel chains, or you might say trails, going down, but the boxes are all parallel next to each other for the equivalent steps. So in essence, the flowchart looks pretty darn polished already. And with a tidy interface, I think this is a great package to put in your toolbox if you just want something quick and to the point with a familiar tidyverse piping syntax, and you could feed this into whatever document you choose. I could see this going into a Quarto document or R Markdown, whatever have you, whether it's HTML or PDF format.
It looks like it's gonna output these flowcharts as image files, perhaps, although I haven't tested that myself. But it is definitely an interesting paradigm if you know the data going in is fairly straightforward, which it is in this example, and you may not necessarily need the additional customization that you get with frameworks like Mermaid.js or DiagrammeR or some of these other packages from the past. So, again, you might find a great use case for this. As usual, I like having choice in the way I construct these flowcharts. There could be cases where this might not quite fit your needs. If you have more customized directions of flow, maybe things feeding back to an earlier step, in that case maybe Mermaid.js is a better fit for you. But like I said, for this kind of flowchart with pretty predefined start and stopping points, or you might say finishes, and kind of a set trail of where the flowchart goes, I think Pau's package could be a great fit for your toolbox.
[00:10:09] Mike Thomas:
I agree, Eric. Yeah. I make flowcharts literally every day. They're the way that I communicate with both my team and our clients about the end-to-end process that we're going to undergo to get them to the solution, because that's how we bridge the gap. And a few years ago, there weren't a lot of great flowcharting tools that integrated well with version control. We used some like Lucidchart and Visio, but you really had to export those as PDFs or maybe host them somewhere that folks could go take a look at, but not scriptable, not easily versionable.
And nowadays, as you said, there are better options, Mermaid.js being one of them, the DiagrammeR package being another one. But I'm really impressed with this flowchart package here. It's really easy and simple syntax to get started, very tidy-friendly for developing these flowcharts. And when you look at it on the surface, especially in some of these examples, a lot of these functions are really just taking one argument, and there doesn't appear to be a lot of customization. But that's actually because there are a ton of other arguments that have default parameters that can be changed if you want to. Originally, I thought that this was a package with very simple syntax that made a lot of decisions for you, and with the default parameters they do, but you still have a lot of control over all sorts of different things: the direction of the line within that fc_filter() function, whether or not you wanna kick those filtered observations out to a flowchart node on the right or on the left, font styling, font size, font color, things like that, rounding for the number of digits that are gonna get displayed, the background color of the node itself. So if you actually take a look at the reference page of the package's pkgdown site, which is what I'm looking at, and click into some of these functions, you'll realize that there are a ton of arguments behind the scenes that are almost ggplot-like in terms of the amount of control that you have over each element in your flowchart. So I'm pretty impressed with some of the new features and some of the interesting functions that they have in here that I'm not sure I've seen anywhere else, like fc_merge() and fc_stack(), which allow you to combine 2 different flowcharts either horizontally or vertically. I thought that's pretty interesting, and maybe it could help in your workflows depending on how you're modularizing your code. So really impressed.
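(A quick sketch of the combining functions Mike mentions, assuming fc_merge() and fc_stack() accept a list of flowchart objects; the labels here are placeholders, not from the post.)

```r
# Sketch of combining two flowcharts, assuming fc_merge()/fc_stack()
# take a list of flowchart objects as described above.
fc1 <- safo |> as_fc(label = "Flowchart A") |> fc_filter(!is.na(group), label = "Randomized")
fc2 <- safo |> as_fc(label = "Flowchart B") |> fc_split(group)

list(fc1, fc2) |> fc_merge() |> fc_draw()  # combine horizontally (side by side)
list(fc1, fc2) |> fc_stack() |> fc_draw()  # combine vertically (one above the other)
```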
Honestly, just as a side note, I really like the hex logo as well. I think it's really cool. And I'm excited to start to play around with flowchart as well because I had not come across it until today. So a great way to start off the highlights this week.
[00:13:05] Eric Nantz:
Yeah, I can see a lot of convenience here, a lot of ways to just get that chart done, like I said, for pretty straightforward datasets. And, yeah, I'm gonna show this to a couple of colleagues here as we're thinking about ways of using R in more of the document generation space, especially for these more, I'll try to be polite here, rigid documents that we have to produce in my industry. We're slowly trying to feed R into these things. And in fact, there is a section that we often have in what's called our analysis data reviewer's guide where we talk about the flow of how the programming works, where you go from dataset to program and then to output.
Perhaps flowchart could be useful in that too. So I've got something to share with some colleagues, I think, later on today. So, yeah, credit to Pau for sharing this package with us, and, yeah, choice is good as they say. Pray for the day when our audience
[00:14:01] Mike Thomas:
finally lets us do dynamic documentation.
[00:14:03] Eric Nantz:
You can't have everything, Mike. Alright. And for our next highlight here, we're gonna shift gears quite a bit because we're gonna get really in the weeds technically, but with a pretty fundamental issue that I think has affected quite a few package authors in recent months, maybe even the last year or year and a half, having to do with best practices and recommendations for authoring packages that involve more than just new R code in the package itself. In particular, we're gonna talk about what you wanna do when you extend a package with another language, mainly the C language, in your next R package, and some of the learnings that have been shared from a very influential package in this space.
So we are talking about the latest blog post from the data.table community blog, which has been featured quite a bit in the highlights over the last year. This post comes to us from Ivan Krylov, and he leads off with the tagline about the use of non-API entry points in data.table. Now it's amazing: in 2025, I think when most people think of APIs, they're thinking of those web APIs, right? No, no, we're not talking about that here. API actually is a historical term in software development. We are talking about ways you can interface with the language in different constructs or from different perspectives.
And in particular, we are talking about the API that the R language itself exposes to package authors via its integration with the C language. So setting the stage here: since the beginning of R itself, there has been a canonical reference if you wanna build something on top of R, ideally as a package, or perhaps even contribute to the language itself, and that is the Writing R Extensions manual. This can be found directly on the R project homepage, and this is what the CRAN maintainers use as the reference for any new package coming into the R ecosystem.
Certainly, there are a lot of automated checks in place to make sure a lot of the principles in the extensions manual are met. But the reason we're talking about this post is that within this manual, there are, in essence, entry points that are defined by the R maintainers to interface with the C API of R itself. And in particular, there are 4 categories that you'll find in this manual. First is literally called API, and you can think of these as the ones that are documented, they're declared for use, and they will only be changed if the R maintainers end up deprecating that particular API call.
Then you get to the next 3. There is the public designation; these are exposed for use by R package developers, although they aren't really documented and they could change without you knowing it. So you could think of this as like a package where you have a function that is technically there, but you don't export it to the user with user-facing documentation. But like any package in R, you can look at or use any function in the package with the namespace prefix, three colons, and the function name.
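(A tiny illustration of that analogy: the exported, documented function is the supported surface, while the triple-colon reaches an internal that could change at any time. format.object_size() happens to be a non-exported internal of the utils package at the time of writing; treat it purely as an example of the pattern, not as a recommendation.)

```r
# Exported, documented function: part of the supported surface.
utils::object.size(mtcars)

# Non-exported internal, reachable with ':::' but with no stability promise.
utils:::format.object_size(1e6, units = "auto")
```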
So again, some call that off-label use; your terminology may be different. There is another category called private. These are used for building R, and, yes, they are exported, but they're not declared in the header files of R itself. And they say point blank: do not use these in any package. Do not. None at all. And then you get to hidden. This one really is peculiar to me. They are entry points that are sometimes possible to use, but they're not exported. But I think it goes by the name: you probably don't wanna touch those. So, historically, there's been no consternation from the R maintainers or R package authors about the entry points designated as API: all good, right? You should be able to use those.
However, there has been a bit of discourse around the use of the public ones, because they're not documented, they're not forbidden by R CMD check, and they've been there for a while. There have also been some modifications to the language itself: to be able to use some of these, there was what you might call an escape hatch, a define called USE_RINTERNALS, that package authors used in the past to kind of get around potential issues. Well, that escape hatch or loophole was closed in recent versions of R.
And then the number of non-API-blessed calls grew a little bit between R versions. And another discussion on the R development list is where the framework called ALTREP fits into all this, which got a lot of great press in recent years in the R community for being a more optimized way of operating on vectors. And, in fact, I have had the pleasure of speaking with Gabe Becker numerous times, who was influential in getting ALTREP into the language itself, although it was certainly labeled experimental in those days.
So fast forward a little bit, and there's been some confusion about which of these API calls are really ready for package authors and whatnot. Luke Tierney on the R team has actually been working on programmatically describing these exported symbols to give a little more clarity into what package authors can use. And he's come up with 2 additional categories as a result. Experimental, which I think sounds a little more intuitive: these are entry points that are there, but they're in the early stages, so there might be some caution in using them because they could change in a newer version.
So be prepared to adapt, basically. And then there's one called embedding, and this is meant for those who wanna create what are called new front ends to the language itself. But for now, they're keeping it separate; there isn't a lot of guidance yet on whether to use those or not. And now R CMD check has been beefed up a little bit to make sure it is checking for any calls by a package that use these non-API entry points, i.e., those that moved from the API designation to some of these other ones. And it looks like data.table was on the receiving end of some of these checks in recent upgrades.
And so the next part of this post dives into, as a result of these checks, what the data.table authors are doing to be compliant with this reorganization of the C API entry points that data.table has been relying on for years. Again, some of these escape hatches are being patched up, and I've actually seen discussion on Mastodon in the R group from people like coolbutuseless, Mike FC you might know him as, about some of the adventures he's had trying to use C API entry points in some of the packages he's been dealing with, and the R CMD check issues and whatnot. But it looks like data.table has been looking at this quite a bit.
So I'm not gonna read all of these verbatim, because there are a lot of corrections being made in data.table to use some of these updated or newer C API entry points. There are some that are quite interesting where they've got solutions in place, and they link to every pull request that fixes these issues in each of these sections. Some of them look at comparison of calls and pairlists, and which entry point they were using in the past versus which one they're using now; how strings can be composed as C arrays; refactoring certain reference counts; dealing with encoding in string variables; and growing vectors in a way that doesn't destroy your memory. There are some new entry points for that as well that you can read about.
And then it gets pretty interesting, because there is more, especially getting back to the ALTREP framework. Apparently there is, some might say, confusion about where ALTREP fits in all this and which parts of ALTREP should be exposed in a way that a package author is not gonna get dinged by R CMD check. There is a lot of narrative on this, and it actually does speak to how you grow vectors and do some other type checking. So I thought ALTREP was kind of all ready to go. I'm not saying it's not ready to go, but apparently there is some refactoring that needs to be done, starting with R 4.3, in terms of how you grow these vectors with the ALTREP framework.
So this post talks about the common ALTREP methods and other common interactions between them and the C libraries, and there will have to be some refactoring in data.table to use some of these newer recommendations for ALTREP. And like I said, growing these vectors, growing these table sizes, doing things like fast matching of strings. And this is the one section where things are not fixed yet. There is a lot of refactoring that needs to be done by the data.table authors to comply with some of these new entry points and some of these newer recommended approaches to using ALTREP.
And there is even more going on here with some other attribute setting and dealing with missing values, where they are very transparent that they're not sure how to fix some of these yet in light of these API calls being shifted. Again, this is an extremely technical deep dive. I, for one, have never authored a package that deals with C, so I don't have a lot of firsthand experience with these checks, although I've seen some conversation about this on social media and the R development channels and whatnot. But if you ever wanna know how the authors of a very important, large-scale package like data.table are dealing with some of these newer approaches that the R team is recommending for these API entry points, boy, this post is for you. There is a lot to digest here. Again, I can't possibly do it justice in this particular highlight, but I think it's important to have things like this as a reference so that it's not just so mysterious to you as a package author if you get dinged by R CMD check about these API calls. If you're wondering how another team would approach this, this is a very technical deep dive into how you can approach it. And as I said, some of these are not fixed yet. There is obviously still time in between releases to get compliant with these newer calls, so I'm sure data.table is gonna find a way.
But we're all humans, after all, right? It's not always a snap of the fingers to get into these newer ways of calling these entry points. So this gets into the internals of data.table quite a bit, but more importantly, it also looks at how they're dealing with this new world, if you will, of using C in an R package in the R community. Yeah, that's a lot. But, again, really recommended reading if you find yourself in this space.
[00:27:13] Mike Thomas:
Yeah, Eric. This one is very technical, as you mentioned, but I think it's great to have a really technical blog post like this. And it may seem really niche, but I guarantee you it's going to help someone else out there who's probably going to run into the same situation with their R package, where they leveraged this kind of API interface into the underpinnings of the C code behind R to accomplish something, and they're realizing maybe now that CRAN is going to start to complain about that. And, you know, as much as we might have mixed feelings about CRAN, the checks that they enforce can be stressful to us sometimes. Like, I did see a Bluesky post recently. I don't know what they're called, toots, tweets.
But somebody had passed 6 checks, I guess, on the different operating systems that get checked on CRAN, and then the 7th was Windows, and it failed. Like, that hasn't happened before. Right. My goodness. And, obviously, that's the worst feeling in the world. But we should really take the time to step back and think about how open source software, and I guess most software in general, is just software stacked on top of other software over and over and over. And if we go far enough down the R rabbit hole, right, we're at C. And not to throw stones, but it's a little scary to me that something like CRAN doesn't exist in other languages. You know, I'm thinking about the Python ecosystem, and I think it's pretty easy to submit a package to PyPI, and I don't know if they require you to have any unit tests at all. Not that R necessarily requires you to have any unit tests, but at least they're going to try to build your package, right, and let you know if anything is breaking. And as you make changes and updates to that package, it'll rebuild it and rerun a lot of those tests, and those tests are getting updated for things like this, right? Newer versions of R and newer guidelines and guardrails that we have to adhere to to make sure that your package has the best chance of working on everyone's computer. Right? And I think that goes a long way to at least provide some infrastructure that's going to appease, you know, auditors.
You know, I don't think the SAS community is ever going to be happy with us, and they'll point to situations like this about why their software is more stable or better than open source. But I think you and I could talk for about 10 hours about why that's not the case. But it's really interesting, and I'm very appreciative of blogs like this that really take the time to walk through all the decision points, everything that was laid out in front of them, what they were up against, and why they made the decisions that they did to try to troubleshoot this particular issue.
And I'm also grateful to not have to understand any of this. You know, I'm being a little facetious, and I certainly understand that it's all C under the hood, but the folks that have really taken the time to understand the bridge between these two different languages to build these higher-level programming interfaces for folks like us, the ones that make it easier to work with, it's incredible. I think it's why the R language, and the Python language as well, are as popular as they are: the syntax and the APIs, not to use a buzzword here, that have been developed make them very accessible to a wide audience. And, you know, one last note here. I guess it's pretty crazy to think about how old data.table is.
2006 was the first CRAN release. The oldest version of dplyr released on CRAN, at least from what I can see on the pkgdown site, is from 2014. So 8 years later; still a decade old, but we're going on 2 decades of data.table. And it's definitely been a package that was transformative for the R community. So it's great to see it still thriving, and the folks that work on that project are at the cutting edge of a lot of what's going on in the open source data science ecosystem. So hats off to them, and a great blog post.
[00:31:46] Eric Nantz:
Yep. To say it stands the test of time is an understatement, given that history and how influential it's been in this community. And, again, not all of this was despair, right? I mean, for many of those points mentioned early in the post, it was simply changing the name of an API header call or whatnot, and it was straightforward in the documentation which one to change it to. And again, credit in the post for having all the links to the various pull requests that fix these. So Ivan did a tremendous job of being transparent, showing the fix at a high level and then pointing to the actual code that does the fixing. I love that. I can't wait to dive into that a bit further. But again, it calls out that, like anything in open source, it's not always a quick fix to everything. So I will be keeping an eye on what's happening with those ALTREP-style header calls, where there are new wrappers that need to be made in this in-between world of the current version of R and R version 4.5 or later, which is due out, I believe, this year. So, as usual, if you're developing a highly influential, production-grade package or app, you gotta think about backward compatibility, right? So that's the journey they're on, and, yeah, we'll be very interested to see where it goes. And in the cases where they don't know the best fix yet, I hope the community can help them out too and that there will be a transparent dialogue for that. But data.table's group of authors has been on the cutting edge for many, many years.
I'm so thankful that they got that recent grant to put resources like this blog together, along with the various presentations they've had at conferences. So it's great to get a lens into all the innovations they've been thinking about, now in the public domain, like we get to see here on our humble little R Weekly project. So we're not gonna talk about C again for the rest of this podcast. Enough of C, of course. We're gonna go back to some visualization, with a very important type of visualization in the world of health, especially of a very important organ in our bodies that we rely on every single day for obvious reasons.
So it's one thing to talk about how your brain works, right? But anytime we're trying to diagnose issues with that humble little organ inside our craniums, up in our skulls, you often turn to visualizations, i.e., scans of your brain tissue, to perhaps diagnose issues or find ways that a treatment is affecting certain parts of your brain, if you will. Typically, this is done via MRI scans. And just like anything, the R community has stepped up with ways you can bring these visualizations into R itself for further analysis.
And our last highlight for today is a great tutorial on some of the issues and ways that you can import and analyze these types of highly complex imaging data. This post is coming to us from Jo Etzel, who is a staff scientist at the Cognitive Control and Psychopathology laboratory at Washington University in Saint Louis. That's a mouthful, but she definitely is a subject matter expert in this field from what I can tell. And she has written multiple tutorials in the past. In fact, she's constructed these with knitr, which is a great way to produce reproducible tutorials.
And she's addressing some of the points that she has talked about before in working with 2 different types of quantities in these brain images: one is the volume and the other is the surface of the brain visualizations. So first, she talks about the volumes. And, just like anything in the real world of physics, we have the three-dimensional perspective here, right? And when you get these MRI scans, you get three-dimensional coordinates once you feed them into some of the more standard software to actually visualize the readings from these MRI scanners.
And you see some example images here from some off-the-shelf software, where you look at, on the right side, the three-dimensional layout of the brain itself, and then you get more of a two-dimensional representation via the different perspectives. So all this data is readily available from these image formats once you import it via this great package called RNifti, R-N-i-f-t-i if you wanna look that up afterwards; the link is in the show notes. There are very handy ways to import that image file. I believe these are actually zipped archives of these images, and you'll get a lot of different attributes, like the pixel dimensions, especially in three-dimensional space, which you can use to help visualize this and perform additional processing.
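(For a concrete feel of that import step, here is a small sketch using RNifti's readNifti() and its header accessors; the file name is a placeholder, not a file from the tutorial.)

```r
# Sketch of reading a NIfTI volume and inspecting its dimensions.
# "example_volume.nii.gz" is a placeholder path.
library(RNifti)

img <- readNifti("example_volume.nii.gz")  # returns a niftiImage array
dim(img)            # voxel grid dimensions (3-D, or 4-D for a time series)
pixdim(img)         # physical size of each voxel
image(img[, , 30])  # quick look at one slice with base graphics
```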
So that can be very important if you're looking at different areas of the brain and trying to see the coordinates and the different representations of those. This package can help you figure out all those different orientations and the sizes of those areas. And, again, off-the-shelf software that can be used to visualize this is readily available, but R itself gives you a nice way to plot this in your R session as well. But, again, it's not just the volume perspective. It's also the surface perspective, and this is where you can do some really handy things, like looking at the cortex within your brain, kind of a winding structure across different regions, to see maybe where some areas are getting a little more condensed, maybe they're getting plugged, maybe there's an anomaly in the image there.
But these types of surface visualizations require a different format. It is called GIFTI. I've never seen this in my day-to-day work, but it helps consolidate the image data into what are called pairs, kind of representing both the left and the right side of the brain. And she links again to some previous tutorials that she's authored to import these files into R as well, via a package called gifti. Again, freely available; we'll have links to that in the show notes as well. You can then interrogate this surface imaging data and be able to get different dimensional representations, like the vertex locations and the triangle-type dimensions.
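(And a similar sketch for the surface side using the gifti package's readgii(); again the file name is a placeholder, and the exact structure of the returned object is described in the package documentation rather than reproduced here.)

```r
# Sketch of loading a GIFTI surface file for one hemisphere.
# "example_surface.gii" is a placeholder path.
library(gifti)

surf <- readgii("example_surface.gii")  # parse the GIFTI file into a list
names(surf)                             # metadata plus the data arrays
str(surf$data, max.level = 1)           # e.g. vertex coordinates and triangle indices
```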
And again, you can plot these as well, so you can get a visualization of the different hemispheres of the brain, not unlike the hemispheres of a globe, right? You have the left and the right, and then you can flip that around and do different color ranging depending on the intensity of the different areas of these images. So you get kind of that heat-map-like structure for the left and the right; maybe some areas having an issue are more brightly colored than others. And, again, you get the code right here in the post for how you can define these regions and define the visualization so you can distinguish those from the other areas and the more normal representation.
So this is great in the world of bioinformatics and other health data, when we're working on treatments that are trying to help with deficiencies or areas in the brain that are being affected by diseases. The one that comes to mind immediately is all the research being done in Alzheimer's disease, where they're looking at things like the amount of plaque in the brain that's impacting tissue, as a hypothesis to try to slow the cognitive decline of patients as they're dealing with that debilitating disease.
But the first step, right, is to see what you've got. So this great post by Jo looks at the different packages you can use to import this data, quantify these different regions, and maybe point those out via additional visualization. It looks really top notch. So if you're in the space of visualizing these readings, such as MRIs, this is a wonderful post to show you what is possible here. And again, with links to really dive into it further with these great packages, like I mentioned, RNifti as well as the gifti package. Yeah, really great stuff here.
[00:41:01] Mike Thomas:
Yeah, Eric. And this is just super cool, and it shows just how fantastic the graphics capabilities are in R. And there were a few publications referenced in Jo's blog post that make me think about doing reproducible science and how impactful this type of work is. And we can create these publication-ready visualizations programmatically based upon the data. And not only can we, but in my opinion, I think we have to. We must. My only other takeaway here is that I need to see this somehow integrated with the rayrender package for interactive 3D visualizations of the brain and the different hemispheres.
So shout out to Tyler Morgan-Wall, the author of the rayrender package. If you're listening, you know, no pressure, but it would be pretty cool. We don't nerd-snipe on this show, do we? Never. It's usually me putting the pressure on myself or you doing the same for yourself. So it's about time that we just start calling some other people out.
[00:42:05] Eric Nantz:
Alright. Well, if you wanna see more material like that and more, well, guess what? There is a lot more in this particular issue. As always, R Weekly is jam-packed with additional packages, great tutorials, and great resources. We'll take a couple of minutes for our additional finds here. We've been talking about those contributing via add-on packages to the R community in our data.table discussion. Well, in terms of contributing to the language itself, we have covered a lot of great initiatives to bring in developers wanting to contribute to R itself in a friendly, open way, whether it's meetups or hackathon-type dev sessions with the R Forwards group and whatnot.
Well, another great resource that's being developed as we speak and really taking it to the next level is what's called the CRAN Cookbook. We'll have a link to this from the R Consortium blog in the show notes, of course, and this is meant to be a more user-friendly yet technical, recipe-type book, which is gonna help those new to the R language who are wanting to contribute. And it really is great for those dealing with issues submitting their packages to CRAN and the different problems they can come across.
It could be just about formatting your package's metadata with the DESCRIPTION file; it could be about the documentation of your functions and, of course, the code itself. So I don't think it's gonna get into all the weeds of those C header issues that we talked about. But, nonetheless, I think this is a great companion to have with, say, the R Packages reference authored by Hadley Wickham and Jenny Bryan as you're thinking about getting that submission to CRAN and some of the things that might blindside you if you're not careful, and a great, accessible way to look at how you might get around those issues and solve them to get your package on CRAN. So I know this effort has been in the works for quite a while. It's great to see it really maturing, how it's being used by the CRAN team itself, and where they're going with it. So, yeah, credit to the team, Jasmine Daly, Beni Altmann, and others involved with that project.
[00:44:29] Mike Thomas:
And, Mike, what did you find? Shout out Jasmine Daly, Shiny developer in Connecticut. Heck yeah. Yeah. Gotta love that. A bunch of great stuff. One blog that I found, which was just really nice to reflect on, was from Isabella Velásquez over on the Posit team. It's the 2024 Posit Year in Review, a little trip down memory lane of all that Posit worked on in the last year. And a lot of it, obviously, is around their R packages for interfacing with LLMs, like Elmer, you know, Shiny Assistant, Shiny Chat, Pal, as well as things out of the Quarto ecosystem, including Quarto dashboards being a big one.
Obviously, all sorts of stuff coming out of the Shiny world on both the R and Python sides of the equation there. Some great advancements from tidymodels in survival analysis that were really impactful to our team, as well as a bunch of others across, you know, WebR. I know that's one that impacted you quite a bit in 2024. So it was just nice taking some time to reflect on all of the work and investment from Posit and the other folks who contributed to projects that Posit maintains.
Shout out to myself with one small contribution to httr2 in the latest release, just yesterday. So thank you. It's, I think, 2 words in the function documentation, the roxygen comments, but we'll take what we can get. I was on the list. So thanks, Hadley, for including me among, I guess, 70 other folks who contributed to that latest release of httr2. But it's cool to all collaborate together in the open, and I think that's all I'm trying to say here. And it was nice to walk through a lot of these projects that have impacted me and my team in 2024 and beyond. Yeah.
[00:46:32] Eric Nantz:
Excellent. And you're on the score sheet, as they say. They can't take that away from you. That is awesome stuff. Congratulations on that. Yeah, it's amazing, the breadth of contributions in this space. And, certainly, AI was a focus for them with their awesome innovations of Elmer and mall and the Shiny Assistant, which I'm really a big fan of now. I was one of the skeptics on that, so it's great to see them doing it and doing it responsibly. So credit to the team on that. But, no, they're not just resting on those innovations. As you said, the WebR stuff really is jiving. It's really getting a lot of traction, and I can't wait to see where we take that effort in 2025.
And when I say we, it's more like what George Stagg comes up with, and I'm just a very shameless consumer of it, but I love the stuff that he comes up with. So lots of great stuff here. There's never a dull moment with the Posit team. And never a dull moment in the rest of the issue, as we say; lots of great resources that Ryo has put together for us. But, of course, as I said, this is a community effort, and we cannot do this alone. So one of the ways that we keep this project going is through your contributions. You're just a pull request away from getting your name in as a future contributor to R Weekly itself.
A great blog post, maybe a new package that you authored or discovered, there are so many opportunities for it. Head to rweekly.org. You're gonna find a link to get your pull request up there, right in the top right corner. We have a handy draft template for you to follow, again leveraging GitHub for the win on that, and our curator of the week will be glad to get it in for you. And, of course, we love hearing from you. And we did hear from one of our more devoted listeners that, apparently, I do not pronounce names well. And even though I practice it, I got called out for it. So I'm gonna get it right this time.
Nicola Rennie. Sorry for butchering her name all these months in previous highlights podcasts. Thank you, Mike, for calling me out on that; not you, Mike, but Mike Smith. I need to be honest with it. So feedback warranted, and I may have to have a little cookie jar of funding where I send in a nickel every time I butcher her name in the future. Hopefully, never again. Nonetheless, okay, we love hearing from you, and the ways you can do that are through the contact page in the episode show notes as well as on social media. I am at rpodcast.bsky.social on Bluesky, I believe, is how to call it. Again, this is still not natural to me yet. I'll get there. I'm also on Mastodon with [email protected], and I'm on LinkedIn. Search my name, and you'll find me there.
And, Mike, hopefully, people don't have a hard time butchering your name, so where are we gonna find you?
[00:49:18] Mike Thomas:
You can find me, I think, primarily on Bluesky nowadays at mike-thomas.bsky.social, or on Mastodon at [email protected]. Or, probably even better, on LinkedIn: if you search Ketchbrook Analytics, k-e-t-c-h-b-r-o-o-k, you can see what I'm up to lately.
[00:49:42] Eric Nantz:
Awesome stuff. And a quick little shout out to good friends of mine from the R community, Jon Harmon and Yoni Sidi, because in some of this R wiki infrastructure I'm building, I've been using some of the packages they've created to interface with the Slack API of all things. So it's been pretty fun learning there. And, again, httr2 is involved in some of that as well. So it all comes full circle in this fancy schmancy calendar thing I'm making. Always learning all the time. So shout out to those 2 for making some really elegant packages to interface with an API of a framework that seemed really cryptic to me at the time. But now it's starting to demystify a little bit. Alright. Well, we'll close up shop here for episode 292 of R Weekly Highlights, and we'll be back with another episode next week.