Episode 204 of R Weekly Highlights is jam-packed with four highlights! We discuss the recent improvements to the recipes package in the tidymodels suite, Nicola Rennie's adventures with this year's 30 day chart challenge, going back to math school with Jon Carroll's latest explorations in digit rotation, and our picks out of the excellent collection of the top 40 R packages for March.
Episode Links
- This week's curator: Batool Almarzouq - @batoolmm.bsky.social (Bluesky) & @batool664 (X/Twitter)
- recipes 1.3.0
- 30 Day Chart Challenge 2025
- Rotation with Modulo
- March 2025 Top 40 New CRAN Packages
- Entire issue available at rweekly.org/2025-W19
- {recipes} Pipeable steps for feature engineering and data preprocessing to prepare for modeling https://recipes.tidymodels.org/
- tidymodels books https://www.tidymodels.org/books/index.html
- Nicola's 30 day chart challenge browser app https://nrennie.rbind.io/30DayChartChallenge/
- {tidyplots} https://tidyplots.org
- {echarts4r} https://echarts4r.john-coene.com
- https://www.reddit.com/r/hockey/comments/1kfd57j/oc_round_2_matchups_are_set_lego_style/
- Using {autodb} https://cran.r-project.org/web/packages/autodb/vignettes/autodb.html
- {kitesquare} Kite-Square plots for contingency tables https://github.com/HUGLeipzig/kitesquare
- {EQRN} Extreme Quantile Regression Neural Networks for Risk Forecasting https://cran.r-project.org/package=EQRN https://opasche.github.io/EQRN/
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- A Simple Coin Flip Can Change Fate - Final Fantasy VI - Level 99 - https://ocremix.org/remix/OCR02692
- No Time - Chrono Trigger - Jaxx - https://ocremix.org/remix/OCR00386
- Sly Thai Guy - Street Fighter II: The World Warrior - Tim Sheehy - https://ocremix.org/remix/OCR01750
[00:00:03]
Eric Nantz:
Hello, friends. We are back at episode 204 of the R Weekly Highlights podcast. Editor's note: I still am not used to saying 200-something yet, but I'll get there eventually. Nonetheless, this is the podcast where we talk about the awesome resources that are shared every single week at rweekly.org. My name is Eric Nantz, and I'm delighted you joined us wherever you are around the world. And joining me at the hip after some rather nervous firmware updates is my awesome cohost, Mike Thomas. Mike, how are you doing?
[00:00:34] Mike Thomas:
Gotta love Windows, Eric. Yep. It was one of those mornings where I booted up the computer and had to sit and wait for about ten minutes for a firmware update,
[00:00:42] Eric Nantz:
But I think we're okay. We made it through. I still remember one of my older Android cell phones did one of those firmware updates, and it never woke up after that. So I'm one of those people, even in 2025: whenever I see anything do a firmware update, I'm crossing all my fingers and toes and hoping nothing blows up. But you are here, and I am here as well. No car failures this time around, so luckily tech seems to be kind to us. But what else is kind to us? R Weekly itself. We got a lot to talk about today, because we have not one, not two, not three, but four highlights. So we're gonna keep your listening ears busy here. Our curator this week is Batool Almarzouq, and as always, she had tremendous help from our fellow R Weekly team members and contributors like all of you around the world, with your pull requests and other wonderful suggestions. And we'll whet your appetite for our content here, because we are talking about one of the mainstays in the tidymodels ecosystem for getting all of your data and your models into shape for that eventual cooking that you do with your machine learning and other types of models. And that is the recipes package.
It recently had a 1.3.0 release on CRAN, and tidymodels software engineer at Posit, Emil Hvitfeldt, is back with a nice roundup of the key new features that you may wanna pay attention to. The first of which, admittedly, gives me flashbacks to my early days of R, and especially reading certain CSV files. You'll know why in a second. Because there is an argument that's been around in the recipes ecosystem: strings as factors. Now, before you run away from your earbuds here wondering, oh no, why are you bringing up such bad memories? No, this is important, because there are situations where, by default, a string variable in the predictor set of variables you're gonna use in your model would be converted to a factor, but that argument was located in a function called prep.
Recipes, like a lot of the tidymodels packages, has a lot of function names that definitely either make me hungry or make me wanna cook right away. Nonetheless, now that argument, strings_as_factors, has been moved to recipes itself, the recipe() function to be exact. This is important because now you can disable this conversion once, up front, instead of in many different places in your recipes workflow. Emil's got a nice example where you can see what it looked like previously, supplying it in the prep() function, versus using it in the actual recipe() call right off the bat, so you don't have to keep repeating it if you have multiple prep() calls in your pipeline for getting your recipe ready to go.
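Editor's note: to make that concrete, here's a minimal sketch of the before and after, assuming a hypothetical data frame dat with character predictor columns; strings_as_factors is the argument described in the post.

```r
library(recipes)

# Before 1.3.0: the string-to-factor conversion was controlled at prep() time
rec <- recipe(outcome ~ ., data = dat)
prepped <- prep(rec, strings_as_factors = FALSE)

# As of 1.3.0: set it once, up front, in recipe() itself
rec <- recipe(outcome ~ ., data = dat, strings_as_factors = FALSE)
prepped <- prep(rec)
```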
And then also, there are some deprecations to be aware of, one of which is for the step_select() function. This was apparently due to some issues people were having in their recipe workflows, and it didn't really play as nicely with what they felt was going on in the rest of the workflow. So there are cases where you can migrate to some more fit-for-purpose functions. One of which is step_rm(), which you can feed the columns that you want removed in that particular step, without the negative or minus sign notation that you might have used before with step_select(). And also for selecting variables, they do recommend that you use some of the select helpers, which actually get touched on later in the post, from the tidyselect package that's often used in dplyr, tidyr, and the like. They're bringing better support for that in the recipes package itself.
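Editor's note: a minimal sketch of that migration path, with a hypothetical data frame dat and id_-prefixed columns standing in for the ones you want dropped.

```r
library(recipes)

# Deprecated: selecting everything except the id_ columns
# rec <- recipe(outcome ~ ., data = dat) |>
#   step_select(-starts_with("id_"))

# Recommended: drop the unwanted columns directly with a tidyselect helper
rec <- recipe(outcome ~ ., data = dat) |>
  step_rm(starts_with("id_"))
```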
And then lastly, there is a new argument being added to the step_dummy() function. And there's nothing dumb about this. This is the convenience function to convert a variable into indicator-like variables, often for categorical outcomes or predictors, so you can use these in many of the machine learning models. In the past, if you wanted to specify how the contrasts associated with that categorization were computed, you had to feed that into an options() statement, such as options(contrasts = ...), and then you might give it either contr.treatment or contr.poly. Those are two ways to define contrasts.
Now you can define that in step_dummy() itself. Me, I tend to not use options() unless I absolutely have to, because I think it breaks a little bit of reproducibility. Somebody that I'll help debug code with has set an option way, way somewhere else, in their .Rprofile in their home directory or some other place, and I'm like, what gives? Why is this digit rounding happening here? What's going on? Not that I'm bitter. So I think anything you can make more transparent in the function call itself is a win in my book, and this is a great new capability.
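Editor's note: a sketch of the difference, assuming (per the post) that the new contrasts argument of step_dummy() takes the contrast function's name as a string, just like the global option did.

```r
# Before: a session-wide option that silently affects every model
options(contrasts = c(unordered = "contr.treatment", ordered = "contr.poly"))

# As of 1.3.0: declared locally, visible right in the recipe
rec <- recipe(outcome ~ ., data = dat) |>
  step_dummy(all_nominal_predictors(), contrasts = "contr.sum")
```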
And then also, there have been some performance improvements to the step_impute_bag() function, which is often used whenever you have to do bagging preprocessing for your machine learning model, such as random forests or gradient boosting. When I've used those in the past, there have been memory issues for tree models, and now they've minimized the footprint quite a bit. In the example Emil cites, one bagging step goes from 75 megabytes down to 20. That's a nice win when you scale this up immensely across different CPU architectures, especially if you're trying to be nice to your HPC admins, which I try to do.
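Editor's note: usage of the step itself hasn't changed; the 1.3.0 win is the smaller stored model. A sketch on the Ames housing data from the modeldata package:

```r
library(recipes)
library(modeldata)
data(ames)

# The bagged-tree imputation step whose in-memory footprint shrank in 1.3.0
rec <- recipe(Sale_Price ~ ., data = ames) |>
  step_impute_bag(all_numeric_predictors()) |>
  prep()
```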
And there's a lot more to this release. The blog post has a link to the release notes if you wanna get more details. But it's great to see the momentum on tidymodels keep pushing forward. And, yeah, even the artwork in this blog post makes me wanna eat right after this, Mike. How about you?
[00:07:19] Mike Thomas:
Same here. Yeah. A couple interesting things on the removal of step_select(), or essentially the deprecation of it. It looks like the recommendation there is that you use just dplyr's select() to choose which variables to use in your model before passing the data into the recipe, where possible. And now that step_select() is deprecated, they're recommending that you use step_rm(). So that will, by default, remove the variables that you include in that statement, but if you negate them, it will include the negated ones. So it's almost like a flip of what step_select() was previously doing, and you may have to wrap your mind around that a little bit if you wanna continue to leverage this inside of your recipes workflow. And then the other one, Eric, that I'm not sure if you touched on, was that tidyselect can now be used anywhere and everywhere within recipes. I think previously there were a few different steps, or functions if you will, like step_pls() and step_impute_bag() that we also talked about, that have maybe specialized arguments that previously asked you to select variables in a particular way that was not compatible with your typical tidy selection, and now that is no more. Tidy selection can be used pretty much everywhere throughout all of the different functions within the recipes ecosystem. So that's definitely a great quality-of-life improvement. And, you know, that memory reduction for step_impute_bag(), for those that fit a lot of bagged tree models, I think is great. You know, I was a little surprised; the Ames dataset that they use in the example, I think, is pretty small, right?
And that took 75 megs previously and now takes 20 megs. So if you scale that out to a significantly sized dataset, and I don't know what the relative size of your average machine learning dataset would be compared to the Ames dataset, not only would that memory footprint probably be bigger than those numbers, but what you're getting back is gonna be bigger as well. So that's a great improvement there. And I always love to see that the tidymodels team continues to work on the recipes package because, in my opinion, and I know that it's an opinionated framework in and of itself, it is just fantastic for doing machine learning, for the hardcore machine learning folks out there, to be able to set up these workflows in sort of a purrr-like fashion that allows you to really iterate over all sorts of different combinations and different algorithms that you wanna leverage really quickly, in a way that used to take a crazy amount of time back in my days with caret.
[00:10:01] Eric Nantz:
I was gonna say the same thing. That's how I first got into machine learning in R: the caret package. And I remember, in the early days of the caret package, Max Kuhn did a workshop at a local stat conference here in the Midwest, and I was mesmerized by the power of it. But, boy, it took me a while to get my head around it. With recipes, alongside tidymodels itself, I really like having dedicated packages for each part of this process. And if you're wondering where to go to learn more, we'll put a link in the show notes to the tidymodels website, which has tons of resources, including the online books. There is actually more than one online book about all this, some of which are more about the practical day-to-day, and others more about the statistical theory behind a lot of the choices. There's a lot to choose from here. And like you said, Mike, it's really great to see this getting pushed forward. And even though my day job doesn't do as much of the machine learning side as it used to, there are cases, especially when you get to that infamous exploratory analysis perspective, where you're asked to do a lot with less, and you could do much worse than throwing some tidymodels action in front of that dataset to find that hidden subgroup of records that shows that improved effect. Don't ask me why I say it that way. It's a volatile industry. What can I say?
And up next, we're gonna visit the visualization corner, as I like to call it here on the highlights program, because it's been an annual tradition since about 2022 or perhaps earlier. Around springtime, there is a 30 day chart challenge that runs online, originated by Cédric Scherer and Dominic Royé. Like I said, it's been running a few years now, and we've seen a lot of esteemed members of the R community take their visualization chops to work and learn some new things along the way. And in particular, a frequent contributor to the highlights program, Nicola Rennie, is back with her roundup and her perspective on what she created for the 2025 edition of the 30 day chart challenge.
She's been doing this for a few years now, but she did things slightly differently this time around, which I think is kind of intriguing. One of which is that she tried to challenge herself with different technical stacks. She is definitely primarily an R user, but as we've seen in previous highlights, she is venturing into other visualization frameworks along the way, notably Observable and D3 for some great interactive web-based visualizations. Also, in this 30 day chart challenge, they don't necessarily tell you what data you're visualizing, unlike the TidyTuesday initiative that does ask you to create a visual based on the dataset in mind for that week. Here, you might be asked to do a challenge of a visualization showing relationships between variables, or maybe a time series visualization, but you get to pick the data.
So what she was trying to do, to save time and kinda challenge herself at the same time, was to reuse datasets in multiple ways. And we'll get to some of that shortly. And then lastly, to up the annotation game, so to speak, on some of these visuals by adding more text and annotations to these charts to really show some key insights in these visualizations. So when we talk about the data she used, there was one particular dataset that she drew upon for multiple visualizations, and that was a set released by the Our World in Data project on income inequality, as shown by the share of income that's received by what is deemed the richest 1% of the world population.
And she thought this would be a great way to visualize not just that dataset on the whole, but also to look at how it relates to other datasets that may or may not be in that same domain. And this rather fun plot we're about to talk about here gets back to maybe some examples you've had in either a statistics or data science course, where the underlying theme is that correlation doesn't necessarily imply causation, because this awesome line plot here is showing how, apparently, as the rich got richer, so to speak, and took more of the share of the income, for some reason the population of wild birds in the world just started to decrease over that time span.
Are they buying the wild birds and just keeping them caged? Hey, you didn't hear that from me. I have no idea. But if that doesn't make sense, you're right, it doesn't make sense. That was definitely for the spurious relationship side of the challenge. But that was an interesting take on it, because usually you hear about the ice cream and murder rate example. This was a fresh take on that, so I enjoyed it. And that's just one plot that she made. If you're interested in all the plots she made for each of the thirty days of the challenge, she actually created a Shiny app powered by Shinylive and WebAssembly, which we'll link to in the show notes. You can look at each day and what she created, with links to the source code behind those visualizations. So I had fun, in the prep for this episode, just thumbing through that. And there was one other one that caught my eye, maybe for, again, a lot of the wrong reasons, but this time very much intentional.
There was a part of the thirty day challenge where you were asked to create a chart associated with extraterrestrial domains, like aliens or whatnot. So she made this, I have to say it, abomination of a bad chart, which looks like my screen is glitching in different parts, like somebody hit the monitor, because it's all weird, jaggy, fuzzy resolution. But she made this chart, and you'll have to see it to believe it. Apparently she'll use it in future blog posts to illustrate just what can go wrong in these visualizations, and I can already see about five or six things horribly wrong with that chart. So that was a doozy there. But, again, you can check out the rest of the visualizations in her Shiny app. Again, I'll have a link to that in the show notes.
But again, this isn't just about being creative. She actually learned some great, you know, new and interesting technical lessons too, Mike. So why don't you walk us through that?
[00:17:12] Mike Thomas:
Yeah, Eric. Some of these plots are really interesting. That extraterrestrial one is pretty funny as well. You know, one of the things that Nicola tried, among the many things that were new to her during this thirty day chart challenge, is that she leveraged not only ggplot2 but also explored the tidyplots package. And she was really encouraged by how easily portable her ggplot2 code was to the tidyplots framework, it seems like, and just how clean the syntax was. I know that's one we've touched on a few times in the highlights. It's not one that I've had the chance to get to yet, but I have been blown away by everything that I have seen from a documentation perspective on the tidyplots package. I think there's a great landing page for tidyplots that has some fantastic examples, and the visuals just look beautiful. When you take a look at the code behind the visuals, it's almost shocking how concise the syntax is, how few lines of code it takes to get to really nice plots using tidyplots. Another thing that she explored on the R side was using the ggiraph package and JavaScript to create drop-down menus. I'm a little interested in how she did that, and if that ported over to her Shinylive app as well.
If it was some sort of an HTML widget that she was able to create. Are you familiar with doing that, Eric?
[00:18:29] Eric Nantz:
I have not tried that, so I'm really looking at that code too after this episode.
[00:18:34] Mike Thomas:
Definitely. And then she built a function to do something that I always struggle with, depending on what operating system I'm working on and what dependencies are there and which aren't: take a screenshot of charts that you built. She leveraged the httpuv package, particularly its runStaticServer() and stopAllServers() functions, to, I guess, spin up a server, and then used webshot2 to actually take the screenshot and store it as an image in a particular folder, using this really small, nice little tidy function she developed called save_js_png. So that was great. She also leveraged some Python as well, and she still wants to work on some examples of combining Python data processing and visualization in Observable via Quarto, which I can attest is super easy, because we have that great ability to pass R and Python objects into Observable JS data frames, I think, with very little code, and that's all documented on the Quarto website. So I would encourage you to check that out if you'd like. It's great for static hosting, like GitHub Pages and things like that, or even just sharing an HTML file with someone, if you don't really have the capacity to stand up a whole server for a Shiny Server type of web application.
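Editor's note: a rough sketch of that helper, assuming only that it pairs httpuv::runStaticServer() with webshot2::webshot(); the arguments and internals of Nicola's actual save_js_png() may differ.

```r
# Serve a folder containing the rendered HTML chart locally, screenshot it,
# then shut the server back down (a sketch, not Nicola's exact function)
save_js_png <- function(dir, out = "chart.png") {
  httpuv::runStaticServer(dir, port = 8080, background = TRUE)
  webshot2::webshot("http://127.0.0.1:8080", file = out)
  httpuv::stopAllServers()
  invisible(out)
}
```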
So that's awesome. She had some advice here if you're somebody who's interested in getting into these thirty day chart challenges to try to push your data visualization skills. One of her biggest pieces of advice is to try to use the same data for multiple charts. And I think that's a great piece of advice to get you to think less about the data and more about the visualization side, and really try to open up that creative side that at least me, and maybe some other data scientists out there, may be weaker on compared to the data processing side. She had a couple highlights of some other favorite charts that other folks built throughout the challenge, which are absolutely beautiful. And like you, Eric, I loved checking out her Shinylive app and just taking a look at all of these different charts through the past four years that she's developed across thirty days. So I guess that's 120 different possible charts that you can take a look at that Nicola developed. And not only can you take a look at these beautiful charts, you can see the source code behind them. And it, again, makes me wanna create my first Shinylive app. That is something I still haven't gotten to either, but it's high on my list
[00:21:11] Eric Nantz:
here. We gotta rectify that pretty soon, I know, because it's moving fast, man. It's moving fast. But yeah, I really enjoyed all these visuals here. And, you know, I have the creative chops of a stick figure when I do visuals, but, boy, it sure is inspiring to see what Nicola and other great community members are able to come up with in these chart challenges. So, yeah, if you're on Mastodon or Bluesky, you'll see a lot of these charts being thrown out there on social media, much like the TidyTuesday visualizations. So there's a lot you can be inspired by. Sometimes you'll see them in the most unlikely places too, because, it wasn't in the thirty day challenge, but during the hockey playoffs I frequent the hockey subreddit just to see what angst or excitement is out there for various teams. And there was somebody that did a visualization of the brackets of the teams progressing to the next round, but they did it in the style of Lego bricks.
Where you would have little Lego dots indicating the wins, and showing the team with, like, a nameplate with the names on them, and the Stanley Cup in a gray kind of Lego mosaic setup. Man, if I got, like, two months of just uninterrupted time, I would create that in R just to see how ugly or good it looks, because no matter where you look, there are always places to up your game. And I definitely appreciate Nicola taking some new ideas and new frameworks and seeing where she could push the envelope.
And, also, when you're doing interactive visualizations, not necessarily the topic of this post here, boy, I just love those packages out there with, what we call, the batteries included, so to speak. I've had another project at the day job where echarts4r by John Coene has saved my behind on this really nice line plot. Interactive zooming on the x axis right off the bat, tooltips right off the bat. It just is so elegant. I think those are great when you have the idea for the visual but just don't have the time to get into the nuts and bolts of JavaScript and CSS land. echarts4r is one of those that just takes care of so much for you. So I'll never turn down a chance to plug that awesome package. It's the best.
We're gonna take quite a diversion in the highlights, because we're gonna go from the visualization corner to really challenging yourself with some fun math teasers. But, admittedly, after playing with this, I almost can't believe it works, but it actually does. The blog post for this next highlight comes to us from a fellow R Weekly curator, Jonathan Carroll, who, as you've heard in previous highlights, loves to test his might both on interesting mathematical challenges and on the interesting languages that can tackle them. This post has both. In this case, he was inspired by a post he saw on Mastodon from Greg Egan about a kind of brain teaser that actually has a solution.
So imagine you have a number that has a set amount of digits, let's say four digits. And then you want to, what we call, rotate the digits. So imagine the number one, two, three, four. That's four digits, and you wanna rotate it so that maybe the one and two go behind the three and four. It's almost like, if you had physical pieces for each digit, you're just shuffling the order a little bit. Now I'm gonna try and narrate this in audio as best I can: there is a mathematical derivation to get to that rotated number for any number, as long as it meets one constraint.
If that number n is not equal to 10 to the power of its number of digits, minus one (so, not 9999 in the four-digit case), there is a formula you can use. It goes like this: take n times 10 to the power of k, where k is the number of positions you want to rotate by, then take the mod, which, like in division, gives you the remainder, with respect to 10 to the power of the number of digits, minus one. Yes, this works. I tried this out. I booted up my R session, took that number one, two, three, four, rotated it by three digits, and checked if I could get the right answer, which is four, one, two, three. And, yes, that formula actually works. John did that as well. He almost didn't believe it worked either, but he thought, okay, how do I make this more general? It's great for a specific example.
He has solutions in his blog post in both the R language and the Julia language, which, as you read them, do make a bit of sense. He's able to get the length, or the number of digits, of that number. An interesting trick in R to do that is that the nchar() function will automatically convert an actual double or integer to a character and then count the digits. That's kinda slick in and of itself, but you have to watch out if it's a negative number; it won't quite give you the right answer in that case. But then he's able to plug that in, use the mod operator in R, which is two percent signs (%%), and get a very simple one-liner function to get this done. And sure enough, it works.
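Editor's note: for those following along at home, here's a minimal R version of the trick (mirroring, not copying, John's one-liner):

```r
# Rotate the digits of n left by k positions:
#   (n * 10^k) %% (10^d - 1), where d = number of digits of n,
# valid as long as n != 10^d - 1 (e.g. 9999 in the four-digit case)
rotate <- function(n, k) {
  d <- nchar(n)  # digit count via character coercion; miscounts negatives
  (n * 10^k) %% (10^d - 1)
}

rotate(1234, 3)  # 4123, as narrated above
```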
It's not quite vectorized, though. So that's where he brings in an interesting package he's written about on his blog before, called vec, which apparently does some neat tricks with ring buffers and using the indices of vectors to get to that same solution. So, again, all the code is in the blog post, and this rotate function indeed works both ways. Julia, again, is pretty similar, albeit he does think it might be a bit simpler, because there is an actual ndigits() function that is fit for purpose for numeric values and not just strings, which means it isn't as susceptible to that negative sign issue that the R solution might be.
But there are some other little gotchas along the way you have to be careful of. Again, it's a one-liner, and when you read the code, even if you're not a Julia programmer, you can kinda get the hang of how it works. And Julia does have a built-in way to deal with vectors, using built-in primitives for cyclic rotation. All straightforward, at least in my opinion. Here's where things get odd, folks, because, as John has done in previous blog posts, he ventures into very niche languages for mathematics, and he talks about two of them here.
And this is where you have to flip your perspective a bit, because now you have to deal with glyphs as the operators, and not just typing them to be fancy; these are actual operators in these languages. And even in the case of the APL language, which he has written about before and I think we've featured in previous highlights, the order isn't left to right, it's right to left. So that's a lot to get your head around, but he narrates how he found the different glyphs for these various operators, going from the typical assignment all the way to the mod operator, the negation operator, and more. And when you read it, I mean, I can't even narrate it here in audio, but you'll see it looks pretty fancy, almost like a mathematical equation.
But my goodness, it works. And he puts in a link to the tryapl.org interactive web-based editor. If you're not convinced, you can try this stuff yourself, but it indeed works. And if that wasn't enough, there is a newer language out there called Uiua. I have no idea how to pronounce that one, but it is a stack-based language, which apparently has its own kind of domain-specific operators, again as glyphs. But this is a newer language, so there's not much documentation out there other than the official docs. Sure enough, he's able to get a solution in place, but he has to think in terms of stacking things on top of each other to make it happen, and not just the typical left to right, or, in the case of APL, right to left.
So that definitely got me scratching my head a little bit. But, again, it's a fun brain teaser to look at, and a great way to challenge yourself. In one of my wife's group chats, she has some friends that do these math teasers, like, every week, and I think they would be all over this kind of language if they hadn't heard about it already. But again, if you are not convinced this works, I would say the proof's in this post, and the best part is you can try it out in multiple languages to feed that curiosity, or, in my case, to humble myself about what I don't know about how this stuff works. So, Mike, maybe talk me up a bit. Am I that bad with math, or is this just really out there?
[00:31:18] Mike Thomas:
I'm gonna land on: this is really out there. It makes me wanna go back to the data visualization post, because I feel like I'm over my skis. I'm just teasing. This is a good blog post for me to get back into mathematics, get back into my roots a little bit, and tease my brain. And like you, I implemented this in R as well just to try it out for myself. And that's sort of the name of the game for how I was able to learn math more efficiently than I did prior to learning programming, because if I could program it, I could understand it, right? And I think that goes for a lot of folks out there. Initially, when I looked at this just reading the blog post, I was like, what is going on? Then I put it to code a little bit, and it started to make more sense to me. It's a cool little brain teaser. I think sometimes you're able to create some of these mathematical brain teasers sort of backwards, if you'd like, and I'm curious to understand who originally came up with this concept and how they arrived at it. But it's super interesting. It's always funny to me when I see somebody on social media claiming to do magic with math: hey, what's your age? Okay, subtract that from what year it is, then multiply it by nine, and they arrive back at their age or something like that. People are mind-blown, and it's pretty straightforward. So this one, certainly not as straightforward, especially when you get down to the last two sections, where we get into APL and Uiua, where we're starting to use hieroglyphs, in my opinion, to stitch together some of these equations, and we're going from left to right to right to left. Is APL the one that literally stands for A Programming Language?
[00:33:10] Eric Nantz:
You got it. Great callback to a previous episode.
[00:33:14] Mike Thomas:
Oh my goodness. I can see why. But I really appreciate taking a small little brain teaser like this and implementing it across these four different languages, just to showcase the differences between them, the strengths and weaknesses that they have, and to serve as an introduction to some of these languages for folks like me who are certainly less familiar with them. Always really interesting to read a Jonathan Carroll blog post.
[00:33:40] Eric Nantz:
Yeah. And I do have some colleagues at the day job that I think would be all over this if they get a free moment. So I'll probably send this their way and see what kind of brain teasers they can come up with, and see if APL makes sense to them or not. Boy, all kidding aside, I'm amazed at the creators of all this, because I don't think I could come up with this if you gave me ten years. My goodness, I have a hard enough time just with R itself. So this is amazing stuff. And speaking of amazing, we're going back to R for a second. We will close out this episode with our last highlight here, which has been a mainstay in the R community for years and years. It may have been shared on different platforms over time, but Joe Rickert, who has been one of the founding members of the R community that I've been following for years and years, is back with his monthly Top 40 R packages roundup.
I'm sure, if you've been around the community for the years that Mike and I have, you may have seen this in his previous blog posts back when he was at Revolution Analytics. Then it was at Posit, and now he's got this on the R Works blog, which is a community effort led by him and Isabella Velásquez, with other contributors as well. So, again, we definitely don't have the time to talk about all 40 packages that Joe has come up with here, but, as usual, he organized these by different domains, different industries, or different topics.
And there were two that definitely caught my eye, because they relate to different projects I'm working on at the moment. The first of which is a package called autodb. This one's interesting because it is, in essence, a way to help you normalize a data frame into third normal form. And, admittedly, I don't know the nuts and bolts of that definition, but if you've dealt with databases before, this will look a bit familiar to you. It's been authored by Mark Webster, and he does point out in the vignette, which we'll link to in the show notes, that he is using this as a way to inform data cleaning operations and investigating his data.
He stresses that this is not meant for database design, even though you will see graphical representations of the different relationships in these datasets that look very similar to a schematic you might have for a robust database table relationship diagram, an entity-relationship diagram, or ERD, I believe they're called. But, again, it's really interesting, because I am dealing with data at the moment coming from the API of a vendor, alongside some other datasets we have in house, and we do have to merge them together. More importantly, we have to make sure that the constraints we have told the vendor to follow are actually being met. So I could definitely see this as a tool to investigate these relationships and make sure things are sound.
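Editor's note: a quick sketch of what that looks like, based on the package vignette; autodb() is the entry point, with the built-in ChickWeight data standing in for a real dataset.

```r
library(autodb)

# Decompose a data frame into third-normal-form relations by discovering
# functional dependencies; printing shows the relations and their keys
db <- autodb(as.data.frame(ChickWeight))
db
```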
And maybe it goes in lieu of, or alongside, a package like pointblank by Rich Iannone that also looks at your data quality as a whole, if you have an ETL kind of process like I'm dealing with. So that one caught my eye. And then on the visualization side, a package called kitesquare, offered by John Weidenhauft. Hopefully, I said that right. This is a way to visualize contingency tables, which you often build when you're dealing with, say, observed versus truth, or rater agreement. This is an interesting way to have a kind of square grid, but then showing the discrepancies or similarities between these variables.
And he likes the name kite because it rhymes with chi, as in the chi-square test that you often see with contingency tables. I'll have a link to the GitHub repository for the package in the show notes, but it's definitely an intriguing visualization to kind of bling up what can be a pretty utilitarian table representation, and maybe get a little visual around those marginal effects and counts, all under the same umbrella, all powered by ggplot2 under the hood, so you can feed this into a lot of different places.
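Editor's note: to ground the pun, here's the plain contingency table and the chi-squared test of independence that a kite-square plot dresses up, in base R only; see the kitesquare README for the package's own plotting interface.

```r
# Two categorical variables from a built-in dataset, cross-tabulated
tbl <- table(mtcars$cyl, mtcars$gear)
tbl

# The chi-squared test the "kite"/"chi" pun refers to
# (warns here because some expected cell counts are small)
chisq.test(tbl)
```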
But those are only two of the 40 here, Mike. Did you see any that caught your eye as well?

[00:38:26] Mike Thomas:
On the risk management side, which is a space that I operate in quite a bit, there's a new package called EQRN, in all caps. It got a 0.1.0 release, and it provides a framework for forecasting and extrapolating measures of conditional risk, for example of extreme and unprecedented events, which we seem to have more and more of lately, including quantiles and exceedance probabilities, using extreme value statistics, which is a concept I'm familiar with, and flexible neural network architectures, which is interesting. There's a couple papers behind it, quite a few folks worked on the package, and it has a nice pkgdown site, which I am in the middle of checking out right now. So, one to look out for if you're in the risk space.
[00:39:22] Eric Nantz:
Awesome. Yeah, I thought that might catch your eye, so I wanted to make sure you got that one. And, yeah, there's very nice documentation here, so I could see this being very useful in your toolbox for risk-based assessments and analysis. Great find there. And like we said, there's a ton more in this blog post for you to choose from. Even for the colleagues in life sciences, there is, and I say this with kindness, yet another tables-related package, called clinify. I laugh about this because it was not long ago, folks, that we didn't have any great table packages in R, and now, I don't wanna say too much, but let's just say there's a lot of choice to be had here, and that's been a hot topic in some of my circles recently. But, again, it's great to see advancements on that side of it too.
And, again, that's just scratching the surface of the entire issue, right? There's a lot more to choose from: additional packages that have been released, and a great new set of resources. We don't have time for our additional finds because we had a whopper of an episode today, but definitely check that out, see what catches your eye, and from there you'll be able to get at the source code behind that great package or blog post, and maybe make some connections in the future too. And we also like to connect with all of you.
First of all, the project definitely keeps going because of contributions from you out there, with your pull requests for new content to be shared in the upcoming issue. How do you do that? Head to rweekly.org. We got a link in the top right corner of the page, a nice little ribbon structure. You can click there and get to the GitHub repo right from there, with a nice issue template. You can also get in touch with us personally via the episode show notes. We have a link to the contact page, where you can send us your feedback, and you can get in touch with us on the various social media outlets. I am on Bluesky these days at @rpodcast.bsky.social.
I'm also on Mastodon at @[email protected], and I'm on LinkedIn. You can search my name, and you'll find me there. And, Mike, where can the listeners get a hold of you?
[00:41:35] Mike Thomas:
Sure. You can find me on Bluesky at @mike-thomas.bsky.social, or on LinkedIn. If you search Ketchbrook Analytics, k-e-t-c-h-b-r-o-o-k, you can see what I'm up to.
[00:41:57] Eric Nantz:
Awesome stuff. And again, thank you so much for joining us and listening to episode 204 of R Weekly Highlights. We'll be back with another edition next week.
Hello, friends. We are back at episode 204 of the Our Weekly Highlights podcast. Editor's note, I still am not used to saying 200 something yet, but I'll get there eventually. Nonetheless, this is the podcast where we talk about the awesome resources that are shared every single week at rweekly.0rg. My name is Eric Nance, and I'm delighted you join us wherever you are around the world. And joining me at the hip after some, rather nervous firmware updates is my awesome cohost, Mike Thomas. Mike, how are you doing?
[00:00:34] Mike Thomas:
Gotta love Windows, Eric. Yep. It was one of those mornings where I booted up the computer and had to sit and wait for about ten minutes for a firmware update,
[00:00:42] Eric Nantz:
But I think we're okay. We made it through. We made it through. I still remember one of my older Android cell phones did one of those firmware updates, and it never woke up after that. So I'm one of those people, even in 2025, I never see anything do a firmware update. I'm crossing all my fingers and crossing my toes, and nothing pulls up. But but you are here, and I am here as well. No, no car failures this time around. So luckily, from that, tech seems to be kind to us. But what else is kind to us? Our weekly itself, we got a lot to talk about today because we have not one, not two, not three, but four highlights to talk about today. So we're gonna keep you keep your listening ears busy here. And our curator this week is Beto Almerzak. And as always, she had tremendous help from our fellow Rwiki team members and contributors like all of you around the world with your poll requests and other wonderful suggestions. And we'll get your appetite wet for our content here. And because we are talking about one of the mainstays in the tidy models ecosystems for getting all of your data and getting your models into shape for that eventual cooking that you do with your machine learning and other types of models. And that is the recipes package.
Recently had a 1.3.0 release updated on CRAN and, tidy models software engineer at POSIT. And Mel is back with, a nice roundup of the key new features that you may wanna pay attention to. First of which, admittedly, this one gives me flashbacks to my early days of r and especially reading certain CSV files. You'll know why in a second. Because there is an argument that's been around in the recipes ecosystem, strings as factors. Now before you run away from your earbuds here and wondering, oh, no. Why are you bringing up such bad memories here? No. This is this is important because there are situations where by default, a string variable that's in your predictor set of variables that you're gonna use in your model would be converted to factors by default, but that argument was located in a function called prep.
Recipes, like a lot of the tidy models factors, have a lot of function names that definitely either make me hungry or make me wanna cook right away, nonetheless. Now that argument, string as factors, has been moved to recipe itself, the recipe function to be exact. This is important because now you can take advantage of probably disabling this conversion in many different places in your recipes workflow. So, Emil's got a a nice example where they can see what it looked like previously and using in the prep function, and now using it in the actual recipe call right off the bat so you don't have to keep using it over and over again if you have multiple prep calls in your in your pipeline for getting your recipe ready to go.
And then also, there are some deprecations to be aware of, one of which is for the step underscore select function. This was apparently due to maybe some issues people are having in their recipe workflows, and it didn't really play as nicely what they felt was going on in the rest of the workflow. So there are cases where you can kinda migrate to some more, fit for purpose functions on this. One of which is step underscore r m, which then you can feed in the columns that you want removed in that particular step without doing the negative or minus sign annotation that you might have done before with step select. And also for selecting variables, they do recommend that you are able to use some of the more select helpers, which actually gets touched on later in the post from the tidy select package that's often using dply or tidy r and and the like. They're bringing better support for that in the recipes package itself.
And then lastly, there is, a new argument being added to the step underscore dummy function. And there's nothing dumb about this. This is the the convenience function to convert a variable into indicator like variables often in categorical outcomes or predictors so you can use these in many of the machine learning models. And in the past, if you want to specify how the contrast associated with that categorization were performed, you had to feed that into an option statement such as options, contrast, equal, and then you might give it the either treatment, c o n t r dot treatment, or the c o n t r dot poly. Those are two ways to define contrast.
Now you can define that in step underscore dummy itself. To me, I tend to not use options unless I absolutely have to because I think it does break a little bit of reproducibility. And somebody that I'll help debug code with has sent an option way, way, way somewhere else in their dot r profile, in their home directory, or some other place. And I'm like, what gives? Why is this digit rounding happening here? What what what's going on? So nothing I'm bitter. So I think anything you can have more transparent in the function itself is a win in my book. So I think that is a great, great new, new, capability.
And then also, there has been some performance improvements where the step underscore impute underscore bag function, which is often used whenever you have to do the bagging pro preprocessing for your machine learning model, such as random force or gradient boosting. And when I've used those in the past, there have been memory issues in the past for tree models. And now they've minimized the footprint quite a bit. And, in the example, m o sites here going from 75 megabytes run bagging step now down to 20. That's a nice nice win when you scale this up immensely to different, CPU architectures, especially if you're trying to be nice to your HPC admins, which I try to do.
And there's a lot more to this re release. The blog post has a link to the release notes if you wanna get more details. But it's great to see the momentum on tidy models keeps pushing forward. And, yeah, even the, artwork in this blog post makes me wanna eat right after this, Mike. How about you?
[00:07:19] Mike Thomas:
Same here. Yeah. A couple interesting things, you know, on the removal of, the step select are essentially deprecating that. It looks like the recommendations there are that you use just dplyr select to select which variables to use in your model before passing the data into the recipe where possible. And now that step select is deprecated, they're recommending that you use step remove. So that will, by default, remove the variables that you include in that statement. But if you negate them, it will include the negated ones. So it's almost like a flip of what step select was previously doing. So you may have to wrap your mind around that a little bit if you wanna continue to leverage this inside of your recipes workflow. And then the other one, Eric, that I'm not sure if you touched on was that tidy select can now be used anywhere, and everywhere within recipes. And I think previously, there were a few different steps or or functions, if you will, like step underscore p l s and step and puke bag that we also talked about that have maybe specialized arguments that previously asked you to select variables for the arguments of those functions in a particular way that was not compatible with your typical tidy selection, and now that is no more. Tidy selection can be used pretty much everywhere throughout all of the different functions within the recipes ecosystem. So that's definitely a great quality of life improvement here. And, you know, that memory reduction for step impute bag for those that fit a lot of bag tree models, I think is great. You know, I was a little surprised that Ames dataset that they use in the example, I think is pretty small. Right?
And that took 75 megs previously and now takes 20 megs. So if you scale that out to maybe a significant size dataset, I don't know, you know, what the relative size of your average machine learning dataset would be compared to the Ames dataset, but, not only would that that memory footprint probably be bigger than those numbers, but also your what you're getting back is is gonna be bigger as well. So that's a great improvement there. And always love to see that the Tidymodels team continues to work on the recipes package because, in my opinion, and I know that it's an opinionated framework in and of itself, but it is just fantastic for doing machine learning for the hardcore machine learning folks out there to be able to set up these workflows and sort of like this purr like fashion that allows you to really iterate over all sorts of different combinations and different algorithms that you wanna leverage really quickly in a way that, you know, we used to take a crazy amount of time from my days in Karat.
[00:10:01] Eric Nantz:
I was gonna say the same thing. That's how I first got into machine learning of ours was the Karat package. And so I remember being one of Max Kuhn's, early days of the Karat package. He did a workshop at a local stat conference here in the Midwest, and I was mesmerized by the power of it. But, boy, it took me a while to get my head around it. Recipes, alongside Tanya Miles itself, I really like the having dedicated packages for each part of this process. And if you're wondering, like, where do you go to learn more, we'll put a link in the show notes to the Tidymodels website, which has tons of resources, the online book. There are actually more than one online book about all this, some of which is more of the practical day to day, and others more on the statistical theory behind a lot of the choices. There's a lot a lot to choose from here. And like you said, Mike, it's really great to see see this getting pushed forward. And, even though my day job doesn't do as much of the machine learning side as it used to, there are cases where, especially when you get to that infamous exploratory analysis perspective, you're asked to do a lot with less, and you could do much worse than throwing some tiny models action in front of that dataset to find that hidden subgroup of of records that shows that improved effect. Don't ask me why I say it that way. It's a volatile industry. What can I say?
And up next, we're gonna visit the, visualization corner as I like to call it here on the highlights program because, it's an annual tradition since about 2022 or perhaps sooner. Around the springtime, there is a thirty day chart challenge that's, released online, and the thirty day chart challenge was originated by Cedric Sher and Dominic Royal. This is, like I said, been running a few years now, and we've seen a lot of esteemed members of the art community take their, take their, visualization chops to work and learn some new things along the way. And in particular, a frequent contributor to the highlights program, Nicole O'Raney, has bat is back with her roundup and her perspective on what she created for the 2025 edition of the thirty day chart challenge.
And she's been doing this for a few years now, but she kinda did things slightly differently this time around, which I think are kind of intriguing. And one of which is she tried to challenge herself in different technical stacks. She is definitely primarily an R user, but as we've seen in previous highlights, she is venturing into other visualization frameworks along the way, notably, Observable and d three for some great interactive web based visualizations. And also alongside this, in this thirty day chart challenge, they don't necessarily tell you what the data is you're visualizing, unlike the TidyTuesday initiative that does ask you to create a visual based on the dataset in mind for that week. Here, you might be asked to do a a challenge of a visualization showing relationships between variables or maybe a time series visualization, but you get to pick the data.
So what she was trying to do to save time and re and kinda challenge herself at the same time is to reuse datasets in multiple ways. And we'll get to some of that shortly. And then lastly, try the upper, annotation game, so to speak, on some of these visuals by adding more text and annotations to these charts to really, you know, show some key insights in these, in these visualizations. So when we talk about the data she used, there was one particular dataset that she drew upon for multiple visualizations and that was a set released by the our world and data project on income inequality as shown by the share of income that's received by what is deemed the richest 1% of the world population.
And she thought this would be a great way to you visualize not just that dataset on a whole, but also look at where how it relates to other data sets that may or may not be in that same domain. And this rather, fun plot we're about to talk about here is getting back to maybe some examples you've had in either a statistics or data science course where the underlying theme is correlation doesn't necessarily imply causation, because this awesome line plot here is showing how, apparently, as the rich got richer, so to speak, and taking the more, the share of the of the income, some reason, the population of wild birds in the world just started to decrease in that time span.
Are they buying the wild birds and just keeping them capped? I I hey. Hey. You didn't you didn't hear from me. I have no idea. But if that doesn't make sense, you're right. It doesn't make sense. That was definitely for the, spurious relationship side of the challenge. But that was, an interesting take on it because usually you hear about the ice cream and murder rate example. This was this was a fresh take on that, so I I enjoyed that. And then also, that that's just one, plot that she made. If you're interested in all the plots that she made for each thirty day or thirty of the thirty day challenge, she actually created a Shiny app powered by Shiny Live and WebAssembly, which we'll link to in the show notes. And you can look at each day and what she created, which will have links to the source code behind those visualizations. So I have fun, in the in the prep for this episode, just thumbing through that. And there was one other one that caught my eye maybe for, again, a lot of the wrong reasons, but this time very much intentional.
There was a part of the thirty day challenge where you were asked to create a chart associated with extraterrestrial domains like aliens or whatnot. So she makes this, I I do I had to I had to say this abomination of a bad chart, which looks like my screen is glitching in different parts, like if somebody hit the hit the screen in the monitor because it's all weird, jaggy, fuzzy resolution. But she made this chart. You'll have to see it to believe it. Apparently, in future blog posts to illustrate just what can go wrong in these visualizations, and I can already see about five or six things horribly wrong with that chart. So that was, that was a doozy there. But, again, you can check out the, the rest of the visualizations in her Shiny app that she created. Well, again, I have a link to that in the show notes.
But again, this isn't just about being creative. She actually learned some great, you know, new and interesting technical observations too, Mike. So why don't you walk us through that?
[00:17:12] Mike Thomas:
Yeah, Eric. Some of these plots are really interesting. That extraterrestrial one is pretty funny as well. You know, one of the things that Nicola, among the many things she was trying out during this thirty day chart challenge, is she leveraged not only ggplot2 but also explored the {tidyplots} package. And she was really encouraged by how easily portable her ggplot2 code was to the {tidyplots} framework, it seems like, and just how clean the syntax was. And I know that's one we've touched on a few times in the highlights. It's not one that I've had the chance to get to yet, but I have been blown away by everything that I have seen from a documentation perspective. I think there's a great landing page for {tidyplots} that has some fantastic examples, and the visuals just look beautiful. And when you take a look at the code behind the visuals, it's almost shocking how concise the syntax is, how few lines of code it takes to get to really nice plots using {tidyplots}.
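For a flavor of just how concise that syntax is, here's a minimal sketch written in the style of the {tidyplots} documentation site. The study demo dataset ships with the package, but treat the exact function names and arguments as illustrative, since the API may shift between versions.

```r
# A minimal sketch in the style of the {tidyplots} docs; `study` is the
# demo dataset shipped with the package. Function names are from the
# documentation site but should be treated as illustrative.
library(tidyplots)

study |>
  tidyplot(x = treatment, y = score, color = treatment) |>
  add_data_points() |>          # raw observations
  add_mean_bar(alpha = 0.4) |>  # group means as translucent bars
  add_sem_errorbar()            # standard error of the mean
```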
Another thing that she explored on the R side was using the {ggiraph} package plus a bit of JavaScript to create dropdown menus. I'm a little interested in how she did that, whether that ported over to her Shinylive app as well, and whether it was some sort of an HTML widget that she was able to create. Are you familiar with doing that, Eric?
[00:18:29] Eric Nantz:
I have not tried that, so I'll really be taking a look at that code after this episode.
[00:18:34] Mike Thomas:
Definitely. And then she built a function to do something that I always struggle with, depending on what operating system I'm working on and which dependencies are there and which aren't: taking a screenshot of charts that you've built. She leveraged the {httpuv} package, particularly its runStaticServer() and stopAllServers() functions, to spin up a server, and then used {webshot2} to actually take the screenshot and store it as an image in a particular folder, all wrapped up in a really small, nice, tidy function she developed called save_js_png(). So that was great. And she also leveraged some Python as well, and she still wants to work on some examples of combining Python data processing and visualization in Observable via Quarto, which I can attest is super easy, because we have that great ability to pass R and Python objects into Observable OJS data frames with very, very little code, and that's all documented on the Quarto website. So I would encourage you to check that out if you'd like. It's great for static hosting, like GitHub Pages and things like that, or even just sharing an HTML file with someone if you don't really have the capacity to stand up a whole server for a Shiny Server type of web application.
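If you want to try that serve-then-screenshot pattern yourself, here's a hedged sketch of what such a helper could look like. To be clear, Nicola's actual save_js_png() may differ in its arguments and defaults; this just sketches the idea using the {httpuv} and {webshot2} functions named above.

```r
# A rough sketch of the serve-then-screenshot pattern described above.
# The helper name and arguments are illustrative, not Nicola's exact code.
library(httpuv)
library(webshot2)

save_js_png <- function(html_dir, html_file, out_png, port = 8000) {
  # Serve the folder containing the rendered HTML chart in the background
  runStaticServer(dir = html_dir, port = port, background = TRUE)
  on.exit(stopAllServers(), add = TRUE)  # always tear the server back down
  # Point headless Chrome at the local page and capture it as a PNG
  webshot(
    url  = sprintf("http://127.0.0.1:%d/%s", port, html_file),
    file = out_png
  )
}
```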
So that's awesome. She had some advice here if you're somebody who's interested in getting into these thirty day chart challenges to try to push your data visualization skills. And one of her biggest pieces of advice is to try to use the same data for multiple charts. And I think that's a great piece of advice to get you to think less about the data and more about the visualization side, and to really try to open up that creative side that at least I, and maybe some other data scientists out there, may be weaker on compared to the data processing side. She had a couple highlights of some other favorite charts that other folks built throughout the challenge, which are absolutely beautiful. And like you, Eric, I loved checking out her Shinylive app and just taking a look at all of these different charts that Nicola has developed over the past four years, thirty days each, so I guess that's 120 different charts you can take a look at. And not only can you take a look at these beautiful charts, you can see the source code behind them. And it, again, makes me wanna create my first Shinylive app. That is something I still haven't gotten to either, but it's high on my list here.
[00:21:11] Eric Nantz:
We gotta rectify that pretty soon. I know. Because, you know, it's moving fast, man. It's moving fast. But, yeah, I really enjoyed all these visuals here. And, you know, I have the creative chops of a stick figure when I do visuals, but, boy, it sure is inspiring to see what Nicola and other great community members are able to come up with in these chart challenges. So, yeah, if you're ever on Mastodon or Bluesky, you'll see a lot of these charts being thrown out there on social media, much like the TidyTuesday visualizations. So there's a lot you can be inspired by. Sometimes you'll see them in the most unlikely places too, because I actually did see, it wasn't in the thirty day challenge, but during the hockey playoffs, I frequent the hockey subreddit just to see what, you know, angst or excitement is out there for various teams. And there was somebody that did a visualization of the brackets of the teams progressing to the next round, but they did it in the style of Lego bricks.
Where you would have little Lego dots indicating the wins, and showing each team with, like, a nameplate with the names on them, and the Stanley Cup in a gray kind of Lego mosaic setup. Man, if I got, like, two months of just uninterrupted time, I would create that in R just to see how, you know, ugly or good it looks, because no matter where you look, there's always places to up your game. And I definitely appreciate Nicola taking, you know, some new ideas, new frameworks, and seeing where she could push the envelope.
And, also, you know, when you're doing interactive visualizations, not necessarily the topic of this post here, boy, I just love those packages out there with what we call the batteries included, so to speak. I've had another project at the day job where {echarts4r} by John Coene has saved my behind on this really nice line plot. Interactive zooming on the x axis right off the bat, tooltips right off the bat. It just is so elegant. I think those are great when you have the idea for the visual but you just don't have the time to get into the nuts and bolts of JavaScript and CSS land. {echarts4r} is one of those that just takes care of so much for you. So I'll never turn down a chance to plug that awesome package. It's the best.
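As a taste of that batteries-included feel, here's a minimal sketch, with simulated data rather than Eric's day-job plot, of a line chart in {echarts4r} that gets x-axis zooming and tooltips in a handful of lines.

```r
# A minimal sketch of the batteries-included {echarts4r} experience:
# a line chart with x-axis zooming and hover tooltips. The data are
# simulated purely for illustration.
library(echarts4r)

df <- data.frame(
  day   = seq.Date(as.Date("2025-01-01"), by = "day", length.out = 100),
  value = cumsum(rnorm(100))
)

df |>
  e_charts(day) |>
  e_line(value) |>
  e_datazoom(x_index = 0) |>   # interactive zoom on the x axis
  e_tooltip(trigger = "axis")  # tooltips right off the bat
```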
We're gonna take quite a diversion in the highlights, because we're gonna go from the visualization corner to really challenging yourself with some more fun math teasers. And, admittedly, after I played with this, I almost couldn't believe it works, but it actually does. The blog post for this next highlight comes to us from a fellow R Weekly curator, Jonathan Carroll, who, as you've heard in previous highlights, loves to test his might both on interesting mathematical challenges and on interesting languages to conduct them in. And this post has both. In this case, he was inspired by a post he saw on Mastodon, sent by Greg Egan, about a kind of brain teaser that actually has a solution.
So imagine you have a number that has a set amount of digits, let's say four digits. And then you want to, what we call, rotate the digits. So imagine the number one, two, three, four. That's four digits, and you wanna rotate it so that maybe the one and two go behind the three and four. It's almost like, if you had physical pieces for each digit, you're just shuffling the order a little bit. Now I'm gonna try and narrate this in audio as best I can. There is a mathematical derivation to get to that solution for any number, as long as it meets the following constraint.
If the number n is not equal to 10 to the power of its number of digits, minus one, in other words, not a string of all nines like 9,999, there is a formula you can use to get that rotation. It goes like this: take n times 10 to the power of k, where k is how many positions you want to rotate by, and then take the mod, or modulo, which is like getting the remainder in division, against 10 to the power of the number of digits minus one. So for a d-digit number, the left rotation by k digits is (n × 10^k) mod (10^d − 1). Yes, this works. I tried it out. I booted up my R session, took the number one, two, three, four, rotated it by three digits, and checked whether I could get the right answer, which is four, one, two, three. Sure enough, 1234 × 10^3 is 1,234,000, and 1,234,000 mod 9,999 is 4,123. That formula actually works. John did that as well. He almost didn't believe it worked either, but he thought, okay, how do I make this more general? It's great for a specific example.
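If you want to follow along at home, here's a minimal sketch of that formula in R. To be clear, this is an illustration of the math rather than John's code from the post, and the rotate_left() name is made up; it assumes a positive integer input.

```r
# A minimal sketch of the digit-rotation formula, assuming a positive
# integer n. rotate_left() is a hypothetical name, not from Jon's post.
rotate_left <- function(n, k) {
  d <- nchar(as.character(n))  # number of digits (positive n assumed)
  (n * 10^k) %% (10^d - 1)     # valid whenever n != 10^d - 1
}

rotate_left(1234, 3)
#> [1] 4123
```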
He has solutions in his blog post in both the R language and the Julia language, and as you read them, they do make a bit of sense. He's able to get the length, or the number of digits, of the number. An interesting trick in R to do that is that the nchar() function will automatically convert an actual double or integer to a character and then count the digits. That's kinda slick in and of itself, but you have to watch out if it's a negative number, because the minus sign gets counted too, so it won't quite give you the right answer in that case. But then he's able to plug that in, use the mod operator in R, which is two percent signs, %%, and get a very simple one-liner function to get this done. And sure enough, it works.
Not quite vectorized, though. So that's where he has an interesting package called {vec}, which he's written about on his blog before, and which apparently does some neat tricks with ring buffers, using the indices of vectors to get to that same solution. So, again, all the code is in the blog post, and this rotate function indeed works in both directions. Julia, again, is pretty similar, albeit he does think it might be a bit simpler, because there is an actual ndigits function that is fit for purpose for numeric values rather than strings, which means it isn't as susceptible to that negative sign issue that the R solution might be.
But there are some other little gotchas along the way you have to be careful of. Again, it's a one-liner, and when you read the code, even if you're not a Julia programmer, you can kinda get a hang of how it works. And Julia does have a built-in way to deal with vectors, using built-in primitives for cyclic rotation. All straightforward, at least in my opinion. Here's where things get odd, folks. Because, as John has done in previous blog posts, he ventures into very niche languages for mathematics, and he talks about two of them here.
And this is where you have to flip your perspective a bit, because now you have to deal with glyphs as the operators. That's not just typing to be fancy; these are actual operators in these languages. And in the case of the APL language, which he has written about before, and which I think we've featured in previous highlights, the order of evaluation isn't left to right, it's right to left. So that's a lot to get your head around, but he narrates how he found the different glyphs for the various operators, going from the typical assignment all the way to the mod operator, the negation operator, and more. And when you read it, I mean, I can't even narrate it here in audio, but you'll see it looks pretty fancy, almost like a mathematical equation.
But my goodness, it works. And he puts in a link to the tryapl.org interactive web-based editor, so if you're not convinced, you can try this stuff yourself, and it indeed works. And if that wasn't enough, there is a newer language out there called Uiua, I have no idea how to pronounce that one, which takes a stack-based approach and apparently has its own kind of domain-specific operators, again as glyphs. But it's a newer language, so there's not as much documentation out there beyond the official docs. Sure enough, he's able to get a solution in place, but he has to think in terms of stacking things on top of each other to make it happen, not the typical left to right, or, in the case of APL, right to left.
So that definitely got me scratching my head a little bit. But, again, it's a fun brain teaser to look at, and a great way to challenge yourself. In one of my wife's group chats, she has some friends that do these math teasers, like, every week, and I think they would be all over this kind of language if they haven't heard about it already. But again, if you are not convinced this works, I would say the proof's in the post, and the best part is you can try it out in multiple languages to feed that curiosity, or, in my case, to humble myself over what I don't know about how this stuff works. So, Mike, maybe talk me up a bit. Am I that bad at math, or is this just really out there?
[00:31:18] Mike Thomas:
I'm gonna decide on: this is really out there. Makes me wanna go back to the data visualization post, because I feel like I'm over my skis. I'm just teasing. This is a good blog post for me to get back into my mathematics, get back into my roots a little bit, and tease my brain. And like you, I implemented this in R as well, just to try it out for myself. And that's sort of the name of the game for how I was able to learn math more efficiently than I did prior to learning programming, because if I could program it, I could understand it. Right? And I think that goes for a lot of folks out there. Initially, just reading the blog post, I was like, what is going on? And then I put it to code a little bit, and it started to make more sense to me. It's a cool little brain teaser. I think sometimes you're able to create some of these mathematical brain teasers sort of backwards, if you'd like, and I'm curious to understand who originally came up with this concept and how they arrived at it. But it's super interesting. It's always funny to me when I see somebody on social media claiming to do magic with math: hey, what's your age? Okay, now subtract that from what year it is, then multiply it by nine, and they arrive back at their age or something like that. People are mind blown, and it's pretty straightforward. This one, certainly not as straightforward, especially when you get down to the last two sections where we get into APL and Uiua, where we're starting to use hieroglyphs, in my opinion, to stitch together some of these equations, and we're going from left to right to right to left. Is APL the one that literally stands for A Programming Language?
[00:33:10] Eric Nantz:
You got it. Great callback to a previous episode.
[00:33:14] Mike Thomas:
Oh my goodness. I can see why. But I really appreciate taking a small little brain teaser like this and implementing it across these four different languages, just to showcase the differences between them, the strengths and weaknesses they have, and to serve as an introduction to some of these languages for folks like me who are certainly less familiar with them. Always really interesting to read a Jonathan Carroll blog post.
[00:33:40] Eric Nantz:
Yeah. And I do have some colleagues at the day job that I think would be all over this if they get a free moment, so I'll probably send this their way, see what kind of brain teasers they can come up with, and see if APL makes sense to them or not. Boy, all kidding aside, I'm amazed at the creators of all this, because I don't think I could come up with something like this if you gave me ten years. My goodness, I have a hard enough time just with R itself. So this is amazing stuff. And speaking of amazing, we're going back to R for a second. We will close out this episode with our last highlight, which has been a mainstay of this kind of content in the R community for years and years. It may have been shared on different platforms over time, but Joe Rickert, who has been one of the founding, you know, members of the R community that I've been following for years and years, is back with his monthly Top 40 R packages roundup.
I'm sure if you've been around the community for the years that Mike and I have, you may have seen this on the Revolutions blog back when he was at Revolution Analytics. It later moved to R Views at Posit, and now he's got this on the R Works blog, which is a community effort led by him and Isabella Velásquez, with other contributors as well. So, again, we definitely don't have the time to talk about all 40 packages that Joe has come up with here. But, as usual, he organizes these by different domains, different industries, or different topics.
And there were two that definitely caught my eye, because they relate to different projects I'm working on at the moment. The first is a package called {autodb}. This one's interesting because it is, in essence, a way to help you normalize a data frame into third normal form. Admittedly, I don't know the nuts and bolts of that definition, but roughly it means every non-key column should depend on the key and nothing but the key, with no transitive dependencies, and if you've dealt with databases before, this will look a bit familiar. It's been authored by Mark Webster, and he points out in the vignette, which we'll link to in the show notes, that he is using this as a way to inform data cleaning operations and investigate his data.
He stresses that this is not meant for database design, even though you will see tabular representations of the different relationships in these datasets that look very similar to a schematic you might have for a robust set of database table relationships, an entity relationship diagram, or ERD, I believe they're called. But, again, really interesting, because I am dealing with data at the moment coming from a vendor's API alongside some other datasets we have in house, and we do have to merge them together. More importantly, we have to make sure that the constraints we have told the vendor to follow are actually being met. So I could definitely see this as a tool to investigate those relationships and make sure things are sound. And maybe it goes in tandem with a package like {pointblank} by Rich Iannone that looks at data quality as a whole, if you have an ETL kind of process like I'm dealing with. So that one caught my eye.
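As a hedged sketch of how you might kick the tires, the {autodb} vignette linked in the show notes centers on an autodb() call over a data frame. The specifics below, including the choice of ChickWeight as demo data, should be treated as an approximation of the vignette rather than a definitive recipe.

```r
# A hedged sketch based on the {autodb} vignette: decompose a data frame
# toward third normal form and inspect the discovered relations.
library(autodb)

db <- autodb(ChickWeight)  # discover functional dependencies and normalise
db                         # print the resulting set of relations
```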
And then, on the visualization side, there's a package called {kitesquare}, authored by John Weidenhauft, hopefully I said that right. This is a way to visualize contingency tables, which, if you're dealing with, you know, observed versus truth, or rater agreement, you'll often build. It's an interesting way to lay out a kind of square grid that shows the discrepancies or similarities between these variables.
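Just for context, and to be clear this is not the {kitesquare} API itself, the kind of object it visualizes is an ordinary two-way contingency table, like the base R example below, along with the chi-squared test you'd typically run on it.

```r
# Not the {kitesquare} API, just the kind of input it visualizes:
# a contingency table of two categorical variables, plus the
# chi-squared test that the kite/chi pun nods to.
tab <- table(mtcars$cyl, mtcars$gear)
tab
chisq.test(tab)
```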
And he likes the kite name because it rhymes with chi, as in the chi-squared test that you often run on contingency tables. So I'll have a link to the GitHub repository for the package in the show notes. It's definitely an intriguing visualization to kind of bling up what can be a pretty utilitarian table representation, and maybe get a little visual around those marginal effects and those counts all under the same umbrella, all powered by ggplot2 under the hood, so you can feed this into a lot of different places. But that's only two of the 40 here, Mike. Did you see any that caught your eye as well?
[00:38:26] Mike Thomas:
On the risk management side, which is a space that I operate in quite a bit, there's a new package called EQRN, in all caps. It got a 0.1.0 release, and it provides a framework for forecasting and extrapolating measures of conditional risk, for example of extreme and unprecedented events, which we seem to have more and more of lately, including quantiles and exceedance probabilities, using extreme value statistics, which is a concept I'm familiar with, and flexible neural network architectures, which is interesting. There are a couple of papers behind it, quite a few folks worked on this package, and it has a nice pkgdown site, which I am in the middle of checking out right now. So one to look out for if you're in the risk space.
[00:39:22] Eric Nantz:
Awesome. Yeah, I thought that might catch your eye, so I wanted to make sure you got that one. And, yeah, very nice documentation here. So I could see this being very useful in your toolbox for risk-based assessments and analyses. Yep. Great find there. And like we said, there's a ton more in this blog post for you to choose from. And for the colleagues in life sciences, there is, and I say this with kindness, yet another tables-related package, called {clinify}. I laugh about this because it was not long ago, folks, that we didn't have any great table packages in R, and now, I don't wanna say too much, but let's just say there's a lot of choice to be had here, and that's been a hot topic in some of my circles recently. But, again, great to see advancements on that side of things too.
And, again, that's just scratching the surface of the entire issue, right? There's a lot more to choose from: additional packages that have been released, and a great new set of resources. We don't have time for additional finds because we had a whopper of an episode today, but definitely check the issue out, see what catches your eye, and, yeah, you'll be able to get to the source code behind a great package or blog post, and maybe even make some connections in the future. And we also like to connect with all of you.
First of all, the project definitely keeps going because of contributions from you out there, with your pull requests for new content to be shared in the upcoming issue. How do you do that? Head to rweekly.org. There's a link in the top right corner of the page, a nice little ribbon structure; you can click there and get to the GitHub repo right from there, with a nice issue template. You can also get in touch with us personally. In the episode show notes, we have a link to the contact page where you can send us your feedback, and you can get in touch with us on the various social media outlets. I am on Bluesky these days at @rpodcast.bsky.social.
Also, I'm on Mastodon with @[email protected], and I'm on LinkedIn; search my name and you'll find me there. And, Mike, where can the listeners get a hold of you?
[00:41:35] Mike Thomas:
Sure. You can find me on Bluesky at @mike-thomas.bsky.social, or on LinkedIn if you search Ketchbrook Analytics, K-E-T-C-H-B-R-O-O-K, to see what I'm up to.
[00:41:57] Eric Nantz:
Awesome stuff. And again, thank you so much for joining us and listening to episode 204 of R Weekly Highlights. We'll be back with another edition next week.