The innovations of the R community never cease to amaze us! How a programmatic approach to generating markdown was vital to a high-profile Quarto site, a novel infographic of Bob's Burgers sentiment analysis, and updates to the next evolution of object-oriented programming in R.
Episode Links
- This week's curator: Ryo Nakagawara - @[email protected] (Mastodon) & @R_by_Ryo (X/Twitter)
- Guide to generating and rendering computational markdown content programmatically with Quarto
- Bob’s Burgers Episode Fingerprints by Season
- S7 0.2.0
- Entire issue available at rweekly.org/2024-W46
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon) and @theRcast (X/Twitter)
- Mike Thomas: @mike-thomas.bsky.social & @[email protected] (Mastodon) & @mike_ketchbrook (X/Twitter)
- Cammy's London Drizzle - Super Street Fighter II: The New Challengers - MkVaff - https://ocremix.org/remix/OCR00453
- Bar Hopping - Streets of Rage 2 - jaxx - https://ocremix.org/remix/OCR00437
[00:00:03]
Eric Nantz:
Hello, friends. We're back with episode 185 of the R Weekly Highlights podcast. This is the weekly podcast where we talk about the excellent resources that are shared in the highlights section and elsewhere in this week's current R Weekly issue. My name is Eric Nantz, and I'm delighted you joined us wherever you are around the world. And he is back. He couldn't stay away forever, but my awesome cohost, Mike Thomas, has graced us with his presence again. Mike, how are you doing? Not that you've had any, like, free time or anything. I'm doing well, Eric. Calling in from a new location,
[00:00:33] Mike Thomas:
a full 2 miles away from my old location. Settling in pretty well so far, and I have also migrated locations virtually. And as of last night, I am on Bluesky.
[00:00:50] Eric Nantz:
Oh, you are? Okay. I'm feeling the peer pressure now if you're there. It sounds like it's the place to be. Definitely taking in an awful lot in the data science sector. Now, of course, I do still have a very vested interest in the fediverse as well, which is just kind of a part of, but it's a little dicey, folks. And I'm still trying to sort it out just like anyone else. But, yeah, you may see me on there in the future. We actually did spin up a new account for the R/Pharma team on Bluesky that we just put out there before the conference that took place a couple weeks ago, which I'll have a lot more to say about once we get through the editing of all the recordings. But, nonetheless, that was a major event. But we're here to talk about R Weekly, of course. And this week, our issue was curated by Ryo Nakagawara with, as always, tremendous help from our fellow R Weekly team members, and contributors like all of you around the world with your pull requests and other suggestions.
And my goodness, Mike, we're gonna lead off with one that really is both a mind-blowing, you know, showcase of what's possible with R and Quarto and just really inspiring as well to see just how far you can take these dynamically generated reports. And this has been authored by a good friend of ours from the highlights, Andrew Heiss, who, speaking of free time, I don't know how he has any free time to knock this stuff out. My goodness. I need whatever he's eating. And an Andrew Heiss blog post, yeah, is always gonna be super detailed,
[00:02:24] Mike Thomas:
probably pretty long, and again, blows me away with how much he was able to accomplish.
[00:02:32] Eric Nantz:
Yep. And we'll dive into those details now because this post is talking about how he approached generating and rendering dynamically computed content written in markdown, but programmatically with Quarto, which is something I've had a little bit of experience diving into back in the R Markdown days with various snippets, especially reusable kinds of sections in a report. But Andrew just blows my stuff out of the water with what he accomplishes here. So let's set the stage, because there's a big picture here. So, yeah, there was a little thing called the election that happened a weekend or so ago. And apparently, Andrew was helping out in the state of Idaho with assembling the election results and then surfacing them into a really fancy Quarto website. Love to see Quarto in the real world, so to speak, in a not so boring situation, to say the least. Lots of eyes on that side of the news.
And apparently, Andrew has put this in a very sophisticated ETL pipeline, which is merging ETL concepts with, wait for it, targets to help process the data from different sources, whether they're online or from another storage, and then creating a tidy, I guess, data store, he calls it, that everything else can be built upon. And then another pipeline is gonna take that tidy set of results and then actually generate the Quarto report and website programmatically. He teased that he's gonna share more about that, but I'm saying, Andrew, not that you have infinite time, I'd love to see a deep dive into those targets pipelines because I'm always eager to see just the directions you take with that. But diving into the rest of the post, he talks about kind of the meat and potatoes of what the website was surfacing, which was taking advantage of really neat features in the Quarto UI for HTML, where you can have these tabbed, you know, panels, much like you do in a Shiny app and in Quarto as well, which gave both a tabular visualization and an interactive map of the election results across districts in Idaho.
So you could kind of choose which display you like. And then he shows, kind of in general, you might have in the Quarto dashboard set up these different sections with code chunks to actually generate that table or that map. And sure, for maybe 1 or 2 races, so to speak, in that particular state's election, you could, you know, just dive into that. But what if you have a 100 or more of these? Yeah. You don't wanna copy paste that stuff left and right. So he looked at how we can generate these table and mapping code chunks and the output from that more dynamically, not just the code itself, but where it's actually being placed in the Quarto website, i.e., the Quarto syntax that's gonna be built in markdown for that.
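To make that targets-based ETL idea a bit more concrete before moving on, here is a minimal, hypothetical _targets.R sketch. The target names, columns, and file paths are all invented for illustration, since Andrew hasn't shared his actual pipeline yet.

```r
# A hypothetical _targets.R sketch of the ETL-with-targets idea; the
# target names, columns, and file paths are invented for illustration.
library(targets)

tar_option_set(packages = c("dplyr", "readr"))

list(
  # Track the raw source file so downstream targets rebuild on change
  tar_target(raw_file, "data/raw_results.csv", format = "file"),
  # Extract: read the raw data
  tar_target(raw_results, readr::read_csv(raw_file)),
  # Transform: tidy into the "data store" everything else builds upon
  tar_target(
    tidy_results,
    raw_results |>
      dplyr::mutate(pct = votes / sum(votes), .by = race)
  ),
  # Load: persist the tidy results for the report-generation pipeline
  tar_target(tidy_store, {
    readr::write_csv(tidy_results, "data/tidy_results.csv")
    "data/tidy_results.csv"
  }, format = "file")
)
```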
So he first knocks out probably an elephant in the room for those that have done this before, which is that in code chunks, you can use the results parameter and set it to asis to basically not escape that content and take it as is, so that if there's an HTML report and you're spitting out markdown or HTML, it's just gonna surface that right away when you compile the website or the report. But it doesn't always work the way you want. And he shows an example of, like, a bullet list of the Gapminder country data where it just doesn't quite work when you need to have additional work in these chunks.
And he shows that in an example where he's trying to round certain numbers, just hypothetically the number pi, and it's only spitting out, like, the code that makes the result in the bullet list, not the actual result of that computation. So it didn't render the inline chunks of these items in the bullet list. And, of course, that's a nonstarter if you're gonna do this more dynamically. So the trick that this is all built upon is to actually pre-render these inline chunks before they actually appear in the document. So this is a little different. Right? This is not dynamically rendering the content, what I call just in time, at website compilation.
He is saying, let me precompute that first and then show it in the actual Quarto content. In this case, now you get the rounded versions of pi in that markdown text. Took me a little bit to grasp this, but I kind of see where he's going with this. And, of course, this is a trivial example. But back to the election website he was creating, he wanted to do a similar idea, but now with those tab sets of the visuals of the table and the map. So the concept does generalize. First, he's using the Gapminder data to kind of show a tidy data set that would be a scatter plot of the GDP per capita and then life expectancy.
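Staying with the pi example for a second, the pre-rendering trick being described looks roughly like this; the markdown string is a stand-in, not the post's exact code.

```r
# A minimal sketch of pre-rendering markdown that contains inline R
# chunks, run before (or outside) the main document render.
library(knitr)

md <- "- The value of pi rounded is `r round(pi, 2)`"

# knit(text = ...) evaluates the inline chunk and returns plain markdown
rendered <- knitr::knit(text = md, quiet = TRUE)
cat(rendered)
#> - The value of pi rounded is 3.14
```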
That Gapminder scatter plot is stuff you've seen many times before in presentations. And then he shows, okay, what if I want, in my Quarto dashboard, the panel tabset, and then within each of these continents, splitting up one by one, echoing that particular continent's result from that tidy dataset and tidy visualization. Again, you could copy that over and over for continents 1, 2, 3, 4, 5. But again, going back to the election results, what if you have, you know, a 100 of these or, you know, thousands of these potentially, if you're doing a lot of, like, biomarker data or something like that? So, again, he's going to use a hybrid of the glue package and other tricks to precompute these panels dynamically. He's got a handy utility function called build panel where he's feeding in what's gonna be the chunk label and the output that is in string format, the markdown syntax, injecting things like the panel title and the index of the plot.
So that's then going to render dynamically in the report itself. He does verify it works first by running the function, and he gets the markdown of the heading and the continent name and then the code chunk as if he typed it himself. So mission accomplished on that front. And then he's able to basically loop through this at one point with a combination of the knitr functions, the knit function, which, again, Quarto with the R execution engine is built upon knitr. So knitr, like, we've spoken for years on this show, Mike, about the praise for knitr and the doors it opened. People need to realize, if they don't already, that Quarto itself is building upon these types of execution engines. And without knitr, there is no Quarto. I'll just be hot take 101 with Eric on that. I don't think it's such a hot take. But No. I would agree with that. It is
[00:10:03] Mike Thomas:
almost impossible to not take for granted everything that knitr has done for us. As much as you appreciate it, you should appreciate it even more.
[00:10:17] Eric Nantz:
Absolutely. Absolutely. So, again, just this knit function alone, you may take it for granted. That's when you're in the R Markdown days, hitting that, you know, compile report button in the IDE. But this is the engine behind all that. So he shows a great, great example of using that in action. And then sure enough, he's got a nice little tab panel of the different continents and the scatter plots. And that really saves you a boatload of time there. And then he's got another example later on where you could use this in more of a teaching aspect. Let's say you wanna show the different stages of building up an effective visualization.
And he does this with what he calls the evolution of a ggplot visualization, where you can have these plots saved as objects, say, from the first stage of it to the last stage of it. You can make a tidy, you know, dataset with the actual plot object in a list column that has, like, the actual object itself, the text around it, and a description. And guess what? You can use the same logic to create a tab panel going from stage 1 all the way to the last stage and just grabbing these different plot objects from this list in that data frame. My goodness. That's a great way to help teach your students. Like, you may start off with an initial plot that looks, you know, very utilitarian, and by the last step, you got a nice theme with it, you got the nice color choices, the background looking much sharper, changed default labels.
So Quarto as a teaching tool. I mean, it's already getting very popular in the education sector, but, man, this is really, really top-notch stuff here. So again, you can take this even further. Now, going back to the election results, you got all this, you know, content kind of stitched together. Quarto does have the concept of having these child reports inside an overall, you know, website or report or what have you. So he kinda takes this into a bunch of R chunks, again, that are dynamically generated. And you can, you know, do this in Quarto too. Again, built on knitr, you can have child documents pulled into an overall report.
But, again, he's showing that this report text is all just markdown in the end. Right? The key is just being able to loop through that. Maybe you have a tidy data frame that's, like, overarching all of this, and then to be able to use the knit function to paste this all together. And that's where he shows, then, in this generated output, instead of, like, separate reports verbatim or separate files, he has different sections of this continents report where the user can quickly, you know, go through the different continents, figure out what countries are involved, looking at the details in one table, looking at the plot in another.
And you put a TOC with that and you got yourself a really intricate, dynamically driven report or dashboard, whatever have you. But all this is powered by rendering that markdown content ahead of time and then using it to inject that into the overall Quarto website. I literally, literally yesterday did something very similar to this. Albeit, I did take the manual approach, but I regretted it for reasons. But now, knowing what Andrew's done here, I can have a function that generates a snippet, in this case iframes of another Quarto dashboard, inside another Quarto revealjs slide deck.
I could just have a tidy data frame that has, like, the links to these iframes of the Quarto dashboard, loop through that, get the markdown chunks, and then put that into my main revealjs document, deploy that on Posit Connect, and I'm off to the races. I have eliminated the need for PowerPoint for these one-pagers of results that I had to assemble. So, Andrew, you blew my mind yet again. Credit to you for sharing your knowledge here. And, yeah, the possibilities seem endless with dynamic generation of Quarto content. So mission accomplished, buddy.
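For a flavor of the approach, here is a simplified sketch of the glue-plus-knit pattern; build_panel and the plots list are stand-ins inspired by the post, not Andrew's exact code.

````r
# A hypothetical sketch of programmatic tabset panels:
# build_panel() returns the markdown for one tab.
library(glue)
library(knitr)

build_panel <- function(title, plot_index) {
  glue(
    "## {title}\n\n",           # heading becomes the tab title in a panel-tabset
    "```{{r}}\n",
    "#| echo: false\n",
    "plots[[{plot_index}]]\n",  # assumes a pre-built list of ggplot objects
    "```\n"
  )
}

continents <- c("Africa", "Americas", "Asia", "Europe", "Oceania")

panels <- vapply(
  seq_along(continents),
  function(i) as.character(build_panel(continents[i], i)),
  character(1)
)

# Verify the generated markdown looks hand-written...
cat(panels[1])
# ...then wrap it in ::: {.panel-tabset} and pre-render it with
# knitr::knit(text = ...) so the chunks actually execute.
md <- paste(panels, collapse = "\n")
````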
[00:14:51] Mike Thomas:
Yeah. This is fantastic. If you were to try to, you know, create some sort of a Quarto or R Markdown document that displayed these 100 different, is it counties or districts in Idaho that we're trying to track? Districts, I think that's what it is. I mean, you could do it by hand if you wanted to, but it would take you forever and a whole lot of copy and paste, and God forbid you wanna change one little aesthetic, right, that you wanna propagate everywhere else. You're stuck doing a find and replace all. And I think, you know, we've discussed in the past the benefits of functional programming, and they apply very similarly to Quarto. And it's interesting to me because I just came off a project where we did a lot of work exactly like this. So I can't praise this type of workflow enough. I think, Eric, you probably have this experience with knitr as well, where if you try to run some R code, execute some R process within knitr, it'll run a little bit slower than if you did it outside of knitr.
So what that means is that you will be able to save time in your rendering if you have precomputed and pre-created any of those R objects ahead of time, instead of asking them to be created during the knit process. You extend that one rung further and we start to get into targets. Right? And that's sort of, you know, the exact goal of targets: to be able to preprocess things in your pipeline and only re-execute things that need to be updated. So it's a fantastic complement to the workflow here that Andrew has put together. And, you know, one other thing that I do wanna highlight is the use of child documents. I'm not sure if Andrew called it out in the blog post, and I'm not sure, at the end of his whole process, which we don't necessarily have full insight into yet, right,
in terms of this election Quarto report that he's put together, if he leveraged child documents or not. He talks about it when you take a look at some of the code chunks that are inline in the blog post, some pretty large glue-based code chunks that he's putting together that we can, you know, execute as a function. At the end of the day, he talks about how some of this code in his current workflow might be able to be condensed if you took advantage of child documents as well. And what that means in a Quarto sense is using this special tag, I think, that starts with 2 arrows pointing to the left and ends with 2 arrows pointing to the right and, includes, no pun intended, the include verb.
I think that allows you to reference another file, another QMD file, that you can have, sort of, inserted into maybe, like, a main .qmd file. And if you're familiar with child documents in R Markdown, it's the same type of concept, just a different syntax. Those are things that we leverage heavily, because if you're putting together a large report or a large document like the one Andrew is putting together that has a lot of moving pieces, I think it sort of always makes sense to try to manage those pieces as separately as possible, especially if you're working on a collaborative team. Right? Somebody can just focus on one piece and another person can be dedicated to focus on another piece. And I think it just allows you to piece together your final product in a way that's easier to manage and easier to maintain than if you were trying to do it in some sort of a monolith.
So I can't say enough about child documents within Quarto. The syntax that they have makes it really easy to do, so if you're on the Quarto website and you're not familiar with how to leverage child documents, just search it on quarto.org and the markdown syntax to be able to do that will be right there for you. I can't harp on targets enough as well, and sort of bringing these technologies together. I would love to see how he leverages that remote storage with targets and that local DuckDB database as well to sort of bring this whole entire solution together.
But top to bottom, I think it's a fantastic resource for anyone who is building a large-scale Quarto type of document or looking to just get a better understanding and better feel for best practices around authoring Quarto reports. I think the tips in here will be invaluable for you.
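For reference, the Quarto include shortcode Mike is describing looks like this in a main document; the file names here are made up, and the leading underscore is the Quarto convention for files that are only meant to be included.

```markdown
<!-- main.qmd: pull child documents in with Quarto's include shortcode -->
{{< include _intro.qmd >}}
{{< include _results.qmd >}}
```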
[00:19:35] Eric Nantz:
Yeah. And I do have it on good authority from the author himself that there are some big enhancements coming to targets with respect to potential DuckDB integration and even better performance. Like, targets already performs great, but what Will has in store for us, oh, you all are gonna love it, especially those that have these, you know, 10,000-plus, you know, branches in a pipeline, you know, targets themselves. So, yeah, stay tuned, folks. It's getting better. But, yeah, this whole thing has so many nuggets to choose from that I gotta look at this even more.
But we literally, at our day job, have a couple teams, I wanna say, that are not satisfied with the SharePoint world, man. They want to build dynamically data-driven websites of these reports that can be shared broadly across the organization, that take advantage of the interactivity that Quarto offers, whether it's through, you know, things like the Quarto dashboard, which I've become a big fan of. Obviously, we have some people looking into the Observable JS side of it. But you're not gonna get this with SharePoint, folks. That's my hot take too for the podcast. If this in particular can be used in a high-profile situation like tracking election results, good grief, it can be used almost anywhere.
[00:21:00] Mike Thomas:
Not fully satisfied with SharePoint? Are you sure you have that right?
[00:21:05] Eric Nantz:
I may have to double check my references on that.
[00:21:08] Mike Thomas:
A little satire for the audience.
[00:21:10] Eric Nantz:
Yeah. That, hopefully, they got. But, yeah, we've had some internal debates on that one too. And like I said, I've used principles of this, albeit not so elegantly, where I was able to get away from having to create a rather haphazard PowerPoint slide, and I used a Quarto dashboard instead. And because of the integrations we can have with Quarto websites and iframes going as backgrounds in a presentation deck, I was able to create what that team really wanted, which was basically an HTML-based slide deck of a bunch of Quarto dashboards without having to put the Quarto dashboards in the slide deck itself, but as background iframes. Folks, there are so many mind-blowing things we can do with this. I hope I can give a talk about that later because I've learned some tricks, man, but what Andrew's done here is tricks on another level. I'm looking forward to that too.
So as I said, yeah, it was about a week and a half ago that there was a little thing called the election, and maybe somebody needed a little pick-me-up after that thing, whichever side of the fence you're on. Maybe, you know, having a little bite to eat if you had a long night. Well, who knew that this next highlight was gonna showcase a show that I admittedly have not seen before. I've heard about it. But this is coming from Steven Ponce, and he has put together a blog post, albeit mostly a notebook, I would say, about looking at the fingerprints of each of the seasons of the show Bob's Burgers. And admittedly, I have not seen this show before, so it's way out of my wheelhouse. But it's basically an infographic, as the meat of this post, that is looking across the different seasons, and I had to zoom in here on my fancy 4K monitor to look at this more carefully.
Looking at the transcripts of this, the dialogue: looking at, say, the length of the sentences, the unique words, the variance among the sentiment of the transcripts, a little text mining action here, how many questions there were, how many exclamations there were. And it's a pretty neat infographic where, for each, you know, facet by season, you get, I believe they call these, like, spider plots or radial charts, I forgot the exact name of them. But it's a good way to put multiple dimensions in a circular-like fashion, but not be confined to the infamous pie chart limitation. So a pretty neat visual there. And again, the big picture is looking at the patterns in dialogue across the 14 seasons of this show, which, again, I'm an old-timer, I haven't seen yet. So maybe it has to be in my queue of shows to watch when I actually get a free time moment. Nonetheless, the notebook style is that he's got the different steps in building this visualization, much like a tidyverse kind of, you know, flowchart that you would see often in R for Data Science and whatnot.
Loading the packages, of course. And this is a little interesting here. We don't see as much of this lately, but Steven is using the pacman package to orchestrate packages, which some have had great success with. I admit my attempts with pacman were mixed at best. But, hey, if it works, it works. So he's got the snippet to load the various packages. And yes, Mike and I were remarking before the show, there is a package called Bob's Burgers R that's got the datasets that are being used in this. There's a package for everything, isn't there?
[00:25:13] Mike Thomas:
Literally, at this point. Yes. It contains the transcripts, it looks like, for every episode across all the seasons that are available.
[00:25:22] Eric Nantz:
Absolutely. So I'll have to look at that in my spare time, but he's also loading additional packages to help do the text analysis, tidyverse, of course, and patchwork, which we've spoken very highly about for being able to compose multiple ggplot objects together in any way you see fit, basically. And then, also, he's using the camcorder package, which you heard about at posit::conf as well, to record the different plots as PNGs as you're going through it. So that will come into play later. So, first, the data, which, again, thanks to the Bob's Burgers R package, is just simply the transcript data frame, and he's got it right off the bat. So that part's done. He does a little exploration of it. Although we don't see the result of it, there is a handy function from the skimr package called skim, which lets you, it's not shown here in the output, get, like, a terminal-based glance of that dataset or data frame.
Really handy, especially if you're in a terminal environment. Then comes the tidying stuff. So we got a lot of dplyr grouping by summarizations. And notice that in his syntax of the summarize function, he's taking advantage of the .groups declaration, which, again, is thanks to dplyr version 1.1 or later, I believe. They introduced the .by and the .groups parameters in key functions like mutate and summarize. So you don't always have to do the old dplyr group_by, summarize, ungroup afterwards, because Davis Vaughan and others from the tidyverse team were saying that got annoying for a lot of users. So that's a nice little trick
[00:27:14] Mike Thomas:
that Steven shows here. Couldn't agree more. Yes. I love the .by argument.
[00:27:20] Eric Nantz:
Yep. I literally just started using that for a high-priority project, and it's like I can never go back to group_by anymore if I can avoid it. So And the .keep argument within mutate, so you don't have to use transmute anymore. That's right. Yeah. I need to explore that one too. Another great quality-of-life enhancement, I should say. Next comes the visualization. So he does a lot of setup up front to get the labels all in order, as well as, kind of, the CSS, I believe, that is gonna be applied to these plots, a little bit of CSS, and then using glue to dynamically put in various things, and then managing the fonts.
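Before the fonts discussion continues, a quick sketch of the grouping arguments just mentioned; the transcript_data tibble and its columns are made up for illustration.

```r
# A quick sketch of .by / .groups / .keep (dplyr >= 1.1); the data
# here is invented for illustration.
library(dplyr)

transcript_data <- tibble(
  season  = rep(1:2, each = 3),
  episode = rep(1:3, times = 2),
  n_words = c(120, 95, 143, 88, 102, 130)
)

# The older pattern: group, summarize, and drop the grouping explicitly
transcript_data |>
  group_by(season) |>
  summarize(avg_words = mean(n_words), .groups = "drop")

# The per-operation pattern: .by groups for this one call and always
# returns an ungrouped result, so no group_by()/ungroup() sandwich
transcript_data |>
  summarize(avg_words = mean(n_words), .by = season)

# .keep = "none" keeps only the grouping and newly created columns,
# much like transmute() used to
transcript_data |>
  mutate(share = n_words / sum(n_words), .by = season, .keep = "none")
```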
Lots of fonts are added from Google with, I believe it's the fonts package, if I'm not mistaken. I have to double-check that.
[00:28:13] Mike Thomas:
I thought it was called Google Fonts, but I'm not seeing
[00:28:19] Eric Nantz:
I'm not seeing where he got that from.
[00:28:23] Mike Thomas:
I'll take a look. Keep going.
[00:28:25] Eric Nantz:
Yeah. We'll keep going. Yeah. We're learning here, guys. So nonetheless, then he's able to assemble the theme object in ggplot2, which, again, is a great way if you want to define that up front with theme_set and then a theme_update. That way he can use that theme anywhere he goes from that point on, for the rest of the visualizations. And then comes the main plot, where the nugget here is using geom_polygon to get that nice little polygon superimposed inside this, you know, circular-type display. Again, people call that a spider plot or a radial plot, something to that effect. And then adding average lines on that. But again, flipping that to polar coordinates towards the end.
And then defining the labels and the facets by season. And then adding on top of that kind of this pattern-type visualization, which, again, you wanna look at the post to get the meat of it. But there's a nice little pattern, I think, that's kind of serving as the background, so to speak, on the plot itself. Again, really neat to play with. I haven't done this myself before. So really intricate annotations that he makes here. And then afterwards, he's gonna save all this as PNGs, or at least one PNG, I should say, for the whole plot itself. But he assembled that with patchwork before that to combine everything together.
And then to be able to draw that, clean things up, and then, using the magick package, able to create neat little thumbnails of the visualization that he used, I believe, in the post itself. So a nice lot of visualization tricks here if you want to up your game with ggplot2. There are some real nuggets to share here. And then, like any good data science citizen, he's got the nice session info at the end here and a link to the GitHub repository so you can actually see how this is composed in action. So great notebook setup here. Love the way that you can collapse the different code chunks and just get to what you're interested in. But, yeah, with a nice little tidy dataset of transcripts and Bob's Burgers, you got yourself a nice little visualization to, you know, satisfy your appetite.
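As a rough illustration of the radar-style plot being described, here is a toy sketch with made-up data (not the Bob's Burgers metrics); the real infographic involves far more annotation and styling.

```r
# A toy sketch of a radar-style plot: one polygon per season across
# several metrics, flipped to polar coordinates and faceted.
library(ggplot2)
library(dplyr)

set.seed(1)
radar_data <- expand.grid(
  season = paste("Season", 1:2),
  metric = c("dialogue", "questions", "exclamations", "sentiment")
) |>
  mutate(value = runif(n()))

ggplot(radar_data, aes(x = metric, y = value, group = season)) +
  geom_polygon(fill = "purple", alpha = 0.4, color = "purple") +
  coord_polar() +            # circular layout instead of a bar layout
  facet_wrap(vars(season)) +
  theme_minimal()
```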
[00:30:48] Mike Thomas:
Puns galore. Mike, what did you think about the visualizations here? Well, the final output that is at the top of the blog, it's a beautiful infographic. It's really nicely done. I like the contrast between the background, that's just sort of off-white a little bit, and the purple gradient that the spider plot values are represented by. Really, really cool. Turns out it's the showtext package that allows you to manage Google Fonts. It has font_add_google, I think, is the particular function within the showtext package that allows you to import certain Google fonts and leverage them in your ggplot graphics.
One thing that I don't do well enough or understand well enough is adjusting, I guess, the graphics device itself. That, you know, arguments like DPI, which I think are dots per inch, is that what that means? Yes. Correct. The units, I don't do that enough, unfortunately. I usually, in my Quarto documents, am just specifying fig-height, fig-width, things like that, and messing around with it until it looks halfway decent. That's stuff that I need to learn a little bit more about, but Steven has a few different places within this notebook where he is setting those specific configurations. So I've definitely learned a lot there, and you will as well if that's something that you struggle with like me.
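A minimal sketch of the showtext and device-resolution pieces being discussed; the font choice and output dimensions are arbitrary.

```r
# Pull a Google font with showtext and control size/resolution at save
# time; the font and dimensions here are arbitrary choices.
library(ggplot2)
library(showtext)

font_add_google("Roboto", family = "roboto")  # download and register the font
showtext_auto()           # route text rendering through showtext
showtext_opts(dpi = 320)  # keep font sizing consistent with the save dpi

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_minimal(base_family = "roboto")

# dpi is dots per inch; combined with width/height in inches, it
# determines the pixel dimensions of the saved file
ggsave("plot.png", p, width = 8, height = 5, units = "in", dpi = 320)
```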
One other sort of cool function, and here's a little today-I-learned that I can't believe, I don't even know if I want to admit that I'm just learning this today, but there's a ggplot2 function called theme_update. Eric, I'm sure you knew about this one.
[00:32:35] Eric Nantz:
I knew of it sporadically. I never actually used it. So I love theme_minimal,
[00:32:40] Mike Thomas:
but obviously, occasionally, there's some additional things that I wanna do on top of theme_minimal that aren't contained within arguments of the theme_minimal function. And typically I'll just add a theme function call after that. And I think ggplot2 knows well enough to use sort of the last, you know, value for a particular theme argument to set that as what's gonna be shown in the plot. But theme_update sounds like it is probably doing a better job of that, as opposed to sort of overriding what you had written before it in your theme_minimal call. So this was a new one for me. A little embarrassed to say that this is the first time that I'm coming across it, but it is one that I am for sure going to be using from now on in many, many places.
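A small sketch of the theme_set()/theme_update() pattern being discussed:

```r
# Set a base theme once, then tweak individual elements globally
# instead of repeating a theme() call on every plot.
library(ggplot2)

theme_set(theme_minimal())

# Modifies only the listed elements of the active theme; everything
# else from theme_minimal() stays in place
theme_update(
  plot.title = element_text(face = "bold"),
  panel.grid.minor = element_blank()
)

# Every subsequent plot picks up the updated theme automatically
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  ggtitle("Uses the globally updated theme")
```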
Just an excellent blog post top to bottom. Love the code, love the layout here, and the end deliverable is absolutely beautiful. So take a look for yourself.
[00:33:47] Eric Nantz:
Yeah. First, you're too kind, or too hard on yourself, I should say. There are so many things in ggplot2 that I always scratch the surface of. But let's put things in perspective, folks, as I take a quick look at the archive of ggplot2. Did you know that ggplot2's first CRAN release was all the way back in 2007? So it's got a lot going on. I mean, we're coming up on 20 years on that thing; we're over 15 now. So it's not surprising that there are things in there that we didn't expect to see. But, yeah, that's why we're having this post from Steven. It's a great reminder of the capabilities of it. So, like you, I'm gonna take note of that theme_update function. Lots, yeah, lots of attention to detail here. I absolutely love seeing how the sausage is made. And, yeah, the font_add function, I did a little digging while you were talking, is from the showtext package.
I don't use that a lot in my daily work, but I definitely will take a look at that for my next ggplot visualization. But, yeah, nonetheless, really great design choices. So this is a great showcase of using the principles that I've seen outlined in various workshops, such as some from Cédric Scherer or others, about the best ways to build up an effective infographic that, by the way, you don't need to go to Adobe Illustrator for. You don't need to go to some proprietary product. ggplot2, with a little getting your hands dirty, so to speak, gets you really the whole way there. It's just a wonderful plot here. Absolutely wonderful.
[00:35:27] Mike Thomas:
2007, you say, for ggplot2. Well, it's incredible to think how much that package has evolved. And you know what else has evolved in the R ecosystem? Object-oriented
[00:35:40] Eric Nantz:
programming. You got it. You got it. And, you know, as an R user, you've probably used this many, many times, sometimes without even realizing it, because of the elegance of the language itself. So what we're teasing here is that, since about a year or so ago, there has been a new effort, sanctioned by the R Consortium no less, to build a new object-oriented paradigm into R itself eventually. And right now, it is a new package called S7. The post comes to us from the tidyverse blog, written by Tomasz Kalinowski and Hadley Wickham himself, on the new updates in S7 version 0.2.0.
And for the uninitiated wondering, wait, why does S7 even exist? Well, in R itself, historically, for a very long time, since practically the very beginning, there have been at least 2 or 3 class systems in the language. One of which is S3, which is leveraged heavily by the tidyverse packages and a lot of base R functions to give you that kind of very easy way to, say, create a visualization with the plot function. But if you feed it a data frame, it's gonna know to treat that differently than if you feed it, like, a single vector or 2 vectors of, say, x and y. It's a dispatching system, albeit very general, almost to its detriment in some people's eyes.
Then you have S4, which brings a lot of formality, a lot of guardrails around your object-oriented structures. But I can attest that it is not for the faint of heart. It is quite complex to get into the nuts and bolts of. And when I was doing Bioconductor stuff back in the early part of my career, I got to know S4 almost unwillingly well because of that. But it never felt natural to me. And again, that's just my opinion. There are others that use S4 with great success. More power to you. S7 is trying to be kind of in between, of sorts, giving some of the simplicity of the syntax of S3 with some of the guardrails and, you know, safety net and more, you know, formal definitions that S4 has.
So in this update, what's new here? There are a few, you know, minor, I would say, bug fixes, but also building blocks for bigger features being put in place. Some of which include being able to support lazy property defaults, which they're saying makes the actual setup of a class much more flexible. One other item that caught my eye was that they made speed improvements for when you set and get properties using the at sign or the at sign with the assignment operator. Apparently, there were some bottlenecks with that in previous versions that they've fixed now.
They've also expanded the compatibility with additional S3 classes to help make that transition a little more optimal for those coming from the S3 side of things. And then also being able to convert a class into a subclass with a modified version of the convert function. Lots more in the release notes, but the post also talks about how you actually use this thing. So there's a great kind of example where they use this new class, they call it range, to help look at kind of the range between numbers, I believe.
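As a rough sketch of S7's building blocks, loosely in the spirit of the post's range example (the details here are simplified, not the blog's exact code):

```r
# A simplified S7 sketch: a class with validated properties, a generic,
# and a method; not the tidyverse blog's exact code.
library(S7)

range2 <- new_class("range2",
  properties = list(
    start = class_numeric,
    end   = class_numeric
  ),
  # Validators return a string describing the problem, or NULL if valid
  validator = function(self) {
    if (self@end < self@start) "@end must be greater than or equal to @start"
  }
)

# A generic dispatching on its first argument, plus a method for range2
inside <- new_generic("inside", "x")
method(inside, range2) <- function(x, y) {
  y >= x@start & y <= x@end
}

r <- range2(start = 1, end = 10)
r@start                 # properties are read (and set) with @
inside(r, c(0, 5, 11))
#> [1] FALSE  TRUE FALSE
```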
And you can kinda see how the class methods, the class properties, the generics are defined with this. And you'll see there is a lot of shared syntax, or paradigms of shared syntax, with S3, but yet you're able to define things more formally with the S4 kind of language inside as well. They realize there are some limitations here. It's not quite production-ready, I would say, for getting into the actual R language itself, which is the end goal here. But they are actively working on it. And like I said, there is a huge goal here for this: not just to be a standalone package for the foreseeable future, but to actually get into base R. That's as opposed to things like R6, the object-oriented class system used by Shiny and many other packages, which is always gonna kinda stay as is, because that's a rapidly evolving class system, often with its own needs compared to what S3, S4, and now S7 are bringing. So I'll be obviously watching this space quite closely. I have not used S7 yet, but I know some packages are starting to use it now. So I'll be very curious what the developers' shared learning is as authors start to use this more formally. So great to see updates in this space, and I guess we'll stay tuned to see what else is out there. Yeah. It's interesting, Eric. You know, a lot to
[00:41:07] Mike Thomas:
digest here. You know, I think that the idea, as noted in the blog post, is that hopefully S7 eventually becomes part of base R. It is going to be, you know, an additional learning curve for some folks, although, hopefully, if you've been doing some object-oriented programming in S3 and/or S4, some of the syntax and the concepts will be fairly familiar and fairly easy to migrate to. This looks like a project that has now fallen under the R Consortium, which is cool. You can check out the GitHub there to take a look at the project itself. And there's 2 limitations that they want to point out. The first is that S7 objects can be serialized with saveRDS, but the way that it's currently authored saves the entire class specification with each object, and that may change in the future. And then the second is that support for implicit S3 classes of array or matrix is still in development. So some things to watch out for for the hardcore object-oriented programming developers out there. But I'm excited to see version 0.2.0 drop, and this looks definitely a little more digestible to me than S4.
So I'm excited to learn a little bit more about S7 and hopefully incorporate it into our projects going forward.
[00:42:36] Eric Nantz:
Yeah. And I know I've seen in the community our friend Jon Harmon's put S7 through the paces on some of his exploration efforts, and I've seen some others, you know, I'm sure, learning on it. And doing a quick check on the CRAN page, there are, as of now, 4 packages that are importing S7. So there are a few to choose from, and, admittedly, there is one called Monad. Remember monads from our Shiny, oh gosh, learnings from Joe Chang? So I want to check that one out. But, yeah, nonetheless, it does seem to be moving along, and I'll be watching this space quite closely and seeing where that fits in my adventures, both in Shiny and also in generic package development.
But you could have lots of adventures on the R side of things in data science, and the rest of the R Weekly issue will give you, I'm sure, lots of directions to go down, different adventures, different rabbit holes, and really ways to supercharge your data science exploits. And we'll take a couple minutes to talk about our additional finds here. And fellow curator, good friend of ours, Jonathan Carroll, has released on CRAN a very cool R package that he's had in development for a bit of time called nifty. And what nifty is is an R wrapper around a completely open-source, self-hostable notification service, also called nifty, that you could spin up on, say, a cloud VPS or on your internal network, and be able to, from R, push out a push notification using this package to go wherever it needs to go.
So let's imagine you're running that big old simulation. You're away from your computer while you let the HPC, man, do its magic. What if you want that notification on your mobile device to say it's done? Right? Nifty might be a way to do that. So I may have to take a look at that. I've seen other packages in this space, like RPushbullet, I believe, from Dirk Eddelbuettel, doing a similar thing with the Pushbullet service. But it's great to see R used in novel ways too. So congrats to Jonathan for getting nifty on CRAN. And I saw on Mastodon there were already a few very excited users for what they can do with that package. So that'll be in my things to look at during my holiday break.
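For flavor, the underlying ntfy service itself is just an HTTP POST of the message body to a topic URL; this sketch uses httr2 directly rather than the package's own API (which isn't shown in the post), and the topic name is made up.

```r
# Publishing a notification via the ntfy service: a plain HTTP POST of
# the message body to https://<server>/<topic>. Topic name is invented.
library(httr2)

request("https://ntfy.sh/my-simulation-topic") |>
  req_body_raw("Simulation finished!", type = "text/plain") |>
  req_perform()
```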
[00:44:59] Mike Thomas:
And, Mike, what did you find? Well, one thing I wanna shout out is a package called surveydown. I'm not sure if we've talked about this on the highlights before or not. It had a new release out there, and it's a pretty cool open-source way for making surveys with R, Quarto, Shiny, and a technology called Supabase, which looks like how the back-end data is stored. It's some type of database. And I have a lot of use cases, potentially, where I need to make small forms, things like that, surveys, and I always sort of tend to wanna go overkill and develop a Shiny app instead of using something off the shelf like, I don't know, a Microsoft product or SurveyMonkey or things like that, just because I like doing those things to myself. Right? Making my life more difficult.
So I would check out this package if you, like me, have a need to create a form or a survey and wanna do it open source and leverage some legwork that's already been done by a great team.
[00:46:03] Eric Nantz:
Yeah. I've seen this come through, but I haven't dived into it much. But boy oh boy, that would be terrific for, you know, wherever you have, like, surveys you wanna conduct in your organization or some other robust data collection. Great to take advantage of the R ecosystem in that space. And, before we get gentle comments, it turns out I pronounced that package completely wrong. I went to the GitHub page. It's actually pronounced like notify; it's spelled n t f y. So sorry, Jonathan. I should have looked at that before I started saying it. Pronouncing things is hard, so correction noted. Yes. It is. Yep. But, luckily, you don't need to correct anything else with R Weekly itself. We strive to give you authentic content. You don't have to worry about some AI-generated bots populating that feed for you. This is all human-generated.
We definitely wanna take advantage of automation in certain pieces of it, but, no, that's the value of this project: completely a human element and, you know, written by the community for all of you in the community. And since it is a community effort, we rely on your help. One of the best ways to help is to share those great resources you found online. Whether it's a new package, a new blog post, or a new tutorial, we're game for all of it. You can send us a pull request, all written in markdown, using that banner link in the top right corner of rweekly.org. You'll be taken directly to the GitHub pull request template.
We show you kind of the things we're expecting. It's very minimal, but we always value your contributions there. We also value hearing from you in the audience as well. We got a little contact page in the episode show notes. We love hearing from you and what you've learned from R Weekly. You can also find us on the social medias as well. Apparently, we have a new source for Mike that he'll talk about shortly. But for me, it is still the tried and true Mastodon account at @[email protected], as well as LinkedIn.
Search my name, you'll find me there. And, at very much a minimum now, the Weapon X thing, @theRcast. But maybe I need to pay attention to another one, Mike. What about you?
[00:48:13] Mike Thomas:
Yeah. I guess the latest for me, which I hope to check a little bit more often than I did Mastodon, and it feels like I'm pretty excited about it, is gonna be Bluesky. You can find me at mike-thomas.bsky.social. Otherwise, you can check me out on LinkedIn if you search Ketchbrook Analytics, k e t c h b r o o k. You can figure out what I'm up to.
[00:48:42] Eric Nantz:
Looks like I need to update my markdown template in the show notes, buddy. Sorry to do that to you. No. No. That's easy. That's easy. All markdown all the time for me. So all good here. Well, with that, we will put a bow on this episode of R Weekly Highlights, and I admit I was remarking to Mike before the show, we're at 185 now. That means we're running close to that 200 mark eventually, I should say. And shout out to our good friends Ellis Hughes and Patrick Ward. They're on a similar journey. It looks like they're at 180-some episodes of TidyX. So, how about a friendly wager on who gets there first? Hashtag just saying.
[00:49:19] Mike Thomas:
I don't know if I wanna make that bet.
[00:49:21] Eric Nantz:
No. I don't either. Holidays are coming up, so that'll put a wrench in things. But, nonetheless, we hope you enjoyed this episode of R Weekly Highlights, and we will be back with another episode. Maybe next week, maybe not. We'll see, soon. Don't know how to close these out. Alright. We're done.
Hello, friends. We're back with episode a 185 of the R Weekly Highlights podcast. This is the weekly podcast where we talk about the excellent resources that are shared in the highlights section and elsewhere in this week's current our weekly issue. My name is Eric Nantz, and I'm delighted you join us wherever you are around the world. And he is back. He couldn't stay away forever, but my awesome cohost, Mike Thomas, has graced us with his presence again. Mike, how are you doing? Not that you've had any, like, free time or anything. I'm doing well, Eric. Calling in from a new location,
[00:00:33] Mike Thomas:
full 2 miles away from my old location. Settling in pretty well so far, and I have also migrated locations virtually. And as of last night, I am on Blue Sky.
[00:00:50] Eric Nantz:
Oh, you are? Okay. I'm feeling the the peer pressure now if you're there. It sounds like it's the place to be. Definitely taken an awful lot in the in the data science sector. Now, of course, I do still have a a very, vested interest in the mess, the the fediverse as well, which is just kind of a part of, but kind of it's a little dicey folks. And I'm still trying to sort out just like anyone else. But, yeah, you may see me on there in the future. We, actually did spin up a new account for the R pharma team on Blue Sky that we just put out there before the conference that took place a couple weeks ago, which I'll have a lot more to say about that once we get through the editing of all the recordings. But, nonetheless, that was a major event. But we're here to talk on our weekly, of course. And this week, our issue was curated by Ryo Nakagorora with, as always, tremendous help from our fellow, our weekly team members, and contributors like all of you around the world with your poll request and other suggestions.
And my goodness, Mike, we're gonna lead off with one that really is a both mind blowing, you know, showcase of what's possible with our end portal and just really inspiring as well to see just how far you can take these dynamically generated reports. And this has been authored by a a good friend of ours, from the highlights, Andrew Heist, who speaking of free time, I don't know how he has a free time to knock this stuff out. My goodness. I need whatever he's eating. And Andrew Heist blog, yeah, is always gonna be super detailed,
[00:02:24] Mike Thomas:
probably pretty long, and again, blow me away with how much he was able to accomplish.
[00:02:32] Eric Nantz:
Yep. And we'll dive into those details now because this post is talking about how he approached generating and rendering dynamically computed content written in markdown, but programmatically with quarto, which is something I've had a little bit of diving into with the r markdown days with various snippets, especially of reusable kind of sections in a report. But Andrew just blows my stuff out of the water, what he accomplishes here. So let's let's set the stage here because there's a big picture here. So, yeah, there was a little thing called the election that happened a a weekend or so ago. And apparently, Andrew was helping out in the state of Idaho with assembling the election results and then surfacing them into a really fancy quartile website, which love to see that quartile in the real world, so to speak in a not so, not so boring situation to say the least. Lots of eyes on that on that side of the news.
And apparently, Andrew has put this in a very sophisticated ETL pipeline, which is merging ETL concepts with, wait for it, targets to help process the data from different sources, whether they're online or from another storage and then creating a tidy, I guess, data store, he calls it, that everything else can be built upon. And then another pipeline is gonna take that tidy set of results and then actually generate the Cortl report and website programmatically. He teased that he's gonna share more about that, but I'm saying, Andrew, not that you have infinite time. I'd love to see a deep dive into those target pipelines because I'm always eager to see just the directions you take with that. But diving into the rest of the poster, he talks about kind of the, the meat and potatoes of what the website was surfacing, which was taking advantage of. It's really neat features in the quarto UI for HTML, where you can have these tabbed, you know, tab, panels much like you do in a shiny app and in quartile as well, which gave both a tabular visualization and an interactive map of the election results across districts in Idaho.
So you could kind of choose which which display you like. And then he shows kind of in general, you might have in the portal dashboard set up these different sections with code chunks to actually generate that table or that map. And sure, for maybe 1 or 2 races, so to speak in that particular state's election, you could, you know, you could just dive into that. But what if you have a 100 or more of these? Yeah. You don't wanna copy paste that stuff left and right. So he looked at how can we generate these, these table and mapping code chunks and the output from that more dynamically, not just the code itself, but where it's actually being placed in the Quartal website, IE, the Quartal syntax that's gonna be built and marked down for that.
So he first knocks out probably an elephant in the room for those that have done this before is that in code chunks, you can use the results parameter and call it as is to basically not escape that content and the basically take it as it is so that if there's an HTML report and you're spinning out markdown or HTML, it's just gonna surface that out right away when you compile the website or the report. But it doesn't always work the way you want. And he shows an example of like the, a bullet list of like the, gap minder country data where it just doesn't quite work when you need to have additional work in these chunks.
And he shows that in an example where he's trying to round certain numbers, just hypothetically the the number pi, it's only spitting out, like, the code that makes the result in the bullet list, not the actual result of that computation. So it didn't render the inline chunks of these items in the bullet list. And, of course, that's a that's a nonstarter if you're gonna do this more dynamically. So the trick that this is all built upon is to actually pre render these in line chunks before they actually appear in the document. So this is a little different. Right? This is not dynamically rendering the content, what I call just in time of the website compilation.
He is saying, let me re precompute that first and then show it into the actual quartile content. In this case, now you get the rounded versions of the pie number in that markdown text. Took me a little bit to grasp this, but I kind of see where he's going with this. And, of course, this is a trivial example. But back to the election website he was creating, he wanted to do a similar idea, but now with those tab sets of the visuals of the table and the map. So the concept does generalize. First, he's using the Gapminder data to kind of show a tidy data set that would be a scatter plot of the GDP per capital and then life expectancy.
Stuff you've seen many times before in presentations about Gapminder. And then he shows, okay, what if I want in my portal dashboard, the panel tab set, and then within each of these continents splitting up 1 by 1, echoing that particular continents, result from that tidy dataset and tidy visualization. Again, you could copy that over and over from icon in 1234 5. But again, going back to the election result, what if you have, you know, 100 of these or, you know, thousands of these potentially if you're doing a lot of, like, biomarker data or something like that? So, again, he's going to use a hybrid of the glue package and other tricks to precompute these panels dynamically. He's got a handy utility function called build panel where he's feeding in what's gonna be the chunk label, the output that is in string format, the markdown syntax as injecting that set of like the panel title, the index of the plot.
So that then that's going to render dynamically in the report itself. He does verify it works first by running the function and he gets the markdown of the heading and the continent name and then the coach on as if he typed it himself. So mission accomplished on that front. And then he's able to basically loop through this at one point with a combination of the knitter functions, knit function, which again, quarto with the r execution engine is built upon knitters. So knitter, like, we've we've spoken for years on this show, Mike, about the praise for knitter and what the doors had opened. People need to realize if they don't already, that quarto itself is building upon these types of execution engines. And without knitter, there is no quarto. I'll just be hot take 101 with Eric on that. I don't think it's such a hot take. But No. I would agree with that. It is
[00:10:03] Mike Thomas:
almost, impossible to not take for granted everything that Knitter has done for us. As much as you appreciate it, you should appreciate it even more.
[00:10:17] Eric Nantz:
Absolutely. Absolutely. So, again, just this knit function alone, you may take it for granted. That's when you're in the r markdown Jaysus hitting that, you know, compile report button in the IDE. But this is the engine behind all that. So he shows a great great, great example of using that in action. And then sure enough, he's got a nice little tab panel of the different continents and the scatter plots. And that really saves you a boatload of time there. And then he's got another example later on where you could use this in more of a teaching aspect. Let's say you wanna show the different stages of building up an effective visualization.
And he does this with what he calls the evolution of a ggplot visualization, where you can have these plots saved as objects, say, from the first stage of it to the last stage of it. You can make a tidy, you know, dataset with the actual plot object that is in a list column that has, like, the actual object itself, the text around it, and a description. And guess what? You can use the same logic to create a tab panel going from stage 1 all the way to the last stage and just grabbing these different plot objects from this list in that data frame. My goodness. That's a great way to help teach your students. Like, you may start of initial plot that looks, you know, very utilitarian and all the way to last step. You got a nice theme with it. You got the nice color choices, the background looking much sharper, changing default labels.
So Quarto is a teaching tool. I mean, it's already getting very popular in the education sector, but, man, this is really top-notch stuff here. And you can take this even further. Going back to the election results, you've got all this content kind of stitched together. Quarto does have the concept of child reports inside an overall website or report or what have you. So he takes this into a bunch of R chunks, again dynamically generated. And you can do this in Quarto too. Again, built on knitr, you can have child documents folded into an overall report.
But, again, he's showing that this report text is all just markdown in the end. Right? The key is just being able to loop through it. Maybe you have a tidy data frame overarching all of this, and then you use the knit() function to paste it all together. And that's where he shows, in this generated output, instead of separate reports or separate files, he has different sections of this continents report where the user can quickly go through the different continents, figure out what countries are involved, looking at the details in one table, looking at the plot in another.
You put a TOC with that, and you've got yourself a really intricate, dynamically driven report or dashboard or whatever you have. But all of this is powered by rendering that markdown content ahead of time and then injecting it into the overall Quarto website. I literally, just yesterday, did something very similar to this. Albeit I took the manual approach, and I regretted it. But now, knowing what Andrew's done here, I can have a function that generates a snippet, in this case iframes of one Quarto dashboard inside another Quarto revealjs slide deck.
I could just have a tidy data frame with the links to these iframes of the Quarto dashboards, loop through that, get the markdown chunks, put those into my main revealjs document, deploy that on Posit Connect, and I'm off to the races. I have eliminated the need for PowerPoint for these one-pagers of results that I had to assemble. So, Andrew, you blew my mind yet again. Credit to you for sharing your knowledge here. And, yeah, the possibilities seem endless with dynamic generation of Quarto content. So mission accomplished, buddy.
[00:14:51] Mike Thomas:
Yeah. This is fantastic. If you were to try to create some sort of a Quarto or R Markdown document that displayed these 100-odd districts in Idaho, I think that's what it was tracking. I mean, you could do it by hand if you wanted to, but it would take you forever and a whole lot of copy and paste, and God forbid you wanna change one little aesthetic that you wanna propagate everywhere else. You're stuck doing a find and replace-all. And I think we've discussed in the past the benefits of functional programming, and they apply very similarly to Quarto. And one of my favorite things about Quarto, and it's interesting to me because I just came off a project where we did a lot of work exactly like this, so I can't praise this type of workflow enough. I think, Eric, you probably have this experience with knitr as well, where if you run some R process within knitr, it'll run a little bit slower than if you ran it outside of knitr.
So what that means is that you'll save time in your rendering if you have precomputed and pre-created any of those R objects ahead of time instead of asking for them to be created during the knit process. You extend that one rung further and we start to get into targets. Right? That's sort of the exact goal of targets: to preprocess things in your pipeline and only re-execute things that need to be updated. So it's a fantastic complement to the workflow Andrew has put together here. And one other thing that I do wanna highlight is the use of child documents. I'm not sure if Andrew called it out in the blog post, and we don't necessarily have full insight into the end of his whole process yet. Right?
In terms of this election Quarto report that he's put together, whether he leveraged child documents or not. When you take a look at some of the code chunks inline in the blog post, there are some pretty large glue-based code chunks that he's putting together that we can execute as a function. At the end of the day, he talks about how some of the code in his current workflow might be condensed if you took advantage of child documents as well. And what that means in a Quarto sense is using this special shortcode that starts with two arrows pointing to the left and ends with two arrows pointing to the right and, no pun intended, includes the include verb: `{{< include _file.qmd >}}`.
That allows you to reference another file, another .qmd file, that gets inserted into, say, a main .qmd file. And if you're familiar with child documents in R Markdown, it's the same type of concept, just a different syntax. Those are things that we leverage heavily, because if you're putting together a large report or a large document like the one Andrew is building, with a lot of moving pieces, it always makes sense to try to manage those pieces as separately as possible, especially if you're working on a collaborative team. Right? Somebody can focus on one piece and another person can be dedicated to another piece. And I think it allows you to piece together your final product in a way that's easier to manage and maintain than if you were trying to do it as some sort of monolith.
So I can't say enough about child documents within Quarto. The syntax makes it really easy, so if you're not familiar with how to leverage child documents, just search on quarto.org and the markdown syntax to do it will be right there for you. I can't praise targets enough as well, and sort of bringing these technologies together. I would love to see how he leverages that remote storage with targets and that local DuckDB database as well, to bring this whole entire solution together.
But top to bottom, I think it's a fantastic resource for anyone who is building a large-scale Quarto document, or looking to get a better feel for best practices around authoring Quarto reports. I think the tips in here will be invaluable for you.
[00:19:35] Eric Nantz:
Yeah. And I do have it on good authority from the author himself that there are some big enhancements coming to targets with respect to potential DuckDB integration and even better performance. Like, targets already performs great, but what Will has in store for us, oh, you all are gonna love it, especially those of you with 10,000-plus branches in your pipelines. So, yeah, stay tuned, folks. It's getting better. But this whole thing has so many nuggets to choose from that I've gotta look at it even more.
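(Since targets keeps coming up, here's a bare-bones _targets.R sketch of the precompute-then-render idea; the target names and data are illustrative assumptions, not from Andrew's pipeline.)

```r
# _targets.R — precompute report objects so the Quarto render step
# only assembles results instead of recomputing them.
library(targets)
tar_option_set(packages = c("ggplot2", "gapminder"))

list(
  tar_target(gap_data, gapminder::gapminder),
  tar_target(
    gap_plot,
    ggplot(gap_data, aes(gdpPercap, lifeExp)) + geom_point()
  )
)

# In the .qmd you would then call targets::tar_read(gap_plot)
# rather than rebuilding the plot at render time.
```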
But we literally, at our day job, have a couple of teams that are not satisfied with the SharePoint world, man. They want to build dynamically data-driven websites of these reports that can be shared broadly across the organization and take advantage of the interactivity Quarto offers, whether it's through things like Quarto dashboards, which I've become a big fan of, or the Observable JS side of it, which some of our people are looking into. But you're not gonna get this with SharePoint, folks. That's my other hot take for the podcast. If this can be used in a high-profile situation like tracking election results, good grief, it can be used almost anywhere.
[00:21:00] Mike Thomas:
Not fully satisfied with SharePoint? Are you sure you have that right?
[00:21:05] Eric Nantz:
I may have to double check my references on that.
[00:21:08] Mike Thomas:
A little satire for the audience.
[00:21:10] Eric Nantz:
Yeah. Hopefully they got that. But, yeah, we've had some internal debates on that one too. And like I said, I've used principles of this, albeit not so elegantly, where I was able to get away from creating a rather haphazard PowerPoint slide by using a Quarto dashboard instead. And because of the integrations we can have with Quarto websites and iframes going as backgrounds in a presentation deck, I was able to create what that team really wanted, which was basically an HTML-based slide deck of a bunch of Quarto dashboards without having to put the dashboards in the slide deck itself, just as background iframes. Folks, there are so many mind-blowing things we can do with this. I hope I can give a talk about that later, because I've learned some tricks, man, but what Andrew's done here is tricks on another level. I'm looking forward to that too.
So as I said, it was about a week and a half ago that there was a little thing called the election, and maybe somebody needed a little pick-me-up after that, whichever side of the fence you're on. Maybe having a little bite to eat if you had a long night. Well, who knew that this next highlight was gonna showcase a show that I admittedly have not seen before. I've heard about it. This is coming from Steven Ponce, and he has put together a blog post, albeit mostly a notebook, I would say, looking at the fingerprints of each of the seasons of the show Bob's Burgers. And admittedly, I have not seen this show, so it's way out of my wheelhouse. But the meat of this post is an infograph looking across the different seasons, and I had to zoom in on my fancy 4K monitor to look at this more carefully.
It's looking at the transcripts, the dialogue, and at, say, the length of the sentences, the unique words, the variance in sentiment across the transcripts, a little text mining action here. How many questions there were, how many exclamations there were. And it's a pretty neat infograph: faceted by season, you get what I believe they call spider plots or radial charts, I forget the exact name. But it's a good way to put multiple dimensions in a circular fashion without being confined to the infamous pie chart limitation. So a pretty neat visual. And again, the big picture is looking at the patterns in dialogue across the 14 seasons of this show, which, I'm an old-timer, I haven't seen yet. So maybe it has to go in my queue of shows to watch when I actually get a free moment. Nonetheless, the notebook style is that he's got the different steps in building this visualization, much like the tidyverse flowchart you would often see in R for Data Science and whatnot.
Loading the packages, of course. And this is a little interesting here. We don't see as much of this lately, but Steven is using the pacman package to orchestrate packages, which some have had great success with. I admit my attempts with pacman were mixed at best. But, hey, if it works, it works. So he's got the snippet to load the various packages. And yes, Mike and I were remarking before the show, there is a package called bobsburgersR that's got the datasets being used in this. There's a package for everything, isn't there?
[00:25:13] Mike Thomas:
Literally, at this point. Yes. It contains the transcripts, it looks like, for every episode across all the seasons that are available.
[00:25:22] Eric Nantz:
Absolutely. So I'll have to look at that in my spare time. He's also loading additional packages to help do the text analysis, the tidyverse, of course, and patchwork, which we've spoken very highly about for being able to compose multiple ggplot objects together in any way you see fit, basically. And then, also, he's using the camcorder package, which you heard about at posit::conf as well, to record the different plots as PNGs as you go, which will come into play later. So, first, the data, which, thanks to the bobsburgersR package, is right there in the transcript data frame, so that part's done right off the bat. He does a little exploration of it. Although we don't see the result in the output, there is a handy function from the skimr package called skim(), which gives you a terminal-based glance at a dataset or data frame.
Really handy, especially if you're in a terminal environment. Then comes the tidying. So we've got a lot of dplyr group-by summarizations. And notice that in his syntax for the summarize() function, he's taking advantage of the .groups argument, which came with dplyr 1.0, I believe; they later introduced the .by parameter, in dplyr 1.1, for key functions like mutate() and summarize(). So you don't always have to do the old dplyr group_by(), summarize(), ungroup() afterwards, because Davis Vaughan and others from the tidyverse team were saying that got annoying for a lot of users. So that's a nice little trick
[00:27:14] Mike Thomas:
that Steven shows here. Couldn't agree more. Yes. I love the .by argument.
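(A quick illustration of the per-operation grouping being described; this is standard dplyr 1.1+ usage.)

```r
library(dplyr)

# .by does per-operation grouping: no group_by()/ungroup() needed
mtcars |>
  summarise(mean_mpg = mean(mpg), .by = cyl)

# The equivalent older pattern, using .groups to drop the grouping
mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), .groups = "drop")
```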
[00:27:20] Eric Nantz:
Yep. I literally just started using that for a high-priority project, and it's like I can never go back to group_by() anymore if I can avoid it. So And the .keep argument within mutate(), so you don't have to use transmute() anymore. That's right. Yeah. I need to explore that one too. Another great quality-of-life enhancement, I should say. Next comes the visualization. So he does a lot of setup up front to get the labels all in order, as well as a little bit of CSS that I believe is gonna be applied to these plots, and then using glue to dynamically put in various things, and then managing the fonts.
Lots of fonts are added from Google with, I believe, the fonts package, if I'm not mistaken. I have to double-check that.
[00:28:13] Mike Thomas:
I thought it was called Google Fonts, but I'm not seeing
[00:28:19] Eric Nantz:
I'm not seeing where he got that from.
[00:28:23] Mike Thomas:
I'll take a look. Keep going.
[00:28:25] Eric Nantz:
Yeah. We'll keep going. We're learning here, guys. So, nonetheless, he's then able to assemble the theme object in ggplot2, which is a great way to define that up front with theme_set() and then theme_update(). That way he can use that theme anywhere he goes from that point on, for the rest of the visualizations. And then comes the main plot, where the nugget is using geom_polygon() to get that nice little polygon superimposed inside this circular-type display. Again, people call that a spider plot or a radial plot, something to that effect. And then adding average lines on that. But, again, flipping that to polar coordinates towards the end.
And then defining the labels and the facets by season. And then adding on top of that kind of a pattern-type visualization, which, again, you wanna look at the post to get the meat of. But there's a nice little pattern, I think, that's kind of serving as the background, so to speak, on the plot itself. Really neat to play with. I haven't done this myself before, so really intricate annotations that he makes here. And then afterwards, he saves all this as PNGs, or at least one PNG, I should say, for the whole plot, having assembled it with patchwork beforehand to combine everything together.
And then, to draw that and clean things up, he uses the magick package to create neat little thumbnails of the visualization that he used, I believe, in the post itself. So a lot of nice visualization tricks here if you want to up your game with ggplot2. There are some real nuggets to share. And then, like any good data science citizen, he's got the nice session info at the end, and a link to the GitHub repository so you can actually see how this is composed in action. So, great notebook setup here. Love the way you can collapse the different code chunks and just get to what you're interested in. But, yeah, with a nice little tidy dataset of Bob's Burgers transcripts, you've got yourself a nice little visualization to, you know, satisfy your appetite.
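(A bare-bones spider/radar-style sketch using the geom_polygon() plus coord_polar() trick Eric describes, with made-up metrics rather than Steven's data.)

```r
library(ggplot2)

# Made-up per-season dialogue metrics, scaled to 0-1
df <- data.frame(
  metric = c("dialogue length", "unique words", "sentiment variance",
             "questions", "exclamations"),
  value  = c(0.6, 0.8, 0.4, 0.7, 0.5)
)

# group = 1 joins all metrics into one closed polygon; coord_polar()
# then bends it into the circular "fingerprint" shape
ggplot(df, aes(metric, value, group = 1)) +
  geom_polygon(fill = "purple", alpha = 0.3, colour = "purple") +
  coord_polar() +
  ylim(0, 1) +
  theme_minimal()
```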
[00:30:48] Mike Thomas:
Puns galore. Mike, what did you think about the visualizations here? Well, the final output at the top of the blog is a beautiful infographic. It's really nicely done. I like the contrast between the background, which is just sort of off-white a little bit, and the purple gradient that the spider plot values are represented by. Really, really cool. Turns out it's the showtext package that allows you to manage Google Fonts. font_add_google(), I think, is the particular function within the showtext package that allows you to import certain Google fonts and leverage them in your ggplot graphics.
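(The showtext workflow Mike is referring to looks roughly like this.)

```r
library(showtext)
library(ggplot2)

font_add_google("Roboto Condensed", family = "roboto")  # fetch the font
showtext_auto()  # route subsequent text rendering through showtext

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Custom Google font") +
  theme_minimal(base_family = "roboto")
```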
One thing that I don't do well enough, or understand well enough, is adjusting, I guess, the graphics device itself. You know, arguments like dpi, which I think is dots per inch. Is that what that means? Yes. Correct. And the units. I don't do that enough, unfortunately. Usually, in my Quarto documents, I'm just specifying fig-height, fig-width, things like that, and messing around with it until it looks halfway decent. That's stuff I need to learn a little more about, but Steven has a few different places within this notebook where he's setting those specific configurations. So I've definitely learned a lot there, and you will as well if that's something you struggle with like me.
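(In ggsave() terms, the device settings Mike mentions look like this; width, height, units, and dpi together determine the physical size and resolution of the output.)

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# dpi = dots per inch; an 8 x 6 inch canvas at 300 dpi -> 2400 x 1800 px
ggsave("plot.png", p, width = 8, height = 6, units = "in", dpi = 300)
```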
One other cool thing, and here's a little today-I-learned that I can't believe, I don't even know if I want to admit that I'm just learning this today, but there's a ggplot2 function called theme_update(). Eric, I'm sure you knew about this one.
[00:32:35] Eric Nantz:
I knew of it sporadically. I never actually used it. So I love theme_minimal(),
[00:32:40] Mike Thomas:
but obviously, occasionally, there are some additional things that I wanna do on top of theme_minimal() that aren't contained within the arguments of the theme_minimal() function. And typically I'll just add a theme() call after that. And I think ggplot2 knows well enough to use sort of the last value for a particular theme argument and set that as what's gonna be shown in the plot. But theme_update() sounds like it probably does a better job of that, as opposed to sort of overriding what you had written before it in your theme_minimal() call. So this was a new one for me. A little embarrassed to say this is the first time I'm coming across it, but it is one that I am for sure going to be using from now on in many, many places.
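(The distinction being drawn: theme_set() establishes a base theme once, and theme_update() then modifies individual elements of that global theme instead of layering a theme() override onto every plot.)

```r
library(ggplot2)

theme_set(theme_minimal())          # base theme for all plots
theme_update(
  legend.position = "bottom",
  plot.title      = element_text(face = "bold")
)

# Every subsequent ggplot now inherits these settings automatically
ggplot(mtcars, aes(wt, mpg)) + geom_point() + labs(title = "Inherited theme")
```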
Just an excellent blog post top to bottom. Love the code, love the layout here, and the end deliverable is absolutely beautiful. So take a look for yourself.
[00:33:47] Eric Nantz:
Yeah. First, you're too hard on yourself, I should say. There are so many things in ggplot2 that I only ever scratch the surface of. But let's put things in perspective, folks, as I take a quick look at the archive of ggplot2. Did you know that ggplot2's first CRAN release was all the way back in 2007? So it's got a lot going on. We're coming up on 20 years of that thing; we're over 15 now. So it's not surprising that there are things in there we didn't expect to see. But that's why we're having this post from Steven. It's a great reminder of its capabilities. So, like you, I'm gonna take note of that theme_update() function. Lots of attention to detail here. I absolutely love seeing how the sausage is made. And, yeah, the font_add_google() function, I did a little digging while you were talking, is from the showtext package.
I don't use that a lot in my daily work, but I'll definitely take a look at it for my next ggplot2 visualization. But, nonetheless, really great design choices. This is a great showcase of the principles I've seen outlined in various workshops, such as those from Cédric Scherer and others, about the best ways to build up an effective infograph that, by the way, you don't need to go to Adobe Illustrator for. You don't need to go to some proprietary product; ggplot2, with a little getting your hands dirty, so to speak, gets you really the whole way there. It's just a wonderful plot here. Absolutely wonderful.
[00:35:27] Mike Thomas:
2007, you say, for ggplot2. Well, it's incredible to think how much that package has evolved. And you know what else has evolved in the R ecosystem? Object-oriented
[00:35:40] Eric Nantz:
programming. You got it. You got it. And, you know, as an R user, you've probably used it many, many times, sometimes without even realizing it, because of the elegance of the language itself. So what we're teasing here is that since about a year or so ago, there has been a new effort, sanctioned by the R Consortium no less, to eventually build a new object-oriented paradigm into R itself. And right now, it is a new package called S7. The post comes to us from the tidyverse blog, written by Tomasz Kalinowski and Hadley Wickham himself, on the new updates in S7 version 0.2.0.
And for the uninitiated wondering, wait, why does S7 even exist? Well, in R, historically, since practically the very beginning, there have been at least two or three class systems in the language. One of which is S3, which is leveraged heavily by the tidyverse packages and a lot of base R functions to give you that very easy way to, say, create a visualization with the plot() function. If you feed it a data frame, it's gonna know to treat that differently than if you feed it, say, a single vector, or two vectors of x and y. It's a dispatching system, albeit very general, almost to its detriment in some people's eyes.
Then you have S4, which brings a lot of formality, a lot of guardrails around your object-oriented structures. But I can attest that it is not for the faint of heart. It is quite complex to get into the nuts and bolts of. And when I was doing Bioconductor stuff back in the early part of my career, I got to know S4 almost unwillingly well because of that. But it never felt natural to me. And again, that's just my opinion. There are others that use S4 to great success. More power to you. S7 is trying to be kind of in between the two, giving some of the simplicity of the syntax of S3 with some of the guardrails, safety net, and more formal definitions that S4 has.
So what's new in this update? There are a few minor bug fixes, I would say, but also building blocks for bigger features to come. Some of which include support for lazy property defaults, which they say makes the actual setup of a class much more flexible. One other item that caught my eye was the enhanced speed when you set and get properties using the @ operator or @ with the assignment operator. Apparently, there were some bottlenecks with that in previous versions that they've now fixed.
They've also expanded compatibility with additional S3 classes to make the transition a little smoother for those coming from the S3 side of things, and added the ability to convert a class into a subclass with a modified version of the convert() function. Lots more is in the release notes, but the post also talks about how you actually use this thing. There's a great example where they use a new class they call Range to look at the range between numbers, I believe.
And you can kinda see how the class methods, the class properties, and the generics are defined with this. And you'll see there is a lot of shared syntax, or shared paradigms of syntax, with S3, but yet you're able to define things more formally, with the S4 kind of language inside as well. They acknowledge there are some limitations here. It's not quite production-ready, I would say, for getting into the actual R language itself, which is the end goal here, but they are actively working on it. Like I said, there is a huge goal for this: not just to be a standalone package for the foreseeable future, but to actually get into base R. That's as opposed to things like R6, the object-oriented class system used by Shiny and many other packages, which is always gonna kinda stay as is, because that's a rapidly evolving class system, often with its own needs compared to what S3, S4, and now S7 are bringing. So I'll obviously be watching this space quite closely. I have not used S7 yet, but I know some packages are starting to use it now. So I'll be very curious what the shared developer learning is as authors start to use this more formally. So, great to see updates in this space, and I guess we'll stay tuned to see what else is out there. Yeah. It's interesting, Eric. You know, a lot to
[00:41:07] Mike Thomas:
digest here. You know, I think the idea, as noted in the blog post, is that hopefully S7 eventually becomes part of base R. It is going to be an additional learning curve for some folks, although, hopefully, if you've been doing some object-oriented programming in S3 and/or S4, some of the syntax and the concepts will be fairly familiar and fairly easy to migrate to. This looks like a project that has now fallen under the R Consortium, which is cool. You can check out the GitHub to take a look at the project itself. And there are two limitations they want to point out. The first is that S7 objects can be serialized with saveRDS(), but the way it's currently authored saves the entire class specification with each object, and that may change in the future. And the second is that support for the implicit S3 classes array and matrix is still in development. So some things to watch out for for the hardcore object-oriented programming developers out there. But I'm excited to see version 0.2.0 drop, and this looks definitely a little more digestible to me than S4.
So I'm excited to learn a little bit more about S7 and hopefully incorporate it into our projects going forward.
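(A minimal S7 class, loosely following the Range example mentioned above; treat this as illustrative and check the S7 0.2.0 docs for the exact API.)

```r
library(S7)

Range <- new_class("Range",
  properties = list(
    start = class_numeric,
    end   = class_numeric
  ),
  validator = function(self) {
    if (self@end < self@start) "@end must be greater than or equal to @start"
  }
)

r <- Range(start = 1, end = 10)
r@start       # properties are read with @
r@end <- 20   # ...and set with @<- (the validator runs on assignment)
```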
[00:42:36] Eric Nantz:
Yeah. And I know I've seen in the community, our friend Jon Harmon has put S7 through the paces in some of his exploration efforts, and I'm sure others are learning it too. And doing a quick check on the CRAN page, there are, as of now, four packages importing S7. So there are a few to choose from, and, admittedly, there is one called monad. Remember monads from our Shiny learnings from Joe Cheng? Oh, gosh. So I want to check that one out. But, nonetheless, it does seem to be moving along, and I'll be watching this space quite closely and seeing where it fits in my adventures, both in Shiny and in generic package development.
But you can have lots of adventures on the R side of data science, and the rest of this week's R Weekly issue will give you, I'm sure, lots of directions to go down, different adventures, different rabbit holes, and ways to supercharge your data science exploits. We'll take a couple of minutes to talk about our additional finds here. Fellow curator and good friend of ours, Jonathan Carroll, has released on CRAN a very cool R package that he's had in development for a bit of time, called nifty. And what nifty is is an R wrapper around a completely open-source, self-hostable notification service, also called nifty, that you could spin up on, say, a cloud VPS or on your internal network, and be able to, from R, push out a notification using this package to wherever it needs to go.
So let's imagine you're running that big old simulation, and you're away from your computer while you let the HPC do its magic. What if you want a notification on your mobile device to say it's done? Right? This package might be a way to do that. So I may have to take a look at it. I've seen other packages in this space, like RPushbullet, I believe, from Dirk Eddelbuettel, doing a similar thing with the Pushbullet service. But it's great to see R used in novel ways too. So congrats to Jonathan for getting this one on CRAN. And I saw on Mastodon there were already a few very excited users for what they can do with the package. So that'll be in my things to look at during my holiday break.
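(A hypothetical sketch of the notify-when-done workflow Eric describes; ntfy_send() and its arguments are based on a quick read of the package README, so double-check them against the actual docs.)

```r
library(ntfy)

Sys.sleep(2)  # stand-in for your long-running simulation

# Assumed arguments: a message plus a topic configured on your
# (possibly self-hosted) ntfy server
ntfy_send(
  message = "Simulation finished!",
  topic   = "my_r_jobs"
)
```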
[00:44:59] Mike Thomas:
And, Mike, what did you find? Well, one thing I wanna shout out is a package called surveydown. I'm not sure if we've talked about this on the highlights before or not. It had a new release out there, and it's a pretty cool open-source way of making surveys with R, Quarto, Shiny, and a technology called Supabase, which looks like where the back-end data is stored; it's a Postgres-based database platform, I think. And I have a lot of use cases, potentially, where I need to make small forms, things like that, surveys, and I always tend to wanna go overkill and develop a Shiny app instead of using something off the shelf like, I don't know, a Microsoft product or SurveyMonkey or things like that, just because I like doing those things to myself. Right? Making my life more difficult.
So I would check out this package if you, like me, have a need to create a form or a survey and wanna do it open source and leverage some legwork that's already been done by a great team.
[00:46:03] Eric Nantz:
Yeah. I've seen this come through, but I haven't dived into it much. But boy oh boy, that would be terrific wherever you have surveys you wanna conduct in your organization, or some other robust data collection; great to take advantage of the R ecosystem in that space. And, before we get any gentle corrections, it turns out I pronounced that earlier package completely wrong. I went to the GitHub page, and it's actually pronounced closer to notify: it's n t f y. So sorry, Jonathan, I should have looked at that before I started saying it. Pronouncing things is hard, so correction noted. Yes. It is. Yep. But, luckily, you don't need to correct anything else with R Weekly itself. We strive to give you authentic content. You don't have to worry about some AI-generated bots populating the feed for you. This is all human generated.
We definitely wanna take advantage of automation in certain pieces of it, but that's the value of this project: a completely human element, written by the community for all of you in the community. And since it is a community effort, we rely on your help. One of the best ways to help is to share those great resources you found online. Whether it's a new package, a new blog post, or a new tutorial, we're game for all of it. You can send us a pull request, all written in markdown, using the top-right banner link in the corner of rweekly.org. You'll be taken directly to the GitHub pull request template.
We show you the things we're expecting. It's very minimal, but we always value your contributions. We also value hearing from you in the audience as well. We've got a little contact page in the episode show notes. We love hearing from you and what you've learned from R Weekly. You can also find us on the social medias as well. Apparently, we have a new spot for Mike that he'll talk about shortly. But for me, it is still the tried and true Mastodon account, @[email protected], as well as LinkedIn.
Search my name, you'll find me there. And I'm very much minimal now on the Weapon X thing at theRcast. But maybe I need to pay attention to another one, Mike. What about you?
[00:48:13] Mike Thomas:
Yeah. I guess the latest for me, which I hope to check a little more often than I did Mastodon, and I'm pretty excited about it, is gonna be Bluesky. You can find me at mike-thomas.bsky.social. Otherwise, you can check me out on LinkedIn if you search Ketchbrook Analytics, k e t c h b r o o k. You can figure out what I'm up to.
[00:48:42] Eric Nantz:
Looks like I need to update my markdown template in the show notes, buddy. Sorry to do that to you. No. No. That's easy. That's easy. All markdown all the time for me. So, all good here. Well, with that, we will put a bow on this episode of R Weekly Highlights. And I admit, I was remarking to Mike before the show, we're at 185 now. That means we're running close to that 200 mark eventually. And shout out to our good friends Ellis Hughes and Patrick Ward; they're on a similar journey. It looks like they're at 180-some episodes of TidyX. So, how about a friendly wager on who gets there first? Hashtag just saying.
[00:49:19] Mike Thomas:
I don't know if I wanna make that bet.
[00:49:21] Eric Nantz:
No. I don't either. Holidays are coming up, so that'll put a wrench in things. But, nonetheless, we hope you enjoyed this episode of R Weekly Highlights, and we will be back with another episode. Maybe next week, maybe not. We'll see you soon. Don't know how to close these out. Alright. We're done.