The R-Weekly Highlights podcast has crossed another milestone with episode 150! In this episode we cover a terrific collection of development nuggets of wisdom revealed in a recent package review livestream, and how a lesser-known Git feature can facilitate investigations of multiple package versions.
Episode Links
- This week's curator: Batool Almarzouq - @batool664 (Twitter)
- Notes from live code review of {soils}
- Load different R package versions at once with git worktree
- Entire issue available at rweekly.org/2024-W05
Supplement Resources
- How to embed videos with GitHub markdown: https://youtu.be/G3Cytlicv8Y
- Reproducible Manuscripts with Quarto: https://youtu.be/BoiW9UWDLY0
Supporting the show
- Use the contact page at https://rweekly.fireside.fm/contact to send us your feedback
- R-Weekly Highlights on Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @theRcast (Twitter) and @[email protected] (Mastodon)
- Mike Thomas: @mike_ketchbrook (Twitter) and @[email protected] (Mastodon)
Music credits powered by OCRemix
- Gerudo Desert Party - The Legend of Zelda: Ocarina of Time - Reuben6 - https://ocremix.org/remix/OCR03720
[00:00:03]
Eric Nantz:
Hello, friends. We are back at episode 150 of the R-Weekly Highlights podcast. I knew we were gonna get to an awesome number, and we finally did. We're happy to have you join us from wherever you are around the world where we talk about the latest and greatest highlights that we have seen in this current week's R-Weekly issue. My name is Eric Nantz, and I'm delighted that you joined us today. And as always, I have my awesome cohost who never stops the hustle, Mike Thomas. Mike, how are you doing today?
[00:00:27] Mike Thomas:
I'm doing well, Eric. I am going on-site to a client for the first time in a long time. So, I'm showered, dressed, you know, all before 9 o'clock which is
[00:00:38] Eric Nantz:
occasionally unusual. I'm not gonna fully admit to that but, yeah. Looking forward to that today and looking forward to a quick highlights here. That's right. Yep. You got yourself ready for the old business professional look. I'm doing that tomorrow for an on-site thing. So we got our week of on-site stuff. But guess what? The power of virtual means we can do this from our comfortable homes for this episode. And this episode is not possible without, of course, R-Weekly itself. And this week's issue was curated by Batool Almarzouq who, of course, had great help from our fellow R-Weekly team members and contributors like you all around the world.
Now, Mike, you know that in our little post show last week when we were just getting our files sorted out, I had kind of lamented the fact that I was a little jealous of a certain individual that just did a really fun screencast of a package review. Well, guess what? That is our first highlight here. I'm referring to a live package review that was conducted by Nick Tierney, a very well established member of the R community who has contributed to rOpenSci quite a bit in his tenure. Yes. And he did a terrific package review of JD Ryan's soils package, which is a very ambitious yet very powerful package trying to help surface some very innovative data and innovative workflows for her team. To Nick's credit, he was very practical and very upfront with some of his process of evaluating a package, which, again, draws a lot from his rOpenSci roots.
And at a high level, a few of the things that he illustrates here that I definitely need to take note of: using, from the goodpractice package, a function called gp() to literally automate the more standard types of checks that they would do in rOpenSci whenever a package is onboarded, which, again, everybody can benefit from because it's not like rOpenSci has some esoteric requirements. These are all great practices for software development, especially in the space of R package development. Also, extensive use of the cli package. We've been singing the praises of cli quite a bit, and Nick had some nice, you know, targeted comments on making that even more seamless to give a more friendly looking message for various notes, bullet points, or even error messages that can occur in this workflow.
And then, also, the usethis package comes into play yet again. Right? We use this quite a bit in package development, and getting the basics of package documentation lined up with the use_package_doc() function is a terrific way to get that package level documentation up and running quickly. Furthermore, in JD's blog post that we're linking to in the highlights here, she says she's rewatched this a couple of times, and her blog post literally goes through the recommendations that Nick had and the ways that she is now improving the soils package, even things like the logic in the functions for directory and file path checking. These are all things that we sometimes take for granted.
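As a rough sketch, those two helpers are typically invoked from the package's root directory; assuming R is installed along with the goodpractice and usethis packages, the workflow might look like:

```shell
# Sketch only: assumes R plus the goodpractice and usethis packages are
# installed, and that the current directory is the package root.
if command -v Rscript >/dev/null 2>&1; then
  # Automated rOpenSci-style checks across docs, tests, and code style
  Rscript -e 'goodpractice::gp()'
  # Scaffold package-level documentation (creates R/<pkg>-package.R)
  Rscript -e 'usethis::use_package_doc()'
else
  echo "Rscript not found; install R first"
fi
```

Both calls are interactive-friendly, so running them from the R console inside the package project works just as well.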
But what was also interesting is that Nick had in the live chat other really well established members of the community, such as Miles McBain, also giving his 2¢ on some of the operations that the package was doing and high level looks at how things are organized. So, yeah, one of the things I took away, Mike, is, yeah, I need to invest in this goodpractice package a lot more because that's gonna help me up my game with documentation as well. And, also, some of the nice pointers that they have here in the package documentation itself with the markdown files and the snapshotting of tests and various practices for committing these with descriptive messages. So, again, worth a watch for sure to see Nick in action because, to me, I love seeing the journey just as much as the destination, to borrow a phrase from some friends in the Linux podcasting ecosystem.
And Nick really showed the actual process of package review, which I think is extremely helpful no matter where you are in your organization or academic institution. There are a bunch of nuggets for you to learn from here.
[00:05:09] Mike Thomas:
Yeah. I'm super impressed with the soils package. I'm super impressed with the work that JD and her team at the Washington Department of Agriculture have done. It's really exciting for me to see, you know, government organizations, large organizations, you know, not only using R, but creating beautiful R packages, and pkgdown sites, and utilities for their team, and maybe for others to use as well, and doing some of this work out in the open for us to be able to take a look at it, learn from it, and potentially even contribute to it as well. You know, one of the really interesting things, and there are many really interesting things here, in my opinion: this package is not on CRAN. I don't know if they have the desire to put this package on CRAN, but it is on R-universe.
And we recently at Ketchbrook authored an open source package that, obviously, folks can install from GitHub. We haven't pushed it to CRAN or submitted it to CRAN yet either. But I would be interested in seeing, I guess, the process. And I should know this by now because we've covered R-universe enough on this podcast. But the process of getting a package onto R-universe that isn't on CRAN. I believe that there's some workflow that Jeroen has for R-universe to actually take a look at what is on CRAN and sort of copy that over. Right. Onto R-universe. But I didn't realize, I guess, what the workflow was for submitting a package to be on R-universe, but not necessarily on CRAN. And that's not speaking ill of CRAN, but I think there are just some particular packages, you know, in our case, that maybe aren't necessarily worth going through the entire CRAN workflow for.
This is a really, really cool idea and I learned a ton from this, you know. Take a look at the YouTube video. Take a look at how Nick walks through this package, and Miles and Adam walked through this package, and the different things that they call out in terms of things that she did well, things that she didn't do well. One, I don't know if you have the GitHub repository open, Eric, at all. One thing that's blowing my mind a little bit that I can't figure out is, in the README, which, you know, extends to the pkgdown site, it has a bunch of videos in it that are video demos on how to create a soils project and render a Word or an HTML report. Right.
And if you look in the README at how these videos are sort of embedded, there's a link to the same GitHub repo and a folder called assets. Mhmm. And the folder called assets, I can't find on the repository anywhere. Wow. And it's also not gitignored. That's interesting. So I'm curious as to how, like, maybe at build time when you're building the README, she was able to embed these files with a link to this assets subfolder. But unless I'm going crazy, I can't find it. So that's really interesting because one of the things that's called out in the blog post is the package size is very large, which would be an issue if you're submitting to CRAN, but not necessarily an issue otherwise. And, you know, when I'm thinking about packages that are large, I'm obviously thinking, you know, what sorts of files could be within that package to make it large. And then I looked at the README right away and I saw, oh, we have a bunch of video demos, that could potentially be it. But I can't find the files in the repository anywhere. So I'm very perplexed, to say the least.
And then, maybe, the last thing that I'll call out here is, you know, her team went all the way down the path of being able to use the RStudio IDE to create a new soils RStudio project. The same way that we would create a new golem package. Right? Or create a new R package through the RStudio IDE. And the little hex sticker from soils is on the RStudio IDE right there for creating a Quarto soil health report. It's incredible. Obviously, this package that they've created is going to make other folks in their organization's lives a ton easier to just get their projects up and running sort of immediately instead of having to start from scratch. So if you are someone working in an organization where you find yourself doing the same types of projects over and over, an R package could be a huge benefit to you, and a great place to start for a template would be this soils package. It's phenomenal.
[00:09:53] Eric Nantz:
Yeah. There's a lot to unpack here on what JD's done. And as you're talking about the videos and the README, in this blog post, and I'll put the direct link to this in the show notes, she does say that there is a video tutorial from GitHub directly on how to pull this off. And it does have to do with GitHub flavored markdown. So she must have done some magic with GitHub itself. So we'll link to that directly because, my goodness, if my package is on GitHub, I definitely want to take advantage of this and make it easier for people to see some of the workflow in action for some of the packages I have in mind in the Shiny space in the future. So lots of great points, Mike. I think knowing that many of the analysts that she's working with and, frankly, the ones I work with are using Posit's RStudio as their front end to this, using that new project feature and getting things ready right away is just so helpful for them. I mentioned I'm on kind of a crusade at the day job to help make some of these initial clinical projects easier for people, and this project feature is going to be something we look at quite closely here. Definitely.
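On the embedding mystery: as I understand it (an assumption worth verifying against GitHub's own documentation, linked in the supplement resources), dragging a video into GitHub's web editor uploads the file to GitHub's servers and inserts a bare URL, which GitHub-flavored markdown then renders as an inline player. The file never gets committed to the repository, which would explain the phantom assets folder. A hypothetical README fragment (the owner, repo, and asset identifiers below are made-up placeholders):

```markdown
<!-- Pasting the bare URL of a video uploaded through GitHub's editor
     renders it as an inline player; the file lives on GitHub's servers,
     not in the repository itself. All identifiers here are placeholders. -->
https://github.com/OWNER/REPO/assets/0000000/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
```

This would also square with the package-size observation: the repo itself stays small because the videos aren't in it.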
And, yeah, you and I, Mike, we've been living the life of creating internal or company packages for our various clients or stakeholders. And a lot of times, things move. Sometimes things move pretty fast. And sometimes we might need to check just what happened maybe a version behind or 2 versions behind. But then you're kind of wondering, how do I handle that? Do I have to do a whole separate R installation on a virtual machine that has, like, an old version installed? Well, we have some good news for you, folks, if you are leveraging Git for your version control, whether you're putting it on GitHub or not, just using Git for version control of some sort.
Maëlle Salmon is back on the highlights once again. Definitely a repeatable pattern here, in a great way, because she has discovered in her continued, I would say, journey of leveling up her Git knowledge that there is a way to kind of have your cake and eat it too: loading different R package versions kind of at once without a lot of fuss, using a very, I would say, niche feature in Git called git worktree. I admit I have not seen this at all, and I've been using Git for over, what, 10 years. I did not know about this feature at all. So let's break it down for you real quick. Yeah. Yeah. We're both learning something here, Mike. So I think what you and I are familiar with is the concept of branching. Where in branching, you could say, I'm on my main branch, but I know I'm gonna work on this new feature or new bug fix. But I don't wanna commit that to main yet until I get through this fix. So I'll do a new branch to work through that, iterate, and then push that up and do a code review or whatnot to merge that into main.
I knew about that, but git worktree is a little different. And in fact, it's more comparable, not so much to branching, but to the idea of git stash, where you just wanna put things aside in your working area of Git for a bit, maybe fix something else real quick, and then bring that back forward when you're ready. Well, with git worktree, as Maëlle's post illustrates, you can create a new folder somewhere on your computer and then have that folder be linked to that same Git repository of that package but to a different state of that package, maybe based on a commit, maybe based on another branch, maybe based on a previous release, which means that you could use that additional area that's separated from your main working area to look at, say, a previous package version.
And she gives an example of a package called igraph: putting the tag after that, making a directory for that, and then using git worktree to check that tag out into that other folder. And then you can remove that, clean up after yourself, so to speak, when you're done with that investigation of the previous version, using git worktree remove and then that folder name. And then it's as if nothing happened. I'm still wrapping my head a little bit around this because I've never used git worktree before.
But there are plenty of times at the day job where maybe I've already gone, like, 1 version, 2 versions ahead of what I need to finish. But then I'll get a request from, like, an analyst or a client or a customer in my various departments, and they have a question where, admittedly, for some reason they're using an older version of the package. So now I can use git worktree to investigate that really quickly without having to, you know, do some clever library magic along the way. So I'm still wrapping my head around this, but as we always do, we'll have a link to Maëlle's blog post here. But if you need to quickly check what you did a version or 2 behind, git worktree seems like the way to go.
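To make the workflow concrete, here is a self-contained sketch in a throwaway repo (the package name, tag, and paths are made up for illustration; Maëlle's post applies the same worktree commands to a real package checkout):

```shell
# Self-contained demo of the git worktree workflow in a throwaway repo.
# All names, tags, and paths are made up for illustration.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q pkg
cd pkg
git config user.email "[email protected]"
git config user.name "Demo"
echo "Version: 1.0.0" > DESCRIPTION
git add DESCRIPTION
git commit -qm "release 1.0.0"
git tag v1.0.0
echo "Version: 2.0.0" > DESCRIPTION
git commit -qam "start 2.0.0 development"

# Check the v1.0.0 tag out into a sibling folder, leaving the main
# working directory untouched:
git worktree add ../pkg-v1.0.0 v1.0.0

cat ../pkg-v1.0.0/DESCRIPTION   # the old release, side by side
cat DESCRIPTION                 # the current development state

# Clean up after yourself when the investigation is done:
git worktree remove ../pkg-v1.0.0
```

Because both folders point at the same repository, you can, for example, run R in each one and load each package state without maintaining two clones.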
[00:15:11] Mike Thomas:
This is somewhat mind blowing to me. I think it's a great utility that I now know about with Git, and it's fantastic. I think it's very simply explained by Maëlle: you create just an additional directory, essentially an additional subdirectory the way that she set it up, which contains the specific version of the package that you want to work in temporarily. So, you know, I think you covered it excellently. This is a very nice, short and sweet blog post, but I again appreciate Maëlle pointing out these nifty little tricks and tools that we have. I had a similar use case, but not quite the same use case. I actually wanted to try out this package that we've developed on a different version of R, which I guess I could have done if I opened up a separate sort of IDE and had a local installation of an older version of R, because I wanted to make sure that the package worked on an older installation of R. But then we were taking a look at utilities like R-hub and things like that that allow you to test your package against multiple versions of R on multiple different platforms.
And that also sort of is where Docker, I think, can come into play and be your friend, to allow you to spin up a container that contains a particular version of R without having to necessarily install it on your local machine and then worry about uninstalling it and things like that. But that can be a little more tricky. But I can absolutely see plenty of use cases where, you know, instead of changing the version of R, I would wanna actually change the version of a particular R package and take a look at, you know, how that package was functioning in that version compared to a previous version. The scales package is one that was giving me some headaches lately. There are some new arguments in the label_number() or label_percent() functions that deal with positive and negative values that were newly introduced and giving me some headaches recently. So this is a use case where I think I might have to spin up this git worktree feature today and dive back into that. But excellent, excellent blog post again by Maëlle Salmon. Just, again, pulling out of her bag of tricks something that I didn't really know existed that is absolutely gonna be helpful for me in the future.
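For the separate use case of testing against a different R version, one common approach, not from the post itself, is the Rocker project's version-tagged images; this sketch assumes Docker is installed and that the version tag shown is just an example:

```shell
# Hypothetical: spin up a disposable container pinned to a specific R
# version using the Rocker project's r-ver images (assumes Docker is
# installed; the version tag below is only an example).
docker run --rm -it rocker/r-ver:4.2.3 R
# Inside the container you can install your package and test it against
# that R version, then exit; nothing is installed on the host machine.
```

The `--rm` flag discards the container on exit, which avoids the uninstall-and-cleanup worry mentioned above.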
[00:17:35] Eric Nantz:
Yes. And I believe, even though he's halfway across the world potentially, I might hear Bruno's voice in my ears saying, you could probably combine Maëlle's use case of different package versions with your use case of different R versions. I bet there's a way these all tie together. So, Bruno, I heard you even if you weren't saying that. I can hear you telepathically. So this would fit really nicely in this, and I'm excited to maybe try out some of these newer ideas as I'm getting more in the weeds, especially this past month, on some internal package development and trying to make it easier for both future me and future collaborators as well. But if you ever thought you knew everything about Git, no, I'm one of those people that seems like I learn something new every week about Git. So it is just amazing what we're learning in this space. And speaking of amazing, the rest of the R-Weekly issue is just as amazing. You're gonna learn so much along the way if you read through the entire list of new blog posts, new packages, updated packages, and other tremendous resources.
So it'll take a couple of minutes for our additional finds here. And I admit sometimes I'll read an old, I hate to say old, but maybe a somewhat seasoned, you know, statistical research book, and you wonder what would happen if I just updated the code examples in that book to use a newer package framework, a newer paradigm. How does that compare and contrast? Well, Emil Hvitfeldt from Posit has done just that with the Introduction to Statistical Learning labs converted to using tidymodels. This is massive, if you ever wanna see a newer framework for machine learning related to a very critically renowned, well established piece of literature on getting into the nuts and bolts of predictive modeling and machine learning.
This online Quarto book of tidymodels labs has you covered. I've been watching this for a little bit, and it looks like it's had a ton of updates since I last looked at it, but it is a very direct one to one relationship with the labs that are mentioned in the second edition of ISLR, using tidymodels. So if you ever wondered how, like, classification, linear model selection, or support vector machines would look in the ISLR context but with tidymodels, this is the place to go. Highly recommended.
[00:20:12] Mike Thomas:
I was looking at that book and I cannot wait to fully check it out. The ISLR book is absolutely phenomenal, and tidymodels is absolutely phenomenal as well. That's our suite of packages of choice here at Ketchbrook for when we're doing predictive modeling projects, and ISLR is sitting on my desk essentially at all times. So it's going to make my life even easier to have this resource that serves as the translation between those two things immediately, instead of having to do it ourselves. And I just want to point out that Quarto 1.4 has been released.
A big improvement here, I think, is around dashboards. You know, we've talked about it a lot, but Quarto dashboards are here, the new iteration of flexdashboard. So try it out yourself. The other thing that I'm super excited about, but I think is in the early stages, and I'll have to check out how stable it is, is this new format called Typst, t y p s t, if I'm pronouncing that correctly, I'm not sure. It sounds like a much lighter weight alternative to Pandoc or LaTeX for rendering PDFs, really lightning fast. I think it's replacing Pandoc and LaTeX in that pipeline.
I haven't dug into it yet, but if there is something out there that can render PDF reports for us much faster and much more lightweight than what the current options are, I am super interested. So we'll see how that goes.
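For reference, targeting Typst in Quarto 1.4 appears to be a one-line format switch in the document's front matter; this is a minimal sketch based on the Quarto 1.4 release notes, so check the official Quarto documentation for the full set of Typst options:

```yaml
---
title: "Quarterly update"
format: typst
---
```

Rendering the document then produces a PDF via Typst instead of the LaTeX toolchain.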
[00:21:47] Eric Nantz:
I'm very interested as well, and I believe there was a talk at posit::conf about the Typst support coming for Quarto. So if I'm able to find that, I'll put it in the show notes as well. But I do have a use case at the day job where maybe you wanna make a PDF not just of the statistical results of, like, a model fit; we might even use this for, like, an internal newsletter or an internal update and send that out, because in any corporation, sometimes email is the only way to get ahold of people, and this will be a great way to have attractive kind of branding, if you will, on some of the things we do. But Typst can make that a lot easier.
And, certainly, we hope that R-Weekly itself makes your journeys in R and data science much easier. And, of course, we love hearing from you. The best ways to get ahold of us are on the contact page linked in this episode's show notes. You can also use a modern podcast app like Podverse or Fountain to send us a fun boost along the way, and details about that are in the show notes as well. And, also, we are sporadically on social media. I am @[email protected] on the Mastodon servers,
[00:22:57] Mike Thomas:
sporadically on the weapon X thing at @theRcast, and LinkedIn from time to time. And, Mike, where can listeners find you? Sure. On LinkedIn, you can search Ketchbrook Analytics, k e t c h b r o o k, and see what I'm up to there. Or you can find me on Mastodon @[email protected].
[00:23:16] Eric Nantz:
Awesome stuff as always. And, yeah, it's a nice tidy episode this week. But as always, every single week, we're trying to be back here with awesome R content for all of you. So that'll do it for us for episode 150. Only 50 more to go until the big 200. We'll see if we get there. And in any event, we hope you enjoy listening, and we'll see you back for another edition of R-Weekly Highlights next week.
Hello, friends. We are back at episode 150 of the R weekly highlights podcast. I knew we're gonna get to an awesome number, and we finally did. We're happy to have you join us from wherever you are around the world where we talk about the latest and greatest highlights that we have seen in this current week's our weekly issue. My name is Eric Nantz, and I'm delighted that you joined us today. And as always, I have my awesome cohost who never stops the hustle, Mike Thomas. Mike, how are you doing today?
[00:00:27] Mike Thomas:
I'm doing well, Eric. I am going on-site to a client for the first time in a long time. So, I'm showered, dressed, you know, all before 9 o'clock which is
[00:00:38] Eric Nantz:
occasionally unusual. I'm not gonna fully admit to that but, yeah. Looking forward to that today and looking forward to a a quick, highlights here. That's right. Yep. You got yourself ready for the the old business professional look. I'm doing that tomorrow for an on-site thing. So we got our our week of on-site stuff. But guess what? The power of virtual means we can do this from our comfortable homes for this episode. And this episode is not possible, of course, our weekly itself. And this week's issue was curated by Batool Almazak who, of course, had great help from our fellow Rwicky team members and contributors like you all around the world.
Now, Mike, you know that in our little post show last week when we were just getting our files sorted out, I had kind of lamented the fact that I was a little jealous of a certain individual that just did a really fun screen cast of a package review. Well, guess what? That is our first highlight here. I'm referring to a live package review that was conducted by Nick Tierney, very well established member of the R community and has cooperated our open side quite a bit in his tenure. Yes. And he did a terrific package review of JD Ryan's soils package, which is one of a very, ambitious, yet very powerful package trying to help surface up some very innovative data and innovative workflows for her team. To Nick's credit, he was very practical and very upfront with some of his process of evaluating a package, which, again, draws a lot from his rOpenSci roots.
And at a high level, a few of the things that he illustrates here that I definitely need to take note of is using from the good practice package a function called GB to literally automate the more standard types of checks that they would do in rOpenSci whenever a package is on board, which, again, everybody can benefit from because it's not like rOpenSci has some esoteric requirements. These are all great practices for software development and especially in the space of our package development. Also, extensive use of the COI package. We've been singing the praises of COI quite a bit, and Nick had some nice, you know, targeted comments on making that even more seamless to give a more friendly looking message for various notes, bullet points, or even error messages that can occur in this workflow.
And then, also, the use this package comes into play yet again. Right? We use this quite a bit in package development and getting kind of the basics of package documentation lined up with the usepackagedoc function is a terrific way to get that package level documentation up and running quickly. Throughout it, very much, in JD's blog post that we're linking to in the highlights here, she says she's rewatched this a couple of times, and her blog post is literally going through the recommendations that Nick had and the ways that she is now improving the soils package, even things like logic and the functions for directory and file paths checking. These are all things that we sometimes take for granted.
But what was also interesting is that Nick had in the live chat other really well established members of the community, such as Miles McBain, also giving his 2¢ on some of the operations that the package was doing and high level looks at how things are completely organized. So, yeah, some of the things I took away, Mike, is, yeah, I need to invest in this good practices package a lot more because that's gonna help me up my game with documentation as well. And, also, some of the nice pointers that they have here in the package documentation itself with the markdown files and the snapshotting of tests and various practices for committing these with descriptive messages. So, again, worth a watch for sure to see Nick in action because to me, I love seeing the journey just as much as the destination to rip off some friends in the Linux podcast and ecosystem.
And Nick really showed the the actual process of package review, which I think is extremely helpful from no matter where you are in your organization or academic institution. There there are a bunch of nuggets for you to learn from here.
[00:05:09] Mike Thomas:
Yeah. I'm super impressed with the soils package. I'm super impressed with the work that, JD and her team at Washington Department of Agriculture have done. It's really exciting for me to see, you know, government organizations, large organizations, you know, not only using R, but like creating beautiful R packages, and package down sites, and utilities for their team, and may maybe for for others to use as well and doing some of this work out in the open for us to be able to take a look at it, learn from it and potentially even contribute to it as well. You know, one of the the really interesting things, there are many really interesting things here, in my opinion. So this package is not on Quran. I don't know if they have the, the desire to to put this package on Quran, but it is on our universe.
And we recently, at Catchbook authored an open source package that, obviously, folks can install from from GitHub. We haven't pushed it to Kran or submitted it to Kran yet either. But I would be interested in seeing, I guess, the the process. And I should know this by now because we've covered our universe enough on this podcast. But the process of getting a package onto our universe that isn't on crayon. I believe that there's some workflow that Yaron has for our universe to actually take a look at what is on crayon and sort of copy that over Right. Onto our universe. But I didn't realize, I guess, what the workflow was for submitting a package to to be on our universe, but not necessarily on crayon. And that's not not speaking ill of crayon, but I think there's just some particular packages, you know, in our case that, maybe, aren't necessarily worth going through, the entire crayon workflow for.
This is a really really really cool idea and I learned a ton from this, you know. Take a look at the YouTube video. Take a look at how Nick walks through this package, and Miles and Adam, walked through this package and the different things that they call out in terms of things that, she did well, things that she didn't do well. 1, I don't know if you have the GitHub repository open, Eric, at all. One thing that's like blowing my mind a little bit that I can't figure out is so in the read me, which, you know, extends to the package down site, it has a bunch of videos in it that are video demos on how to create a soils project, render Right. A word or an HTML report.
And if you look in the read me on how these videos are sort of embedded, there's a link to the same GitHub repo and a folder called assets. Mhmm. And the folder called assets, I can't find on the repository anywhere. Wow. And it's also not get ignored. That's interesting. So I'm I'm curious as to how, like, maybe at build time when you're you're building the read me, she was able to embed these these files with a link to this this assets subfolder. But unless I'm going crazy, I can't find it. So that's really cool because one of the things that's that's called out in the blog post is the package size is very large, which would be an issue if you're submitting to CRAN, but not necessarily an issue otherwise. And and, you know, when I'm I'm thinking about packages that are large, I'm I'm obviously thinking, you know, what sort of types of files could be within that package to to make it large. And then, I I looked at the read me right away and I saw, oh, we have a bunch of video demos that must that could potentially be it. But I can't find them in the read me anywhere. So I'm very, perplexed to say the least.
And then maybe the last thing that I'll call out here is that her team went all the way down the path of being able to use the RStudio IDE to create a new soils RStudio project, the same way that we would create a new {golem} package, right, or create a new R package through the RStudio IDE. And the little hex sticker from soils is right there in the RStudio IDE for creating a Quarto soil health report. It's incredible. Obviously, this package that they've created is going to make other folks in their organization's lives a ton easier, to just get their projects up and running sort of immediately instead of having to start from scratch. So if you are someone working in an organization where you find yourself doing the same types of projects over and over, an R package could be a huge benefit to you, and a great place to start for a template would be this soils package. It's phenomenal.
[00:09:53] Eric Nantz:
Yeah, there's a lot to unpack here on what Jadey's done. And as you were talking about the videos in the README: in this blog post, and I'll put the direct link to it in the show notes, she does say that there is a video tutorial from GitHub directly on how to pull this off, and it does have to do with GitHub-flavored markdown. So she must have done some magic with GitHub itself. We'll link to that directly, because my goodness, if my package is on GitHub, I definitely want to take advantage of this and make it easier for people to see some of the workflow in action for some of the packages I have in mind in the Shiny space in the future. So lots of great points, Mike. I think knowing that many of the analysts she's working with, and frankly the ones I work with, are using Posit or RStudio tools as their front end to this, using that new project feature and getting things ready right away is just so helpful for them. I mentioned I'm on kind of a crusade at the day job to help make some of these initial clinical projects easier for people, and this project feature is going to be something we look at quite closely here. Definitely.
And, yeah, you and I, Mike, we've been living the life of creating internal or company packages for our various clients or stakeholders. And a lot of times, things move pretty fast. And sometimes we might need to check just what happened maybe a version behind, or 2 versions behind. But then you're kind of wondering, how do I handle that? Do I have to do a whole separate R installation on a virtual machine that has, like, an old version installed? Well, we have some good news for you, folks, if you are leveraging Git for your version control, whether you're putting it on GitHub or not, but just using Git for version control of some sort.
Maëlle Salmon is back on the highlights once again, definitely a repeatable pattern here in a great way, because she has discovered in her continued journey of leveling up her Git knowledge that there is a way to kind of have your cake and eat it too: loading different R package versions kind of at once, without a lot of fuss, using a very, I would say, niche feature in Git called git worktree. I admit I have not seen this at all, and I've been using Git for over, what, 10 years? I did not know about this feature at all. So let's break it down for you real quick. Yeah, we're both learning something here, Mike. So I think what you and I are familiar with is the concept of branching, where you could say, I'm on my main branch, but I know I'm gonna work on this new feature or new bug fix, and I don't wanna commit that to main yet until I get through this fix. So I'll do a new branch to work through that, iterate, and then push that up and do a code review or whatnot to merge that into main.
I knew about that, but git worktree is a little different. In fact, it's more comparable, not so much to branching, but to the idea of git stash, where you just wanna put things aside in your working area of Git for a bit, maybe fix something else real quick, and then bring that back forward when you're ready. Well, with git worktree, as Maëlle's post illustrates, you can create a new folder somewhere on your computer and then have that folder be linked to that same Git repository of that package, but to a different state of that package: maybe based on a commit, maybe based on another branch, maybe based on a previous release. Which means that you could use that additional area, separated from your main working area, to look at, say, a previous package version.
And she gives an example with another package, igraph: putting the tag after the command, making a directory for it, and using git worktree add to check that tag out into that other folder. And then you can clean up after yourself, so to speak, when you're done with that investigation of the previous version, by using git worktree remove and then that folder name. And then it's as if nothing happened. I'm still wrapping my head a little bit around this because I've never used git worktree before.
But there are plenty of times at the day job where maybe I've already gone, like, 1 version or 2 versions ahead on what I need to finish. But then I'll get a request from, like, an analyst or a client or a customer in my various departments, and they have a question where, admittedly, for some reason they're using an older version of the package. So now I can use git worktree to investigate that really quickly without having to, you know, do some clever library magic along the way. So I'm still wrapping my head around this, but as we always do, we'll have a link to Maëlle's blog post here. But if you need to quickly check what you did a version or 2 behind, git worktree seems like the way to go.
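As a minimal sketch of the workflow described above (the package name, tag, and folder names here are toy examples, not taken from Maëlle's post; the snippet builds a throwaway repo so the worktree commands are runnable end to end):

```shell
# Build a throwaway Git repo with a tagged release so the
# worktree commands below have something to check out.
tmp=$(mktemp -d)
cd "$tmp"
git init -q mypkg
cd mypkg
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "release 1.0.0"
git tag v1.0.0
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "development toward 2.0.0"

# Check the old release out into a sibling folder, leaving the
# main working copy (still on the development commit) untouched:
git worktree add ../mypkg-v1.0.0 v1.0.0

# ...inspect or load the old version from ../mypkg-v1.0.0...
git worktree list    # shows both the main tree and the new one

# Clean up when the investigation is done:
git worktree remove ../mypkg-v1.0.0
```

After the `remove`, the sibling folder is gone and the repository is back to exactly one working tree, as if nothing happened.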
[00:15:11] Mike Thomas:
This is somewhat mind blowing to me. I think it's a great utility that I now know about with Git, and it's fantastic. I think it's very simply explained by Maëlle: you create an additional directory, essentially a sibling directory the way she set it up, which contains the specific version of the package that you want to work in temporarily. So, you know, I think you covered it excellently. This is a very nice, short and sweet blog post, but I again appreciate Maëlle pointing out these nifty little tricks and tools that we have. I had a similar use case, but not quite the same one. I actually wanted to try out this package that we've developed on a different version of R, which I guess I could have done by opening up a separate IDE and having a local installation of an older version of R, because I wanted to make sure that the package worked on an older installation of R. But then we were taking a look at some of the utilities like R-hub and things like that, which allow you to test your package against multiple versions of R on multiple different platforms.
And that's also sort of where Docker, I think, can come into play and be your friend, allowing you to spin up a container that contains a particular version of R without having to install it on your local machine and then worry about uninstalling it and things like that. But that can be a little more tricky. I can absolutely see plenty of use cases where, instead of changing the version of R, I would wanna actually change the version of a particular R package and take a look at how that package was functioning in that version compared to a previous version. The scales package is one that was giving me some headaches lately. There are some new arguments in `label_number()` or `label_percent()` that deal with positive and negative values, which were newly introduced and giving me some headaches recently. So this is a use case where I think I might have to spin up this git worktree feature today and dive back into that. But an excellent blog post again by Maëlle Salmon, just again pulling out of a bag of tricks that I didn't really know existed, which is absolutely gonna be helpful for me in the future.
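As a rough sketch of the Docker idea mentioned above: the Rocker project publishes version-pinned R images, so you can run an older R without touching your local installation (the version tag here is illustrative; this demo just prints the container's R version, and skips gracefully if Docker isn't installed):

```shell
# Spin up a throwaway container pinned to an older R release and
# confirm the R version inside it, without touching the local setup.
if command -v docker >/dev/null 2>&1; then
    docker run --rm rocker/r-ver:4.1.3 R --version
else
    echo "Docker not available; skipping demo"
fi
```

From there you could mount a package source directory into the container (with `-v`) and run `R CMD build` and `R CMD check` against that pinned R version.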
[00:17:35] Eric Nantz:
Yes. And I believe, even though he's potentially halfway across the world, I might hear Bruno's voice in my ear saying: you could probably combine Maëlle's use case of different package versions with your use case of different R versions with Nix. I bet there's a way these all tie together. So, Bruno, I heard you even if you weren't saying that; I can hear you telepathically. This would fit really nicely, and I'm excited to maybe try out some of these newer ideas as I'm getting more in the weeds, especially this past month, on some internal package development, trying to make things easier for both future me and future collaborators as well. But if you ever thought you knew everything about Git? No. I'm one of those people who seems to learn something new every week about Git. It is just amazing what we're learning in this space. And speaking of amazing, the rest of the R Weekly issue is just as amazing. You're gonna learn so much along the way if you read through the entire list of new blog posts, new packages, updated packages, and other tremendous resources.
So let's take a couple of minutes for our additional finds here. And I admit, sometimes I'll read an old, and I hate to say old, maybe a somewhat seasoned statistical research book, and wonder what would happen if I just updated the code examples in that book to use a newer package framework, a newer paradigm. How does that compare and contrast? Well, Emil Hvitfeldt from Posit has done just that, with the Introduction to Statistical Learning labs converted to using tidymodels. This is massive if you ever wanna see a newer framework for machine learning related back to a critically renowned, well-established piece of literature on the nuts and bolts of predictive modeling and machine learning.
This online Quarto book of tidymodels labs has you covered. I've been watching this for a little while, and it looks like it's had a ton of updates since I last looked at it, but it is a very direct one-to-one relationship with the labs mentioned in the second edition of ISLR, using tidymodels. So if you ever wondered how, say, classification, linear model selection, or support vector machines would look in the ISLR context but with tidymodels, this is the place to go. Highly recommended.
[00:20:12] Mike Thomas:
I was looking at that book and I cannot wait to fully check it out. The ISLR book is absolutely phenomenal, and tidymodels is absolutely phenomenal as well. That's our suite of packages of choice here at Ketchbrook when we're doing predictive modeling projects, and ISLR is sitting on my desk essentially at all times. So it's going to make my life even easier to have this resource that serves as the translation between those two things immediately, instead of having to do it ourselves. And I just want to point out that Quarto 1.4 has been released.
A big improvement here, I think, is around dashboards. You know, we've talked about it a lot, but Quarto dashboards are here, the new iteration of flexdashboard. So try it out yourself. The other thing that I'm super excited about, though I think it's in the early stages and I'll have to check out how stable it is, is this new manuscript project type using Typst, t y p s t, if I'm pronouncing that correctly, I'm not sure. It sounds like a much lighter weight alternative for rendering PDFs, really lightning fast. I think it's replacing Pandoc and LaTeX in that pipeline.
I haven't dug into it yet, but if there is something out there that can render PDF reports for us much faster and much more lightweight than the current options, I am super interested. So we'll see how that goes.
[00:21:47] Eric Nantz:
I'm very interested as well, and I believe there was a talk at posit::conf about the Typst support coming for Quarto. So if I'm able to find that, I'll put it in the show notes as well. But I do have a use case at the day job, where maybe you wanna make a PDF not just of statistical results like a model fit; we might even use this for, say, an internal newsletter or internal update and send that out, because in any corporation, sometimes email is the only way to get ahold of people, and this would be a great way to have attractive branding, if you will, on some of the things we do. Typst can make that a lot easier.
And certainly, we hope that R Weekly itself makes your journeys in R and data science much easier. And of course, we love hearing from you. The best ways to get a hold of us are on the contact page linked in this episode's show notes. You can also use a modern podcast app like Podverse or Fountain to send us a fun boost along the way, and details about that are in the show notes as well. And we are also sporadically on social media. I am @rpodcast on podcastindex.social on the Mastodon servers, sporadically on the Weapon X thing at @theRcast, and on LinkedIn from time to time. And, Mike, where can listeners find you?
[00:22:57] Mike Thomas:
Sure. On LinkedIn, you can search Ketchbrook Analytics, k e t c h b r o o k, and see what I'm up to there. Or you can find me on Mastodon at [email protected].
[00:23:16] Eric Nantz:
Awesome stuff as always. And, yeah, it's a nice, tidy episode this week. But as always, every single week, we're trying to be back here with awesome R content for all of you. So that'll do it for us for episode 150. Only 50 more to go until the big 200; we'll see if we get there. In any event, we hope you enjoyed listening, and we'll see you back for another edition of R Weekly Highlights next week.