Flipping a Hello World function on its head, assorted improvements landing in ggplot2 3.5.0, and why authoring beautiful code is so worth it.
Episode Links
- This week's curator: Jon Carroll - @carroll_jono (Twitter) & @[email protected] (Mastodon)
- HelloWorld("print")
- ggplot2 3.5.0
- Beautiful Code, Because We’re Worth It!
- Entire issue available at rweekly.org/2024-W09
Supplement Resources
- lazygit - Simple terminal UI for git commands https://github.com/jesseduffield/lazygit
- Advanced R - Expressions https://adv-r.hadley.nz/expressions.html
- Jenny Bryan's talk on code smells and feels https://github.com/jennybc/code-smells-and-feels
Supporting the show
- Use the contact page at https://rweekly.fireside.fm/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @theRcast (Twitter) and @[email protected] (Mastodon)
- Mike Thomas: @mike_ketchbrook (Twitter) and @[email protected] (Mastodon)
Music credits powered by OCRemix
- Everybody Wants to Rule the Wisps - Sonic Colors - The Good Ice - https://ocremix.org/remix/OCR04368
- You Are Not Confined - Final Fantasy IX - Sonicade - https://ocremix.org/remix/OCR01064
[00:00:03]
Eric Nantz:
Hello, friends. We're back with episode 154 of the R Weekly Highlights podcast. This is the weekly podcast where we talk about the latest and awesome resources that you can find every single week in the latest R Weekly issue. My name is Eric Nantz, and I'm so delighted you joined us today from wherever you are around the world.
[00:00:21] Mike Thomas:
And I never do this alone. He is my line mate and tag team partner here, Mike Thomas. Mike, how are you doing today? Good. I like that hockey reference, Eric. I have been living in the terminal for the last couple days, so I'm going to crawl out of the terminal here for a few minutes and, excited to get a little higher level with the highlights today.
[00:00:41] Eric Nantz:
Yeah. I've been in the terminal myself. Fun little, I don't wanna call it a hack because it's a legit tool. But I was getting jealous of some of these really fancy Git GUI interfaces I often use locally. Shout out to the GitKraken project. That's one of these. Can't really install that on my company's HPC infrastructure, so I may put this in the show notes just for kicks. Found a terminal based Git tool called lazygit. It's not lazy. It's really powerful. And it's written in Go, actually. But that's my ncurses Git interface, which has been super smooth for me. So if any of you out there are in need of a great kinda terminal Git experience that gives you that great overview of, like, branches, your staging area, commit history, it's all right there. So, shout out to lazygit. Fun project.
[00:01:39] Mike Thomas:
No. That's a great shout out. I love the Git GUI clients, or sort of anything that tries to help make it a little bit more manageable than it is. I understand that there is a need, right, to go straight to the Git bash shell once in a while for doing particular things, but I think sort of in general, for 99% of my use cases, it helps to use something that's a little more GUI to help you avoid making Git mistakes, which can be hard to undo.
[00:02:08] Eric Nantz:
Yeah. Nope. Don't get me started. I had to undo a lot of nonsense the past couple weeks in one of my repos, but I digress. Only we can undo our recordings of this podcast. We gotta get our act together, shall we? You might dare say it's showtime, folks. But, yes, this issue this week was curated by Jon Carroll, who is another longtime contributor and curator for R Weekly. And as always, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world. Now we're gonna lead off here with a post that's gonna flip a lot of your assumptions, perhaps, on their head, so to speak. Because our first post here comes from June Choe, who is a PhD candidate in linguistics at the University of Pennsylvania and has often been at the cutting edge of going not just a little bit into R, but really deep into the fundamentals of R itself. And, boy, if you wanna go deep into how functions are composed, this one is for you.
So he leads off with a typical premise that when you're learning any language, typically you're gonna do the infamous hello world type example just to make sure things are quote unquote working. Well, apparently, over the years, albeit I hadn't seen this until this post, there have been some pretty adventurous developers out there for various languages that might play a little trick on your mind: not just having a function that prints the text hello world as in a typical print call in things like Java or other languages, but instead having a function literally called hello world, passing the string print into it, and it still somehow prints hello world.
Like, what is going on there? Well, apparently, there are a lot of ways in multiple languages to kind of flip the concept of arguments and functions. So June explores in this post, what can we do with the R language in this case? Well, in order to get there, you gotta learn about some of the self-described quirks in the R syntax that you may not see until you really dive further into it. Case in point, anytime we call a function, and he has some examples here of, like, summing or adding numbers together, R first needs to see whether that is represented as an expression or not. And if it is, it needs to determine if that value is a function or not.
And how does it know that it's a function or not? Apparently, there are very intricate orderings here with respect to evaluating the scope of this in terms of these language objects. In a language object, if there is a function, it's always gonna be first in line. And he has an example where he literally goes to this expression, finds the first item of it, and, indeed, it is the function that's being wrapped into that. And, of course, to review, even the operators you see in R, like the plus, multiplication, etcetera, those are all functions under the hood. Right? So they would be first in this stack of the language object.
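As a quick reference for what's being described, here is a generic example (not taken from June's post) of poking at a language object in the console:

```r
# A call is a language object: the function is always the first element,
# followed by its arguments. Operators like `+` are functions too.
e <- quote(1 + 2)
class(e)          # "call"
e[[1]]            # `+`
e[[2]]            # 1
is.function(`+`)  # TRUE
eval(e)           # 3
```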
Once you know that, you can now start to do some crazy stuff with actually flipping the order of this and superimposing different ways of architecting this. And this is where you need to dive into some concepts that scared the heck out of me in my early days of R, and that is deparsing. And also a new function, well, new to me, I should say: the sys.call function, which returns the expression of the call where that function call was taking place. And, again, we're gonna try to explain this at a high level, but, obviously, look at the post for the detailed examples here. But then he shows how to actually get these functions from these sys.calls and what is actually returned inside of them, which, again, first in line is the function itself, and then second would be the arguments being supplied to it.
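For reference, a small generic illustration (not June's code) of what sys.call() returns inside a function, with the function first in line and the arguments after it:

```r
show_call <- function(...) sys.call()

cl <- show_call("hello", 42)
cl        # show_call("hello", 42)
cl[[1]]   # show_call -- the function comes first
cl[[2]]   # "hello"   -- then the supplied arguments
```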
So once you have that, you can now start to do a little bit of flipping of that order. And instead of having the typical print of hello world, you can have hello world with the syntax of print, and it's still gonna give you what the output of that function would be in ordinary language. Now it gets even more bizarre here, bizarre to me anyway, because, again, I haven't dived this much into functions ever. But you can also write wrappers around this to do this with any function, not just a manually specified hello world printing. He has an example register function that he defines where it's going to dynamically grab the name of that function in this language object calling stack, register a new function with that name, and then, basically, in that calling environment, give you that alias to, again, flip the argument and function on its head.
And then lastly, you can go back the other way and make it the typical print hello world, which is called unflipping. Again, some clever use of the substitute function to make that happen. He has an example here called unflippery. He shows how to reverse this kind of bizarre sequence so you can get back to what you wanted to do with a call statement to get there. But, yeah, if you ever wanted to know just how far you can take this reversing of function arguments and function calls themselves, to mimic these thought experiments you often see in other programming languages, this shows just how far you can take it.
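To give a flavor of the trick, here is a minimal sketch of a "flipped" Hello World assembled from the building blocks mentioned above. It's an illustration, not June's exact implementation:

```r
# The function name carries the message; the argument says what to do with it.
`hello world` <- function(f) {
  # sys.call() returns the call as a language object; its first element
  # is the function being called, i.e. the symbol `hello world`
  msg <- as.character(sys.call()[[1]])
  # f is a string such as "print"; do.call() looks that function up by name
  do.call(f, list(msg))
}

`hello world`("print")
#> [1] "hello world"
```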
June's example is, again, fully reproducible. You can run all of this in your console and inspect these language objects, these calling stacks, and see how these functions built into R, like matching function calls, finding the system call itself, and then clever use of the eval and substitute functions, can kind of change the ordering of things. This can be pretty powerful, albeit this could probably be great fodder for maybe an April Fools' joke someday, for somebody not knowing what to expect out of your package functions. I don't know. I'm just saying we're still a ways out from April yet, but y'all keep this in mind for some good-time pranks in the future with my R friends.
[00:09:08] Mike Thomas:
I would have to agree, Eric. If you do prank me with that, I can't say I'd be laughing too hard, because some of this stuff is fairly convoluted. I think some of this can trip up beginners as well, and there are probably some fair critiques of the R language, you know, for newcomers who might get tripped up in some of, you know, this metaprogramming and real quirks about the ability to sort of program on the language itself. And like you said, I think it is important, though, for anyone using R, probably experienced developers maybe more so, to read a blog post like this and understand these different things. You know, I think it's very important to understand that your operators are functions in and of themselves.
It was a refresher for me. I think maybe something I knew at one time but happened to forget, that you could wrap the function name in your console in quotations and run that and it would still return sort of what you would expect. So I ran, you know, sum(4), and then I wrapped sum in double quotes and ran that again, and it returned 4 again. And it was just a little bit of a shock to the system to recall that, you know, that is possible. You know, I think we see a lot of code sometimes with folks using, like, the get function or assign or manipulating the environments that you're using, you know, within sort of beginner R code. And I think that stuff can be pretty powerful, but it can trip you up if you are trying to build, you know, R software that is going to go into production somewhere or is going to, as you may lament, Eric, you know, run through a GitHub Action that may treat environments, you know, a little bit differently than what you have going on on your local machine. Sorry to dig up that stuff. Bad memories. My bad memories.
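The quirk Mike is recalling, for reference:

```r
sum(4)    # 4
"sum"(4)  # also 4 -- R resolves the quoted name to a function before calling it
```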
But this also reminds me of the Advanced R book, which I think would be a nice complement to a lot of the content in here. There is a chapter in there called metaprogramming that runs through, you know, the big pictures there, the important concepts, expressions, you know, quasiquotation when we think about nonstandard evaluation and our sort of ability to do that, which is unique to R in a way that I believe is not really possible in Python. And, you know, a lot of these different quirks that you have to think about when understanding and working with this type of functionality.
So really, really interesting blog post. You know, I think really creative examples here by June to just show us how some of these internals work under the hood.
[00:12:01] Eric Nantz:
Yeah. You'll definitely want your R terminal side by side as you're reading this and kinda try this out interactively. Boy, I wonder if this could be augmented with Quarto and having that fancy evaluator inside, but I digress. But in any event, one way or another, you'll wanna practice this if you ever wanna see this in action. Because someone like me, I definitely like to be hands on when I'm learning these concepts. So I would definitely have my fancy terminal side by side as I'm reading June's post. But, yeah, he's got a whole boatload of awesome posts on his blog, especially around other areas of the tidyverse. He's been front and center. So definitely check out his site, with his back catalog of really awesome explorations with the language, in more ways than one.
And in our next highlight today, we've got a great showcase of the recent advancements that landed in the latest version of ggplot2. Again, one of the more fundamental pillars of visualization in the R language itself. ggplot2 just recently had version 3.5.0 land on CRAN. And in this highlight, we got a terrific blog post from the tidyverse blog by one of the, I believe, newer ggplot2 maintainers, Teun van den Brand. I hadn't seen his name before this. But, yeah, great to see this post here, and we'll walk through some of the key features and key improvements here. Leading off here is a very important infrastructure improvement to help bring the mechanism behind guides in ggplot2 to a little more consistency with other systems in ggplot2.
Mainly speaking, up to this version, the object oriented paradigm that the guide system was following was still using S3. Well, now with this rewrite that they've had in ggplot2 3.5.0, the system behind guides has been rewritten to use ggproto, bringing it in line with the other parts of ggplot2 that have been used in heavy customizations, such as, you know, layers, facets, scales, and whatnot. Meaning that now the door is open to treat new extensions on the guide system just like any other extension that we could do in this space. So I think this is gonna be, hopefully, a launching point for others to make even more customized versions of the guide system as they see fit in the ggplot2 landscape. So really nice to see that consistency being brought in from a back end level.
And speaking of visuals, with ggplot2, a lot of the graphs these days are making heavy use of gradients and patterns in their visualizations. Well, now they are first class citizens in the ggplot2 ecosystem, with respect to new values that can be used within the fill argument. And you're able to tap into the grid system, which has built-in functions in the grid package called linearGradient, radialGradient, and others, alongside the pattern function. So you can now have those really nice looking gradient bar charts, gradient backgrounds for scatter plots. This looks really sharp, and not just gradients too.
If you have a pattern you wanna use to really distinguish that particular facet or that particular bar from the others, you can also leverage patterns within the scale_fill_manual directive. That is really powerful stuff. There are some great examples in there of a bar chart that has multiple patterns inside to really, you know, catch your eye, so to speak. But it looks like what they've done is some really important improvements to how the alpha aesthetic was being applied to this situation. And that was a hurdle they had to overcome to make all this happen. So lots of great improvements there for even more custom visualization for how you display colors in your ggplot graph. But, of course, there is much more to this.
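As a rough sketch of the kind of fill being described, assuming ggplot2 3.5.0 or later and a graphics device that supports gradient and pattern fills (see the release post for the exact API):

```r
library(ggplot2)
library(grid)

# Gradient fill on a bar chart; requires a device with R >= 4.1 pattern support,
# such as recent versions of the ragg or svglite devices
ggplot(mpg, aes(class)) +
  geom_bar(fill = linearGradient(c("skyblue", "navy")))
```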
[00:16:46] Mike Thomas:
Yes. There is, Eric. You know, in terms of the scales, you know, ggplot2 has changed how plots interact with these variables created with the I function, which I believe is from base R, and it creates this class, if I'm not mistaken, or prepends the class AsIs to the object's class. So what this allows you to do, from my interpretation of the blog post, is to prevent, you know, some of the clashes that happen when you are introducing, for example, here, like, an additional scale on your plot. So one of the examples that they give is if you have, you know, sort of two calls to geom_point, two geom_point layers here on your ggplot, you know, just using the mtcars dataset, and, you know, for one of those layers, within your aesthetic, you're setting the color as a variable within the dataset, the drv drive variable. And then you wanna layer on a second point on top of that that's going to serve as, like, a circle around the dots from the first layer, and you want those colors to be, you know, some predefined string of colors like red, blue, green that you had set. You know, previously, you would actually run into an error here and it would not be able to find those colors for your second layer. But now, if you leverage that I function around the variable that holds the string containing your colors, you'll be able to add the circles around that, or add this additional layer aesthetic, without that clashing with the guide, with the legend that was developed in the first layer, at all. So we're on a podcast trying to describe DataViz again, Eric.
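A sketch of the layering behavior Mike describes, assuming ggplot2 3.5.0 (not the post's exact example; this one uses the mpg dataset, which carries the drv variable):

```r
library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
  # mapped colour: trained on the drv variable, gets a scale and legend
  geom_point(aes(colour = drv)) +
  # AsIs colour: taken literally, so it does not clash with that scale
  geom_point(aes(colour = I("black")), shape = 1, size = 3)
```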
The best way to check this out is certainly through reading the blog post. And another sort of improvement here around ignoring scales is the ability within the ggplot2 annotate function to add some text in specific locations that will not clash against multiple annotate layers, again leveraging this as-is function, this capital I function, that allows you to have sort of greater control over where you wanna annotate different text overlaid on different parts or layers of your ggplot chart.
So lots of great code examples here. I think the best way to dive into this content and these improvements, which I think are mostly subtle and may not affect most of your day to day work within ggplot, is definitely to take a look at this blog post. Take a look at the code snippets and see how these may relate to your DataViz work on a day to day basis right now with ggplot.
[00:19:43] Eric Nantz:
Yeah. I'm definitely seeing, especially towards the end, these examples, I think, have taken inspiration from the community at large with ggplot2. Some of these features that I think have been exposed in additional packages are now coming into ggplot2 proper. You'll see, kinda towards the end of the post, some new ways to angle the orientation of labels on your various point annotations. They have an example with the mtcars dataset, flipping the annotations sometimes with, like, 45 degrees or less. And then others being able to do some padding around the labels too. Again, that's really neat. I think I've seen that in additional packages.
And, yes, certainly, those that have been creating those fancy violin plots or box plots in general and have been really frustrated with how to deal with outliers efficiently. Well, guess what? Now geom_boxplot has an option to remove outliers entirely with just an outliers equals FALSE directive. Very nice. Very nice. But you can still just hide them by setting that outlier shape to NA. So you've got multiple ways to handle outliers. But, again, it's great to see that if you just wanna wipe them out, you wipe them out. So really nice improvements to the ggplot2 box plot directives. But, again, it looks like lots of very nice improvements, and it sounds like they wanted to do this more incrementally. But it just so happened that a bunch of these improvements landed in 3.5.0.
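For reference, the two approaches Eric mentions, with the outliers argument assuming ggplot2 3.5.0 or later:

```r
library(ggplot2)
p <- ggplot(mpg, aes(class, hwy))

p + geom_boxplot(outliers = FALSE)    # drop outliers from the plot entirely (new in 3.5.0)
p + geom_boxplot(outlier.shape = NA)  # the older trick: compute them but hide the points
```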
But again, we're all to benefit from it. And I always like to see at the end of these posts on the tidyverse blog, and others from Posit, they always make a point to recognize all of those that have contributed to this release. So you get all the GitHub handles of the numerous contributors to this particular release. But, again, congrats to the team, and I look forward to putting this into production for my workflows very soon.
[00:21:43] Mike Thomas:
Yes. And you could be included in that list of acknowledgments and be famous if you find, you know, even for the smallest use case, a grammatical issue in a vignette or some documentation as well. So always feel free to contribute to open source, and there is no pull request too small, in my opinion.
[00:22:17] Eric Nantz:
And rounding out our highlights today, we've got a really fun post here, because Mike and I have dealt with this in various ways in our respective workloads: making sure that the code that we personally are writing and the code that we have with our collaborators in a more, you know, central project are kind of on the same page, as the cliche goes. But there are ways that you can make sure that that is easy to opt into. And in this case, we have a terrific set of resources and narrative for our last highlight today, a blog post from the esteemed rOpenSci blog.
Not one, but two authors here. We got Maëlle Salmon, who returns to the highlights yet again. Her streak continues. And also coauthor Yanina Bellini Saibene, who is the community manager at rOpenSci now and a very frequent contributor in the open source community space and data science space. And they have this awesome blog post titled Beautiful Code, Because We're Worth It! No, don't get worried, folks. We're not getting sponsored by a certain fashion company. But I digress. Let's dive into what makes beautiful code in the minds of Maëlle and Yanina here.
Well, let's start off with spacing. And, you know, this is something, I have to have a little confession here as I read through this the first time, is that it's one thing when you see, in the example here, you've got inconsistent use of spacing between operators, maybe between arguments or, you know, separated arguments and whatnot. And, yeah, that can just be a little bad UX, so to speak, as you're reviewing that. But having a unified system for how you're treating both the space between function parameter names or operators after the function call and the indentation on new lines, that's hugely important for readability.
But I admit they also have great advice too of not necessarily putting, like, a new line between all of your declaratives. And I've been kinda guilty of maybe putting too many new lines between my various function calls, but instead, try to group them in kind of related chunks, so to speak. Whereas maybe you have a tidyverse pipeline-ish, you know, syntax, and you wanna keep, like, a lot of that data manipulation in one concise area, then you maybe break it up with another part of your function operation where you're doing a new operation. A lot of times in my Shiny apps, I would kinda break things up maybe a bit too much.
But again, I think it's not so much what's right or wrong. It's be consistent. Be consistent with yourself. Be consistent with your collaborators. And I think then you're gonna have what they envision as well-proportioned code that will be easier for reviewing, easier for debugging. And also, another trick that they recommend as well is, maybe you realize you have a lot of lines in that particular pipeline. Well, there's nothing stopping you from having more fit-for-purpose functions inside that overall pipeline to help break out some of that potential long scrolling syndrome that you might have with one of these longer pipelines. So being able to leverage that mechanism is really important too.
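As a hypothetical illustration of pulling a fit-for-purpose helper out of a longer pipeline (the helper name and grouping are invented for this sketch):

```r
library(dplyr)

# A small named helper keeps the main pipeline short and readable
summarise_by_class <- function(data) {
  data |>
    group_by(class) |>
    summarise(mean_hwy = mean(hwy), .groups = "drop")
}

ggplot2::mpg |>
  filter(year == 2008) |>
  summarise_by_class()
```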
But it's not just about this, obviously, the spacing and the use of maybe fit-for-purpose functions. There are obviously other ways that you can have concise and beautiful code too, beyond the code statements themselves. And for that, we have to talk about comments now, Mike. What do they say about comments here?
[00:26:14] Mike Thomas:
Yeah. So the section title is not too wordy, just the right amount of comments, and they even link to a blog post on R-hub called why comment your code as little and as well as possible. This is one that I am probably super guilty of as well, just like creating sort of too much vertical space, probably, between, you know, different pieces of logic within a lot of our code. And, you know, I guess I have sort of mixed feelings on this. And I think, you know, the idea is to use, you know, very self explanatory functions, function names, or variable names, where, just by looking at the code, you know, it's very easy to understand exactly what's going on.
I think in a perfect world, you know, we wouldn't have to write any comments at all, because, you know, function names and our variables would be so self explanatory. But I think we all know that that's just not necessarily the case. And I think, you know, this is something that I see a lot in a lot of the open source packages that Posit, you know, formerly RStudio, has put out for years and years. And something that I probably need to adopt a little bit better. But it's, I think, the concept of really only introducing comments when you think it's not necessarily self explanatory what's going on.
You know, when there's an additional anecdote, an additional piece of information that you need to provide on top of, you know, what the logic is doing itself. Because if somebody wanted to understand exactly what was going on and didn't, you know, they could always dive into the help documentation for each of those functions to understand exactly what's going on in it. As long as you're writing good descriptions in your roxygen comments above those functions, you know, defining what the parameters represent and defining sort of the overall goal of the function and what it returns, then I think there are a lot of good arguments there.
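A minimal roxygen2-style sketch of the kind of self-documenting function description Mike mentions (the function and parameter names are hypothetical):

```r
#' Summarise highway mileage by vehicle class
#'
#' @param data A data frame containing `class`, `hwy`, and `year` columns.
#' @param min_year Only keep records from this model year onwards.
#' @return A data frame with one row per class and a `mean_hwy` column.
summarise_mileage <- function(data, min_year = 2008) {
  recent <- data[data$year >= min_year, ]
  out <- aggregate(hwy ~ class, data = recent, FUN = mean)
  names(out)[2] <- "mean_hwy"
  out
}
```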
But again, you know, I would agree with you, Eric, that consistency would be key here. We're probably, you know, starting to get even more and more into a sort of gray area of when to comment and when not to comment. But if you can set, you know, some basic high level rules and decisions within, you know, your team about when to comment and when not to comment, and try your best to follow those, I think that consistency will help your code base be more maintainable over time.
[00:28:51] Eric Nantz:
Yeah. And I do admit, being in industry versus releasing a package open source, it's almost like I've got to be two personas in one for a lot of my projects. Hear me out here. This may sound bizarre, but hear me out. There is the purist in me that wants to make things as concise as possible from a development standpoint. I know the project very well. Right? I mean, I built the package. I built this Shiny app for five years. I know the intricacies. But I need to think about, do I really wanna be the only one on this project to help maintain it and help develop a new feature? No. I want people from my team or maybe others in the organization to help me out from time to time.
Well, sometimes the comments that I put in my package source, you know, functions and documentation alike, they're kind of serving another purpose. It's not just to highlight a particular idiosyncrasy or a particular area we need to be aware of. It's kind of doubling as a teaching mechanism too. Like, often in my comments, I'll maybe describe what it's solving, and then I'll put, like, a reference. And guess what? It's gonna probably be a Stack Overflow reference or a blog post. You know? Just getting that in there, right into the eyes of that collaborator, is gonna help me. Yes, ideally, that would all go in a GitHub issue or a dev notes journal or whatnot. But sometimes you gotta strike while the iron's hot, so to speak. When you have a collaborator looking at your code base, maybe whipping up Posit Workbench or whatnot and looking at this code, you wanna put that front and center of, like, not just being aware of the issue, but how did I or anyone get an insight into how to solve that? And a lot of times, I don't solve these myself. I've leveraged a vast R community that has treaded those waters before, maybe an API call or maybe other operations like that in the Shiny space. And I am not shy about putting those links to external references in the code base itself.
Again, I'm in industry. 99% of what I do doesn't see the light of day outside the firewall. So I wanna make sure that for future me and future collaborators, they have a better understanding of why that solution's in place. So that's my mini soapbox for today.
[00:31:17] Mike Thomas:
Yeah. I couldn't agree more, Eric. And I think even at a higher level, you know, just having that sort of code style guide within your organization can go a really, really long way towards getting everybody on the same page here. And you know, I think we can all agree on a couple of these last tips from Maëlle and Yanina, on early return and the switch function. So if you have a particular function, and I think Jenny Bryan refers to this as, like, the happy path, if there is an if statement within that function, an if-else, if you will, and you sort of expect most of the time for it to go down this first path, you can actually early return, have a return call within that first chunk of your if statement, sort of assuming that it will never get to that second portion most of the time. And that can save you a little bit of time, make your code a little bit more lightweight.
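A hypothetical sketch of that early return on the happy path (the file name and fallback behavior are invented for illustration):

```r
read_config <- function(path = "config.yml") {
  if (file.exists(path)) {
    return(readLines(path))  # the common case exits immediately
  }
  # only the unusual case remains to handle below
  warning("No config found at ", path, "; falling back to defaults")
  character(0)
}
```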
And you know, Eric, as you and I know, these little things, you know, might save like a millisecond, right, to make a change like this. And it may seem like not much. But if you consistently do this throughout your projects, you know, those little milliseconds can add up and turn into an improved user experience, and that's regardless of whether you're developing a Shiny app or if you're developing just an R package in general that others are going to be using. You know, I think these little things, especially these early returns, can add up over time. And then the switch function is one that I don't see used enough. It is the definition of an oldie but a goodie, Eric. If you have nested if statements, if you have, like, an if-else, or an if, and then an else if, and then another else if, and then another else if, to handle all of these different cases of what this variable could potentially take on for a value.
Please leverage the switch function. It makes it so much easier to define all of that different case logic for the different values that particular variable can take on, and it looks much cleaner. It's just much easier to handle. So that's a phenomenal recommendation as well, because something that I do see time and time again, way too often, are these long, lengthy, nested if-else statements.
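And a small made-up example of swapping a nested if-else chain for switch():

```r
# Nested if-else version
shape_name_nested <- function(sides) {
  if (sides == 3) {
    "triangle"
  } else if (sides == 4) {
    "quadrilateral"
  } else if (sides == 5) {
    "pentagon"
  } else {
    "polygon"
  }
}

# switch() version: one branch per value, with an unnamed default at the end
shape_name <- function(sides) {
  switch(as.character(sides),
    "3" = "triangle",
    "4" = "quadrilateral",
    "5" = "pentagon",
    "polygon"
  )
}

shape_name(4)
#> [1] "quadrilateral"
```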
[00:33:48] Eric Nantz:
Yeah. There are a lot of legacy projects where I fell into the if-else, else, else-if trap. And, yeah, I definitely need to refactor that to switch sometime. You know, I was thinking as I'm reading through all these tips, there are some analogies you can make, especially as you're doing maybe some of these things you're just not as comfortable with because you didn't know about them in the first place. Like what you're talking about, the early returns, the switch, and a different naming convention. Honestly, I think it's gonna be a little hard at first, especially if you, you know, have old habits like I do. And I see my old code bases from 4 or 5 years ago, and I'm like, oh my goodness, what was I thinking? Well, this is similar to, frankly, keeping healthy from, like, a fitness standpoint. It may seem uncomfortable at first, but you build up. You build up. You build up. And then suddenly, the next time you make that new Shiny app, that new R package, even just that new set of functions you're gonna pass off to that colleague, these will be front and center. It won't be the old habits anymore.
Of course, easier said than done. Right? You gotta start somewhere. This is fresh in my mind because I'm refactoring a 7 year old package as I speak. And boy, oh boy, were there some issues there. Which is a great segue into kinda how this post concludes, where, you know, occasionally, if you do have the time, and I realize time is hard to come by with a lot of our jobs these days, but taking a little bit of time to do what they call spring cleaning of your code, seeing what are some gaps that you can solve with the knowledge you've gained from hopefully reading R Weekly and listening to this podcast, or other ways of learning about code development.
And there's also a link in the blog post about how the tidyverse team is doing spring cleaning. So that might be some inspiration as well. And then, of course, take advantage of automation when you can. There is the lintr package. It's just gonna help you with things like the spacing issues and syntax issues; again, it can automatically point out where these are so you don't have to manually scan for them. This is fresh in my mind too because I was helping add a little new feature to one of the internal company packages that my esteemed teammate, Will Landau, maintains. And he built in lintr checks in the GitHub Action, and I forgot to run that locally. And I was like, oh goodness, I messed stuff up.
But it pointed it out to me. Then I ran it locally and got it fixed. Now we get that fancy green check mark in the PR check. So lots of great tips here for sure.
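To mirror that CI check locally before pushing, something like this works, assuming the lintr package is installed (the file path is hypothetical):

```r
# install.packages("lintr")
lintr::lint_package()          # lint every R file in the current package
lintr::lint("R/my_module.R")   # or lint a single file
```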
[00:36:28] Mike Thomas:
Yes. Absolutely. And I think if you're a manager, especially of data analysts or data scientists, try to build in, I know it's hard, but try to build in time at least once during the year to take a day or a couple days or a week even to go through your repositories and take a look at that code and see what you can do. I think they're referring to it as spring cleaning here, to maybe improve that code styling or do some refactoring to keep that code up to date and keep it as maintainable as possible. As a quick story, I have a former employer who won't work with me on a project because of some code that I wrote 6 years ago that broke, I guess, internally recently. So they think that I'm a pretty terrible R programmer, because the code that I wrote 6 years ago is no longer working there, even though I've offered to help. So I'm not gonna name any names or anything like that, but I would just say don't be that person. Understand that software needs to be maintained and managed and improved over time, and, you know, don't judge somebody on the code that they wrote, you know, even a couple years ago, because we're all consistently learning and improving, and I don't even like looking at the code that I wrote a couple years ago. So let's lift each other up here.
[00:37:56] Eric Nantz:
Exactly. All positive vibes. Yeah. We don't need that. We could do a whole other hour of podcast on that kind of issue. Trust me on that. But you spotted something else in this post, Mike, because there's an opportunity for a real nice quote to live by here. Right? Oh, there is a quote that I absolutely love. We need to get t-shirts made up of this, Eric. I might get this tattooed on myself.
[00:38:19] Mike Thomas:
But the line in here is: the code you don't write has no bug. Unbelievable.
[00:38:26] Eric Nantz:
That, oh my goodness. Yes. We need a shirt, whoever's listening out there. Yeah, please make this. You can take our money after you print it. We will buy it. I love that line, and it just speaks to so many aspects of my development life. Yeah. So there's a boatload of additional resources that they link to at the end of this post. And, also, I have a link to one of the inspirations mentioned at the beginning, which is that rOpenSci recently had their second cohort of champions onboarded, and they ran some virtual workshops, and some of those materials are online with respect to package development. So I'll have a link to that in the show notes too. And that particular external resource wowed me for another reason: they also use the same Hugo theme that I did for an internal documentation site at the company about our HPC system. I was like, hey, I know this theme. That was awesome. So it's great when I feel like I'm thinking similar to all these people I look up to in the community. That was just awesome stuff. Oh, that is awesome, Eric. You know what I think we should put on the back of that t-shirt? I think, you know, the front could say the code you don't write has no bug, and the back could say the code that an LLM writes for you probably does have a bug.
Bingo. We need a patent soon, or, well, somebody's gonna take that and run with it. Oh, goodness. Just kidding. Yeah, you know. You know how it goes. But we also know how it goes is that, yeah, the rest of the issue has a set of fantastic blog posts, new packages, updated packages, calls to action, you know, calls for events, and everything else that you can find every single week at R Weekly. So we're gonna take a couple minutes to tell you some additional finds that came our way that we wanted to highlight here. And, of course, me being an audio video kind of, you know, junkie, so to speak, with doing this podcast and other media ventures.
This post here really hit home. There was a recent post that I saw on Mastodon from Matt Crump about how he was exploring importing MIDI audio data into R for his cognition experiments. Well, he ended up using a mix of command line calls to FFmpeg, which is kinda like the Swiss Army knife, so to speak, of media conversions, and then another utility called FluidSynth, and some Python code, but using a lot of shell commands. Well, Jeroen Ooms, who, of course, is heavily involved with the infrastructure behind rOpenSci and the R-universe project, decided to take matters into his own hands and created a package called fluidsynth to help wrap some of these system utilities for bringing in and parsing MIDI data. So if you ever find yourself having to analyze these, and maybe use them in a data driven way, and then also render that to an audio file, yeah, Jeroen's package has got you covered. So I have to add that to my toolbox amongst many other great utilities in the audio visual space in the R community.
[00:41:41] Mike Thomas:
That's a super niche little package there. I like it. I wanted to highlight a webinar series that's actually been going on through the R Consortium, the R in Insurance series. I believe it's hosted by two folks at Swiss Re, which is an insurance company, Georgios Bacalukas and Benedicte Chamberge. And this video series looks fantastic. Eric, I'm just gonna walk you through the titles of the first few videos here. The first one is From Excel to Programming in R, great content applicable everywhere. Then, From Programming in R to Putting R into Production.
Now, I know I'm getting you more excited. Oh, yeah. R Performance Culture, and lastly, High Performance Programming in R. So these are the four webinars that are now available through the R Consortium's website. I'm not sure if they're going to continue to have more webinars or not. But if you are in the insurance space or if you're into actuarial science, I would highly recommend checking out these webinars.
[00:42:45] Eric Nantz:
Yeah. What an excellent, you know, set of resources here. And I love the fact that they're being shared with others, because I know that, you know, R is making big headways in the world of insurance and the world of finance and everything else in between. And, of course, I'm in life sciences, but it's great to see these tailored to that audience, with concepts that are most definitely universal to anybody in our respective industries, because you gotta start somewhere. Right? More often than not, Excel is that window to data analysis that people use routinely, and then being able to take that programming based approach with R, but tailored to that kind of audience, going all the way to writing highly performant code.
Yeah. That's something that I am trying to do every single day, and I can't pretend that I know everything about it. So I'll definitely have to check these out. Looks like they even got wind of a certain project called Parquet. So that's really catching our eyes on this.
[00:43:44] Mike Thomas:
No. That was my journey, from Excel guru,
[00:43:49] Eric Nantz:
into R, and it changed my life. There's a lot more in this issue. Of course, we're gonna have to start to wrap things up here. But if you wanna get in touch with us, if you wanna help with the R Weekly project itself, that's always something we welcome. Whether it's your pull request contributions or suggestions, we're all just a pull request away on the upcoming issue draft, all linked at rweekly.org. That's where you'll find everything. I have an inkling that the next curator could definitely use a bit of help, if you get my drift. So yeah, please send those requests the project's way.
And also, you can get in touch with us directly. A few ways to do that. We have in this episode's show notes a handy link to the contact page if you want to send us feedback there. You can also, if you're on the Podcasting 2.0 train with your modern podcast app, and there's a boatload to choose from out there at podcastapps.com, send us a fun little boost along the way to give us a little message directly from within your app itself. Details on setting that up are also in the show notes. But, also, we are on the various social media spheres from time to time. I'm mostly on Mastodon these days with @rpodcast at podcastindex.social. I will admit I'm a little late replying back to Bruno, who's been checking in with me on my Nix journey. I have some follow-up for you. It's coming soon. Trust me.
But, also, I am sporadically on the weapon X thing with @theRcast. And lastly, on LinkedIn from time to time, popping in with some announcements and episode posts. But, Mike, where can the listeners get a hold of you?
[00:45:27] Mike Thomas:
Sure. You can find me on Mastodon, [email protected], or you can check out what I'm up to at Ketchbrook Analytics, k e t c h b r o o k.
[00:45:41] Eric Nantz:
Awesome stuff, my friend. And, yeah, we had a heck of a kind of therapeutic preshow session. You all didn't get to hear it. But Mike listened to my GitHub Actions rant that may be becoming a reprex in the very near future so that I can talk about it here later. But in any event, I'm gonna get back to the old day job here. So we're gonna close out this episode of R Weekly Highlights, and we'll be back with another episode next week.
Hello, friends. We're back with episode 154 of the R Weekly Highlights podcast. This is the weekly podcast where we talk about the latest and awesome resources that you can find every single week on the latest our weekly issue. My name is Eric Nantz, and I'm so delighted you joined us today from wherever you are around the world.
[00:00:21] Mike Thomas:
And I never do this alone. He is my line mate and tag team partner here, Mike Thomas. Mike, how are you doing today? Good. I like that hockey reference, Eric. I have been living in the terminal for the last couple days, so I'm going to crawl out of the terminal here for a few minutes and, excited to get a little higher level with the highlights today.
[00:00:41] Eric Nantz:
Yeah. I've been in the terminal myself. Fun little, I don't wanna call it a hack because it's a legit tool. But, I was getting jealous of some of these really fancy Git GUI interfaces I often use locally. Shout out to the GitKraken project. That's one of these. Can't really install that on my, my company's HPC infrastructure. So I may put this in the show notes just for kicks. Found a terminal based Git tool called lazy Git. It's not lazy. It's really powerful. And it's written in Go, actually. But that's my end cursor's Git interface, which has been super smooth for me. So if you if any of you out there are a need for a great kinda terminal git experience that gives you that great overview of, like, branches, your staging area, commit history, It's all right there. So, shout out to Lazy Git. Fun project.
[00:01:39] Mike Thomas:
No. That's a great shout out. I I love the the Git GUI clients or or sort of anything that tries to help make it a little bit more manageable than it is. Understand that there is need, right, to go straight to the the git bash shell, once in a while for doing in particular things, but I I think sort of in general for 99% of my use cases, it helps to use something that's a little more gooey to help you avoid making Git mistakes, which can be hard to undo.
[00:02:08] Eric Nantz:
Yeah. No hope. Don't get me started. I had to do undo a lot of nonsense the past couple weeks in one of my repos, but I digress. Only we can undo our recordings of this podcast. We gotta get our act together, shall we? You might you might dare say it's showtime, folks. But, yes, this, issue this week was curated by John Carroll who is, another longtime contributor and curator for our weekly. And as always, he had tremendous help from our fellow Rwicky team members and contributors like all of you around the world. Now we're gonna lead off here with a post that's gonna flip a lot of your assumptions perhaps on it on their head, so to speak. Because our first post here comes from June Cho who is a PhD candidate in linguistics at the University of Pennsylvania and has often been at the cutting edge of going not just a little bit into r, but really deep into the fundamentals of r itself. And, boy, this one is if you wanna go deep in how functions are composed, this is for you.
So he leads off with a typical premise that when you're learning any language, typically, you're gonna do the infamous hello world type example just to make sure things are quote unquote working. Well, apparently, there's been some, over the years, albeit I'm not seeing this until this post, there have been some pretty, adventurous developers out there for various languages that might play a little trick on your mind by not just having a function that prints the 10 stacks hello world as in a typical print call in things like Java or or other languages. But instead of having a function literally called hello world, putting the the string of print in it, and it still somehow prints hello world.
Like, what is going on there? Well, apparently, there's a lot of ways and multiple languages to kind of flip the concept of arguments and functions. So June explores in this post, what can we do with the R language in this in this case? Well, in order to get there, you gotta learn about some of the, self described quirks in the R syntax that you may not see until you really dive further into it. Case in point, anytime we define a function, he has some examples here of, like, summing or adding numbers together, It first needs to see that that is represented as a expression or not. And if it does, it needs to determine if that value is a function or not.
And how does it know that it's a function or not? Apparently, there are very intricate orderings here with respect to evaluating the scope of this in terms of these language objects. In a language object, if it is a function, it's always gonna be first in line. And he has an example where he literally goes to this expression, finds the first item of it, and, indeed, it is the function that's being wrapped into that. And, of course, to review, even the operators you see in r, like the plus, multiplication, etcetera, those are all functions under the hood. Right? So they would be first in this stack of the language object.
Once you know that, you can now start to do some crazy stuff with actually flipping the order of this and superimposing different different ways of architecting this. And this is where you need to dive into some concepts that scared the heck out of me in my early days of R, and that is deparsing. And also a new function, not, I mean, new to me, I should say, the sys.call function which can actually find where the which returns the expression of a function of where that call was taking place. And, again, we're gonna try to explain this at a high level, but, obviously, look at the post for the detailed examples here. But then he shows how to actually get these functions from these syscalls and what is actually returned inside of them, which, again, first align is the function itself, and then the second would be the arguments being supplied to it.
So once you have that, you can now start to do a little bit of flipping of that order. And instead of having the typical print hello, world, you can have the hello, world with the syntax of print, and it's still gonna give you what that output of that function would be in ordinary language. Now that gets it gets even more bizarre here, bizarre to me anyway, because, again, I haven't dived this much into functions ever. But you can also write wrappers around this to do this with any function, not just a manually specified like hello world printing. He has an example register function that he defines where it's going to dynamically grab the name of that function in this language object calling stack, register a new function with that name, and then basically in that cut function environment, now give you that alias to, again, flip the argument and function on its head.
And then lastly, in terms of where I see this, you can go back to the other way and make it the typical print hello world, which is called unflipping. Again, some clever use of the substitute function to make that happen. He has an example here called unflippery. He shows how to reverse this kind of bizarre sequence so you can get back to what you wanted to do with a call statement to get there. But, yeah, if you ever wanted to know just how far you can take this reversing of function arguments and function calls themselves to mimic what you often see in the other programming languages in terms of these thought experiments of just how far you can take it.
June's example, again, fully reproducible. You can run all this in your console and inspect these language objects, these calling stacks, and just how these functions built into r, like matching function calls, finding the system call itself, and then clever use of the eval and substitute functions to kind of change the ordering of things. This can this can be pretty powerful, albeit This probably could be a great fodder for maybe an April fools joke someday for somebody not knowing what to expect out of your package functions. I don't know. I'm just saying we're still out of April yet, but may y'all keep this in mind for some good time, pranks in the future with my art friends.
[00:09:08] Mike Thomas:
I would have to agree, Eric. If you do prank me with that, I I can't say. I'd be be laughing too hard because some of this stuff is fairly convoluted. I think some of this can can trip up beginners as well, and there's probably some fair critiques of the R language, you know, for for newcomers who who might get tripped up in some of, you know, that this meta programming and and real quirks about the ability to sort of program on the language itself. And like you said, I think it is important though for for anyone, using R, probably experienced developers, maybe more so to to read a blog post like this and understand these different things. You know, I think it's very important to to understand that your operators are functions in and of themselves.
It was a refresher for me. I I think maybe something I knew at one time but happened to forget, that you could wrap the function name in your console in quotations and run that and it would still return sort of what you would expect. So I ran, you know, some 4 and then I I wrapped some in quotes, double quotes, and and ran that again and it returned 4 again. And, it was just a little little bit of a shock to the system to to recall that, you know, that is is possible. You know, I think we see a lot of code sometimes with with folks using, like, the the get function or assign or manipulating, the environments that you're using, you know, within sort of beginner r code. And I think that stuff can be pretty powerful, but it can can trip you up if you are trying to to build, you know, our software that is going to go into production somewhere or is going to, as you may, lament with Eric, you know, run through a GitHub action that that may treat environments, you know, a little bit differently than what you have going on on your local machine. Sorry to sorry to dig that that stuff. Bad memories. My bad memories.
But, this this also reminds me of the the advanced r book, which I think would be a nice complement to a lot of the content in here. There is a chapter in there called metaprogramming that runs through, you know, the big pictures there, the important concepts, expressions, you know, quasi quotation when we think about nonstandard evaluation and our and our sort of ability to do that, which is is unique to R in a way that I believe is not really possible in in Python. And, you know, a lot of these different quirks that you have to think about when understanding and working with, this type of functionality.
So really, really interesting blog post, you know, I I think really creative examples here by June to to just show us how some of these internals work under the hood.
[00:12:01] Eric Nantz:
Yeah. You'll definitely want your r terminal side by side as you're as you're reading this and kinda try this out interactively. Boy, I wonder if this could be augmented with Quartle and having that fancy evaluator inside, but I digress. But in any event, evaluator inside, but I digress. But in any event, one way or another, you'll wanna practice this if you ever wanna see this in action. Because someone like me, I definitely like to be hands on when I'm learning these concepts. So I would I would definitely have my fancy terminal side by side as as reading June's post. But, yeah, he's got a whole boat of of awesome posts in his, in his blog, especially around other areas of the tidy verse. He's been front and center. So definitely check out his his site, with his back catalog of really awesome explorations with the language, in more ways than one.
And in our next highlight today, we've got a a great, showcase of the recent advancements that landed in the latest version of ggplot 2. Again, one of the more fundamental pillars of visualization in the art language itself. Ggplot2 just recently had version 3.5.0 land on CRAN. And in this highlight, we got a terrific blog post from the tidyverse blog by one of the, I believe, newer ggpod 2 maintainers to an brand. I haven't seen his name before this. But, yeah, great, great to see this post here, and we'll walk through some of the the key features and and key improvements here. Leading off here is a very important infrastructure improvement to help bring the mechanism behind guides in ggplot2 to a little more consistency with other systems in ggplot2.
Mainly speaking that up to this version, the object oriented paradigm that the guide system was following was still using s 3. Well, now with this rewrite that they've had in ggplot2, 3.5.0, now guides are now being the system behind guides is now rebranded to use gg proto, bringing it in line with the other parts of ggpod 2 that have been used in heavy customizations, such as, you know, layers, facets, scales, and whatnot. Meaning that now the door is open to treat new extensions on the guide system just like any other extension that we could do in this space. So I think this is gonna be, hopefully, a launching point for others to make even more customized versions of the guide system as they see fit in the ggplot2 landscape. So really nice to see that consistency being brought in from a back end level.
And speaking of visuals, a lot of graphs these days make heavy use of gradients and patterns. Well, now those are first-class citizens in the ggplot2 ecosystem: you can supply them to the fill aesthetic, tapping into the grid system, which has built-in functions in the grid package such as linearGradient(), radialGradient(), and pattern(). So you can now have those really nice-looking gradient bar charts or gradient backgrounds for scatter plots. This looks really sharp, and it's not just gradients, either.
If you have a pattern you want to use to really distinguish a particular facet or a particular bar from the others, you can also supply patterns through scale_fill_manual(). That is really powerful stuff. There are some great examples in there of a bar chart with multiple patterns inside to really catch your eye, so to speak. It also looks like they made some important improvements to how the alpha aesthetic is applied in this situation, and that was a hurdle they had to overcome to make all this happen. So lots of great improvements there for even more custom visualization of how you display colors in your ggplot2 graph. But, of course, there is much more to this.
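Here is a minimal sketch of what that can look like, assuming ggplot2 3.5.0+, R 4.2+, and a graphics device that supports gradients and patterns (for example the cairo or svg devices); the colours, tile sizes, and data are our own placeholders, not the blog post's exact examples.

```r
library(ggplot2)
library(grid)

# A linear gradient used as the fill for every bar
grad <- linearGradient(colours = c("steelblue", "white"))

ggplot(mpg, aes(class)) +
  geom_bar(fill = grad)

# Patterns (and gradients) can also be supplied per level via scale_fill_manual()
stripes <- pattern(
  rectGrob(width = 0.2, gp = gpar(fill = "grey30", col = NA)),
  width = unit(4, "mm"), height = unit(4, "mm"), extend = "repeat"
)

ggplot(mpg, aes(drv, fill = drv)) +
  geom_bar() +
  scale_fill_manual(values = list(grad, stripes, "tomato"))
```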
[00:16:46] Mike Thomas:
Yes, there is, Eric. In terms of the scales, ggplot2 has changed how plots interact with variables created with the I() function, which I believe is from base R; it prepends the "AsIs" class to the object's class, if I'm not mistaken. What this allows you to do, from my interpretation of the blog post, is prevent some of the clashes that happen when you introduce, for example, an additional scale on your plot. One of the examples they give is when you have two calls to geom_point(), two geom_point() layers on your ggplot, using the mpg dataset, and for one of those layers, within your aesthetic, you set the color to a variable in the data, the drv drive-type variable. Then you want to layer a second set of points on top of that to serve as, like, a circle around the dots from the first layer, and you want those colors to be some predefined string of colors, like red, blue, green, that you had set. Previously, you would actually run into an error here, and it would not be able to find those colors for your second layer. But now, if you wrap the variable holding your color strings in the I() function, you can add the circles, or add this additional layer aesthetic, without it clashing with the guide, the legend, that was developed by the first layer at all. So we're on a podcast trying to describe data viz again, Eric.
The best way to check this out is certainly by reading the blog post. Another improvement here around ignoring scales is the ability, within ggplot2's annotate() function, to add text in specific locations that will not clash against multiple annotate layers, again leveraging this AsIs function, this capital I() function, which gives you greater control over where you want to annotate different text overlaid on different parts or layers of your ggplot2 chart.
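As a rough sketch of the scale-clash fix Mike describes, assuming ggplot2 3.5.0+; the colours and sizes here are our own placeholders rather than the blog post's exact example.

```r
library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
  # First layer: colour mapped to a data variable, so it gets a scale and legend
  geom_point(aes(colour = drv)) +
  # Second layer: a literal colour wrapped in I() is taken as-is and bypasses
  # the colour scale, so it no longer clashes with the legend built above
  geom_point(aes(colour = I("black")), shape = 1, size = 4)
```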
So lots of great code examples here. I think the best way to dive into this content and these improvements, which are mostly subtle and may not affect most of your day-to-day work within ggplot2, is definitely to take a look at the blog post, take a look at the code snippets, and see how they may relate to your data viz work on a day-to-day basis right now with ggplot2.
[00:19:43] Eric Nantz:
Yeah, I'm definitely seeing, especially towards the end, examples that I think have taken inspiration from the community at large around ggplot2. Some of these features that have been exposed in additional packages are now coming into ggplot2 proper. You'll see towards the end of the post some new ways to angle the orientation of labels on your various point annotations; there's an example with the mtcars dataset flipping the annotations by, say, 45 degrees or less. And then, in others, being able to do some padding around the labels too. Again, that's really neat; I think I've seen that in additional packages.
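A small sketch of angled labels; the data frame is made up, and the geom_label() part reflects our reading of the 3.5.0 release notes rather than the post's own code, so treat it as an assumption.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = 1:3, lab = c("one", "two", "three"))

# geom_text() has always supported angled text
ggplot(df, aes(x, y, label = lab)) +
  geom_text(angle = 45)

# Our reading of the 3.5.0 notes is that geom_label() now honours the angle
# aesthetic too; if your version ignores it, fall back to geom_text()
ggplot(df, aes(x, y, label = lab)) +
  geom_label(angle = 45)
```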
And, yes, certainly those who have been creating those fancy violin plots, or box plots in general, and have been really frustrated with how to deal with outliers efficiently, well, guess what? Now geom_boxplot() has an option to remove outliers entirely via an outliers = FALSE argument. Very nice. And you can still just hide them by setting the outlier shape to NA. So you've got multiple ways to handle outliers, and it's great that if you just want to wipe them out, you can wipe them out. Really nice improvements to the ggplot2 box plot options. But again, lots of very nice improvements overall, and it sounds like they wanted to roll these out more incrementally, but it just so happened that a bunch of them landed in 3.5.0.
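A quick sketch contrasting the two approaches, assuming ggplot2 3.5.0+; the dataset is just an illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy))

# New in 3.5.0: drop the outliers entirely via the outliers argument
p + geom_boxplot(outliers = FALSE)

# The older trick: outliers are merely hidden by giving them an NA shape
p + geom_boxplot(outlier.shape = NA)
```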
But again, we all stand to benefit from it. And I always like to see, at the end of these posts on the tidyverse blog and others from Posit, that they make a point of recognizing everyone who contributed to the release. So you get the GitHub handles of the numerous contributors to this particular release. Again, congrats to the team, and I look forward to putting this into production in my workflows very soon.
[00:21:43] Mike Thomas:
Yes. And you could be included in that list of acknowledgments and be famous if you find, even for the smallest use case, a grammatical issue in a vignette or some documentation. So always feel free to contribute to open source; there is no pull request too small, in my opinion.
[00:22:17] Eric Nantz:
And rounding out our highlights today, we've got a really fun post, because Mike and I have dealt with this in various ways in our respective workloads: making sure that the code we personally write, and the code we bring with our collaborators into a more central project, keeps us all on the same page, as the cliché goes. There are ways to make that easy to opt into, and in this case we have a terrific set of resources and narrative for our last highlight today, a blog post from the esteemed rOpenSci blog.
Not one, but two authors here. We've got Maëlle Salmon, who returns to the highlights yet again; her streak continues. And her coauthor, Yanina Bellini Saibene, who is a community manager at rOpenSci and a very frequent contributor in the open source and data science spaces. They have this awesome blog post titled Beautiful Code, Because We're Worth It! No, don't get worried, folks, we're not getting sponsored by a certain fashion company. But I digress. Let's dive into what makes beautiful code in the minds of Maëlle and Yanina here.
Well, let's start off with spacing. And I have to make a little confession here as I read through this the first time: it's one thing when you see, as in the example here, inconsistent use of spacing between operators, or between separated arguments and whatnot, and yeah, that can just be a little bad UX, so to speak, as you're reviewing it. But having a unified system for how you treat the space between function parameter names and operators, after the function call, and the indentation on new lines, that's hugely important for readability.
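As a tiny illustration of the kind of spacing they're talking about; this is our own made-up snippet, not one from the post.

```r
x <- c(1, 2, NA)

# Harder to scan: inconsistent spacing around operators and commas
total<-sum( x ,na.rm=TRUE )

# Easier to review: consistent spacing around <-, =, and after commas
total <- sum(x, na.rm = TRUE)
```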
But I admit they also have great advice about not necessarily putting a new line between all of your statements. I've been kind of guilty of maybe putting too many new lines between my various function calls; instead, try to group them into related chunks, so to speak. Where you have a tidyverse-pipeline-ish syntax, you might want to keep a lot of that data manipulation in one concise area, and then break things up where another part of your function starts a new operation. A lot of times in my Shiny apps, I would break things up maybe a bit too much.
But again, I think it's not so much about what's right or wrong; it's: be consistent. Be consistent with yourself, and be consistent with your collaborators. Then I think you're going to have what they envision: well-proportioned code that is easier to review and easier to debug. Another trick they recommend is that if you realize you have a lot of lines in a particular pipeline, there's nothing stopping you from extracting more fit-for-purpose functions inside that overall pipeline, to help break up the long-scrolling syndrome you might get with these longer pipelines. Being able to leverage that mechanism is really important too.
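Here is a hypothetical sketch of that idea; the function and column names are made up for illustration.

```r
library(dplyr)

# Small, fit-for-purpose helpers instead of one long pipeline
clean_sales <- function(raw) {
  raw |>
    filter(!is.na(amount)) |>
    mutate(amount = as.numeric(amount))
}

summarise_by_region <- function(sales) {
  sales |>
    group_by(region) |>
    summarise(total = sum(amount), .groups = "drop")
}

# The top-level pipeline now reads as a sequence of named steps:
# result <- raw_sales |> clean_sales() |> summarise_by_region()
```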
But it's not just about the spacing and the use of fit-for-purpose functions, obviously. There are other aspects of concise and beautiful code beyond the code statements themselves, which means we have to talk about comments now, Mike. What do they say about comments here?
[00:26:14] Mike Thomas:
Yeah. So the section title is "Not too wordy, just the right amount of comments," and they even link to a blog post on R-hub called "Why comment your code as little (and as well) as possible." This is one that I am probably super guilty of as well, just creating too much vertical space between different pieces of logic within a lot of our code. And I guess I have sort of mixed feelings on this. I think the idea is to use very self-explanatory function names or variable names, where just by looking at the code it's very easy to understand exactly what's going on.
In a perfect world, we wouldn't have to write any comments at all, because our function names and variables would be so self-explanatory. But I think we all know that's just not necessarily the case. This is something I see a lot in the open source packages that Posit, formerly RStudio, has put out over the years, and something I probably need to adopt a little better: the concept of only introducing comments when it's not necessarily self-explanatory
what's going on, when there's an additional anecdote, an additional piece of information you need to provide on top of what the logic is doing itself. Because if somebody wanted to understand exactly what was going on and didn't, they could always dive into the help documentation for each of those functions. As long as you're writing good descriptions in your roxygen comments above those functions, defining what the parameters represent, the overall goal of the function, and what it returns, then I think there are a lot of good arguments there.
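As a hypothetical sketch of that philosophy (the function and its docs are made up, not from the post): let the roxygen2 block and good names carry most of the explanation, and keep inline comments for things the code can't say itself.

```r
#' Count the distinct non-missing values in a vector
#'
#' @param x A vector.
#' @param drop_na Should missing values be ignored? Defaults to TRUE.
#' @return A single integer.
count_distinct <- function(x, drop_na = TRUE) {
  if (drop_na) {
    # Comment only the non-obvious: is.na() is TRUE for NaN as well,
    # so NaN values are dropped alongside NA here.
    x <- x[!is.na(x)]
  }
  length(unique(x))
}
```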
But again, I would agree with you, Eric, that consistency would be key here. We're probably getting more and more into a gray area of when to comment and when not to comment, but if you can set some basic, high-level rules and decisions within your team about when to comment and when not to, and try your best to follow them, I think that consistency will help your code base be more maintainable over time.
[00:28:51] Eric Nantz:
Yeah. And I do admit, being in industry versus releasing an open source package, I almost have to be two personas in one for a lot of my projects. Hear me out here; this may sound bizarre, but hear me out. There is the purist in me that wants to make things as concise as possible from a development standpoint. I know the project very well, right? I built the package, I've built this Shiny app over five years, I know the intricacies. But I need to think about: do I really want to be the only one on this project who can help maintain it and help develop a new feature? No, I want people from my team, or maybe others in the organization, to help me out from time to time.
Well, sometimes the comments I put in my package source, functions and documentation alike, are kind of serving another purpose. It's not just to highlight a particular idiosyncrasy or a particular area we need to be aware of; it's doubling as a teaching mechanism too. Often in my comments, I'll describe what the code is solving, and then I'll put a reference, and guess what, it's probably going to be a Stack Overflow reference or a blog post. Getting that right into the eyes of that collaborator is going to help me. Yes, ideally, that would all go in a GitHub issue or a dev notes journal or whatnot, but sometimes you've got to strike while the iron's hot, so to speak. When you have a collaborator looking at your code base, maybe spinning up Posit Workbench or whatnot and looking at this code, you want to put that front and center: not just to be aware of the issue, but how did I, or anyone, get the insight into how to solve it? A lot of times I don't solve these myself; I've leveraged a vast R community that has treaded those waters before, maybe for an API call or other operations like that in the Shiny space. And I am not shy about putting those links to external references in the code base itself.
Again, I'm in industry; 99% of what I do doesn't see the light of day outside the firewall. So I want to make sure that future me and future collaborators have a better understanding of why that solution is in place. So that's my mini soapbox for today.
[00:31:17] Mike Thomas:
Yeah, I couldn't agree more, Eric. And I think, at an even higher level, just having that code style guide within your organization can go a really, really long way towards getting everybody on the same page here. And I think we can all agree on a couple of these last tips from Maëlle and Yanina, on early return and the switch function. So if you have a particular function, and I think Jenny Bryan refers to this as the happy path, if there is an if statement within that function, an if/else if you will, and you sort of expect it to go down that first path most of the time, you can actually return early: have a return() call within that first chunk of your if statement, assuming it will rarely get to the second portion. That can save you a little bit of time and make your code a little more lightweight.
And, Eric, as you and I know, these little things might save, like, a millisecond, right, a change like this, and it may seem like not much. But if you consistently do this throughout your projects, those little milliseconds can add up and turn into an improved user experience, and that's regardless of whether you're developing a Shiny app or just an R package in general that others are going to be using. I think these little things, especially these early returns, can add up over time. And then the switch function is one that I don't see used enough. It is the definition of an oldie but a goodie, Eric. If you have nested if statements, if you have an if, and then an else if, and then another else if, and then another else if, to handle all the different cases of what a variable could potentially take on for a value,
please leverage the switch function. It makes it so much easier to define all of that different case logic for the different values that particular variable can take on, it looks much cleaner, and it's just much easier to handle. That's a phenomenal recommendation as well, because these long, lengthy, nested if/else statements are something I do see time and time again, way too often.
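Here is a hypothetical sketch of both tips; the function names and cases are made up for illustration.

```r
# Early return: handle the common "happy path" first and exit
summarise_value <- function(x) {
  if (is.numeric(x)) {
    return(mean(x, na.rm = TRUE))  # most calls stop here
  }
  if (is.character(x)) {
    return(length(unique(x)))      # rarer fallback
  }
  stop("unsupported type: ", class(x)[1])
}

# switch() instead of a chain of if / else if / else if ...
geom_for <- function(kind) {
  switch(kind,
    bar     = "geom_bar",
    line    = "geom_line",
    scatter = "geom_point",
    stop("unknown kind: ", kind)  # unnamed last argument acts as the default
  )
}

geom_for("line")  # "geom_line"
```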
[00:33:48] Eric Nantz:
Yeah. There are a lot of legacy projects where I fell into the if/else if/else if trap, and I definitely need to refactor those to switch() sometime. You know, I was thinking, as I'm reading through all these tips, there are some analogies you can make, especially when some of these things just aren't comfortable yet because you didn't know about them in the first place, like the early returns, the switch(), and a different naming convention. Honestly, I think it's going to be a little hard at first, especially if you have old habits like I do. I see my old code bases from four or five years ago and think, oh my goodness, what was I thinking? Well, this is similar to, frankly, keeping healthy from a fitness standpoint. It may seem uncomfortable at first, but you build up, you build up, you build up, and then suddenly, the next time you make that new Shiny app, that new R package, even just that new set of functions you're going to pass off to a colleague, these will be front and center. It won't be the old habits anymore.
Of course, easier said than done, right? You've got to start somewhere. This is fresh in my mind because I'm refactoring a seven-year-old package as I speak, and boy, oh boy, were there some issues there. Which is a great segue into how this post concludes: occasionally, if you do have the time, and I realize time is hard to come by with a lot of our jobs these days, take a little bit of time to do what they call spring cleaning of your code, seeing what gaps you can close with the knowledge you've gained from, hopefully, reading R Weekly and listening to this podcast, or other ways of learning about code development.
There's also a link in the blog post about how the tidyverse team does spring cleaning, so that might be some inspiration as well. And then, of course, take advantage of automation when you can. There is the lintr package; it's going to help you with things like those spacing and syntax issues, and it can automatically point out where they are so you don't have to scan manually. This is fresh in my mind too, because I was helping add a little new feature to one of the internal company packages that my esteemed teammate, Will Landau, maintains. He built linter checks into the GitHub Action, and I forgot to run them locally. And I was like, oh goodness, I messed stuff up.
But it pointed it out to me, then I ran it locally and got it fixed, and now we get that fancy green check mark on the PR checks. So lots of great tips here for sure.
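For anyone who wants to run the same kind of check locally before pushing, a minimal sketch with lintr; it assumes the package is installed, and the file path is just a placeholder.

```r
library(lintr)

# Lint a single file...
lint("R/my_functions.R")

# ...or everything in a package, run from the package root
lint_package()
```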
[00:36:28] Mike Thomas:
Yes, absolutely. And I think if you're a manager, especially of data analysts or data scientists, try to build in time, I know it's hard, but try to build in time at least once a year to take a day, or a couple of days, or even a week, to go through your repositories, look at that code, and see what you can do. I think they refer to it as spring cleaning here: maybe improve the code styling or do some refactoring to keep that code up to date and as maintainable as possible. As a quick story, I have a former employer who won't work with me on a project because of some code that I wrote six years ago that, I guess, broke internally recently. So they think I'm a pretty terrible R programmer, because the code I wrote six years ago is no longer working there, even though I've offered to help. I'm not going to name any names or anything like that, but I would just say: don't be that person. Understand that software needs to be maintained, managed, and improved over time, and don't judge somebody on the code they wrote even a couple of years ago, because we're all consistently learning and improving, and I don't even like looking at the code that I wrote a couple of years ago. So let's lift each other up here.
[00:37:56] Eric Nantz:
Exactly. All positive vibes. Yeah, we don't need that; we could do a whole other hour of podcast on that kind of issue, trust me. But you spotted something else in this post, Mike, because there's an opportunity for a really nice quote to live by here, right? Oh, there is a quote that I absolutely love. We need to get t-shirts made of this, Eric. I might get this tattooed on myself.
[00:38:19] Mike Thomas:
But the line in here is: "The code you don't write has no bug." Unbelievable.
[00:38:26] Eric Nantz:
Oh, my goodness. Yes. We need a shirt, whoever's listening out there. Please make this; take our money after you print it, we will buy it. I love that line, and it speaks to so many aspects of my development life. So, there's a boatload of additional resources that they link to at the end of this post. And I'll also have a link to one of the inspirations this post began with: rOpenSci recently onboarded their second cohort of champions and ran some virtual workshops, and some of those materials are online with respect to package development. So I'll have a link to that in the show notes too. And that particular external resource wowed me for another reason: they use the same Hugo theme that I did for an internal documentation site at the company about our HPC system. I was like, hey, I know this theme. That was awesome. It's great when I feel like I'm thinking similarly to all these people I look up to in the community. That was just awesome stuff. Oh, that is awesome, Eric. You know what I think we should put on the back of that t-shirt? The front could say "the code you don't write has no bug," and the back could say "the code that an LLM writes for you probably does have a bug."
Bingo. We need a patent soon, or, well, somebody's going to take that and run with it. Oh, goodness. Just kidding. You know how it goes. But we also know how it goes: the rest of the issue has a set of fantastic blog posts, new packages, updated packages, calls to action, calls to events, and everything else that you can find every single week at R Weekly. So we're going to take a couple of minutes to tell you about some additional finds that came our way that we wanted to highlight here. And, of course, me being an audio-video kind of junkie, so to speak, with doing this podcast and other media ventures.
This post here really hit home. There was a recent Mastodon post I saw from Matt Crump about how he was exploring importing MIDI audio data into R for his cognition experiments. He ended up using a mix of command-line calls to FFmpeg, which is kind of like the Swiss army knife, so to speak, of media conversion, another utility called FluidSynth, and some Python code, with a lot of shell commands. Well, Jeroen Ooms, who is of course heavily involved with the infrastructure behind rOpenSci and the R-universe project, decided to take matters into his own hands and created a package called fluidsynth to wrap some of these system utilities for bringing in and parsing MIDI data. So if you ever find yourself having to analyze MIDI files, maybe use them in a data-driven way, and then also render them to an audio file, yeah, Jeroen's package has you covered. I'll have to add that to my toolbox amongst many other great utilities in the audio-visual space in the R community.
[00:41:41] Mike Thomas:
That's a super niche little package there; I like it. I wanted to highlight a webinar series that's been going on through the R Consortium, the R/Insurance series. I believe it's hosted by two folks at Swiss Re, which is an insurance company, Georgios Bakoloukas and Benedikt Schamberger. This video series looks fantastic. Eric, I'm just going to walk you through the titles of the first few videos here. The first one is "From Excel to programming in R", great content applicable everywhere. Then "From programming in R to putting R into production."
Now I know I'm getting you more excited: "R performance culture", and lastly, "High-performance programming in R". So those are the four webinars now available through the R Consortium's website. I'm not sure whether they're going to continue to have more webinars or not, but if you are in the insurance space, or if you're into actuarial science, I would highly recommend checking out these webinars.
[00:42:45] Eric Nantz:
Yeah, what an excellent set of resources here. And I love the fact that they're being shared with others, because I know R is making big headways in the world of insurance, the world of finance, and everything else in between. Of course, I'm in life sciences, but it's great to see these tailored to that audience, with concepts that are most definitely universal to anybody in our respective industries, because you've got to start somewhere, right? More often than not, Excel is that window into data analysis that people use routinely, and then they can take that programming-based approach with R, tailored to that kind of audience, going all the way to writing highly performant code.
Yeah, that's something I'm trying to do every single day, and I can't pretend I know everything about it, so I'll definitely have to check these out. Looks like they even got wind of a certain project called Parquet, so that really caught our eyes on this one.
[00:43:44] Mike Thomas:
No, that was my journey: from Excel guru
[00:43:49] Eric Nantz:
into R, and it changed my life. There's a lot more in this issue. Of course, we're going to have to start wrapping things up here. But if you want to get in touch with us, or if you want to help with the R Weekly project itself, that's always something we welcome, whether it's your pull request contributions or suggestions; we're all just a pull request away on the upcoming issue draft, all linked at rweekly.org. That's where you'll find everything. I have an inkling that the next curator could definitely use a bit of help, if you get my drift. So, yeah, please send those pull requests the project's way.
And you can also get in touch with us directly. A few ways to do that: we have, in this episode's show notes, a handy link to the contact page if you want to send us feedback there. You can also, if you're on the Podcasting 2.0 train with your modern podcast app, and there's a boatload to choose from out there at podcastapps.com, send us a fun little boost along the way to give us a message directly from within your app itself. Details on setting that up are also in the show notes. We are also on the various social media spheres from time to time. I'm mostly on Mastodon these days at @[email protected]. I will admit I'm a little late replying back to Bruno, who's been checking in with me on my Nix journey. I have some follow-up for you; it's coming soon, trust me.
But I am also sporadically on the weapon X thing at @theRcast, and lastly on LinkedIn from time to time, popping in with some announcements and episode posts. But, Mike, where can the listeners get a hold of you?
[00:45:27] Mike Thomas:
Sure. You can find me on Mastodon at @[email protected], or you can check out what I'm up to at Ketchbrook Analytics, k-e-t-c-h-b-r-o-o-k.
[00:45:41] Eric Nantz:
Awesome stuff, my friend. And, yeah, we had a heck of a kind of therapeutic preshow session. You all didn't get to hear it, but Mike listened to my GitHub Actions rant, which may be becoming a reprex in the very near future so that I can talk about it here later. But in any event, I'm going to get back to the old day job here, so we're going to close out this episode of R Weekly Highlights, and we'll be back with another episode next week.