In this episode of R Weekly Highlights, we hear from industry experts on how they choose a programming language for their projects, a big boost to the use of Copilot for building your next Shiny app, and the learning journey of a new R user.
Episode Links
- This week's curator: Sam Parmar - @[email protected] (Mastodon) & @parmsam_ (X/Twitter)
- Which programming language should I use? A guide for early-career researchers
- Reliable Shiny Code with Copilot and Posit’s VS Code Extension
- My Journey Learning R as a Humanities Undergrad
- Entire issue available at rweekly.org/2025-W18
- R for Data Science 2nd Edition https://r4ds.hadley.nz/
- Data Science Learning Community https://dslc.io/
- IssueTrackeR https://tanguybarthelemy.github.io/IssueTrackeR
- 3MW (Scalable reporting with Quarto) https://3mw.albert-rapp.de/p/scalable-reporting-with-quarto
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- Secrets Abound - Final Fantasy - Midgarian Sky - https://ocremix.org/remix/OCR02452
- Voodoo, Roots 'n Grog - The Secret of Monkey Island - Alex Jones, Diggi Dis - https://ocremix.org/remix/OCR02180
[00:00:03]
Eric Nantz:
Hello, friends. We are back at episode 203 of the R Weekly Highlights podcast. We are a little bit later than usual because, yeah, real life happened for both of us in different ways. Poor Mike here was the victim of my diatribe in the preshow that we may or may not retread here. We probably won't. But we're happy to be back this week covering the latest highlights that have been shared in this week's R Weekly issue. My name is Eric Nantz, and, again, I'm so happy you've joined us wherever you are around the world. It is already the month of May. My goodness. Almost halfway through the year. It does not seem real, but it is a happier time of year. I always feel like once we get out of the February doldrums, things start to pick up a little bit amidst all the chaos that can occur. But as always, I'm joined by my awesome cohost, Mike Thomas. Mike, how are you doing today?
Doing well, Eric. Yeah. A couple extra days for us this week before recording has helped me fully charge my batteries at least. Oh, be thankful that you didn't have any failures on that. So Mike here is referring to a recent car mishap I had overnight, which is never an ideal time for those things. So, yeah, folks, if you do drive cars regularly, check those battery levels. Sometimes things go haywire, man. Thank goodness for warranties. Okay. The good news is I don't need a car to do this show. I am right here in the humble confines of my recording environment, and we get to talk about some really fun stuff in this week's R Weekly issue. And as always, if you're new to the process, we always have a curator that takes the issue every week. We rotate among our team of curators.
And this week, our curator is Sam Parmar. He did a terrific job as always, and he also had tremendous help from our fellow R Weekly team members and contributors like all of you around the world with your pull requests and other great suggestions. So we lead off with a very typical kind of tale in terms of building solutions: you might be facing kind of a fork in the road, whether you're new to data science or new to software development, often trying to figure out what is the best tool for the job. And sometimes that tool is, in essence, the programming language itself. And our first highlight here is actually a recent article from Nature, the Nature journal, highly regarded in the world of science, authored by Jeffrey Perkel.
And this is a very quick, very short article, but in a good way. It's very concise. There's a lot of interesting feedback from various esteemed colleagues in different industries on what they would say are key questions, and answers to those questions, for those that are new to the world of software development and coding in general, and how they can make the appropriate choice for what language they might use for their given task. So the article starts off with just what is programming in a nutshell. And guess what? Like many things in this article, it'll kind of depend on what perspective you're bringing here. A lot of programming is at a lower level.
Think of trying to build a solution that needs to be highly performant, compliant across operating systems, and, like I said, fit for performance. You're probably looking at one of the compiled languages, where you write code and there is a compiler in the back end to translate that into the machine-readable state that your computer needs to crunch through that processing. In the world of R, there are many packages that do include compiled code from C++ and, of course, historically, languages like Fortran, which I saw in grad school. But also, more recently, the Rust language has come into a lot of focus as well.
So when you're in that kind of stack of software development, these compiled languages are often what's turned to first to optimize these algorithms or these lower-level utilities. But with that said, if you're coming in from data science, you often need the ability to explore your data interactively, look at what kind of variables you have, do some interactive plots. And that's where you have the interpreted type of languages, where you're writing code and there is a process, often to the side of your code editor window, where you can send that code and get feedback right away.
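(For listeners following along in code, here's a minimal sketch of that split within R itself. This isn't from the article: R evaluates interpreted code line by line, while a package like Rcpp compiles C++ on the fly. The function below is a made-up example and needs the Rcpp package plus a C++ toolchain installed.)

```r
# Interpreted: each line is evaluated immediately, ideal for exploration
mean(mtcars$mpg)

# Compiled: Rcpp translates this C++ into machine code behind the scenes
Rcpp::cppFunction("
  double sum_sq(NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); ++i) total += x[i] * x[i];
    return total;
  }
")
sum_sq(c(1, 2, 3))  # returns 14
```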
And here you're looking at, of course, the R language as well as Python or MATLAB as various options depending on where you go for your scientific needs. And more recently, there are frameworks like Quarto that let you put multiple languages in one place if you wanna hop between R, Python, JavaScript, and Julia in terms of running that report or that data analysis. So with that said, there's also, obviously, the world of web interfaces, where, of course, Mike and I are huge fans of Shiny, available in both R and Python. There are other ways that you might wanna look at the language of choice in that domain. But then we get to the other meat of the article: these various industry experts, how they are choosing the language they're using, and what is leading to those decisions.
So we have Eduardo Secrete. Hopefully, I'm saying that right. He is doing a lot of applied statistics research in The Netherlands, often looks at multiple languages, and looks at the ecosystem around them, like how many packages are available. Are they in his specific domain of psychometrics? And that's where things like MATLAB have been very appropriate for him. But then there are others in the community, such as Yanina Bellini Saibene, who you've heard quite a bit in the world of rOpenSci as their community manager, along with other initiatives.
She is a big fan of R because often, for everything she needs to do in R, there's a package for that. She jokes in the article that the only thing R doesn't do is make her breakfast or coffee in the morning. I'm sure there's gonna be a package for that someday. You know? What else is new? That will be a big hit whenever it hits. And so that's great advice too: just what do you need to do, and does the ecosystem support it? Another key aspect is the type of data you're looking at, especially in the world of genomics, where not only is the data highly specialized, but the volume of it as well. It has been known for a while that within the R ecosystem, the Bioconductor suite of packages is very important in the world of bioinformatics research.
And that's where having that available to you to import, say, those gene expression data files, and to be able to have custom classes that are tailored to that type of data, is really, really important. Another situation is, again, the size of data. A bioinformatician named Titus Brown comments in this article that a lot of hardships can occur when somebody's new to R. It's working fine for smaller data, but when they get to analyses that involve thousands of genomes and other genetic data, you might have to look at other languages that have packages ready to deal with large data. He references that Python might have a broader array of tools to do that. I think R has come a long way in the world of big data too. You just gotta know where to look. Sometimes it's not obvious to a new user, but for being able to manage data in a file format, we're big fans of things like DuckDB.
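(For the curious, here is a minimal sketch of that larger-than-memory workflow from R with DuckDB; the Parquet file and column names are hypothetical.)

```r
library(DBI)
library(duckdb)

# DuckDB scans the Parquet file in place; nothing is loaded into R
# until the (much smaller) summarized result comes back
con <- dbConnect(duckdb::duckdb())
top_genes <- dbGetQuery(con, "
  SELECT gene_id, AVG(expression) AS mean_expr
  FROM read_parquet('expression_data.parquet')  -- hypothetical file
  GROUP BY gene_id
  ORDER BY mean_expr DESC
  LIMIT 10
")
dbDisconnect(con, shutdown = TRUE)
```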
Hopefully, things like that start to gain more of a foothold in the world of bioinformatics. And then lastly, another key consideration is, when you're encountering issues, where can you find help? That's where, again, Yanina has a great comment about the R language having a welcoming community of everyone supporting each other, as well as extensive documentation, both for the core R language and for the packages that are available in many of these industry-specific domains. And with the advent of GitHub, many people are sharing packages in the open, so you have an open dialogue to file an issue if you find a problem with a package. Both R and Python are very much not strangers to that area. And with the advent of AI as well, that's gonna be another consideration, as a lot of newer students who are newer to the language are probably leveraging a large language model to help with some of that. And at least we are seeing that, with the right models, you can get some help with R or Python coding.
But, you know, we could go on for hours about how to best use AI responsibly. I think in a pinch, it can definitely help you out, especially for some esoteric needs. So really great perspective from multiple industry experts in this field. And is there a winner in all this? Well, no. We're not gonna pick a winner. This is about, in your specific domain, there may be an ecosystem around a language that's already ready for you to take advantage of to get your job done, and you really want to start where you see the most wide usage and a great community around it. So, of course, we're big fans of R here. It literally does almost everything I need. Of course, I am venturing into other niche languages or niche frameworks like JavaScript when needed. But R is still the engine that performs all my analysis needs at the moment. And it's great to see, if I were new to the game, just what kind of questions I should be asking if I'm looking to make an optimal decision.
[00:10:13] Mike Thomas:
Yeah. I really enjoyed it. It kicks off with a discussion about how Python has sort of overtaken JavaScript, from, I think, some GitHub research that was done. And I'm sure that's in large part due to everything going on with AI, but that's a pretty big deal. And I think there are probably also a lot of data scientists and data analysts out there that may not consciously always realize that they're developing software. Right? Sometimes if we're just playing around in R, it can feel like magic, or like a power version of Excel if you came from that world. Right? And I really like Jeffrey's articulation of compiled versus non-compiled languages. That was really helpful for my own understanding, as well as his articulation of sort of the long-standing debate over the usefulness of notebooks and the struggles that they have with reproducibility.
And the shout-out to the Marimo project, which seems to be a great option. It sort of creates a dependency graph in the background, from what I understand, and when you change a value in one cell, the dependent cells will be updated accordingly. Kind of feels like, or sounds like, targets to me. I haven't tried it myself yet. We use Quarto, you know, because it's easy to switch between R and Python within the same notebook tool. But if I were more exclusively on the Python side, I think it would be my tool of choice, it sounds like. And I also think sort of that big data discussion has pretty much been squashed, right, with the likes of DuckDB and the Parquet format, and now we have APIs from whatever tool we choose to be able to successfully leverage those technologies from R or Python.
And Jeffrey rounds out the article with a couple different resources, including the Carpentries and the Data Science Learning Community. Those are two of my favorite resources. And maybe the last thing that I'll touch on is that I agree with him that there's a lot of benefit, in my opinion, in choosing the tool that your colleagues are using, and Eric, I think you'd share that sentiment as well. But I also think there's value in really going deep in one particular tool and learning it really well, one particular programming language, I mean. I think then it will become easier to adopt the next tool, as opposed to trying to learn two simultaneously.
You know, I did this with R, going really deep into R over multiple years before I really started to try to pick up Python, and I truly believe it helped me pick up Python a lot quicker than I would have otherwise. You know, though I'm no expert, I can get around pretty well in Python these days. And a lot of that stems from Googling, Stack Overflow, ChatGPT, whatever you wanna call it, saying, hey, I do this thing in R. How can I do this in Python? Right? And knowing those keywords to ask is really what unlocks you to be able to get that answer, to be able to incorporate that functionality in that less comfortable programming language that you're trying to leverage and use. So I think that's been a really good strategy for me, and hopefully, it's helpful for others. But this is a really interesting blog post. I thought it was sort of unique for R Weekly Highlights, this type of a discussion, and really enjoyed it.
[00:13:23] Eric Nantz:
Yeah. Me as well. And one thing that really helps as you're trying to go deep in these languages is that, even though they can have fundamental differences in some bits and pieces, the core concepts carry over. I do share your experience there. If I had started with a language that was deep-rooted in, say, object-oriented principles and the other more traditional frameworks that R and Python utilize, versus starting with SAS, it would have been a different story. It was really hard to translate what I had learned, with SAS being, quote unquote, my second language. I don't even count Java because it was a nightmare back then.
Going from SAS to R was a massive jump, to say the least, because they were so, so different. So, without me getting on my soapbox about the whole SAS stuff, I think we're seeing that, whether R or Python, that's what most people in data science are getting introduced to when they get to their respective coding classes. So I think the principles you learn there will set you up for success even when you do have to venture off into some of the more niche sides of programming. Obviously, we didn't hear about Julia much in this article, but I know that's getting a lot of momentum as well. But, again, with it striking the balance of the open source paradigm and the object-oriented paradigm, I think, with the resources out there, the key is seeing the community around it, knowing how to ask the questions that you have, and hopefully having a mentor along the way or a user group you can talk to.
Again, we're biased in the R ecosystem. We've got the great Data Science Learning Community. Wonderful place to join if you're new to the language. You have so many people ready to help you out with the book clubs and other adventures. That's where you have to go kind of outside the confines of just what the language documentation has to offer you. But the time is now to take advantage of those resources. And up next in our highlights, we did talk about some of the newer ways you can get help for information. And one of the ways that came from a code helper perspective a few years ago is when Microsoft introduced the Copilot functionality in Visual Studio Code and in GitHub generally.
And what can be nice is using that to help develop, say, that snippet that you need for that function you're trying to put in your Shiny app, for example. When GitHub Copilot first came out in the very early days, admittedly, I was kind of intrigued by it because I had never really used a code completion thing before. I did try it out. It left a lot to be desired in the world of Shiny development, and I was kinda turned off after that. Now this was about two years ago. I knew things were gonna get better. I just didn't wanna wait around that long, and I kinda gave up on it. Well, we have learned since then. There are some interesting advancements, not just in Copilot in general, but in the way that you can develop a Shiny app with Copilot.
And so our next highlight comes from Peter Stryanko, who is one of the leading AI engineers and thought leaders at Appsilon. First, he had given a wonderful talk at the recent ShinyConf, which I invite you to check out in the replays if you wanna look at that after the fact. But his article on the Appsilon blog here is talking about building more reliable Shiny code with GitHub Copilot, along with a new extension in the VS Code ecosystem from Posit. So what are we talking about here? Well, actually, getting back to that ShinyConf that concluded a week or two ago, there was a keynote by Winston Chang that, about midway through, introduced some recent work that the Posit team has been doing: a new tag in Copilot via an extension in VS Code, specifically the @shiny tag.
So what does this actually mean in practice? Well, let's say you install the Shiny Visual Studio Code extension, which, again, is available in the VS Code extension marketplace. And if you have Copilot already wired up, you just fire up an interface for that chat in VS Code. And then, when you're ready to ask it a Shiny-specific question, you put the @shiny tag before your request. This was not around when I first tried Copilot. There was no @shiny tag, or even an R tag for that matter. So I was kinda trying to narrate what I wanted, and I would just get a whole bunch of junk out. But in this example here, he uses the @shiny tag with the prompt: create a simple app with a drop-down menu to select a variable from the mtcars dataset and then a plot showing that variable's distribution.
Now, that @shiny tag is doing a lot under the hood. What it's doing, basically, behind the scenes, is injecting the prompt that is sent to these AIs with much more additional context around Shiny itself. Whereas, if you don't have that @shiny tag, it's just gonna kinda go through the typical resources that the model's been trained on. So he's got an interesting before-and-after situation here where he tries asking this question without the @shiny tag versus with it. And there is a stark difference. Without the @shiny tag, it looks like a gobbledygook of custom API, JSON files, CSS files, YAML files, and other weird stuff under the hood. It ain't Shiny, folks. I mean, we can tell that much.
Whereas, if you look at the example with the @shiny prompt, after some questions the assistant gives back to the user, in this case, which language, R or Python (he chooses R), and then, okay, where would you like the app to be put (it makes a subdirectory for it), sure enough, out comes a simple Shiny app already using bslib under the hood. So that's great, you know, taking advantage of that. It's a streamlined app, but it got the job done, apparently. So if you are developing Shiny and you're leveraging the Copilot extension, you owe it to yourself to try this out, because I think you're gonna get much better results than if you don't have the @shiny tag to help you out with the prompt behind the scenes. Now, again, I have not tried it with the new @shiny tag. It's on my list to do.
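(To give a flavor of what that generated app might look like, here is a minimal sketch along the lines the prompt describes, a select input over mtcars plus a histogram, using bslib. This is an illustration, not the article's actual output.)

```r
library(shiny)
library(bslib)

ui <- page_sidebar(
  title = "mtcars explorer",
  sidebar = sidebar(
    # Drop-down of every column in mtcars
    selectInput("var", "Variable", choices = names(mtcars))
  ),
  card(plotOutput("dist"))
)

server <- function(input, output, session) {
  output$dist <- renderPlot({
    hist(mtcars[[input$var]],
         main = paste("Distribution of", input$var),
         xlab = input$var)
  })
}

shinyApp(ui, server)
```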
I wanna make a note for those who may be wondering, wait a minute, Eric. Why didn't you mention Positron? This is not available in Positron yet. I know that's in the works, so we may not see that until later this year. But if you're on VS Code already, yeah, maybe give it a shot and see if it can bootstrap an app for you in a seamless way. So, again, caveats abound, but, hey, there's progress to be made here.
[00:21:04] Mike Thomas:
Yeah. Eric, I really appreciate this work by the Posit team. You know, I'm sure it's no small effort to convert all of the Shiny documentation to markdown that can be injected into the user prompt. I'm assuming that's sort of the approach that they're taking, and it sounds like they did so on both the R Shiny side and the Shiny for Python side. So I'd be really interested to learn about how they went about doing that, I'm assuming in some sort of a programmatic approach, so that when that R Shiny package gets updated, or when the Shiny for Python package gets updated, that @shiny tag in the VS Code extension will reflect those updates for the current best practices for those two packages.
And this prompt stuffing approach is one that we use very often to, in effect, pretend that the LLM was trained on a specific set of documentation or context that we want it to know about. With this approach, you're definitely going to get way better results than you would without doing any of this prompt stuffing. But behind the scenes, my understanding is that the LLM is sort of using a combination of the prompt that you provided, that context, as well as what it was trained on, and, hopefully, weighting your prompt much more heavily than the context that it was trained on. But I think you're still running probably a non-zero chance of it hallucinating on a particular question that you're going to ask. So, like any of these AI solutions, don't take its output as pure gospel. Make sure that you have some sort of a workflow and approach where you're ensuring that you have the documentation for Shiny pulled up on the other screen, and you're using that as a gut check, leveraging the LLM to try to get you to your final result faster, as sort of a pair programming guide, which I think is a fantastic way to do it. But, yeah, I think this is, as I mentioned before, going to get you way better results than probably what you've been trying to do in terms of asking ChatGPT to write your Shiny apps for you.
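(For those curious what that prompt stuffing looks like from R, here's a minimal sketch using the ellmer package. The model name and the docs file are assumptions for illustration, not anything from the article or the extension itself.)

```r
library(ellmer)

# Prompt stuffing: prepend reference docs to the system prompt so the
# model leans on them rather than only its training data
shiny_docs <- paste(readLines("shiny-docs.md"), collapse = "\n")  # hypothetical file

chat <- chat_openai(
  model = "gpt-4o",  # assumed model choice
  system_prompt = paste(
    "You are a Shiny assistant. Prefer the documentation below",
    "over your training data when they disagree.",
    shiny_docs,
    sep = "\n\n"
  )
)

chat$chat("Create a simple app with a dropdown to pick an mtcars variable.")
```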
[00:23:17] Eric Nantz:
I actually am intrigued to try this out for a real case, I would say, where, without revealing too much here, I'm about to get an influx of requests from different statisticians at my company to help build some Shiny apps that may vary in level of complexity. They may start small; they may end up getting bigger and whatnot. That's beside the point. I've been asked to at least look at sketching something out to give a demo that may or may not get the project green-lit for more robust development. I've been contemplating whether I try something like this to bootstrap the initial version of it. Now, putting your thoughts on this, Mike: do you think I should, if I were gonna do this, ask it to start building an app that kinda follows what you and I typically prefer for our app structure, i.e., a golem-powered app as a package?
Or do you think that might just be a bit too much for it to do right away? Maybe I should stick with a more traditional app layout, and then down the road I convert it to a golem app. Do you think an LLM could handle something like that?
[00:24:27] Mike Thomas:
I don't know. It's a good question, you know. I think, in a similar way to how Posit has converted or leveraged the documentation for Shiny for Python and R Shiny, maybe you could also stuff the context from the documentation for golem or bslib into that prompt as well, I would imagine. These context windows are getting bigger and bigger. So I think it's worth a try. But if you were to do that approach without providing it any context around golem, I wouldn't have a lot of confidence that it's gonna give you great results.
[00:25:10] Eric Nantz:
Yeah. I've learned that when I'm building these initial prompts, I'm a very detail-oriented person with these things. It was the same back in the old days, and when I say old, this is only, like, ten years ago. If I had a colleague who was instructed to help me with programming support for, like, a custom biomarker analysis, I would give him or her very detailed specs: here's the input data, here's the layout I'm looking for, just code this up, and then let me review when you're ready. And I would always leave no stone unturned with, like, the types of variables to look at, the type of output, these considerations, any derivations that they needed to be careful on. I typically use that same approach for prompts, but I always wonder, am I being too detailed about it? So far, I haven't been. But I think, like you said, the context is really important. I can't expect that in only a few sentences this thing is gonna know what to do, because what human would be able to do that either? So another perspective to keep in mind, I guess. Yeah. No. I envision a world, I think everybody does,
[00:26:12] Mike Thomas:
where someday, instead of using this approach to provide additional context to the LLM about what you want it to know, we'll have a better approach to fine-tuning, I guess they call it, these LLM models, so that it really only has the knowledge of the context that you want it to have.
[00:26:34] Eric Nantz:
Yep. And I've been seeing bits and pieces in these areas. You often see, within industries, some of these specific bots or models being shared on Hugging Face or in other areas. And, yeah, I'm really intrigued to see where this ecosystem goes. And, yeah, I'll give this a play. And in the end, would I ever just sign off on something without reviewing it first? Oh, heck no. If any of my colleagues are listening to this, don't worry. I'm not gonna throw something over the fence if I don't vet it first. So take that to heart, folks.
[00:27:09] Mike Thomas:
Well, if at Ketchbrook we develop an AI agent for Shiny apps specifically, I promise we'll name it Eric.
[00:27:16] Eric Nantz:
That's it, man. Game over, man. It's game over. Well, our last highlight kind of goes back to what we talked about at the very beginning, for those that are maybe new to programming and are weighing what kind of choice to make for the programming language they're gonna use to accomplish a certain task. Well, let's say you've already made that choice: you're gonna use R to accomplish that task, and maybe you don't even know what that task is yet. Yeah, you're just learning from the ground up here. Our last highlight is an interesting perspective on what's been helpful for a recent undergraduate learning R in the humanities area.
So this blog post comes to us from Bruno Ponne. He is now a data analytics consultant at the Data School in Deutschland. He is talking about some of the things he has encountered in applying data science type principles to history and the humanities. His first encounter with R itself was during his master's studies at the Hertie School in Berlin, in their statistics department. There were a couple mandatory courses, and one of them covered some of the statistical concepts that he became interested in: things like validity, selection bias, and principles like regression to the mean. They kind of captured his interest, and he noticed that, hey, these could be applied to multiple industries, not just, like, statistics as a whole.
So within that kind of area, he encountered his first R programming assignments, and it felt a little frustrating to him. It was just totally new to him: lots of function syntax that got a little frustrating, debugging errors, and he was trying to figure out how to best proceed. This is where, going back to what I said in the earlier part of the show, knowing where to get help and knowing any communities around that language can be extremely helpful. In his case here, Bruno discovered that there was, as we've heard in the community for years, access through his master's program to a platform called DataCamp.
And for those who aren't aware, and this is definitely not free advertising, I'm just saying what they do: DataCamp offers focused courses with video content and in-browser exercises, all within their platform. You don't have to install anything on your system. Now, Eric's editorial comment here: you may wanna think about different sources than that, but we'll leave it at that. With that said, there are plenty of services like DataCamp out there. If you're interested in some more interactive content, that can be a great place to jump-start your education. So, like I said, there's a lot out there. Definitely let us know if you're interested in what those are. We're happy to send you links about those other services.
But once you get over that little hump, if you will, in your development, he gets to actually use some of the things he's been learning in this domain that he's looking at in policy analysis, really starting to leverage those statistical concepts mentioned earlier within R to do visualizations, and maybe do some ad hoc simulations to illustrate the impact of certain concepts. And then, of course, like all of us, he became a victim of the pandemic, which changed the world in a lot of ways, where we often were confined to our various homes or other locations. And because of that, apparently due to budget considerations, his institution stopped providing that DataCamp access.
And that's when he turned to platforms like Stack Overflow to get questions answered, and also started reading more of the official documentation for functions and packages. As we know, in the R ecosystem there are varying levels of documentation for packages. Most of the time, the ones that are robustly developed will have great documentation he can draw upon, but there's more than that. He also turned to the great books that are specific to R and the different ecosystems around R. He mentions O'Reilly Media, which, of course, has been a great publisher of various books like R for Data Science and the like. There are many others out there, like CRC Press and others.
He mentions the R for Data Science book. We literally, at the day job, had an open office hour where a statistician called us and said, hey, I'm just picking up R again after many years. Where can I go to learn more? And both my colleague, Will Landau, and I pointed out that R for Data Science is a great place to start. So we'll have a link to that in the show notes. It was also the genesis of the Data Science Learning Community. So lots of great resources out there if you're new to the language, to make you feel not so alone in learning about all this.
And touching back on the last highlight: yes, in this day and age, AI, when used responsibly, can be another aid in that journey. But this is where I'll add my 2¢. I do stress getting a pretty solid foundation in your understanding first before you start vetting the AI solutions, because, depending on which model you're running and what kind of question you're asking through that prompt, you might get a solution that would have been great, say, five years ago, but is maybe missing some of the more modern approaches that the tidyverse is giving you.
Or, let's be real, a real example might be: hey, I'm dealing with this large data, how do I deal with memory management? It may not pick up some of the newer advances like DuckDB or Parquet that Mike mentioned just a few minutes ago. So just be careful when you're using those AI prompts to help you learn something; do build that foundation first through these more traditional resources that, if you're new to the language, I think can get you really far ahead of the curve, as well as just reviewing other people's code. Let's face it, in the world of Shiny, I'm always reviewing what people like David Granjon share, and, Mike, when you share an app online in public, I'm always reviewing your stuff, and many others that help inform my style and my learning. So I can't stress enough leveraging GitHub, looking at repos of packages or apps, or even just TidyTuesday analyses if someone shares their code. That's a great win in and of itself. So in this day and age, not to be like that guy that says, get off my lawn, but we didn't have this stuff when I was learning R. So take advantage of these resources. Hearing Bruno's perspective is certainly insightful: how he started from being both new to R and new to statistics to really loving the use of the language and getting some really interesting analyses completed. So, again, great food for thought here, and I think a very relatable blog post for many people listening now.
[00:34:56] Mike Thomas:
Yeah. I agree. And, you know, this blog post takes me back to some of my own journey learning R, some of the things that I wish I had known, and some of the things that did help me or hurt me along the way. I really like the key takeaway around visualizations being a really great way to see data come to life and to put meaning right towards the code that you're developing, and actually sort of see the value firsthand there, right in front of you. As well as having a goal or a project that you're trying to work on and using R as a tool to get you to that end goal, as opposed to just sort of trying to blanket-learn a programming language without any specific use case that you're trying to tackle.
This might sound silly, but one other thing that helped me when I was learning data science, maybe less specific to programming but data science in general: podcasts. I was a huge consumer of podcasts. So if you're trying to learn R, maybe the R Weekly Highlights podcast, or just rweekly.org, shameless plug right now, could be a great resource to help you keep up with the conversations of everything that's going on, the verbiage, and some of the acronyms that get used that you may see online in some of your research as you're trying to learn this. That may help some of these concepts start to click a little bit better. But overall, I think a really great rundown. This was sort of nostalgic for me, taking me back to my journey as well, but some fantastic resources for folks who are trying to pick up R.
[00:36:39] Eric Nantz:
You give me the feels, man. That's why I started the R Podcast back in the day: because, a, no one else was doing it, and, b, I had learned so much about Linux through podcasts. I thought, why isn't anybody doing this in data science and R in particular? And I do remember, in the early days, there were some listeners out there who said, oh my goodness, this is such a massive help compared to just reading that online doc. So, yeah, I think we can plug what we're doing here. I have heard people say it's been quite helpful, because in this day and age there are so many ways to consume content. So when you're doing the dishes or you're mowing that lawn and you wanna level up your R game, tune in to the back catalog of R Weekly Highlights. We cover a lot of things, both from our own experiences and, more importantly, highlights like this that share some great insights too. And like I said, the online resources in general, there's so much out there.
Sometimes it can be overwhelming at first. That's why, again, I'm gonna plug it one more time: Jon Harmon, who runs the Data Science Learning Community, a top-notch place to go to. Again, we'll have a link in the show notes. That is a wonderful way to collaborate with others no matter where you are in that learning journey. So a great post here by Bruno, and, certainly, I'll keep a lot of these things in mind as I hear from others that are also new to the language. And as we said, there is a lot going on in R Weekly, not just the selection of highlights we talk about. So we invite you to check out the full issue, which, of course, we link to every time in the episode show notes. And I do wanna give a very quick shout-out to an additional package that I found in the packages section that I think will be useful now, and would have been even more useful a few years ago. There was a time in a day-job situation where we had a project manager that wanted the status of all my issues on my GitHub repo for a big, large-scale project, because she wanted to feed that information into her, I guess, Gantt chart visualization thing to track milestone forecasting and all that jazz.
So I stitched together some custom code against the GitHub API in R using the gh package to kind of pull all this stuff down and do some massaging. Well, there's a new package that does that for you that just hit CRAN, and it's called IssueTrackeR, authored by Tanguy Barthelemy. Probably butchered that name. But it does what you might expect on the tin. It's got some great functions to retrieve a project's issue information: the issues themselves as well as milestones and other interesting metrics. You can then save that to a local area so that when you wanna do processing on this data, you don't have to keep hitting the API. You can have a cached version that you can refresh at any point.
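(For reference, the kind of raw gh call Eric describes stitching together, which IssueTrackeR now wraps with caching and filtering conveniences, might look something like this; the repo names are hypothetical.)

```r
library(gh)

# Pull all open issues for a repo, paginating through the GitHub API
issues <- gh(
  "GET /repos/{owner}/{repo}/issues",
  owner = "my-org",      # hypothetical
  repo  = "my-project",  # hypothetical
  state = "open",
  .limit = Inf           # fetch every page
)

# Massage the JSON response into a small data frame for reporting
issue_df <- data.frame(
  number  = vapply(issues, function(x) as.numeric(x$number), numeric(1)),
  title   = vapply(issues, function(x) x$title, character(1)),
  created = vapply(issues, function(x) x$created_at, character(1))
)
```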
Once you have that cached copy, you can update it when you need to, if you know there have been new issues filed or whatnot. There are also some convenience functions to filter the issues based on fields, maybe certain values or keywords, as well as some default sorting you can do of the issues based on, again, maybe milestones or other metrics. I would have used the heck out of this back then, and frankly, I may use it now, because guess what? As much as I use GitHub project boards for my issue management, I live in a Jira shop, so sometimes I might have to pipe some stuff from GitHub to Jira, and maybe I just use this as an intermediary. I don't know. Mike's already scowling at that, so I'm sorry for bringing up bad vibes there. But, nonetheless, IssueTrackeR is definitely a package I'm gonna be looking at. No. Sorry. I was just recently
[00:40:27] Mike Thomas:
added to an external team that uses Jira, and it's sort of my first foray into using Jira. I'm still getting adjusted to it. Let's put it that way. But that's a great callout. You know, I'd be remiss not to call out Albert Rapp's Scalable Reporting with Quarto blog post that just dropped this past week. It's a use case that I think a ton of people have. And, as always, Albert does an awesome job of walking us through that concept.
[00:40:54] Eric Nantz:
Yeah. I love using Quarto in many different ways here. And with the possibilities you get once you use the parameterized report functionality, there's just so much that's possible. And also check out his back catalog, if you will, on his site. He's also been talking recently about Quarto with Typst for doing optimal PDF reports and theming those up. So if you're in the world of static reports, you have our sympathies, because, man, I love the HTML lifestyle. But if you are in that space, Typst is something I'm keeping an eye on for some really attractive PDF reports with those hints of the CSS kind of styling you get in web reports. So definitely check those out.
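(A minimal sketch of that parameterized-report pattern; the file and parameter names are made up for illustration. The report declares its inputs in the .qmd YAML header, and then you render once per value from R.)

```r
library(quarto)

# report.qmd (hypothetical) declares its inputs in the YAML header:
#   params:
#     region: "all"
#
# and references them in code chunks as params$region.

# Render one report per region from a single template
for (region in c("north", "south", "east", "west")) {
  quarto_render(
    input          = "report.qmd",
    output_file    = paste0("report-", region, ".html"),
    execute_params = list(region = region)
  )
}
```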
[00:41:38] Mike Thomas:
Yes. Typst is awesome. Super, super fast compared to LaTeX. And it's pretty much, I think, built into your install of Quarto. So if you have Quarto installed, then Typst is installed. No TinyTeX needed.
[00:41:52] Eric Nantz:
Yeah. And that can be really important, especially for those in the enterprise that have a hard enough time asking their IT admins, can I have TinyTeX or can I have this LaTeX thing? You might get a, nah, no. So anything that comes bundled in can be a big help there. But like I said, R Weekly bundles up all sorts of great content. We just mentioned a couple additional finds, but you may have an additional find of your own just by reading that issue. We'd love to hear about it too. And speaking of hearing about it, of course, our project largely depends on the community for help. And that's where, if you find that great new resource, we are just a pull request away. Everything's on GitHub.
It's all Markdown, all the time. Markdown: if you couldn't learn it in five minutes, the author of knitr would give you $5, back in the day. Maybe not now, but at least he told me that ten years ago. Nonetheless, you can file an issue or a pull request right from our GitHub page. It's linked in the top-right corner of the R Weekly issue on the R Weekly site. And if you wanna get in touch with us, there are multiple ways of doing that. We have a contact page in the episode show notes if you wanna send us feedback in the traditional way. I will get that in our fancy R Weekly inbox, and I will be able to share it on the show if you're interested. Also, you can get in touch with us on the social medias out there. I am on Bluesky, where I'm @rpodcast.bsky.social.
And, also, I am on Mastodon with @[email protected]. Oh, those are hard to keep straight sometimes. And I'm also on LinkedIn. You search my name and you'll find me there. And, Mike, where can the listeners find you?
[00:43:27] Mike Thomas:
You can find me on Bluesky these days at @mike-thomas.bsky.social, or you can find me on LinkedIn if you search Ketchbrook Analytics, K-E-T-C-H-B-R-O-O-K. You can see what I'm up to.
[00:43:44] Eric Nantz:
You bet. Always a great follow, the hair on LinkedIn and the like. And, yeah, I'm trying to keep my head above water this week. I have some recent day-job projects, and I've got some more open source stuff I gotta get back into, like some, quote unquote, conference package development for my upcoming talk at posit::conf. I gotta button some things up there. So we're really happy that you joined us for this episode 203 of R Weekly Highlights. And, hopefully, unless real life gets in the way, we'll be back with another edition of R Weekly Highlights next week.
Hello, friends. We are back at episode 203 of the Our Weekly Howards podcast. We are a little bit later, than usual because, yeah, real life happened for both of us in different ways. Poor Mike here was the victim of my diatribe in the preshow that we may or may not retread here. We probably won't. But we're happy to be back this week with covering the latest highlights that have been shared in this week's our weekly issue. My name is Eric Nance, and, again, I'm so happy you've joined us wherever you are around the world. It is, already month of May. My goodness. Almost halfway through the year. It does not seem real, but it is a happier time of year. I always feel like once we get out of the February doldrums that things start to pick up a little bit amidst all the chaos that can occur. But as always, I'm joined by my awesome cohost, Mike Thomas. Mike, how are you doing today?
Doing well, Eric. Yeah. A couple extra days for us this week before recording has helped me fully charge my batteries at least. Oh, be thankful that you didn't have any failures on that. So Mike here is referring to a recent car mishap I had the overnight, which, never an ideal time for those things. So, yeah, folks, if you do drive cars regularly, check those battery levels. Sometimes things go haywire, man. Thank goodness for warranties. Okay. The good news is I don't need a car to do this show. I am right here in the humble confines of my, recording environment here, and we get to come talk about some really fun stuff in this week's Our Weekly Issue. And as always, if you're new to the process, we always have a curator that takes the issue every week. We rotate among our team of curators.
And this week, our curator is Sam Parmer. He also did a terrific job as always, and he also had tremendous help from our fellow Aruki team members and contributors like all of you around the world with your poll requests and other great suggestions. So we lead off with a very typical kinda tale in terms of building solutions, and you might be facing kind of a fork in the road, wherever you're new to data science, new to software development, often trying to figure out what is the best tool for the job. And sometimes that tool is, in essence, the programming language itself. And our first highlight here is actually a recent new article from Nature, the Nature journal, highly regarded in the world of science, authored by Jeffrey Pirkle.
And this is a very, quick, you know, very short article, but in a good way. It's very concise. A lot of interesting feedback from various esteemed colleagues in different industries on what they would say are key questions and answers to key questions for those that are new to the world of software development and and coding in general, and how they can make the appropriate choice for what language they might use for their given task. So the article starts off with, you know, just what is programming in a nutshell. And guess what? Like many things in this article, it'll kinda depend on what perspective you're bringing here. There are a lot of programming is at a lower level.
Think of trying to build a solution that needs to be highly performant, cross, operating system compliant, and like I said, you know, very, you know, fit for performance. You're probably looking at one of the compiled languages where you write code. There is a compiler in the back end to translate that into the machine readable state that your computer needs to crunch through that processing. In the role of r, there are many packages that do include compiled code from c plus plus. We are not and, of course, historically, languages like Fortran, which I saw in grad school, but also more recently, the Rust language has come, you know, in a lot of focus as well.
So when you're in that kind of stack of software development, these compiled languages are often what's turned to first to optimize these algorithms or these lower level utilities. But with that said, if you're coming in from data science, you're often used to having the you often need the ability to explore your data interactively, look at what kind of variables you have, do some interactive plots. And that's where you have the interpreted type of languages, where you're writing code, and then there is a process that's often to the side of your code developer window where you can send that code and get feedback right away.
And here you're looking at, of course, the R language as well as Python or MATLAB as various options depending on where you go for your scientific needs. And more recently, there are frameworks like Quarto that let you put multiple languages in one place if you wanna hop between R, Python, JavaScript, Julia in terms of running that report or that data analysis. So with that said, there's also, obviously, the world of web interfaces where, of course, Mike and I are huge fans of Shiny, available both R and Python. There's other ways that you might wanna look at the language of choice in that domain. But then we get to, you know, the other meat of the article about these various, industry experts and how they are choosing the language they're using and what is leading to those decisions.
So we have, from, Eduardo Secrete. Hopefully, I'm saying that right. He is doing a lot of applied statistics research in The Netherlands, often looks at multiple languages, and looking at the ecosystem around them, like how many packages are available. Are they in his specific domain of psychometrics? And that's where things like MATLAB have been very appropriate for him. But then others in the communities such as Yanina Balinese Sabini, who you've heard quite a bit in the world of rOpenSci as their community manager along with other initiatives.
She is a big fan of r, because often everything she needs to do in r, there's a package for that. She jumps in the article. The only thing r doesn't do is make her breakfast or coffee in the morning. I'm sure there's gonna be a package for that someday. You know? What what else is new? That that will be a big hit whenever that hits. And so that's that's great advice too. Just what do you need to do, and does the ecosystem support it? Another key aspect is the type of data you're looking at, especially in the world of genomics where not only is the data highly specialized, but the volume of it as well. It has been, you know, known for a while that within the our our ecosystem, the Bioconductor suite of packages is very important in the world of bioinformatics research.
And that's where having that available to you to import, say, those gene expression data files and to be able to have custom classes that are tailored to that type of data is really, really important. And and other situations is, again, the size of data. A bioinformatician named Titus Brown comments in this article that a lot of the hardships that can occur is when somebody's new to r. It's working fine for smaller data, But when they get to analysis that involves thousands of genomes and other genetic data, you might have to look at other languages that have packages ready to deal with large data. Your references, Python might have a broader way, array of tools to do that. I think R has come a long way in the world of big data too. You just gotta know where to look. Sometimes it's not obvious to a new user, but being able to manage data in a file format, we're big fans of things like DuckDV.
Hopefully, things like that start to take a lot of foothold in the world of bioinformatics. And then lastly, another key consideration is when you're encountering issues, where can you find help? That's where, again, Yan Yanina has a great comment here about the R language having a welcoming community of everyone to support each other as well as extensive documentation both with the core, our language, as well as the packages that are available in many of these, industry specific domains. And with the advent of GitHub, many people are sharing packages in the open, So you have an open dialogue to file an issue. If you find an issue with a package, both R and Python are also, you know, very much not strangers to that area. And, you know, in the advent of AI as well, you know, that's gonna be another consideration as a lot of newer students are are newer to the language. They're probably leveraging a large language model to help with some of that. And at least we are seeing that, you know, with the right models, you can get some help with R, for Python coding.
But, you know, we could go on for hours about, you know, how how to best use AI responsibly. But I think in a pinch, I can definitely help you out, especially for some esoteric needs. So really great perspective from multiple industry experts in this field. And is there a is there a winner in all this? Well, no. We're not gonna pick a winner in this. This is about in your specific domain, there may be an ecosystem around a language that's already ready for you, to take advantage of to get your job done and to really start where you see that that most, you know, wide usage and, you know, a great community around it. So, of course, we're big fans of R here. It literally does almost everything I need. Of course, I am venturing into other niche, you know, languages or niche frameworks like JavaScript when needed. But R is still the engine that performs all my all my, analysis needs at the moment, but it's great to see if I was new to the game. Just what kind of questions should I be asking if I'm looking to make a optimal decision?
[00:10:13] Mike Thomas:
Yeah. I really enjoyed it. Yeah. It kicks off with a discussion about how Python has sort of overtaken JavaScript, from, I think, some GitHub research that was done. And I'm sure that's in large part to everything going on with AI, but that's that's a pretty big deal. And I I think there are probably also a lot of data scientists and data analysts out there that may not consciously always realize that they're developing software. Right? Sometimes if we're just playing around in R, it can can feel like magic or or power Excel if you came from that world. Right? And I really like Jeffrey's articulation of compiled versus non compiled languages. That was really helpful for my own understanding, and a great articulation of sort of the long standing debate over the usefulness of notebooks and struggles that they have with reproducibility.
And the shout out to the Marimo project, which seems to be a great option, it sort of creates a dependency graph in the background from what I understand. I haven't tried it myself yet. And when you change a value in one cell, the dependent cells will be updated accordingly. Kind of feels like or sounds like targets to me. I haven't tried it yet. We use quarto, you know, because it's easy to switch between R and Python within the same notebook tool. But if I was more exclusively on the Python side, I think it would be, my tool of choice, it sounds like. And I also think sort of that big data discussion is pretty much been squashed, right, with the likes of DuckDB and that the parquet format and now we have APIs from whatever tool we choose to be able to successfully leverage those technologies from R or Python.
And Jeffrey, you know, mentions and rounds out the article, a couple different resources, including the Carpentries and the Data Science Learning Community. Those are are two of my favorite resources and maybe the last thing that will touch on is I agree with him that I there's a lot of benefit in my opinion in choosing the tool that your colleagues are using and Eric I think you'd share the sentiment as well but I also think there's value in really going deep in one particular tool and learning it really well one particular programming language I mean and I think then it will become easier to adopt the next tool as opposed to trying to learn two simultaneously.
You know, I did this with R going really deep into R over over multiple years before I really started to try to pick up Python, and I truly believe it helped me pick up Python a lot quicker than I would have otherwise. You know, though I'm I'm no expert, I can get around pretty well in Python these days. And a lot of that stems from, you know, Googling Stack Overflow, ChatGPT, whatever you wanna call it, saying, hey, I do this thing in R. How can I do this in Python? Right? And knowing those keywords to ask are really what unlocks you to be able to get that answer, to be able to to, you know, incorporate that functionality in that lesser comfortable programming language that you're trying to to leverage and use. So I think that that's been a really good strategy for me, and hopefully, it's helpful for others. But this is a really interesting, blog post. I thought it was sort of unique to our our weekly highlights, this type of a discussion, and really enjoyed it.
[00:13:23] Eric Nantz:
Yeah, me as well. And one thing that really helps is when you go deep in these languages, even though they can have fundamental differences in some bits and pieces. I do share your experience there. If I had started with a language that was deep-rooted in, say, object-oriented principles and the other more traditional frameworks that R and Python utilize, instead of starting with SAS, things would have been much easier. It was really hard to translate what I had learned from SAS, which was, quote, unquote, I guess my first language. I don't even count Java because it was a nightmare back then.
Going from SAS to R was a massive jump, to say the least, because they were so different. So, without me getting on my soapbox about the whole SAS situation, we are seeing that, whether R or Python, that's what most people in data science are getting introduced to when they get to their respective coding classes. So I think the principles you learn there will set you up for success even when you do have to venture off into some of the more niche sides of programming. Obviously, we didn't hear about Julia much in this article, but I know that's getting a lot of momentum as well. But, again, it comes down to striking the balance of the open source paradigm and the object-oriented paradigm. With the resources out there, I think the key is seeing the community around a language, knowing how to ask the questions that you have, and hopefully having a mentor along the way or a user group you can talk to.
Again, we're biased in the R ecosystem. We've got the great Data Science Learning Community, a wonderful place to join if you're new to the language. You have so many people ready to help you out with the book clubs and other adventures. That's where you go kind of outside the confines of just what the language documentation has to offer you. But the time is now to take advantage of those resources. And up next in our highlights, we did talk about some of the newer ways you can get help for information. And one of the ways that came from a code-helper perspective a few years ago is when Microsoft introduced the Copilot functionality in Visual Studio Code and in GitHub generally.
And what can be nice is using that to help develop, say, that snippet you need for a function you're trying to put in your Shiny app, for example. When GitHub Copilot first came out, in the very early days, admittedly, I was kind of intrigued by it because I had never really used a code completion tool before. I did try it out. It left a lot to be desired in the world of Shiny development, and I was kinda turned off after that. Now, this was about two years ago. I knew things were gonna get better; I just didn't wanna wait around that long, and I kinda gave up on it. Well, we have learned since then. There are some interesting advancements, not just in Copilot in general, but in the way that you can develop a Shiny app with Copilot.
And so our next highlight comes from Piotr Storożenko, who is one of the leading AI engineers and thought leaders at Appsilon. First, he gave a wonderful talk at the recent ShinyConf, and I invite you to check out the replays if you wanna look at that after the fact. But his article on the Appsilon blog here is talking about building more reliable Shiny code with GitHub Copilot, plus a new extension in the VS Code ecosystem from Posit. So what are we talking about here? Well, actually, getting back to that ShinyConf that concluded a week or two ago, there was a keynote by Winston Chang that, about midway through, introduced some recent work the Posit team has been doing: a new tag for Copilot via an extension in VS Code, specifically the @shiny tag.
So what does this actually mean in practice? Well, let's say you install the Shiny Visual Studio Code extension, which is available in the VS Code extension marketplace. If you have Copilot already wired up, you just fire up the chat interface in VS Code, and then, when you're ready to ask a Shiny-specific question, you type the @shiny tag before you put in your request. This was not around when I first tried Copilot; there was no @shiny tag, or even an R tag for that matter. So I was kinda trying to narrate what I wanted, and I would just get a whole bunch of junk out. But in the example here, he types the @shiny tag and then: create a simple app with a dropdown menu to select a variable from the mtcars dataset and a plot showing that variable's distribution.
Now, that @shiny tag is doing a lot under the hood. What it's basically doing, behind the scenes, is injecting the prompt that gets sent to these AIs with much more additional context around Shiny itself. Whereas, if you don't have that @shiny tag, it's just gonna go through the typical resources the model has been trained on. So he's got an interesting before-and-after situation here where he tries asking this question without the @shiny tag versus with it. And there is a stark difference: without the @shiny tag, the result looks like a gobbledygook of custom APIs, JSON files, CSS files, YAML files, and other weird stuff under the hood. It ain't Shiny, folks. We can tell that much.
Whereas, if you look at the example with the @shiny prompt, you get, after a few questions that the assistant asks back (in this case: which language, R or Python? He chooses R. Where would you like the app to be put? It makes a subdirectory for it), a simple Shiny app, already using bslib under the hood. So that's great, taking advantage of that. It's a streamlined app, but it got the job done, apparently. So if you are developing Shiny and you're leveraging the Copilot extension, you owe it to yourself to try this out, because I think you're gonna get much better results than if you don't have the @shiny tag helping you out with the prompt behind the scenes. Now, again, I have not tried the new @shiny tag myself. It's on my list to do.
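As a rough idea of what that prompt is asking for, here is a minimal sketch of such an app, assuming bslib; this is a guess at the shape of the output, not the code the extension actually generated:

```r
# A hand-written guess at the kind of app the @shiny prompt describes:
# a dropdown over mtcars variables plus a distribution plot, using bslib.
library(shiny)
library(bslib)

ui <- page_sidebar(
  title = "mtcars explorer",
  sidebar = sidebar(
    selectInput("var", "Variable", choices = names(mtcars))
  ),
  card(plotOutput("dist"))
)

server <- function(input, output, session) {
  output$dist <- renderPlot({
    hist(mtcars[[input$var]], main = input$var, xlab = input$var)
  })
}

shinyApp(ui, server)
```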
I wanna make a note for those who may be wondering, wait a minute, Eric, why didn't you mention Positron? This is not available in Positron yet. I know that's in the works, so we may not see it until later this year. But if you're on VS Code already, yeah, maybe give it a shot and see if it can bootstrap an app for you in a seamless way. So, again, caveats abound, but, hey, there's progress to be made here.
[00:21:04] Mike Thomas:
Yeah, Eric, I really appreciate this work by the Posit team. You know, I'm sure it's no small effort to convert all of the Shiny documentation to Markdown that can be injected into the user prompt. I'm assuming that's sort of the approach they're taking, and it sounds like they did so on both the R Shiny side and the Shiny for Python side. So I'd be really interested to learn how they went about doing that, I'm assuming via some sort of programmatic approach, so that when the R Shiny package gets updated, or when the Shiny for Python package gets updated, that @shiny tag in the VS Code extension will reflect those updates and the current best practices for those two packages.
And this prompt-stuffing approach is one that we use very often to, in effect, pretend that the LLM was trained on a specific set of documentation or context that we want it to know about. With this approach, you're definitely going to get way better results than you would without doing any prompt stuffing. But behind the scenes, my understanding is that the LLM is using a combination of the prompt that you provided, that context, as well as what it was trained on, and hopefully weighting your prompt much more heavily than the context it was trained on. But I think you're still running a non-zero chance of it hallucinating on a particular question you're going to ask. So, like any of these AI solutions, don't take its output as pure gospel. Make sure that you have some sort of workflow where, say, you have the documentation for Shiny pulled up on the other screen,
and you're using that as a gut check, leveraging the LLM to get you to your final result faster as sort of a pair-programming guide, which I think is a fantastic way to do it. But, yeah, I think this is, as I mentioned before, going to get you way better results than probably what you've been trying to do in terms of asking ChatGPT to write your Shiny apps for you.
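For a flavor of the prompt-stuffing idea Mike describes, here is a minimal sketch using the ellmer package, assuming an OpenAI API key is configured; the shiny-docs.md file and the question are hypothetical stand-ins:

```r
# Sketch of prompt stuffing: paste documentation into the system prompt so
# the model leans on it instead of possibly stale training data.
library(ellmer)

# Hypothetical file containing documentation you want the model to "know"
shiny_docs <- paste(readLines("shiny-docs.md"), collapse = "\n")

chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = paste(
    "You are a Shiny assistant.",
    "Prefer the documentation below over your training data when they disagree.",
    shiny_docs
  )
)

chat$chat("Create a simple histogram app for mtcars using bslib.")
```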
[00:23:17] Eric Nantz:
I actually am intrigued to try this out for a real case, I would say, where, without revealing too much here, I'm about to get an influx of requests from different statisticians at my company to help build some Shiny apps that may vary in their level of complexity. They may start small; they may end up getting bigger, and whatnot. That's beside the point. I've been asked to at least look at sketching something out to give a demo that may or may not get the project green-lit for more robust development. I've been contemplating whether I try something like this to bootstrap the initial version. Now, putting this to you, Mike: do you think I should, if I was gonna do this, ask it to start building an app that kinda follows what you and I typically prefer for our app structure, i.e., a golem-powered app as a package?
Or do you think that might just be a bit too much for it to do right away? Maybe I should stick with a more traditional app layout and then, down the road, convert it to golem. Do you think an LLM could handle something like that?
[00:24:27] Mike Thomas:
I don't know. It's a good question, you know, and I think, in a similar way to how Posit has converted or leveraged the documentation for Shiny for Python and R Shiny, maybe you could also stuff the context from the documentation for golem or bslib into that prompt as well, I would imagine. These context windows are getting bigger and bigger. So I think it's worth a try, but if you were to take that approach without providing it any context around golem, I wouldn't have a lot of confidence that it's gonna give you great results.
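For reference, here is the golem scaffold they are debating whether an LLM could target; a minimal sketch, with "demoapp" as a made-up project name:

```r
# Scaffolding an app-as-a-package with golem: this creates DESCRIPTION,
# R/app_ui.R, R/app_server.R, R/run_app.R, and the dev/ scripts.
library(golem)

create_golem(path = "demoapp")  # "demoapp" is a placeholder name
```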
[00:25:10] Eric Nantz:
Yeah. I've learned that, you know, when I'm building these initial prompts, I'm a very detail-oriented person with these things, just like I was back in the old days. And when I say old, this is only, like, ten years ago. If I had a colleague who was instructed to help me with programming support for, say, a custom biomarker analysis, I would give him or her the very detailed specs: here's the input data, here's the layout I'm looking for, just code this up and let me review when you're ready. And I would always leave no stone unturned with, like, the types of variables to look at, the type of output, these considerations, any derivations they'd need to be careful on. I typically use that same approach for prompts, but I always wonder, am I being too detailed about it? So far, I haven't been. But I think, like you said, the context is really important. I can't expect that in only a few sentences this thing is gonna know what to do, because what human would be able to do that either? So another perspective to keep in mind, I guess.
[00:26:12] Mike Thomas:
Yeah. No. I envision a world, I think everybody does, where someday, instead of using this approach to provide additional context to the LLM about what you want it to know, we'll have a better approach to fine-tuning, I guess they call it, these LLM models, so that it really only has the knowledge and context that you want it to have.
[00:26:34] Eric Nantz:
Yep. And I've been seeing bits and pieces in these areas. So you often see, within industries, some of these specific bots or models being shared on Hugging Face or in other areas. And, yeah, I'm really intrigued to see where this ecosystem goes. And, yeah, I'll give this a play. And in the end, would I ever just sign off on something without reviewing it first? Oh, heck no. If any of my colleagues are listening to this, don't worry. I'm not gonna throw something over the fence without vetting it first. So take that to heart, folks.
[00:27:09] Mike Thomas:
Well, if at Ketchbrook we develop an AI agent for Shiny apps specifically, I promise we'll name it Eric.
[00:27:16] Eric Nantz:
That's it, man. Game over, man. It's game over. Well, our last highlight kind of goes back to what we talked about at the very beginning, for those that are maybe new to programming and figuring out which programming language they're gonna use to accomplish a certain task. Well, let's say you've already made that choice of using R to accomplish that task, and maybe you don't even know what that task is yet. Yeah, you're just learning from the ground up here. Our last highlight is an interesting perspective on what's been helpful for a recent undergraduate learning R in the humanities area.
So this blog post comes to us from Bruno Ponne. He is now a data analytics consultant at the Data School in Germany. He is talking about some of the things he has encountered in applying data science principles to history and the humanities. His first encounter with R itself was during his master's studies at the Hertie School in Berlin, in their statistics department. There were a couple of mandatory courses, and one of them covered some of the statistical concepts that he became interested in: things like validity, selection bias, and principles like regression to the mean. They captured his interest, and he noticed that, hey, these could be applied to multiple industries, not just statistics as a whole.
Within that area, he encountered his first R programming assignments, and it felt a little frustrating to him. It was just totally new: lots of function syntax, debugging errors, and he was trying to figure out how best to proceed. This is where, going back to what I said in the earlier part of the show, knowing where to get help and knowing the communities around a language can be extremely helpful. In his case, Bruno discovered that he had, as we've heard in the community for years, access through his master's program to a platform called DataCamp.
And for those who aren't aware, and this is definitely not free advertising, I'm just saying what they do: DataCamp offers focused courses with video content and in-browser exercises, all within their platform. You don't have to install anything on your system. Now, Eric's editorial comment here: you may wanna think about different sources for that, but we'll leave it at that. With that said, there are plenty of services like DataCamp out there. If you're interested in some more interactive content, that can be a great place to jump-start your education. So, like I said, there's a lot out there. Definitely let us know if you're interested in what those are. We're happy to send you links to those other services.
But once you get through that little hump, if you will, in your development, you get to actually use some of the things you've been learning. In the domain he's looking at, policy analysis, he started really leveraging those statistical concepts I mentioned earlier within R to produce visualizations and maybe do some ad hoc simulations to illustrate the impact of certain concepts. And then, of course, like all of us, he became a victim of the pandemic, which changed the world in a lot of ways, where we were often confined to our various homes or other locations. And because of that, apparently due to budget considerations, his institution stopped providing that DataCamp access.
And that's when he turned to platforms like Stack Overflow for questions and answers, and also started reading more of the official documentation for functions and packages. As we know, in the R ecosystem there are varying levels of documentation for packages. Most of the time, the ones that are robustly developed will have great documentation you can draw upon, but there's more to it than that. He also turned to the great books that are specific to R and the different ecosystems around R. He mentions O'Reilly Media, which, of course, has been a great publisher of various books like R for Data Science and the like. There are many other publishers out there, like CRC Press and others.
He mentions the R for Data Science book. We literally, at the day job, had an open office hour where a statistician called us and said, hey, I'm just picking up R again after many years; where can I go to learn more? And both my colleague, Will Landau, and I pointed out that R for Data Science is a great place to start. So we'll have a link to that in the show notes. It also was the genesis of the Data Science Learning Community. So there are lots of great resources out there if you're new to the language, to make you feel not so alone in learning all this.
And touching back on the last highlight: yes, in this day and age, AI, when used responsibly, can be another aid in that journey. But this is where I'll add my two cents. I do stress getting a pretty solid foundation in your understanding first before you start leaning on the AI solutions, because depending on which model you're running, and depending on what kind of question you're asking through that prompt, you might get a solution that would have been great, say, five years ago, but is missing some of the more modern approaches that, say, the tidyverse gives you.
Or, let's be real, a real example might be: hey, I'm dealing with this large data; how do I deal with memory management? The model may not pick up some of the newer advances like DuckDB or Parquet that Mike mentioned just a few minutes ago. So just be careful when you're using those AI prompts to help you learn something; do build that foundation first through the more traditional resources that, again, if you're new to the language, I think can get you really far ahead of the curve. The same goes for reviewing other people's code. Let's face it: in the world of Shiny, I'm always reviewing what people like David Granjon share, and, Mike, when you share an app online in public, I'm always reviewing your stuff, along with many others that help inform my style and my learning. So I can't stress enough leveraging GitHub, looking at repos of packages or apps, or even just a TidyTuesday analysis if someone shares their code. That's a great win in and of itself. So in this day and age, not to be like that guy that says get off my lawn, but we didn't have this stuff when I was learning R. So take advantage of these resources. Hearing Bruno's perspective is certainly insightful: how he started from being both new to R and new to statistics to really loving the use of the language and getting some really interesting analysis completed. So, again, great food for thought here, and I think a very relatable blog post for many people listening now.
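For the curious, here is a minimal sketch of the DuckDB-plus-Parquet pattern Eric alludes to; the file name is made up:

```r
# Query a Parquet file without loading it all into memory: DuckDB scans
# the file lazily, so the data never has to fit in RAM.
library(DBI)
library(duckdb)

con <- dbConnect(duckdb())

res <- dbGetQuery(con, "
  SELECT cyl, AVG(mpg) AS avg_mpg
  FROM read_parquet('measurements.parquet')  -- hypothetical file
  GROUP BY cyl
")

dbDisconnect(con, shutdown = TRUE)
```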
[00:34:56] Mike Thomas:
Yeah, I agree. And, you know, this blog post takes me back to some of my own journey learning R: some of the things I wish I had known, and some of the things that helped me or hurt me along the way. I really like the key takeaway around visualizations being a really great way to see data come to life, to put meaning right into the code that you're developing and actually see the value firsthand, right in front of you. As well as having a goal or a project that you're trying to work on and using R as a tool to get you to that end goal, as opposed to just sort of trying to blanket-learn a programming language without any specific use case that you're trying to tackle.
This might sound silly, but one other thing that helped me when I was learning data science, maybe less specific to programming but data science in general: podcasts. I was a huge consumer of podcasts. So if you're trying to learn R, maybe the R Weekly Highlights podcast, or just rweekly.org (shameless plug right now), could be a great resource to help you keep up with the conversations about everything that's going on, and the verbiage and some of the acronyms that get used that you may see online in some of your research as you're trying to learn. That may help some of these concepts start to click a little bit better. But overall, I think a really great rundown. This was sort of nostalgic for me, taking me back to my own journey as well, with some fantastic resources for folks who are trying to pick up R.
[00:36:39] Eric Nantz:
You give me the feels, man. That's why I started the R Podcast back in the day: because, a, no one else was doing it, and, b, I had learned so much about Linux through podcasts. I thought, why isn't anybody doing this in data science and R in particular? I do remember, in the early days, there were some listeners out there who said, oh my goodness, this is such a massive help compared to just reading the online docs. So, yeah, I think we can plug what we're doing here. I have heard people say it's been quite helpful, because in this day and age there are so many ways to consume content. So when you're doing the dishes or mowing that lawn and you wanna level up your R game, tune in to the back catalog of R Weekly Highlights. We cover a lot of things, both from our own experiences and, more importantly, highlights like these that share some great insights too. And like I said, there's so much out there in the online resources in general.
Sometimes it can be overwhelming at first. That's why, again, I'm gonna plug it one more time: the Data Science Learning Community, which Jon Harmon runs, is a top-notch place to go. Again, we'll have a link in the show notes. That is a wonderful way to collaborate with others no matter where you are in that learning journey. So a great post here by Bruno, and, certainly, I'll keep a lot of these things in mind as I hear from others that are also new to the language. And as we said, there is a lot going on in R Weekly, not just in our selection of the highlights. So we invite you to check out the full issue, which, of course, we link to every time in the episode show notes. And I do wanna give a very quick shout-out to an additional package I found in the issue's packages section that I think could have been really useful a few years ago and will still be useful now. There was a day-job situation where we had a project manager who wanted the status of all the issues on my GitHub repo for a big, large-scale project, because she wanted to feed that information into her Gantt chart visualization tool to track milestone forecasting and all that jazz.
So I stitched together some custom code against the GitHub API in R using the gh package to pull all this stuff down and do some massaging. Well, there's a new package that does that for you that just hit CRAN, and it's called IssueTrackeR, authored by Tanguy Barthelemy. Probably butchered that name. But it does what you might expect on the tin. It's got some great functions to retrieve a repository's issue information: the issues themselves as well as milestones and other interesting metrics. And then you can save that to a local area so that when you wanna do processing on this data, you don't have to keep hitting the API. You have a cached version you can refresh at any point.
And then you can update this database when you need to, if you know there have been new issues filed or whatnot. But once you have that, there are also some convenience functions to filter the issues based on fields, maybe certain values or keywords, as well as some default sorting you can do of the issues based on, again, maybe milestones or other metrics. I would have used the heck out of this back then, and frankly, I may use it now, because guess what? As much as I use GitHub project boards for my issue management, I live in a Jira shop, so sometimes I might have to pipe some stuff from GitHub to Jira, and maybe I just use this as an intermediary. I don't know. Mike's already scowling at that, so I'm sorry for bringing up bad vibes there. But, nonetheless, IssueTrackeR is definitely a package I'm gonna be looking at.
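For context, the do-it-yourself route Eric described looks roughly like this with the gh package; the owner and repo are placeholders, and IssueTrackeR wraps a workflow along these lines:

```r
# Pull a repository's issues via the GitHub API and cache them locally.
# Note: this endpoint also returns pull requests alongside issues.
library(gh)

issues <- gh(
  "GET /repos/{owner}/{repo}/issues",
  owner = "some-org",   # placeholder
  repo  = "some-repo",  # placeholder
  state = "all",
  .limit = Inf          # paginate through everything
)

# Flatten a few fields into a data frame for downstream reporting
issue_df <- data.frame(
  number = vapply(issues, function(x) x$number, numeric(1)),
  title  = vapply(issues, function(x) x$title, character(1)),
  state  = vapply(issues, function(x) x$state, character(1))
)

saveRDS(issue_df, "issues_cache.rds")  # cached copy; refresh when needed
```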
[00:40:27] Mike Thomas:
No, sorry, I was just recently added to an external team that uses Jira, and it's sort of my first foray into using Jira. I'm still getting adjusted to it; let's put it that way. But that's a great call-out. You know, I'd be remiss not to call out Albert Rapp's Scalable Reporting with Quarto blog post that just dropped this past week. It's a use case that I think a ton of people have, and, as always, Albert does an awesome job of walking us through the concept.
[00:40:54] Eric Nantz:
Yeah. I love using Quarto in many different ways here, and once you use the parameterized report functionality, there's just so much that's possible. Also check out his back catalog, if you will, on his site. He's also been talking recently about Quarto with Typst for doing optimal PDF reports and theming those up. So if you're in the world of static reports, you have our sympathies, because, man, I love the HTML lifestyle. But if you are in that space, Typst is something I'm keeping an eye on for some really attractive PDF reports with that hint of CSS-like styling you get in web reports. So definitely check those out.
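As a pointer to what parameterized reporting looks like in practice, here is a minimal sketch using the quarto R package; the file name and the region parameter are made up, in the spirit of Albert's post rather than copied from it:

```r
# Render one .qmd into many reports by varying a document parameter.
# Assumes report.qmd declares a `region` entry under `params:` in its YAML.
library(quarto)

regions <- c("north", "south", "east", "west")

for (r in regions) {
  quarto_render(
    input = "report.qmd",
    execute_params = list(region = r),
    output_file = paste0("report-", r, ".html")
  )
}
```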
[00:41:38] Mike Thomas:
Yes, Typst is awesome. Super, super fast compared to LaTeX. And it's pretty much, I think, built into your install of Quarto. So if you have Quarto installed, then Typst is installed. No TinyTeX needed.
[00:41:52] Eric Nantz:
Yeah. And that can be really important, especially for those in the enterprise who have a hard enough time asking their IT admins, can I have TinyTeX, or can I have this LaTeX thing? You might just get a no. So anything that comes bundled in can be a big help there. But like I said, R Weekly bundles up all sorts of great content. We just mentioned a couple of additional finds, but you may have an additional find of your own just by reading the issue. We'd love to hear about it too. And speaking of hearing about it, of course, our project largely depends on the community for help. And that's where, if you find that great new resource, we are just a pull request away. Everything's on GitHub.
It's all Markdown, all the time. Markdown, which, if you couldn't learn it in five minutes, the author of knitr would give you $5 back in the day. Maybe not now, but at least he told me that ten years ago. Nonetheless, you can file an issue or a pull request right from our GitHub page. It's linked in the top right corner of the R Weekly site. And if you wanna get in touch with us, there are multiple ways of doing that. We have a contact page linked in the episode show notes if you wanna send us feedback in the traditional way. I will get that in our fancy R Weekly inbox, and I will be able to share it on the show if you're interested. Also, you can get in touch with us on the social medias out there. I am on Bluesky, where I'm @rpodcast.bsky.social.
And, also, I am on Mastodon with @[email protected]. Oh, those are hard to keep straight sometimes. And I'm also on LinkedIn; search my name and you'll find me there. And, Mike, where can the listeners find you?
[00:43:27] Mike Thomas:
You can find me on Bluesky these days at mike-thomas.bsky.social, or you can find me on LinkedIn if you search Ketchbrook Analytics, K-E-T-C-H-B-R-O-O-K. You can see what I'm up to.
[00:43:44] Eric Nantz:
You bet. Always a great follow, the hair on LinkedIn and the like. And, yeah, I'm trying to keep my head above water this week. I have some recent day-job projects, and I've got some more open source stuff I gotta get back into, like some, quote, unquote, conference package development for my upcoming talk at Posit Conf. I gotta button some things up there. So we're really happy that you joined us for this episode 203 of R Weekly Highlights. And, hopefully, unless real life gets in the way, we'll be back with another edition of R Weekly Highlights next week.