In this episode of R Weekly Highlights: We have a six-month follow-up perspective from an early Positron user, how the current landscape of AI tools performs when learning the ropes with the tidyverse, and how you can create your first Observable plot while using R for data munging.
Episode Links
- This week's curator: Jon Carroll - @[email protected] (Mastodon) & @jonocarroll.fosstodon.org.ap.brid.gy (Bluesky) & @carroll_jono (X/Twitter)
- Positron: current joys and pains
- Learning the tidyverse with the help of AI tools
- Observable for R users
- Entire issue available at rweekly.org/2025-W15
- Positron +1e https://open-vsx.org/extension/grrrck/positron-plus-1-e
- Vanishing Gradients episode 47 (The Great Pacific Garbage Patch of Code Slop with Joe Reis) https://vanishinggradients.fireside.fm/47
- Observable color palette viewer https://observablehq.com/plot/features/scales#color-scales
- Observable Plots (R/Pharma 2024 Workshop Series) https://www.youtube.com/watch?v=M6fP68XnacM
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- Sunny Side Up - Yoshi's Island DS - ZackParrish - https://ocremix.org/remix/OCR04558
- Costa Del Sol DANCE - Final Fantasy VII - Posu Yan - https://ocremix.org/remix/OCR00095
[00:00:03] Eric Nantz:
Hello, friends. We are back with episode 201 of the R Weekly Highlights podcast. This is the weekly show where we talk about the terrific highlights and other excellent resources that are shared every single week at rweekly.org. My name is Eric Nantz, and I'm delighted you joined us from wherever you are around the world. And, boy, it's been crazy times in some parts of the world lately, but we're happy you're here. And I'm not joined here alone. I am joined at the hip here virtually by my awesome cohost, Mike Thomas. Mike, how are you doing today?
[00:00:35] Mike Thomas:
Doing pretty well there. Can't complain. The weather out here on the East Coast, about twenty minutes ago it was snowing, and now it is sunny and beautiful. So it's as crazy as the world seems to be these days.
[00:00:51] Eric Nantz:
It is. And that's no April Fools a week later. That actually happens, folks. And we had a freeze warning here too, where it's like, I thought we were done with this, but, nope, we are not. So my little hands here are still frigid from being here in the humble basement where I record this. First world problems, I guess. But nonetheless, we got some stuff to heat up our knowledge here with the batch of highlights we're gonna talk about today. And as usual, the R Weekly effort is a volunteer effort where every week we have a new curator rotating into their shift, if you will. And this week, that was Jonathan Carroll. Again, one of our longtime curators on the project.
He also does a lot of interesting programming exercises, so definitely check out his blog if you're interested in what he's up to. But as always, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world with your pull requests and other terrific suggestions. So if you recall, it was back in 2024. The company Posit, who, of course, have authored what has become one of the standards in data science tooling with the RStudio IDE, which has also been branded as Posit Workbench for their enterprise products. Well, they made a splash last year when, first it was kinda quiet, but then it was a big splash at posit::conf when they talked about their new IDE called Positron.
And for those that aren't aware, Positron is, in essence, a wrapper around Visual Studio Code, which has been used heavily in software development for quite a few years now, but with a data science flavor to it. And one of the main selling points is that it is a polyglot type of IDE where you can have R, Python, Julia, and almost any other language that Visual Studio Code supports. You can have that right in your Positron session. I have been using Positron almost exclusively for about four or five months now. It was a little bit here and there in 2024, but with some recent advancements, especially around Nix, I've been able to drive it a lot more as my daily driver. But I'm not the only one. For our first highlight today, we have a great blog post here called Positron: Current Joys and Pains, authored by Athanasia Mowinckel.
She is a neuroscientist, and she gives us a great recap of her experience after six months of using Positron. She first leads off with the positives. We always like talking about the good things before the not so good. As I mentioned at the outset, a very, very useful feature of Positron is, like I said, this multiple language support, which doesn't feel like something that was bolted on midway through a product's life cycle. Whereas if you remember running, say, RStudio as your daily driver and you maybe wanted to do some Python development with it, you would have to use reticulate. Sometimes it wouldn't feel quite as native. You got a lot of handoffs going on there.
But because this is based on Visual Studio Code and it's got access to all the ecosystem that Visual Studio Code brings, you have either extensions or built-in support for the common languages, especially for data science, such as Python, such as Julia, and others as well. And if you're, let's say, a JavaScript developer, you can tap into things like ESLint for linting, and lots of other extensions. And a new one that came out somewhat recently is the Air extension to help you format your code as you're saving your file, on the R side of things. But in Positron, these things are easy to set up, and you don't feel like you're going off label, so to speak, when you use these multiple languages. So while I've been mostly doing R with Positron, I definitely have dabbled in Python in Visual Studio Code before, so it shouldn't be any real difference here. As long as you have your Python environment and your R environment set up, you should be good to go with Positron in these multiple languages.
Another thing that is growing on me, it took a little bit, but now it's starting to really become nice, is the environment viewer. This is where you're starting to see what Positron brings to the table as opposed to just straight Visual Studio Code: this environment viewer definitely takes a lot of inspiration from what we saw in the RStudio IDE, with ways that you can view your dataset in a rectangular kind of display and do filtering on the spot. It may take a little getting used to at first if you're new to it, but once you get going with it, I think there are a lot of great ways you can explore your data there. Not all perfect. We'll get to that a little later, but I think it's coming along pretty nicely.
And one nice thing for those coming from RStudio is that there was a viewer pane where you could, say, run your Shiny app or look at a rendered report and whatnot. But guess what? Anything that the viewer could do back in RStudio itself, Positron has support for as well. This means that if you're maybe not on the Quarto train and you're still using a framework like blogdown, one of my favorite packages for writing a blog with R Markdown, you can actually run your preview of the site in that Positron viewer as well, just like you could with the RStudio IDE. So, again, they've done a lot of engineering under the hood to make that pretty seamless.
There's also support for the add-ins ecosystem that RStudio brought to the table, again, midway through its life cycle. Because those are basically embedded Shiny apps when you run them most of the time, you can run those in Positron just as well. However, it's not all roses and unicorns here. There are still a few things that are troublesome. So, Mike, I hate to make you the bearer of bad news, but apparently debugging's still not quite a seamless experience. What does she have to say about that?
[00:07:26] Mike Thomas:
Yeah. In terms of debugging, it's not quite as easy as maybe what you would be used to in VS Code. You know, running code while you're in the debugger can be done with Ctrl+Enter, which allows you to run line by line from your browser statement, breakpoint, or other debugging marker that you've used. And Athanasia notes that if she's moved the cursor for some reason, whenever she does Ctrl+Enter, it just gets stuck at the browser call without running the other code. If you've moved the cursor manually, it might not work, and she can't really understand exactly why that is the case. I don't know if this is also the experience for any others. Eric, I know you've been a user of Positron a little bit. I don't know how much debugging you've done or if you've run into the same thing.
[00:08:20] Eric Nantz:
Oh, here and there I have. One thing I've kinda had in my muscle memory as I've been using debuggers, both in RStudio and in Positron, is I'm one of those old school folks that just likes to type in that console to run code, and either hit n to do the next line or just print out, like, an object structure. But I can see, I've had a little bit here and there where that focus seems to randomly shift away, and then sometimes I have to quit the debugger with capital Q to get back to the normal session and then try again. So there is some finagling you might have to do, so I can see where she's coming from here.
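For listeners following along at home, here is a minimal sketch of that keyboard-driven debugger workflow in R; the function and values are just placeholders, not anything from the post:

```r
divide_counts <- function(x, y) {
  browser()          # execution pauses here with a Browse[1]> prompt
  ratio <- x / y
  round(ratio, 2)
}

divide_counts(10, 4)
# At the Browse prompt:
#   n      runs the next line
#   c      continues to the end of the function
#   ratio  typing an object's name prints its current value
#   Q      quits the debugger back to the normal top-level session
```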
[00:09:02] Mike Thomas:
Absolutely. And as well, you know, if your previous workflow, like Athanasia's, in terms of how to run code when you're in the debugger, I think that's exactly what you're talking about. If you would copy and paste sections of code that you wanted to run into the console and sort of have that interactive experience between the debugger and the console, that's not necessarily gonna work in Positron currently. And I think that's sort of the other big frustration that she has at this point right now. However, she says that her experience in Positron has actually helped her better understand the R debugger as a tool and leverage, you know, the keyboard commands in the debugger, like Q to properly exit it, as opposed to maybe doing those things in a point and click fashion when you're in the debugger in RStudio. So, you know, I think there's some pros and cons there. In terms of the data viewer, which is how she wraps up this post, it seems like it maps to VS Code's data viewer, if you're familiar with that. So it's pretty good, but I think it lacks a few features that you may be used to if you came from RStudio.
First being no handling for labeled data, which she points out is something that Shannon Pileggi feels strongly about, and there's a great link to Shannon Pileggi's blog post on labeling data in R. You can't explore lists in the viewer, whereas, you know, in RStudio, you can expand and collapse nested lists if you click on them from your global environment. And the data viewer also tries to guess delimiters for plain text files and doesn't always do a great job. So Athanasia has been preferring to use the Rainbow CSV extension, which speaks to one of the benefits of Positron being built on the open source fork, I believe, of VS Code, which enables use of that Open VSX universe of extensions. That's fantastic and something that you wouldn't have had access to as an RStudio user, and a pretty cool universe for those of us coming from, you know, the VS Code world.
And then the last thing the blog post wraps up with is being able to sort of set up your Positron environment leveraging a JSON file. And I don't believe that you can do something like this with RStudio. Is it mostly point and click to do that?
[00:11:26] Eric Nantz:
They did, much later in the life of RStudio, give you a way to use a text config file, but it was never, in my opinion, documented very well how to interact with that directly. You can back it up and then restore it. But with what we see here for Positron, and by proxy VS Code, yeah, it's JSON, but it makes more sense, I think, as a way to configure things. And it's really nice to be able to do this. It's pretty easy to wrap your head around.
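As a rough illustration, a Positron or VS Code settings.json is just a set of key-value pairs; a minimal sketch might look like this (these particular keys are common VS Code settings picked for illustration, not taken from her post):

```json
{
    "editor.formatOnSave": true,
    "files.autoSave": "afterDelay",
    "workbench.colorTheme": "Default Dark Modern"
}
```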
[00:11:56] Mike Thomas:
It should be familiar for VS Code users, maybe less familiar for RStudio users. But I think after taking a couple cracks at it, you'll get used to it really quick.
[00:12:06] Eric Nantz:
Yeah. And that's a great phrase to end with. I admit I may have had a jump start in my Positron journey because, Mike, you know this, I was on that whole dev container kick with Visual Studio Code and the R extension with Docker containers. And I was, for my open source work, using VS Code extensively for my development environment. And, yeah, I mean, I got a pretty long way in most of my Shiny app workflows especially. But there were always some pain points here and there. So when Positron came to be, it wasn't that huge of a lift, or I should say a shift, in perspective, because I'd already kind of stress-tested VS Code a bit. I can still see, if you're coming from RStudio itself, this is gonna be a bit of a jump. I think no one will discount that, and that's expected. It'll take some getting used to. That's why I always recommend, when something new comes out, and let's be honest here, it's still considered beta. It's not a production release yet. Although, if we know our history well enough, we'd imagine around September this is gonna be a production release, if you know what I mean.
But my point being, try it with some low risk project, get a feel for it, and then slowly start to fit it more into maybe your daily workflow. You know, the best way to learn is by trying. So I think there's a lot of positive momentum here. Still some paper cuts, as they say. That data viewer aspect without the label attributes, that's actually kind of a bigger one for a lot of my colleagues here in life sciences, because when we import data from SAS, say, using the haven package, we get those labels right off the bat. And being able to see those in the viewer is a big win for us. So, hopefully, that comes to play. And I think the other thing that maybe you have to set your expectations for is that this is a very, very customizable experience.
Like VS Code, there are all sorts of blog posts out there of people tricking out their setup with various extensions and various pane layouts. In the end, Positron tries to give you, like, three or four good choices of pane layout to start with, which you can kind of toggle back and forth in the preferences. But I wouldn't get too bogged down in just how much you can customize. I would kind of build it step by step until you get to that state that you like. I can tell from Athanasia's post here, and her post before this, she's been learning along the way, and I think the evolution you can see in her settings file shows just that, with settings for both Positron and some of the popular extensions, say, for Git, there's a GitLens extension, and other, you know, nice things with GitHub directly.
Speaking of extensions, you may be wondering, well, sure, there are so many out there. Which ones should I pick? Well, I will put a link in the show notes to a really handy, what I'll call wrapper extension, authored by Garrick Aden-Buie over at Posit, called Positron +1e, which is actually a collection of extensions that he is benefiting from in his role as a Shiny developer and an engineer at Posit. Even if you don't install it right away, you could just see what's included in it and see if there are specific ones that meet your needs, whether it's linting, auto spell checking, you know, navigation layout for your files. There are all sorts of interesting goodies there.
So, again, it's easy to get bogged down a little bit, but you've got a lot of choices at your fingertips for how you can tailor the Positron experience to meet your needs in your data science journey. Well, we're at two zero one, Mike, and the streak continues for yet another interesting highlight involving artificial intelligence in our world of learning and data science. And we're gonna hone in on that keyword of learning here, because one perspective might be that it's one thing to use AI for getting specific help with problems that we're encountering. Maybe it's, you know, an error message in our console. Maybe we have to tap into this other language that we rarely use, like XML or whatnot, in our projects. But going back in time, as if you and I had these tools available to us when we were just starting our data science journey, just what would that actually look like? And I think this next highlight is showing just that. It's authored by Mine Çetinkaya-Rundel, who is a professor of statistics at Duke University and also on the Posit team, where she is talking about using AI tools to help learn the tidyverse.
Now, she mentions off the bat, and you're in good company here, Mine, that there are a lot of opinions on AI out there, and we have no shortage of those here on this very show. But she's gonna frame this in the context of being, you know, a newer learner in data science and maybe trying to do a more common task in, say, R with the tidyverse to get, like, a data analysis or visualization done. So there are a few different case studies here to kinda run the full spectrum of what we're talking about. The first one is leveraging ChatGPT to help with reshaping your data and plotting.
This example is based on an already worked out example from the R for Data Science book using the billboard music chart dataset, where she's asking the AI chatbot: use the billboard dataset in the tidyr package to create a visualization of rank versus week number for each song in the dataset. And the answer comes back not too shabby. It's loading the packages right off the bat. It's doing a little bit of reshaping with the pivot_longer function. That's already a step in the right direction. It's doing a little bit of conversion for the week to become numeric, and then doing a ggplot, which you might call a spaghetti plot type setup, where each line is one of these trends, with week number on the x axis and the rank on the y axis of that longitudinal profile.
And so ChatGPT does give a little explanation at the end, and Mine says, yeah, there are some things that are promising. First, it is using the tidyverse. It's using the newer tidyr function to go from wide to long, pivot_longer, and it's got the y axis actually being reversed, with zero at the top and a hundred at the bottom, which is what she was looking for. But this is where things get tricky, folks. There's always a little bit more than you bargained for here sometimes. Like, it colored the lines by track. That, you know, looks kind of interesting from a visual perspective, I guess, as an art piece, but maybe that's not exactly what you're looking for, perhaps.
And it's also loading other packages that aren't necessarily involved. Like, there's no dplyr used here, right? It's just tidyr and ggplot2. But yet, sometimes it throws this in without a lot of rhyme or reason. So what a learner might want to do in this case, because they may not know the best way to just not color by track, is ask the bot in a follow-up chat, hey, can you undo this coloring of each line? Sure enough, you get a new plot out of that, and the lines are all blue now. But it did more than just change the color; it also changed the alpha level, inexplicably, from point six to point three.
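For a flavor of what that generated code looks like, here is a rough reconstruction in the same spirit (not the chatbot's verbatim output), using the billboard dataset that ships with tidyr:

```r
library(tidyverse)

billboard_long <- billboard |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  ) |>
  mutate(week = parse_number(week))  # "wk1" -> 1, "wk2" -> 2, ...

ggplot(billboard_long, aes(x = week, y = rank, group = track)) +
  geom_line(alpha = 0.3, color = "steelblue") +
  scale_y_reverse()  # rank 1 sits at the top of the axis
```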
So, okay, maybe it just thought that was better when there's no color there. So, again, the AI chat might make some choices that you don't necessarily agree with, which underscores the importance of reviewing the code as you're seeing it, to check if it is really something you want. There's more at the end of the post, which we'll get to, about how you practically handle these situations. But again, she's putting herself in the shoes of somebody who's still not quite comfortable with R and, say, the tidyverse yet, and trying to get to that more polished answer through additional prompts.
In the end, though, some promising trends there. We move on to case study two, more of a data cleaning exercise, which I think will be a very common problem to solve for a lot of those using data science in the real world. And she kept this one kind of vague. There was a column in the dataset called membership status, where it can be either NA or a value like select to register for a group, but she wants it recoded to instead say closed or open. So this time she's using Claude to do it. And the first response that she gets does help clean up the variable with this recoding.
But note that if you look at the post after you're listening to this, it's using base R to do it, which, again, is not bad, right? But she wanted tidyverse approaches. So she asked, well, can you do this with the tidyverse instead of base R? Then it pivots over to the if_else function, and then uses that in a mutate call with dplyr. Looks fine. We're getting there. But it also does some additional things, maybe because it's been trained on resources that aren't as up to date as what we have now, that kind of raised a few eyebrows. One of which is using the magrittr pipe instead of the base pipe. So, again, that can happen, right, when the model is trained on older data.
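Here is a minimal sketch of that tidyverse-style recode, with a hypothetical data frame and column name standing in for the ones in her post:

```r
library(dplyr)

# hypothetical example data
members <- tibble::tibble(
  name = c("a", "b", "c"),
  membership_status = c(NA, "select to register", NA)
)

members <- members |>
  mutate(
    membership_status = if_else(
      is.na(membership_status), "closed", "open"
    )
  )
```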
Some of the styling wasn't quite up to snuff with the line breaks, and it didn't quite show everything correctly. And it didn't need a head call, because it's using a tibble. So, little things like that. If you know R well, you know to look for this, but maybe there are better opportunities and additional prompts to make that happen. The last case study is web scraping. This one can get a little messy, right, because there are multiple ways we can do this, often multiple languages as well, and she kept this prompt really general. She just asked it to write code for scraping data from this resource on North Carolina weather data, this time using Perplexity AI. So she's using different services in each of these to compare and contrast.
Well, this should be a surprise to relatively few people: she got an answer right off the bat that was using Python with the Beautiful Soup library, which often gets talked about in web scraping. So she had to follow up that prompt with, well, use R instead. And then you get some, you know, relatively readable code. It's using the rvest package, which is, again, one of the great packages in the tidyverse to help with scraping. You know, pretty utilitarian type code, again using the magrittr pipe. But there's one thing that it didn't do well.
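The rvest side of that typically boils down to a few calls like these (the URL here is a placeholder, not the one from the post):

```r
library(rvest)

page <- read_html("https://example.com/nc-weather")  # placeholder URL

weather_tables <- page |>
  html_elements("table") |>   # grab every <table> node on the page
  html_table()                # parse each one into a tibble
```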
It didn't reshape the data the way that she wanted it. She wanted to have the months as rows and the temperatures as columns. It didn't quite do that the first time. So she asked it to do the reshaping, and then you get a lot of new code to do this, but it's still not quite what she wanted. So she had to try one more time, and she was very explicit this time in the instructions: get two tables, one from January to June and one from July to December, put them together, and then reshape them. And now this is where things get really dicey.
The code that she got back for this step actually doesn't work. It doesn't work. And even though it looks like it worked, because it gives some output, she has no idea how that output got there, because the code doesn't run. So that's another red flag, folks. Sometimes it may look correct, but you really have to evaluate this. So this was a fun little journey. Mike, I think you've got a lot of thoughts on this too. As you're seeing these case studies, what jumps out at you that people should be watching out for?
[00:25:19] Mike Thomas:
Yeah. I think one of the biggest gotchas is the fact that, especially in the last case here with Perplexity, and I've seen it in all the other services as well, both Claude and ChatGPT, which are the two that I have been experimenting with the most, admittedly Claude much more so recently than ChatGPT, is that they will pretend, you know, they'll spit out an answer and make it look like the code should run successfully, right? And that's not necessarily the case, and in Mine's blog post here, the same thing happened.
It spat out the answers, which not only were incorrect, but the code that it also spat out to give those answers doesn't run. So I think that is going to confuse a lot of new learners. I was listening to another podcast, shameless plug for the Vanishing Gradients podcast with Hugo Bowne-Anderson, who had Joe Reis on. And, you know, there's this idea that I think we're gonna create a ton of technical debt in the next maybe five to ten years as more AI generated code gets into code bases. And then when something goes wrong, who's going to solve it?
Because it looks like these AI solutions are not great at handling the edge cases, right? They're generalizations as it is. And I think that by having students, you know, lean heavily on these AI tools, they're maybe not learning the fundamentals of programming, which I think is a drawback to, you know, sort of where we're headed in the future. I think it's going to be difficult for folks who are just starting out in their career to get jobs unless they really focus on a lot of these fundamentals. And maybe, you know, the folks like us, Eric, may not have as much work to do, I guess, if we're gonna be employing a lot of these AI tools. But when stuff hits the fan, we are going to be extremely highly valued.
So, you know, there's that dichotomy there. There's a lot of talk about agentic AI lately, which to my understanding is like stringing these things together in a process. And if one of these is writing poor code that doesn't execute, how can we expect to multiply that concept in a Markov chain type of environment and not have things go wrong, right? I don't think we've solved for the single agent, if you will, doing a good enough job, in this case, at handling code execution. And maybe it's better in the Python ecosystem or JavaScript or some languages that are much more heavily used, depending on your use case. But I think when it gets applied to a lot of business settings, there's just domain knowledge there that the AI application, a lot of times, will not have to handle those edge cases and to really fit the code to the problem. So I could rant here for the next hour, but I do want to go through the tips and good practices that Mine outlines at the bottom of the blog post, which I really agree with.
First is, you know, provide context and engineer your prompts. And I think we all know this: the more explicit and verbose that you can possibly be in what you want, and maybe how you want the AI to go about doing it, the more likely you're going to get accurate results, or the results that you were looking for. So that whole concept of prompt engineering, I think, is actually very important. Second is to check for errors. It's sort of obvious, but don't just take what comes out of the LLM as gospel. Make sure you run it in your console. Make sure you understand what it's doing.
I can't implore, you know, younger data scientists and data analysts in their journey enough to take a look at what comes back and don't just use it, right? Make sure that you understand all of the parts of it. Go line by line. And if there's something that you're not sure about, take a look at what the LLM gave back, because sometimes they'll not only provide the response, but they will also provide sort of a step by step explanation of the code that it gave back and why it used things. Or if you're not a hundred percent sure, ask the LLM why, you know, why did you include this particular line of code as opposed to doing something different?
And I think if you are not taking those steps to do that, you're gonna be certainly holding yourself back pretty significantly. But if you are, I think it could potentially be just as good of an education, and maybe a more streamlined education in a lot of ways, compared to, you know, Eric, you and I googling things. That's the way that we did it, right? For hours and hours on end. And I still catch myself doing that too.
[00:30:46] Eric Nantz:
It's still not a first reflex for me to go to the chatbot. Like, I'm slowly getting there, but I still catch myself on that Stack Overflow exercise.
[00:30:54] Mike Thomas:
Yeah. And then, you know, we talk about code smell. That's tip number four here. And I think that goes back to these LLMs probably being a little bit behind best practices in the present day. So it's using the magrittr pipe instead of the base pipe for a lot of cases here, and things like that, where, unfortunately, it's hard to make progress when the LLMs are trained on historical data, right? There's this sort of gap between present best practices and what these AI tools know. So I don't know how exactly we're going to solve that, but it doesn't make me excited or optimistic, I guess, about the ability for us to continue to innovate and make progress at the pace that we currently are.
The last couple tips are, you know, potentially starting a new chat. If you feel like you are just continuing to send prompts to your current chat and just not getting exactly what you're looking for, potentially start fresh and have the LLM sort of ignore all of the prior context that it's perhaps using in your current conversation. Code completion tools, such as GitHub Copilot, she recommends using sparingly if you're a new user, and I would certainly echo that sentiment as well. And I think the last tip here is to use AI tools for help with getting help, and that's a fantastic idea. One of the last sentences here is to leverage an AI tool to develop a reprex, maybe one that you can share in an issue on GitHub or in, like, the Data Science Learning Community Slack channel, to be able to get help solving your problem if the LLM is perhaps not doing exactly what you hoped it could do.
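For anyone new to that workflow, here is a minimal sketch of building a shareable reprex (assuming the reprex package is installed; the snippet inside is an intentionally broken call, not code from the post):

```r
library(reprex)

# Wrap the smallest failing snippet; reprex runs it in a clean session
# and copies a rendered, shareable version to your clipboard.
reprex({
  library(tidyr)
  billboard |> pivot_wider()  # fails: billboard has no name/value columns
})
```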
[00:32:53] Eric Nantz:
Yeah. I really like that feedback there. And it's something where I think the more specific you are in what you're looking for, the better off you'll be. And, again, it's kind of an art form to get these prompts in a way that gets you to that, quote, unquote, right answer more quickly. I don't think I've figured it all out yet, but I did get a lot of great advice from, you know, many leaders in this space, such as Joe Cheng, the author of Shiny, when he was showing me a lot of those AI tools before posit::conf last year. It was amazing to see the effort he took in those prompts to get to that data viewer application that we saw in his presentation and whatnot. It's gonna take practice. Definitely takes a lot of practice. And maybe it's just my nature, I've always been one of those more detail oriented people when I tell, say, a resource at the day job what I need to get a task done, and I'm very explicit about here are the input datasets.
Here's what I'm looking for on the output. Here are the key variables I need you to look at. I don't leave anything unturned. My wife will often joke I'm like the menu guy because I always like to go through things by a menu so much. Well, I think it can be a good thing in most cases. And I think with prompts, it's better to be specific than too general. So that's one thing I've learned over the months or so that I've been using it. And I admit I still am not on the code completion train yet, even though I've done it a little bit here and there. I was jaded by my first foray with GitHub Copilot a few years ago, when it was just giving me nonsense for my Shiny app code completion, and I just never really turned it on since. But I've got others that seem to be doing better with this. So maybe I just need to have a fresh look at it in 2025.
But, overall, lots of thoughts were provoked here with this post, and I think the key part that you and I and many others likely agree with here is that these can help you, but you still need a fundamental, you know, baseline to help judge what is quality code, and to not take it literally the first time around. If you have a careful eye, I think it can be really helpful. But I've seen too many times already that people are just taking what it's spitting out and running with it, at the risk of going into a complete dead end, because they didn't know any better. So hopefully, over time, that becomes solvable, but I won't hold my breath.
[00:35:29] Mike Thomas:
Me neither, unfortunately. They can certainly help you, but they can certainly hold you back.
[00:35:48] Eric Nantz:
Well, we mentioned at the outset of the episode how products like Positron are really positioning themselves to be multiple-language-supported development environments. Well, you can also say the same thing about the Quarto ecosystem, where you have the ability to leverage, you know, languages like R, Python, Julia, and one that I have not used as much until kind of recently, Observable for interactive JavaScript. And you may be wondering as an R user, yeah, I've heard about this Observable thing in talks about Quarto or other blog posts, but what does it really mean for me? How do I get started with it? Our last highlight is a terrific way to make this happen. It's called Observable for R users, and it is a blog post authored by Nicola Rennie.
She does a terrific job here getting straight to the point of how you can still use R for parts of your data processing, but then turn to Observable for your visualization needs. So what she starts off with in the post is using R for what it does very well, which is data wrangling. She's got a previous TidyTuesday dataset based on Himalayan mountaineering expeditions. That was from January of this year. She imports that dataset into R and does a little bit of cleaning here and there. Not much, really. It's just a little bit of dplyr code for filtering and selecting various columns, and you've got yourself a nice rectangular type of data frame. And normally, if I was just staying with R, I might then turn to ggplot2 or one of the extension packages to do the visualization.
But now it's time to see what Observable can do for the visualization needs. So in Quarto, just like in its predecessor R Markdown, you have your code chunks, where you give it the language that's being used. We have used R as the language up to this point in the post. Now we're gonna use the ojs code block to start the Observable side of it. But the first step is to hand off the data you just processed in R over to Observable. This is the part that seems kind of magical to me. I'm sure it's documented somewhere. Quarto comes with a function called ojs_define, and you can feed in the object name of that data frame or table in R that you created, and then that is going to become available in any ojs code chunk after it. This is a convenient way to get the data you wanted from R into Observable.
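Here's a minimal sketch of that handoff inside a Quarto document, with hypothetical object, file, and column names rather than the ones from Nicola's post:

```r
# Inside an {r} chunk of a Quarto (.qmd) document
library(dplyr)

peaks <- read.csv("peaks.csv") |>          # hypothetical input file
  filter(!is.na(height_m)) |>
  select(peak_name, height_m, year, region)

# Makes `peaks_data` available to every {ojs} chunk later in the document
ojs_define(peaks_data = peaks)
```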
And so what do you do in Observable? One thing you might need to be aware of for a lot of the plotting libraries is that they sometimes expect things to be in more of a row-wise format instead of the columnar format that we often see in our datasets in R. So there are functions, such as transpose, to get you into that row-based format that the visualization function will need later on. She's got an example of using that with the data that was handed off from R to get it into the row-based layout. When you're in Observable, it comes with a lot of nice built-in libraries for visualization, processing, and whatnot, but what's nice is that you can bring other JavaScript-based libraries into your Observable session.
And this is the other part that seems like magic. In R, we're used to loading a package into the R session via the library function, but that assumes you installed it first, right? Well, in Observable, you can use a function called require, put in the name of that extension library, and it's gonna kinda take care of both installing and using it for you. Because now we're in the world of JavaScript, right? These libraries are often externally hosted in some form, and this is kind of like giving you that pointer to that library. So she has an example here where maybe you want to use the D3 library. You could do that, give it an annotation like a version number afterwards, and load that into your session. That's a nice-to-have. I've done this before with the Arquero library as well, which is kind of like a tidyverse-inspired data processing library in JavaScript. But for the rest of the post, she's gonna use what's in Observable itself.
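In OJS-land, those two ideas look roughly like this sketch, continuing the hypothetical peaks_data object from above:

```js
// In an {ojs} chunk of the same Quarto document
d3 = require("d3@7")        // fetch a hosted library, pinned to version 7

// ojs_define hands over column-oriented data; transpose() converts it to
// the array-of-row-objects shape most JavaScript plotting libraries expect
peaks = transpose(peaks_data)
```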
One of those libraries is called Observable Plot, which, as the name suggests, is how you're gonna make yourself a plot with Observable. She starts off with a basic scatterplot with the year on the x axis and the height of the mountains on the y axis. Once you have your data in the right format, which you did previously, there is a function conveniently called plot, and this is where it kind of resonated with me a little, from my explorations with Plotly and some other JavaScript libraries: you've got to tell the plot what variables it's gonna use. So there's an attribute called marks, and that's where you give it the data argument, and then what your x variable and your y variable are. Do that, and you've got yourself your basic scatterplot right off the bat. But why stop there? Let's make this a little cleaner, shall we? You might want to make some adjustments: the x axis is representing years, but it's using actual number or comma notation to do it. So we've got to transform that into a more logical type of attribute, which would be a date, right? So there are ways that you can then change your dataset, using what looks like pretty, you know, tidyverse-like code, where you can give it a new variable for the year and convert that to a date object.
So just like R, JavaScript has date objects, numeric objects, character objects, and whatnot for your datasets. You feed that into your plot, and now you've got at least what looks like a more robust year on the x axis. And you can give it other nice attributes, like using a grid in the background with a grid true flag, and giving it labels as well for x and y. There's an attribute called label where you can change the label to meet your needs. And then you can get to a color palette, where there are built-in color palettes. She has a link to the Observable documentation, and if you want to color the points by another variable, you can definitely do that, such as by region.
There's a palette she uses called Set2, and you get a nice selection of different colors. And then, last but not least, you can add some titles as well. And again, all these arguments sound pretty logical; it's just a matter of where you fit them. So you have a title, a subtitle, and a caption as well, if you wanna put that at the bottom of the plot, along with tweaking the size and the margins. So you can get very low level with these, but you can get to a pretty basic looking plot that looks pretty nice right off the bat. And with Observable, one of the biggest selling points, of course, is the interactivity that'll be built right into your report. If you compile this with Quarto, you might get those tooltips. You might be able to zoom in. There are lots of different attributes that you can tap into here.
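Putting those pieces together, an Observable Plot cell along those lines might look like this sketch (again with the hypothetical peaks data, not Nicola's exact code):

```js
Plot.plot({
  grid: true,                                // background grid lines
  title: "Himalayan peaks over time",
  subtitle: "A hypothetical example",
  x: { label: "Year" },
  y: { label: "Height (m)" },
  color: { scheme: "set2", legend: true },   // built-in palette
  marks: [
    Plot.dot(peaks, { x: "year", y: "height_m", fill: "region" })
  ]
})
```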
And then, if you want, you can save that as a static image afterwards, just as a PNG. You could tap into a package like webshot in R to grab that from Observable back into R, referencing the output of the cell that was doing the visualization. I've only scratched the surface of what's possible here, and Nicola does a great job of, like, orienting yourself with a here's-how-to-get-started, with little teasers along the way to really make this polished. So I'm intrigued by what we can do here. And, certainly, the handoff from R to Observable could not be easier in the Quarto ecosystem. What do you think, Mike?
[00:44:35] Mike Thomas:
Well, I've done a deep dive, and it looks like we can't. And I'm not sure if you're asking if we can go from Python to OJS, or if you were asking if in Quarto we can go from R to Python data. It was more of the latter. Just curious. I don't think the answer is yes for the latter unless you have reticulate. I think that's what they recommend. Gotcha. But you can go from, obviously, an R data frame to a JavaScript data frame that OJS can consume using that ojs_define function. And similarly, on the Python side, you can go from a pandas data frame, again using ojs_define, to an OJS data frame.
I have looked beyond that to see if you could do the same with, like, a Polars data frame, but there is no documentation and no links on that. And I can't even find documentation on whether any other Python libraries besides pandas are supported for that particular handoff. But obviously, pandas is pretty ubiquitous, and most other Python libraries, even such as Polars, have a function to convert that type of data frame to a pandas data frame pretty quickly if you need to do that handoff. But, yeah, this is a fantastic walkthrough. You know, I absolutely love OJS for blog purposes, for static website purposes, because you don't necessarily need that full server behind the page, but you can still get a lot of the interactivity and the tooltips and things like that that OJS provides. And the syntax isn't that bad. Obviously, coming from a different language, it's gonna look a little bit different.
But if you just sort of take the time, and I think this blog post is a great walkthrough with some really nice snippets of OJS code, it's pretty understandable, pretty legible, pretty consumable, and not too scary. I think if somebody was trying to recreate these types of plots, or trying to use OJS for their own use cases, whether that be a blog post or some other type of dynamic document, you'd find it not too difficult of a process to convert from your R and Python visualization code to OJS code. And there are probably some places here where I might actually prefer the OJS syntax to what we have to wrangle on the R and Python side to get the plot to look the way that we want it to look. You know, one pretty cool thing about OJS that I read up on that I did not know is that the documentation on the Observable website has an interactive color palette viewer, where you can browse different sequential, diverging, and discrete color palettes.
And the built-in options include the ColorBrewer palettes, you know, which we are all very familiar with on the R side. So I thought that was cool. Last night, I was just messing around trying to come up with a nice contrast between text and background color for a particular presentation that we are putting together and didn't really have a great workflow for doing that. I wish I had known about that Observable documentation site.
[00:47:42] Eric Nantz:
Yeah. Well, I've linked that in the show notes too. And I've played a little bit with visualizations here, but my recent foray with Observable and Quarto was the website I maintain in Quarto for the R Consortium Submissions Working Group. I have, admittedly, a geeky page on there that's meant for authors of the site, like me and other contributors: a way to use Observable to dynamically generate code based on a widget of the possible attendees to a working group meeting. There's a little checkbox interface where you can check the names of those that attended, and then below it there will be a prefilled Quarto snippet, with one of those callout blocks that you can expand and collapse, with the attendees and the date that you select, and then you can copy that text into a new Quarto document.
I mean, it was over the top. Yeah. Absolutely. But it was a good way to learn nonetheless. So I use that when I start drafting the minutes for a working group meeting. Put that developer page up, get the attendees out, and then copy that over to Quarto. But that showed me this connector, which again seemed magical to me, going from R, to import a spreadsheet of the possible attendees, over to that widget in Observable, to reference that on the page itself, and then give you that interactive element to select the names. So there's untapped potential here, and I think in the case where you don't need a server, it's hard to deny the impact that Observable is having for these really clean and polished interactive summaries and visualizations in our Quarto reports. So I think the time is now to get a little Observable action here. And, speaking of which, we had a previous workshop, I believe at R/Pharma a year or two ago, talking about Observable for R users. So I'll put a link to that in the show notes too if you're interested.
And there's a lot more awesome stuff happening in this week's R Weekly issue, and, of course, we have a link to that full issue in the show notes. We're running a bit low on time today, so we probably won't do our additional finds, but we'll invite you to check out the excellent issue that Jonathan Carroll has put together for us. There are some great highlights and additional finds there if you do wanna tap into them. I'm already seeing some good stuff about Parquet and DuckDB. You know us. We love soaking up that content. So there's some great data processing content there as well.
And if you wanna help the project, one of the best ways to do that is to send us a suggestion for a new resource that you found: a blog post, a new package, anything in the world of data science and R, we love to hear about it. You can do that via a pull request. Linked in the top right corner, that little Octocat there right off the bat, you'll get a link to this week's current or upcoming issue draft. Just send a pull request right there, and our curator of the week will be sure to merge that in. And a little birdie tells me that might be me this coming week. So I could use all the help I can get, folks. Send those suggestions my way.
And, also, we love to hear from you on the social medias as well. You can find me on Bluesky, where I am @rpodcast.bsky.social. I'm also on Mastodon, where I'm @[email protected], and I'm on LinkedIn. You can search my name and you'll find me there. And by the time you're listening to this, the 2025 Shiny Conference will be underway, and you'll be able to hear my talk on Friday about the cool stuff I'm doing with Nix and Shiny. It should be a wonderful conference. So if you're not registered for that, go register now, because there's some great content coming your way there. And, Mike, where can our listeners find you?
[00:51:22] Mike Thomas:
I couldn't agree more. We are super excited for ShinyConf 2025 over here as well. You can find me on Bluesky at mike-thomas.bsky.social, or you can find me on LinkedIn by searching Ketchbrook Analytics, k e t c h b r o o k, to see what we're up to.
[00:51:47] Eric Nantz:
Excellent stuff. And, again, it's always great to record a fun episode with you. And, yeah, right after today, we'll be getting our Shiny geek on at ShinyConf, and I gotta finish my slides, folks. So nothing like conference-driven development. Okay then. Well, that's a good time to sign off here. So we'll wrap up episode 201 of R Weekly Highlights, and we'll be back with another edition of R Weekly Highlights next week.
Hello, friends. We are back with episode 201 of the Our Weekly Highlights podcast. This is the weekly show where we talk about the terrific highlights and other excellent resources that I shared every single week at rweekly.0rg. My name is Eric Nance, and I'm delighted you joined us from Revyar around the world. And, boy, it's been a crazy times in some parts of the world lately, but we're happy you're here. And I'm not joined here alone. I am joined at the hip here virtually by my awesome cohost, Mike Thomas. Mike, how are you doing today?
[00:00:35] Mike Thomas:
Doing pretty well there. Can't complain. The weather out here on the East Coast, about twenty minutes ago, it was snowing, and now it is sunny and beautiful. So it's as crazy as, the world seems to be these days.
[00:00:51] Eric Nantz:
It is. And that's no April Fools a week later. That actually happens, folks. And we had a freeze warning here too where it's like, I thought we were done with this, but, nope, we are not. So my my little hands here are still frigid from from being here in the in the humble basement when did I record this. So first first world problems, I guess. But nonetheless, we got some stuff to heat up our our, our knowledge here with the batch of highlights we're gonna talk about today. And as usual, the our weekly effort is a volunteer effort where every week we have a a new curator, rotating into their shift, if you will. And this week, that was Jonathan Carroll. Again, one of our longtime curators on the project.
He's also does a lot of interesting programming exercises, so definitely check out his blog if you're interested in what he's up to. But as always, he had tremendous help from our fellow Arrowki team members and contributors like all of you around the world with your poll requests and other terrific suggestions. So if you recall, it was back in 2024. The company, Posit, who, of course, have authored what has become one of the standards in data science tooling with the RStudio IDE, that's also been branded as Posit workbench for their enterprise products. Well, they made a splash last year when first, it was kinda quiet, but then it was a big splash at Positconf when they talked about their new IDE called Positron.
And for those that aren't aware, Positron is, in essence a wrapper around Visual Studio code, which has been used heavily in software development for quite a few years now, but with a data science flavor to it. And one of the main selling points is that it is a polyglot type of IDE where you can have r, Python, Julia, and almost any other language that Visual Studio Code supports. You can have that right into your Positron session. I have been using Positron almost exclusively for now about four or five months. It was a little bit here and there in 2024. But with some recent advancements, especially around Nick's, I've been able to drive it a lot more as my daily driver, but I'm not the only one. And our first highlight today, we have a great blog post here called Positron, Current Joys Joys and Pains offered by Athanasia Milenko.
She is a neuroscientist, and she gives us a great recap on her experience after six months of using positron. She first leads off with the positives. We always like talking about the good things before the not so good. As I mentioned in the outset, a very, very useful feature of positron is, like I said, this multiple language support, which doesn't feel like something that was bolted on midway through a product's life cycle. Where if you remember running, say, RStudio in your daily driver and you maybe wanted to do some Python development with it, you would have to use reticulate. Sometimes it wouldn't feel quite as native. You got a lot of handoffs going on there.
But because this is based on Visual Studio Code and it's got access to all the ecosystem that Visual Studio Code brings, you have either extensions or built in support for the common languages, especially for data science, such as Python, such as Julia, and others as well. And if you're, let's say, a JavaScript developer, you can tap into things like ESLint for linting, lots of other extensions. And a new one that came out somewhat recently is the air extension to help you format your code as you're saving your file, both in r and, I believe, on the r side of things. But in Positron, these things are easy to set up, and you don't feel like you're going off label, so to speak, when you use these multiple languages. So while I've been mostly doing R with Positron, I definitely have dabbled in Python before Visual Studio Code, so so it shouldn't be any real difference here. As long as you have your Python environment and your R environment set up, you should be good to go with positron in these multiple languages.
Another thing that is growing on me, it took a little bit, but now it's starting to really become nice is the environment viewer, which now you're starting to see what positron brings to the table as opposed to just straight Visual Studio code, is that this environment viewer definitely takes a lot of inspiration from what we saw in the RStudio IDE. Ways that you can view your dataset in a rectangular kind of display, do filtering on the spot. It may take a little getting used to at first if you're new to it, but once you get going with it, I think there is a lot of great ways you can explore your data there. Not all perfect. We'll get to that a little later, but I think it's it's coming along pretty nicely.
And one nice thing that coming from those that use RStudio is that there was a viewer pane that you could say run either your shiny app or maybe document a report. I knew that would happen. You didn't see it. You could document you can look at a render report in the viewer and whatnot. But guess what? Anything that the RStudio viewer could do back in RStudio itself, positron has support for that as well. This means also if you're maybe not on the portal train and you're still using a framework like blog down, one of my favorite packages for writing a blog with Rmarkdown, you can actually run your preview of the site in that positron viewer as well, just like you could with the r studio ID. So, again, they've done a lot of engineering under the hood to make that pretty seamless.
There's also support, for the add ins ecosystem that our studio brought to the table, again, midway through its life cycle. But because those are basically embedded Shiny apps when you run those most of the time, You can run those in positron just as well. However, not all roses and unicorns here. There's still a few things that are troublesome. So, Mike, I hate to make you be the bearer of bad news, but apparently, yeah, debugging's still not quite a seamless experience. What what does she have to say about that?
[00:07:26] Mike Thomas:
Yeah. In terms of debugging, it's it's not quite as easy as maybe what you would be used to in Versus Code. You know, running code while you're in the debugger, it can be done with, control enter, allows you to to run line by line from your browser statement, or breakpoint or other debugging marker that you've used. And Athanasia notes that if she's moved the cursor for some reason, whenever she does control and enter, it just is stuck at the browser call without running the other code. But if you've moved the cursor manually, it might not work, and and she can't really understand exactly why that is the case. I don't know if this is also the experience for any others. Eric, I know you've been a user of Positron a little bit. I don't know how much debugging you've done or if you've sort of ran into the same thing.
[00:08:20] Eric Nantz:
Oh, here and there I have. One thing I've had in my muscle memory, using debuggers both in RStudio and in Positron, is that I'm one of those old-school folks who just likes to type in the console to run code: either hit n to do the next line or just print out, say, an object's structure. But I can see I've had a little bit here and there where that focus seems to randomly shift away, and sometimes I have to quit the debugger with a capital Q to get back to the normal session and then try again. So there is some finagling you might have to do, and I can see where she's coming from here.
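For anyone newer to the R debugger itself, here's a minimal sketch of the console-driven workflow we're describing (the function and data are made up for illustration):

```r
# A toy function with a manual breakpoint
summarise_vec <- function(x) {
  browser()  # execution pauses here with a Browse[1]> prompt
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(mean = m, sd = s)
}

summarise_vec(c(1, 2, NA, 4))
# At the Browse[1]> prompt you can type:
#   n       run the next line
#   c       continue to the end of the function
#   Q       quit the debugger and return to the normal session
#   str(x)  or any other R code, to inspect objects in the paused frame
```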
[00:09:02] Mike Thomas:
Absolutely. And as well, if your previous workflow, like Athanasia's, was to copy and paste sections of code you wanted to run into the console, and sort of have that interactive experience between the debugger and the console, that's not necessarily gonna work in Positron currently. I think that's the other big frustration she has right now. However, she says her experience in Positron has actually helped her better understand the R debugger as a tool, leveraging the keyboard commands in the debugger, like Q to properly exit it, as opposed to doing those things in a point-and-click fashion in RStudio. So there are some pros and cons there. In terms of the data viewer, which is how she wraps up this post, it seems like it maps to VS Code's data viewer, if you're familiar with that. So it's pretty good, but it lacks a few features that you may be used to if you came from RStudio.
First, there's no handling for labeled data, which she points out is something that Shannon Pileggi feels strongly about, and there's a great link to Shannon Pileggi's blog post on labeling data in R. Second, you can't explore lists in the viewer the way you can in RStudio, where you can expand and collapse nested lists by clicking on them from your global environment. And the data viewer also tries to guess delimiters for plain text files and doesn't always do a great job, so Athanasia has been preferring the Rainbow CSV extension. That's one of the benefits of Positron being built on the open-source fork of VS Code, which enables use of the Open VSX universe of extensions: something you wouldn't have had access to as an RStudio user, and a pretty cool universe for those of us coming from the VS Code world.
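To make the labeled-data point concrete, here's a minimal sketch of what those label attributes look like (the variable is made up; haven attaches these automatically when importing SAS data):

```r
# Variable labels live in a "label" attribute on each column,
# which is what haven creates when reading SAS files
df <- data.frame(aval = c(5.1, 6.3, 4.8))
attr(df$aval, "label") <- "Analysis Value"

str(df$aval)
#>  num [1:3] 5.1 6.3 4.8
#>  - attr(*, "label")= chr "Analysis Value"
# RStudio's viewer displays that label under the column name;
# Positron's data explorer currently doesn't surface it
```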
And then the last thing the blog post wraps up with is being able to set up your Positron environment by leveraging a JSON file. I don't believe that you can do something like this with RStudio. Is it mostly point and click to do that?
[00:11:26] Eric Nantz:
They did, much later in the life of RStudio, give you a way to do a text config file, but in my opinion it was never documented very well how to interact with it directly. You could back it up and then restore it. But what we see here for Positron, and by proxy VS Code, yeah, it's JSON, but I think it makes more sense to configure. And it's really nice to be able to do this. It's pretty easy to wrap your head around.
[00:11:56] Mike Thomas:
It should be familiar for VS Code users, maybe less familiar for RStudio users. But I think after taking a couple of cracks at it, you'll get used to it really quickly.
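For a flavor of what that settings file looks like, here's a hypothetical sketch. The format-on-save keys are standard VS Code settings; the Air formatter extension ID shown is an assumption, so check the extension's docs for the exact name:

```jsonc
// settings.json: a hypothetical Positron setup
{
  "editor.formatOnSave": true,
  "[r]": {
    // assumed extension ID for the Air formatter; verify in your install
    "editor.defaultFormatter": "Posit.air-vscode"
  },
  "workbench.colorTheme": "Default Dark Modern"
}
```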
[00:12:06] Eric Nantz:
Yeah, and that's a great phrase to end with. I admit I may have had a jump start in my Positron journey because, Mike, you know this, I was on that whole dev container kick with Visual Studio Code and the R extension with Docker containers, and I was using VS Code extensively for my open-source development environment. I got a pretty long way in most of my Shiny app workflows especially, but there were always some pain points here and there. So when Positron came to be, it wasn't that huge of a lift, or I should say a shift, in perspective, because I had already stress-tested VS Code a bit. I can still see that if you're coming from RStudio itself, this is gonna be a bit of a jump. I think no one will discount that, and that's expected. It'll take some getting used to. That's why I always recommend, when something new comes out, and let's be honest here, it's still considered beta, not a production release yet, although if we know our history well enough, we'd imagine around September this is gonna be a production release, if you know what I mean.
My point being: try it with some low-risk project, get a feel for it, and then slowly start to fit it into your daily workflow. The best way to learn is by trying. So I think there's a lot of positive momentum here, with still some paper cuts, as they say. That data viewer aspect without the label attributes is actually kind of a bigger one for a lot of my colleagues here in life sciences, because when we import data from SAS, say using the haven package, we get those labels right off the bat, and being able to see those in the viewer is a big win for us. So hopefully that comes into play. And I think the other thing you may have to set your expectations for is that this is a very, very customizable experience.
Like VS Code, there are all sorts of blog posts out there of people tricking out their setup with various extensions and various pane layouts. In the end, Positron tries to give you three or four good choices of pane layout to start with that you can toggle back and forth in the preferences. But I wouldn't get too bogged down in just how much you can customize. I would build it up step by step until you get to the state that you like. I can tell from Athanasia's post here, and her post before this, that she's been learning along the way, and I think the evolution you can see in her settings file shows just that, with settings for both Positron and some of the popular extensions, say for Git, there's the GitLens extension, and other nice things with GitHub directly.
Speaking of extensions, you may be wondering: sure, there are so many out there, which ones should I pick? Well, I will put a link in the show notes to a really handy, what I'll call wrapper extension, authored by Garrick Aden-Buie over at Posit. I believe it's called Positron +1e, and it's actually a collection of extensions that he benefits from in his role as a Shiny developer and engineer at Posit. Even if you don't install it right away, you can just see what's included in it and whether there are specific ones that meet your needs, whether it's linting, spell checking, or navigation layout for your files. There are all sorts of interesting goodies there.
So, again, it's easy to get a little bogged down, but you've got a lot of choices at your fingertips for how you can tailor the Positron experience to meet your needs in your data science journey. Well, we're at episode 201, Mike, and the streak continues for yet another interesting highlight involving artificial intelligence and our world of learning and data science. We're gonna hone in on that keyword of learning here, because one perspective might be that it's one thing to use AI for getting specific help with problems we're encountering, maybe an error message in our console, maybe having to tap into another language we rarely use, like XML, in our projects. But going back in time, as if you and I had these tools available when we were just starting our data science journey, what would that actually look like? I think this next highlight shows just that. It's authored by Mine Çetinkaya-Rundel, who is a professor of statistics at Duke University and also on the Posit team, and she's talking about using AI tools to help learn the tidyverse.
Now, she mentions off the bat, and you're in good company here, Mine, that there are a lot of opinions on AI out there, and we have no shortage of them on this very show. But she frames this in the context of being a newer learner in data science, trying to do a common task in, say, R with the tidyverse, like getting a data analysis or visualization done. There are a few case studies here to run the full spectrum of what we're talking about. The first one is leveraging ChatGPT to help with reshaping your data and plotting.
This example is based on an already worked-out example from the R for Data Science book using the billboard music chart dataset, where she's asking the AI chatbot: use the billboard dataset in the tidyr package to create a visualization of rank versus week number for each song in the dataset. And the answer comes back not too shabby. It's loading the packages right off the bat. It's doing a little bit of reshaping with the pivot_longer() function, which is already a step in the right direction, doing a little conversion for the week to become numeric, and then doing a ggplot, which you might call a spaghetti-plot type setup, where each line is one of these trends, with week number on the x-axis and the rank on the y-axis of that longitudinal profile.
ChatGPT does give a little explanation at the end, and Mine says, yeah, there are some things that are promising. First, it is using the tidyverse; it's using the newer tidyr function to go from wide to long with pivot_longer(); and it's got the y-axis actually reversed, with rank one at the top and one hundred at the bottom, which is what she was looking for. But this is where things get tricky, folks. There's always a little more than you bargained for here sometimes. For instance, it colored the lines by track. That looks kind of interesting from a visual perspective, I guess, as an art piece, but maybe that's not exactly what you're looking for.
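For reference, here's a minimal sketch of the kind of code being described; it's not the bot's verbatim output, and billboard ships with tidyr:

```r
library(tidyverse)

# Reshape the billboard chart data from wide (wk1, wk2, ...) to long
billboard_long <- billboard |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  ) |>
  mutate(week = parse_number(week))  # "wk3" -> 3

# One line per song: the classic spaghetti plot
ggplot(billboard_long, aes(x = week, y = rank, group = track)) +
  geom_line(alpha = 0.6) +
  scale_y_reverse()  # rank 1 at the top
```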
It's also loading other packages that aren't necessarily involved. There's no dplyr used here, right? It's just tidyr and ggplot2, yet sometimes it throws that in without a lot of rhyme or reason. So what a learner might want to do in this case, not knowing the best way to just not color by track, is ask the bot in a follow-up chat: can you do this without coloring each line? Sure enough, you get a new plot out of that, and the lines are all blue now. But it did more than just change the color; it also changed the alpha level, inexplicably, from 0.6 to 0.3.
So, okay, maybe it just thought that was better when there's no color there. Again, the AI chat might make some choices that you don't necessarily agree with, which underscores the importance of reviewing the code as you're seeing it, to check that it's really what you want. There's more at the end of the post, which we'll get to, about how to practically handle these situations. But again, she's putting herself in the shoes of somebody who's not quite comfortable with R and the tidyverse yet and trying to get to that more polished answer through additional prompts.
In the end, though, some promising trends there. We move on to case study two, more of a data cleaning exercise, which I think is a very common problem to solve for a lot of those using data science in the real world. She kept this one kind of vague. There's a column in the dataset called membership status, which can be either NA or "Select to register" for a group, and she wants it to instead say closed or open. This time she's using Claude to do it. The first response she gets does clean up the variable with this recoding.
But note that, if you look at the post after listening to this, it's using base R to do it, which, again, isn't bad, right? But she wanted tidyverse approaches. So she asked the bot: well, can you do this with the tidyverse instead of base R? Then it pivots over to the if_else() function, used in a mutate() call with dplyr. Looks fine; we're getting there. But it also does some additional things, maybe because it's been trained on resources that aren't as up to date as what we have now, that raised a few eyebrows. One of which is using the magrittr pipe instead of the base pipe. Again, that can happen, right, when the model is trained on older data.
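Here's a minimal sketch of the two styles being compared; the column values and the exact recoding rule are hypothetical, since the post keeps the data vague:

```r
library(dplyr)

members <- tibble(
  membership_status = c(NA, "Select to register", NA)
)

# Base R flavor, roughly the shape of the first response
members$status_base <- ifelse(
  is.na(members$membership_status), "closed", "open"
)

# tidyverse flavor after the follow-up prompt
members <- members |>
  mutate(status = if_else(is.na(membership_status), "closed", "open"))
```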
Some of the styling wasn't quite up to snuff with the line breaks, and it added things it didn't need, like a head() call that's unnecessary because the data is a tibble, which already prints just the first rows. Little things like that: if you know R well, you know to look for them, but maybe there are better opportunities in additional prompts to make that happen. The last case study is web scraping. This one can get a little messy, right, because there are multiple ways to do it, often in multiple languages as well. She kept this prompt really general: she just asked it to write code for scraping data from this resource on North Carolina weather data, this time using Perplexity AI. So she's using different services in each of these, to compare and contrast.
Well, this should be a surprise to relatively few people: she got an answer right off the bat that was using Python with the Beautiful Soup library, which often gets talked about in web scraping. So she had to follow up that prompt with: well, use R instead. And then you get some relatively readable code. It's using the rvest package, which is, again, one of the great packages in the tidyverse to help with scraping. Pretty utilitarian-type code, again using the magrittr pipe. But there's one thing it didn't do well.
It didn't reshape the data the way she wanted. She wanted to have the months as rows and the temperatures as columns, and it didn't quite do that the first time. So she asked it to do the reshaping, and you get a lot of new code, but it's still not quite what she wanted. So she tried one more time, and she was very explicit this time in the instructions: get two tables, one from January to June and one from July to December, put them together, and then reshape them. And now, this is where things get really dicey.
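For orientation, a minimal working sketch of the rvest pattern being described; the URL and table layout here are hypothetical, and this is not the code Perplexity produced:

```r
library(rvest)
library(dplyr)

# Read the page and pull out every HTML table as a data frame
page <- read_html("https://example.com/nc-weather")  # placeholder URL

tables <- page |>
  html_elements("table") |>
  html_table()

# Stitch the January-June and July-December tables together
weather <- bind_rows(tables[[1]], tables[[2]])
head(weather)
```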
The code she got back for this step actually doesn't work. And even though it looks like it worked, because it gives some output, she has no idea how that output got there, since the code doesn't run. So that's another red flag, folks. Sometimes it may look correct, but you really have to evaluate it. So this was a fun little journey. Mike, I think you've got a lot of thoughts on this too. As you're seeing these case studies, what jumps out at you that people should be watching out for?
[00:25:19] Mike Thomas:
Yeah. I think one of the biggest gotchas, especially in that last case with Perplexity, and I've seen it in the other services as well, both Claude and ChatGPT, which are the two I've been experimenting with the most, admittedly Claude much more so recently, is that they will spit out an answer and make it look like the code should run successfully. Right? And that's not necessarily the case. In Mine's blog post, the same thing happened.
It spat out the answers, which were not only incorrect, but the code it spat out to give those answers doesn't run. So I think that is going to confuse a lot of new learners. I was listening to another podcast, shameless plug for the Vanishing Gradients podcast with Hugo Bowne-Anderson, who had Joe Reis on, and there's this idea that we're gonna create a ton of technical debt in the next maybe five to ten years as more AI-generated code gets into code bases. And then, when something goes wrong, who's going to solve it?
Because it looks like these AI solutions are not great at handling the edge cases. Right? They're generalizations as it is. And by having students lean heavily on these AI tools, they're maybe not learning the fundamentals of programming, which I think is a drawback for where we're headed in the future. I think it's going to be difficult for folks who are just starting out in their careers to get jobs unless they really focus on a lot of these fundamentals. And maybe folks like us, Eric, may not have as much work to do, I guess, if we're gonna be employing a lot of these AI tools. But when stuff hits the fan, we are going to be extremely highly valued.
So there's that dichotomy there. There's a lot of talk about agentic AI lately, which, to my understanding, is stringing these things together in a process. And if one of these agents is writing poor code that doesn't execute, how can we expect to multiply that concept in a Markov-chain type of environment and not have things go wrong? I don't think we've solved for the single agent doing a good enough job, in this case handling code execution. Maybe it's better in the Python ecosystem or JavaScript or other languages that are much more heavily used, depending on your use case. But when it gets applied to a lot of business settings, there's just domain knowledge there that the AI application, a lot of the time, will not have, to handle those edge cases and really fit the code to the problem. I could rant here for the next hour, but I do want to go through the tips and good practices that Mine outlines at the bottom of the blog post, which I really agree with.
First is to provide context and engineer your prompts. I think we all know this: the more explicit and verbose you can possibly be in what you want, and maybe how you want the AI to go about doing it, the more likely you're going to get accurate results, or at least the results you were looking for. So that whole concept of prompt engineering, I think, is actually very important. Second is to check for errors. It's sort of obvious, but don't just take what comes out of the LLM as gospel. Make sure you run it in your console. Make sure you understand what it's doing.
I can't implore younger data scientists and data analysts in their journey enough: take a look at what comes back, and don't just use it. Make sure you understand all of the parts of it. Go line by line. And if there's something you're not sure about, take a look at what the LLM gave back, because sometimes it will not only provide the response but also a step-by-step explanation of the code and why it used things. Or, if you're not 100% sure, ask the LLM why: why did you include this particular line of code as opposed to doing something different?
And if you're not taking those steps, I think you're certainly going to be holding yourself back pretty significantly. But if you are, I think it could potentially be just as good of an education, and maybe a more streamlined education in a lot of ways, compared to, you know, Eric, you and I googling things for hours and hours on end. That's the way we did it, right? And I still catch myself doing that too.
[00:30:46] Eric Nantz:
It's been amazing how it's still not a first reflex for me to go to the chatbot. I'm slowly getting there, but I still catch myself on that Stack Overflow exercise.
[00:30:54] Mike Thomas:
Yeah. And then we talk about code smell; that's tip number four here. I think that goes back to these LLMs probably being a little bit behind present-day best practices. So it's using the magrittr pipe instead of the base pipe in a lot of cases here, and things like that. Unfortunately, it's hard to make progress when the LLMs are trained on historical data, right? There's this gap between present best practices and what these AI tools know. I don't know exactly how we're going to solve that, but it doesn't make me excited or optimistic, I guess, about our ability to continue to innovate and make progress at the pace we currently are.
The last couple of tips are, first, potentially starting a new chat: if you feel like you're just continuing to send prompts to your current chat and not getting exactly what you're looking for, potentially start fresh and have the LLM ignore all of the prior context from your current conversation. Then code completion tools, such as GitHub Copilot: she recommends using them sparingly if you're a new user, and I would certainly echo that sentiment. And I think the last tip here is to use AI tools for help with getting help, and that's a fantastic idea. One of the last sentences here is to leverage an AI tool to develop a reprex, a reproducible example that you can share in an issue on GitHub or in, say, the Data Science Learning Community Slack channel, to be able to get help solving your problem if the LLM is perhaps not doing exactly what you hoped it could do.
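As a minimal sketch of that last tip, the reprex package can render a small, self-contained example (with its output) straight to your clipboard:

```r
# install.packages("reprex")
library(reprex)

# Wrap the smallest snippet that reproduces your question or error;
# reprex runs it in a clean session and copies a shareable version,
# output included, ready to paste into a GitHub issue or Slack thread
reprex({
  library(dplyr)
  starwars |> summarise(avg_height = mean(height, na.rm = TRUE))
})
```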
[00:32:53] Eric Nantz:
Yeah, I really like that feedback there. The more specific you are in what you're looking for, the better off you'll be. And, again, it's kind of an art form to get these prompts into a shape that gets you to that, quote, unquote, right answer more quickly. I don't think I've figured it all out yet, but I did get a lot of great advice from leaders in this space, such as Joe Cheng, the author of Shiny, when he was showing me a lot of those AI tools before Positconf last year. It was amazing to see the effort he took in those prompts to get to that data viewer application we saw in his presentation. It's gonna take practice, definitely a lot of practice. But maybe it's just my nature: I've always been one of those more detail-oriented people when I tell, say, a resource at the day job what I need to get a task done, and I'm very explicit about here are the input datasets.
Here's what I'm looking for on the output, here are the key variables I need you to look at; I don't leave anything unturned. My wife will often joke that I'm the menu guy because I like to go through things by a menu so much. Well, I think that can be a good thing in most cases, and with prompts, it's better to be specific than too general. That's one thing I've learned over the months I've been using these tools. And I admit I'm still not on the code completion train, even though I've done it a little here and there. I was jaded by my first foray with GitHub Copilot a few years ago: it was just giving me nonsense for my Shiny app code completion, and I never really turned it on since. But others seem to be doing better with this, so maybe I just need to have a fresh look at it in 2025.
But, overall, lots of thoughts were provoked by this post, and I think the key part that you and I, and many others, likely agree with is that these tools can help you, but you still need a fundamental baseline to judge what is quality code, and you shouldn't take the output at face value the first time around. If you have a careful eye, I think they can be really helpful. But I've seen too many times already people just taking what's spit out and running with it, at the risk of going into a complete dead end because they didn't know any better. So hopefully, over time, that becomes solvable, but I won't hold my breath.
[00:35:29] Mike Thomas:
Me neither, unfortunately. They can certainly help you, but they can certainly hold you back.
[00:35:48] Eric Nantz:
Well, we mentioned at the outset of the episode how products like Positron are really positioning themselves as multi-language development environments. You can say the same thing about the Quarto ecosystem, where you have the ability to leverage languages like R, Python, Julia, and one that I have not used much until recently: Observable, for interactive JavaScript. And you may be wondering, as an R user: yeah, I've heard about this Observable thing in talks about Quarto or other blog posts, but what does it really mean for me? How do I get started with it? Our last highlight is a terrific way to make that happen. It's called Observable for R users, and it's a blog post authored by Nicola Rennie.
She does a terrific job here getting straight to the point of how you can still use R for parts of your data processing, but then turn to Observable for your visualization needs. So she starts off the post using R for what it does very well, which is data wrangling. She's working from a previous TidyTuesday dataset on Himalayan mountaineering expeditions, from January of this year. She imports that dataset into R and does a little bit of cleaning here and there, not much really, just a little dplyr code for filtering and selecting various columns, and you've got yourself a nice rectangular data frame. Normally, if I were staying in R, I might then turn to ggplot2 or one of its extension packages to do the visualization.
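Here's a minimal sketch of that R wrangling step; the file name and column names are hypothetical stand-ins for the TidyTuesday extract:

```r
library(dplyr)
library(readr)

# Import the Himalayan expeditions extract (placeholder file name)
peaks <- read_csv("himalayan_peaks.csv")

# Light cleaning: keep complete records and just the columns we need
peaks_clean <- peaks |>
  filter(!is.na(height_m)) |>
  select(peak_name, year, height_m, region)
```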
But now it's time to see what Observable can do for the visualization needs. In Quarto, just like in its predecessor R Markdown, you have code chunks where you declare the language being used. We've used R as the language up to this point in the post; now we're gonna use ojs code blocks to start the Observable side of it. The first step, though, is to hand off the data you just processed in R over to Observable. This is the part that seems kind of magical to me; I'm sure it's documented somewhere. Quarto comes with a function called ojs_define(), and you feed in the object name of the data frame or table you created in R, and then that becomes available in any ojs code chunk afterward. This is a convenient way to get the data you want from R into Observable.
And so what do you do in Observable? One thing to be aware of with a lot of plotting libraries is that they often expect things in a row-wise format instead of the columnar format we usually see in our datasets in R. So there are functions, such as transpose(), to get you into the row-based format that the visualization function will need later on, and she's got an example of using that with the data that was handed off from R. Observable itself comes with a lot of nice built-in libraries for visualization, processing, and whatnot, but what's nice is that you can bring other JavaScript-based libraries into your Observable session.
And this is the other part that seems like magic. In R, we're used to loading a package into the session via the library() function, but that assumes you installed it first, right? Well, in Observable, you can use a function called require(), put in the name of that library, and it kind of takes care of both installing and using it for you. Because now we're in the world of JavaScript, these libraries are often externally hosted in some form, and this is like giving a pointer to that library. She has an example where, if you want to use the D3 library, you can do that, giving it an annotation like a version number afterward, and load it into your session. That's a nice-to-have; I've done this before with the Arquero library, which is kind of a tidyverse-inspired data processing library in JavaScript. For the rest of the post, though, she uses what's built into Observable; the sketch below pulls these handoff pieces together.
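A minimal Quarto sketch of the handoff, assuming the hypothetical `peaks_clean` data frame from earlier; ojs_define(), transpose(), and require() are the functions described in the post:

````
```{r}
# Hand the R data frame to any ojs chunk that follows
ojs_define(peaks = peaks_clean)
```

```{ojs}
// Column-oriented data from R becomes an array of row objects
peaksRows = transpose(peaks)

// Pull in an extra JavaScript library on the fly, pinned to a version
d3 = require("d3@7")
```
````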
One of those libraries is called Observable Plot, which, as the name suggests, lets you build a plot with Observable. She starts off with a basic scatterplot, with the year on the x-axis and the height of the mountains on the y-axis. Once you have your data in the right row-based format, there's a function conveniently called Plot.plot(), and this is where it resonated a little with my explorations of Plotly and some other JavaScript libraries: you have to tell the plot which variables it's going to use. There's an attribute called marks, where you give it the data argument plus your x variable and your y variable, and you've got yourself a basic scatterplot right off the bat. But why stop there? Let's make this a little cleaner, shall we? You may want to make some adjustments: the x-axis represents years, but it's rendering them with comma notation like ordinary numbers, so we should transform that into a more logical type of attribute, which would be a date. There are ways to change your dataset using what looks like pretty tidyverse-like code, where you create a new variable converted to a Date object.
So, just like R, JavaScript has date objects, numeric objects, character objects, and whatnot for your datasets. You feed that into your plot, and now you've got what looks like a more sensible year on the x-axis. You can give it other nice attributes, like turning on a background grid with a grid: true flag, and giving it axis labels with x and y: there's an attribute called label where you can change the label to meet your needs. Then you can get to a color palette; there are built-in color palettes, and she has a link to the Observable documentation, where if you want to color the points by another variable, such as region, you can definitely do that.
There's a palette she uses called Set2, and you get a nice selection of different colors. Last but not least, you can add some titles as well. Again, all of these arguments sound pretty logical; it's just a matter of where you fit them. So you have a title, a subtitle, and a caption if you want to put that at the bottom of the plot, along with tweaking the size and the margins. You can get very low level with these, but you can get to a basic plot that looks pretty nice right off the bat. And with Observable, one of the biggest selling points, of course, is the interactivity built right into your report: if you compile this with Quarto, you might get those tooltips, you might be able to zoom in, lots of different attributes you can tap into here.
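To give a feel for the syntax, here's a hedged sketch of a Plot cell along the lines of what's described. The variable names follow the hypothetical `peaksRows` from the earlier sketch, and this is not Nicola's exact code, so check the Observable Plot docs for the options your version supports:

````
```{ojs}
Plot.plot({
  title: "Himalayan peaks over time",       // top-level title
  subtitle: "An illustrative sketch",       // and subtitle
  caption: "Data: Himalayan expeditions (TidyTuesday)",
  grid: true,                               // background grid
  x: { label: "Year" },
  y: { label: "Height (m)" },
  color: { legend: true, scheme: "set2" },  // Set2 palette, as in the post
  marks: [
    Plot.dot(peaksRows, { x: "year", y: "height_m", fill: "region" })
  ]
})
```
````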
And then, if you want, you can save that as a static image afterward, as a PNG. You could tap into a package like webshot in R to grab that from Observable back into R, referencing the output of the cell doing the visualization. I've only scratched the surface of what's possible here, and Nicola does a great job of orienting you with a here's-how-to-get-started walkthrough, with little teasers along the way for making it really polished. So I'm intrigued by what we can do here, and certainly the handoff from R to Observable could not be easier in the Quarto ecosystem. What do you think, Mike?
[00:44:35] Mike Thomas:
Well, I did a quick deep dive, because I wasn't sure if you were asking whether we can go from Python to OJS, or whether, in Quarto, we can go from R to Python data. It was more the latter, just out of curiosity. I don't think the answer is yes for the latter unless you use reticulate; I think that's what they recommend. Gotcha. But you can go from an R data frame to a JavaScript data structure that OJS can consume using that ojs_define() function. And similarly, on the Python side, you can go from a Pandas data frame, again using ojs_define(), to an OJS data frame.
I looked beyond that to see if you could do the same with, say, a Polars data frame, but there's no documentation and no links on that, and I can't even find documentation on whether any Python libraries besides Pandas are supported for that particular handoff. But obviously Pandas is pretty ubiquitous, and most other Python libraries, even ones like Polars, have a function to convert their data frame to a Pandas data frame pretty quickly if you need to do that handoff. But, yeah, this is a fantastic walkthrough. OJS for blog purposes I absolutely love, for static website purposes, because you don't necessarily need a full server behind the page, but you can still get a lot of the interactivity and the tooltips and things like that that OJS provides. And the syntax isn't that bad. Obviously, coming from a different language, it's gonna look a little different.
But if you just take the time, and this blog post is a great walkthrough with some really nice snippets of OJS code, it's pretty understandable, pretty legible, pretty consumable, and not too scary. If somebody was trying to recreate these types of plots, or trying to use OJS for their own use cases, whether that's a blog post or some other type of dynamic document, I think you'd find it not too difficult a process to convert your R and Python visualization code to OJS code. And there are probably some syntaxes here where I might actually prefer OJS to what we have to wrangle on the R and Python side to get a plot looking the way we want. One pretty cool thing about OJS that I read up on, which I did not know, is that the documentation on the Observable website has an interactive color palette viewer, where you can browse different sequential, diverging, and discrete color palettes.
And the built-in options include the ColorBrewer palettes, which we're all very familiar with on the R side. So I thought that was cool. Last night I was messing around trying to come up with a nice contrast between text and background color in a particular presentation we're putting together, and I didn't really have a great workflow for doing that. I wish I had known about that Observable documentation site.
[00:47:42] Eric Nantz:
Yeah, and we've linked that in the show notes too. I've played with visualizations here a little bit, but my recent foray with Observable and Quarto was on a website I maintain in Quarto for the R Consortium Submissions Working Group. I have, admittedly, a geeky page on there that's meant for authors of the site, like me and other contributors: a way to use Observable to dynamically generate code based on a widget listing the possible attendees of a working group meeting. There's a little checkbox interface where you can check the names of those who attended, and below it, there's a prefilled Quarto snippet with one of those callout blocks, which you can expand and collapse, containing the attendees and the date you selected, and then you can copy that text into a new Quarto document.
I mean, was it over the top? Yeah, absolutely. But it was a good way to learn nonetheless. I use that when I start drafting the minutes for a working group meeting: put that developer page up, get the attendees down, and then copy that over to Quarto. But it showed me this connector, which again seemed magical to me, going from R, importing a spreadsheet of the possible attendees, over to that widget in Observable referencing it on the page itself, then giving you that interactive element to select the names. So there's untapped potential here, and in the case where you don't need a server, it's hard to deny the impact Observable is having with these really clean and polished interactive summaries and visualizations in our Quarto reports. So I think the time is now to get a little Observable action in. And speaking of which, we had a previous workshop at R/Pharma, a year or two ago, talking about Observable for R users, so I'll put a link to that in the show notes too if you're interested.
And there's a lot more awesome stuff happening in this week's R Weekly issue; of course, we have a link to the full issue in the show notes. We're running a bit low on time today, so we probably won't do our additional finds, but we'll invite you to check out the excellent issue that Jonathan Carroll has put together for us. There are some great highlights and additional finds to tap into; I'm already seeing some good stuff about Parquet and DuckDB, and you know us, we love soaking up that content. So there's some great data processing content there as well.
And if you wanna help the project, one of the best ways is to send us a suggestion for a new resource you found: a blog post, a new package, anything in the world of data science and R, we'd love to hear about it. You can do that via a pull request, linked in the top right corner of the issue, that little Octocat there. You'll get a link to the current or upcoming issue draft; just send a pull request right there, and our curator of the week will be sure to merge it in. And a little birdie tells me that might be me this coming week, so I could use all the help I can get, folks. Send those suggestions my way.
We also love to hear from you on social media. You can find me on Bluesky, where I'm rpodcast.bsky.social; I'm also on Mastodon, where I'm rpodcast at podcastindex.social; and I'm on LinkedIn, where you can search my name and find me. By the time you're listening to this, ShinyConf 2025 will be underway, and you'll be able to hear my talk on Friday about the cool stuff I'm doing with Nix and Shiny. It should be a wonderful conference, so if you're not registered, go register now, because there's some great content coming your way. And, Mike, where are our listeners gonna find you?
[00:51:22] Mike Thomas:
I couldn't agree more. We are super excited for ShinyConf 2025 over here as well. You can find me on Bluesky at mike-thomas.bsky.social, or you can find me on LinkedIn by searching Ketchbrook Analytics, k e t c h b r o o k, to see what we're up to.
[00:51:47] Eric Nantz:
Excellent stuff. And, again, it's always great to record a fun episode with you. And, yeah, right after today, we'll be getting our Shiny geek on at ShinyConf, and I've gotta finish my slides, folks; nothing like conference-driven development. Okay then, that's a good time to sign off. So we'll wrap up episode 201 of R Weekly Highlights, and we'll be back with another edition of R Weekly Highlights next week.