Creating retro-gaming sprites rendered from the comfort of R? Yes we can! Plus an honest take on the utility of GitHub's Copilot Workspace in the context of package development, and taking the concept of code trees to another level with treesitter.
Episode Links
- This week's curator: Ryo Nakagawara - @Rby[email protected] (Mastodon) & @RbyRyo (X/Twitter)
- Tile-style sprite delight
- Some thoughts after a trial run of GitHub's Copilot Workspace
- Extracting names of functions defined in a script with treesitter
- Entire issue available at rweekly.org/2024-W30
- tree-sitter-r https://github.com/r-lib/tree-sitter-r
- Shiny.telemetry 0.3.0 https://www.appsilon.com/post/shiny-telemetry-0-3-0-update
- Introduction to R with the Tidyverse https://introduction-r-tidyverse.netlify.app/session1_notes
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon) and @theRcast (X/Twitter)
- Mike Thomas: @mike[email protected] (Mastodon) and @mikeketchbrook (X/Twitter)
- Moonlight Vibin' - Mega Man X5 - DCT - https://ocremix.org/remix/OCR02053
- Forest Through the Trees - Shea's Violin - Final Fantasy Mystic Quest - https://ocremix.org/remix/OCR04484
[00:00:03]
Eric Nantz:
Hello, friends. We're back with episode 172 of the R Weekly Highlights podcast. If you're new to the show, this is the weekly podcast where we talk about the latest highlights and awesome additional resources that are shared every single week in this week's R Weekly issue. My name is Eric Nantz, and I'm delighted that you've joined us wherever you are around the world. It's hard to believe July is almost over, but, of course, we got a lot more great R content to talk about with you today. And I never do this alone, as you know. So at the virtual hip right here on this split screen is my cohost, Mike Thomas. Mike, how are you doing today? Doing well, Eric, at the virtual hip. Only for a couple weeks until we get to see each other again in Seattle, maybe? That's right. Yeah. The countdown is on, so I gotta get all my bits sorted out and hopefully get that talk ready to go, but all in good time. But, yes, you might say the nerves are starting to hit a little bit, but again, it'll be great to see you again and see all the wonderful peeps in data science and other sectors that frequent that conference every year.
[00:01:06] Mike Thomas:
Yes. I'm super excited as well. We have a ton of clients that are going to this conference this year, so I think it's gonna be a big one. I'm almost surprised how many folks I know that are gonna be there. So it's gonna be a party.
[00:01:19] Eric Nantz:
Yeah, a party indeed. You're gonna be in high demand, my friend. I hope I even get a few minutes of you after all that.
[00:01:26] Mike Thomas:
Quick short story: last year when we were at Posit Conf, Eric and I met, and it wasn't 2 minutes after we met, I guess, for the first time in real life, that somebody came up and asked me to take a picture of you and them, because you are the celebrity at this conference, certainly not me. So, oh goodness. Yeah. That was how it started.
[00:01:49] Eric Nantz:
That's yeah, that's how it always starts. But you know, I think the tables will turn this time around. But, nonetheless, we're gonna have fun connecting with everybody. And, yeah, we're both gonna have a lot going on there, but we got a lot going on here, my friend, with the mics in our hands. So let's get going here. Our issue this week was curated by Ryo Nakagawara, another one of our OG curators and longtime contributors to R Weekly. And as always, he had tremendous help from our fellow R Weekly team members and contributors like you all around the world, with your pull requests and suggestions.
As you may have heard in previous episodes, the thesis was put out there a few months ago that, who needs a game development engine like Unreal or anything? We can use R to develop games. And there is yet another milestone in this workflow that we're gonna talk about, leading off this episode. We are talking about the latest blog post from Matt Dray, who has been on this quest to leverage not only tooling that he's creating, but also augmenting some of the awesome tooling being created by Mike Cheng, who you may also know as coolbutuseless on the social interwebs.
[00:03:01] Mike Thomas:
So probably the highlight of this blog post is that we now know the full name of Mike FC, coolbutuseless. Right?
[00:03:08] Eric Nantz:
You are correct, sir. This may be the first time we've ever seen that spelled out. So I guess the mystery is over.
[00:03:15] Mike Thomas:
I know. I apologize to Mike if he was trying to keep that from us, because he did a pretty good job for a long time, until this blog by Matt came along.
[00:03:26] Eric Nantz:
Yeah. You know, I liken this to, for those of you that follow pro wrestling, those days in the eighties where you didn't know that Hulk Hogan is really named Terry Bollea until, like, you were much older. Maybe it's just one of those reveals; the time's up for the gimmick. But nonetheless, we know who you are, Mike. Yes, you've been hard at work on these packages. So what Matt talks about in this post is that he is leveraging what Mike's created, called the nara package, which is basically supercharging R's ability to produce raster-based graphics, but tailor-made for things like pixel art.
And what Matt has done is he's augmented the nara package tooling with his previous package called r.oguelike, which, if you recall from a highlight episode probably 4 or 5 months ago, was a way in R to create these, in essence, randomly generated dungeon crawlers, all in text-based format. So you get a nice ASCII art of the dungeon with the layouts, and it would respond to your key presses to have you as the player, with a little, you know, maybe P symbol in the middle there going up, down, left, right, and then the randomly generated enemies or other artifacts would move along alongside you.
So he has now merged the ability to do this, but instead of the textual representation of those dungeons, he is adding some nice retro-style-looking sprites here. In fact, he's leveraging an open-source asset pack from a designer named Kenney, who has what's called a tiny asset pack of all these different sprite arts that are 16 by 16 pixels. So really, really nice, and if you ever played those RPGs of yesteryear, like, say, Dragon Warrior or Final Fantasy, yeah, these are gonna look right at home alongside that artwork.
So how does this actually work? Well, as I mentioned, a lot of the underpinnings have already been made by Matt in this r.oguelike package. But now, instead of the textual representation, he is mapping that mesh, which basically is a matrix of, you know, 16 by 16 or whatever dimension, where each cell has either a movable space, an obstacle, the player, or the enemy. And so that's already randomly generated up front, but then that is being translated into these tiles that are being created by the nara package. So in the blog post, you see the textual structure of it, which again will look very similar to if you use Matt's r.oguelike package, but then that handoff is translated, like I said, into this tile-based board. And now you see the nice image appearing under the matrix representation.
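As a rough base-R illustration of the idea (a minimal sketch with assumed cell codes and tile names, not Matt's actual code):

```r
# Represent the dungeon as a character matrix, as r.oguelike does,
# then map each cell code to a tile name from a sprite asset pack.
dungeon <- matrix(
  c("#", "#", "#", "#",
    "#", "@", ".", "#",
    "#", ".", "E", "#",
    "#", "#", "#", "#"),
  nrow = 4, byrow = TRUE
)

# Hypothetical lookup from cell codes to sprite names
tile_lookup <- c("#" = "wall", "." = "floor", "@" = "player", "E" = "enemy")

tiles <- matrix(tile_lookup[dungeon], nrow = nrow(dungeon))
print(tiles)

# Each named tile would then be blitted as a 16x16 sprite onto a
# native raster canvas, which is the part the nara package handles.
```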
It just looks literally picked out of a retro game. It really is absolutely amazing what's going on here. And you can do all sorts of things with this. Obviously, you can make this as big or small as you want, but there is a lot more coming in this tooling. He wants to make, you know, really a true game loop, which, if you played RPGs or roguelike dungeon crawlers in the past, you know that as you move along the board, you get a random encounter with an enemy, do the battle, win or lose, rinse and repeat. But, of course, it's almost like an infinite loop per se. So he's looking at ways of having that true kind of gaming-style loop in the back end here. And of course, like any RPG, you're gonna have to have your inventory at some point. Right? You gotta have those weapons, those potions, those antidotes when you get poisoned or whatnot. So, obviously, this is probably gonna be a huge rabbit hole if he chooses to go down this route. But I am very much eagerly watching this. But, of course, the first major step is what the user actually sees.
So what's seen here is this package that Matt's created, which he now calls tilebased. This is the start, folks. Like I said, who needs to pay for that Unreal engine or that Unity engine? Boot up your R console and go to town. Right, Mike? Absolutely. And it, you know, just reminds me of the old adage that
[00:08:02] Mike Thomas:
R is only a programming language for statistical analysis. Right, Eric? Can't do anything else well. Yeah. Not at all. Not at all. And this also throws me back; I think these graphics are equivalent to at least, you know, the Game Boy Colors of the world. This is reminding me of my Pokémon Blue game that I used to play on long car rides quite often. Oh, yes. I think the graphics are quite akin to that. It's pretty incredible that, as you mentioned, Eric, there's only, like, 4 of what look like emojis, but I guess we're calling them sprites, these objects from a tiny asset pack, which is some resource out there that has all of these different 16 by 16 pixel tiles that you can use. I imagine that they're, like, Creative Commons licensed or something like that, which is why Matt chose to use them and incorporate them. And this whole entire game, it looks like the graphics are just made from these 4 different emojis, if you want to call them that, which is pretty incredible.
The things that we're looking forward to here in the next iterations of this potentially could be, as you mentioned, a true game loop, and a way to have some sort of an inventory system. The sound generation is really cool too, to be able to have some sort of a soundtrack to this game as well. I think I've seen some more things from Mike FC coming out on Mastodon lately, if I'm not mistaken, around R packages that are doing some audio things. So maybe there's a potential chance that that could get integrated into this package as well. But, you know, it's hard to do this justice in audio form. You really have to check out the blog post, see the visuals that are there, install the package from GitHub yourself, and try not to waste a couple hours, you know, playing games in R. I dare you to try. So this is really, really cool stuff. I think it's a fun way to start off the highlights this week, and I'm looking forward to what else is to come here. Yeah. The creativity
[00:10:09] Eric Nantz:
possibilities here are practically endless. And, of course, as I'm watching this, I'm wondering, well, how would you be able to distribute this? Can you imagine a WebAssembly app that puts all this together? Good grief. I mean, maybe I shouldn't say too much here on audio, but I'll say it anyway. There are, even on the Internet archives, some of these WebAssembly-powered emulators of the classic arcade games that have long passed their IP windows, but it's all in your browser, using native JavaScript to do it. Just imagine; well, you know, we've been watching the WebAssembly space quite a bit. I wouldn't put it past either me or someone else, maybe working with Matt or Mike, to try their hand at this and see what happens, because just imagine how easily you could distribute something like this. Just mind-blowing possibilities here. Absolutely. You know, sort of reminds me that
[00:11:04] Mike Thomas:
the Shiny contest is back. I probably should have saved that for my additional highlight, but I saw that yesterday; it came across on my social media feeds. But it sort of reminds me of that Appsilon app. I believe that was the shark one, underwater. That's right. The Shark Attack app. Yeah. That was awesome. That can be hosted on WebAssembly. Ivan from my team has recently, you know, published his first experimental Shinylive app that's a card game. So, yeah, I think the gaming and Shinylive crossover here is gonna happen pretty soon.
[00:11:39] Eric Nantz:
It's not a matter of if, Mike. It's when.
[00:11:43] Mike Thomas:
Sounds like you found your next side project.
[00:11:55] Eric Nantz:
Speaking of trying things out, unless you've been living under a rock, you know that some of the biggest advancements in our world of tech have been the use of generative AI and large language models to help with all sorts of things in our daily lives, and especially in the world of software development. I'm sure you've heard about efforts such as what Microsoft piloted years ago with what's called Copilot, which, if you opt into it in your IDE such as Visual Studio Code (and there are also plugins for this with, say, RStudio and whatnot), gives you these autocomplete-like suggestions as you're typing out code. Maybe it's some boilerplate for a function call. Maybe it's helping flesh out some additional parameters or whatnot. Well, obviously this space is moving quite fast, and our next highlight today talks about some recent findings from the Epiverse team on their explorations of what's called the Copilot Workspace initiative.
So as I mentioned, this is coming from the Epiverse blog. It's got a set of authors here: Joshua Lambert, James Azam, Pratik Gupta, and Adam Kucharski. Hopefully, I said those right. They have teamed up here to talk about, as they've been watching this space of how AI and LLM models are helping with development, what would be the situation of trying to leverage this new Copilot Workspace, which is kinda taking the Copilot initiative I mentioned earlier to the next level: not just autocompleting code as you're typing, but actually taking a set of requirements that are surfaced via a GitHub issue and seeing how to actually produce the code, or produce a solution to the problem at hand.
So they decided, let's do an experiment here. They teamed up with a group of professors in their organization to do 3 different experiments with this Copilot Workspace, to see how it works in the real world. For the first experiment, and they tried to go, I guess, from easy to difficult as we go through these, say they have a function in their R package that has been internal up to this point, but maybe they got user feedback that says, hey, you know what? That's a useful function. Maybe we should export that. So using their EpiNow2 package, they looked at an existing issue that, again, was already filed about, hey, this function called epinow2_cmdstan_model should be exported.
So they turned the Copilot Workspace loose on this, and it did get the job done, albeit in ways that probably aren't as intuitive to an R user. So let me explain it bit by bit here. The Copilot Workspace did determine that, oh, you know what? We need to replace that keyword of internal in the roxygen documentation of that function and replace it with export, and also update the NAMESPACE. Now, there were some little shenanigans here, because apparently it did change the formatting of one of their other function arguments, I guess doing some text tidying up, which, again, yeah, that's fine and all.
But in the end, it did technically get the job done. However, this Copilot Workspace is not intelligent enough to understand how documentation is updated with the modern R tooling for package development, such as using things like devtools::document() or, you know, more natively roxygen2, to re-update the NAMESPACE dynamically. It did it manually, like you would maybe in the early days of package development. So the context itself is not quite there yet, but you can't say it didn't get the job done with this. It just did it in a much more manual fashion. So, you know, so far, pretty promising.
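For context, the more idiomatic workflow here is a one-line tag change followed by regenerating the NAMESPACE, something like this (a sketch of standard roxygen2/devtools practice, using the function name from the issue):

```r
# Before: roxygen2 marks the function as internal
#' @keywords internal

# After: swap the tag so roxygen2 exports it
#' @export
epinow2_cmdstan_model <- function(...) {
  # function body unchanged
}

# Then regenerate the man/ pages and the NAMESPACE file automatically,
# rather than hand-editing NAMESPACE as Copilot Workspace did:
devtools::document()
```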
Let's take the difficulty up a notch, Mike, because in the next experiment, they wanna add a new model to the package, albeit not too difficult from a complexity perspective, but this is upping the ante on what the Copilot Workspace can do. How did this one fare? What do you think? Yeah. Not quite as well, Eric. They wanted to add what's called a simple epidemic model
[00:16:21] Mike Thomas:
to the R package, which contains, you know, a bunch of different models as well. So they, I believe, created a new issue, if I'm not mistaken, to add a basic SIR model, as it's called, with a couple sentences on exactly what they were looking for. What happened was GitHub Copilot created an R script with a naming convention that followed the naming convention of the other modeling scripts that they had. This one was, you know, within the R directory, which was good, and the name of the script was model_sir, which, again, followed the naming conventions that they've used in this epidemics R package that they have right now.
And I think this follows a lot of the same things that I've seen over and over again with Copilot, with ChatGPT, with some of these: it gets you close to the right answer and puts down some of the right things that you would want there, but not in the way that you would necessarily want things organized, if you will. You know, so the code that was generated constructed that basic SIR model and used roxygen2 as well to document the code, but a lot of aspects of the code didn't match what they asked for in that issue. The code contained what they call some inadvisable coding practices in R, you know, what we like to call code smells. And the model itself followed the standard set of differential equations that are solved using deSolve, which is one that they had actually requested in their issue, but it didn't have any options to input things that are really important to them, like interventions, which is what the Copilot had suggested it would actually include but failed to do so.
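For reference, the bare-bones version of what the issue asked for, an SIR model solved with deSolve, looks something like this (a minimal sketch, not the generated code):

```r
library(deSolve)

# Basic SIR dynamics: susceptible -> infected -> recovered
sir_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -beta * S * I
    dI <- beta * S * I - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}

out <- ode(
  y     = c(S = 0.99, I = 0.01, R = 0),  # initial proportions
  times = seq(0, 100, by = 1),
  func  = sir_model,
  parms = c(beta = 0.3, gamma = 0.1)     # transmission and recovery rates
)
head(out)
```

Note it has no hooks for interventions, which is exactly the gap the authors flagged in the generated code.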
The other downside is that it used the require function to import the deSolve package in the body of the generated code. And as you know, Eric, when we are developing R packages, that's not something you want to include in your function. Right? Yeah. The smell was strong with that one. Oh, my goodness. That one stinks to high heaven. And, you know, we use a lot of utility functions, like from the devtools and usethis packages. If we want to leverage a new package within our own package in a proper way, we could, you know, use usethis::use_package("deSolve"), right? And that'll add it to our DESCRIPTION file.
It might add it to the NAMESPACE where appropriate. And obviously, we could request the use of that package in our roxygen @import or @importFrom decoration above that particular function. So that's a pretty bad one right there as well. So, you know, a few different code smells here. I don't think the function itself accomplished exactly what they were looking for that model to do, on top of, you know, some of the kind of ugly best-practice things. So we're starting to head a little bit in the wrong direction here as we get into experiment 3.
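To make that concrete, the package-friendly pattern Mike describes, versus the require() call the tool generated, looks roughly like this:

```r
# Code smell: what the generated code did, loading a dependency at call time
my_model <- function(...) {
  require(deSolve)  # inadvisable inside a package function
  # ...
}

# Better: declare the dependency once, then import only what you need
usethis::use_package("deSolve")  # adds deSolve to Imports in DESCRIPTION

#' @importFrom deSolve ode
my_model <- function(...) {
  ode(...)  # available via the NAMESPACE import after devtools::document()
}
```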
[00:19:44] Eric Nantz:
Yeah. Talk about upping the ante a little bit, but this is something that is very relatable to every one of us that are developing intricate code bases. They wanted to see if this Copilot Workspace could actually do an intelligently driven code review of the package itself. As you know, as you get maybe new features put in, bug fixes, or whatnot, you get to that point of an eventual release, maybe an update you wanna release on CRAN or whatnot. You wanna have that code review to make sure that all the things are looking good: you're being efficient with memory usage, efficient with coding best practices. We already saw a little glimpse that, yeah, it may not be the best at coding practices.
So what they were expecting to see, hopefully, in this issue asking it to do the code review, was a documented set with, you know, links to particular snippets of code where maybe things could be optimized, or maybe even just questions about the code or whatnot. Well, the bad news is it didn't actually do any analysis of the code itself. It basically regurgitated some of the changes that were already described in the pull request and in the changelog, i.e., from the NEWS file, and just kinda bundled all that together as a narrative. Which in essence means it does a great job of reading the news, but not so much looking at the actual code and the changes that could be made to make the code better.
So this is where, you know, humans aren't being replaced on this one, by a long shot. But what I do appreciate is they gave this 3, again, realistic use cases, and not all difficult. I would say the first experiment is definitely the easiest one, and it did get the job done, just not in the way you as an R developer would carry it out. So I think the takeaway here, and I agree with their takeaways, is that there's a long way to go. There are avenues of success here, but I think you as the end user definitely need to be vigilant about making sure that whatever prompts or requirements you're feeding into these are being accurately addressed in the results you get. And if you're getting code back, yes, you definitely should not take that blindly, so to speak. You need to make sure: does that fit your overall paradigm of a code base, your style guides, if you will, your best practices for your team?
Obviously, there's much more work to be done to make these, in my opinion, more dynamic, so that as it looks at the code base for a package, or maybe even a set of packages, it can really grok, if you will, the key paradigms that are governing that code base, instead of relying on a whole bunch of additional sources from who knows where. I think being intelligent about what's actually being done in the project is the way to go. For a research software engineering context, like what this is based on, yeah, maybe it's some help, but in the end, sometimes you just can't get away from doing some of this manually right now. But this space is moving fast. And again, I really appreciate their attention to detail in putting it through the paces.
But in the end, I'm not losing sleep over the fact that the AI bots are gonna take over package development anytime soon. What do you think? Me neither, Eric. You know, I'm trying to be, like, open-minded
[00:23:28] Mike Thomas:
about it, and I think as long as you have the expertise to be able to sniff out the good from the bad, running your code through Copilot for any of these purposes can still be useful. Right? I'm not sure if I want it to actually physically make these changes to my code in any way, but maybe, you know, it's not such a bad thing for it to provide me with suggestions. Right? If I ask it to do a full-blown code review, there might be a little thing that it finds that I missed or somebody on our team missed, because programmatically poring through the code, having a computer do that, might be able to catch things easier than we can catch with the naked eye,
even looking at our code for a little while. So I think I'm open to suggestions, let me put it that way, from Copilot and from the ChatGPTs of the world, but I'm still in a place right now where I am going to often take those with a grain of salt and, you know, leverage sort of my expertise over what's coming back from those types of models.
[00:24:34] Eric Nantz:
Yeah. One use case I'm seeing more and more often in many industries and organizations is the process of refactoring from, like, one type of code base to another, especially when you're shifting languages. A lot of companies are turning to LLMs and AI to help with that conversion. I've always had a little bit of spidey senses tingling at this, because with that new code, will it look like somebody with competent skills in, whether it's R, Python, JavaScript, or whatnot, wrote it? Or is it gonna look like a hodgepodge of tutorials it found online, trying to do, like you said, this mishmash of different coding styles and practices? So I think we're still a ways away, but I know that is a hot topic in many circles, to see how fast it can get you to that next step. And there may be cases where it gets you really close, and you just have to spend maybe 5 or 10% of your time revising it. There may be other cases where it just gives you absolute garbage, and you might as well throw it away.
So in any event, I think keeping an open mind and being realistic are very important in these still-early stages of this whole tech sector, this whole industry. But I think good things are coming. Just gotta use it responsibly, of course.
[00:25:54] Mike Thomas:
We'll see.
[00:26:04] Eric Nantz:
And then rounding out our highlights today, speaking of refactoring things, you may have a situation where there are some things you could do manually, but as you do them over and over again, especially as you're dealing with a large code base, maybe not even one you wrote yourself, you wonder: there's got to be something that can help me get there even faster. So our last highlight today comes from a frequent contributor, Maëlle Salmon, from her recent blog about her journey to look at all the names of functions defined in a script or set of scripts, and an approach that is new to her, and frankly new to me as well. This is fresh off the recent useR! conference; I'm seeing some of the videos from that come online, and it sounded like it was a terrific event. Of course, I have a little FOMO every time I see that, because I still haven't been to a useR! yet, but someday that will be checked off the bucket list. I just wasn't able to this year.
But one of the talks that Maëlle discovered, which she didn't see live but heard about after the fact, is Davis Vaughan from the Posit team talking about a framework called tree-sitter. What tree-sitter actually is is a mechanism to parse code, somewhat like what we've heard in the past with things like abstract syntax trees, and trying to parse either variable names, function names, or whatnot. Apparently, tree-sitter is a C library, and I guess there are bindings for other languages as well, to help with that mechanism of parsing code intelligently. So her use case was the following: she wanted help finding functions in the package igraph. And if you don't know what igraph is, it's kind of like the standard-bearer of network data representation, which you can turn into network diagrams, you know, very much like trees or very intricate networks.
igraph has a long history. We've covered it before on the show, and it is a massive code base. So her use case was, okay, what functions in igraph... now, there is a certain operator that igraph has, the square bracket operator. She wanted to see which functions were only used within this special operator. So she had to literally go within this operator function to find all the utility functions throughout. The example she has in the post only has a couple of them, just for kicks, but imagine there's, like, a whole bunch of them. She had to refactor these, or at least get to the point of refactoring them, and she didn't want to manually copy-paste all these function names or discover them by hand.
She has used things like xmlparsedata in the past; we covered in a previous episode her explorations with that to parse a complicated function. She wanted to see what it would look like with treesitter to do that similar operation. So she walks through her use case here. She loads the package, and then she loads the parser for the R language, and that reads in the script that has the function text. And then she wanted to figure out, okay, where is the root of that tree that is now gonna govern all the child nodes of that function?
So this gets a little in the weeds here, but there is a function in treesitter called query, where you can feed in kind of a snippet describing the code you want to look for. And there is a little S-expression-like structure to it. You define what's on the left side, what's on the right side, and you give it kind of a plain-text label for each part, whether it's an identifier or a definition, and then you give it what looks like a regex kind of comparison for what you wanna match it to. This was all news to me; I've never seen the tree-sitter query syntax before. But sure enough, when you run that, you get a nested list back in R that gives you the text of what it found versus the actual expression, kinda translated into the tree-sitter notation of it. And then you gotta step away there; she hadn't found the child functions yet. But guess what? There is more. It's almost like you have to do a nested query to get to that point.
You then use similar syntax, looking at the left-side and right-side definitions, running another query on the results of the previous query, and then she was able to find the names of these different functions, again, internal functions that are in this square bracket operator. And that's a huge list at that point, but through some grepping on it, she was able to get all of those hidden function names in that square bracket operator. There are about 5 or 6 of these in total, and this is all programmatic; she shouldn't have to look at all of this herself.
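Putting those pieces together, the shape of it looks roughly like this (a sketch based on the treesitter and treesitter.r packages; the exact node and field names in the query depend on the tree-sitter-r grammar version):

```r
library(treesitter)

code <- '
hello <- function() "hi"
goodbye <- function() "bye"
'

language <- treesitter.r::language()
parser   <- parser(language)
tree     <- parser_parse(parser, code)
root     <- tree_root_node(tree)

# Match "identifier <- function(...)" bindings and capture the name
query <- query(language, '
(binary_operator
  lhs: (identifier) @name
  operator: "<-"
  rhs: (function_definition))
')

captures <- query_captures(query, root)
purrr::map_chr(captures$node, node_text)
# Should yield "hello" and "goodbye"
```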
So you can imagine, if you scale this up to, like, a hundred times this size with a huge R package, this might be a great way to use this kind of unique query language with tree-sitter to get to what you need. Admittedly, I have never even ventured near these rabbit holes before, but I could see it for a legacy code base, and, again, I stress, maybe one that you yourself didn't write, as you're getting familiar with it. Because I've looked at igraph before, and, yeah, there's a lot going on under the hood on that one. So I wouldn't know where to look when trying to pick out these internal functions. Looks like tree-sitter is something to take a look at. And a fun fact about tree-sitter that I discovered after briefly looking at Davis's talk: this is being incorporated directly into Posit's new IDE, Positron, as part of its base for when you look at, say, the outline view of a script and get those function definitions, which we've seen in RStudio as well. But now I believe tree-sitter is doing the heavy lifting to grab all the context around those functions. So it looks like tree-sitter is here to stay for sure, but what an interesting use case to take note of if I have to deal with legacy code base discovery in the future.
[00:32:34] Mike Thomas:
Yeah. Eric, I feel like Maëlle always brings us some pretty interesting use cases on the highlights. This is actually quite similar to a blog post that she authored a little while back where she did something very similar, but with XML instead, using the xmlparsedata R package, I believe. So if you wanna take a look at that blog post and this one side by side, I think it'll give you 2 different approaches to essentially doing the same thing. And it looks like there are about 5 or 6 different functions from the treesitter package that Maëlle is really leveraging here. And she does note that she went through a lot of different emotions as a beginner to this treesitter package, and not all of them were positive. One part that I imagine must have taken a lot of tinkering is where she defines that sort of string that she's looking to parse, which involves, as you said, this regular-expression type of notation to try to call out the particular function definitions that we're interested in here, and being able to return that as a list that is parsable.
And so I think the chief functions she's employing from treesitter after she does that are query, query_captures, and then node_text, which I think turn this list that gets returned into something that's a little more easily parsable, if you will, within R. And obviously, at some point in here we're leveraging purrr, because we are outputting a list, with the map_chr function, to be able to break things down into just this final, simple handful of 6 different names of functions that she's interested in. There is one footnote in this blog post that says, no, I have not installed Positron yet. So Maëlle is doing this all from the RStudio IDE at this point. But it's interesting to hear that tree-sitter is obviously gaining some ground. I'm not sure how new the tree-sitter C library is in and of itself, but it's something we're new to seeing in the R ecosystem, as there have recently been some other blog posts, and, as you mentioned, the folks at Positron are using it as well. So it's very interesting to me. Obviously, I could see it if we have some large code bases where you have to do some sort of profiling of that code base, or extracting particular portions of that code, in a way that just really isn't useful to do manually with the old copy-and-paste method, and it makes more sense to do it programmatically.
[00:35:23] Eric Nantz:
I'm grateful that Maëlle has now given us 2 different ways to go about doing that. Yeah. And I'm looking at this tree-sitter repo, and I will put this in the show notes if you haven't heard of this before; like, we hadn't heard about it before. It looks like a pretty mature project with a lot of modern bindings, nonetheless, and they're trying to be dependency-free to start. But then if you wanna hook it into Rust, guess what? You can. Wanna hook it into WASM or WebAssembly? Yes, you can. It even has its own CLI to boot. So there is a lot going on under the hood with this, and the best part is you can use it wherever you're comfortable. It doesn't have to be in Positron, as you said. This could be in any R session. You can use it in VS Code or whatnot. So, again, what I appreciate is, yeah, being able to learn about these discoveries, but then being able to use them in my preferred environment. So this will be great. Again, I think you can trial this in a low-friction way to get started, and there's an R package now that has been authored; we'll put a link to the R package itself in the show notes, which ties all this together. But, yeah, I guess you'll never look at your code the same way again when you're seeing the forest from the trees, or however it goes.
[00:36:32] Mike Thomas:
Exactly. Exactly. And that's a good tree-related pun
[00:36:35] Eric Nantz:
here. Yeah, I know. I try, I try. Well, hopefully you don't get lost in the rest of this issue, because there's a lot of great content here. But as always, we put these in nice sections for you to digest, whether it's new insights, uses in the real world, package updates, tutorials, or whatnot. There's always something for everybody here. So we'll take a couple minutes for our additional finds before we close out here. And for me, a reality in my industry, and I'm sure many others as well, is building enterprise apps, maybe those that don't necessarily go outside the firewall, so to speak. But yet you have leadership and stakeholders asking, hey, you built this great tool.
What's been the user adoption? Where are the areas people are spending their time in the most? Not always questions I really want to have to answer, but if I do have to answer them, I wanna make it as easy as possible to get those metrics. And that's where Appsilon has pushed a major update to their shiny.telemetry package; version 0.3.0 comes in with a lot of great updates here, including some very nice quality-of-life improvements, like actually checking whether it's an authentication-backed app to get the user ID, if you will, of that session, which can be great as you're looking at different usage patterns, and being more transparent about what you actually want to track. Because if you don't wanna track all the inputs in your app, you wanna be able to exclude them without having to exclude them name by name: if all your input names have a certain prefix or whatnot that you don't wanna include, you can now throw a regex at that too.
Other enhancements include actually tracking the errors that can occur in your app, and boy, that can be very helpful for diagnostics. Not that I would ever have an app that crashes. Wink wink. But also, if you wanna take advantage of an unstructured database to put these metrics or usage patterns in, they now support MongoDB, which, of course, is very popular for unstructured, nested-type data representation. So there's lots more under the hood with that, but they also have updated their package documentation with 3 different vignettes, all about the different use cases you can have for shiny.telemetry. So really, kudos to them. Looks like a great package, and, yep, this is something I deal with every single day as I roll new apps out, so I'll be keeping a close eye on this.
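A minimal setup looks something like this; note that the track_errors and excluded_inputs_regex argument names are my reading of the 0.3.0 announcement, so treat them as assumptions and check the package docs:

```r
library(shiny)
library(shiny.telemetry)

# Default storage backend; MongoDB is among the newly supported options
telemetry <- Telemetry$new()

ui <- fluidPage(
  use_telemetry(),  # injects the JavaScript that reports input events
  sliderInput("internal_debug", "Debug level", 0, 10, 5),
  sliderInput("n", "Observations", 1, 100, 50)
)

server <- function(input, output, session) {
  telemetry$start_session(
    track_errors = TRUE,                  # assumed name: log app errors too
    excluded_inputs_regex = "^internal_"  # assumed name: skip prefixed inputs
  )
}

shinyApp(ui, server)
```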
[00:39:07] Mike Thomas:
Me as well, Eric. That sounds really, really exciting. It's something that a lot of our clients are always asking for, right? You finally get over the hump, build your beautiful app, and deploy it out to the world, and then almost immediately we get the question: oh, you know, can we get some user metrics on this app as well? So I'm excited to check out those new enhancements. An additional find that I saw in the highlights this week is from Dr. Sophie Lee, who's the founder and director of S Cubed, a statistician and educator. She now has a 2-day Introduction to R with the Tidyverse course.
It looks fantastic. I'm seeing some really, really nice visuals on this website, which I believe are probably borrowed from... who's the person in the R ecosystem? I think she may work for Observable now. Is it Allison Horst? Allison Horst, yeah, who used to make the really nice R types of graphics and imagery. So I see one of those here, and that just tells me all I need to know that this is gonna be a fantastic 2-day training, from, excuse me, September 24th to September 26th. So if that's something that you're interested in, or somebody on your team might be interested in, it's gonna cover everything from R to RStudio, data management, visualization and ggplot2, EDA, and then some best practices for doing reproducible research. So, you know, I think we cover a lot of in-the-weeds things sometimes on the highlights, and I wanna make sure that we don't forget about those particularly new to or trying to learn R. So this might be a good opportunity to try to make that jump, if that sounds like you.
[00:40:55] Eric Nantz:
Yeah, fantastic resource here. I'm looking at it; it's definitely a Quarto-based site, and the styling is fantastic, easy to navigate. So, yeah, kudos to her and the team. This looks like a fantastic thing to highlight, and thanks for calling that out. And, boy, we'd love to call everything out, but there's only so much time in the day, folks. But, again, that's why we put this link in the show notes. All the highlights you've heard us talk about today, and those additional resources, are all in the show notes, and also at rweekly.org.
It's the easiest place to bookmark to find all this terrific content and the back catalog of issues as well. And if you wanna help the project, the best way to help is to share those new resources you found, whether you created them or someone in the community has created them. And it's just a pull request away, all Markdown all the time. There's a link in the upper right corner, that fancy little Octocat, whatever you call it; just click that, and it'll take you directly to the GitHub pull request. You don't need an AI bot to fill this out. It is all Markdown all the time, very easy to get started quickly. We have an issue template to get you up and running quite quickly as well. And if you wanna get a hold of us, we have a few ways to do that. We have a contact page directly in this episode's show notes. We are on all the major podcast providers, so you should be able to find us wherever your preferred listening happens.
And, also, you can get a hold of us on the social medias as well. I am @[email protected] on the Mastodon servers. I'm on the weapon X thingy sometimes with @theRcast. I'm mostly on LinkedIn as well; search my name and you will find me there. And, Mike, where can the listeners get a hold of you? Yep. You can find me on Mastodon @mike[email protected].
[00:42:42] Mike Thomas:
Or you can find me on LinkedIn if you search Ketchbrook Analytics, ketchbrook, or you can find me in Seattle in a couple weeks. Shoot me a message if you're gonna be there, and I would love to chat all things R.
[00:42:56] Eric Nantz:
Likewise. Yeah, like I said, the time is coming close. So, not packing just yet, but that's not too far away, to be honest. And I always bring some tech gadgets too. Who knows? Maybe I'll bring a couple mics with me. I'm just saying. We'll find out. But, nonetheless, we're gonna close up shop here for this episode of R Weekly Highlights. We thank you so much again for listening to our humble little banter here, and we will see you back here for another episode of R Weekly Highlights next week.
Hello, friends. We're back with episode a 172 of the Art Wicked Highlights podcast. If you're new to the show, this is the weekly podcast where we talk about the latest highlights and awesome additional resources that are shared every single week in this week's our weekly issue. My name is Eric Nantz, and I'm delighted that you join us wherever you are around the world. It's hard to believe July is almost over, but, of course, we got a lot more great art content to talk about with you today. And I never do this alone as you know. So at the virtual hip right here on this split screen here is my cohost, Mike Thomas. Mike, how are you doing today? Doing well, Eric, at the virtual hip. Only for a couple weeks until we get to see each other again in Seattle maybe? That's right. Yeah. Her her the countdown is on, so I gotta get all my bits sorted out and hopefully get that get that talk ready to go, but all all in good time. But, yes, the you might say the nerves are starting to hit a little bit, but again, it'd be great to see you again and see all the the wonderful peeps in data science and other sectors that frequent that conference every year.
[00:01:06] Mike Thomas:
Yes. I'm super excited as well. We have a ton of clients that are going to this conference this year, so I think it's gonna be a big one. I'm I'm almost surprised how many folks I I know that are gonna be there. So it's gonna be it's gonna be a party.
[00:01:19] Eric Nantz:
Yeah. Party as well. Yeah. You're gonna be high in demand, my friend. I hope they even get a few minutes of you after all that.
[00:01:26] Mike Thomas:
Quick short story is last time that last year when we were at Pawsit Conference, Eric and I met and it wasn't 2 minutes after, you know, we met, I guess, for the first time in real life that somebody came up and asked me to take a picture of you and them because you are the celebrity at this conference, Certainly not me. So Oh. Goodness. Goodness. Yeah. That was how it started.
[00:01:49] Eric Nantz:
That's yeah. That's how it always starts. Yeah. But you know, I think the tables will turn this time around. But, nonetheless, we're gonna have fun connecting with everybody. And, yeah, we're we're both gonna we're gonna have a lot going on there, but we got a lot going on here, my friend, with the mic in our hands. So let's get going here. Our issue this week was curated by Ryo Nakagorua, another one of our OG curators and longtime contributors to our weekly. And as always, he had tremendous help from our fellow Rwicky team members and contributors like you all around the world with your poll requests and suggestions.
As you may have heard in previous episodes, I admit the teas has input ever over like a few months ago that I think who needs that like game development engine, like unreal or anything. We can use our developer games. And there is yet another milestone in this workflow here that we're gonna talk about leading off this episode. And we are talking about the latest blog post from Matt Dre who has been on this quest to leverage not only tooling that he's creating, but also augmenting some of the awesome tooling that's being created by Mike Chang who you may also known as cool but useless on the social interwebs.
[00:03:01] Mike Thomas:
So probably the highlight of this blog post is that we know the full name of of Mike FC. Cool but useless. Right?
[00:03:08] Eric Nantz:
You are correct, sir. This may be the first time we've ever seen that spelled out. So I guess the mystery is over.
[00:03:15] Mike Thomas:
I know. I apologize to Mike if he was trying to, keep that from us for because I he did a pretty good job for a long time, until this blog by Matt has come along.
[00:03:26] Eric Nantz:
Yeah. You know, I I liken this to, you know, for those of you that follow pro wrestling, there are always that those days in the eighties where you didn't know that Hulk Hogan is really named Terry Beleja until, like, you were much older. Maybe it's just one of those reveals, like the gimmick the gimmick. The time's up for the gimmick, but nonetheless, we we know who you are, Mike, nonetheless. But, Yes. You've been you've been hard at work on these packages. So what Matt talks about in this post here is that he is leverage what Micah's created called the Nara package, which is basically super charging R's ability to produce raster based graphics, but tailor made to things like pixel art.
And he, and Matt what Matt has done is he's augmented the narrow package tooling with his previous package called rogue like, which if you recall from a highlight episode probably 4 or 5 months ago, was a way in r to create these, in essence, randomly generated dungeon crawlers all in text based format. So you get a nice ASCII art of the dungeon with the layouts, and it would respond to your key presses to have you as the player with a little, you know, maybe p symbol in the middle there going up, down, left, right, and then the randomly generated enemies or other artifacts would move along alongside you.
So he simply now merged the ability to do this, but now instead of the textual representation of those dungeons, he is augmenting some nice retro style looking sprites here. In fact, he's leveraging an open source framework, from a, username Kenny. He has what's called a tiny asset pack of all these different sprite arts that are 16 by 16 pickle pixels. So really, really nice, and if you ever played those RPGs of yesteryear, like, say, dragon warrior or final fantasy, yeah, these are gonna look right at home in those in those artwork.
So how does this actually work? Well, as I mentioned, there is a lot of the underpinnings have already been made by Matt in this roguelike package, But now instead of the textual representation, he is mapping that mesh which basically is a matrix of, you know, 16 by 16 or whatever dimension where each cell has either a movable space or an obstacle, the player or the enemy. And so that's already randomly generated up the front, but then that is being trans translated into these tiles that are being created by the narrow package. So in the blog post, you see the textual structure of it, which again will look very similar to if you use Matt's roguelike package, but then that handoff is then translated, like I said, into this tile based board. And now you see the nice image going going under the matrix representation.
It just looks literally picked out of a retro game. It it really is absolutely amazing what's going on here. And you can do all sorts of things of this. Obviously, you can make this as big or small as you want, but there is a lot more that is coming, and that's, tooling here. He wants to make, you know, really a true loop of the game, which if you played RPGs or roguelike dungeon crawlers in the past, you know that as you move along the board, you get a random encounter with an enemy, do the battle, win or lose, rinse and repeat. But, of course, it's almost like an infinite loop per se. So he's looking at ways of having that true kind of gaming style loop in the back end here. And of course, like any RPG, you're gonna have it after your inventory at some point. Right? You gotta have those weapons, those potions, those antidotes when you get poisoned or whatnot. So, obviously, this is probably gonna be a huge rabbit hole if they choose to go down this route. But I am very much eagerly watching this. But, of course, the first major step is what the user actually sees.
So what's seen here with this, package that's Matt's created, he now calls a tile based. This is the start, folks. Like I said, who needs to pay for that unreal engine or that unity engine? Boot up your r console and go to town. Right, Mike? Absolutely. And it's you know, just reminds me of the old adage that
[00:08:02] Mike Thomas:
R is only a programming language for statistical analysis. Right, Eric? Can't can't do anything else well. Yeah. Not at all. Not at all. And this also throws me back to, I think these graphics are like equivalent to at least, you know, the the Game Boy Color at times of the world. This is reminding me of my Pokemon Blue game that I used to to play on long car rides quite often. Oh, yes. I think the graphics are are quite akin to that. It's it's pretty incredible that I think, as you mentioned, Eric, there's only like 4 they look like emojis, but I guess we're calling them, the these objects from a tiny asset pack which is some resource out there that has all of these different 16 by 16 pixels that you can use. I imagine that they're like Creative Commons licensed or something like that, which is why Matt chose to to use them and incorporate them and this whole entire game it looks like the graphics are just made from these 4 different emojis if you want to call that which is it's pretty incredible.
The idea that you know the the things that we're looking forward to here in the next iterations of this potentially could be, as you mentioned, a true game loop, a way to have some sort of an inventory system. The sound generation is really cool to be able to have some sort of a soundtrack to this game as well. I think I've seen some some more things from Mike FC coming out on Mastodon lately, if I'm not, if I'm not mistaken, around our packages that are doing some audio things. So maybe there's a potential chance that that could get integrated into this package as well but, you know, it's it's hard to do this justice in audio form. You really have to check out the blog post, see the visuals that are there, install the package from GitHub yourself, and try not to waste a couple hours, going down, you know, playing games in R. I I I dare you to try. And so this is this is really really cool stuff. I think it's a fun way to start off the highlights, this week, and I'm looking forward to what else is to come here. Yeah. The creativity
[00:10:09] Eric Nantz:
possibilities here are practically endless. And, of course, I as I'm watching this, I'm wondering, well, how would you be able to distribute this? Can you imagine a web assembly app that puts all this together? Good grief. I mean, I've seen, you know, maybe I shouldn't say too much here on audio, but I'll say it anyway. There are even on the Internet archives, some of these web assembly powered, you know, emulators of the classic arcade games that have long passed your IP windows, but it's all in your browser. It's like using native JavaScript to do it. Just imagine if, you know well, you know, we've been watching the WebAssembly space quite a bit. I wouldn't put a pass either me or someone else maybe working with Matt or Mike to throw their hand at this and see what happens because just imagine how easily you could distribute something like this. Just mind blowing possibilities here. Absolutely. No. Sort of reminds me that that
[00:11:04] Mike Thomas:
the shiny contest is back. I probably should have saved that for my additional highlight, but I saw that, yesterday. I'm come across on my social media feeds, but it sort of reminds me of that the Appsilon app. I believe that was like the shark or underwater. That's right. The Shark Attack app. Yeah. That was awesome. That can be hosted on WebAssembly. Ivan from my team has recently, you know, published his first shiny live experimental app that's a card game. So, yeah, I think the the gaming Shiny Live, you know, crossover here is is gonna happen pretty soon.
[00:11:39] Eric Nantz:
It's not a matter of if, Mike. It's when.
[00:11:43] Mike Thomas:
Sounds like you found your next side project.
[00:11:55] Eric Nantz:
Speaking of trying things out, unless you've been living under a rock, you know that some of the biggest advancements in our world attack have been the use of generative AI and large language models to help with all sorts of things in our daily lives and especially in the world of software development. You've sure you've heard about efforts such as what Microsoft at piloted years ago with what's called copilot, which if you opt into that in your IDE such as visual studio code, also there are plugins to this with, say, our studio and whatnot. You'll get these, you know, auto completed like suggestions as you're typing out code. Maybe it's some boiler plate for a function call. Maybe it's helping flesh out some additional parameters or whatnot. Well, there obviously this space is moving quite fast and our next highway today talks about some recent findings that have been explored by the Epiverse team on their explorations of what's called the copilot workspace initiative.
So as I mentioned, this is coming from the epiverse blog. It's got a set of authors here. We got Joshua Lambert, James Azzam, Pratik Gupta, and Adam Kucharski. Hopefully, I said those right. They have teamed up here to talk about as they've been watching this space of how AI and LOM models are helping with development. What would be the what would be the situation of trying to leverage this new copilot workspace, which is kinda taking what was mentioned earlier by me in this copilot initiative to the next level and not just auto complete code as you're typing, but actually take a set of requirements that are surface Savia GitHub issue and just see how to actually produce the code or produce a solution to the problem at hand.
So they decided, let's do an experiment here. They teamed up with a group of professors in their organization to do 3 different experiments with this Copilot Workspace to see how it works in the real world. And they tried to go, I guess, from easy to difficult as we go through this. The first experiment: say they have a function in their R package that has been internal up to this point, but maybe they got user feedback that says, hey, you know what? That's a useful function. Maybe we should export it. So using their EpiNow2 package, they looked at an existing issue that was already filed about how this function, called epinow2_cmdstan_model(), should be exported.
So they let the Copilot Workspace loose on this, and it did get the job done, albeit in ways that probably aren't as intuitive to an R user. So let me explain it bit by bit here. The Copilot Workspace did determine that, oh, you know what, we need to replace that keyword of internal in the roxygen2 documentation of that function, replace it with export, and also update the NAMESPACE. Now there were some little shenanigans here, because apparently it also changed the formatting of one of the other function arguments, I guess doing some text tidying up, which, again, yeah, that's fine and all.
But in the end, it did technically get the job done. However, this Copilot Workspace is not intelligent enough to understand how documentation is updated with the modern R tooling for package development, such as using devtools::document() or, more natively, roxygen2 to regenerate the NAMESPACE dynamically. It edited the NAMESPACE itself manually, like you would maybe in the early days of package development. So context-wise it's not quite there yet, but you can't say it didn't get the job done. It just did it in a much more manual fashion. So, you know, so far, pretty promising.
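For anyone following along at home, here's a minimal sketch of the workflow Copilot Workspace missed, i.e., the idiomatic roxygen2/devtools way to export a previously internal function. The function body and file name here are illustrative stand-ins, not the actual EpiNow2 code:

```r
# In R/epinow2_cmdstan_model.R (illustrative file and stub):
# swap the roxygen2 tag from "@keywords internal" to "@export" ...

#' Build the CmdStan model object
#'
#' @param ... Arguments passed along to the model constructor.
#' @export
epinow2_cmdstan_model <- function(...) {
  # (real function body unchanged; shown as a stub here)
  list(...)
}

# ... then regenerate NAMESPACE and the .Rd files instead of
# editing them by hand, which is what Copilot Workspace did:
devtools::document()
```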
Let's take the difficulty up a notch, Mike, because in the next experiment they want to add a new model to a package. It's not too difficult from a complexity perspective, but this is upping the ante on what the Copilot Workspace can do. How did this one fare? What do you think? Yeah. Not quite as well, Eric. They wanted to add what's called a simple epidemic model
[00:16:21] Mike Thomas:
to the R package that contains, you know, a bunch of different models as well. So they created a new issue, if I'm not mistaken, to add a basic SIR model, as it's called, with a couple sentences on exactly what they were looking for. What happened was GitHub Copilot created an R script with a naming convention that followed the naming convention of the other modeling scripts they had. It was within the R directory, which was good, and the name of the script was model_sir, which, again, followed the naming conventions they've used in this epidemics package that they have right now.
And I think this follows a lot of the same things that I've seen over and over again with Copilot, with ChatGPT, with some of these tools: it gets you close to the right answer and puts down some of the right things that you would want there, but not necessarily in the way that you would want things organized, if you will. You know, the code that was generated constructed that basic SIR model and used roxygen2 as well to document the code, but a lot of aspects of the code didn't match what they asked for in that issue. The code contained what they call some inadvisable coding practices in R, you know, what we like to call code smells. And the model itself followed the standard set of differential equations that are solved using deSolve, which is the one that they had actually requested in their issue, but it didn't have any options to input things that are really important to them, like interventions, which is something Copilot had suggested it would actually include but failed to do so.
The other downside is that they used the require() function to import the deSolve package in the body of the generated code. And as you know, Eric, when we are developing R packages, that's not something you want to include in your function. Right? Yeah. The smell was strong with that one. Oh, my goodness. That one stinks to high heaven. And you know, we have a lot of utility functions, like from the devtools and usethis packages: if we want to leverage a new package within our own package in a proper way, we can use usethis::use_package("deSolve"), right? And that'll add it to our DESCRIPTION file.
It might add it to the NAMESPACE where appropriate. And obviously we can request the use of that package in our roxygen @import or @importFrom decoration above that particular function. So that's a pretty bad one right there as well. So, you know, a few different code smells here. I don't think that the function itself accomplished exactly what they were looking for that model to do, on top of some of the ugly best-practice things. So we're starting to head a little bit in the wrong direction here as we get into experiment 3.
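To make the smell concrete, here's a minimal sketch contrasting the two approaches. The generated code reportedly called require(deSolve) inside the function body; the idiomatic package workflow declares the dependency once and imports it. The model below is a textbook SIR, not the actual code from the experiment:

```r
# Smelly version (roughly the pattern that was generated):
model_sir_smelly <- function(beta = 0.3, gamma = 0.1, days = 100) {
  require(deSolve)  # loading a package inside a function body: a code smell
  # ...
}

# Idiomatic version. First, declare the dependency once:
#   usethis::use_package("deSolve")   # adds deSolve to Imports in DESCRIPTION

#' Solve a basic SIR model
#' @importFrom deSolve ode
model_sir <- function(beta = 0.3, gamma = 0.1, days = 100) {
  # Right-hand side of the SIR differential equations
  sir_rhs <- function(t, state, parms) {
    with(as.list(c(state, parms)), {
      dS <- -beta * S * I
      dI <- beta * S * I - gamma * I
      dR <- gamma * I
      list(c(dS, dI, dR))
    })
  }
  deSolve::ode(
    y = c(S = 0.99, I = 0.01, R = 0),  # initial proportions
    times = seq(0, days),
    func = sir_rhs,
    parms = c(beta = beta, gamma = gamma)
  )
}
```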
[00:19:44] Eric Nantz:
Yeah, so talk about upping the ante a little bit, but it's something that is very relatable to every one of us developing intricate code bases. They wanted to see if this Copilot Workspace could actually do an intelligently driven code review of the package itself. As you know, as you get maybe new features put in, bug fixes, or whatnot, you get to that point of an eventual release, maybe an update you wanna release on CRAN or whatnot. You want that code review to make sure all the things are looking good: you're being efficient with memory usage, efficient with coding best practices. And we already saw a little glimpse that, yeah, it may not be the best at coding practices.
So what they were expecting to see, hopefully, from the issue asking it to do the code review, was a documented set of findings with links to particular snippets of code, things that could be optimized, or maybe even just questions about the code. Well, bad news: it didn't actually do any analysis of the code itself. It basically regurgitated some of the changes that were already described in the pull request, looked at the changelog, i.e., the NEWS file, and just kind of bundled all that together as a narrative. Which in essence means it does a great job of reading the NEWS, but not so much of looking at the actual code and the changes that could be made to make the code better.
So this is where, you know, humans aren't being replaced on this one, not by a long shot. But what I do appreciate is that they gave this 3, again, realistic use cases, and not all difficult. I would say the first experiment is definitely the easiest one, and it did get the job done, just not in the way that you as an R developer would carry it out. So I agree with them and their takeaways here: there's a long way to go. There are avenues of success here, but you as the end user definitely need to be vigilant about making sure that whatever prompts or requirements you're feeding into these are being accurately addressed in the results you get. And if you're getting code back, you definitely should not take that blindly, so to speak. You need to make sure: does it fit your overall paradigm of a code base, your style guides, if you will, your best practices for your team?
Obviously, there's much more work to be done to make these, in my opinion, more dynamic, so that as it looks at the code base for a package, or maybe even a set of packages, it can really grok, if you will, the key paradigms governing that code base instead of relying on a whole bunch of additional sources from who knows where. I think being intelligent about what's actually being done in the project is the way to go. For a research software engineering context, like what this is based on, yeah, maybe it's some help, but in the end, sometimes you just can't get away from doing some of this manually right now. But this space is moving fast, and again, I really appreciate their attention to detail in putting it through the paces.
But in the end, I'm not losing sleep over the fact that the AI bots are gonna take over package development anytime soon. What do you think? Me neither, Eric. You know, I'm trying to be, like, open minded
[00:23:28] Mike Thomas:
about it, and I think as long as you have the expertise to be able to sniff out the good from the bad, running your code through Copilot for any of these purposes can still be useful. Right? I'm not sure I want it to actually, physically make these changes to my code in any way, but maybe it's not such a bad thing for it to provide me with suggestions. Right? If I ask it to do a full-blown code review, there might be a little thing that it finds that I missed, or somebody on our team missed, because programmatically poring through the code, having a computer do that, might catch things more easily than we can catch with the naked eye.
Even after looking at our code for a little while. So I'm open to suggestions, let me put it that way, from Copilot and from the ChatGPTs of the world, but I'm still in a place right now where I'm going to often take those with a grain of salt and, you know, leverage my own expertise over what's coming back from those types of models.
[00:24:34] Eric Nantz:
Yeah. One use case I'm seeing more and more often in many industries and organizations is the process of refactoring from one type of code base to another, especially when you're shifting languages. A lot of companies are turning to LLMs and AI to help with that conversion. I've always had a little bit of spider sense tingling at this, because with that new code, will it look like somebody with competent skills in R, Python, JavaScript, or whatnot wrote it? Or is it gonna look like a hodgepodge of tutorials found online, with, like you said, this mishmash of different coding styles and practices? So I think we're still a ways away, but I know that is a hot topic in many circles, to see how fast it can get you to that next step. And there may be cases where it gets you really close and you just have to spend maybe 10% of your time revising it. There may be other cases where it gives you absolute garbage and you might as well throw it away.
So in any event, I think keeping an open mind and being realistic are very important in these still-early stages of this whole tech sector, this whole industry. But I think good things are coming. You just gotta use it responsibly, of course.
[00:25:54] Mike Thomas:
We'll see.
[00:26:04] Eric Nantz:
And then rounding out our highlights today, speaking of refactoring things: you may have a situation where there are some things you could do manually, but as you do it over and over again, especially when you're dealing with a large code base, maybe not even one you wrote yourself, you wonder, there's got to be something that can help me get there even faster. So our last highlight today comes from a frequent contributor, Maëlle Salmon, from her recent blog post about her journey to find all the names of functions defined in a script, or set of scripts, with an approach that is new to her and frankly new to me as well. This is fresh off the recent useR! conference; I'm seeing some of the videos of that come online, and it sounds like it was a terrific event. Of course, I get a little FOMO every time I see that, because I still haven't been to a useR! yet. Someday that will be checked off the bucket list; I just wasn't able to go this year.
But one of the talks that Maëlle discovered, which she didn't see live but heard about after the fact, is Davis Vaughan from the Posit team talking about a framework called tree-sitter. What tree-sitter actually is is a mechanism to parse code, somewhat like what we've heard in the past with things like abstract syntax trees, to pick out variable names, function names, or whatnot. Tree-sitter is a C library, with bindings for other languages as well, to help with that mechanism of parsing code intelligently. So her use case was the following: she wanted help finding functions in the package igraph. And if you don't know igraph, it's kind of the standard bearer of network data representation, which you can turn into network diagrams, very much like tree structures or very intricate networks.
igraph has a long history. We've covered it before on the show, and it is a massive code base. So her use case was: which functions in igraph... now, there is a certain operator that igraph has, the square bracket operator, and she wanted to see which functions were only used within this special operator. So she had to literally go within this operator function to find all the utility functions throughout. The example she has in the post only has a couple of them just for kicks, but imagine there's a whole bunch of them. She had to refactor these, or at least get to the point of refactoring them, and she didn't want to manually copy-paste or discover all these function names.
She has used things like xmlparsedata in the past; we covered her explorations of parsing a complicated function in a previous episode. She wanted to see what that similar operation would look like with treesitter. So she walks through her use case here: she loads the package, then she loads the parser for the R language, and that reads in the function text, that script containing the function. And then she wanted to figure out, okay, where is the root of the tree that is going to govern all the child nodes of that function?
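As a rough illustration of those first steps, here's a minimal sketch using the {treesitter} package with the {treesitter.r} grammar. The snippet being parsed is a made-up stand-in, not the igraph source:

```r
library(treesitter)

# Load the R grammar and create a parser for it
language <- treesitter.r::language()
parser <- parser(language)

# Parse a small stand-in script and grab the root node of its syntax tree
code <- "
add_one <- function(x) x + 1
square  <- function(x) x ^ 2
"
tree <- parser_parse(parser, code)
root <- tree_root_node(tree)
root  # printing shows the nested node structure of the parsed script
```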
This gets a little in the weeds here, but there is a function in treesitter called query() where you can feed in kind of a snippet of the pattern you want to look for. It has kind of an S-expression structure to it: you define what's on the left side, what's on the right side, you give things plain-text-like labels, whether it's an identifier or a definition, and then you can give it a kind of regex-like comparison for what you want to match. This was all news to me; I had never seen the tree-sitter query syntax before. But sure enough, when you run that, you get a nested list back in R that gives you the text of what it found alongside the actual expression, translated into the tree-sitter notation. And then you've gotta take it a step further. That didn't find the child functions yet, but guess what? There is more. It's almost like you have to do a nested query to get to that point.
You then use a similar kind of syntax, looking at the left-side and right-side definitions, running another query on the results of the previous query, and then she was able to find the names of these different functions, again, internal functions that live inside this square bracket operator. That gives a huge list at that point, but then, with some grepping over it, she was able to get all of those hidden function names inside that square bracket operator. There are about 5 or 6 of these in total, but this was all programmatic; she didn't have to look through all of it herself.
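Continuing the sketch above, a query for top-level `name <- function(...)` definitions might look like the following. The capture-handling details come from my reading of the {treesitter} docs, so treat the return shape as an assumption and check ?query_captures:

```r
library(treesitter)

language <- treesitter.r::language()
parser <- parser(language)
tree <- parser_parse(parser, "
add_one <- function(x) x + 1
square  <- function(x) x ^ 2
")
root <- tree_root_node(tree)

# Tree-sitter queries are S-expressions: match a binary `<-` operator
# whose right-hand side is a function definition, capturing the name
source <- '(binary_operator
  lhs: (identifier) @name
  operator: "<-"
  rhs: (function_definition)) @definition'
q <- query(language, source)

captures <- query_captures(q, root)

# Keep only the @name captures and pull out their text
fn_names <- purrr::map_chr(
  captures$node[captures$name == "name"],
  node_text
)
fn_names
#> [1] "add_one" "square"
```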
So you can imagine, if you scale this up to, like, a hundred times this size with a huge R package, this might be a great way to use this kind of unique query language with treesitter to get to what you need. Admittedly, I had never even ventured near these rabbit holes before, but I could see it for a legacy code base, and, again, I stress, maybe one that you yourself didn't write, as you're getting familiar with it. Because I've looked at igraph before, and yeah, there's a lot going on under the hood on that one, so I wouldn't know where to look to pick out these internal functions. It looks like treesitter is something to take a look at. And a fun fact about tree-sitter that I discovered after briefly looking at Davis's talk: this is being incorporated directly into Posit's new IDE, Positron, as part of its basis for things like the outline view of a script with its function definitions, which we've seen in RStudio as well. But now I believe tree-sitter is doing the heavy lifting to grab all the context around those functions. So it looks like tree-sitter is here to stay for sure, and what an interesting use case to take note of if I have to deal with legacy code base discovery in the future.
[00:32:34] Mike Thomas:
Yeah, Eric, I feel like Maëlle always brings us some pretty interesting use cases in the highlights. This is actually quite similar to a blog post that she authored a little while back, where she did something very similar but with XML instead, using the xmlparsedata R package, I believe. So if you want to take a look at that blog post and this one side by side, I think it'll give you 2 different approaches to essentially doing the same thing. And it looks like there are about 5 or 6 different functions from the treesitter package that Maëlle is really leveraging here. And she does note that she went through a lot of different emotions as a beginner to this treesitter package, and not all of them were positive. One part that I imagine took a lot of tinkering is where she defines that query string she's looking to parse with, which involves, as you said, this regular-expression-type notation to call out the particular function definitions that we're interested in here, and being able to return that as a list that is parsable.
And so I think the chief functions she's employing from treesitter after she does that are query(), query_captures(), and then node_text(), which, I think, turn that returned list into something that's a little more easily parsable, if you will, within R. And obviously at some point in here we're leveraging purrr, because we are outputting a list, with the map_chr() function, to break things down into just that final simple handful of 5 or 6 function names that she's interested in. There is one footnote in this blog post that says, no, I have not installed Positron yet. So Maëlle is doing this all from the RStudio IDE at this point. But it's interesting to hear that tree-sitter is obviously gaining some ground. I'm not sure how new the tree-sitter C utility is in and of itself, but it's something we're new to seeing, I think, in the R ecosystem, as there have recently been some other blog posts, and, as you mentioned, the folks at Posit are using it in Positron as well. So it's very interesting to me. Obviously, I could see, if we have some large code bases where you have to do some sort of profiling of that code base, or extracting of particular portions of that code, in a way that just really isn't practical to do manually, with the old copy-and-paste method, it makes more sense to do that programmatically.
[00:35:23] Eric Nantz:
Grateful for the fact that Maëlle has now given us 2 different ways to go about doing that. Yeah. And I'm looking at this tree-sitter repo, and I will put this in the show notes if you haven't heard of this before; like, we hadn't heard about it before. It looks like a pretty mature project with a lot of modern bindings, no less, and they try to be dependency-free to start. But then if you wanna hook it into Rust, guess what, you can. Wanna hook it into WASM, or WebAssembly? Yes, you can. It even has its own CLI to boot. So there is a lot going on under the hood with this, and the best part is you can use it wherever you're comfortable. It doesn't have to be in Positron, as you said. This could be in any R session; you could use it in VS Code or whatnot. So, again, what I appreciate is, yeah, being able to learn about these discoveries, but then being able to use them in my preferred environment. So this will be great. It's a low-friction way to get started, and there's an R package now that has been authored; we'll put a link to the R package itself in the show notes, which ties all this together. But, yeah, I guess you'll never look at your code the same way again when you're getting the forest from the trees, or however that goes.
[00:36:32] Mike Thomas:
Exactly. Exactly. And that's a good tree-related pun
[00:36:35] Eric Nantz:
here. Yeah. I know. I try, I try. Well, hopefully you don't get lost in the rest of this issue, because there's a lot of great content here. But as always, we put these in nice sections for you to digest, whether it's new insights, uses in the real world, package updates, tutorials, or whatnot. There's always something for everybody here. So we'll take a couple minutes for our additional finds before we close out here. For me: a reality in my industry, and I'm sure many others as well, is building enterprise apps, maybe those that don't necessarily go outside the firewall, so to speak. But you still have leadership and stakeholders asking, hey, you built this great tool.
What's been the user adoption? What are the areas where people are spending their time the most? Not always questions I really want to have to answer, but if I do have to answer them, I wanna make it as easy as possible to get those metrics. And that's where Appsilon has pushed a major update to their shiny.telemetry package; version 0.3.0 comes in with a lot of great updates here, including some very nice quality-of-life improvements, like actually checking whether it's an authentication-based app to get the user ID, if you will, of that session, which can be great as you're looking at different usage patterns, and being more transparent about what you actually want to track. Because if you don't wanna track all the inputs in your app, you want a way to exclude them without having to exclude them name by name: if all the inputs you don't want to include share a certain prefix or whatnot, you can now throw a regex at it too.
Other enhancements include actually tracking the errors that can occur in your app, and boy, that can be very helpful for diagnostics. Not that I would ever have an app that crashes. Wink wink. And if you wanna take advantage of an unstructured database to put these metrics, these usage patterns, in, they now support MongoDB, which of course is very popular for unstructured, nested data representations. So there's lots more under the hood, but they've also updated their package documentation with 3 different vignettes, all about the different use cases that you can have for shiny.telemetry. So really, kudos to them. It looks like a great package, and, yep, this is something I deal with every single day as I roll new apps out, so I'll be keeping a close eye on this.
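Here's a rough sketch of what wiring shiny.telemetry into a server function looks like. Telemetry$new(), DataStorageLogFile, and start_session() with excluded_inputs come from the package's documented API, but the regex-exclusion argument name below is my assumption based on the 0.3.0 announcement, so check ?Telemetry for the exact signature:

```r
library(shiny)
library(shiny.telemetry)

# Store events in a local log file (other backends, like MongoDB
# as of 0.3.0, swap in via different DataStorage* classes)
telemetry <- Telemetry$new(
  app_name = "enterprise_app",
  data_storage = DataStorageLogFile$new(log_file_path = "telemetry.txt")
)

server <- function(input, output, session) {
  telemetry$start_session(
    track_inputs = TRUE,
    excluded_inputs = c("debug_toggle"),  # exclude by exact name
    excluded_inputs_regex = "^tmp_"       # assumed 0.3.0 argument: exclude by prefix
  )
}

ui <- fluidPage(
  textInput("tmp_scratch", "Scratch space (not tracked)"),
  checkboxInput("debug_toggle", "Debug (not tracked)")
)

shinyApp(ui, server)
```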
[00:39:07] Mike Thomas:
Me as well, Eric. That sounds really, really exciting. It's something that a lot of our clients are always asking for, right? You finally get over the hump, build your beautiful app, and deploy it out to the world, and then almost immediately we get the question: oh, can we get some user metrics on this app as well? So I'm excited to check out those new enhancements. An additional find that I saw in the highlights this week is from Dr. Sophie Lee, the founder and director of S Cubed, a statistician and educator, who now has a 2-day Introduction to R with the Tidyverse course.
It looks fantastic. I'm seeing some really, really nice visuals here on this website, which I believe are borrowed from... who's the person in the R ecosystem? I think she may work for Observable now. Is it Allison Horst? Allison Horst, yeah, who makes those really nice R graphics and imagery. So I see one of those here, and that just tells me all I need to know that this is gonna be a fantastic training, running September 24th to September 26th. So if that's something that you're interested in, or somebody on your team might be interested in, it's gonna cover everything from R and RStudio to data management, visualization in ggplot2, EDA, and then some best practices for doing reproducible research. So, you know, I think we cover a lot of in-the-weeds things sometimes on the highlights, and I wanna make sure we don't forget about those who are particularly new to R or trying to learn it. So this might be a good opportunity to make that jump if that sounds like you.
[00:40:55] Eric Nantz:
Yeah, fantastic resource here. I'm looking at it; it's definitely a Quarto-based site, and the styling is fantastic, easy to navigate. So, yeah, kudos to her and the team. This looks like a fantastic thing to highlight here, and thanks for calling that out. And, boy, we'd love to call everything out, but there's only so much time in the day, folks. That's why we put the link in the show notes: all the highlights you've heard us talk about today, and those additional resources, are all in the show notes and also at rweekly.org.
It's the easiest place to bookmark to find all this terrific content and the back catalog of issues as well. And if you wanna help the project, the best way to help is to share those new resources you found, whether you created them or someone in the community created them. It's just a pull request away, all Markdown, all the time. There's a link in the upper right corner, that fancy little Octocat, whatever you call it; just click that, and it takes you directly to the GitHub pull request flow. You don't need an AI bot to fill this out. It is all Markdown, all the time, very easy to get started quickly. We have an issue template to get you up and running quite quickly as well. And if you wanna get a hold of us, we have a few ways to do that. We have a contact page directly in this episode's show notes. We are on all the major podcast providers, so you should be able to find us wherever your preferred listening happens.
And you can also get a hold of us on social media. I am @[email protected] on the Mastodon servers. I'm on the weapon X thingy sometimes with @theRcast. I'm mostly on LinkedIn as well; search my name and you will find me there. And, Mike, where can the listeners get a hold of you? Yep. You can find me on Mastodon @[email protected].
[00:42:42] Mike Thomas:
Or you can find me on LinkedIn if you search Ketchbrook Analytics, k-e-t-c-h-b-r-o-o-k, or you can find me in Seattle in a couple weeks. Shoot me a message if you're gonna be there, and I would love to chat all things R.
[00:42:56] Eric Nantz:
Likewise. Yeah, like I said, the time is coming close. So, not packing just yet, but that's not too far away, to be honest. And I always bring some tech gadgets too. Who knows? Maybe I'll bring a couple mics with me. I'm just saying. We'll find out. But, nonetheless, we're gonna close up shop here for this episode of R Weekly Highlights. We thank you so much again for listening to our humble little banter here, and we will see you back here for another episode of R Weekly Highlights next week.