The future of R-Universe looks even brighter for 2025 and beyond, revisiting the key factors for possibly switching to the Positron IDE, and why there is more than meets the eye when it comes to the potential of LLMs and AI (even in highly regulated industries).
Episode Links
- This week's curator: Eric Nantz - @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky), and @theRcast (X/Twitter)
- R-Universe Named R Consortium’s Newest Top Level Project
- Positron vs RStudio - is it time to switch?
- Summer is Coming: AI for Shiny, R, and Pharma
- Entire issue available at rweekly.org/2024-W50
- rOpenSci Blog Post https://ropensci.org/blog/2024/12/03/r-universe-r-consortium-tlp/
- Positron IDE - A new IDE for data science https://drmowinckels.io/blog/2024/positron/
- Fun with Positron https://www.andrewheiss.com/blog/2024/07/08/fun-with-positron/
- Open VSX Registry https://open-vsx.org
- Power Mode Extension https://open-vsx.org/extension/hoovercj/vscode-power-mode
- Joe Cheng's slides from R/Pharma keynote https://jcheng5.github.io/pharma-ai-2024/#/title-slide
- R Consortium Submissions Pilot 2 Shiny Application https://github.com/RConsortium/submissions-pilot2
- Daniel Sabanés Bové & Joe Cheng - Discussion of R/Pharma Keynote https://youtu.be/AU1MmcXYnJ0?si=A3V9JHYdZLiz-lvv
- Practical Tips for Using Generative AI in Data Science Workflows https://youtu.be/rPeOdc8jTSE?si=APtIhpRqlh2I_Ek1
- Apple Music Wrapped in R https://www.andrewheiss.com/blog/2024/12/04/apple-music-wrapped-r/
- Predicting Best Picture at the 2025 Academy Awards https://www.markhw.com/blog/oscars2025
- Navidrome - Your Personal Streaming Service https://www.navidrome.org/
- Scrubbing your music with Maloja https://wenkdth.org/posts/maloja-scrobbling/
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- The Traveling Band's Last Song - Wild Arms: Armed and Dangerous - Artem Bank - https://ocremix.org/remix/OCR02588
- Secrets Abound - Final Fantasy - Midgarian Sky - https://ocremix.org/remix/OCR02452
[00:00:03]
Eric Nantz:
Hello, friends. We're back with episode 189 of the R Weekly Highlights podcast. This is the weekly show where we talk about the terrific highlights that are shared in this week's R Weekly issue. My name is Eric Nantz, and yes, we are now getting towards the middle of December. And it is the season of giving in many, many parts of the world. And, of course, over the weekend, my kiddo was nice and gave to his entire family a nice little nasty cold that I'm just recovering from. Hopefully, the voice is up to snuff for this episode. But, yes, I am here. But, luckily, I'm not here alone this time, because fresh from his travels across the country is my awesome cohost, Mike Thomas. Mike, how are you doing today?
[00:00:41] Mike Thomas:
[00:00:55] Eric Nantz:
And luckily, no Wicked Witch is delaying your flights or anything, so that's terrific. Not this time. Thank goodness. Yes. Yes. It is a busy travel season, so we always, you know, consider ourselves lucky when things go smoothly. And, honestly, we are also very lucky that we have, in my opinion, a fantastic issue. And I'm not just saying that because, check notes here. Oh, yeah. It was me curating it this time. But, no, this is awesome because I have all of you in the community to thank for your awesome resources I was able to merge into this issue, and I dare say, I think this was a much smoother process than the last time I curated it. We had some nice pull requests that I merged in and a lot of content that we can talk about. But we'll be focused on the highlights this time around. But, of course, since this will be the last issue I curate in 2024, just wanted to do a dramatic pause there for effect, I definitely wanna thank the rest of the team throughout the year. It's been terrific fun to work with you all, and I'm looking forward to having more great adventures of R Weekly in 2025.
But without further ado, let's get right to it. For our first highlight, we are talking about a very key piece in the world of the R ecosystem, in particular with package repositories, that has already had a wonderful effect on many parts of the R community. And with this recent news that we're about to share, we think there are even bigger things to come for this project. What are we speaking about? This is the R-Universe project that has been led by rOpenSci and their chief engineer, Jeroen Ooms, who has shared with us, along with the R Consortium blog, that R-Universe has been named the newest top level project under the R Consortium umbrella.
Your first question probably is, what does a top level project actually mean here, other than sounding really important? Well, technically speaking, in the blog post, it is a particular project that is now going to get three years of funding from the R Consortium, and this is a recognition alongside the other projects that are currently designated top level. Those include DBI, the common database interface that many of the R packages involving databases utilize as a key dependency; the R-Ladies initiative; and the R User Group support program, or the RUGS program, helping to bring, you know, infrastructure around having all sorts of R-type meetups and community resources.
rOpenSci's R-Universe is now joining that effort. And there's a great quote from the executive director of the R Consortium, Terry Christiani. She says, and I quote, the ability to find and evaluate high quality R packages is important to the R community, and the R Consortium is pleased to support the R-Universe project with a long term commitment to help strengthen the foundation of the R package ecosystem. We are pleased to be working more closely together with rOpenSci on this effort. And there's also a great quote from the new executive director of rOpenSci, Noam Ross. He also touches on, you know, the importance of R-Universe and their excitement to work with the R Consortium to strengthen the infrastructure even more.
And we'll also have linked in the show notes the, you might say, companion blog post to this from rOpenSci directly, co-authored by Noam Ross and Jeroen. It highlights another point that I do want to emphasize: R-Universe, on its own, is already bringing this revolution of, you know, a package repository that's powered by a lot of DevOps principles, a lot of automation, and very intricate infrastructure to help you as a package author give your users not just a source version of your package, but binaries that are compiled, and more recently even WebAssembly-compiled versions of your package. That's a huge win right there.
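For listeners who haven't tried it yet, installing from someone's R-Universe repository is a one-liner in R. The URL pattern below is the standard one each R-Universe owner gets; magick is just used as an example of an rOpenSci-hosted package:

```r
# Install a package from an R-Universe repository instead of (or ahead of) CRAN.
# Each owner gets a repo at https://<owner>.r-universe.dev, which serves
# pre-built binaries on supported platforms, so no local compilation is needed.
install.packages(
  "magick",
  repos = c("https://ropensci.r-universe.dev", "https://cloud.r-project.org")
)
```

Listing CRAN as a second repo lets R resolve any dependencies that aren't hosted in that particular universe.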
But R-Universe is now serving as the foundation of newer efforts in the R package management ecosystem. There is a new effort, and I must stress it's early days, but there is some excitement around the R-multiverse project. This is being headed by a team of people including Will Landau, the author of the targets package, and it is building upon R-Universe's infrastructure, their automation, and their compilation of binaries, to help bring a transparent kind of governance that combines with R-Universe's infrastructure for another type of package repository.
Certainly, industries are paying very close attention to this. It is early days, but it just goes to show that the R-Universe platform could be used for more things than just R-Universe itself. And that's what we're probably gonna see in 2025 and beyond. So I am thrilled to see this, you know, more rigorous, or just more robust, backing of the project. We know that times can be tough financially for certain vendors. So the fact that rOpenSci can now count on this additional funding for this particular project, I think it's gonna be a huge win for Jeroen and the rest of the rOpenSci team as they make R-Universe even more of an awesome platform for new and older package authors alike in the R ecosystem.
[00:06:55] Mike Thomas:
Yeah, Eric, one thing that I will say is a quick call to action as we get toward the end of the year, and this isn't to toot my own organization's horn or anything like that, but we donate every year as a charitable contribution to the R Consortium and to rOpenSci. If you work at a company that uses R and it benefits your company, I think it would make a whole lot of sense to ask them to do the same thing. This is the season of a lot of corporate charitable donations, so it might be an easy yes, and I think it's certainly worth asking the question, considering how much some of these projects have given back to a lot of us. R-Universe is incredible. If we think about what we sort of had to do before R-Universe, there was really no searchable way to figure out what sort of packages are on CRAN. I mean, you can do it through, I think, some of the API tools that'll give you a list of all of the packages on CRAN right in your console.
And then you could do some filtering, but there's no interactive GUI that is very visual, with hyperlinks to all of the vignettes, with a hyperlink to the GitHub repository, with graphs that show when the most recent contributions were and who the contributors were. It's just this fantastic visual medium to be able to take a look at all of the different R packages that are out there, at least the ones that have been picked up by the R-Universe project, and the scale of it still blows me away. This whole infrastructure that Jeroen and others have created, it's mind blowing: the amount of automation that's involved here, you know, the way that packages are being built as binaries instead of users needing to, you know, install from GitHub and actually build that package themselves.
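The console route Mike mentions does exist in base R, though it only hands you a raw matrix to filter yourself, nothing like the R-Universe UI. A minimal sketch:

```r
# Pull the current CRAN package index as a matrix (one row per package).
# Columns include Package, Version, Depends, Imports, and so on.
pkgs <- available.packages(repos = "https://cloud.r-project.org")
nrow(pkgs)            # roughly how many packages CRAN currently hosts
head(rownames(pkgs))  # first few package names, alphabetically

# "Filtering" is then plain matrix subsetting, e.g. packages importing Rcpp:
rcpp_users <- pkgs[grepl("Rcpp", pkgs[, "Imports"]), "Package"]
```

Compare that to R-Universe's searchable site with rendered docs and contribution graphs, and the gap Mike is describing is pretty clear.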
It's absolutely incredible. And to think that there's a multiverse project out there, and more coming on top of this, is really, really exciting to hear. A question for you: does multiverse mean multilingual, or are we still just talking R?
[00:09:06] Eric Nantz:
It is still focused on R, but I actually don't have the full genesis on how the name came to be. I think it is trying to bridge, as I said earlier, these, you know, concepts that can be derived from certain aspects of, say, CRAN versus certain aspects of R-Universe, and blend them together in a way that kinda takes the best of both worlds, as I say. And, again, this is all very early days. You'll hear more about this in 2025, but it is, you know, taking the best of additional frameworks here, and certainly R-Universe. Without R-Universe, a concept like R-multiverse is simply not possible. Because can you imagine, Mike, you and I, even if we had a lot of funding, spinning this up on our own? Like, the amount of engineering it takes to build this in a way that now, like I said, can be built upon with its API infrastructure, with its automation, whether it's powered by GitHub Actions or other slick services, just the amount of attention to detail that's been laid here. Yeah. We are very thankful that this exists at all.
[00:10:16] Mike Thomas:
Yeah. I can't imagine how much the government would have paid a big four consultant to put together a project like this. I think what Jeroen has done, in government dollars, is probably millions. But
[00:10:32] Eric Nantz:
Sort of looks that way when you look at the UI, doesn't it? Because everything is so polished. And, you know, we often say in the R community that we have these 'verses of packages, such as, of course, the tidyverse. And in my industry, a suite of packages called the pharmaverse. I have an entry to the pharmaverse right on this page. And when I click it, I immediately see all the packages inside it. It's very searchable. Very, you know, like I say, you can put an API in front of this should you wish. There are so many different ways to get to the interesting parts of a package, including the documentation, which is rendered on the spot in this platform. The attention to detail cannot be overstated. So I expect there are gonna be more really big things coming to this platform. Again, R-Universe itself will be a front runner here, and then other projects like R-multiverse are gonna hugely benefit from this too.
[00:11:33] Mike Thomas:
This is so stinking cool. I'm looking at the pharmaverse landing page underneath R-Universe, and it's really, really cool: the visuals that we have access to, just the amount of automation and ease of use for working with R tooling and R packages out there, and the way that things are really beautifully organized and documented here is fantastic. And I guess just to go back to an earlier comment, and please don't fire me from the R Weekly Highlights podcast, but if I could speak it into existence, it would be pretty cool to have something like this that could service both R and Python packages.
You know, we're doing a lot of work building both types of packages for the same sort of function. If you think about what Posit's doing with, like, both gt and Great Tables, it's pretty much the same functionality just serving both user bases, and we've been doing a lot of the same lately. To have sort of one place, where maybe the binaries are already built as well, to install those things would be pretty cool. That's why I was thinking multiverse might be multilingual, but we'll see. Just throwing that out there.
[00:12:47] Eric Nantz:
Yeah. Who knows? We'll take it. And, admittedly, I may have stepped into it a bit on Mastodon when I couldn't resist taking the bait from Bruno Rodrigues when he was talking about his disdain for managing Python dependencies in educational projects. And I had a, you know, rather snarky comment that this is one of the reasons why I try to avoid Python: going through those nightmares. But hey, you know what? They could learn a thing or two from R-Universe. I'm just saying. Because I don't see anything like this with PyPI or anything of that sort, but I think a lot of people, you and me included, I do a little share of Python on the side here and there, would love to have this kind of curated resource for multiple languages. Who knows? Maybe you heard it here first. Maybe we'll look back on this a few years later, and we'll say, hey, we're the ones that spoke it into existence. Who knows?
[00:13:42] Mike Thomas:
We can hope.
[00:13:59] Eric Nantz:
Well, Mike, as usual, you kind of have a crystal ball. We were just talking about multi-language type situations, and I think our next highlight is very much about a new product, an IDE, that is very much trying to be a multilingual data science IDE powered by the latest innovations in software engineering. We are speaking about Posit's new IDE called Positron, which had its beta earlier this year, and then, of course, there was a huge focus in quite a few talks at the recent posit::conf. Now, of course, Posit has made the RStudio IDE for many years, and it's got a lot of engineering behind it, a lot of commits, a lot of features behind that.
So you, as a user who may have been using RStudio for many years or even just a few, hear about Positron, and you may be thinking: I wonder if it's time for me to take a serious look at this. Well, there have been a few posts that have addressed this, but the latest one, I think, has some real nice insights. This is coming to us from the Jumping Rivers blog, in particular authored by Theo Roe. The post is titled Positron vs RStudio - is it time to switch? Now, of course, we always throw in the caveats. This is certainly a subjective decision, but I like what this post is doing. It's laying the facts down on what's currently available in both environments and comparing and contrasting different aspects of what you, as an R programmer, expect to see or have had to utilize in IDEs of the past.
So the first thing right off the bat we wanna mention is that Positron is not solely focused on R. It also has support for Python and Julia, and for additional languages that wanna come to play, so to speak. There are ways through its APIs under the hood for a new language to talk to its engine, and we'll talk a little bit about how it does that with the R language in a bit. But this is built unlike RStudio, which, let's be frank here, is predominantly an R-based IDE with a little bit of Python here and there via reticulate. You don't need that with Positron. If you have Python available, if you have R available, Positron is gonna pick it right up. And there have been many, many people, you included, Mike, who have mentioned you may be switching between the frameworks for a given project.
In Positron, it's just a toggle in the upper right corner to switch from an R environment and R interpreter to a Python interpreter. So that is already, I think, for those that are operating in multi-language projects, a huge win in Positron's favor. There are other things that you might have to get used to in Positron that were maybe kind of bolted on to RStudio later in the game, so to speak. One example is the command palette. If you've used IDEs like Visual Studio Code before, or Atom before that, you may be used to the command palette as a way to quickly bring up, well, it looks like a little search box. You start typing a few keywords, and it will auto-complete to a particular command based on what you're searching for, such as maybe adding a new Git commit, bringing up a new file, rendering a new app, or things like that.
RStudio never came with a command palette until probably a couple years ago or so. Now, there may be those that might say it doesn't quite feel as native in RStudio as it does in Positron. But in Positron, you kind of have to get used to it, because the way to really unlock most of the functionality in an IDE like Positron is to interrogate that command palette to get to what you need to run or what you need to open and things like that. There are ways you can bind additional shortcuts to it, but it is something to get used to, alongside the way that it handles settings.
Those are a bit different too. In RStudio, you could find them in, like, a config file in your home directory or somewhere else. Positron is similar, but it's kind of agnostic to, you might say, the language being used inside. There are plugins or extensions based on languages from time to time, but it's basically a JSON file. You can get to it either using the interactive settings toggle in the editor or just by editing that JSON file directly. You get to kind of choose your own adventure with that. It would trip people up in the RStudio world sometimes to figure out exactly where that file is stored and how to edit it outside of, like, the GUI elements of the settings. So that might be advantageous to you if you wanna really customize your experience with Positron, but do it through, like, a file-based way.
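As a rough illustration of the file-based route, Positron follows the VS Code settings.json convention, where JSONC-style comments are permitted. The specific keys shown are common editor settings used here only as examples; what's actually available depends on your installed extensions:

```json
// User-level settings.json; a .vscode/settings.json inside a folder
// overrides these values per workspace.
{
  "editor.fontSize": 14,
  "editor.wordWrap": "on",
  "files.autoSave": "onFocusChange"
}
```

The per-folder override is what makes the "workspace" idea Eric mentions next work without any .Rproj file.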
Other things to watch out for, or that may be a benefit to you depending on your perspective: in Positron, you're not necessarily gonna have to spin up what are called those R project files, the files ending in .Rproj. RStudio used those extensively for things like setting up a Git repository in a project, setting up a package, and other uses like that. In Positron, it's just the folder. You can have what's called a workspace, a set of settings in that folder, which is basically a way to customize settings per folder if you wish, but you're not gonna need that .Rproj file to tell Positron that you're working in an R project.
There have been some people that really like that file, and there have been an equal number of people that really don't like that file in the repo. So that might be helpful to you if you've, you know, had some angst about that in the past, and Positron is kinda doing away with it because, again, they're building upon an open source clone of Visual Studio Code. They're piggybacking off of another effort with a lot of tooling on top of it, whereas RStudio was built literally from the ground up to be a first class R-based data science editor. A couple other things I'll mention before I turn it over to Mike here: the layout will look a bit different.
I'm used to the layout in Positron because it has a lot of similarities to the Visual Studio Code layout that I've been using in a lot of my open source projects, but you will often see in a default Positron layout that the file pane is on the left. And this file pane has a lot more going for it than I think the one in RStudio does, and I don't think I'm upsetting people when I say that. There's a lot more you can do in the file pane in Positron versus what you could do in the RStudio file pane, such as even just expanding folders with, like, the toggles to expand the nesting or whatnot.
Little things like this can add up for a bigger project. Trust me on that. But then you also see on the left side your extensions in another tab, and your integrated Git console in another tab. There are people that like the Visual Studio Code, or the Positron, way of doing Git versus the way RStudio does Git. Your results may vary depending on where you fall on that fence, but it's all right there. There isn't anything special you have to do for it. It'll pick up Git right away. But with that, there might be some things that aren't as intuitive in the beginning. You just have to play with it a bit. But with the extension ecosystem, you can supercharge your Git experience with extensions like Git Graph.
There are other ones, like GitLens, that can really do some slick things for your command palette and Git operations. There's a lot going on here. The other thing I'll comment on, which I think is something you wanna look into, is the data explorer in Positron, which is something Visual Studio Code does not have. This is something new that Posit created in this version of Positron. You get a much richer experience when you're looking at your data frames: you can sort by multiple columns if you wish, and the filters are gonna be much more intuitive to work with, because they're gonna be across the top of your data frame instead of above the actual column itself. That way, you can navigate them much more quickly.
And it can handle larger data sets without, you know, causing your IDE to hang for 5 or 10 seconds while the snapshot loads. There's a lot of engineering behind that, as I've heard from previous talks, so it might be worth a look. If you really find yourself using that data viewer in RStudio a lot, I think the Positron data explorer is something you wanna take a look at. There are some things to be aware of that just aren't gonna feel as native right now, such as the use of add-ins, which RStudio used to kind of give you additional functionality in the editor without building it into the editor itself.
I'm hearing there isn't, like, a direct one-to-one way to use those just yet. Although it makes me wonder, because with the R extension for Visual Studio Code that I used for many years, there were features added to that extension a couple years ago to leverage add-ins in Visual Studio Code. So I know it's possible, but we'll have to see if Posit adopts that for Positron in the future. With that said, any additional functions that use the rstudioapi package, which was often used on the back end of RStudio itself to kind of interrogate features of the IDE, those are not going to work well either. And R Markdown: you can do R Markdown in Positron, but it won't feel quite as native as it does in the RStudio IDE.
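If your scripts lean on rstudioapi, one defensive pattern (a sketch, not an official recommendation) is to gate those calls so the same code still runs in Positron, VS Code, or a plain R session; `get_active_path` is a hypothetical helper name:

```r
# Only call rstudioapi when we're actually running inside RStudio.
# rstudioapi::isAvailable() returns FALSE outside the RStudio IDE.
get_active_path <- function() {
  if (requireNamespace("rstudioapi", quietly = TRUE) &&
      rstudioapi::isAvailable()) {
    rstudioapi::getActiveDocumentContext()$path
  } else {
    NA_character_  # no editor context available outside RStudio
  }
}
```

Code written this way degrades gracefully instead of erroring when the rstudioapi back end isn't there.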
But if you've moved on to Quarto, that may not be an issue for you, because Quarto has very first class support in Positron. So the question might be: have I switched? I am not fully switched yet, and I'm not saying that because I'm using RStudio a lot. It's because I'm still using Visual Studio Code a lot, because I have so many workflows built upon it. However, now that Bruno and company have figured out a way to get Positron installed on Nix systems, yes, I have Positron on my Nix system. So I am trying to use it. I'm trying to adapt my Visual Studio Code workflows into it. It's mostly going well.
The thing I miss the most is the dev container stuff. I live off the dev container feature that Visual Studio Code has. Unfortunately, there's no easy way to get that in Positron, because that's a Microsoft-specific extension. It's not an open source extension. That's another thing to keep in mind: there may be a few extensions in Visual Studio Code that do not work in Positron, because it's using the open source extension registry, not the Microsoft-specific one. So with those caveats in mind, I think it has tremendous potential. It is still in beta, so your results may vary depending on the project you're on or your overall use of it. But the foundation is there for Positron to really carry out its vision to be a first class multilingual environment. So I'll be watching it closely, and I'll see where it breaks and share when it does.
[00:26:02] Mike Thomas:
I'm gonna continue to watch it too, Eric. I have not made the jump yet, and I need to. I need to start exploring it a little bit more. I am very locked into VS Code and dev containers, and I guess I'm gonna blame you, as opposed to myself. I'm gonna deflect here and say that you were the one that started me down that journey that has locked me into that tooling for right now. I'm just kidding. It's hugely, hugely helpful. But the command palette, you know, we should talk about the fact that Positron so closely mimics Visual Studio Code. And I think for a huge proportion of data scientists out there that are more comfortable in RStudio than in a, you know, more developer-type environment like VS Code, Positron is a perfect bridge between those two things, in my opinion.
I think it starts to bring in some of the best elements from something like a VS Code or, you know, a full stack developer platform, into an environment that native RStudio users might be a little bit more familiar with, the command palette being one of them. I know that a command palette has existed in RStudio, as you mentioned, for the last couple of years, but it's not quite as obvious, if you will, as the command palette in Positron, which is, you know, a little bit more obviously put in front of you and drives a lot of the functionality of the IDE itself. You know, another thing that I think is a benefit, and one thing that I love about VS Code as opposed to RStudio, is being able to sort of search all of the files in your project. There is a Find in Files button in RStudio underneath Edit, and it's handy. Works very well. It takes a couple clicks, or if you know the keyboard shortcut, you get to it a little bit quicker. But there's a giant magnifying glass in the left hand sidebar in Positron and VS Code that allows you to immediately do that. I think it's much quicker, and these differences are subtle, because I think the functionality still exists in both places, but the UX is just a wee bit better in Positron than in RStudio. That slight difference makes all the difference to some extent, if you will.
I think a really interesting thing is, you know, the lack of a need for .Rproj files or R projects. This is, again, sort of moving folks away from R-specific workflows into, you know, slightly more developer-specific workflows, and understanding how to interact with working directories without running setwd() if you can help it, right? So I don't know how this impacts the here package that was developed by RStudio, now Posit. I think it'll still work fine, because it'll look at your working directory as well, and I think it'll create relative links to that. But I know that some of the functionality of the here package actually looked for that .Rproj file, if I'm not mistaken, and sort of recursively searched to be able to find it, to figure out, you know, where the working directory was with respect to all the other files that you want to work with. So I'm not sure how that plays into this whole Positron IDE in the event that, you know, you have a lot of workflows that depend on here, and maybe it's a little less straightforward to create a new R project. Although I do see, in a screenshot here under the workspaces and R projects section, it's very faint, but if my eyes aren't deceiving me, there's actually a button that says New Project under R. So maybe that takes care of that for you. I know that you may not necessarily need to do that, but if you want to, and you have a lot of, you know, workflows within your organization and your team that leverage R projects and all of the different functionality that comes along with that, and you wanna continue to do so, it looks like the functionality is still there for you.
So that's just something to watch out for as you move from RStudio to Positron, and as you sort of decide which pieces of functionality in your current workflows you wanna continue to leverage, and which you may wanna change to evolve, you know, your team's practices, if it makes sense to do so. But I thought that this was a really nicely comprehensive blog. I do think that one of the strengths we're going to get here is the vast, vast ecosystem of extensions within that Open VSX repository, or community of extensions, whatever you wanna call it. I know that there are a lot of RStudio add-ins, but I can pretty much definitively promise you that the number of extensions in the Open VSX ecosystem is quite a bit larger than that.
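For what it's worth, here's roughly how the here package finds its project root without any setwd() calls; the file paths below are hypothetical, purely for illustration:

```r
library(here)

# here() walks up from the working directory looking for root markers such as
# an .Rproj file, a package DESCRIPTION, a .git directory, or an empty .here
# file. So a Positron folder that is a Git repo should still resolve correctly
# even with no .Rproj present.
here("data", "survey.csv")

# In a folder with none of those markers, you can either drop an empty .here
# file at the root, or declare this script's location relative to the root:
i_am("analysis/report.R")
```

In other words, the .Rproj file is only one of several markers here recognizes, which is why many here-based workflows should survive the move to Positron.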
One thing that I absolutely am envious of for the Positron users who are already using it, and something that'll probably push me over the edge here to start using it, is the data viewer. There is one in VS Code. It leaves a lot to be desired, to put it nicely.
[00:31:37] Eric Nantz:
I know.
[00:31:38] Mike Thomas:
Obviously, the data viewer in RStudio is great. You know, it's geared towards data scientists and exploratory data analysis. But what we have in Positron is the RStudio data viewer on steroids, I would say. You have, you know, column-level summaries, summary statistics including missing values, in the left hand sidebar while you're viewing your data in the majority right side of the screen, as well as all the filters that have been applied, which can easily be, you know, added to or removed through a click of a button along this nav bar at the top. I think it's a fantastic UI.
I think it's a really super-powered data viewer compared to anything else that I've seen today. I have seen some products like this that are standalone SaaS platform data viewers, if you will. But to have this in our IDE, in the same place that we're doing our development work, is really, really exciting. So, you know, excellent job by Jumping Rivers. I think that summarizes the trade-offs and the benefits that we have from the Positron IDE. And I'd encourage anyone that hasn't had a chance to check it out yet, myself included, to check it out as soon as you possibly can.
[00:33:00] Eric Nantz:
Yeah. I mean, certainly, it's becoming easier to install, even for, you know, major geeks like me and others; now we can install it on Nix. I have put it through the paces a little bit. I started theming a little bit. You're right about that extension ecosystem: there is something for everything. And one of my favorite extensions that I was using in my live streams way back when, turns out it's available in the open source registry. They call it Power Mode, where when I type, I get these nice little explosion sparks happening next to the words, just to give a little flair. And I was like, there's no way that one's on there. Oh, sure enough, it is. So I can replicate some of my live streaming experience in Positron that I had for my Shiny dev series stuff from a while back.
I will also have links to a couple additional posts, some of which have been covered in highlights before. One of those was from Athanasia Mowinckel, and she talked about her experience with Positron. I hate to say it, but it looks like that here package or project stuff isn't working as nicely in Positron as we were hoping for, according to her post. So I'll have to see if that gets better over time. There's also a post by Andrew Heiss, who has used Positron a bit and gives his two cents on his favorite extensions and the customizations that he's done to make the experience more fit for his workflow. That's a key point right there. Right? Positron already gives you a lot of nice out-of-the-box configurations, but there's nothing that's locked, so to speak. You can tailor it to whatever you see fit, and with the power of the VS Code, or I should say Code OSS, open source foundation, you can do all sorts of things. You can get your Vim key bindings. You can do all sorts of interesting things to make it your own experience.
So I will admit, at the day job, we can't really use Positron yet, because it's not part of the Posit Workbench enterprise product just yet. Obviously, Posit isn't gonna put it in until it's production ready, so I still have to wait a little bit for my work projects. But for my open source stuff, I'm gonna give it a go and report back on what breaks and, hopefully, what works even better. You know, Mike, you might say we're now starting to get into the doldrums of winter, but our next highlight has a very interesting title, because it speaks on a few different levels.
This is talking about one of the keynotes given at the recent R/Pharma 2024 conference by Posit's CTO, Joe Cheng himself, on the new tooling that's coming to the R ecosystem in the realm of artificial intelligence and interacting with large language models. The talk was affectionately titled Summer is Coming: AI for Shiny, R, and Pharma. We have talked about some of the new tooling already in previous highlights of this show, when we spoke very highly about the elmer package, which is a key focus of this talk. It gives you, in the R ecosystem, a very robust, compliant way to call different LLMs, both third-party services like ChatGPT or Claude or others, as well as self-hosted models. There's also the accompanying package shinychat, which gives you a way to bring that LLM console-like experience into your Shiny applications, building upon elmer to do that.
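As a hedged sketch of what calling an LLM from R with elmer looks like (the package has since been renamed ellmer on CRAN; the model name and prompts below are illustrative, not from the talk):

```r
library(elmer)  # now published on CRAN as ellmer

# Create a chat object backed by a hosted provider; self-hosted
# backends are configured similarly via other chat_*() constructors.
chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a concise assistant for clinical data questions."
)

# Send a prompt; the reply comes back as text, and conversation
# state is kept on the chat object for follow-up questions.
chat$chat("In one sentence, what does a Kaplan-Meier curve show?")
```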
This talk was a tour de force of a few different concepts, but I wanna set a little bit of context here, because, as I have shared on this show, I've been as much of a skeptic as anybody about how AI can be pushed in directions it shouldn't be. It can be almost nauseating seeing some of the fluff that's put out there on, cough cough, LinkedIn about some of the weird uses of it. But guess what? I wasn't alone in that skepticism. Joe Cheng himself was very skeptical of this. It took him a while to warm up to it. He had some epiphanies earlier in the year, and combined with getting to know the AI tooling as you see it, there's more than meets the eye, to steal the phrase from Transformers, because you can build on top of these services. And that wasn't something that was obvious to him right away.
But this talk first introduces, again, the aforementioned new tooling in R, the elmer package and the shinychat package, with a demonstration lifted from the posit::conf talk he gave, where we have what's called the restaurant tipping Shiny app. Instead of the app developer having to build a whole bunch of sliders, select inputs, and toggles to try and be proactive, so to speak, about what the user wants to do to explore the restaurant data, there's a chatbot in the sidebar where the user can type in a question like, what is the average tip rating for males in this year, or whatever.
That is a prompt going to an LLM, which translates it into a SQL query and updates the Shiny app on the spot. That was an eye opener for me when I first saw it. And then when Joe mentioned that he was going to give this keynote at R/Pharma, he had a quick call with me to ask, what can we do to make this a little more relatable to the life sciences folks? Because, yeah, we all love restaurant data, but this audience in particular can be pretty skeptical of things, let's put it that way, and we often have to be; it's a very highly regulated industry. So I gave him a little seed: what if we take part of the Shiny app that we built for the R Consortium Submissions working group, where we sent a traditional Shiny application with a few different summaries to our health regulators, as a way to prove out that we could send a Shiny app for a submission?
There's a portion in that app, which I'll have a link to in the show notes, where we have a survival-type plot of time to event, so to speak. And we had a couple sliders and toggles, built with the teal package, to explore the data going into the plot. I thought, why not have that chatbot in this display? So Joe, to his credit, and it only took him about a week to do this, spun up another demo for this presentation. He took that interactive Kaplan-Meier visualization built with ggplot and put a chatbot to the left of it, so that we could ask similar questions on different partitions of the data, and the plot would update on the spot.
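To ground the discussion, here is a minimal Kaplan-Meier display in R, roughly the kind of ggplot the demo wrapped a chatbot around. This is a generic sketch using the classic survival::veteran data, not the actual Pilot 2 code:

```r
library(survival)
library(ggplot2)

# Fit survival curves by treatment arm on the bundled veteran data.
fit <- survfit(Surv(time, status) ~ trt, data = survival::veteran)

# Flatten the survfit object into a data frame for ggplot:
km <- data.frame(
  time   = fit$time,
  surv   = fit$surv,
  strata = rep(names(fit$strata), fit$strata)
)

# A step function is the conventional Kaplan-Meier rendering.
ggplot(km, aes(time, surv, color = strata)) +
  geom_step() +
  labs(x = "Days", y = "Survival probability")
```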
New sample sizes, new distribution curves, or survival curves. Amazing. This, to me, putting my own spin on it, is a very intriguing feature when we start looking at data reviews and hopefully finding ways to get to insights more quickly, but in a controlled way. When I say controlled way, that's another part that Joe emphasizes here: the way this is all built is with a very intelligent yet diligently structured prompt that's sent to the chat server, or the LLM, when the app is launched, so it has the context set correctly. Now, correctly may be a strong word here, because nothing's ever absolutely perfect in this realm of LLMs, but it's trying to control, to its best extent, the possibility of the LLM giving complete nonsense as the result, telling it to produce a SQL-type query that is going to be used to filter the data going into that plot, kind of a translation layer on top of it.
The other key concept is that these packages leverage another piece of functionality that you might need if the LLM can't do everything on its own. That's the concept of tool calling. Another eureka moment in my mind. In the example he gave in the talk, you ask an LLM, what's the current weather in California? It may not be able to answer that on its own, because it needs an interactive way to look up that weather at a given resource. So Joe's example was giving it access to an R function that calls an API for weather data.
With the function documented with a parameter like the city name, or what have you, the LLM calls that function, takes its result, and gives it back to the user. It's kind of like an assistant to the LLM to get the job done. That was the big takeaway for me: we don't necessarily have to be limited by just what the LLM can do on its own. We can augment it with other services, other capabilities; if you can code it up in an R function, you might be able to use it via this tool paradigm with what elmer can do to call these LLMs on your behalf.
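The tool-calling pattern described here looks roughly like the following sketch. The helper names are as I understand the elmer/ellmer docs, and the weather function is a stub for illustration, not a real API call:

```r
library(elmer)  # now published on CRAN as ellmer

# An ordinary R function the model is allowed to call. In a real app
# this would hit a weather API; here it's stubbed for illustration.
get_weather <- function(city) {
  paste0("Currently 18C and cloudy in ", city, ".")
}

chat <- chat_openai(model = "gpt-4o")

# Register the function along with a description and typed
# arguments, so the LLM knows when and how to call it.
chat$register_tool(tool(
  get_weather,
  "Get the current weather for a city.",
  city = type_string("Name of the city to look up")
))

# The model decides it needs the tool, calls it, and folds the
# result into its final answer.
chat$chat("What's the weather like in Sacramento right now?")
```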
And the last part of the talk covered some of the practical considerations, and there are quite a few. I think these are the parts that show me that he is still grounded in this: it's okay to say no, folks. If it's giving you results that don't make sense, it just may be time to move on to a different solution. But for use cases where the answer is not always so black and white, it may be more of a layer to get to a final answer where you have a little more flexibility, while also keeping a human in the loop. And, again, in my industry, you better believe we better keep humans in the loop when we look at these results. There are still a lot of productivity gains to be had if you can harness this the right way. But, again, a very realistic talk. He's excited about the tooling, but he is being realistic too. This is not gonna solve all the world's problems. It's not gonna magically put our drugs on the market in, like, half the time it currently takes.
But I think this can greatly help certain aspects of development, such as the way we produce applications or tooling to interpret this data and get us to the insights more quickly. There was a really robust Q&A after the talk; I had the pleasure of moderating that. We'll also have linked another dedicated session from the APAC track of R/Pharma, where Daniel Sabanés Bové led a Q&A with Joe Cheng himself, who actually called in later that night, around midnight his time, just because he was so passionate about connecting with the Asia Pacific colleagues on it. There's some great Q&A in that session too. So am I still skeptical?
I won't lie, I'm still kind of skeptical of certain things. But what Joe gave me in this talk was a way to show that, like I said, there's more than meets the eye for how you can leverage these LLMs, and the tooling in front of them, to craft a solution that is more fit for purpose to your particular needs, and cut out all the noise you see in the various social media or other tech spheres.
[00:45:00] Mike Thomas:
I'm very aligned with you, Eric. I think that the way that Joe is approaching these concepts, and the way that Posit, in general, is building this tooling out, is fantastic, and it aligns with sort of what I would hope for. I was quite skeptical, and then, Eric, I know we were both at posit::conf this past year, and I watched that keynote, that hour-long talk from Melissa Van Bussel on practical tips for using generative AI in your data science workflows, and it changed things for me a little bit. It was very applied.
It was very geared toward the audience. And, just to be honest, there were a lot of things that she presented that I didn't know were possible. I thought that I knew everything there was to know about AI, and I thought that the cons outweighed the pros, but that talk in particular brought the pros up to maybe even with the cons, if not more, and made me want to start looking into things a little bit more in this space. Try some things out, and tune out, like you said, maybe some of the LinkedIn narrative marketing hype BS that's out there right now. It seems like AI agents are all I'm hearing about these days. I don't even know what an AI agent is; I don't really care to know either. Agentic, or whatever. But, yeah, another mind-blowing thing. And, you know, hats off to you; you did a fantastic job moderating this talk.
It's well, well worth your time if you're in the R or data science space and trying to make heads or tails of these LLMs. And I think that the tooling, again, that Joe and the team have put together for us is really, really cool. I mean, you can't watch it and say that it's not super cool or super interesting. Whether or not you wanna leverage it is totally up to you and your use case. But some of the possibilities that we have here are really, really cool. Thinking about these large language models as maybe a step in the workflow, and their ability to call another process, like an additional API, is really, really interesting. I know that recently, with OpenAI, if it can't find the answer to your question in its training data, I believe it can execute a Google search, or a web-based search, and then fairly quickly process the results that come back, maybe just looking at the first few links, crawling over them, and leveraging those as the context it uses to try to answer your question, which is pretty incredible.
Just as a tangent here, while we're still on the AI topic, the Sora model was released by OpenAI yesterday, I believe, which was a long time coming, and that is supposed to be text-to-video. So, you know, check that out. I would recommend, even if you're a skeptic and you're really, really against this stuff, that it's worth watching just to educate yourself and understand what the art of the possible is, because the art of the possible is changing every day. And we're trying to do a lot of thinking at Ketchbrook about how we are going to integrate these into the Shiny apps that we develop for our clients, in a way that makes the most sense and isn't going to just involve our team going crazy with all of this stuff to the point where we're strapped for resources because everybody wants this. We're really trying to work hand in hand with our clients to figure out how and where it makes the most sense to leverage this type of technology.
So videos like this, and tutorials that take a practical approach with hands-on demonstrations of how to go about injecting this functionality into your Shiny apps, are invaluable for us. So a big thank you to you, Eric, and Joe, for all of the time and effort that you've put into doing that for those of us on the ground.
[00:49:16] Eric Nantz:
Oh, he did all the hard work. I'm just, like, are you kidding me? Is that even possible? I mean, it is amazing to see what we can do. And, honestly, yeah, I definitely had almost a closed-eye perspective on this. I got, you might say, perturbed too much by all the noise out there before really giving it a fair shake. But, like you said, we were sitting next to each other at posit::conf. That was step one. And then step two was this talk, because now it wasn't just a quote-unquote fun toy example. Now it's like, okay, what can we do in life sciences that will open some eyes? And there are so many other areas we're pursuing too. It's not just, quote-unquote, the interactive data reviewing. There are many other realms of automation that we wanna use to make the mundane get done more quickly, and hopefully find advancements in training these models, or giving them the prompt to train themselves kind of on the fly.
But there is another part of that talk that you all should see. Again, being realistic here: there are some areas that surprised them at Posit as well, such as when they tried to use LLMs to help ingest their technical documentation and then put, like, a bot in front of it. They had very mixed results on that, which kind of surprised them. Right? Because technical documentation, that's literally the source right there. You would think an LLM could ingest it and then, when it gets a question, immediately surface the answer more effectively. And I know others are looking into that as well. I've thought about that area for some of my internal documentation, because I write a great website on using R in HPC environments.
It'd be great to have a little bot next to it that people can type their question into, and it would use that doc to kind of help point them in the right direction without always emailing yours truly when something goes crazy. Not that I don't like helping people, but there's a balance there. So I'm intrigued to see where that goes for sure. Me too. Yep. Boundaries are important. But there are no boundaries, so to speak, when you see just the breadth of what's possible in this ecosystem these days, and I dare say the same is true of the rest of the issue. Not to sound biased here, but I think we've got something for everybody here: the full gamut of new packages, and there's a good chunk of them in this issue. I wasn't shy about putting all these great new packages in here, as well as updated packages.
Some really interesting tutorials too, so we'll take a couple of minutes for our additional finds here. It's December, and if you, like me, like music and you're on social media, you're likely seeing the Spotify Wrapped posts where people show off their favorite tracks that they've listened to in 2024. It's always an entertaining thing to look at. Well, Andrew Heiss, whom we mentioned earlier, kinda took matters into his own hands, because, a, he doesn't listen on Spotify, and frankly, neither do I; he listens on Apple Music. So he has this great post called Apple Music Wrapped with R, where he leveraged a way of exporting the metadata associated with his listens from Apple Music and iTunes, because there are some somewhat interesting XML-based files that you can extract from them.
He built some code via tidyverse-type packages to process that XML data, with a little bit of intelligent time-lapse summarization, and he was able to derive basically those key metrics that we often see in Spotify Wrapped. He got some interesting results on what he's listening to, and, to the surprise of no one potentially, and I don't know Andrew personally, apparently Taylor Swift is in his top tracks, which I think many in the world would have in common. This inspired me. And hats off to Andrew as usual; I don't know how he finds the time to do all this, plus he wrangled some gnarly XML to make it happen.
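For a flavor of the XML wrangling involved, here's a hedged sketch of reading an iTunes/Apple Music library export with xml2. The file name and the exact plist nesting are assumptions; see Andrew's post for the real code:

```r
library(xml2)
library(purrr)

# The iTunes export is an Apple plist: each track is a <dict> whose
# children alternate between <key> nodes and value nodes.
lib    <- read_xml("Library.xml")  # illustrative file name
tracks <- xml_find_all(lib, "/plist/dict/dict/dict")

# Pull the value node that follows a given <key> inside one track:
track_field <- function(track, key) {
  xpath <- sprintf("./key[text()='%s']/following-sibling::*[1]", key)
  xml_text(xml_find_first(track, xpath))
}

titles <- map_chr(tracks, track_field, key = "Name")
plays  <- map_chr(tracks, track_field, key = "Play Count")
```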
But I've recently spun up a really intricate self-hosted version of my music listening. I took a day during my break for Thanksgiving and ended up ripping a whole bunch of the music CDs that I bought when I was a teenager onto my beefy little server here in the basement, because I figured these CDs aren't gonna last forever; I might as well rip them and put the MP3s, or FLACs I should say, on my server. But then I thought, well, it's great that I have these files, but that's no way to listen to them. There's gotta be a better way to have, like, a Spotify-like experience.
So I found this program called Navidrome, which I can basically put in a Docker Compose file as a Docker container. It serves up the files from a directory, finds the album art, finds the metadata for the artists and the tracks and whatnot. And I can basically listen to my songs even in the web player, which doesn't look like much to shout about, but it's an API under the hood, much like R-Universe has an API under the hood. And if you've heard of a framework called Subsonic: if you have a Subsonic-compliant player, you can basically tap into that service and put it on your phone, put it on your computer, whatever you have.
So that's great. But then I thought, well, it's not really keeping track of what I'm listening to. Sure enough, it has a plugin for that too, combined with another project called Maloja. Don't ask me how they name these things, but it basically gives me a way to track every time I play a song. And I keep that data in house, folks. It ain't going to Spotify. It ain't going to, what's it called, Last.fm or anything. So next year, I'm gonna speak this into existence, and Mike, you're my accountability buddy here: I'm gonna make a version of Spotify Wrapped that's completely self-hosted, with my geeky taste in music. So you heard it here first. And, Andrew, thanks a lot; you may have just nerd-sniped me into another project. How do you top that? I have no idea.
[00:55:33] Mike Thomas:
I don't know. For the folks that have listened long enough, or know us, there's nothing we love more than music and nerding out, and when you can combine the two of those, it's bad news for everybody else. But that's a great find. I wanna call out a blog post, for any of the cinephiles out there, from Mark H. White II, who is a PhD, about predicting Best Picture at the 2025 Academy Awards. He's updating it weekly with his predictions of the win probability that he's seeing, based upon the critical reviews that are posted online.
And it looks like The Brutalist is just ahead of Wicked in the ranking of which movie is most likely to win Best Picture at the 2025 Academy Awards. So a really, really neat little blog post, with a nice little interactive visualization at the top of it. Check in each week on the blog to see who's winning.
[00:56:36] Eric Nantz:
That's awesome. Yeah. I know a lot of people like to do those predictions, and it's always somewhat fun, sometimes scary, trying to predict what ends up being very subjective voting, as the Oscars or Academy Awards are. So, yeah, I'll be interested to see how that shakes out. And, like you said, a very interactive Plotly visualization. Plotly, another package; I should give thanks to Carson Sievert every time I see him for it, because I use it in almost all my Shiny apps. But there are other ones too. As we all know, we can't say Plotly without giving good kudos to echarts4r from our good friend John Coene. So we're fair and balanced here on this podcast.
[00:57:19] Mike Thomas:
Fair and balanced, and I will extend an olive branch as well to Kelly Bodwin, who wrote a fantastic article on her adventures with Advent of Code using data.table solutions.
[00:57:33] Eric Nantz:
That was awesome. I'd love to see that. And I even saw on Bluesky, and this wasn't necessarily R-specific, a post about somebody using DuckDB to do the Advent of Code. Yes, DuckDB with SQL queries; that's special. I don't know how all of you are able to do this. I was actually talking to a few people earlier today about it. Someday, I will do Advent of Code, but my goodness, I feel like I am so far behind. It's almost like imposter syndrome just thinking about it. So I love living vicariously through Kelly and others as they do this.
[00:58:10] Mike Thomas:
I feel you as well. Someday, we'll get there, Eric. Yep. We can be accountability
[00:58:15] Eric Nantz:
buddies on that one too. But what we are accountable for, hopefully, is sharing what we find so exciting about the R Weekly project and this particular issue. Of course, you can find this and all the other issues at rweekly.org, as well as how you can give back to the project. And the best way to give back is to share that great resource you found. Or maybe you created that great new package and you want the R community to know about it. We are a pull request away, to use the GitHub language. You just find that little pull request Octocat icon in the upper right corner, you'll be taken directly to the template, and you can fill out the pull request right there. We've got nice little template text to help you navigate to the section your resource should go in. The curator for that week will be able to merge it into the upcoming issue, and we love it when we get your contributions. It always puts a smile on my face when I get the curation and I don't see a zero for pull requests. This is one time I want pull requests; there are several other times you're dreading them. Not that I would know anything about that. Nonetheless, there are other ways to get in touch with us specifically.
We have a contact page that you can find in the episode show notes, where you can send us a quick note. Also, you can send us a fun little boost in Podverse or Fountain or Castamatic if you're on a modern podcast app; we have details for that in the show notes as well. And we are on these social medias, when we're not being drowned out by the AI noise you might see in various spheres. You can find me on Mastodon, where I'm @[email protected]. I am now, more recently, on Bluesky; I am @rpodcast.bsky.social.
That's a little addendum I should make: I have seen people put custom domains on that, and I need to figure out how they do it. I may be tempted to do that in the future. But, nonetheless, that's where you can find me currently. And I'm also on LinkedIn; just search my name, and you'll find me there, and I promise I won't send out garbage posts about AI on there. But, Mike, where can the listeners find you? You can find me on Mastodon @[email protected].
[01:00:26] Mike Thomas:
You can also find me on Bluesky @mike-thomas.bsky.social, or on LinkedIn if you search Ketchbrook Analytics, k-e-t-c-h-b-r-o-o-k, and you can find out what I'm up to.
[01:00:42] Eric Nantz:
Excellent. Excellent. Always great to see what you're up to. And, you know, I consider it a badge of honor that I tuned you in to the dev container realm. I have no regrets about that in the least, buddy.
[01:00:53] Mike Thomas:
I am so grateful for it.
[01:00:55] Eric Nantz:
Yes. If only I could get my day job to do more of that, but let's end on a positive note. This was a great episode, I dare say, and we hope that you enjoy listening wherever you are. Again, we love to hear from you, especially as the year wraps up. It's always great to hear how your year has been in the R community and your journey with R and data science. We always love hearing your stories. That'll put a wrap on episode 189, which means we're 11 away from 200, and one way or another, we'll get there. But we will be back with another episode of R Weekly Highlights next week.
Hello, friends. We're back with episode 189 of the R Weekly Highlights podcast. This is the weekly show where we talk about the terrific highlights that are shared in this week's R Weekly issue. My name is Eric Nantz, and yes, we are now getting towards the middle of December, and it is the season of giving in many, many parts of the world. And, of course, over the weekend, my kiddo was nice enough to give his entire family a nasty little cold that I'm just recovering from. Hopefully, the voice is up to snuff for this episode. But, yes, I am here, and luckily I'm not here alone this time, because fresh from his travels across the country is my awesome cohost, Mike Thomas. Mike, how are you doing today?
[00:00:41] Mike Thomas:
[00:00:55] Eric Nantz:
And luckily, no Wicked Witch is delaying your flights or anything, so that's terrific. Not this time, thank goodness. Yes, yes, it is a busy travel season, so we always, you know, consider ourselves lucky when things go smoothly. And, honestly, we are also very lucky that we have, in my opinion, a fantastic issue. And I'm not just saying that because, checks notes here, oh yeah, it was me curating it this time. But, no, this is awesome, because I have all of you in the community to thank for your awesome resources that I was able to merge into this issue, and I dare say this was a much smoother process than the last time I curated. We had some nice pull requests that I merged in and a lot of content that we can talk about, but we'll be focused on the highlights this time around. Of course, since this will be the last issue I curate in 2024 (I just wanted a dramatic pause there), I definitely wanna thank the rest of the team for their work throughout the year. It's been terrific fun to work with you all, and I'm looking forward to having more great adventures with R Weekly in 2025.
But without further ado, let's get right to it. For our first highlight, we are talking about a very key piece of the R ecosystem, in particular with package repositories, one that has already had a wonderful effect on many parts of the R community. And with the recent news that we're about to share, we think there are even bigger things to come for this project. What are we speaking about? This is the R-Universe project, which has been led by rOpenSci and their chief engineer, Jeroen Ooms, who has shared with us, along with the R Consortium blog, that R-Universe has been named the newest top-level project under the R Consortium umbrella.
Your first question probably is, what does a top-level project actually mean here, other than sounding really important? Well, technically speaking, per the blog post, it is a project that is now going to get three years of funding from the R Consortium, and it is being recognized alongside the other projects currently designated top-level. Those include DBI, the common database interface that many of the R packages involving databases utilize as a key dependency; the R-Ladies initiative; and the R User Group Support program, or the RUGS program, helping to bring infrastructure around having all sorts of R-type meetups and community resources.
rOpenSci's R-Universe is now joining that effort. And there's a great quote from the executive director of the R Consortium, Terry Christiani. She says, and I quote, the ability to find and evaluate high quality R packages is important to the R community, and the R Consortium is pleased to support the R-Universe project with a long term commitment to help strengthen the foundation of the R package ecosystem. We are pleased to be working more closely together with rOpenSci on this effort. And there's also a great quote from the new executive director of rOpenSci, Noam Ross. He also touches on the importance of R-Universe and their excitement to work with the R Consortium to strengthen the infrastructure even more.
We'll also have linked in the show notes what you might call the companion blog post from rOpenSci directly, co-authored by Noam Ross and Jeroen. It highlights another point that I do want to emphasize: R-Universe, on its own, is already bringing a revolution in package repositories, powered by a lot of DevOps principles, a lot of automation, and very intricate infrastructure to help you, as a package author, give your users not just a source version of a package, but compiled binaries, and more recently even a WebAssembly-compliant version of your package. That's a huge win right there.
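In practice, pulling a package from an author's R-Universe repository is a one-liner; the binaries discussed above mean this typically avoids compiling from source. The ropensci universe and the magick package are just familiar examples, not the only option:

```r
# Install from an R-Universe repo, falling back to CRAN for any
# dependencies that live there:
install.packages(
  "magick",
  repos = c(
    "https://ropensci.r-universe.dev",
    "https://cloud.r-project.org"
  )
)
```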
But R-Universe is now serving as the foundation of newer efforts in the R package management ecosystem. There is a new effort, and I must stress it's early days, but there is some excitement around the R-multiverse project, I should say. This is being headed by a team of people including Will Landau, the author of targets, and it is building upon R-Universe's infrastructure, their automation, and their compilation of binaries to help bring a transparent kind of governance that combines with R-Universe's infrastructure for another type of package repository.
Certainly, industries are paying very close attention to this. It is early days, but it just goes to show that the R-Universe platform can be used for more things than just R-Universe itself. And that's what we're probably gonna see in 2025 and beyond. So I am thrilled to see this, you know, more, you could say, rigorous or just more robust backing of the project. We know that times can be tough financially for certain vendors. So the fact that rOpenSci can now count on this additional funding for this particular project, I think it's gonna be a huge win for Jeroen and the rest of the rOpenSci team as they make R-Universe even more of an awesome platform for new and older package authors alike in the R ecosystem.
[00:06:55] Mike Thomas:
Yeah, Eric, one thing that I will say is a quick call to action as you get toward the end of the year, and this isn't to toot my organization's own horn or anything like that, but we donate every year as a charitable contribution to the R Consortium and to rOpenSci. If you work at a company where you use R and it benefits your company, I think it would make a whole lot of sense to ask them to do the same thing. This is the season of a lot of corporate charitable donations, so it might be an easy yes, and I think it's certainly worth asking the question considering how much some of these projects have given back to a lot of us. R-Universe is incredible. If we think about what we sort of had to do before R-Universe, there was really no searchable way to figure out what sort of packages are on CRAN. I mean, you can do it through, I think, some of the API tools that'll give you a list of all of the packages on CRAN right in your console.
And then you could do some filtering, but there was no interactive GUI, very visual, with hyperlinks to all of the vignettes, a hyperlink to the GitHub repository, and graphs that show when the most recent contributions were and who the contributors were. It's just this fantastic visual medium to be able to take a look at all of the different R packages that are out there, at least the ones that have been picked up by the R-Universe project, and the scale of it still blows me away. This whole infrastructure that Jeroen and others have created, it's mind blowing, you know, the amount of automation that's involved here, the way that packages are being built as binaries instead of users needing to, you know, install from GitHub and actually build that package themselves.
It's absolutely incredible. And to think that there's a multiverse project out there and more coming on top of this is really, really exciting to hear. A question for you: does multiverse mean multilingual, or are we still just talking R?
[00:09:06] Eric Nantz:
It is still focused on R, but I actually don't have the full genesis on how the name came to be. I think it is trying to bridge, as I said earlier, these, you know, concepts that can be derived from certain aspects of, say, CRAN versus certain aspects of R-Universe, and blend them together in a way that kinda takes the best of both worlds, as I say. And, again, this is all very early days. You'll hear more about this in 2025, but it is, you know, taking the best of these additional frameworks, and certainly R-Universe. Without R-Universe, a concept like R-multiverse is simply not possible, because can you imagine, Mike, you and I, even if we had a lot of funding, spinning this up on our own? Like, the amount of engineering it takes to build this in a way that now, like I said, can be built upon, with its API infrastructure, with its automation, whether it's powered by GitHub Actions or other slick services, just the amount of attention to detail that's been laid here. Yeah. We are very thankful that this exists at all.
[00:10:16] Mike Thomas:
Yeah. I can't imagine how much the government would have paid a big four consultant to put together a project like this. I think what Jeroen has done, in government dollars, is probably millions. But
[00:10:32] Eric Nantz:
Sort of looks that way when you look at the UI, doesn't it? Because everything is so polished. And even looking at, you know, we often say in the R community, we have these, you know, -verses of packages, such as, of course, the tidyverse, and in my industry, a suite of packages called the pharmaverse. I have an entry to the pharmaverse right on this page. And when I click it, I immediately see all the packages inside it. It's very searchable, very, you know, like I say, you can put an API in front of this should you wish. There are so many different ways to get to the interesting parts of a package, including the documentation, which is rendered on the spot in this platform. The attention to detail cannot be overstated. So I expect there are gonna be more really big things coming to this platform. Again, R-Universe itself will be a front runner to this, and then other projects like R-multiverse are gonna hugely benefit from this too.
[00:11:33] Mike Thomas:
This is so stinking cool. I'm looking at the pharmaverse sort of landing page underneath R-Universe, and it's really, really cool: the visuals that we have access to, just the amount of automation and ease of use for working with R tooling and R packages out there, and the way that things are really beautifully organized and documented here is fantastic. And I guess just to go back to an earlier comment, and please don't fire me from the R Weekly Highlights podcast, but if I could speak it into existence, it would be pretty cool to have something like this that could service both R and Python packages.
You know, we're doing a lot of work building both types of packages for the same sort of function. If you think about what Posit's doing with, like, both gt and Great Tables, it's pretty much the same functionality just serving both user bases, and we've been doing a lot of the same lately. To have sort of one place where maybe the binaries are already built as well to install those things would be pretty cool. That's why I was thinking multiverse might be multilingual, but we'll see. Just throwing that out there.
[00:12:47] Eric Nantz:
Yeah. Who knows? We'll take it. And, admittedly, I may have stepped into it a bit on Mastodon when I couldn't resist taking the bait from Bruno Rodrigues when he was talking about his disdain for managing Python dependencies in educational projects. And I had a, you know, rather snarky comment that this is one of the reasons why I try to avoid Python: going through those nightmares. But hey, you know what? They could learn a thing or two from R-Universe. I'm just saying. Because I don't see anything like this with PyPI or anything of that sort, and I think a lot of people, you and me included, since I do a little share of Python on the side here and there, would love to have this kind of curated resource for multiple languages. Who knows? Maybe you heard it here first. Maybe we'll look back on this a few years later, and we'll say, hey, we're the ones that spoke it into existence. Who knows?
[00:13:42] Mike Thomas:
We can hope.
[00:13:59] Eric Nantz:
Well, Mike, as usual, you kind of have a crystal ball. We were talking about multi-language type situations. Well, I think our next highlight is very much about a new product, an IDE, that is very much trying to be a multilingual data science IDE powered by the latest innovations in software engineering, and we are speaking about Posit's new IDE called Positron, which had its beta earlier this year. And then, of course, there was a huge focus in quite a few talks at the recent posit::conf. Now, of course, Posit has made the RStudio IDE for many years, and it's got a lot of engineering behind it, a lot of commits, a lot of features behind that.
So you as the user, who may have been using RStudio for many years or even a few years, hear about Positron, and you may be thinking, I wonder if it's time for me to take a serious look at this. Well, there have been a few posts that have addressed this, but the latest post, I think, has some real nice insights here. This is coming to us from the Jumping Rivers blog, in particular authored by Theo Roe. And the post is titled Positron vs RStudio: is it time to switch? Now, of course, we always throw in the caveats. This is certainly a subjective decision, but I like what this post is doing. It's laying the facts down on what's currently available between both environments, comparing and contrasting different aspects of what you, as an R programmer, expect to see or have had to utilize in IDEs of the past.
So the first thing right off the bat we wanna mention is that Positron is not solely focused on R. It also has support for Python and Julia, but also additional languages that wanna come to play, so to speak. There are ways, through its APIs under the hood, for a new language to talk to its engine. And we'll talk a little bit about how it does that with the R language in a little bit. But this has been built unlike RStudio, which, let's be frank here, is predominantly an R-based IDE with a little bit of Python here and there via reticulate. You don't need that with Positron. If you have Python available, if you have R available, Positron is gonna pick it right up. And there have been many, many people, you included, Mike, who've mentioned you may be switching between the frameworks for a given project.
In Positron, you're just a toggle in the upper right corner away from switching from an R environment and R interpreter to a Python interpreter. So that is already, I think, for those that are operating in multi-language projects, a huge win in Positron's favor. There are other things that you might have to get used to in Positron that maybe were kind of bolted on to RStudio later in the game, so to speak. One such example is the command palette. If you've used IDEs like Visual Studio Code before, or Atom before that, you may be used to the command palette as a way to quickly bring up, well, it looks like a little search box. You start typing a few keywords and it will auto-complete to a particular command based on what you're searching for, such as maybe adding a new git commit, bringing up a new file, rendering a new app, or things like that.
RStudio never came with a command palette until probably about a couple years ago or so. Now there may be, you know, those that might say it doesn't quite feel as native in RStudio as it does in Positron. But in Positron, you kind of have to get used to it, because the way to really unlock most of the functionality in an IDE like Positron is to interrogate that command palette to get to what you need to run or what you need to open and things like that. There are ways you can bind additional shortcuts to it, but it is something to get used to, alongside the way that it handles settings.
Those are a bit different too. In RStudio, there was a way to find them in, like, a config file in your home directory or somewhere else. Positron is similar, but it's kind of agnostic to, you might say, the language being used inside. There are plugins or extensions based on languages from time to time, but it's basically a JSON file. You can get to it either using the interactive settings toggle in the editor or just by editing that JSON file directly. You kind of get to choose your own adventure with that. It would trip people up in the RStudio world sometimes to figure out exactly where that file is stored and try to edit it outside of, like, the GUI elements of the settings. So that might be advantageous to you if you wanna really customize your experience with Positron but do it through, like, a file-based way.
Other things to watch out for, or that may be a benefit to you depending on your perspective, is that in Positron, you're not necessarily gonna have to spin up what are called those R project files, so the files ending in .Rproj. RStudio used those extensively for things like setting up a git repository in a project, setting up a package, and other uses like that. In Positron, it's just the folder. You can have what's called a workspace, a set of settings in that folder, which is basically a way to customize settings per folder if you wish, but you're not gonna need that .Rproj file to tell Positron that you're working in an R project.
There have been some people that really like that file, and there have been an equal number of people that really don't like that file in the repo. So that might be helpful to you if you've, you know, had some angst about that in the past that Positron is kinda doing away with, because, again, they're building upon an open source clone of Visual Studio Code. They're piggybacking off of another effort with a lot of tooling on top of it, whereas RStudio was built literally from the ground up to be a first-class R-based data science editor. A couple other things I'll mention before I turn it over to Mike here: the layout will look a bit different.
I've been used to the layout in Positron because it has a lot of similarities to the Visual Studio Code layout that I've been using for a lot of my open source projects, but you will often see in a default layout in Positron that the file pane is on the left, and this file pane has a lot more going for it than I think the one in RStudio does, and I don't think I'm upsetting people when I say that. There's a lot more you can do in the file pane in Positron versus what you could do in the RStudio file pane, such as even just expanding folders with, like, the toggles to expand the nesting or whatnot.
Little things like this can add up for a bigger project. Trust me on that. But then you also see on the left side your extensions in another tab, and your integrated Git console in another tab. There are people that like the Visual Studio Code, or Positron, way of doing Git versus the way RStudio does Git. Your results may vary depending on where you fall on that fence, but it's all right there. There isn't anything special you have to do for it. It'll pick up Git right away. But with that, there might be some things that aren't as intuitive in the beginning. You just have to play a bit. And with the extension ecosystem, you can supercharge your Git experience with extensions like Git Graph.
There are other ones, like GitLens, that can really do some slick things for your command palette and Git operations. There's a lot going on here. The other thing I'll comment on that I think is something you wanna look into is the Data Explorer in Positron, which is something Visual Studio Code does not have. This is something new that Posit created in this version of Positron. You get a much richer experience when you're looking at your data frames: you can sort by multiple columns if you wish, and the filters are gonna be much more intuitive to work with because they're gonna be across the top of your data frame instead of above the actual column itself. So that way you can navigate them much more quickly.
And it can handle larger data sets without, you know, causing your IDE to lag by 5 or 10 seconds as it loads the rows. So there's a lot of engineering behind that, as I've heard from previous talks, so it might be worth a look. If you really find yourself using that data viewer in RStudio a lot, I think the Positron Data Explorer is something you wanna take a look at. There are some things to be aware of that just aren't gonna feel as native right now, such as the use of add-ins that RStudio used to kind of give you additional functionality in the editor without building it into the editor itself.
I'm hearing there isn't, like, a direct one-to-one way to use those just yet. Although it makes me wonder, because with the R extension for Visual Studio Code that I used for many years, there were features added to that extension a couple years ago to leverage add-ins in Visual Studio Code. So I know it's possible, but we'll have to see if Posit adopts that for Positron in the future. So with that, any additional functions that use the rstudioapi package, which was often used on the back end of RStudio itself to kind of interrogate features of the IDE, that's not going to work well either. And R Markdown: you can do R Markdown in Positron, but it won't feel quite as native as it will in the RStudio IDE.
But if you've moved on to Quarto, that may not be an issue for you, because Quarto has very first-class support in Positron. So the question might be: have I switched? I am not fully switched yet, and I'm not saying that because I'm using RStudio a lot. It's because I'm still using Visual Studio Code a lot, because I have so many workflows built upon it. However, now that Bruno and company have figured out a way to get Positron installed on Nix systems, yes, I have Positron on my Nix system. So I am trying to use it. I'm trying to adapt my Visual Studio Code workflows into it. Mostly going well.
The thing I miss the most is the dev container stuff. I live off the dev container feature that Visual Studio Code has. Unfortunately, there's no easy way to get that in Positron, because that's a Microsoft-specific extension; it's not an open source extension. That's another thing to keep in mind: there may be a few extensions in Visual Studio Code that do not work in Positron, because Positron is using the open source extension registry, not the Microsoft-specific one. So with those caveats in mind, I think it has tremendous potential. It is still in beta, so your results may vary depending on the project or your intended use of it. But the foundation is there for Positron to really carry out its vision to be a first-class multilingual environment. So I'll be watching it closely, and I'll see where it breaks and share when it does.
[00:26:02] Mike Thomas:
I'm gonna continue to watch it too, Eric. I have not made the jump yet, and I need to. I need to start exploring it a little bit more. I am very locked into VS Code and dev containers, and I guess I'm gonna blame you, as opposed to myself. I'm gonna deflect here and say that you were the one that started me down that journey that has locked me into that tooling for right now. I'm just kidding. It's hugely, hugely helpful. But the command palette, you know, we should talk about the fact that Positron so closely mimics Visual Studio Code. And I think for a huge proportion of data scientists out there that are more comfortable in RStudio than in a, you know, more developer-type environment like VS Code, Positron is a perfect bridge between those two things, in my opinion.
I think it starts to bring some of the best elements from something like VS Code, you know, a full-stack developer platform, into an environment that native RStudio users might be a little bit more familiar with. You know, the command palette being one of them. I know that there was a command palette, and it exists now in RStudio, as you mentioned, for the last couple of years, but it's not quite as obvious, if you will, as the command palette that exists in Positron, which is, you know, a little bit more obviously put in front of you and drives a lot of the functionality of the IDE itself. Another thing that I think is a benefit, and one thing that I love about VS Code as opposed to RStudio, is being able to sort of search all of the files in your project. There is a Find in Files button in RStudio underneath Edit, and it's handy; it works very, very well. It takes a couple clicks, or if you know the keyboard shortcut, you get to it a little bit quicker. But there's a giant magnifying glass in the left-hand sidebar in Positron and VS Code that allows you to immediately do that. I think it's much quicker, and these differences are subtle, because I think the functionality still exists in both places, but the UX is just a wee bit better in Positron than in RStudio. That slight difference makes all the difference, to some extent, if you will.
I think a really interesting thing is, you know, the lack of a need for .Rproj files, or R projects. This is again sort of moving folks away from R-specific workflows into, you know, slightly more developer-specific workflows, and understanding how to interact with working directories without running setwd() if you can help it, right? So I don't know how this impacts the here package that was developed by RStudio, now Posit. I think it'll still work fine, because it'll look at your working directory as well, and I think it'll create relative links to that. But I know that some of the functionality of the here package actually looked for that .Rproj file, if I'm not mistaken, and sort of recursively searched to be able to find it, to figure out, you know, where the working directory was that needed to be set with respect to all the other files that you want to work with. So I'm not sure how that plays into this whole Positron IDE in the event that, you know, you have a lot of workflows that depend on here and maybe it's a little less straightforward to create a new R project. Although I do see, in a screenshot here under the workspaces and R projects section, it's very faint, but if my eyes aren't deceiving me, there's actually a button that says New Project under R. So maybe that takes care of that for you. I know that, you know, you may not necessarily need to do that, but if you want to, and you have a lot of workflows within your organization and your team that leverage R projects and all of the different functionality that comes along with that, and you wanna continue to do so, it looks like the functionality is still there for you.
So that's just something to watch out for as you move from RStudio to Positron and sort of decide which pieces of functionality in your current workflows you wanna continue to leverage, and which you may wanna change to evolve, you know, your team's practices, if it makes sense to do so. But I thought that this was a really nicely comprehensive blog. I do think that one of the strengths here is the vast, vast ecosystem of extensions within that Open VSX repository, or community of extensions, whatever you wanna call it. I know that there are a lot of RStudio add-ins, but I can pretty much definitively promise you that the number of extensions in the Open VSX ecosystem is probably quite a bit larger than that.
One thing that I absolutely am envious of for the Positron users who are already using it, and something that'll probably push me over the edge here to start using it, is the data viewer. There is one in VS Code. It leaves a lot to be desired, to put it nicely.
[00:31:37] Eric Nantz:
I know.
[00:31:38] Mike Thomas:
Obviously, the data viewer in RStudio is great. You know, it's geared towards data scientists and exploratory data analysis. But what we have in Positron is the RStudio data viewer on steroids, I would say. You have, you know, column-level summaries and summary statistics, including missing values, in the left-hand sidebar while you're viewing your data in the majority right side of the screen, as well as the filters that have been applied, which can easily be, you know, added to or removed through a click of a button along this nav bar at the top. I think it's a fantastic UI.
I think it's a really super-powered data viewer compared to anything else that I've seen to date. I have seen some products like this that are just, you know, standalone SaaS-platform data viewers, if you will. But to have this in our IDE, in the same place that we're doing our development work, is really, really exciting. So, you know, excellent job by Jumping Rivers. I think they summarize well, you know, the trade-offs here and the benefits that we get from the Positron IDE. And I'd encourage anyone that hasn't had a chance to check it out yet, self included, to check it out as soon as you possibly can.
[00:33:00] Eric Nantz:
Yeah. I mean, certainly, it's becoming easier to install, even for, you know, major geeks like me and others. Now we can install it on Nix. I have put it through the paces a little bit. I started theming a little bit. You're right about that extension ecosystem: there is something for everything. And one of my favorite extensions I was using in my live streams way back when, turns out it's available in the open source registry. They call it Power Mode, where when I type, I get these nice little, like, explosion sparks happening next to the words, just to give a little flair. And I was like, there's no way that one's on there. Oh, sure enough, it is. So I can replicate some of my live streaming experience in Positron that I had for my Shiny dev series stuff from a while back.
I will also have links to a couple additional posts, some of which we have covered in highlights before. One of those was from Athanasia Mowinckel, and she talked about her experience with Positron. I hate to say, it looks like that here package or project stuff isn't working as nicely in Positron as we were hoping for, according to her post. So I'll have to see if that gets better over time. There's also a post by Andrew Heiss, who has used Positron a bit and gives his two cents on his favorite extensions and the customizations that he's done to make the experience more fit for his workflow. That's the key right here, right? Positron is already giving you a lot of nice, you know, out-of-the-box configurations, but nothing is locked, so to speak. You can tailor it to whatever you see fit, and with the power of the VS Code, or open source Code OSS I should say, foundation, you can do all sorts of things. You can get your Vim key bindings. You can do all sorts of interesting things to make that your own experience.
So I will admit at the day job, we can't really use Positron yet because it's not part of the Posit Workbench enterprise product just yet. Posit's obviously not gonna put that in until it's production-ready, so I still have to wait a little bit for my work projects. But for my open source stuff, I'm gonna give it a go and report back on what breaks and, hopefully, what works even better. You know, Mike, you might say we're starting to get into the doldrums of winter, but our next highlight has a very interesting title to it, because it speaks on a few different levels.
This is talking about one of the keynotes given at the recent R/Pharma 2024 conference by Posit's CTO, Joe Cheng himself, on the new tooling that's coming to the R ecosystem in the realm of artificial intelligence and interacting with large language models. The talk was affectionately titled Summer is Coming: AI for Shiny, R, and Pharma. We have talked about some of the new tooling already in previous highlights of this show, when we spoke very highly about the Elmer package, which is a key focus of this talk. It gives you, in the R ecosystem, a very, you know, robust, compliant way to call different LLMs, both via third-party services like ChatGPT or Claude or others, as well as self-hosted versions. There's also the accompanying package shinychat, which gives you a way to bring that LLM kind of console experience into your Shiny applications, building upon Elmer to do that.
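As a rough sketch of what that tooling looks like in practice (assuming the API of the current ellmer release, which the Elmer beta discussed here evolved into, and an `OPENAI_API_KEY` set in your environment; treat the exact names as illustrative):

```r
library(ellmer)  # successor to the "elmer" beta discussed in the episode

# Open a conversation with a hosted model; other providers (Claude,
# self-hosted models, etc.) have analogous chat_*() constructors.
chat <- chat_openai(
  system_prompt = "You are a concise assistant for data science questions."
)

# Send a prompt; the response streams back to the console.
chat$chat("In one sentence, what does a Kaplan-Meier curve show?")
```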
This talk was a tour de force of a few different concepts, but I wanna set a little bit of context here, because, as I have shared on this show, I've been as skeptical as anybody about how AI can be pushed in directions it shouldn't be. It can be almost nauseating seeing some of the fluff that's put out there on, cough cough, LinkedIn about some of the weird uses of it. But guess what? I wasn't alone in that skepticism. Joe Cheng himself was very skeptical of this. It took him a while to warm up to it. He had some epiphanies earlier in the year, combined with, you know, getting to know the AI tooling as you see it: there's more than meets the eye, to steal the phrase from Transformers, because you can build on top of these services. And that wasn't something that was obvious to him right away.
But this talk first introduces, again, the aforementioned new tooling in R, the Elmer package and the shinychat package, with a demonstration lifted from the posit::conf talk he gave, where we have what's called the restaurant tipping Shiny app. Instead of the app developer having to build a whole bunch of sliders, select inputs, and toggles to try and be proactive, so to speak, about what the user wants to do to explore the restaurant data, there's, like, a chatbot in the sidebar where the user can type in a key question like, you know, what is the average tip rating for males in this year, or whatever.
That is a prompt going to an LLM to translate that question into a SQL query and update the Shiny app on the spot. That was an eye-opener for me when I first saw it. And then when Joe mentioned that he was going to give this keynote at R/Pharma, he, you know, had a quick call with me to ask, what can we do to make this a little more relatable to the life sciences folks? Because, yeah, we all love restaurant data, but this audience in particular can be pretty skeptical of things. Let's put it that way, and we often have to be, right? It's a very highly regulated industry. So I gave him a little seed: what if we take part of the Shiny app that we did for the R Consortium Submissions working group, where we sent a traditional Shiny application with a few different summaries to our health regulators as a way to prove out that we could send a Shiny app for a submission?
There's a portion in that app, which I'll have a link to in the show notes, where we have a survival-type plot of time to event, so to speak. And we had a couple sliders and toggles, built with the teal package, to explore the data going into the plot. I thought, why not have that chatbot in this display? So Joe, to his credit, it only took him about a week to do this, but he spun up another demo for this presentation, taking that Kaplan-Meier interactive visualization built with ggplot2 and putting a chatbot to the left of it so that we could ask similar questions on different partitions of the data, and the plot would update on the spot.
New sample sizes, new distribution curves, or survival curves. Amazing. This, to me, putting my spin on it, is a very intriguing feature when we start looking at data reviews and hopefully finding ways to get to insights more quickly, but in a controlled way. When I say controlled way, that's another part that Joe emphasizes here: the way this is all built is a very intelligent yet diligently structured prompt that goes to the chat server, or the LLM, when the app is launched, so it has the context set correctly. Now, correctly may be a strong word here, because nothing's ever absolutely perfect in this realm of LLMs, but it's trying to control, to the best extent possible, the possibility of the LLM giving complete nonsense as the result, and telling it to produce a SQL-type query that is gonna be used to filter the data going into that plot, kind of a translation layer on top of it.
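A minimal version of that chatbot-in-a-sidebar pattern might look something like this; it assumes shinychat's `chat_ui()`/`chat_append()` interface and that the user's message arrives as `input$chat_user_input`, so treat the exact names as illustrative rather than definitive:

```r
library(shiny)
library(shinychat)

ui <- bslib::page_sidebar(
  sidebar = chat_ui("chat"),  # the chatbot lives in the sidebar
  plotOutput("km_plot")       # the visualization the chat would drive
)

server <- function(input, output, session) {
  # One structured system prompt, set when the app launches, as described above.
  chat <- ellmer::chat_openai(
    system_prompt = "Translate user questions into filters for the plot data."
  )
  observeEvent(input$chat_user_input, {
    # Stream the model's reply back into the chat UI.
    stream <- chat$stream_async(input$chat_user_input)
    chat_append("chat", stream)
    # A real app would also parse the model's query and update km_plot here.
  })
  output$km_plot <- renderPlot(plot(1:10))  # placeholder plot
}

shinyApp(ui, server)
```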
The other key concept is that these packages are leveraging another bit of functionality that you might need if the LLM can't do everything on its own. That's the concept of tool calling. Another eureka moment in my mind. Maybe, in the example he gave in the talk, you ask an LLM, what's the current weather in California? It may not be able to do that on its own, because it kind of needs an interactive way to look up that weather at a given resource. So Joe's example was giving it access to an R function that calls an API for weather data.
Having the function be documented with a parameter like the city name or what have you, the LLM calls that function, takes its result, and gives it back to the user. It's kinda like an assistant to the LLM to get the job done. That was the overall moment for me: we don't necessarily have to be limited by just what the LLM can do on its own. We can augment it with other services, other ways, such that if you can code it up in an R function, you might be able to use it via this tool paradigm with what ellmer can do to make these LLM calls on your behalf.
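The tool-calling pattern described above might look roughly like this with ellmer. Treat it as a sketch: the weather function is a hard-coded stand-in for a real API call, and the exact `tool()`/`register_tool()` signatures may differ across ellmer versions.

```r
# Sketch of tool calling with the ellmer package (API details may vary
# by version). The weather lookup is a hypothetical stand-in.
library(ellmer)

# An ordinary R function the LLM is allowed to call; a real app would
# hit a weather API here instead of returning a canned string
get_weather <- function(city) {
  paste0("Current conditions in ", city, ": 18C, partly cloudy")
}

chat <- chat_openai()

# Register the function as a tool, with a description and typed
# arguments so the model knows when and how to invoke it
chat$register_tool(tool(
  get_weather,
  "Gets the current weather for a given city",
  city = type_string("Name of the city, e.g. 'Sacramento'")
))

# The model recognizes it can't answer from training data alone,
# calls get_weather(), and folds the result into its reply
chat$chat("What's the current weather in San Francisco?")
```

The point is exactly what Eric describes: anything you can express as a documented R function becomes a capability the model can borrow.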
And then the last part of the talk covered some of the practical considerations, and there are quite a few. I think these are the parts that show me he is still grounded in this: it's okay to say no, folks. If it's given you results that don't make sense, it just may be time to move on to a different solution. But in use cases where the answer is not always so black and white, it may be more of a layer to get to a final answer, where you have a little more flexibility but also keep a human in the loop. Which, again, in my industry, you better believe we'd better keep humans in the loop when we look at these results. There are still a lot of productivity gains to be had if you can harness this the right way. But, again, a very realistic talk. He's excited about the tooling, but he's being realistic too. This is not gonna solve all the world's problems. It's not gonna magically put our drugs on the market in, like, half the time it currently takes.
But I think this can greatly help certain aspects of development, such as the way we produce either applications or the tooling to interpret this data and get us to insights more quickly. There was a really robust Q&A after the talk; I had the pleasure of moderating that. We'll also have linked another dedicated session from the APAC track of R/Pharma, where Daniel Sebanes Bove led a Q&A with Joe Cheng himself, who actually called in later that night, around midnight his time, just because he was so passionate about connecting with the Asia Pacific colleagues on it. There's some great Q&A in that session too. So am I still skeptical?
I won't lie. I'm still kind of skeptical of certain things. But what Joe gave me in this talk was a way to show that, like I said, there's more than meets the eye in how you can leverage these LLMs, and the tooling in front of them, to craft a solution that is more fit for purpose for your particular needs and cut out all the noise you see in the various social media or other tech spheres.
[00:45:00] Mike Thomas:
I'm very aligned with you, Eric. I think that the way Joe is approaching these concepts, and the way Posit in general is building this tooling out, is fantastic, and it aligns with sort of what I would hope for. I was quite skeptical, and then, Eric, I know we were both at posit::conf this past year, and I watched that keynote, that hour-long talk from Melissa Van Bussel on practical tips for using generative AI in your data science workflows, and it changed things for me a little bit. It was very applied.
It was very geared toward the audience. And, just to be honest, there were a lot of things that she presented that I didn't know were possible. I thought that I knew everything there was to know about AI, and I thought that the cons outweighed the pros, but that talk in particular brought the pros up to maybe even with the cons, if not more, and made me want to start looking into things a little bit more in this space. Try some things out and tune out, like you said, maybe some of the LinkedIn marketing hype BS that's out there right now. It seems like AI agents is all I'm hearing about these days. I don't even know what an AI agent is. I don't really care to know either. Agentic or whatever. But, yeah, another mind-blowing thing. And, you know, hats off to you. You did a fantastic job moderating this talk.
It's well, well worth your time if you're in the R or data science space and trying to make heads or tails of these LLMs. And I think that the tooling, again, that Joe and the team have put together for us is really, really cool. I mean, you can't watch it and say that it's not super cool or super interesting. Whether or not you wanna leverage it is totally up to you and your use case. But some of the possibilities that we have here are really, really cool. And thinking about these large language models as maybe a step in the workflow, and their ability to call another process like an additional API and work with it, is really, really interesting. I know that recently, with OpenAI, if it can't find the answer to your question in its training data, I believe it can execute a Google search, or a web-based search, and then fairly quickly process the results that come back, maybe just looking at the first few links, trying to crawl over them, and leveraging those as the context it uses to try to answer your question, which is pretty incredible.
Just as a tangent here, while we're sort of still on the AI topic: the Sora model was released from OpenAI yesterday, I believe, which was a long time coming, and that is supposed to be text-to-video. So, you know, check that out. I would recommend, even if you're a skeptic and you're really, really against this stuff, I think it's worth watching just to educate yourself and understand what the art of the possible is, because the art of the possible is changing every day. And we're trying to do a lot of thinking at Ketchbrook about how we are going to integrate these into the Shiny apps that we develop for our clients in a way that makes the most sense, and isn't going to just involve our team going crazy with all of this stuff to the point where we're strapped for resources because everybody wants this. We're really trying to work hand in hand with our clients to figure out how and where it makes the most sense to leverage this type of technology.
So videos like this, and tutorials that take a practical approach with hands-on demonstrations of how to go about injecting this functionality into your Shiny apps, are invaluable for us. So a big thank you to you, Eric, and Joe, for all of the time and effort that you've put into trying to do that for those of us on the ground.
[00:49:16] Eric Nantz:
Oh, I felt he did all the hard work. I'm just, like, are you kidding me? Is that even possible? I mean, it is amazing to see what we can do. And, honestly, yeah, I definitely had almost a closed-eye perspective on this. I got, you might say, perturbed too much by all the noise out there before really giving it a fair shake. But, like you said, we were sitting next to each other at posit::conf. That was step one. And then step two was this talk, because now it wasn't just a, quote, unquote, fun toy example. Now it's like, okay, what can we do in life sciences that will open some eyes? And there are so many other areas we're pursuing too. It's not just, quote, unquote, the interactive data reviewing. There are many other realms of automation that we wanna use to make the mundane get done more quickly, and hopefully find advancements in training these models or giving them the prompts to train themselves kind of on the fly.
But there is another part of that talk that you all should see. Again, being realistic here, there are some areas that surprised them at Posit as well, such as when they tried to use LLMs to help ingest their documentation, their technical documentation, and then put, like, a bot in front of that. They had very mixed results on that, which kinda surprised them. Right? Because technical documentation, that's literally the source right there. You would think an LLM could ingest that and then, when it gets a question about it, immediately be able to surface it more effectively. And I know others are looking into that as well. I thought about that area for some of my internal documentation, because I maintain a website on using R in HPC environments.
It'd be great to have a little bot next to it that people can type their questions into, and it would use that doc to kinda help point them in the right direction without always emailing yours truly when something goes crazy. Not that I don't like helping people, but there's a balance there. So I'm intrigued to see where that goes, for sure. Me too. Yep. No, boundaries are important. There are no boundaries, so to speak, when you see just the breadth of what's possible in this ecosystem these days, and I dare say the same is true with the rest of this issue. Not to sound biased here, but I think we've got something for everybody here, across the full gamut of new packages, and there's a good chunk of them in this issue. I wasn't shy about putting all these great new packages in here, as well as updated packages.
Some really interesting tutorials too, so we'll take a couple of minutes for our additional finds here. And it's December. If you, like me, like music and you're on social media, you've likely seen the Spotify Wrapped posts where people show off their favorite tracks that they've listened to in 2024. It's always an entertaining thing to look at. Well, Andrew Heiss, we mentioned him earlier, kinda took matters into his own hands, because, a, he doesn't listen to Spotify, and frankly, neither do I, but he listens to Apple Music. So he has this great post called Apple Music Wrapped in R, where he leveraged a way of exporting out the metadata associated with his listens from Apple Music and iTunes, because there are some somewhat interesting XML-based files that you can extract from this.
He built some code via tidyverse-type packages to process that XML data, with a little bit of intelligent time-lapse-type summarization. And he was able to derive basically those key metrics that we often see in the Spotify Wrapped, and got some interesting results on what he's listening to. To the surprise of no one, potentially. I don't know Andrew personally, but apparently Taylor Swift is in his top tracks, which I think many in the world would have in common. This inspired me, and hats off to Andrew as usual. I don't know how he finds the time to do all this, and plus he wrangled some gnarly XML to make it happen.
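The kind of XML wrangling described here might look roughly like the sketch below. It is not Andrew's actual code: the file name follows the common iTunes "Library.xml" export, the plist layout (each track as a `<dict>` of alternating `<key>`/value nodes) reflects the usual format but may vary, and the XPath expressions are assumptions.

```r
# Hedged sketch: parse an iTunes/Apple Music plist export and rank
# tracks by play count, a DIY "Wrapped". Paths and keys may differ
# depending on your library version.
library(xml2)
library(purrr)
library(dplyr)

lib <- read_xml("Library.xml")

# In the plist format, each track is a <dict> nested under the
# top-level <dict>/<dict> "Tracks" section
tracks <- xml_find_all(lib, ".//dict/dict/dict")

# Pair each <key> with the value node immediately following it
parse_track <- function(node) {
  keys <- xml_text(xml_find_all(node, "key"))
  vals <- xml_text(xml_find_all(node, "key/following-sibling::*[1]"))
  set_names(as.list(vals), keys)
}

plays <- map(tracks, parse_track) |>
  bind_rows() |>
  select(any_of(c("Name", "Artist", "Play Count"))) |>
  mutate(`Play Count` = as.integer(`Play Count`)) |>
  arrange(desc(`Play Count`))

head(plays, 10)  # top ten most-played tracks
```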
But I've recently spun up a really intricate self-hosted version of my music listening. I took a day during my break for Thanksgiving, and I ended up ripping a whole bunch of the music CDs that I bought when I was a teenager onto my beefy little server here in the basement, because I thought, these CDs aren't gonna last forever. I might as well rip them and put the MP3s, or FLACs, I should say, on my server. But then I thought, well, it's great that I have them as files, but that's no way to listen to them. There's gotta be a better way to have, like, a Spotify-like experience.
So I found this program called Navidrome, which I can basically put in a Docker Compose file as a Docker container. It serves up the MP3s from a directory, finds the album art, finds the metadata for the artists, the tracks, and whatnot. And I can basically listen to my songs even in the web player, which doesn't look like much to shout about, but it's an API under the hood, much like R-Universe has an API under the hood. And if you've heard of a framework called Subsonic: if you have a Subsonic-compliant player, you can basically tap into that service and put it on your phone, your computer, whatever have you.
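A Docker Compose setup like the one Eric describes might look something like this. It is close to the example in Navidrome's own documentation, but the host paths, port mapping, and environment settings here are assumptions to adjust for your own server.

```yaml
# Sketch of a docker-compose.yml service for Navidrome;
# paths and settings are illustrative, not prescriptive.
services:
  navidrome:
    image: deluan/navidrome:latest
    ports:
      - "4533:4533"            # web player and Subsonic-compatible API
    environment:
      ND_SCANSCHEDULE: 1h      # rescan the music folder hourly
    volumes:
      - "./data:/data"             # Navidrome's own database/state
      - "/srv/music:/music:ro"     # your ripped FLAC/MP3 library
    restart: unless-stopped
```

Because the server speaks the Subsonic API, any Subsonic-compliant mobile or desktop client can point at port 4533 and stream the same library.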
So that's great. But then I thought, well, it's not really keeping track of what I'm listening to. Sure enough, it has a plugin for that too, combined with another project called Maloja. Don't ask me how they name these things, but it basically gives me a way to track every time I play a song. But I keep that data in house, folks. It ain't going to Spotify. It ain't going to, what's it called, Last.fm or anything. So next year, I'm gonna speak this into existence, and Mike, you're my accountability buddy here: I'm gonna make a version of Spotify Wrapped, but completely self-hosted, with my geeky taste in music. So you heard it here first. So, Andrew, thanks a lot. You may have just nerd-sniped me into another project. How do you top that? I have no idea.
[00:55:33] Mike Thomas:
I don't know. And for the folks who have listened long enough or know us, there's nothing we love more than music and nerding out, and when you can combine the two of those, it's bad news for everybody else. But that's a great find. I wanna call out a blog, for any of the cinephiles out there, from Mark H. White II, who is a PhD, about predicting Best Picture at the 2025 Academy Awards. He's updating it weekly with his predictions on the win probability, based upon the critical reviews that are posted online.
And it looks like The Brutalist is just ahead of Wicked in the ranking of which movie is most likely to win Best Picture at the 2025 Academy Awards. So a really, really neat little blog post, with a nice little interactive visualization at the top of it. And check in each week on the blog to see who's winning.
[00:56:36] Eric Nantz:
That's awesome. Yeah, I know a lot of people like to do those predictions, and it's always somewhat fun, sometimes scary, trying to predict what ends up being a very subjective vote, as the Oscars, or Academy Awards, are. So, yeah, I'll be interested to see how that shakes out. And, like you said, a very interactive Plotly visualization. Plotly, another package. I should give thanks to Carson Sievert every time I see him for this, because I usually use it in all my Shiny apps. But there are other ones too. As we all know, we can't say Plotly without giving good kudos to echarts4r from our good friend, John Coene. So we're fair and balanced here on this podcast.
[00:57:19] Mike Thomas:
Fair and balanced, and I will extend an olive branch as well to Kelly Bodwin, who wrote a fantastic article on her adventures with Advent of Code using data.table solutions.
[00:57:33] Eric Nantz:
That was awesome. I'd love to see that. And I even saw, this wasn't necessarily R-specific, a post on Bluesky about somebody using DuckDB to do Advent of Code. Yes, DuckDB with SQL queries, no less. I don't know how all of you are able to do this. I was actually talking to a few people earlier today about it. Someday, I will do Advent of Code, but my goodness, I feel like I am so far behind. It's almost like imposter syndrome just thinking about it. So I love living vicariously through Kelly and others as they do this.
[00:58:10] Mike Thomas:
I feel you as well. Someday, we'll get there, Eric. Yep. We can be accountability
[00:58:15] Eric Nantz:
buddies on that one too. But what we are accountable for, hopefully, is sharing what we find so exciting about the R Weekly project and this particular issue. And, of course, you can find this and all the other issues at rweekly.org, as well as how you can give back to the project. And the best way to give back is to share that great resource you found, or maybe you created that great new package and you want the R community to know about it. We are a pull request away, to use the GitHub language. You just find that little pull request Octocat icon in the upper right corner, and you'll be taken directly to the template. You can fill out the pull request right there. We've got nice little template text, and you can navigate to the section your resource should go in. The curator for that week will be able to merge it into the upcoming issue, and we love it when we get your contributions. It always puts a smile on my face whenever I get the curation and I don't see a zero for pull requests. This is one time I want the pull requests; there are several other times when you're dreading them. Not that I would know anything about that. Nonetheless, there are other ways to get in touch with us specifically.
We have a contact page that you can find in the episode show notes, where you can send us a quick note. Also, you can send us a fun little boost in Podverse or Fountain or Castamatic if you're on a modern podcast app. We have details for that in the show notes as well. And we are on these social medias, when we're not being drowned out by the AI noise you might see in various spheres. You can find me on Mastodon, where I'm @[email protected]. I am now, more recently, on Bluesky: I am @rpodcast.bsky.social.
That's a little addendum I should make: I have seen people put custom domains on that, and I need to figure out how they do that. I may be tempted to do that in the future. But, nonetheless, that's where you can find me currently. And I'm also on LinkedIn. Just search my name and you'll find me there, and I promise I won't send out garbage posts about AI on there. But, Mike, where can the listeners find you? You can find me on Mastodon @[email protected].
[01:00:26] Mike Thomas:
You can also find me on Bluesky @mike-thomas.bsky.social, or on LinkedIn if you search Ketchbrook Analytics, K-E-T-C-H-B-R-O-O-K. You can find out what I'm up to.
[01:00:42] Eric Nantz:
Excellent. Excellent. Always great to see what you're up to. And, you know, I consider it a badge of honor that I turned you on to the dev container route. I have no regrets about that in the least, buddy.
[01:00:53] Mike Thomas:
I am so grateful for it.
[01:00:55] Eric Nantz:
Yes. If only I could get my day job to let me do more of that, but let's end on a positive note. This was a great episode, I dare say, and we hope that you enjoy listening, wherever you are. Again, we love to hear from you, especially as the year wraps up. It's always great to hear how your year has been in the R community and your journey with R and data science. We always love hearing your stories. That'll put a wrap on episode 189, which means we're 11 away from 200, and one way or another, we'll get there. But we will be back with another episode of R Weekly Highlights next week.