A major milestone for leveraging LLMs in R just landed with the new ellmer package, along with a terrific showcase of retrieval-augmented generation combining ellmer and DuckDB. Plus an inspiring roundup of the recent Closeread contest winners.
Episode Links
- This week's curator: Sam Parmar - @parmsam@fosstodon.org (Mastodon) & @parmsam_ (X/Twitter)
- Announcing ellmer: A package for interacting with Large Language Models in R
- Rapid RAG Prototyping: Building a Retrieval Augmented Generation Prototype with ellmer and DuckDB
- Winners of the Closeread Prize – Data-Driven Scrollytelling with Quarto
- Entire issue available at rweekly.org/2025-W10
- Coder Radio episode 608 - R with Eric Nantz https://coder.show/608
- nhyris - A minimal framework for transforming an R Shiny application into a standalone app
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @rpodcast@podcastindex.social (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @mike_thomas@fosstodon.org (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- Watermelon Flava - Breath of Fire III - Joshua Morse, posu yan - https://ocremix.org/remix/OCR01411
- Stomp the Summer Sky - Secret of Mana - Ziwtra - https://ocremix.org/remix/OCR00859
[00:00:03]
Eric Nantz:
Hello, friends. We are back with episode 197 of the R Weekly Highlights podcast. My name is Eric Nantz, and thank you for joining us. Unfortunately, we were off last week, but at least I am back here this week. This is normally the point where I introduce my awesome co-host, Mike Thomas, but he and his family have been the victims of a diabolical ongoing flu, which is affecting a lot of people, I would say. So he's off this week, but we wish him all the best to get better soon. Nonetheless, I'll pilot the ship solo today. And before we get to the meat of the episode, I wanna give a couple little plugs here.
I had the good fortune of joining the Coder Radio program, episode 608, which has been recently revamped by the host, Michael Dominick. He's been really fun to talk to. It's ironically the third time I've been on one of his podcasting adventures, but this time it was on the granddaddy of them all, so to speak, the show that he's been involved with for many years. We had a lot of fun talking about all things R: where it fits in the world of software development and how it compares to other languages that we often hear about. He seemed interested in getting his data science action on down the road. He still uses a lot of Python, so we'll have to work on that.
I kid, I kid. But no, it was a great episode. We'll have a link to that recording in the episode show notes if you wanna listen in. Also, speaking of podcasts, there is a newer podcast with an episode coming in April that I just finished recording an interview for, and I'll plug that when it gets released. It was a lot of fun, and it definitely focuses on how we're using R and Shiny in life sciences, another topic that's near and dear to my heart, so to speak. So look for an announcement of that coming up next month. Alright. Nonetheless, let's get the show on the road here. Our issue this week is curated by Sam Parmar.
And as always, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world with your pull requests and suggestions. It is definitely the age of leveraging large language models, right? And while there can be a lot of fluff in the news about this, I've recently been convinced, as I've mentioned on previous episodes of this very show, that when used in the right way, these models can really enhance your productivity in various efforts. And one of the engines that has seen a lot of rapid development in the last year just had a milestone release.
We are speaking about the ellmer package, authored by Posit, and in particular Hadley Wickham, which had its first CRAN release that just landed last week: version 0.1.1 is now on CRAN. Before that, it had been on GitHub, but there has been, as I said, a lot of rapid development, and this release definitely shows the fruits of that development. So for those that are new to ellmer, just what is this? It is fast becoming one of the de facto standards for interacting with large language models in your R session, and you get to choose from quite a few providers.
Some of the names you've definitely heard of if you've been working in this space a bit, such as OpenAI, Anthropic, and Gemini as well. But this particular release also has support for, you might say, more of the industry or cloud-based front ends to these services. I'm speaking about Azure, AWS Bedrock, Databricks, or Snowflake. That support is in this release on CRAN. So if you're in an organization, and most of the time these organizations are putting some kind of layer in front via these aforementioned cloud services, ellmer should be able to talk to those as well. This is certainly something I'm paying very close attention to, because my particular organization is leveraging AWS Bedrock, so I'm eager to try it out with that and just see how smooth it goes.
So once you've got your provider chosen, ellmer has fit-for-purpose functions to initialize what is called the chat object that's gonna govern your interaction with that LLM for that R session. They're named very intuitively, such as chat_gemini(), chat_openai(), etcetera. And like many things with web services, you do have to set up an API key to authenticate with these services, but you can do that with the classic .Renviron approach for your project. I've also recently started playing with the dotenv package and a .env file. It's all similar, right? Just an environment variable.
Again, full caveats, because I can't ignore this whenever I talk about API keys: never, ever put this file in your version control repo. Trust me, there'll be dragons if you do. Ask me later. Nonetheless, once you get that authentication squared away, it's time to actually interact with that LLM. You've got a few different ways of doing that in ellmer. You can give yourself that kind of classic chat console experience that you might get when you go to, say, the web UI portal of ChatGPT and whatnot via the live_console() function, or a browser-based version of that via live_browser().
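If you wanna follow along at home, here's a minimal sketch of that setup, assuming an OpenAI key stored in your .Renviron (OPENAI_API_KEY is the environment variable ellmer looks for by default, if I'm reading the docs right):

```r
library(ellmer)

# Assumes ~/.Renviron contains a line like:
# OPENAI_API_KEY=sk-...

# Initialize the chat object for your chosen provider
chat <- chat_openai(model = "gpt-4o-mini")

# Interactive chatting in the R console (or live_browser(chat) for a browser UI)
live_console(chat)

# Programmatic use in a script: the reply comes back as a string
answer <- chat$chat("Summarize what the ellmer package does in one sentence.")
```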
Great way to start playing with it, just making sure things are working, and maybe that's good enough for you. But I think the real meat of where ellmer comes into play is the other ways you can interact with these models, such as programmatically with the chat method itself, which you can put in your scripts and which returns the result as a string for further processing. And here is the part that has my attention the most, the part that was the eye-opener for me when I first started learning about this: what is called tool or function calling. Because right now, the LLMs, especially those off the shelf in these public services, have been trained on data that may date back two, sometimes three or four years from this point.
And there may be some things that have happened since those training dates concluded that the LLM is not gonna be able to answer very well. Combined with any time it needs more real-time access to information, you're kind of out of luck. That's where tool calling comes in. There are examples on the ellmer repo that take this idea of getting, say, the current time or the current weather forecast. The idea being that you help this chat interface you're interacting with by giving it a fit-for-purpose function, registering that with the chat object via its register_tool method.
And then the bot can basically call your function as needed to help get the information that you're requesting. For whatever reason, I never knew anything like this existed. This is not something that ellmer invented; other services are offering this. I just never really dived into it until I learned about ellmer back in its early days last year. There is also, for those that are integrating ellmer with Shiny, support for streaming and asynchronous calls via the stream method. That's gonna be perfect if you wanna put this as an embedded chat-like window or chat-like tab into your Shiny app as your users are exploring it.
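Circling back to tool calling for a second, here's a hedged sketch in the spirit of the current-time example from the ellmer repo (I'm going from memory on the 0.1.x tool() and type_string() signatures, so treat this as approximate):

```r
library(ellmer)

# A plain R function we want the model to be able to call
get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

chat <- chat_openai(model = "gpt-4o-mini")

# Register the function along with descriptions of it and its arguments,
# so the model knows when and how to call it
chat$register_tool(tool(
  get_current_time,
  "Gets the current time in the given time zone.",
  tz = type_string("A valid IANA time zone, e.g. 'Asia/Tokyo'.", required = FALSE)
))

# The model can now invoke get_current_time() behind the scenes
chat$chat("What time is it in Tokyo right now?")
```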
You kinda see some of this in action with the Shiny Assistant that Posit made, along with other bells and whistles. And there are more things coming, because ellmer, I think, is definitely gonna be one of those packages that gets rapidly iterated on as R users adopt it in their main workflows. Some of what's coming up in the pipeline, according to the blog post, is better use of caching and parallel requests, so that not only are things running with optimized performance, but the costs associated with serving these queries, often translated into tokens, if you will, for these services, are minimized. And you might have, you know, a batch of tokens that you've bought.
You don't wanna use those up right away if you can't be efficient. So that's in the works. And the other part in the works is more standardized processing via a new package for retrieval-augmented generation: basically a way to feed into your LLM interface some domain-specific data, maybe some private data that you don't want being given to that service, so that, much like with tool calling, the LLM can tap into this resource in that particular session as a way to augment its information. Again, a really interesting capability that we're gonna be keeping an eye on.
ellmer has a lot going on. I'm very impressed with the engineering that Hadley and the team have been putting into this effort. I hear big things are coming with this as the backbone of some of the newer efforts Posit is creating, so I'm very eager to learn more about that as it gets revealed. I'd imagine we'll hear more about it at posit::conf. Hashtag just saying. But I am starting to play with this a bit more and more. I think there are some great use cases. Again, not for every use case, but the fact that I can use ellmer to define the chat interface the way I see fit, I think, is gonna be the start of something really great. We've already touched on some extension packages in the community.
Some are authored by Simon Couch, which we talked about a couple weeks ago, wrapping ellmer functionality in more specific use cases. I think that's just the tip of the iceberg; I think we're gonna see rapid growth in this field. And again, as long as we're using this responsibly, we're always checking the outputs that come back to us. I'm slowly but surely starting to come around to this, and I've been one of the biggest skeptics of AI workflows. This is definitely turning my attention a bit. So I'm eager to play with the new version of ellmer, and like I said, it's on CRAN, so I highly recommend you give it a shot if you're curious about how these services could help with your R productivity.
I literally just mentioned some of the things in the works in the ellmer ecosystem with regards to the RAG approach, or, more specifically, retrieval-augmented generation, a concept I'm learning a bit more about. I've heard some mixed results on it, but I've always wondered just how you actually try this out in the absence of a really standard approach to it via, say, the ellmer ecosystem. Our next highlight shows some intriguing use cases of combining ellmer with, wait for it, DuckDB, one of my favorite new database packages, to supercharge your RAG prototyping.
This blog post for our second highlight comes from Christoph Scheuch. Again, pronunciation is hard; hope I got that close. He is an independent data science and BI consultant, and he introduces the post much like I did in the previous highlight: an LLM is only leveraging the training data its model was based upon. When you need to give it some additional information, what are the ways of doing that? Certainly, you can try to beef up, if you will, the prompt you're supplying to the LLM to give it more context and a better framing perhaps, but that may not be the only way. We just mentioned tool calling, which I think is another really innovative feature to give it some flexibility in how it gets this information.
But let's be real here, folks. A lot of the information that could help these LLMs is, I'm maybe reaching a little here, somewhat trapped in documents or other static types of information that an organization might have for a specific domain. Say, in the case of life sciences, maybe the documentation associated with a clinical trial: all the nuances of that trial, the operational aspects of it that aren't necessarily captured in the data itself. Or maybe, for a technical project, the documentation on how to use that product, such as the documentation for the Quarto publishing system. There's a wealth of documentation online for it. Could you get an LLM to tap into that information as a way to augment its responses?
So let's think about what happens when you try to get some current information without using the RAG approach. What Christoph does in the first example is leverage a simple chat interface to OpenAI via ellmer and the GPT-4o mini model, literally just asking the chatbot, "Who is Christoph Scheuch?", i.e., asking about himself. The snippet of the response that he shows in the post is revealing, because the latest knowledge update the LLM admits to having is from October 2021. Now that might be fine if Christoph hadn't done a lot since then, but I'd imagine he's been doing some stuff since then. So it's not a terrible response, but it illustrates the fact that it couldn't tap into what Christoph has been up to lately.
So how can Christoph give it a helping hand, giving it some more up-to-date descriptions of himself that can be fed into the way it responds to these types of queries? Now we're gonna get into the nuts and bolts of where RAG comes into play. A fundamental building block of RAG is translating the text, the information you're supplying, into what are called embeddings, which give it a quantitative representation, a score of sorts, that can help determine whether that particular text is similar enough to what the user is asking about.
This is basically turning that text into high-dimensional vectors, which isn't too dissimilar from the deep learning paradigm that a lot of the LLMs are based upon. But you also need a way to query this information very quickly, since it's not part of that trained model right off the bat. So he experiments with a function that will take a string of text the user supplies and feed it into a model endpoint from, in this case, OpenAI, called text-embedding-3-small, to generate back this embedding vector of numbers. And they won't look like anything to us; they'll look like numbers you might get from, like, a call to rnorm() for a standard normal distribution.
It's a sequence of decimal-type numbers, but together they quantify the content of that text. So if you do this for a lot of text, you can get similarity scores between these embeddings that you can then use as part of the searching the chatbot will do when it's forming the response. Now, you might think, well, I could just store that in a data frame and be done with it. But imagine the text gets larger and larger, much more than just a little embedding of who Christoph Scheuch is. What if it's a full set of documents?
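I don't have Christoph's exact code in front of me, but a minimal sketch of such an embedding helper, hitting the OpenAI embeddings endpoint directly with httr2, might look like this (the function name embed_text() is my own placeholder):

```r
library(httr2)

# Hypothetical helper: turn a string into an embedding vector using
# OpenAI's text-embedding-3-small model (which returns 1,536 numbers)
embed_text <- function(text) {
  resp <- request("https://api.openai.com/v1/embeddings") |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_body_json(list(model = "text-embedding-3-small", input = text)) |>
    req_perform() |>
    resp_body_json()

  # The embedding comes back as a JSON array; flatten it to a numeric vector
  unlist(resp$data[[1]]$embedding)
}
```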
You need a way to query this really quickly, and that is where DuckDB comes in. DuckDB is already making a lot of waves in terms of the high performance it offers users leveraging its database functionality, and it's not just how it's optimized right off the bat for these queries. DuckDB, much like the R ecosystem, has an extension system where people in the community can build upon the foundation of DuckDB for specialized cases. One of those is a vector-search extension called VSS that, once you've fed this information into DuckDB, lets you search on these similarity scores very efficiently. That is great for these kinds of prototyping cases.
But to be honest, I think DuckDB also extends quite nicely to production too. So in the next part of the post, Christoph walks us through establishing a DuckDB database and then installing the extension itself. You do this via queries; admittedly, that seemed unintuitive at first, but I've done it a few times, so now it makes sense. You install the VSS extension and then load it in your session. From there, he establishes a table that's gonna have two columns in it: one for the raw text, which is the typical TEXT format of a SQL column, and then the next one, called embedding, which is a FLOAT definition.
And right off the bat, he gives it a fixed array size, which comes from looking at the dimensionality this API call returns. In this case, the OpenAI model, this text-embedding-3-small, returns a vector of length 1,536, so that's what he defines as the length of the array in this database column. With that in place, now it's time to go back to that original question and feed in a set of documents. Imagine these are coming from real documents, but he uses a simple set of character vectors here with different types of responses, one of which is a more accurate representation, in his own words, of who he is.
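Before moving on, here's roughly what that setup looks like with the duckdb R package, going off the DuckDB VSS documentation (the table layout mirrors what's described in the post):

```r
library(duckdb)
library(DBI)

# A persistent database file so the data (and later the index) stick around
con <- dbConnect(duckdb::duckdb(), dbdir = "rag.duckdb")

# Install and load the vector similarity search extension via plain SQL
dbExecute(con, "INSTALL vss;")
dbExecute(con, "LOAD vss;")

# Raw text plus a fixed-size float array matching the embedding dimensionality
dbExecute(con, "CREATE TABLE documents (text TEXT, embedding FLOAT[1536]);")
```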
Once that text is defined, he defines a function to convert it into the embedding numbers, again using that OpenAI model for text embeddings. And then, because of some limitations in duckdb, the R binding to DuckDB, he writes a custom query that's dynamically generated based on the text and the embedding results, and feeds that into the table he defined earlier. But he doesn't stop there. He adds two other examples: one is a slightly less optimal description of himself, and the last one is a whole bunch of nonsense about him, where he becomes this renowned intergalactic cartographer.
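A sketch of that insertion step, reusing the hypothetical embed_text() helper from above (building the array literal as a SQL string is my stand-in for the workaround he describes):

```r
# Hypothetical insert helper: the R binding can't pass a numeric vector
# straight into a FLOAT[1536] column, so splice the array literal into the SQL
insert_document <- function(con, text) {
  embedding <- embed_text(text)
  emb_sql <- paste0("[", paste(embedding, collapse = ", "), "]::FLOAT[1536]")
  dbExecute(
    con,
    paste0("INSERT INTO documents VALUES (?, ", emb_sql, ")"),
    params = list(text)
  )
}

insert_document(con, "Christoph Scheuch is an independent data science and BI consultant.")
```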
Sounds kind of fun, right? But that's definitely not him. So it's a good way to have a control case compared to the two that are more accurate to who he is. All of those are fed into the database table. And then the other key point he mentions here: when these algorithms are gonna do very rigorous searching over these fields, it's always optimal for a database to have an index, which can be used to help speed up these queries. You know, an index being like a sequence of numbers, right? You can flag a particular column in a database to serve as that index.
He also notes that if you wanna keep this highly optimized DuckDB database persistent, this index needs to be persistent as well, and he has the code, via SQL queries, for how to actually do this. I don't have the expertise to translate everything he's doing under the hood, but the code is in the blog post, where he's flagging the type of indexing called HNSW, hierarchical navigable small world, which apparently uses an optimized nearest-neighbor searching mechanism that, again, is fit for purpose for these embedding types of columns.
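Going off the DuckDB VSS docs, that indexing step looks roughly like this (persistence of HNSW indexes is flagged as experimental there, hence the opt-in setting):

```r
# HNSW index persistence is experimental in the VSS extension, so opt in first
dbExecute(con, "SET hnsw_enable_experimental_persistence = true;")

# Build a hierarchical navigable small world index on the embedding column
dbExecute(con, "CREATE INDEX documents_hnsw ON documents USING HNSW (embedding);")
```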
With all that in place, and that's a lot of setup, right, now you can actually start to use this. So he tests it out with, again, the original question, "Who is Christoph Scheuch?" He makes a function that will quickly translate that into an embedding and then query DuckDB to figure out how similar this embedded version of the string is to what's already in the database, via a set of parameters he defines here, such as the minimum amount of similarity and the number of results or documents to retrieve that will help with the answer.
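Here's a hedged sketch of that retrieval step, leaning on DuckDB's array_cosine_similarity() function (the helper name and parameter names are mine):

```r
# Hypothetical retrieval helper: return the top_k stored texts most similar
# to the question, keeping only matches above a minimum similarity
retrieve_docs <- function(con, question, top_k = 1, min_similarity = 0.5) {
  emb_sql <- paste0(
    "[", paste(embed_text(question), collapse = ", "), "]::FLOAT[1536]"
  )
  dbGetQuery(con, paste0(
    "SELECT text,
            array_cosine_similarity(embedding, ", emb_sql, ") AS similarity
     FROM documents
     WHERE array_cosine_similarity(embedding, ", emb_sql, ") >= ", min_similarity, "
     ORDER BY similarity DESC
     LIMIT ", top_k
  ))
}
```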
He retrieves one document here, but you could have more than one if you have a larger volume of input and wanna query it with more rigorous information available. So with all of that, he's got a function in place, and then he shows how that can be used in the chatbot constructed with ellmer, but with an augmented version of the string that embeds, you might say, via a prompt-like technique, the relevant info he gets from the prework of querying the database. So it's a way to do the RAG approach in your local session: grab the information from DuckDB, and then augment your existing question with that additional input.
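Putting it all together, the final augmented call might look something like this sketch:

```r
library(ellmer)

question <- "Who is Christoph Scheuch?"
context <- retrieve_docs(con, question, top_k = 1)

# Prepend the retrieved context to the original question, prompt-style
augmented <- paste0(
  "Use the following context to answer the question.\n\n",
  "Context:\n", paste(context$text, collapse = "\n"), "\n\n",
  "Question: ", question
)

chat <- chat_openai(model = "gpt-4o-mini")
chat$chat(augmented)
```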
And at the end of the post, he shows exactly how that plays out, and it is definitely a more up-to-date description of who he is compared to what we had in the original part of the blog post. Really, really fascinating. I've always wondered just what techniques RAG involves, because I've only heard it spoken about at a high level; I never really dived into it. This really gets my ideas flowing: as long as I can get this information into a text format, whether it comes from structured text or whatnot, and then do some work with DuckDB to set up this database with a table optimized for this information, I could see prototyping this with, like I said, maybe smaller documents that give more context to the type of question I'm gonna answer, in hopes that it can leverage this information effectively.
Wow, I've got a lot of ideas for this. So I'm really thankful for Christoph's post here. And again, this is all available off the shelf with the recent version of ellmer and the duckdb R package. You can get all this set up in your R session without even having to leave it. Really, really fascinating. I am definitely gonna play with this a bit. I have a project that I just started at the day job where I'm leveraging some techniques with ellmer, and I'd been thinking about how to supplement my prompt with more fit-for-purpose information that's more current. This seems like an approach to go with, so I'm excited to see where it goes. And yeah, thank you, Christoph, for writing this great highlight, and I'll be checking in with you again in the future to see how much progress you make here.
Wrapping up our highlights today, we've got a fun one to summarize here, because I spoke very enthusiastically about a newer development I learned about at the recent posit::conf back last year, with respect to giving you a more dynamic way of expressing a data-driven story or some data insights, in a way that beats the typical static type of document or even some of the more standard HTML layouts. And that, of course, is the new Closeread Quarto extension. For those who aren't familiar, Closeread is an extension that gives you a really nice and polished scrolling HTML experience, with dynamic visuals that update as the user scrolls.
You can zoom in and out on things; you can style it very effectively. I've seen this technique used in practice by, say, FiveThirtyEight or the New York Times in the data-type blogs that they put out there. They look fascinating, but they take a lot of engineering to build, and I think Closeread is gonna give all of us data scientists that like to use Quarto a very minimal-friction approach to creating these resources. And one of the ways to spur the community on to, you know, test their might, if you will, with Closeread: Posit ran the Closeread contest for users to give their submissions on how they're using Closeread to talk about an interesting topic, maybe serve up some interesting visualizations, and leverage whatever additional tools could be used along with Closeread to make a real engaging experience.
And so this comes to us from the Posit blog once again: the results of the Closeread contest. There was an esteemed group of judges that included the Closeread authors, Andrew Bray and James Goldie, along with Curtis Kephart, data scientist and community manager at Posit, and Joshua Bird and Suzanne Lyons, contributing from their experience as data and science journalists. And with that in place, let's talk about the winners here, because there are some really fascinating things to look at. The grand prize winner was the Euro 2024 final scrollytelling analysis.
Oh, if Mike were here, I'm sure he'd be geeking out about this. I know he's a huge football fan, worldwide football I should say. And as a sports nut myself, maybe for hockey more than other sports, I could definitely see where I could use this Closeread application. So this was talking specifically about the result of the tournament, the game between Spain and England in the final. As you scroll through, after some setup at the top, you get a nice visual of the soccer field and the different positions and numbers of the players involved, and when you hover over that, you can see their names as you go along. So you see the lineup as the game started.
And then as you scroll, it surfaces, say, the formations, narrowing through the different players at each part of the formation: the defense, the forwards, the midfielders, etcetera. And it does this for each team. Again, as you scroll, it's changing the visual on the fly and changing the description throughout. Then it gets really interesting when it talks about the game itself: what happened, say, midway through the game when a goal was scored. It's actually zooming in on the plot at the spots where the ball was kicked, talking about how the play started and who the defenders around it were. Oh my goodness, I'm just scrolling it right now as I speak. This is just fascinating. It's even highlighting the regions near the net where the pass was targeted and how the receiving player kicked it into the net.
Oh goodness. And then there are even more insights, where it augments the field display with some line charts of the total expected goal percentage minute by minute, as it changes throughout the duration of the game. We see this a lot on ESPN and other sports sites as they look at trends in the game, like the probabilities of winning and things like that. Oh goodness, this is just magical to me. I'm blown away by it, if you couldn't tell. So that is a well-deserved grand prize winner, I must say. And it was authored by Oscar Bartolome Pato; I hope I said his name right.
Fantastic. Oh wow, what a deserving entry, and it uses ggplot2 and ggiraph. Just imagine the things you can do with those packages. Just amazing stuff. Of course, there are a lot more entries, and some additional prizes were awarded. These include a really thoughtful demonstration of an engaging story around housing and neighborhood income inequality in Vienna, authored by Matthias Schnetzer. Again, you can find these in the show notes and play with them via the linked resource. We've got a great example of an engaging story on that front.
There were also great technical achievements, such as "Which Way Do You Ski?", which has really impressive visuals, like mapping and other dynamic tables being surfaced as the user scrolls. Oh goodness, there's just so much happening in this space. Again, the power of interactive elements, folks: this is why it's so hard for me to go back to static formats when I document the results of an analysis. There's just so much you can do with this ecosystem; it's just fascinating to see. I definitely don't have time to go through all the additional winners, but I definitely saw things that could be used in the educational space with mathematics, and some great honorable mentions as well.
As I said maybe a few weeks ago, I could see trying to embed some Shiny components into these and seeing what happens, especially with WebAssembly. That itch to try it just got a lot bigger, to say the least, looking at these winners. I'm blown away. Absolutely blown away. Let's put this in perspective, folks: Closeread has only been in existence for maybe half a year at most, or somewhere around there, and look at what people are doing with it already. It goes to show you, and admittedly there are some parallels, in my opinion, to Shiny itself.
Little did Posit know, at the time Shiny was put out there, just how quickly it would be embraced by the R community. And lo and behold, we started to see these really novel uses of Shiny that would blow everybody away. Closeread seems to be following a similar trend here. I definitely have ideas for this, and now you've got available to you over 40 of these submissions, if my math is correct, which you can look at on the Posit Community blog as well as the GitHub discussions that talk about extensions and use cases of Closeread.
This is terrific. Absolutely terrific. I invite you to check out these submissions. No way does an audio podcast do this justice, but you can see them all, like I said, on the Posit forum and see what inspires you. As a sports fan and a fan of engaging storytelling, yeah, I've got me some ideas. We'll see what happens. And there's a lot more happening in the R community when you look at the rest of the R Weekly issue. As always, our curator, in this case Sam, did a tremendous job with this issue. And since we were off last week, my additional finds actually go back to the last issue that we would have talked about if we hadn't been off, an issue that Colin Fay curated. One of the highlights there is in an area I've played with, admittedly with very mixed results, but it looks like someone's trying it again.
We are talking about ways to distribute a Shiny app in a more self-contained manner. In particular, what I tried a few years ago, when I had a project where I wanted to share a Shiny app that could be installed on somebody's laptop for a training session, was wrapping a Shiny app in the Electron framework. If you're not familiar with Electron, you can think of it as a way to wrap what amounts to an embeddable Chrome browser around JavaScript and a web-based application. So if you use things like Slack, Discord, or countless others, they may look like native apps, but they're not really. They're cross-platform because Electron is compiling them into that operating system's preferred format, such as executables for Windows, DMGs for, I should say, macOS, and then other formats for Linux. It runs the gamut.
Nonetheless, it looks like there are some attempts to make this a bit more robust for the Shiny side of things once again. In particular, Jinhwan Kim has a blog post about a minimal framework called nhyris, which is an anagram of shiny, to transform a Shiny application into a standalone application. There is a lot to this. He does link to a video that demonstrates it in action, but it is indeed based on Electron. He's tried to make it easy, so you can kind of bootstrap all of this once you clone the repository he has set up for you.
It does some shell scripting that taps into Node.js, which you have to have installed since it is all JavaScript under the hood, plus another shell script to package the project, which helps you include the R installation you're using on your system, the R packages, the Electron bits, and the Node.js packages. And out of that, you get a Shiny application you can run as a standalone executable. I don't believe there's support for all the operating systems yet, from what I can read, so it looks like there is more to come on this. I'm impressed with how far he's gotten.
I will say, though, I think just enough bad things can happen when you've got something that's not so trivial. I feel like the wave of the future is still the WebAssembly route. Maybe I'm wrong, but I think WebAssembly has a lot of momentum behind it. I think we're gonna get much closer to that vision of: I compile a Shiny app to WebAssembly, I give one file to somebody, they just execute it in their web browser, and they're done, because everybody's got a web browser, right? Shiny is, of course, a browser-based type of app; as long as you can execute it there, you're good to go. It's not quite there yet, because you still have to run some kind of web server process, whether it's in R or Python or something else, to host it. That's a part I'm still struggling with, but we're getting much closer, folks. Support for WebAssembly is just getting better and better every week.
If Electron can be used in many different types of applications, I could see this as a good alternative. I've just been jaded by past experience, so maybe Jinhwan has found a nut to crack here. I will keep watching this space; I'm still a bit more biased towards WebAssembly. Nonetheless, if you are gonna play with Electron, and you were struggling with some of the previous presentations you saw at, let's say, a previous Shiny conference or a previous RStudio conference, where I've seen this mentioned once or twice before, you may give this a play. See what happens; try a toy project, and hope for the best.
Perhaps this could even be wrapped with Nix bootstrapping. Who knows? I'm just spitballing here. Okay, we're gonna wrap up this episode of R Weekly Highlights. But before I go, I wanna make sure I tell you all how to get in touch with us. We are a community project through and through: there is no corporate sponsor of R Weekly, and we don't have donations coming in. It is all through the hours put in by us on the R Weekly team, and hopefully from you in the community helping us out with curating these issues. One of the best ways to help is to send your suggestions for an upcoming issue: a great blog post, a great new package, a great tutorial, maybe an upcoming event. We wanna hear about it. You can send that to us by going to rweekly.org.
There is a link in the upper right corner, a little banner, for you to fill out a pull request, which will have a template for you to fill out quickly. And once you submit that, our curator for the upcoming issue will be able to merge it in. All Markdown all the time, folks. That scrollytelling Closeread extension? All Markdown, combined with R code. You can do it. And we also love to hear from you directly. We have a feedback portal that you can find linked in the episode show notes; go ahead and click on that and send us a little message along the way. And if you're on a modern podcast app, like Podverse, Fountain, Castamatic, or CurioCaster, you can send us a fun little boost along the way.
I would say Fountain, in particular, makes this the easiest. So if you're interested in playing with that, I would grab Fountain, get it on your system, and it'll walk you right through all the steps you need to get a wallet established and send us a fun little boost. Lots of fun to be had there; trust me on that. And you can also get in touch with me on social media these days. I am on Bluesky at @rpodcast.bsky.social, also on Mastodon at @rpodcast@podcastindex.social, as well as LinkedIn: just search my name and you'll find me there.
Alright, we're gonna put a bow on episode 197. Again, as I may have mentioned a couple weeks ago, if you have a favorite memory, a favorite segment, or just a favorite part of R Weekly, we'd love to hear about it, and we'd be glad to read it on episode 200. Share your feedback. I have no idea what else I'll do for that episode, but one way or another, we'll make it fun. Alright, I'm gonna close up shop here for episode 197. Thank you so much for listening, and we'll see you back for episode 198 of R Weekly Highlights next week.
Hello, friends. We are back with episode 97 of the Our Wicked Highlights podcast. My name is Eric Nance, and thank you for joining us. And, unfortunately, we were off last week, but, at least I am back here this week. This is normally the point where I introduce my awesome co host, Mike Thomas, but, he and his family have been, the victim of a diabolical flu ongoing, which is affecting a lot of people, I would say. So he's off this week, but we wish him all the best, to get better soon. Nonetheless, I'll pilot the ship today if he will. And before we get to the me of the episode, I wanna give a couple, little plugs here.
I had the good fortune of joining the Coder radio program, episode 608. It has been recently revamped by the host, Michael Dominic. He's been really fun to talk to. It's ironically the third time I've been on one of his, podcasting adventures, but this time it was on the granddaddy of them all, so to speak, with the show that he's been involved with for many years. So we had a lot of fun talking about all things are and where it fits in in the world of software development, how it compares to other languages that we often hear about, and he seemed interested in getting his, data science action on down the road. So still use a lot of Python, so we'll have to work on that.
I I kid. I kid. But, no, it was a great episode. We'll have a link to that recording in the episode show notes if you wanna listen into that. Also, speaking of podcast, there will be a newer podcast that with an episode in April that I just finished recording an interview with, and I'll plug that when it gets released. But it was a lot of fun, and it definitely focuses in in how we're using our shiny life sciences, another topic that it's near to dear to my heart, so to speak. So look for an announcement of that, coming up next month. Alright. Well, nonetheless, let's get the show on the road here, and our issue this week is curated by Sam Parmer.
And as always, he had tremendous help from our fellow Rwicky team members and contributors like all of you around the world with your poll requests and suggestions. It is definitely the age of leveraging large language models. Right? And while there can be some, you know, a lot of fluff in the news about this, but I've recently been convinced, as I've mentioned on previous episodes of this very show, about when used in the right way, can really, enhance your productivity in various efforts. And one of the engines that has been, you know, with a lot of rapid development in the last year just had a milestone release.
We are speaking about the Elmer package, author by Posit, and in particular, Hadley Wickham, which had its first CRAN release that just landed last week, version o .1.one. It is now on CRAN. Before, it's been on GitHub, but there has been, as I said, a lot of rapid development, and this release definitely shows the fruits of that said development. So for those that are new to Elmer, just what is this? This is now becoming one of the de facto standards for interacting with large language models in your r session, and you get to choose from quite a few providers on this.
Some of the names you've definitely heard of if you've been working in this space a bit, such as OpenAI, Anthropic, Gemini as well. But this particular release has also support for, you might say, more of the industry or cloud based front ends to these services. I'm speaking about Azure, AWS Bedrock, Databricks, or Snowflake. That support is in this this release on CRAN. So if you're in an organization, and most of the time these organizations are putting some kind of layer in front via these aforementioned cloud services, Elmer should be able to talk to those as well. This is certainly something I'm paying very close attention to because my particular organization is leveraging AWS Bedrock, so I'm eager to try it out with that and just see how smooth it goes.
So once you've got your provider chosen, it has fit for purpose functions, Elmer does, to initialize the what is called the chat object that's gonna govern kind of your interaction with that LOM for that r session. They're named very intuitively, such as chat underscore Gemini, chat underscore OpenAI, etcetera, etcetera. And like many things with web services, you do have to set up an API key to authenticate with these services, but you can do that with the classical dot r environment approach for your project. I've also recently started playing with, with the dot env package with a dot env file. It's all similar. Right? Just an environment variable.
Again, full caveats because I can't say I can't ignore this whenever I tell about API keys. Never ever put this file in your version control repo. Trust me. There'll be dragons if you do. Ask me later. Nonetheless, once you get that, authentication squared away, it's time to actually interact with that that LOM. You've got a few different ways of doing that in Elmer. You can give yourself that kind of more classical chat console experience that you might get when you go to, say, the web UI portal of chat g p t and whatnot via the live underscore console function or having a browser based version of that, a live underscore browser.
Great way to start playing with it, just making sure things are working, and maybe that's good enough for you. I think the real meat of where Elmer comes into play comes in the other ways. You can interact with these models, such as a programmatic way with the chat function itself that you can put in your scripts. You can also bring a pro programmatic way of interacting with a chat by returning the result of that as a string for further processing. Here is the part that has my attention the most, the part that was the eye opener for me when I first started learning about this, and this is called the tool or function calling. Because right now, the LOM, especially those off the shelf and these public services, have been trained on data that may date back a couple sometimes three or four years from now or from this point.
And there may be some things that have happened since those training dates have been concluded that the LM is not gonna be able to answer very well. Combined with any time it needs to get more real time access to information, you're kind of out of luck. That's where tool calling comes in. There are examples on the Elmer repo that take the this idea of getting, say, the current time or the current weather forecast. The idea being is that you help this chat interface that you're interacting with by giving it a fit for purpose function, registering that with the the chat object where you register on a score tool.
And then the bot can basically call your function as needed to help get the information that you're requesting. I for every reason, I never knew anything like this existed. This is not something that Elmer invented. Other, you know, services are offering this. I just never really dived into it until I've learned about Elmer back in the early days last year. There is also for those that are interacting with Shiny, putting Elmer in those capabilities, the idea of having streaming and asynchronous calls via the stream function, that's gonna be perfect if you wanna put this as an embedded chat like window or chat like tab into your Shiny app as they're exploring this.
You kinda see some of this in action with a Shiny assistant that Pause it made along with other, you know, bells and whistles. There are things to happen, you know, coming in because Elmer, I think, is definitely gonna be one of those that gets rapidly iterated on as our users are adopting this in their main workflows. Some of the what's coming up in the pipeline according to the blog post is that there will be better use of caching and parallel requests so that not only are they running things in in optimized performance, but making sure that they're minimizing the cost associated with surfacing these queries, which is translated into sometimes tokens, if you will, for these services. And you might have, you know, a batch of tokens that you've bought.
You don't wanna use those up right away if you can't be efficient. So that's in the works. And then the other part in the works is working on, more, standardized processing via a new package for retrieval augmented generation, basically a way to feed into your l o m, you know, interface, some either domain specific data, maybe some private data that you don't want being given to that service, but it can much like the tool calling, but the LOM tap into this resource in that particular session as a way to augment its information. Again, a really, you know, interesting capability that we're gonna be keeping an eye on.
Elmer has a lot going on. I I'm I'm very impressed with the engineering that Hadley and and the team have been putting into this effort. I hear big things are coming with this as the backbone and some of the newer efforts that Posit is creating. So I'm very eager to learn more about that as that gets revealed. I'd imagine we'll hear more about that deposit comp hashtag just saying. But I am starting to play with this a bit more and more. I think there are some great use cases. Again, not for every use case, but the fact that I can use Elmer to, you know, define the chat interface the way I see fit, I think, is gonna be the start of something really great. We've already touched on some extension packages in the community.
Some are authored by Simon Couch that we talked about a couple weeks ago that are wrapping Elmer functionality and more, you know, specific use cases. I think that's just a tip of the iceberg. I think we're gonna see a rapid growth in this field. And, again, as long as we're using this responsibly, we're always checking the outputs that are coming back to us. I'm slowly but surely starting to come around to this, and I've always been one of the biggest skeptics of AI workflows. This this is definitely turning my attention a bit. So I'm eager to play with the new version of Elmer, and like I said, it's on Kran, so highly recommend you give it a shot if you're curious about how these services work that could help with your art productivity.
Literally just mentioned some of the things in the works in the Elmer ecosystem in regards to the rag approach or, to more specifically, the retrieval augmented generation, a concept I'm learning a bit more about. I've heard, you know, some mixed results on it, but I've always just wondered just how do you actually try this out in lieu of not having a a really, you know, standard approach to it, via, say, the Elmer ecosystem. Our next highlight is showing some intriguing use cases that you could do when you combine Elmer with, wait for it, DuckDV, one of my favorite new database packages, to to supercharge your rag prototyping.
And this blog post for our second highlight comes from Christophe Schurk. Again, pronunciation is hard. Hope I got that close. He is an independent data science consultant and BI consultant, and he introduces this post much like how I just mentioned in the previous highlight that when you have these LMS, it's only leveraging the training data that it was based upon for its model. When you need to give it some additional information, what are the ways of doing that? Certainly, you can try to beef up, if you will, the prompt that you're supplying to the LOM to have more context around it to give it a better framing perhaps, but that may not be the only way. We just mentioned the tool calling, I think, is another really innovative feature to give it some flexibility in how it gets this information.
But let's be real here, folks. A lot of the information that can help these LMS is, I'm gonna maybe be a little reaching here, somewhat trapped into these documents or other static type of information that an organization might have for a specific domain, such as saying the case of life sciences, maybe the documentation associated with a clinical trial, all the nuances of that trial, the operational aspects of it that aren't necessarily captured in the data itself. Maybe for a technical project, the documentation on how to use that product, such as, say, leveraging the quarto, you know, the quarto documentation to publishing system, There's a wealth of documentation online for it. Could you get an L one to tap into that information as a way to augment its responses?
So let's think about what happens when you try to get some current information without using the rag approach. And so what Christophe does in the first example is he leverages, a simple chat interface of OpenAI via Elmer and the GPT four o mini model of literally just asking the chatbot, who is Christophe Schuch, I e, asking about himself. It's revealing in the snippet of response that he shows in the post because the latest update that the l o m admits to having is from October 2021. Now that may be fine, you know, if if Christophe hasn't done a lot since then, but I'd imagine he's been doing some stuff since then. So, you know, it's not a terrible response, but it illustrates the fact that it couldn't tap into what Kristoff has been up to lately.
So how can Kristoff give it a helping hand, given us some more up to date descriptions of himself that could be fed into the ways that it responds to these type of queries. Now we're gonna get into the nuts and bolts of where RAG comes into play. But a fundamental block of RAG is having, you know, translating, say, this text, this information that you're supplying, and translate that into what are called embeddings that give it a more quantitative, you can say maybe say score that can help it determine if that particular text that's been quantified here is similar enough to what the user is asking about.
This is basically turning that text into some high dimensional vectors. This isn't too, you know, dissimilar from the deep learning paradigm that a lot of the LMS are based upon. And then being able to translate that, but also you need to be able to weigh to query this information very quickly if this is not part of that trained model right off the bat. So he experiments with a function that will take, and the model endpoint from, in this case, OpenAI called the text embedding three small model that will simply grab a string of text that the user supplies, feed it into this particular model to generate back this embedding vector of numbers, and they won't look like anything to us. They'll look like numbers that you might get from, like, a a call to our norm or something for a standard normal distribution.
But it's a sequence of of any of decimal type of numbers, but they are quantifying the similarity, that you get in certain pieces of that text. So if you do this for a lot of text, you'll get these different similarity scores of these embeddings that you can then use as part of the searching that the chatbot will do when it's feeding in the response. So you might think, well, I could just store that in a data frame and be done with it. Well, imagine the text gets larger and larger, much more than just a little embedding of who is Christophe Schuch. What if it's a full set of documents?
You need a way to query this really quickly, and that is where DuckDV comes in. Already making a lot of waves in terms of how much high performance DuckDV is offering, users that are leveraging its database functionality. It's not just how it's optimized right off the bat with these queries itself. DuckDB, much like the our ecosystem, has an extension system where people in the community can build upon the foundation of DuckDB in these specialized cases. One of those is this vector database type of extension called VSS to let the let but when feeding in this information in DuckDV to be able to search these similarity scores very efficiently, That is great for these kind of prototyping cases.
But to be honest, DuckDV, I think, is also extending quite nice of production too. So the next part of the post, Christophe walks us through establishing a DuckDV database and then installing the extension itself. You do this via queries. Admittedly, that seemed not intuitive, but I've done this a few times. So now it makes sense. You install the VSS extension, and then you load that in your session. From there, he does he establishes a table that's gonna have two columns in it. One of the raw text, which is the typical text format of a of a SQL column, but then this next one is called embedding, which is a float definition.
And he's already right off the bat giving it a fixed array size, which is coming back from what he looked at in terms of this API call and looking at the result the optimal result that it gets back for this dimensionality that, in this case, the OpenAI model is coming back, this text embedding three small. It returns a vector of length one thousand five thirty six. So that's what he's gonna define for this, length of the array in this database column. So with that in place, now it's time to okay. Let's start with that original question and feed into it a set of documents that will have imagine these are coming from, like, real documents, but he does a simple, set of vectors here of the different types of responses. One of which is a more accurate representation from his words on who he is.
And then once that text is defined, he defines a function to convert that text into the embedding numbers, again, using that OpenAI model for text embeddings. And then because of some limitations in duck duck DB, the r binding to duck DB, he does a custom query that's dynamically generated based on the text and the embedding results and results and feeds that into the table that he defined earlier. But he doesn't stop there. He does two other examples, one of which is a slightly less optimal description of himself. And then the last one is a whole bunch of nonsense about him where he becomes this renowned intergalactic cartographer.
Sounds kind of fun. Right? But that's definitely not him. So it's a good way to have a control, case compared to the two that are more accurate to who he is. So all those are fed into the database table. And then the other key point that he mentions here is that when you have these algorithms are gonna do, you know, very rigorous searching of these fields, it's always optimal for our database to have an index, which can be used to help speed up these queries. You know, index being like a sequence of numbers. Right? But you can flag a particular column in a database to be that index.
And then he notes that if you wanna keep this persistent, this high optimized DuckDB, this index needs to be persistent as well. So he has the code via the SQL queries on how to actually do this. And I don't have the expertise to translate what he's doing under the hood, but the code is in the blog post where he's flagging in, the type of indexing called hierarchical navigable small world. I I apparently uses, an optimized nearest neighbor searching mechanism that, again, will be fit for purpose for these embedding type of columns.
With all that in place, that's a lot of setup, right? Now you can actually start to use this. So he tests this out with, again, the original question, who is Christophe Schruuck? He makes a function that's gonna quickly translate that into an embedding, and then the query to DuckDB figures out the similarity of the embedded version of the string, how close it is to what's already in the database, via a set of parameters that he defines here, such as the minimum amount of similarity and the number of results or documents to retrieve that will help in that answer.
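A retrieval helper along those lines might look like the sketch below. The function and parameter names are mine, and I'm assuming the VSS extension's array_cosine_similarity function for the similarity score:

```r
# Hypothetical retrieval helper: embed the question, then rank stored
# documents by cosine similarity, keeping at most `k` above a threshold
retrieve <- function(con, question, k = 1, min_similarity = 0.5) {
  emb <- get_embedding(question)
  vec <- paste0("[", paste(emb, collapse = ", "), "]")
  dbGetQuery(con, sprintf("
    SELECT text,
           array_cosine_similarity(embedding, %s::FLOAT[1536]) AS similarity
    FROM documents
    WHERE similarity >= %f
    ORDER BY similarity DESC
    LIMIT %d;
  ", vec, min_similarity, k))
}
```

The threshold keeps clearly unrelated documents, like the intergalactic cartographer one, from sneaking into the context.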
He retrieves one document here, but you could have more than one if you have a larger volume of input and you wanna query with more rigorous information available. So with all of that, he's got a function in place, and then he shows just how that can be used in the chatbot that was constructed with ellmer, but with an augmented version of the string that kind of embeds, you might say, via a prompt-like technique, the relevant info he gets from this prework of querying the database. So it's a way to do the RAG approach: have it in your local session, grab the information from DuckDB, and then augment your existing question with that additional input.
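Tying it together with ellmer could look something like this. The prompt wording here is my own; chat_openai() and the $chat() method are the ellmer calls from the package announcement:

```r
library(ellmer)

question <- "Who is Christophe?"
context  <- retrieve(con, question, k = 1)

# Prepend the retrieved context to the question, prompt-style
augmented <- paste(
  "Answer using only the context below.",
  "Context:", paste(context$text, collapse = "\n"),
  "Question:", question,
  sep = "\n"
)

chat <- chat_openai()
chat$chat(augmented)
```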
And then at the end of the post, he shows exactly how that's illustrated, and it is definitely a more up-to-date description of who he is compared to what we had in the original part of the blog post. Really, really fascinating. I've always wondered just what the techniques are that RAG involves, because I've only heard it spoken about at a high level, never really dived into it. This really gets my ideas flowing: as long as I can get this information in a text format, whether it comes from structured text or whatnot, and then do some work with DuckDB to set up this database with a table optimized for this information, I could see prototyping this with, like I said, maybe smaller documents that give more context to the type of question I'm gonna answer, in hopes that it can leverage this information effectively.
Wow. I've got a lot of ideas for this. So I'm really thankful for Christophe's post here. And, again, this is all available off the shelf with the recent version of ellmer and with the duckdb R package. You can get all this set up in your R session without even having to leave it. Really, really fascinating. I am definitely gonna play with this a bit. I have a project that I just started at the day job where I'm leveraging some techniques with ellmer, and I've thought about how I supplement my prompt with more, you know, fit-for-purpose information that's more current; this seems like an approach to go with. So excited to see where this goes. And, yeah, thank you, Christophe, for writing this great highlight, and I'll be checking in with you again in the future to see how much progress you make here.
Wrapping up our highlights today, we got a fun one to summarize here, because I was speaking very enthusiastically about a newer development I learned from the recent posit::conf back last year with respect to giving a more dynamic way of expressing a data-driven story or some data insights, in a way that beats the typical static type of document or even some of the more standard HTML layouts. And that, of course, is using this new Closeread Quarto extension. For those who aren't familiar, Closeread is an extension that gives you a really nice and polished way to do scrolling HTML and have, like, dynamic visuals update as the user scrolls.
You can zoom in and out on things. You can really style it very effectively. And I've seen this technique used in practice by, say, FiveThirtyEight or the New York Times on the data type of blogs that they'll put out there. They look fascinating. They take a lot of engineering to build, but I think Closeread is gonna give all of us, you know, data scientists that like to use Quarto, a very minimal-friction approach to creating these resources for everybody. And one of the ways to kind of spur the community on to, you know, test their might, if you will, of using Closeread is that Posit ran the Closeread contest for users to give their submissions on how they're using Closeread to talk about an interesting topic, maybe serve up some interesting visualizations, and leverage whatever additional tools could be used along with Closeread to make that a real engaging experience.
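If you haven't seen the authoring side of it, a minimal Closeread document looks roughly like this; this is a sketch from memory of the extension's conventions, and the cr-plot id and image file are my own placeholders:

```markdown
---
format: closeread-html
---

:::{.cr-section}

The narrative scrolls here; this line brings the sticky plot into focus. @cr-plot

And this line zooms in on part of it as the reader scrolls. [@cr-plot]{scale-by="2"}

:::{#cr-plot}
![](plot.png)
:::

:::
```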
And so this comes to us from the Posit blog once again, with the results of the Closeread contest. There was an esteemed group of judges that included the Closeread authors, Andrew Bray and James Goldie, along with Curtis Kephart, you know, data scientist and community manager at Posit, along with Joshua Bird and Suzanne Lyons, who contributed their experience as data and science journalists. And so with that in place, let's talk about the winners here, because there are some really fascinating things to look at. The grand prize winner was the Euro 2024 final scrollytelling analysis.
Oh, if Mike was here, I'm sure he'd be geeking out about this. I know he's a huge football fan, the worldwide football, I should say, and with this Closeread application, as a sports nut myself, maybe for hockey more than others, I could definitely see where I could use this. So this was talking specifically about the result of the tournament, the game between Spain and England in the final. And as you scroll through this, after some setup at the top, you get a nice visual of the soccer field and the different positions and the numbers of the players involved. And when you hover over that, you can see their names as you go along. So you see the lineup as the game started.
And then as you scroll, it surfaces, say, the formations, narrowing through the different players at each part of the formation. So this is the defense, the midfielders, the forwards, etcetera. And it does this for each team. Again, as you scroll the mouse, it's changing the visual on the fly, changing the description throughout, and then it gets really interesting when he talks about the game itself. So, say, what happened midway through the game when a goal was scored: it's actually zooming in on the plot at the parts where the ball was kicked, talking about how the play started and who the defenders around it were. Oh my goodness. I mean, I'm just scrolling it right now as I speak. This is just fascinating. It's even highlighting the regions near the net where the pass was targeted and how the receiving player kicked it into the net.
Oh goodness. And then there's even more insight where it augments the field display with some line charts of the minute-by-minute total expected goals as it changes throughout the duration of the game. We see this a lot on, like, ESPN and other sports sites as they look at the trends in the game and, like, the probabilities of winning and things like that. Oh goodness. This is just magical to me. I'm blown away by it, if you couldn't tell. So that is a well-deserved grand prize winner, I must say. And that was authored by Oscar Bartolome Pato. I hope I said his name right.
Fantastic. Oh, wow. What a deserving entry, and it uses ggplot2 and ggiraph. Just imagine the things you can do with those packages. Just amazing stuff. Of course, there are a lot more entries, and some additional prizes were awarded. These include a really, really thoughtful demonstration of an engaging story around housing and neighborhood income inequality in Vienna, authored by Matthias Schnetzer. Again, you can hit the show notes and play with these via the linked resource. We've got, you know, a great example of an engaging story on that front.
There were great technical achievements too, such as Which Way Do You Ski?, which has really impressive visuals with, like, mapping and other dynamic tables being surfaced as the user scrolls. Oh goodness. There's just so much happening in this space. Again, the power of interactive elements, folks. This is why it's so hard for me to go to static formats when I document the results of an analysis. There's just so much you can do with this ecosystem. It's just fascinating to see. I definitely don't have time to go through all the additional winners, but I definitely see things that could be used in the educational space with mathematics, and some great honorable mentions as well.
As I said maybe a few weeks ago, I could see trying to embed some Shiny components into these and seeing what happens, especially with WebAssembly. So that itch to try that just got a lot bigger, to say the least, looking at these winners. I'm blown away. Absolutely blown away. Let's put this in perspective, folks. Closeread has only been in existence for maybe half a year at the most, or somewhere around there. And look at what people are doing with it already. It goes to show you, and admittedly there are some parallels, in my opinion, to Shiny itself.
Little did Posit know at the time when Shiny was put out there just how quickly it would be embraced by the R community. And lo and behold, we started to see these really novel uses of Shiny that would blow everybody away. Closeread seems to be following a similar trend here. I definitely have ideas with this, and you've now got available to you over 40 of these submissions, if my math is correct here. You can look at all of these on the Posit Community blog as well as the GitHub discussions that talk about, you know, extensions or use cases of Closeread.
This is terrific. Absolutely terrific. I invite you to check out these submissions. No way does an audio podcast do this justice, but you can see them all, like I said, on the Posit forum and see what inspires you. As a sports fan and a fan of engaging storytelling, yeah, I've got me some ideas. We'll see what happens. And there's a lot more you can see happening in the R community when you look at the rest of this week's R Weekly issue. As always, our curator, in this case Sam, did a tremendous job with this issue. And since we were off last week, my additional finds are actually gonna go back to the last issue we would have talked about if we hadn't been off, an issue that Colin Fay curated. One of the highlights there is in an area that I've played with and had admittedly very mixed results with, but it looks like someone's trying it again.
We are talking about ways to distribute a Shiny app in a more self-contained manner. In particular, what I tried a few years ago, when I had a project where I wanted to share a Shiny app that could be installed on somebody's laptop or computer in a training session, was wrapping the Shiny app into the Electron framework. And if you're not familiar with Electron, you can think of it as a way to wrap what amounts to an embeddable Chrome browser with JavaScript around a web-based application. So if you use things like Slack, Discord, countless others, they may look like they're native, but they're not really. They're cross-platform because Electron is compiling them into that operating system's preferred format, such as executables for Windows, DMGs for, I should say, macOS, and then other formats for Linux. So it runs the gamut.
Nonetheless, it looks like there are some attempts to make this a bit more robust for the Shiny side of things once again. In particular, Jinhwan Kim has a blog post talking about a minimal framework called nhyris, which is an anagram of shiny, to transform a Shiny application into a standalone application. There is a lot to this. He does link to a video that demonstrates it in action, but it is indeed based on Electron. He's tried to make it easy so you can kind of bootstrap all this once you clone the repository that he has set up for you.
It does some shell scripting that taps into Node.js, which you have to have installed since it is all JavaScript under the hood. And another shell script, called project, will help you include the R installation that you're using on your system, the R packages, the Electron bits, and the Node.js packages. And then out of that, you get a Shiny application that you can run as a standalone executable. I don't believe there's support for all the operating systems from what I can read yet, so it looks like there is more to come with this. I'm impressed with how far he's gotten.
I will say, though, I think just enough bad things can happen when you've got something that's not so trivial. I feel like the wave of the future is still the WebAssembly route. Maybe I'm wrong, but I think the WebAssembly route has a lot of momentum behind it. I think we're gonna get to a point where we're much closer to that vision of: I compile a Shiny app to WebAssembly, I give one file to somebody, they just execute it in their web browser, and they're done, because everybody's got a web browser, right? Shiny is, of course, a browser-based type of app; as long as you can execute it there, you're good to go. It's not quite there yet, because you still have to run some kind of web server process, whether it's in R or Python or something else, to host that. That's the part I'm still struggling with, but we're getting much closer, folks. Support for WebAssembly is just getting better and better every week.
Electron, if it can be, you know, used in many different types of applications, I could see as a good alternative. I've just been jaded by past experience. So maybe Jinhwan has found a nut to crack here. I will keep watching this space. I'm still a bit more biased towards WebAssembly. Nonetheless, if you are gonna play with Electron and you were struggling with some of the previous presentations you saw, let's say at a previous Shiny conference or a previous RStudio conference, where I've seen this mentioned once or twice before, you may give this a play. See what happens, and nonetheless, try a toy project and hope for the best.
Perhaps this could be wrapped with a Nix bootstrapping. Who knows? I don't know. I'm just spitballing here. Okay, we're gonna wrap up this episode of R Weekly Highlights. But before I go, I wanna make sure I tell you all how to get in touch with us. We are a community project through and through. There is no corporate sponsor to R Weekly. We don't get donations coming in. It is all through the hours put in by us on the R Weekly team, and hopefully from you in the community helping us out with curating these issues. One of the best ways to help is to send your suggestions for an upcoming issue: a great blog post, a great new package, a great tutorial, maybe an upcoming event. We wanna hear about it. You can send that to us by going to rweekly.org.
There is a link in the upper right corner, a little banner for you to fill out a pull request that will, you know, have a template for you to fill out quickly. And once you submit that, our curator for the upcoming issue will indeed be able to merge that into the upcoming issue. All Markdown all the time, folks. That scrollytelling Closeread extension? All Markdown, combined with R code. You can do it. And, also, we love to hear from you directly. We have a feedback portal that's linked in the episode show notes. Go ahead and click on that and send us a little message along the way. And if you're on a modern podcast app, like Podverse, Fountain, Castamatic, or CurioCaster, you can send us a fun little boost along the way.
I would say Fountain, in particular, makes this the easiest. So if you're interested in playing with that, I would grab Fountain, get that on your system, and it'll walk you right through all the steps you need to get a wallet established and send us a fun little boost. Lots of fun to be had there. Trust me on that. And, also, you can get in touch with me on social media these days. I am on Bluesky at @rpodcast.bsky.social, also on Mastodon, where I am @rpodcast@podcastindex.social, as well as LinkedIn. Just search my name and you'll find me there.
Alright. We're gonna put a bow on episode 97. Again, I may have mentioned this a couple weeks ago: if you have a favorite memory, a favorite segment, or just, you know, your favorite part of R Weekly, we'd love to hear about it, and we'd be glad to read it on episode 100. Share your feedback. I have no idea what else I'll do for that episode, but one way or another, we'll make it fun. Alright, I'm gonna close up shop here for episode 97. Thank you so much for listening. We'll see you back for episode 98 of R Weekly Highlights next week.