Episode Links
- This week's curator: Jon Calder - @[email protected] (Mastodon) & @jonmcalder (X/Twitter)
- Let's Talk About the Weather
- 2024 Shiny Contest
- Entire issue available at rweekly.org/2024-W31
Supporting the show
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon) and @mikeketchbrook (X/Twitter)
- A Flea and His Giant - Megaman X: Maverick Rising - Chuck Dietz - https://maverick.ocremix.org/music.php
[00:00:03]
Eric Nantz:
Hello, friends. We're back with episode 173 of the R Weekly Highlights podcast. This is the weekly podcast where we talk about the terrific resources that are being shared every single week at rweekly.org. My name is Eric Nantz, and I'm delighted you're joining us today. And I admit, as I look at the calendar at the top of my screen as I record this, I cannot believe it's July 30th already. We are almost in August, which means my kids are almost back in school, which might be a little bit of normalcy for those parents out there, if you can relate to that. But, nonetheless, we're here to talk about all things R and R Weekly. And I don't do this alone, as always. I'm joined by my awesome co-host, Mike Thomas.
Mike, where did the time go?
[00:00:43] Mike Thomas:
I don't know, Eric, but I imagine, potentially like you, you may also be scrambling to put a presentation together for a conference in August. That's sort of where I'm at. These conferences have come up quite quickly. And, yeah, it's crunch time.
[00:01:00] Eric Nantz:
It is crunch time. Yes. I'm frantically putting the finishing touches on my upcoming talk about WebAssembly. Very excited for it. I'm having some initial good feedback, so I won't put any spoilers out here, but I've got some good stories to tell with that effort. But, yeah, it's less than 2 weeks from now. So as you heard from us in weeks before, Mike and I will be there at posit::conf, and you're definitely welcome to come say hi. And I cannot confirm or deny that I might have some stickers with me. We'll find out, but come say hi nonetheless. We always love connecting with the listeners at these events, and, yeah, this will be an exciting time.
[00:01:38] Mike Thomas:
Yes. Absolutely. I have a presentation not at posit::conf, but at a different conference. It's on the topic of AI and, you're gonna like this, my first slide is: you might not need AI.
[00:01:50] Eric Nantz:
Good. Hit them quick with it. Really setting the tone. You should. That's terrific. But, yeah, who knows? Give or take here and there, there are some fluffy efforts going on in various industries. So, yeah, you keep it real, Mike. That's what we do. And what else keeps it real? Well, it's the organically real, awesome content at R Weekly every single week. And as you know, we have a rotating set of curators that pitch in on different weeks to help assemble the issue.
And this week's curator is Jon Calder, another one of our OGs, if you will, of the R Weekly team. And as always, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world. And our first highlight just happens to come from one of our fellow curators. It's looking at a phenomenon that unfortunately can occur when we grab our data from online sources and things get kind of uprooted from us, but the community comes to the rescue once again. This blog post comes from Jonathan Carroll, who, on top of his awesome efforts with R Weekly, is always looking at new things to learn on his blog. And it's always a fascinating read.
Well, he's taken a bit of a detour from his programming language exploits in other languages. He's gonna talk to us a bit about the weather in Australia, which he's actually looked at for quite a while now. In fact, since the mid-2010s, he's been grabbing weather data from Australia that's been exposed by the Bureau of Meteorology. He says don't call it "the bom." I won't do that. The bureau has been keeping track of weather for a good while now. But there's a bit of bad news: unlike other services, and maybe you shouldn't be surprised about this, they don't have an official API to download all this.
So Jonathan would leverage his scraping skills using the various utilities that we've covered many times on this podcast and elsewhere. There are many awesome packages to help with web scraping in particular, like the rvest package, but there are many others in this space. Well, he had been doing that for a good bit, but there was a mishap, if you will, back in 2021, where his function to do the scraping was not working anymore. No code changes. What gives? Right? Well, it turns out, as he did a little investigation in terms of user agents and other bits here, there was an official statement that was released.
It says, and I quote, "The bureau is monitoring screen scraping activity on the site and will commence interrupting and eventually blocking this activity on this site from Wednesday, 3rd March 2021." Well, epic fail. Right? That's not good. And, rightfully so, John's a little peeved about this because this is from a government site. You know, in Australia, like other countries, there are taxes we pay the government for these kinds of services. So you kinda throw your hands up on that one. Mike's biting his tongue here, I can tell. Yep. We've been there with these things getting uprooted from various sectors in government.
But the community comes to the rescue once again, fortunately, where Adam Sparks, who's also been interested in weather data from Australia, discovered another service that was kind of filling the niche of what was happening with the bureau site, called SILO. And Adam has built a new R package called weatherOz, which is actually also included in this R Weekly issue, and it even has an accompanying paper in the Journal of Open Source Software, to boot. So lots of effort behind this, which helps provide a compliant R package to grab the weather data from this SILO service.
Terrific. Okay. Back in business, John says. Now he can leverage this new package, which has a very handy function called get_stations_metadata(), where he's able to put in the station name and then which API to use. And by default, I believe it is the SILO API. And sure enough, you get a tidy data frame back of the various metadata associated with this. And once you get the station code, then he can actually grab more of the data itself. And this is where he started updating his functions to grab, you know, the various measurements associated with temperatures and whatnot, longitude, latitude, lots of other metrics here. I'm looking at the glimpse of the data frame here, and there is a lot going on. So if you're a weather junkie or weather nerd, this one's for you. There's lots going on in this space.
He was able to grab almost 50,000 observations over the last 135 years. So there you go. You got yourself some time series, if I do say so myself. So after a bit of tidying up, he decides, okay, you know what? I used to do some plots of this in the past as I was investigating some questions. Let me run these again. And sure enough, with the new data in the tidy format and a little bit of ggplot2 code, he's got a nice set of charts. Mike, why don't you walk us through what he's trying to visualize here?
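To make that workflow a little more concrete, here's a minimal sketch of what those weatherOz calls might look like. This is illustrative rather than Jonathan's actual code, and the argument names are from memory of the package documentation, so double-check ?get_stations_metadata and ?get_patched_point before running. SILO also asks for an email address as its API key.

```r
# Hedged sketch of the weatherOz workflow discussed above; verify
# argument names against the package docs before use.
library(weatherOz)

# Look up station metadata; returns a tidy data frame of stations
stations <- get_stations_metadata(which_api = "SILO")

# With a station code in hand, pull the daily observations
obs <- get_patched_point(
  station_code = stations$station_code[1],
  start_date   = "1889-01-01",
  end_date     = "2024-07-01",
  values       = "all",
  api_key      = "[email protected]"  # placeholder; SILO wants an email
)

dplyr::glimpse(obs)
```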
[00:07:21] Mike Thomas:
Sure. So it looks like John originally had created these charts using a package called {bomrang}, which I was not familiar with but appears to be superseded, and now he's transitioning to ggplot2, and I really enjoy the ggplot2 code that he's written here. First, he's taking a look at the daily maximum temperatures, and as you said, Eric, there is a lot of data here. We're going back from 2024 all the way to 1889. So we have a scatterplot with a beautiful curve on it that has a different point for each day of the year, for each year between 1889 and 2024, with the temperature on the y axis, and we have some color gradients based upon the decade as well, which I think is a really unique and interesting way to add this additional dimension to the data. And it produces this really nice curve, beautifully done in ggplot2. He's using viridis to accommodate the colorblind folks out there. So I can't say enough about this. It has a great caption on it as well. One of the things that I really appreciate that John does throughout some of the other plots in this post is the use of, I don't know if you would call them lambda functions, for additional filtering. Within the ggplot2 syntax you can pass, starting with a tilde, a dplyr filter statement where the data frame being used by ggplot2 is referenced by this .x placeholder.
And there are multiple examples here in the code of how John goes about doing that, and it's a really clean, nice syntax. It's probably something that I don't use enough. So it's great if you are starting to get into a little more complicated ggplot2 visualizations, where in certain aspects of the plot, or certain layers, you wanna use only a portion of the data and not the entire dataset, such as in this daily minimum temperatures plot that John creates. He wants to highlight specifically in red, so that these points stand out really obviously to the user, all the observations from June onward in the year 2024, so that you can see those on top of the rest of the viridis gradient-colored points on this daily minimum temperatures plot. And it's really simple syntax, I would say, just this really nice dplyr filter statement. It's great that we have this beautiful, concise interoperability between ggplot2 and dplyr that allows us to use those tools together to add these layers, some that have a filtered context and some that do not. Then we move into faceting as well, where he has a daily maximum temperature plot faceted by each month of the year. And one of the nice things here, again, is where we are calling out a specific point on each facet based upon a dplyr filter context, where we're actually slicing the max, the highest particular temperature across all 100-plus years, whatever that math is, between 1889 and 2024.
And we're coloring that particular point red and sticking a geom_text label on top of it to let us know what year that highest temperature in that particular faceted month took place in. So it makes it really easy to consume this chart, to see the temperatures over time in a particular month, but then also call out the year that had the highest temperature in that month. It's really well done data visualization, and he does the same thing with the daily minimum temperature. And one sort of neat visual trick is that instead of highlighting those call-out dots with the color red, as he did on the maximum temperature side, he highlights them with the color blue, because these minimum temperatures are the coldest temperatures, the year that had the coldest temperature in that particular month. I thought that was really nifty and something that I probably would not have thought to do, but it's really interesting. And the captions here, and really the attention to detail, are beautiful, and it's just one of those really nice data visualization blog posts, Eric, that I know you and I really love and appreciate, because I'm a visual learner. This stuff stands out to me. Most of the work that we do for our clients ends with, or utilizes, some sort of data visualization as well to get our point across, because it's one of the ways that we communicate data the most effectively. So if you are looking to either up your ggplot2 game or just get a little refresher, take a look at what John has put together here and some of his ggplot2 work. I can't recommend this blog post enough.
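As a concrete illustration of that tilde syntax, here's a small runnable sketch, using a built-in dataset rather than Jonathan's weather data, of handing a purrr-style lambda to a layer's data argument so that only a filtered subset is drawn by that layer:

```r
# Minimal sketch of layer-level filtering in ggplot2: the lambda in
# `data` receives the plot's data frame as .x and returns the subset
# that this one layer should draw.
library(ggplot2)
library(dplyr)

ggplot(airquality, aes(x = Day, y = Temp)) +
  geom_point(colour = "grey60") +
  # Highlight only the observations from month 9 (September) in red,
  # drawn on top of the full set of points
  geom_point(data = ~ filter(.x, Month == 9), colour = "red")
```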
[00:12:24] Eric Nantz:
Yeah. It's such a great introduction to a very concise EDA with a novel dataset, too, very logically exploring these trends that he has seen as the years have gone by. Can we see, like, a seasonal type of pattern in terms of the variation of these temperatures? And, yeah, he has noted that in 2024, apparently, one of these lowest points was negative 5 degrees Celsius. And yeah, that's a bit chilly. As someone who lived in Michigan over my formative years, yeah, I know how cold things can get. But it's interesting to see. Again, the facets really show the story of how the spread tightens in the middle of the year and then spreads out more at the extremes, or, I should say, the beginnings and ends of the year. Yeah, really novel use, like I said, of that lambda-like functionality.
The key point there is these geoms and their data arguments. As long as you're getting a data frame back, you don't have to do this in pre-computed fashion. You can do it in line, so to speak, in the geom, which again is great, especially if you're doing this EDA and you wanna iterate on it pretty quickly with some pretty concise code. So I think it's a novel technique that, like you, Mike, I have not utilized enough. So I have to take a bit of learning here as I revamp the visualizations of simulations that we're trying to do a better job of these days.
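And because that data argument just needs something that returns a data frame, the same trick covers the per-facet call-outs Mike described. A hedged sketch, again on a built-in dataset rather than the post's code:

```r
# Label the hottest day within each month's facet, computed inline in
# the layer's data argument instead of in a pre-computed data frame.
library(ggplot2)
library(dplyr)

ggplot(airquality, aes(x = Day, y = Temp)) +
  geom_point(colour = "grey60") +
  geom_text(
    data = ~ .x |>
      group_by(Month) |>
      slice_max(Temp, n = 1, with_ties = FALSE),
    aes(label = Temp),
    colour = "red",
    vjust  = -0.6
  ) +
  facet_wrap(vars(Month))
```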
[00:13:51] Mike Thomas:
Yep. And you know me, Eric, in terms of code review and collaborating with the team, I'm always trying to arrive at code that is as concise as possible while getting the point across as effectively as possible about what it's trying to accomplish. And I think this is a great example of doing that.
[00:14:12] Eric Nantz:
Yep. And John never just stops there. He's got a truckload of other amazing posts in his learning adventures. So if you really wanna get into the nuts and bolts of other programming languages, and how he relates to them from an R user's perspective, there is a lot going on in his blog. So if you haven't bookmarked it before, you absolutely should.
[00:14:36] Mike Thomas:
Yeah. And in his session info, he's got the OS.
[00:14:39] Eric Nantz:
The OS says Pop, P-O-P with an exclamation point. Yes, that's Pop!_OS, which is what I'm using on this very machine. It's a Linux distribution made by the vendor called System76, who make hardware dedicated to Linux. So this box I'm talking to you on right now is a Thelio PC that they make right here in the US, in Colorado. And I've had it for about 4 years, if not longer, actually. So, very cool, a fellow Pop!_OS user. He's a fellow Linux nerd, so that's awesome. Love it. You know, Mike, I always wondered if something was kind of missing this year as the months have gone by. Well, there is a tradition that certainly Shiny enthusiasts are very eager to see happen, and it is back. What am I talking about?
The 2024 Shiny Contest, run by Posit, is officially up and underway. And this blog post comes to us from the author of Shiny himself, Joe Cheng, as well as Posit community manager Curtis Kephart. The blog post is pretty short and sweet. If you're familiar with the contest from before, there are not a lot of changes. But if you're new to this, this is the annual tradition where they reach out to the community and invite them to submit their entries of their innovative ways of creating and deploying Shiny applications.
There is a set of requirements to be aware of: both the data and the code behind the app should be open source and publicly available. So, obviously, it goes without saying you probably don't wanna use your company's internal data for this, but that's neither here nor there. And they also invite you to deploy the application, this time around, on the recently launched Posit Connect Cloud service, which I believe just went publicly available a month or so ago. In the past, that's been shinyapps.io, but, you know, that's the usual evolution of an enterprise and their software products, I must say. And they ask that you store the code in a public repository; that could be GitHub, GitLab, Bitbucket, whatever have you. And there is a set of judges, and I don't believe we know the names just yet, but there will be a set of judges that evaluate each of the applications based on a set of metrics.
I speak from a little bit of experience on this, because a few years ago I was a judge on one of these contests, and my goodness, the submissions were such high quality. It was really tough. It was great to see the applications, but, man, trying to judge these in a fair way when you just wanna say they're all great. Right? So there's a set of judges to help evaluate against these metrics, and a lot of nice awards at the end, whether you are a runner-up or an honorable mention, and the grand prizes. They have all the details there. Yes, there will be a bunch of stickers being thrown your way, and the grand prize winner actually gets a half-hour private meeting with members of the Shiny team. So my goodness, that's an awesome opportunity in and of itself to go over your app and what you can do in the future.
So you may wonder, where can I go to look for inspiration? Well, the nice thing in this post is they've linked to the previous blog posts where they talked about the winners and runners-up all the way back to 2019. I still remember that year very fondly, being like, okay, I'm all over this. I had fun creating a Lego mosaic app as part of a Shiny Contest of yesteryear. That was a fun time. Of course, if you look at that code now compared to what I do now, just don't judge too harshly. But, nonetheless, it's a great time of year. We're always excited to see what the community comes to bear with this. So, yeah, if you're a Shiny enthusiast, this is a great opportunity to test your might, if you will, and see what you can bring to the community.
[00:18:53] Mike Thomas:
I agree here. I feel like there are so many more options in the Shiny ecosystem nowadays, and so many different branches and paths that you could pursue as you make your entry into this contest. Are you just going to have the repository, or are you going to deploy your app somewhere? Is it going to be on shinyapps.io, or is it going to be on Shinylive, right, leveraging the webR framework? Are you going to use Shiny for Python, or Shiny for R, or are you going to use the teal framework for building Shiny apps, which is popular among folks in your space, Eric, in pharma and in life sciences? And they did note that they are going to have a special recognition for developers new to Shiny. So if you are someone just getting started with Shiny, don't be intimidated by this contest; actually embrace it, because you could potentially get that special recognition for giving it a shot. And if you are a new developer to Shiny and you take part in this contest, you know, I think sort of where this is hosted is the Posit Community forum, correct? So I would recommend that you ask any questions that you have along the way within that Posit Community forum. People are really responsive.
I hope, and based upon my experience I think, that you'll find it welcoming. I think that you will find folks pretty responsive to a lot of your questions, especially within the context of this particular contest, with everybody trying to put their best work forward. So, welcome one and all to put your submission in. I think the deadline is, Eric, when does it look like the deadline is? Let me take a look. I should know that; I should have checked this out beforehand. The deadline for submission is September 15th at midnight, anywhere on Earth.
So you've got a little bit of time. It looks like we've got about a month and a half here, if my math is correct, based upon the time that we'll put this episode out. So get to developing.
[00:21:06] Eric Nantz:
That's right. And certainly kudos to the Shiny team and Curtis, because, like you mentioned earlier, this is the first time that we're really calling out the opportunity in the life sciences space to leverage the teal framework, which is making huge waves in our industry to help build these very comprehensive Shiny apps with a modular structure that are tailored for reviewing, you know, the types of clinical data that we deal with on a daily basis. So this is an awesome opportunity to see what you can do with that. And I've even seen others in the community that are not necessarily part of pharma leverage teal to do some really fun exploratory data analysis with a Shiny front end. So, yeah, definitely, if you're a teal enthusiast, that's growing pretty rapidly, and this is a great opportunity to put your work out there as well. And certainly, like you mentioned earlier, for our friends in the Python space, Shiny 1.0 for Python was just released. So it's another great opportunity if you've been wanting to test your might with the Python side of things. This is a great place to do it. I always think, in general, finding a dataset or a domain that you're interested in and having an opportunity like this is such a great learning opportunity, too. Because, again, you're going to see not just your submission being put into the queue. You're going to see these submissions in real time as they come in; there will be a category on Posit Community dedicated to this contest, and you'll see these start to trickle in, and you can get really inspired to see what everybody's up to. So I remember going down to the wire on my Lego one and seeing just the high-quality submissions, and admittedly, for the first time, having a little imposter syndrome, like, oh my gosh, I can't believe this. But then again, I'm likening it to a great learning opportunity, and someone else will benefit from what you're putting out there. It always happens. I remember getting some nice comments on my submission, but then many others were providing questions and comments on these posts. So, again, a very big emphasis on learning, and having fun learning, I might add.
[00:23:17] Mike Thomas:
Absolutely. I can't wait to get my hands on keyboard and see what our team can come up with for a submission. I'm gonna hold myself to it.
[00:23:24] Eric Nantz:
Yeah. Once the dust settles on posit::conf, I might have some more geeky opportunities to do Shiny stuff with podcasting data again. We'll see. But, yeah, we're at that point of the podcast now, right? We wish we could talk about the rest of the R Weekly issue, but there's always more than we have time for today. So we're gonna wrap up here with a couple of our additional finds that we saw in this week's issue, which, again, is always linked in the show notes of this episode. I've always been trying my best to leverage Git effectively, especially with, you know, nice commit messages, following standards that maybe will make code reviews easier. One area that I admit I don't do enough of: there is a mechanism in Git that will, before you commit, run a custom script that might check for various things, such as maybe the number of files you're committing, maybe the number of lines, maybe the type of commit message, maybe you have, like, a very formal framework for it. You can build what are called pre-commit hooks in your local Git repository to be that kind of frontline check before you actually do the commit.
Well, there is a package in the R ecosystem called {precommit}, which will let you author these pre-commit hooks from the comfort of your R installation itself. And there is also, I believe, a little bit of a requirement for Python as well, so this might be using reticulate under the hood; I'm not quite sure. But once you have everything set up, there's a nice package documentation site, which we'll link to in the show notes, going from the motivation of why you would want to leverage this to some built-in hooks that you can use, such as styling, making sure your README is up to date if you're using R Markdown for your README, and making sure you're not leaving a browser() statement in your code, because who would ever do that? Oh, goodness gracious. Who would ever push that into production? I don't know. My goodness, where was this a few years ago? But, yeah, there's a lot to choose from here, so I may need this. What do you think, Mike?
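For a rough idea of the setup, here's a minimal sketch based on the package's documented workflow; exact behavior can differ by version, so see the documentation site linked in the show notes. The underlying pre-commit framework is Python-based, so a working Python installation is needed too.

```r
# Hedged sketch of getting started with {precommit}; check the
# package docs for the current setup flow.
install.packages("precommit")

# One-time: install the Python-based pre-commit framework that the
# package drives under the hood
precommit::install_precommit()

# In your project or package repo: writes a .pre-commit-config.yaml
# with default hooks (file styling, README freshness, no leftover
# browser() statements, and more) and activates the Git hook
precommit::use_precommit()
```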
[00:25:42] Mike Thomas:
That's an awesome resource, Eric, and I may need that as well. That could have saved me many, many times over, so I need to check it out. I found a really cool blog post called "Llama Llama Oh Give Me A Sign, What's In The Latest IDSA Guideline?" And I could probably just leave it at that; that's such a cool blog post title. But I won't. This one is authored by Ken Koon Wong, and it is all about leveraging, I believe, an open source LLM with RAG to be able to ask questions of the latest guidelines from the Infectious Diseases Society of America, and it is incredibly comprehensive.
It uses reticulate as well to interop between R and Python to stand this solution up, and there are lots of GIFs, lots of content; it's an incredibly comprehensive blog post. And if this is something you might find interesting, using LLMs and RAG to ask questions of, or summarize, a particular document, set of documents, or guidance that's out there, such as the Infectious Diseases Society of America guidelines, I would highly recommend checking out how Ken went about doing this. It's incredibly comprehensive.
[00:27:05] Eric Nantz:
Oh, this is incredibly useful, too, because in many enterprises it's not just about trying to, quote, unquote, future-proof your questions, but taking what you already have, whether it's these documents of, like, important information, or metadata associated with, like, infrastructure, or where you're designing experiments, or whatnot. This is a great post that'll kind of walk through what the process is of feeding those documents in and creating, under the hood, what are called these vector databases, so that the LLM will use that as a source to help answer these questions. So this is a very hot topic. And in my industry, we deal with, you know, thousands upon thousands of study documents and patient documents and whatnot, so to be able to effectively put an LLM in front of that to explore what we have...
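To make the flow Eric describes a bit more tangible, here's a conceptual sketch, emphatically not Ken's actual code, of the embed-and-retrieve loop using reticulate to drive the Python chromadb client. The function names follow chromadb's documented API as best as memory serves, and the collection name and document text are made up for illustration:

```r
# Hedged RAG sketch: store document chunks in a vector database, then
# retrieve the chunks closest to a question as context for an LLM.
# Assumes the Python 'chromadb' package is installed and visible to
# {reticulate}.
library(reticulate)

chromadb <- import("chromadb")
client   <- chromadb$Client()

# Feed document chunks in; chromadb embeds them under the hood
collection <- client$create_collection(name = "guidelines")
collection$add(
  documents = list(
    "Guideline section on empiric antibiotic therapy...",
    "Guideline section on duration of treatment..."
  ),
  ids = list("doc-1", "doc-2")
)

# Retrieve the chunk most similar to a user question
hits <- collection$query(
  query_texts = list("What is the recommended treatment duration?"),
  n_results   = 1L
)

# The retrieved text would then be pasted into the LLM's prompt as
# grounding context; the LLM call itself is omitted here.
str(hits)
```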
Heck, you know what, even in the podcasting space, there have been talks of trying to consume our show notes and having an LLM in front of that to maybe figure out, hey, when did Mike and Eric talk about that new Shiny package, or that new Quarto thing, you know? Boy, that would be the dream, right? A little bot in front of that. I don't know, just saying. All the metadata is there, folks. Maybe this will help us build that, but this post is a good way to get started, I think.
[00:28:27] Mike Thomas:
Add us to the side project list.
[00:28:29] Eric Nantz:
As if it wasn't long enough already. What am I doing to myself? And on top of that, I'm trying to learn Nix to boot. So, my gosh, what am I doing to myself? Nonetheless, setting side projects aside, here we're gonna tell you how you can contribute to rweekly.org itself, and that doesn't have to be as in-depth as a side project. We are just a pull request away. If you find a great new resource, new package, new blog post, whatever have you, it's all Markdown that comprises the R Weekly site. So if you know how to put a link in R Markdown, or Markdown itself, you know how to contribute to R Weekly. It's basically that easy. You have a GitHub issue template or a pull request template to navigate you through the requirements. Again, very straightforward. We welcome all your contributions.
And, of course, we welcome hearing from you as well. As we mentioned, in a couple weeks Mike and I will be in Seattle for posit::conf. But in the meantime, if you wanna get a hold of us, there are a few ways to do that. We have a contact page in the episode show notes when you download this in your favorite podcasting software of choice. Also, if you're on a modern podcast app like Podverse or Fountain, you can send us a little boost along the way. We've got details on how to do that in the show notes, and you can find us on the various spheres of the social media circles.
I am mostly on Mastodon these days at @[email protected]. You can also find me on LinkedIn; search my name and you'll find me causing lots of fun stuff there, potentially, and on that weapon X thing from time to time at @theRcast. Mike, where can listeners get a hold of you?
[00:30:09] Mike Thomas:
You can find me on Mastodon at @[email protected], or you can find me on LinkedIn if you search Ketchbrook Analytics, k-e-t-c-h-b-r-o-o-k. You can see what I'm up to.
[00:30:17] Eric Nantz:
Very good. Every time I boot up in the morning, I'm looking for Mike's newest adventures, so I'm always pleased when I see that. Nonetheless, yeah, we have lots of adventures to get to in the rest of our day. But, of course, we thank you so much for listening from wherever you are around the world. We really enjoy doing this every week. We're gonna close up the proverbial mics here, and we invite you to join us for another episode of R Weekly Highlights next week.