As the holiday season enters the picture, learn how a humble R package helps you give thanks to the contributors of your open-source package. Plus a practical introduction to missing value interpolation with a tried-and-true R package with a rich history, and a comprehensive analysis predicting the result of an NBA superstar's next shot (a superstar who has already made a lot of shots in his career).
Episode Links
- This week's curator: Jon Carroll - @[email protected] (Mastodon) & @carroll_jono (X/Twitter)
- Give Thanks with the allcontributors Package
- How to Interpolate Missing Values in R: A Step-by-Step Guide with Examples
- Predicting NBA Score Plays - Steph Curry Shots
- Entire issue available at rweekly.org/2024-W49
- allcontributors - Acknowledge all contributors to a project https://docs.ropensci.org/allcontributors/
- Contributor section of {allcontributors} README https://github.com/ropensci/allcontributors/?tab=readme-ov-file#contributors
- zoo - S3 Infrastructure for Regular and Irregular Time Series https://cran.r-project.org/web/packages/zoo/index.html
- mice - Multivariate Imputation by Chained Equations https://amices.org/mice/
- naniar - Data structures, summaries, and visualization for missing data http://naniar.njtierney.com/
- hoopR - Data and tools for men's basketball https://hoopr.sportsdataverse.org/
- BasketballAnalyzeR - Analysis and Visualization of Basketball Data https://github.com/sndmrc/BasketballAnalyzeR/tree/master
- Optimizing R/Shiny App Performance with Advanced Caching Techniques https://www.appsilon.com/post/r-shiny-caching-techniques
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
- Eric Nantz: @[email protected] (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
- Mike Thomas: @[email protected] (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)
- Diddy Kong Luau Party - Diddy Kong Racing - Guifrog - https://ocremix.org/remix/OCR02794
- From Downtown - NBA Jam - ktriton, Mustin, Steve Lella - https://ocremix.org/remix/OCR02111
[00:00:03]
Eric Nantz:
Hello, friends. We are back with episode 188 of the R Weekly Highlights podcast. This is the weekly show where we talk about the excellent highlights that are shared in this week's R Weekly issue, all available at rweekly.org. My name is Eric Nantz, and I'm delighted you joined us from wherever you are around the world. And I'm actually fresh off of a very short yet very fun vacation during the Thanksgiving holiday break here in the United States, where I was able to go back to my old hometown in Michigan. And, of course, I go there and, sure enough, lake effect snow decides to pay a visit as well while I'm up there, to remind me of all those times I had to travel through pretty ugly snow, especially in graduate school. But, nonetheless, we had a great time. One of my highlights was actually taking the kids to a local hockey game up in the city. They had a lot of fun with that, and our team won, so that made it even better.
Nonetheless, I'm back here at the virtual studio, if you wanna call it that. And I am flying solo today as my co-host, Mike Thomas, is out on business. But, nonetheless, I'll hold the fort down as best I can, and we have a terrific issue to share with all of you today. This week's issue was curated by another longtime contributor and curator to R Weekly, Jonathan Carroll, who apparently never stops working on both dev tasks and curating issues. But nonetheless, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world, with your pull requests and other suggestions.
As I said, I just finished celebrating the Thanksgiving week here in the United States last week, which means that there's always a time, especially in the December month, to start reflecting a bit on the year and especially, you know, showing some gratitude and giving some thanks to those that have been helping me throughout my professional and personal life. And alongside that, we have, of course, in the world of open source, many different ways people can contribute to a given project, but also various ways of acknowledging them as well. Whether that's a nice little shout out on social media, maybe a great blog post of acknowledgments, and etcetera.
But there are other ways to acknowledge these great contributions as well, and we're going to talk about a very innovative way of doing just that within the R ecosystem. This comes to us from the rOpenSci blog and has been authored by Mark Padgham, who is a research software scientist over at rOpenSci, where he puts a spotlight on a package he's created to help you in this process of, in a more automated way, giving thanks to contributors to a given package. And this has been motivated by a service called allcontributors.org, which gives both an automated bot as well as some general guidance for how a given, say, software project can acknowledge the different contributors to their particular project.
And they do this through, you know, commit messages in the git repository for that project, by having special, you might say, kind of tags used in the commit message. In fact, they look kind of similar to an roxygen tag with the at sign in front of it. And this bot will actually parse the commit messages and start to put them in a more streamlined format in, say, a project's README or other documentation. Now this is great, and it looks like a lot of projects do use this service, but Mark highlights a couple of disadvantages to this particular workflow, especially in the realm of data science and in the realm of R packages.
And as I said, one of the mechanisms that this works with is putting those contribution type tags in a commit message. I don't know about you, but sometimes it's easy to forget these things when you're committing and you're all in the dev mode, so to speak. In fact, it reminds me, slight tangent here, I've been really trying to opt in to the world of conventional commits, where I give a prefix to the commit message about the type of commit it is, whether it's a new feature, a bug fix, etcetera. And I've been wanting to do this for years, but it was only with a recent project that I literally forced myself to do it. And it hasn't been easy, folks. It does take a lot of discipline to do that. So this could be viewed in a similar light.
And then the other disadvantage that Mark highlights here is that the acknowledgments of these contributors are packed into a single contributors section. And there isn't really a neat way to customize that appearance; it's kind of what-you-see-is-what-you-get from that bot itself. And also, while those contributors are listed in that section, it doesn't actually link out to what contributions they actually made. So that brings Mark to introducing us to the allcontributors R package, which, as you might guess, is also part of the rOpenSci umbrella. But this is meant to give a package author a very nice and semi-automated way to help acknowledge contributors to their particular project.
And there is one caveat that Mark mentions here in the post: this is acknowledging contributions that come through the git log of a repository or through GitHub interactions. That means that other types of contributions that are listed in that other service we just mentioned may not be picked up here, such as organizational contributions, you know, organizing documents or organizing a community around a project. Those kinds of things will not be tracked here with the allcontributors package. So if you are looking to acknowledge a more broad spectrum of contributions, you still might, you know, be recommended to go to that allcontributors.org service instead.
Nonetheless, if you are content with acknowledging the code-based contributions, this package really gives you one function to get this all going, and that is the add_contributors() function. It does have a lot of parameters that you can customize to your needs, but the goal of this is just running this function at the root of your project's repository locally. It will automatically add or update your list of contributors that you can put into, say, a README and whatnot. The other way Mark recommends leveraging this, instead of you running that function from time to time to get this update, is that he includes a way to copy a template GitHub Actions workflow, which, once you commit this workflow file for the first time, will run that same allcontributors function on every push to your repository.
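For a rough idea of what that local run looks like, here is a minimal sketch, assuming the package is installed and you are sitting at the root of your package's git repository (the exact arguments you would tweak are documented in the package itself):

```r
# A minimal sketch of running allcontributors locally at the repo root.
# By default this adds or updates a contributors section in the README.
library(allcontributors)

add_contributors()
```

That's just the happy path; the function exposes parameters for things like which files get updated and how the contributor grid is laid out.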
And you can also define regular timed intervals as well, so that no matter what is being contributed, as you make that push, whether you merge in pull requests from other contributors, maybe they solved issues or whatnot, you will get that into this list automatically through that GitHub Action. The other nice benefit of this package is that there is going to be a direct link for each contributor to their contribution itself. He gives an example, that we'll put in the show notes, of, somewhat ironically enough, the README of the allcontributors package itself. If you look at this as I'm talking through it, or after listening to this, you'll see at the bottom there is a Contributors section with a nice little introductory sentence that this was automated by the allcontributors package.
And then it's got, basically, the avatars of each contributor, but broken down by different categories. In the case of this package, there's a category for code contributions as well as issue contributions. And under each avatar is a hyperlink that has their GitHub user ID. When you click that link, it's going to, for this particular package's repository, show you the commits where that particular contributor was directly involved. Just as I'm recording, I clicked on the link under the avatar of Maëlle Salmon, who, of course, has been a very frequent contributor to highlights in the past. And sure enough, it took me right to the 4 commits that she was involved with on this package. That is really, really neat.
You can also do the same thing with the other contribution types, such as the issue contribution type. So clicking on, say, the link under the avatar of a contributor in the issues section, you'll be taken directly to a filtered view of the issue tracker in GitHub showing the issues authored by that particular user. So that is terrific. That is absolutely terrific. And this is a wonderful time to think about a package like this, because when getting involved in open source, other than hearing from, say, the maintainer of a project themselves from time to time, it can sometimes feel like you're sending things into the void a little bit. Yeah, you may be scratching your own itch potentially, but admittedly it is a nice pick-me-up, especially mentally, because open source is rarely rewarded with, say, financial compensation or other things to make up for the time spent on these contributions.
Every little bit of nice, you know, kudos or pick-me-ups or whatever you want to call it goes a long way, especially for those that are new to the world of open source contributions. So I did not know about this package before the highlights, which is why I'm always thankful for R Weekly itself for bringing this to my attention. And with this new shinystate package I'm making, which hopefully will get contributions from others in the R and Shiny communities, I will be very glad to leverage allcontributors in that workflow, to make it, you know, an easy way for me to give those proper acknowledgments without, you know, using myself as the excuse for forgetting to do it in a manual way.
So a really slick looking package here by Mark. I'm very interested in trying this out. And, definitely, if you've been trying this out in your various package development in the community, feel free to send me a shout out or contact us. I'll put all that information at the end of the episode. I'd love to hear how you all are using allcontributors, or other ways you've been sending kudos and thanks to the contributors to your open source projects. Our next highlight here gets into a very realistic issue that anybody doing data analysis is likely to encounter from time to time, especially when you get away from those cookie cutter textbook examples that you might have had in those statistics or math textbooks back in the day, and you start dealing with real data collected from humans.
And, hence, there are times where in the ideal situation, you would have all the available data for that particular variable or that particular record. Folks, it doesn't happen that way no matter which industry you're in. I can definitely speak in life sciences. It definitely happens, but many of our industries have to deal with how we account for missing observations, missing values in our datasets. And there are a multitude of ways to look at this. And one of the fundamental techniques that are often used, especially when you get to modeling and predictions, is interpolation of missing data.
Going over all the ways of doing this would be entirely another podcast or two dedicated to that alone. But our next highlight here is a very fit-for-purpose, fundamental technique that has been available in R for many years. And this comes to us from one of the latest blog posts from Steven Sanderson, who has an amazing blog. If you haven't subscribed to it before, it's called Steve's Data Tips and Tricks, and he turns this content out regularly. In fact, when I set up my little self-hosted RSS feed reader on my beefy server here in the basement, using a service called CommaFeed, this was one of the first blogs I put in there, because there is always something great to learn from Steven about both R and even some of his recent tutorials on Linux itself, which have been an entertaining read.
So in his blog post here, he talks about how to interpolate missing values in R, using at the back end of this a package called zoo. And this brings back a lot of memories to yours truly from my very early days in the R ecosystem, learning R for the first time in graduate school. I may have shared the story before, but the very first time I saw R was when one of my graduate school professors in statistics taught us a time series course. And on top of seeing R for the first time and trying to get the hang of the language, many examples had library statements at the top, and one of them was indeed library(zoo).
So the zoo package has been around, get this, for 20 years at the time of this recording. Now that is some serious longevity, folks. I checked the CRAN archives just to be sure, and sure enough, that is legit. Zoo is tailored for a lot of very powerful functions dealing with time series data, appropriate enough given I saw it in that time series course. There is one function, though, used throughout many analyses, and not just time series analyses, that this package surfaces to you, and that is called na.approx(). It has quite a few parameters for doing interpolation, but Steven talks us through some of the use cases for it.
So this na.approx() function, by default, will let you interpolate missing numeric values in a given vector. In his first example he starts off with a simple vector with a few missing values. And sure enough, when he runs na.approx() on that vector, those NAs are filled in with interpolated values. And the way it does the interpolation is by using the surrounding values to do, in essence, an averaging with the values around the gap, and then splitting that up as appropriate depending on the observation. So in his very basic example with a vector from 1 to 8 where, say, the values that should be 3, 4, and 8 are missing, this na.approx() function is going to indeed fill in 3, 4, and 8, taking into account the number of missing values and the boundaries around them.
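To make that concrete, here's a minimal sketch in the spirit of Steven's first example (not his exact code), interpolating an interior gap in a plain numeric vector:

```r
# Linear interpolation of interior NAs with zoo::na.approx()
library(zoo)

x <- c(1, 2, NA, NA, 5, 6, 7, 8)
na.approx(x)
#> [1] 1 2 3 4 5 6 7 8  # the two NAs are filled by interpolating between 2 and 5
```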
Now, a more realistic example is looking at, in this case, a simple time-series-type data set, where for a series of 5 dates there are 2 missing values between the first and the fourth observations. And with that, it's basically filling in, you know, 2 approximate values, and they are not neatly rounded values; they are, in essence, weighted averages of those boundary values at the different slots. And that, again, can be great for small records, but there may be situations where it's not just a few missing values here and there, sporadically placed.
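A date-indexed version might look something like this (again, just an illustrative sketch with made-up values, not the data from the post):

```r
# Interpolating a gap in a date-indexed zoo series
library(zoo)

dates  <- as.Date("2024-01-01") + 0:4
values <- c(10, NA, NA, 22, 30)
z <- zoo(values, order.by = dates)

na.approx(z)
#> 2024-01-01 2024-01-02 2024-01-03 2024-01-04 2024-01-05
#>         10         14         18         22         30
```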
There could be large gaps of missing data in that given variable. And the na.approx() function gives you an argument called maxgap, which tells the function: don't fill in missing values if there is a continuous sequence of them that exceeds this maximum gap amount. So in the next example, he has a set of 5 consecutive missing values between a couple of non-missing values. And when you feed the na.approx() function a maxgap value of 2, guess what? Those missing values are not touched. So that can be very important if there is a very valid reason for that gap in the data; you want to make sure that you're, you know, tailoring that rule appropriately.
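Here's a small sketch of that behavior, assuming a run of five consecutive NAs like the one he describes:

```r
# maxgap: refuse to interpolate across gaps longer than the threshold
library(zoo)

x <- c(1, 2, NA, NA, NA, NA, NA, 8, 9)

na.approx(x, maxgap = 2, na.rm = FALSE)
#> [1]  1  2 NA NA NA NA NA  8  9  # the 5-NA run exceeds maxgap, so it is left alone
```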
So that just scratches the surface of what interpolation can do. But again, in the realm of data science, sure enough, you probably will be tasked with doing an imputation in one way, shape, or form, and the zoo package is giving you one of those fundamental techniques, one of those fundamental solutions, to tackle that problem. And I will mention, and we'll link in the show notes of this episode, that there is a huge domain of how we can deal with missing values in R, and a couple of packages come to mind to accomplish this. One of which, if you want to really get into the different ways this can occur, is an R package I've heard about for quite some time called mice.
That stands for multivariate imputation by chained equations. That can be really important if you need to customize the different interpolation methods and take advantage of some really novel statistical methodologies to do this. So I'll have a link to that in the show notes, because I see it used quite a bit, especially in my industry of life sciences. There are many, many ways you can deal with that there. And another R package that I have had great success with is the naniar package. That is a terrific package offered by Nick Tierney.
And this is also a wonderful package for looking at missing data, as well as visualizing missing data and exploring the relationships that result from the missing data. So you'll probably want to look at that as well if you're going to do a more rigorous analysis of this. I have a link to Nick's naniar package in the show notes as well. So I hope Steven keeps this great content going. It's another short, to-the-point tutorial, but again a very important issue that anybody involved with data analysis is going to have to deal with in some way, shape, or form. Rounding out our highlights today, there is, of course, an explosion of data available to us in the world of sports these days, as both the professional leagues and other endeavors have, you know, exposed a multitude of new ways of collecting data, whether it's sensors on, say, players' jerseys, custom cameras in the arenas, or other types of advanced tracking.
There is just a wealth of opportunity to leverage some really, really nice techniques on the, say, prediction side of things, inference side of things, and whatnot. And so our last highlight today is looking at the realm of basketball, and a novel way of predicting the shots of, in this case, one of the most prolific shooters in NBA history, who is no doubt going to the Hall of Fame when he retires. This blog post is looking at whether we can accurately predict if Steph Curry, the very famous superstar on the Golden State Warriors NBA basketball team, a winner of multiple MVPs and multiple championships, is going to make or miss a shot, and what techniques we can use to do that.
And this blog post comes to us from Nils Indetrin, who I hope I said that right. And there is a wealth of knowledge in this blog post. I probably won't be able to do it justice in this recap, but I'm going to give you kind of my key takeaways from it. This project was apparently spurred on by a conversation that Nils had with a friend of his, where the question was whether we could predict if Steph Curry would make a shot, given that he's attempted it. So what kind of data can we use to help explore this in a data-driven way? Well, first, it's a little bit of EDA action, which, again, is a very important technique as you're getting to know what's available.
And there is a wonderful R package that makes this very, very possible in the realm of NBA and basketball analytics as a whole, and that package is called hoopR. We'll have a link to that in the show notes as well. This is authored by Saiem Gilani, and hopefully I said that right. This package has a wealth of functions for assembling data collected for the NBA and college basketball, using online sources such as ESPN's NBA statistics as well as many others. So in the first part of this blog post, Nils shares some of the code he used from the hoopR package to load 3 different perspectives, or granularities, if you will, of the data going into these predictions.
One of which is the team statistics after each game. This is often called the box score data that you get on a website, or in the old days a newspaper. I still remember reading the sports sections of the local newspaper in a bygone era, you might say, but the hoopR package has a very simple function to load the team box scores given a range of years. You can also get the player-specific box score data with another function called load_nba_player_box(). And last, but certainly not least, because we're looking at the very granular level of whether Steph Curry is going to make a shot or not, that means play-by-play data. And, yes, hoopR has a function to load that as well.
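Here is a rough sketch of what pulling those three levels of data might look like with hoopR's loader functions (the season range is just for illustration, not necessarily the one used in the post):

```r
# Load team box scores, player box scores, and play-by-play data with hoopR
library(hoopR)

team_box   <- load_nba_team_box(seasons = 2022:2023)    # per-game team stats
player_box <- load_nba_player_box(seasons = 2022:2023)  # per-game player stats
pbp        <- load_nba_pbp(seasons = 2022:2023)         # shot-level play-by-play
```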
Each of these functions provides a nice data frame, and once he's able to assemble that, there's a little bit of massaging to begin exploring this data, doing some aggregations to get a good feel for the overall performance of the players in the same position as Steph Curry, which is point guard. So there's a nice little tidyverse set of code here to do the filtering and the summarization with group processing, along the lines of the sketch below. And now we get to something I'm really intrigued by: yes, Nils shows kind of the static output of this tibble of the first five rows of these different point guards, but he wants you, the reader, to be able to explore this in a really neat and interactive way.
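A hypothetical version of that aggregation, continuing from the hoopR sketch above, might look like this; the column names are my assumptions about the player box score data, not necessarily the ones Nils used:

```r
# Summarise point guards from the player box scores (column names assumed)
library(dplyr)

point_guards <- player_box |>
  filter(athlete_position_abbreviation == "PG") |>
  group_by(athlete_display_name, team_name) |>
  summarise(
    games       = n(),
    total_pts   = sum(points, na.rm = TRUE),
    avg_minutes = mean(minutes, na.rm = TRUE),
    .groups     = "drop"
  ) |>
  arrange(desc(total_pts))
```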
So even though the code snippet is not shown for this, and I really wanna see it, I think I know what's going on here. Nils has created an interactive table, probably powered by the reactable package, with widgets at the top of the table to let you filter by team, by conference, and by minutes played via a slider. If I have a guess, there's some really innovative crosstalk action happening here to dynamically change the table view in this HTML document. This is why I just love the HTML format: we get these rich experiences of exploring the data right there in the blog post, without you having to boot up R and run the code yourself. That is just spot-on awesome, and I loved playing with this table in this format. Fantastic. I did some sleuthing and couldn't find this code on Nils' GitHub profile, but Nils, if you're listening, I'd love to see how you made this table. It looks absolutely excellent.
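For what it's worth, my guess at how a table like that gets wired up is something along these lines, with crosstalk filter widgets bound to a shared data object that a reactable reads from. This is purely my reconstruction, not Nils' code, and it continues from the hypothetical point_guards summary above:

```r
# Crosstalk filters driving a reactable without any Shiny server
library(crosstalk)
library(reactable)

shared_pg <- SharedData$new(point_guards)

bscols(
  widths = c(4, 8),
  list(
    filter_select("team", "Team", shared_pg, ~team_name),
    filter_slider("mins", "Average minutes", shared_pg, ~avg_minutes)
  ),
  reactable(shared_pg, searchable = TRUE, defaultPageSize = 10)
)
```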
So, nonetheless, once you explore the scoping, you see that Steph Curry is 4th in some of these metrics in terms of total points, but there are other metrics besides total points that we can look at in kind of a multi-dimensional view. The next part of the post creates what are called radial charts, which are basically a polar-coordinate view of these different metrics, with the metrics at the vertices around the edge of a circle and lines showing how high each value is relative to the others. So there are 6 radial plots here, and what they show is that in the dimension of 3-pointers made, Steph Curry is tops, but there are other dimensions favoring players such as Luka Doncic, who is a superstar on the Dallas Mavericks and went to the NBA Finals last year.
He's got a lot of, you know, standout metrics as well, as do Damian Lillard and Kyrie Irving. It just shows that there are multiple facets to being a prolific point guard in these metrics. But then, getting back to Steph Curry himself, there is another package, and in fact I should say another package powering the visuals in this blog post, and this package is called BasketballAnalyzeR. That's available on GitHub, and it'll be linked in the show notes. And this gives us both those radial profile charts as well as, what I'm about to talk about next, a heat map of the shot chart for any given NBA player, where you can color code the shots by whether they were made or missed.
You can also, instead of just the dots themselves, use a gradient, a more traditional heat map, where you see the brighter colors around the 3-point line as well as near the basket, where he does those fantastic layups after he, you know, tries to break the ankles of a defender with his dekes and whatnot. So that's giving a profile of where the shot volume is coming from for Steph Curry, and that's a great way to help inform the next meat of this post, which is the modeling itself. The first step is trying to figure out the type of data we want to feed into this prediction, and that's where there is a set of tidyverse code to join these different granular sets of data together to get an overall set that has the right metrics we need.
And, in essence, our response variable that we're gonna use for the prediction is called scoring_play, which is false if Steph didn't score on that play and true if he did. And then the types of predictors going into this model are gonna be the location of where the shot took place, those are the x and y coordinates, as well as the minutes during the game, plus additional variables such as the opponent that was guarding him at the time, and other metrics as well. Once that data is assembled, and again all of this type of code is in the blog post, it's time to use the tidymodels ecosystem to split this data into training and test sets. Again, using the very handy functions that tidymodels gives you: initial_split(), then defining the training and testing sets from that, and then setting up the folds for cross-validation.
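As a rough sketch of that splitting and resampling step (the data frame name, proportions, and fold count here are placeholders for the assembled shot-level data, not necessarily what Nils used):

```r
# Train/test split plus cross-validation folds with tidymodels
library(tidymodels)

set.seed(123)
curry_shots <- curry_shots |>
  mutate(scoring_play = factor(scoring_play))  # classification outcome as a factor

shot_split <- initial_split(curry_shots, prop = 0.8, strata = scoring_play)
shot_train <- training(shot_split)
shot_test  <- testing(shot_split)

shot_folds <- vfold_cv(shot_train, v = 5, strata = scoring_play)
```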
Again, these are all fundamental techniques that have been talked about in previous highlights, and again you're invited to check out the tidymodels website if you want a primer on how this whole workflow works. It is very, very elegant here in this post. Next, it's time to set up the recipe for, you know, preparing for the model fitting. And that's where we have to encode all the input variables, especially the categorical ones, and make sure they get the right indicators. That's where there are very handy functions such as step_novel() and step_dummy() which, again with tidy selection, take the nominal predictor variables without transforming the outcome variable, so you can get that dataset neat and tidy for the modeling itself. And for the model itself, he is leveraging one of the very popular methods for classification, XGBoost, and that is defined with the boost_tree() function. You can see the different parameters that he defines there, setting up the engine, setting up the workflow, and then making sure he's got the grid for searching over those tuning parameters, and then finally taking advantage of multiple cores to tune over that grid. And then it's time to look at the results, such as model performance.
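Pieced together, that part of the workflow might look something like the sketch below; the specific parameters tuned, the grid size, and the core count are illustrative guesses, not the settings from the post:

```r
# Recipe, XGBoost spec, workflow, and grid tuning with tidymodels
library(tidymodels)

shot_recipe <- recipe(scoring_play ~ ., data = shot_train) |>
  step_novel(all_nominal_predictors()) |>   # handle factor levels unseen in training
  step_dummy(all_nominal_predictors())      # indicator columns for categorical predictors

xgb_spec <- boost_tree(
  trees = 500,
  tree_depth = tune(),
  learn_rate = tune(),
  min_n = tune()
) |>
  set_engine("xgboost") |>
  set_mode("classification")

xgb_wf <- workflow() |>
  add_recipe(shot_recipe) |>
  add_model(xgb_spec)

doParallel::registerDoParallel(cores = 4)   # use multiple cores for tuning

xgb_res <- tune_grid(
  xgb_wf,
  resamples = shot_folds,
  grid      = 20,
  metrics   = metric_set(roc_auc, accuracy)
)
```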
And one of the fundamental ways to do that is looking at the metric called area under the curve, or AUC. You can see in the scatter plots in the post that there is a concentration around an AUC of 0.75 to 0.8 across the different model configurations displayed there, as well as a variable importance plot, which I often make in my own classification analyses. And this is where things get pretty interesting: the most important variable by far in this analysis was the opponent guarding Steph at the time of the shot, as indicated by an opponent-athlete or no-direct-opponent indicator.
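A common way to get those pieces out of a tuned tidymodels object, continuing from the sketch above, is something like the following; it's a standard pattern rather than necessarily the exact code in the post, and it uses the vip package for the importance plot:

```r
# Finalize the workflow with the best-AUC parameters, fit on the split,
# and inspect test metrics plus variable importance
library(tidymodels)
library(vip)

best_params <- select_best(xgb_res, metric = "roc_auc")

final_fit <- finalize_workflow(xgb_wf, best_params) |>
  last_fit(shot_split)

collect_metrics(final_fit)        # accuracy and roc_auc on the held-out test set

final_fit |>
  extract_fit_parsnip() |>
  vip(num_features = 10)          # which predictors mattered most
```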
So, obviously, if there's no opponent at the time he's shooting, Steph's gonna have a higher likelihood of making that shot, as compared to somebody with their arms right in his face as he's shooting. The other important variables were the location of the shot, which again would make sense, because, typically speaking, for most people the farther the shot, the harder it is to make. Although Steph seems to have made an art of making shots from near half court look like they were free throws or something. But then there are other variables as well that all have a similar concentration at the lower end of variable importance.
And then, last, a couple of additional diagnostic plots that Nils displays here: one is looking at the specificity and sensitivity with the area under the curve, which shows kind of a similar story, with an AUC of around 0.78 and an accuracy of 0.73, and then a confusion matrix as well, looking at how often false positives and false negatives occurred in the predictions. An interesting insight from this confusion matrix is that the model fit here seems to perform better at predicting shots that aren't going in, whereas it's not as clear a story for predicting the shots that do go in.
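That kind of confusion matrix comes almost for free from the held-out predictions; a minimal sketch with yardstick, continuing from the hypothetical final_fit above:

```r
# Confusion matrix on the test-set predictions
library(tidymodels)

collect_predictions(final_fit) |>
  conf_mat(truth = scoring_play, estimate = .pred_class)
```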
Now, that was using, and I should have mentioned this earlier, data from 2022 and 2023. But you might say, how does this hold up with today's data, i.e. the 2024 NBA season? So Nils grabs that data, again from the hoopR package, and fits the model on that particular set of data. And in there we see, via again a nice little shot chart and heat map combination, that Steph Curry is indeed more likely to score when he's near the basket. That seems pretty intuitive. And then it's interesting how the model predicts shots from the corners of the court as opposed to other parts of the court, and he also does a nice little shot chart with color codings of what were false positives and false negatives.
And the false negatives are mostly occurring on the 3-pointers. That's interesting. And then most of the false positives are around the backboard or near the basket itself. So in terms of accuracy, it looks like this model is around that 75% accuracy mark. Again, not gangbusters great here, but at the same time it's not terrible either. It's pretty neat. So this was a very comprehensive post. Again, I don't think I could do it justice in this recap, but there are lots of techniques at play here. First, a very novel use of existing packages to get this data yourself without having to become a wizard at data scraping.
And then also the tidymodels ecosystem, along with the tidyverse itself for data wrangling, to fit this prediction. But as you can imagine, there are a lot of future directions this could take, such as using additional data points on the players themselves, maybe, you know, better ways of training the data, and using feature engineering as well, which is a very popular technique, especially if you're new to this type of exercise or this type of data. Principal component analysis can help reduce the dimension quite a bit and maybe give you some better prediction performance there. But this blog post is absolutely fantastic. Again, I'm really intrigued by the visuals here. I love that interactive exploration of the data in the middle of the post. I would love to see the code that did that; I can guess how it was done, but I'd love to see how Nils pulled it off. It's a really great read, and I definitely invite you to check it out, especially if you're, you know, keen to see just what is possible with sports analytics and with the tools that we have available in the R ecosystem.
There's a whole lot more we could talk about in this issue, but I'm gonna take time to put the spotlight on an additional find before we wrap things up. As I'm getting really knee-deep back into Shiny development recently, I just put in an initial release of a very fresh-off-the-presses app at the day job, and it's been getting great reviews. I've been putting a lot of attention to detail into performance where I can, albeit I've only scratched the surface of it. I've just been trying to optimize my use of reactive objects appropriately, making sure there are indicators when things do take a little bit of time, and trying to do a lot of upfront work so that the experience is quite seamless for the user.
Well, there are other ways of optimizing the performance of your Shiny application, and one of them is called caching, so I'm gonna put a link in the show notes. This additional find is from the Appsilon blog, on how you can optimize your Shiny application performance with advanced caching techniques, where there are different types of caches you can use. The post talks about ways you can do this within base Shiny itself, with reactives and in-memory caching via bindCache(), which I have not done as much as I probably should, but then also tapping into the ecosystem for even more powerful ways of caching, such as using a Redis back end with the redux package as the front end, as well as session-specific caching and using databases.
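As a tiny taste of the base-Shiny piece of that, here's a minimal sketch of bindCache() on a slow render; the slow computation is just simulated, and the Appsilon post goes much further into Redis-backed, session-scoped, and database caches:

```r
# Caching an expensive render per input value with bindCache()
library(shiny)

ui <- fluidPage(
  selectInput("team", "Team", choices = c("GSW", "DAL", "MIL")),
  plotOutput("shots")
)

server <- function(input, output, session) {
  output$shots <- renderPlot({
    Sys.sleep(2)  # stand-in for an expensive query or model call
    plot(rnorm(100), main = paste("Shots for", input$team))
  }) |>
    bindCache(input$team)  # repeated selections of the same team reuse the cached plot
}

shinyApp(ui, server)
```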
So this post gives you a tour de force of how this works, with an example application that shows a lot of these principles in action. Another terrific blog post by the fine folks at Appsilon; definitely worth a read if you're trying to take your performance to another level. And, of course, there is a lot more to this issue, and we invite you to check it out. Where can you find it? It's at rweekly.org. That's where you find all the great content, including this week's current issue, a healthy mix that Jon has put together: new packages, great tutorials, updated packages, and nice little quotes of the week, especially those powered by other services such as Mastodon.
We are big fans of Mastodon, as you can tell. And also, we have a direct link at the top of the page to the whole back catalog of issues. So as the year is closing out, you'll definitely want to reminisce on a lot of this great content that has been assembled by the fully community-driven effort of R Weekly. It's all about contributors, like we said at the top of the show, right? All of you producing great R content that comes into R Weekly, we thank you for producing all of that. And also our curator team, you know, works around the clock. Obviously we have different rotations, but it is very much a community effort. We are not driven by any financial backing here; we are doing this out of our love for the R community, and so we appreciate all of your well wishes as we keep this going, because it is not easy to keep a project like R Weekly going. There are a lot of moving parts, and I'm just so thankful that we have a wonderful team for this. We could all use your help, and one of the ways you can help is giving us that great content via a pull request for our upcoming issue draft.
You can get to that right at the top of the page with that little Octocat in the upper right corner. It'll take you directly to the next week's draft issue. It's all markdown all the time, so if you find that great blog post, that great tutorial, that great new package, and you want the rest of the R community to hear about it, send us that little pull request. We've got a handy little template for you to follow. It's very straightforward, and we're always happy to help if you have any questions, but that is one of the best ways to give back to the project.
And we are also definitely looking for more curators to join our team. We've had a couple move on or about to move on, so we definitely have spots to fill. If you're passionate about the R community, sharing knowledge, and making sure those in the community can access this knowledge in an open way, driven by the community, that's what R Weekly is all about. So please get in touch with us; you can find all those details on the R Weekly website and the GitHub repository. I also love hearing from you in the community as well. We have lots of ways that you can contact me and my awesome co-host, Mike. One of those is our contact page, directly linked in the show notes of this episode.
You can send us a little note there; I'll get it directly in my inbox and I'll be happy to read it, maybe even on the show as well. Also, you can get in touch with us on social media these days. I am mostly on Mastodon at @rpodcast@podcastindex.social, and now I am on Bluesky as well. I have ventured into that land, and so far I'm liking it. You can find me on Bluesky at @rpodcast.bsky.social; hopefully I said that right, it seems like that's the right way to say it. I've been following a lot of my favorite people on there, as well as new followers. I appreciate you all getting connected with me, and I'm definitely looking forward to putting more content on that service as well.
And I'm also very happy to again remind all of you, especially those in life sciences, that the R/Pharma 2024 recordings are now on the YouTube channel for R/Pharma. I also want to thank our curator Jon Carroll; he was an instrumental part of the Asia-Pacific track that we had for the first time this year. We have all those recordings on the YouTube channel. There is just a lot of wonderful content there, really worth a watch even if you're not in life sciences; there's a little something for everybody. And with that, I'm going to close up episode 188 of R Weekly Highlights, and we will be back with another edition of R Weekly Highlights next week.
Hello, friends. We are back with episode a 188 of the R weekly highlights podcast. This is the weekly show where we talk about the excellent highlights that are shared in this week's our weekly issue, all available at ourweekly.org. My name is Eric Nantz, and I'm delighted you join us from wherever you are around the world. And I'm actually fresh off of a very short, yet very fun vacation during the Thanksgiving holiday break here in the United States where I was able to go back to my old, hometown in Michigan. And, of course, I go there and sure enough, Lake Effect snow decides to pay a visit as well while I'm up there to remind me of all those times I had to travel through pretty ugly snow, especially in graduate school. But, nonetheless, we had a great time. Like, one of my highlights is actually taking the kids to a local hockey hockey game up in the in the city. They had a lot of fun with that, and our team won, so that made it even better.
Nonetheless, I'm back here at the, virtual studio here if you wanna call it that. And I am flying solo today as my co host, Mike Thomas, is out on business. But, nonetheless, I'll hold the fort down as best I can, and we have a terrific issue to share with all of you today. This week's issue was curated by another longtime contributor and curator to Art Weekly, Jonathan Caroll, who apparently never never stops to work on both dev tasks and curating issues. But nonetheless, he had tremendous help from our fellow our weekly team members and contributors like all of you around the world with your poll requests and other suggestions.
As I said, we I just finished celebrating here in the United States, the Thanksgiving week last week, which means that there's always a time, especially this in in the December month, to start reflecting a bit on the year and especially, you know, showing some gratitude and giving some thanks to those that have been helping me throughout my professional and personal life. And alongside that, we have, of course, in the world of open source, many many different ways people can contribute to a given project but also various ways of acknowledging them as well. However, that's a nice little shout out on social media, maybe a great blog post of acknowledgments and and etcetera.
But there are other ways to acknowledge these great contributions as well. We're going to talk about a very innovative way of doing this, just that, within the R ecosystem. This comes to us from the rOpenSci blog and has been authored by Mark Padgham, who is a research software scientist over at rOpenSci, where he puts a spotlight on a package he's created to help you in this process of, in a more automated way, giving thanks to contributors to a given package. And this has been motivated by a service that is called all contributors.org, which kind of gives a both a automated bot as well as kind of a general guidance to how as a given, say, a software project can acknowledge the different contributors to their particular project.
And they do this through what happens to be, you know, commit messages in the git repository for that project by having special, you might say, kind of tags used in the commit message. In fact, they look kind of similar to an r oxygen tag with the at sign in front of it. And this bot will actually parse the commit messages and start to put them in a more streamlined format in, say, a project's readme or other documentation. Now this is great and it looks like a lot of projects do use this service, but Mark highlights a couple of disadvantages to this particular workflow, especially in the realm of data science and in the realm of R Packages.
And as I said, one of the mechanisms that this works with is putting those contribution type tags in a commit message. I don't know about you. Sometimes it's easy to forget these things when you're committing and you're all in the dev mode, so to speak. In fact, it reminds me, slight tangent here, I've been really trying to opt in to the world of conventional commits, where I kind of give a prefix to the commit message about the type of commit that's all about, whether it's a new feature, a bug report, etcetera. And I've been wanting to do this for years, but it's only until a recent project that I really literally forced myself to do it. And it hasn't been easy folks. It does take a lot of discipline to do that. So this could be viewed in a similar light.
And then the other disadvantage that Mark highlights here is that these acknowledgments of these contributors are packed in a single contributor section. And there isn't really a neat way to customize that appearance. It's kind of what you see is what you get from that bot itself. And also those contributors are listed in that section. It doesn't actually link out to what contributions they actually did. So that brings Mark to introducing us to the all contributors R package, which as you might guess is also part of the ROpenSci umbrella. But this is meant to give a package author a very nice and semi automated way to help acknowledge contributors to their particular project.
And this is one caveat that Mark mentions here in the post is that this is acknowledging contributions that are through the git log of a repository or through GitHub interactions. That means that other types of contributions that are listed in that that other service we just mentioned may not be picked up here such as like organizational contributions, you know, organizing documents or organizing a community around a project. You know, those kind of things will not be tracked here with the all contributors package. So if you are looking to acknowledge a more broad spectrum of contributions you still might, you know, be recommended to go to that, all contributors.orgserver spot instead.
Nonetheless, if you are content with acknowledging the code based contributions, this package really gives you one function to get this all going and that is the addContributors function. It does have a lot of parameters that you can customize to your needs, but the goal of this is just running this function at the root of your projects repository locally. It will automatically add or update your list of contributors that you can put in to, say, a readme and whatnot. The other way Mark recommends to leverage is instead of us running that function from time to time to get this update. He includes a way to copy a template GitHub actions workflow, which would help you for when you commit this workflow file for the first time that in every push to your repository, it's gonna run the same all contributors function.
And you can also define this regular timed intervals as well so that no matter what is being contributed to, as you make that push, whether you merge in pull requests from every contributors, maybe they solved issues or whatnot, you will get that into this list automatically through that GitHub Action. The other nice benefit of this package as well is that there is going to be a direct link for each contributor to their contribution itself. He gives an example that we'll put in the show notes of, somewhat ironically enough, the read me of the all contributors package. Where if you look at this as I'm talking through this or after listening to this, you'll see at the bottom there is a Contributor section where it's got a nice little introductory sentence that this was automated by the All Contributors package.
And then it's got a, basically the avatars of each contributor, but it's broken down by different categories. And in the case of this package there's a category for code contributions as well as issues contributions. And under their avatar is a hyperlink that has their GitHub user ID, ID. But when you click that link it's going to basically, for this particular packages repository, show you the commits where that particular contributor was directly involved. I even, just as I'm recording, just clicked on the link under the avatar of my Elle Salmond, who, of course, has been a very frequent contributor to highlights in the past. And sure enough, it took me right to the 4 commits that she was involved with with this package. That is really really neat.
You can also do the same thing with the other contribution types as well, such as the issue contribution types. So clicking on, say, an avatar or link under the avatar of a contributor on the issues section, you will see take you'll be taken directly to a filter view of the issue tracker in GitHub where that that issue was authored by that particular user. So that is terrific. That is absolutely terrific. So and and this is a wonderful time to think about a package like this because for getting involved in open source, it can sometimes other than hearing from, say, the maintainer of a project themselves from time to time, it can sometimes feel like you're sending that to the void a little bit. Yeah. You may be scratching your own inch potentially, but admittedly it is a nice pick me up, especially mentally, where in open source it's rarely compensated with, say, financial compensation or whatnot or other things to compensate for the time spent for these contributions.
Every little bit of kind of nice, you know, kudos or pick me ups or whatever you want to call it, It goes a long way, especially for those that are new to the world of open source contributions. So I did not know about this package before the highlights so that's why I'm always thankful for our weekly itself to bring this to my attention. And with this new, ShinyStay package I'm making that hopefully will get contributions from others in the R and Shiny communities, I will be very glad to leverage in all contributors to that workflow to be able to make that, you know, an easy way for me to give those proper acknowledgments without, you know, using myself as the excuse for forgetting to do it in a manual way.
So really slick looking package here by Mark. I'm very interested in trying this out. And, definitely, if you want if you've been trying this out in your various package development in the community, feel free to send me a shout out, or contact us. I'll I'll put all that information at the end of the episode. I'd love to hear how you all are using all contributors or other ways you've been sending the kudos and thanks to your contributors to your open source projects. Our next highlight here gets into a very realistic issue that anybody in a data analysis is likely to encounter from time to time, especially when you get away from those cookie cutter textbook examples that you might have had in those statistics or math textbooks back in the day, and you start dealing with real data collected from humans.
And, hence, there are times where in the ideal situation, you would have all the available data for that particular variable or that particular record. Folks, it doesn't happen that way no matter which industry you're in. I can definitely speak in life sciences. It definitely happens, but many of our industries have to deal with how we account for missing observations, missing values in our datasets. And there are a multitude of ways to look at this. And one of the fundamental techniques that are often used, especially when you get to modeling and predictions, is interpolation of missing data.
Going over all the ways of doing this would be entirely another podcast or 2 dedicated to that alone. But our next highlight here is a very fit for purpose fundamental principle that has been available in R for many years. And this comes to us from the latest one of the latest blog posts from Steven Sanderson who has an amazing blog. If you haven't subscribed to it before, it's called Steve's data tips and tricks, and he turns this content out regularly. In fact, when I set up my little self hosted RSS feed on my beefy server here in the basement, I'm using a service called comma feed, and I this is one of the first blogs I put in there because there is always something great to learn from Steven about both R and even some of his recent tutorials on Linux itself, which have been entertaining read.
So in his blog post here, he talks about how to interpolate missing values in R and using at the back end of this a package called zoo. And this brings back a lot of memories to yours truly here from his very early days in the R ecosystem and learning R for the first time in graduate school. I may have shared the story before, but the very first time I saw R was when my one of my graduate school professors in statistics taught us a time series course. And on top of just never seeing R for the first time, I'm trying to get a hang of the language, and many examples had library statements at the top and one of them was indeed library zoo.
So the zoo package has been around, forget this, at the time of this recording for 20 years. Now that is some serious longevity folks. I'd I'd checked the CRAN archives just to be sure and sure enough that is legit. Zoo is tailored for a lot of very powerful functions dealing with time series data, appropriate enough given I saw that in the time series course. There is one function though that is used throughout many analyses and not just in time series analyses that this package surfaces to you and that is called naapprox And this has quite a few parameters to do interpolation, but Steven talks to us about what are some of the use cases for this.
So this na. Approx function, by default, will let you be able to interpolate missing values, numeric values, in a given vector. So in his first example he starts off with a simple vector of a few missing values. And sure enough when he runs na. Approx on that vector those NAs are filled in with missing values. And the way it's in doing the interpolation it's using the surrounding values to do, in essence, an averaging using those values around and then breaking that up as appropriate depending on the observation. So in this very basic example with a vector from 1 to 8 but then missing values were say a 3 or 4 or an 8 should be this na. Approx function is going to indeed put in 3, 4, and 8, taking in account the number of missing values and the boundaries around them.
Now a more realistic example is looking at, in this case, a simple time series type data set, where for a series of 4 dates there are, or 5 dates I should say, there are 2 missing values between the first and the 4th, 4th observation. And with that it's basically taking, you know, 2 approximate values, and they are they are not neatly rounded values. They are, in essence, averages of those boundaries at the different slots. And that, again, can be great for small records, but there may be situations where it's not just like a few missing values here and there sporadically placed.
There could be large gaps of missing data in that given variable. And the na. Approx function gives you an argument called maxGap, which will tell the function don't fill in missing values if there is a continuous sequence of it that exceeds this maximum gap amount. So in the next example, he has a, set of 5 missing values consecutively between a couple of non missing values. And when you feed in the na. Approx function a max gap value 2, guess what? Those missing values are not touched. So that can be very important if there is a very valid reason for that gap and missing data. You want to make sure that you're, you know, tailoring that rule appropriately.
So that gives you just a scratch at the surface of what interpolation can do. But again, in the realm of data science, sure enough, you probably will be tasked with doing an imputation in one way, shape, or form. So the zoo package is giving you one of those fundamental techniques, fundamental solutions to tackle that problem. And I will mention in the show notes of this episode we also, there is a huge domain of how we can deal with missing values in R. And a couple packages come to mind to accomplish this. One of which, if you want to really get into the different ways that this can occur, is the R package I've heard about for quite some time called MICE.
That stands for multivariate imputation by chained equations. That can be really important if you really need to customize the different interpolation methods and take advantage of some really novel statistical methodologies to do this. So I'll have a link to that in the show notes because I see that used quite a bit, especially in my industry of life sciences. There are many many ways you can deal with that here. And another R package that I have had great success with is called the Narnier package. That is a terrific package offered by Nick Tierney.
And this is also a wonderful package to look at missing data, as well as visualizing missing data and exploring the relationships that are as a result of the missing data. So you'll probably want to look at that as well if you're gonna do a more rigorous analysis of this. So I have a link to Nick's, Narnier package in the show notes as well. So I hope, Steven keeps this great content going. It's another short to the point tutorial, but again a very important issue that anybody involved with data analysis is going to have to deal with at some way shape or form. Rounding out our highlights today, there is, of course, an explosion of data available to us in the world of sports these days as both the professional leagues and other endeavors have, you know, exposed a multitude of new ways of collecting data, whether it's sensors on say players jerseys, whether it's custom cameras in the arenas, or other types of advanced tracking.
There is just a wealth of opportunity to leverage some really nice techniques on the prediction side of things, the inference side of things, and whatnot. And so our last highlight today is in the realm of basketball, looking at a novel way of predicting shots for, in this case, one of the most prolific shooters in NBA history, who is no doubt going to the Hall of Fame when he retires. This blog post asks whether we can accurately predict if Steph Curry, the very famous superstar on the Golden State Warriors, a winner of multiple MVPs and multiple championships, is going to make or miss a shot, and what techniques we can use to do that.
This blog post comes to us from Nils Indetrin, whose name I hope I said right. There is a wealth of knowledge in this post, and I probably won't be able to do it justice in this recap, but I'm going to give you my key takeaways. The project was apparently spurred on by a conversation Nils had with a friend, where the question was whether we could predict if Steph Curry would make a shot, given that he's attempted it. So what kind of data can we use to explore this in a data-driven way? Well, first, it's a little bit of EDA action, which again is a very important technique as you're getting to know what's available.
And there is a wonderful R package that makes this very possible in the realm of NBA and basketball analytics as a whole, and that package is called hoopR. We'll have a link to that in the show notes as well. It's authored by Saiem Gilani, and hopefully I said that right. This package has a wealth of functions for assembling data collected for the NBA, and college basketball as well, using online sources such as ESPN's NBA statistics among many others. So in the first part of this blog post, Nils shares some of the code he used from the hoopR package to load 3 different perspectives, or granularities if you will, of the data going into these predictions.
One of which is the team statistics after each game. This is often called the box score data that you get on a website or, in the old days, in a newspaper. I still remember reading the sports section of the local newspaper in a bygone era, you might say, but the hoopR package has a very simple function to load the team box scores given a range of years. You can also get the player-specific box score data with another function called `load_nba_player_box()`. And last, but certainly not least, because we're looking at the very granular level of whether Steph Curry is going to make a shot or not, that screams play-by-play data. And, yes, hoopR has a function to load that as well.
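Here is roughly what pulling those 3 levels of data looks like with hoopR's loader functions; the season range is just my own placeholder:

```r
library(hoopR)

team_box   <- load_nba_team_box(seasons = 2022:2023)    # team box scores per game
player_box <- load_nba_player_box(seasons = 2022:2023)  # player box scores per game
pbp        <- load_nba_pbp(seasons = 2022:2023)         # play-by-play events
```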
Each of these functions provides a nice data frame, and once he's able to assemble that, there's a little bit of massaging to begin exploring the data, doing some aggregations to get a good feel for the overall performance of the players in the same position as Steph Curry, which is point guard. So there's a nice little tidyverse set of code here to do the filtering and the summarization with group processing. And now we get to something I'm really intrigued by: yes, Nils shows the static output of this tibble, the first five rows of these different point guards, but he wants you, the reader, to be able to explore this in a really neat and interactive way.
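Before we get to that interactive piece, here is a sketch of the kind of grouped summary being described. It's not Nils' code, and the column names are my guesses at the hoopR player box score fields:

```r
library(dplyr)

point_guards <- player_box |>
  filter(athlete_position_name == "Point Guard") |>
  group_by(athlete_display_name, team_name) |>
  summarise(
    games         = n(),
    total_points  = sum(points, na.rm = TRUE),
    total_threes  = sum(three_point_field_goals_made, na.rm = TRUE),
    total_minutes = sum(minutes, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(total_points))
```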
Even though the code snippet for this isn't shown (and I'd really love to see it), I think I know what's going on here. Nils has created an interactive table, probably powered by the {reactable} package, with widgets at the top that let you filter by team, by conference, and by minutes played via a slider. If I have a guess, there's some really innovative crosstalk action happening here to dynamically change the table view in this HTML document. This is why I just love the HTML format: we get these rich experiences of exploring the data right there in the blog post without having to boot up R and run the code ourselves, and that is just spot-on awesome. I love playing with this table. I did some sleuthing and couldn't find the code on Nils' GitHub profile, but Nils, if you're listening, I'd love to see how you made this table. It looks absolutely excellent.
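If my guess is right, the wiring might look something like this: a {crosstalk} SharedData object feeding both the filter widgets and a {reactable} table. The column names are placeholders, not Nils' actual fields:

```r
library(crosstalk)
library(reactable)

# Wrap the summary data so the widgets and the table share one selection state
shared_pg <- SharedData$new(point_guards)

bscols(
  widths = c(3, 9),
  list(
    filter_select("team", "Team", shared_pg, ~team_name),
    filter_slider("minutes", "Minutes played", shared_pg, ~total_minutes)
  ),
  reactable(shared_pg, searchable = TRUE, defaultPageSize = 10)
)
```

In a static HTML document, that combination gives you client-side filtering with no Shiny server behind it, which would explain how seamless the experience feels in the post.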
So, nonetheless, once you explore that scoping, you see that Steph Curry is 4th in some of these metrics, like total points, but there are other metrics besides total points that we can look at in a kind of multi-dimensional view. The next part of the post creates what are called radial charts, basically a polar coordinate view of these different metrics, where each metric sits like a vertex at the edge of the circle and the lines show how high each value is relative to the others. There are 6 radial plots here, and what they show is that in the dimension of 3-pointers made, Steph Curry is tops, but there are other dimensions where players such as Luka Doncic, the superstar on the Dallas Mavericks who went to the NBA Finals last year, stand out.
He's got a lot of, you know, strong metrics as well, as do Damian Lillard and Kyrie Irving. It just shows that there are multiple facets to being a prolific point guard across these metrics. But then, getting back to Steph Curry himself, there is another package, and in fact I should say another package that's powering the visuals in this blog post, and it's called BasketballAnalyzeR. That's available on GitHub and will be linked in the show notes. It gives us both those radial profile charts as well as what I'm about to talk about next: a heat map style shot chart for any given NBA player, where you can color code the shots by whether they were made or missed.
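Here is a rough sketch of the two BasketballAnalyzeR visuals being described. I'm going from the package's documentation rather than the post's code, so treat the data frames (pg_stats, curry_shots) and their columns as placeholders:

```r
library(BasketballAnalyzeR)

# Radial profiles comparing a handful of point guards across several metrics;
# pg_stats is assumed to have one row per player and numeric columns for the metrics
radialprofile(data = pg_stats, title = pg_stats$player, std = TRUE)

# Shot chart for a single player's attempts; curry_shots is assumed to carry
# halfcourt x/y coordinates plus a made/missed result column
shotchart(data = curry_shots, x = "x", y = "y", z = "result", scatter = TRUE)
```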
Instead of just the dots themselves, you can also use a gradient, a more traditional heat map, where you see the brighter colors around the 3-point line as well as near the basket, where he does those fantastic layups after he, you know, tries to break a defender's ankles with his dekes and whatnot. So that gives a profile of where the shot volume is coming from for Steph Curry, and that's a great way to help inform the next meat of the post, which is the modeling itself. The first step is figuring out the type of data we want to feed into this prediction, and that's where there is a set of tidyverse code to join these different granular sets of data together to get an overall set that has the right metrics we need.
In essence, the response variable we're going to use for the prediction is called scoring play, which is false if Steph didn't score on that play and true if he did. The types of predictors going into this model are the location of where the shot took place, in x and y coordinates, the minutes during the game, and additional variables such as the opponent who was guarding him at the time. Once that data is assembled, and again all of this code is in the blog post, it's time to use the tidymodels ecosystem to split the data into training and test sets, using the very handy functions tidymodels gives you: the initial split, then defining the training and testing sets from that, and then setting up the folds for cross-validation.
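A minimal sketch of that split-and-resample setup, assuming a prepared data frame called shots with a factor outcome scoring_play:

```r
library(tidymodels)

set.seed(2024)
shot_split <- initial_split(shots, prop = 0.8, strata = scoring_play)
shot_train <- training(shot_split)
shot_test  <- testing(shot_split)

# Cross-validation folds on the training data
shot_folds <- vfold_cv(shot_train, v = 5, strata = scoring_play)
```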
Again, these are all fundamental techniques that have been talked about in previous highlights, and you're invited to check out the tidymodels website if you want a primer on how this whole workflow fits together; it is very elegant here in the post. Next, it's time to set up the recipe for preparing the data for the model, and that's where we have to encode all the input variables, especially the categorical ones, making sure they get the right indicators. That's where very handy functions such as `step_novel()` and `step_dummy()` come in, which, with tidy selection, handle the nominal predictor variables without transforming the outcome variable, so you can get that data set neat and tidy for the prediction itself. For the model he is leveraging one of the very popular methods for classification, XGBoost, which is defined with the `boost_tree()` function. You can see the different parameters that he defines there, setting up the engine, setting up the workflow, making sure he's got the grid for searching over those tuning parameters, and then finally taking advantage of multiple cores to tune over that grid. Then it's time to look at the prediction results, such as model performance.
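Here is a hedged sketch of that recipe / model / tuning pipeline. The predictor handling follows what's described above, but the specific parameters and grid size are my own placeholders, not Nils' settings:

```r
library(tidymodels)

# Recipe: handle unseen factor levels, then dummy-code the nominal predictors
shot_rec <- recipe(scoring_play ~ ., data = shot_train) |>
  step_novel(all_nominal_predictors()) |>
  step_dummy(all_nominal_predictors())

# XGBoost classification spec with a few tunable parameters
xgb_spec <- boost_tree(trees = 500, tree_depth = tune(),
                       learn_rate = tune(), min_n = tune()) |>
  set_engine("xgboost") |>
  set_mode("classification")

xgb_wf <- workflow() |>
  add_recipe(shot_rec) |>
  add_model(xgb_spec)

# A space-filling grid over the tunable parameters
xgb_grid <- grid_latin_hypercube(tree_depth(), learn_rate(), min_n(), size = 20)

# Use multiple cores while tuning over the cross-validation folds
doParallel::registerDoParallel()
xgb_res <- tune_grid(xgb_wf, resamples = shot_folds, grid = xgb_grid,
                     metrics = metric_set(roc_auc, accuracy))
```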
One of the fundamental ways to do that is looking at the metric called area under the curve, or AUC. You can see in the scatter plots in the post that there is a concentration around an AUC of 0.75 to 0.8 across the different model configurations displayed there, and then there is a variable importance plot, which I often make in my own classification analyses. This is where things get pretty interesting: the most important variable by far in this analysis was the opponent guarding Steph at the time of the shot, indicated by this opponent athlete or no-direct-opponent indicator.
So, obviously, if there's no opponent at the time he's shooting, Steph's going to have a higher likelihood of making that shot compared to somebody with their arms right in his face as he's shooting. The other important variables were the location of the shot, which again makes sense because, typically speaking, for most people the farther the shot, the harder it is to make. Although Steph seems to have made an art of making shots from near half court look like they were free throws or something. But then there are other variables as well that all have a similar concentration at the lower end of variable importance.
And then last, a couple of additional diagnostic plots that Nils displays: one looks at the specificity and sensitivity with the area under the curve, which tells a similar story with an AUC of around 0.78 and an accuracy of 0.73, and then there is a confusion matrix as well, looking at how often false positives and false negatives occurred in the predictions. An interesting insight from that confusion matrix is that the model fit here seems to perform better at predicting shots that aren't going in, whereas it's not as clear a story for predicting the shots that do go in.
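For that evaluation step, here is a sketch of how those pieces typically come together in tidymodels. Again, this is my own illustration rather than the post's code, and the `.pred_TRUE` column name depends on how the outcome's factor levels are set up:

```r
library(tidymodels)
library(vip)

# Finalize on the best parameters, fit on the training set, evaluate on the test set
best_params <- select_best(xgb_res, metric = "roc_auc")
final_fit   <- last_fit(finalize_workflow(xgb_wf, best_params), shot_split)

collect_metrics(final_fit)                 # test-set roc_auc and accuracy

collect_predictions(final_fit) |>
  roc_curve(scoring_play, .pred_TRUE) |>   # sensitivity vs. specificity
  autoplot()

final_fit |>
  extract_fit_parsnip() |>
  vip(num_features = 10)                   # variable importance plot

collect_predictions(final_fit) |>
  conf_mat(truth = scoring_play, estimate = .pred_class)
```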
Now, I should have mentioned this earlier: that was using data from 2022 and 2023, but you might say, how does this hold up with today's data, i.e. the 2024 NBA season? So Nils grabs that data, again from the hoopR package, and runs the model on that particular set of data. For the model predictions there's a nice little visualization, again that shot chart / heat map combination, showing that Steph Curry is indeed more likely to score when he's near the basket. That seems pretty intuitive. It's also interesting that the model's predictions favor shots from the corners of the court as opposed to other parts of the floor, and he also does a nice little shot chart with color coding of which predictions were false positives and false negatives.
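Here is a sketch of what scoring the new season could look like, with the caveat that `prepare_curry_shots()` is a hypothetical helper standing in for the same wrangling steps used on the earlier seasons:

```r
library(hoopR)
library(tidymodels)

pbp_2024   <- load_nba_pbp(seasons = 2024)
shots_2024 <- prepare_curry_shots(pbp_2024)   # hypothetical helper mirroring the earlier wrangling

fitted_wf  <- extract_workflow(final_fit)     # the trained workflow from last_fit()
preds_2024 <- augment(fitted_wf, new_data = shots_2024)
# preds_2024 now carries .pred_class and class probabilities alongside the shot
# coordinates, ready for shot-chart style plots of hits, misses, and errors
```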
The false negatives are mostly occurring on the 3-pointers, which is interesting, and most of the false positives are around the backboard or near the basket itself. In terms of accuracy, it looks like this model is around that 75% accuracy mark. Again, not gangbusters great, but at the same time it's not terrible either; it's pretty neat. This was a very comprehensive post, and I don't think I can do it justice in this recap, but there are lots of techniques at play. First, a very novel use of existing packages to get this data yourself without having to become a wizard at data scraping.
And then also the tidymodels ecosystem, along with the tidyverse itself for data wrangling, to fit this prediction. As you can imagine, there are a lot of future directions this could take, such as using additional data points on the players themselves, maybe better ways of training the model, and more feature engineering as well, which is a very popular technique, especially if you're new to this type of exercise or this type of data. Principal component analysis can help reduce the dimension quite a bit and maybe give you some better prediction performance there; there's a quick sketch of that idea just below. But this blog post is absolutely fantastic. Again, I'm really intrigued by the visuals here. I love that interactive exploration of the data in the middle of the post, and I would love to see the code that did that. I can guess how it was done, but I'd love to see how Nils pulled it off. It's a really great read, and I definitely invite you to check it out, especially if you're keen on seeing just what is possible with sports analytics and the tools we have available in the R ecosystem.
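If you wanted to try that dimension-reduction idea, a recipes-based sketch (building on the earlier hypothetical recipe) might look like this:

```r
# Normalize the numeric predictors, then keep enough principal components
# to cover ~90% of the variance
shot_rec_pca <- recipe(scoring_play ~ ., data = shot_train) |>
  step_novel(all_nominal_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca(all_numeric_predictors(), threshold = 0.90)
```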
There's a whole lot more we could talk about in this issue, but I'm going to take time to put the spotlight on an additional find before we wrap things up. As I've been getting knee-deep back into Shiny development recently, I just put out an initial release of a fresh-off-the-presses app at the day job, and it's been getting great reviews. I've been paying a lot of attention to performance where I can, albeit I've only scratched the surface of it: trying to optimize my use of reactive objects appropriately, making sure there are indicators when things do take a little bit of time, and trying to do a lot of upfront work so the experience is quite seamless for the user.
Well, there are other ways of optimizing the performance of your Shiny application, and one of them is called caching, so I'm going to put a link in the show notes. This additional find is from the Appsilon blog on how you can optimize your Shiny application performance with advanced caching techniques, and there are different types of cache you can use. The post talks about what you can do within base Shiny itself, with reactives and in-memory caching via `bindCache()`, which I have not done as much as I probably should, and then tapping into the ecosystem for even more powerful ways of caching, such as a Redis backend fronted by the {redux} package, as well as session-specific caching and using databases.
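As a quick taste of the simplest of those ideas, here is a toy example of mine (not from the Appsilon post) combining `bindCache()` with a disk-backed cache via {cachem}:

```r
library(shiny)

# App-wide: persist cached values on disk instead of the default in-memory cache
shinyOptions(cache = cachem::cache_disk("./app_cache"))

ui <- fluidPage(
  sliderInput("n", "Sample size", min = 100, max = 10000, value = 1000),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    Sys.sleep(2)  # stand-in for an expensive computation
    hist(rnorm(input$n), main = paste("n =", input$n))
  }) |>
    bindCache(input$n)  # identical values of n re-use the cached plot
}

shinyApp(ui, server)
```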
So this post gives you a tour de force of how this works, with an example application that shows a lot of these principles in action. Another terrific blog post by the fine folks at Appsilon, definitely worth a read if you're trying to take your performance to another level. And, of course, there is a lot more to this issue, and we invite you to check it out. Where can you find it? It's at rweekly.org. That's where you find all the great content, including this week's current issue, a healthy mix that Jon has put together: new packages, great tutorials, updated packages, and nice little quotes of the week, especially those powered by other services such as Mastodon.
Mike and I are big fans of Mastodon, as you can tell. We also have a direct link at the top of the page to the whole back catalog of issues, so as the year closes out, you'll definitely want to reminisce on a lot of the great content that has been assembled by the fully community-driven effort that is R Weekly. It's all about contributors, as I said at the top of the show, right? All of you producing great R content that comes into R Weekly, we thank you for producing all of that. And our curator team works around the clock; obviously we have different rotations, but it is very much a community effort. We are not driven by any financial backing here. We are doing this out of our love for the R community, and we appreciate all of your well wishes as we keep this going, because it is not easy to keep a project like R Weekly running. There are a lot of moving parts, and I'm just so thankful that we have a wonderful team behind it. We could always use your help, and one of the ways you can help is by giving us that great content via a pull request for our upcoming issue draft.
You can get to that right at the top of the page with that little Octocat in the upper right corner; it'll take you directly to next week's draft issue. It's all Markdown all the time, so if you find that great blog post, that great tutorial, or that great new package, and you want the rest of the R community to hear about it, send us that little pull request. We've got a handy little template for you to follow. It's very straightforward, and we're always happy to help if you have any questions, but that is one of the best ways to give back to the project.
We are also definitely looking for more curators to join our team. We've had a couple move on or about to move on, so we definitely have spots to fill. If you're passionate about the R community, sharing knowledge, and making sure those in the community can access that knowledge in an open way driven by the community, that's what R Weekly is all about, so please get in touch with us. You can find all those details on the R Weekly website and the GitHub repository. I also love hearing from you in the community as well. We have lots of ways that you can contact me and my awesome co-host, Mike. One of those is the contact page, directly linked in the show notes of this episode.
You can send us a little note there; it'll land directly in my inbox and I'll be happy to read it, maybe even on the show as well. You can also get in touch with us on social media these days. I am mostly on Mastodon at @[email protected], and now I am on Bluesky as well. I have ventured into that land and so far I'm liking it. You can find me on Bluesky at @rpodcast.bsky.social; hopefully I said that right, it seems like that's the right way to say it. I've been following a lot of my favorite people on there, along with new followers. I appreciate you all getting connected with me, and I'm definitely looking forward to putting more content on that service as well.
And I'm very happy to again remind all of you, especially those in life sciences, that the R/Pharma 2024 recordings are now on the R/Pharma YouTube channel. I also want to thank our curator Jon Carroll, who was an instrumental part of the Asia Pacific track that we had for the first time this year. All of those recordings are on the YouTube channel, and there is just a lot of wonderful content there, really worth a watch even if you're not in life sciences; there's a little something for everybody. And with that, I'm going to close up episode 188 of R Weekly Highlights, and we will be back with another edition of R Weekly Highlights next week.