A major achievement unlocked! In episode 100 of R Weekly Highlights: The new {rtoot} package for collecting and analyzing Mastodon data, using the {unheadr} package to fix broken and irregular column headers, a tour of the apply functions in base R, and creating posters of NBA rosters with R and ImageMagick.
Plus a big announcement on a new way to directly support the show!
Episode Links
- This week's curator: Ryo Nakagawara - @R_by_Ryo (Twitter) & @[email protected] (Mastodon)
- {rtoot}: Collecting and analyzing Mastodon data!
- Fixing broken and irregular column headers
- Let's Get Apply'ing
- NBA Posters
- Entire issue available at rweekly.org/2022-W46
Supplemental Resources
- Everything I know about Mastodon: https://blog.djnavarro.net/posts/2022-11-03_what-i-know-about-mastodon/
- Fedi.tips (an unofficial guide to Mastodon and the Fediverse): https://fedi.tips
- R Weekly is now on Mastodon! @[email protected]
- Favorite/most important base R functions new users should know: https://twitter.com/newmeyermn/status/1591464874827460608
Supporting the Show
- New Podcast Apps: https://podcastindex.org/apps?appTypes=app&elements=Value
- A new way to think about value: https://value4value.info/
Hello, friends. This is episode 100. Yes, you heard that right. 100 of the R Weekly
Highlights podcast. It is amazing that we got here and I'm feeling very happy to share
that with all of you around the world. And of course, this episode 100, I'm not going
to mince words here, would not have been possible without my supremely awesome co-host and joining
me many episodes along the way. Mike Thomas. Mike, how are you doing today, my friend?
I'm doing great. We had last week off because of the R/Pharma conference. So it's nice to
have a little bye week, get some rest and come back strong for episode 100 today. I don't
know if I've really tallied how many of the 100 I've been around for, but it's been fun.
So I appreciate you having me on for however many episodes it's been.
Yes. Well, it's been a true pleasure. And yeah, regarding last week, it turns out that
when you're involved in a very time-consuming conference, albeit in a good way, yeah,
it was hard to squeeze in another episode, but we're here now. So hopefully we can be more
regular from this point forward. But again, yeah, really happy that we made this milestone
and we're going to have fun along the way like we always do because we got a really
jam-packed issue to talk about. And in fact, there are four, count them, four highlights
for episode 100. So that tells you how supersized it is. And then towards the end of the show,
I'm going to have a pretty big announcement about how all of you can even support the
show in a really fun way. But let's get to the meat of it, shall we? And our curator
this week is Ryo Nakagawara, another longtime contributor to the R Weekly project.
And of course, he had tremendous help as always from our fellow R Weekly team members and
contributors like you all around the world. So I couldn't have scripted, frankly, a more
fitting first highlight for this episode, as we're going to dive into a really powerful
new theme that I think brings community right to the forefront of a lot of the avenues of
technology that we're working with today. So many of these popular online services in
say social media, the DevOps sector and other parts of tech are brought to us by a single
entity or a single company. And certainly that can work really well and be very convenient
for us. But it's not always a bright outlook, especially when a huge change in ownership
or future direction takes place for a certain company. Now, I won't beat around the bush
any longer. If you've been keeping up on the latest tech news recently, you've heard that
Elon Musk has acquired Twitter after a rather long saga, which had its own twists and turns
that we definitely don't have time to get into today. But the effects of this event
have been pretty widespread. And for a myriad of reasons, many in the R community and
other communities have been on the lookout for a new social media messaging platform.
Well, to put our wayback machine in motion here, in early 2016, a recent college graduate
named Eugen Rochko took advantage of a period between graduation and starting an actual
job to bring to fruition his own take on what he felt Twitter should be: a decentralized,
completely open source microblogging platform called Mastodon, which is not governed by
a single company. It is a federation, if you will, across multiple members of a worldwide
community. We'll get to what all that means in a little bit. But there are a few key differences
to be aware of when you're going from the Twitter mindset to how Mastodon works. And
honestly, a terrific summary of this has been written by R Weekly contributor Danielle
Navarro. And we'll put a link to her great blog post in the episode show notes today.
And you know what else Mastodon has? An API, of course. And in almost the blink of an eye,
if you will, David Schoch, the team lead for Transparent Social Analytics at GESIS in Germany,
has authored an R package, appropriately named {rtoot}, for all of us to interact with Mastodon
directly in R. And I've got to think the inspiration for that name probably came from another highly
acclaimed package that dealt with the Twitter API called {rtweet}. And so what is {rtoot} all
about? Well, kind of the major things that you might expect out of a client that deals
with this kind of platform. {rtoot} lets you easily grab the various toots, which are analogous
to tweets on the other platform that you have made, perhaps toots associated with a hashtag
such as RStats, grabbing metadata around users, and getting trends that are seen in the various
servers out there. And again, I'm really impressed by this, because apparently from the idea
of this package to its release on CRAN was about a week. That's crazy to me to get something
like this done in one week. That just shows you that with motivation and the tooling we
have for package development, you can get an idea to the masses, if you will, very quickly.
It still sounds like there are some improvements to be had. But the reason this is
so important now is that, much like how we're seeing this convergence of the R community
to the various Mastodon servers out there, I think the community contributing to a package
like this is going to go a long way toward how we can leverage Mastodon and its services in
a very powerful data-driven way to pave the way for future research or future R developments.
And I'm really excited to see what the future entails. The package is really cleanly written,
with really concise code, and the GitHub repo is out there if you want to contribute on the issues
that have already been identified. And honestly, we on the R Weekly side will be taking
a hard look at this package, because alongside this post, I'm happy to announce that R Weekly
is now officially on Mastodon. We now have an account: @[email protected].
We'll have a link to that in the show notes, but we will be posting on that account each
new issue release and other fun tidbits or news related to the R Weekly project. So it's
a really exciting time for us, and {rtoot} is going to be a very important component of our revised
backend to take advantage of these services to the fullest. So yeah, Mike, when are you
going to get tootin' out with {rtoot}?
When you have an eight month old, you just can't take the word toots seriously, but I'm
going to try to do my best.
It'll be tough, I understand.
But you know, I really appreciated this blog post, and the speed with which this
package hit CRAN and allowed us to get up and running with the Mastodon API is incredible.
One of the ones that piqued my interest from the blog post is a function called
get_timeline_home(), which allows you to download the most recent toots from your own timeline, which
I feel like could be easily spun into a shiny app or something like that, that you could
have, I don't know, running on a Raspberry Pi and a little monitor on your desk that
just sort of shows you the latest statuses of those that you're connected with on Mastodon.
I haven't used a ton of the {rtoot} functionality, but it seems like {rtoot} has some really interesting
functions, you know, which means that Mastodon has a lot of interesting APIs for doing things
that go well beyond just posting a status, like you talked about: getting metadata, checking
out who your connections are, who others' connections are.
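As a rough, hedged sketch of what those calls could look like in practice (the function names come up in the post, but the exact arguments here are assumptions worth verifying against the {rtoot} documentation):

```r
# A minimal sketch of pulling Mastodon data with {rtoot}; argument
# names are from memory and should be checked against the package docs.
library(rtoot)

# One-time interactive setup that stores an access token:
# auth_setup()

# Recent public toots tagged #rstats from a given instance
rstats_toots <- get_timeline_hashtag(
  hashtag  = "rstats",
  instance = "fosstodon.org",
  limit    = 20
)

# The most recent toots from your own home timeline
# (requires a user-level token from auth_setup())
# home <- get_timeline_home(limit = 20)
```

From there, the returned tibble could feed a small Shiny app or a scheduled script, along the lines Mike describes.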
There's some really great getting started with Mastodon blog posts that I've seen come out
in the last few days really for learning how to sign up, log in, post statuses, connect
with others on the platform and more, but I think maybe from a bigger picture color
commentary perspective, you know, for us having a place where the data science community can
come together is really, really important.
I know it's been important to you and me career-wise, and in everything that I've seen, you know,
in terms of blog posts in the community, people are really saying the same thing: that interestingly,
Twitter has really shaped some of our careers, you know, and that's no exaggeration
in the data science community.
Twitter used to be this place, but I think it's going to be interesting how, you know,
the whole server landscape shakes out for the DS community in particular.
I know that there's a few folks out there who are looking at spinning up, I guess, their
own Mastodon servers, so it'll be interesting to, I guess, see how all the chips fall there
and I am yet to sign up, but I think it's going to happen today, Eric.
I think it's, I think I'm too late.
Oh, I don't think there's such thing as too late.
You'd be surprised.
Fun fact: when I formally sent the first toot from the R Weekly account,
the reception was very positive.
Like I couldn't believe it.
I feel like there's a real big inflection point right now.
And again, you hate to see it coming from an event like what's happening as we're hearing
about Twitter having a pretty massive set of layoffs. Obviously, our thoughts
are with anybody that's been affected by that adversely.
I just want to always look at positives out of this as well.
And the fact that now the data science community, you know, the R community, you know, the intersection
of these, can feel like coming to a platform that's not going to be uprooted by a single
person or a single process. Decentralization is a good thing here.
And that's where Mastodon is one of the more quote unquote famous examples of something
called the Fediverse.
It's not just Mastodon, folks; there are many tools that are taking advantage of this technology
that's underpinning ways of decentralizing a lot of what we used to think were somewhat
closed walled gardens, if you will, things like PeerTube and other communication platforms.
It's a really exciting time and even podcasting is getting into this too.
I'll have more to say about that later.
So again, really exciting time and I'm just getting started with it as well.
So I haven't figured everything out yet, but I'm really excited to see where this journey goes.
And I think the positive effects that you just said, Mike, that we've had, you know,
with the Twitter communities on with R and data science, I think we're going to see those
benefits and probably even more so in this new era of our social media communication
with Mastodon.
So I'm excited.
I do have friends that have, you know, said it's sometimes a bit of a difficult transition
trying to figure out how to keep up with the latest.
So there is going to be an adjustment.
It's not a one-to-one replacement, but I think with due time, we're seeing, like you said,
some great resources being written by members of our communities as well as pointers to
resources that are excellent for getting started.
So again, we'll have Danielle's blog post in the show notes along with others that we think
would be really helpful in this journey.
And so like I said, a very exciting time indeed.
Absolutely. And I do have more faith in the decentralization model for this particular
brand of social media than I have for maybe currency at the moment.
Well said.
Well said.
Yes.
We could have a whole nother rant about that.
We're going to transition here to our next highlight where we always have our callbacks
on R Weekly, right, and especially to the episodes we've had previously.
And so it's quite appropriate that in episode 100, we have a callback to a very recent episode
where we saw our friends at the TidyX crew give their take on how to tidy up a somewhat
messy data format and illustrate their approaches to handle that.
Well, if you didn't think that was messy enough, our second highlight brings another tool available
for your importing and cleaning arsenal, especially with messy spreadsheet data.
So this was inspired by a recent R-Ladies Chile meetup featuring a somewhat, you might
say, spooky Excel data import from postdoctoral researcher Luis D. Verde Arregoitia.
He shared a novel use case of his very own R package called {unheadr}.
That's a cool name, isn't it?
To tame the wild issue of extremely bizarre column headers in your data.
So what are we talking about here?
So you could tell this is definitely inspired by some real world situations where the examples
have headers that have a mix of like units of measurement, the variable name, and then
just blank cells somewhere to just delineate spaces between headings and columns.
And this would be extremely difficult to manage without the help of a few packages or in this
case his own package {unheadr} to tame this in a really concise way with basically one
or two function calls to translate that header into something that you can do something with,
so to speak, where instead of having like three or four levels, if you will, now you
have a header that's clean with like the variable name and then the measurement separated by
underscore.
But then you get back to your comfortable tidy syntax that you can deal with.
But again, for any of you in the trenches that are dealing with this data from, like,
raw instruments or collaborators who think Excel and having the fanciest layouts in the
world is such a good thing:
it's not such a good thing for us data scientists, is it?
So {unheadr}, I think, is a really great package to put in your toolbox to deal with these
situations.
And I would definitely have a look at this if I'm in the rather unfortunate position
of dealing with this raw data anytime soon.
But Mike, what did you think about {unheadr} and some of the great utilities that it offers?
I love this package, and this blog post was my first introduction to the {unheadr} package.
So I'm very grateful for the package and that it hit R Weekly this week.
If you're a fan of that satisfying class of videos, like someone power washing away
dirt to get to the shine underneath, then you are going to absolutely love this blog
post in this package.
And Luis has these beautiful visuals in there.
They remind me of something that, like, Allison Horst would put together, showing sort
of the messy data at the start with some annotation on the side and some graphics with a little
dog that's pointing out the issues in the data, and a really, really well-described problem
statement and then what he wants to get to through these visuals.
And not only is the {unheadr} package just a great name, but also some of these functions
within the package, like mash_colnames(), I just love those function names as well.
And it's pretty incredible how little syntax it takes in this package to do quite a bit
if you're someone who has ever had to wrangle these multi-row column headers. Maybe the cells
in Excel were merged and centered and you have column header data sort of all over the place before
you actually hit the observations in your dataset, and the functions that he has written in this package sort
of concatenate these column headers from different rows together and identify where the white
space and the null values are to make these clean headers at the end of the day.
It's just really beautiful syntax and code and I think it's incredibly useful.
It probably most of the time is going to be a great complement to the {readxl} package
or the {openxlsx} package if you are ingesting Excel data that has crazy headers this way,
because we're always trying to automate these workflows when we can and part of that automation
is restructuring our column headers and not always just tidying up the actual observations
in the data, unfortunately.
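To make that concrete, here's a small hedged sketch of the kind of cleanup being described, assuming a sheet whose real header is split between the column names and one extra row of units (the argument names are from memory and worth checking against the {unheadr} documentation):

```r
# A minimal sketch of folding an extra header row into the column names
# with {unheadr}; arguments are assumptions, check the package docs.
library(unheadr)

messy <- data.frame(
  Species = c("(units)", "Fox",  "Hare"),
  Length  = c("(cm)",    "60",   "48"),
  Mass    = c("(g)",     "5800", "3200")
)

# Mash the first data row (the units) up into the column names:
cleaned <- mash_colnames(messy, n_name_rows = 1, keep_names = TRUE)
# Expect names along the lines of Species_(units), Length_(cm), Mass_(g),
# with only the actual observations left in the rows.
```

After a step like this, the data is ready for the usual tidy pipeline of type conversion and renaming.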
So I think this is another tidy-ish tool in your toolbox to have and very grateful for
Luis to have not only put the package together but the blog post as well.
Yeah, this would have been really handy many years ago when I thought I was going to get
some really clean data from a lab vendor giving us some custom biomarker data but no, their
idea of tidy was way different than mine with some of the most cryptic headers known to
humankind and I had to figure out how do I make sense of it for one and then figure out
how to get it all tidied up.
So yeah, if you're in this situation, {unheadr} from Luis is going to be a great asset to
your toolbox as you said.
So yeah, tidying doesn't always have to be a chore.
These packages make it a heck of a lot easier and definitely highly recommended to check
that out.
It's a great transition.
You bet because sometimes it is pretty easy to take for granted that we were able to put
packages like Unheader or others on our particular setups for running R. Maybe you can, like
I said, install new packages like that, maybe you can swap out a compiler to get the most
speed out of your computations and much more.
But you may also find yourself in a situation where you are dealing with a bit of constraints
in your environment, maybe it's from an IT group or whatnot.
Never.
Never.
Oh, never.
Oh, yeah.
Yeah.
You should have heard the pre-show chatter about some of the constraints poor Mike's been dealing with.
Oh, that's bonus content waiting to happen. But you might find yourself in a situation
where the IT admins give you an R install.
So that's good.
But that's about it.
You don't have R studio.
You don't have anything like that.
You've got the flashing prompt at your disposal and you just better make the best of it.
Well, if that resonates with you, or maybe more appropriately, if you find yourself
in that situation unwillingly, then our next highlight kind of brings us to the root of how
important concepts in R, like vectorization and functional programming,
can be achieved in this quote-unquote vanilla install.
And that's where cognitive neuroscientist Athanasia Monica Mowinckel walks us through
a common development process of creating code that loops through certain observations and
performs derivations on particular columns of a data set.
And starting with what many of us probably did when we first learned R, create your loops,
create some variables in the global environment.
But that can have some various pitfalls that can occur, especially when you start to troubleshoot
things.
And so that's where the post transitions to, hey, we got to take a functional approach
to this.
So that's where she starts building custom functions that will seem pretty familiar to
some of the concepts we've been talking about through, frankly, the life cycle of this very
podcast.
Have yourself fit for purpose functions.
You can reuse them in many different ways and it's going to make your environment cleaner
and make debugging a lot easier.
And then where can we fit those in?
That's where Athanasia gives us a tour of how the various apply functions that are built
into your basic installation of R can work, going from the typical apply() to sapply() and mapply(),
which is her favorite.
And honestly, you also get to see some of the little idiosyncrasies or somewhat different
calls that you have to make with parameters as you transition between these.
And while this wasn't necessarily the intent of her post, I also think that the blog post
serves as another use case of why the highly acclaimed {purrr} package exists.
Because with purrr, you do get a consistent API, if you will, for running these map-reduce-like
functions with clear inputs and clear expectations of what the output should be.
Now, again, in her use case, she may be working on a very limited system where purrr simply
isn't available.
So you have to make the best of what you have.
So it's important to have that perspective, especially in these situations.
But it's also showing you that, yes, in base R, you can accomplish all these things.
You've just got to have a little more investment to learn how the functions work and get the
hang of it, hopefully building some examples for your repertoire to have as reference.
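As a base-R-only sketch of the progression being described, here's a small fit-for-purpose function applied across a data frame with the apply family (no packages required; the data is invented for illustration):

```r
# Write a small reusable function rather than looping in the global
# environment, then let the apply family handle the iteration.
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))

dat <- data.frame(a = c(2, 4, 6), b = c(10, 20, 30))

# sapply() loops over the columns and simplifies the result to a matrix;
# lapply() would return a list instead.
scaled <- sapply(dat, scale01)

# mapply() walks several inputs in parallel, one element at a time.
sums <- mapply(function(x, y) x + y, dat$a, dat$b)

scaled
sums  # 12 24 36
```

The same shape of code also translates almost one-to-one to purrr's map() and map2() when that package is available.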
So again, maybe you don't find yourself in this situation routinely, but I have found
myself in it here and there, like when I got a custom VM set up to do some HPC work.
And for whatever reason, the IT group couldn't get me that fancy R installation with thousands
of packages like in our default install.
So I had to make the best of it while I waited for certain things to complete.
So again, great illustration of the concepts.
Like I said, vectorization, functional programming can be hugely valuable no matter which environment
you're in.
So that was quite a reality-check-like post. Mike, what did you think about Athanasia's
post here?
One thing I really loved about Athanasia's blog post is the fact that she actually showcases
some of the errors that she runs into in the console along the way as she works through
the problem from start to finish.
And I think that this is so important.
It's not something that we see in blog posts very often, but I think when you're trying
to teach a new concept to showcase those errors that you ran into along the way, instead of
just saying, hey, here's the right way to do it, I think can really instruct people
a lot better sometimes and can really be a better learning experience for folks who are
trying to follow along with that blog post sometimes because it really provides a vulnerable
look into your thought process that most often will be similar to somebody else's thought
process along the way.
I always believe that it's a great use of time to take a dive into base R once in a while.
I know the tidyverse is amazing.
We know the tidyverse is amazing, but you might be surprised what utility there is in
some base R functions.
If you do spend some time, just check out what there is in the base R installation.
There was a tweet recently that we'll link to in the show notes, and it was like, what
is your favorite base R function?
I can't remember who posted it out there.
Two of mine are the any() and all() functions from base R that allow you to provide a vector
of different logicals to each of those functions, and if one of them is true in the any function,
then it'll return a true, and if one of them is false in the all function, it'll return
a false.
It's very useful for checks in my Shiny apps, particularly, or whenever you're doing any
sort of data engineering code where you're having to use some control flow, some if statements,
the any and the all functions, I find useful all the time.
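A quick illustration of that any()/all() pattern for control flow, for example when validating a set of inputs before running downstream code (the check names here are made up for the example):

```r
# A named logical vector of validation checks (hypothetical names).
checks <- c(
  has_data    = TRUE,
  has_columns = TRUE,
  has_dates   = FALSE
)

any(checks)  # TRUE: at least one check passed
all(checks)  # FALSE: at least one check failed

# A typical guard clause in a Shiny app or pipeline:
if (!all(checks)) {
  message("Failed checks: ", paste(names(checks)[!checks], collapse = ", "))
}
```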
I don't know, Eric, do you have any favorite base R functions?
Oh, yeah, those are always in any app I make.
It's hard not to benefit from those great, simple ways of invoking conditional
logic, and one that I always have in my scripts is finding unique values of variables with
unique().
The unique function is literally in every program I make, because I'm always troubleshooting,
oh, wait, did I get all those treatment group levels, or did I get all those lab values,
or oh, jeez, this data set has 5,000 observations.
I don't want to just skim through that.
Just give me the uniques, baby, and then that's what I get. So unique() is very valuable in
my base R tool set.
unique() and sort().
Oh, yes.
Always in those widgets, yeah, in the choices for the widgets.
Absolutely.
Yeah.
Oh, that's a great call out.
Yeah.
There are tons of gems like that.
setdiff().
setdiff().
Oh, yeah.
Another huge one there, too.
I was just doing that recently to troubleshoot why I was not getting certain observations
and lab data for this FDA submission stuff I'm working on, so that being very quick to
figure out, oh, yeah, because I took it from that data set, and I didn't have that record.
Oh, okay.
Now I got it.
So yeah, lots of these things are quite valuable in your tool set.
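The trio just mentioned in action, with made-up data: unique() to collapse repeated values, sort() to order them for widget choices, and setdiff() to spot which record dropped out of a dataset:

```r
# unique() + sort(): exactly the pattern used for Shiny widget choices.
treatment <- c("Placebo", "Drug A", "Drug A", "Placebo", "Drug B")
choices   <- sort(unique(treatment))
choices  # "Drug A" "Drug B" "Placebo"

# setdiff(): which expected records are missing from what we received?
expected_ids <- c("001", "002", "003")
received_ids <- c("001", "003")
setdiff(expected_ids, received_ids)  # "002": the record that dropped out
```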
Well, Mike, I couldn't have said it better myself.
A lot of that base functionality is hugely important to have knowledge of.
And the other part that I mentioned towards the outset was this idea of how R can really
cleanly interop with other tools in the open source world.
And that's where our last highlight in this supersized edition of R Weekly Highlights comes
into play, because no R-Weekly episode would be complete without a little visit to our
data viz corner with a little API magic to boot.
And so we've marveled in the past at how R itself, or like I said, in combination with other
open source visualization tools, can make something that looks just as professional
as if it came from a fancy high-cost product like Adobe Illustrator, just as an example.
And with his first post produced in that little engine called Quarto, the latest blog post
from Abdoul Issa Bida, who is a full stack developer and an actuary, brings us a comprehensive
guide to creating posters of current NBA player headshots for each team in the league with
both R and some great graphics software in the open source world.
Now first, ain't nobody got time to hard code all those names.
So the first step is to utilize the public-facing ESPN APIs to dynamically grab JSON representations
of both the team and individual player metadata.
That's a pretty clean win there.
And fortunately, he did some research to find where those APIs were, but you get some JSON
out of that.
So of course, that means some wrangling, right?
Data wrangling of JSON: who hasn't done that if you've dealt with APIs in the past?
But Abdul has some really clean and concise code with explanations on how he wrangled
that with the {purrr} package.
There's another callback to extract only the bits needed for his particular project here.
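As a hedged sketch of that wrangling step: parse a JSON payload and pluck out only the needed fields with purrr. The structure and field names below are invented for illustration and are not the actual ESPN schema.

```r
# Parse a small player payload and extract fields with {purrr}.
library(jsonlite)
library(purrr)

# Hypothetical payload standing in for an API response:
payload <- '{
  "athletes": [
    {"fullName": "Player One", "headshot": {"href": "https://example.com/1.png"}},
    {"fullName": "Player Two", "headshot": {"href": "https://example.com/2.png"}}
  ]
}'

parsed <- fromJSON(payload, simplifyVector = FALSE)

# map_chr() walks the list and pulls one field (or a nested path) per element.
player_names <- map_chr(parsed$athletes, "fullName")
headshot_urls <- map_chr(parsed$athletes, c("headshot", "href"))
```

The headshot URLs extracted this way are exactly the inputs a downstream poster-building step would need.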
And one of the major goals of this post was not just to do some wrangling there.
It was also to look at how can we generate the actual poster.
And surely you could do some of this with ggplot2, I'm sure.
And he even mentions that.
But the goal of this was to figure out how can we hook into another tool that's available
in the open source community to do it.
And that's where he hooks into what I call the Swiss army knife of graphics processing
called ImageMagick.
And in particular, ImageMagick contains a handy tool called montage that allows
for a collection of image files to be concatenated together into one, well, montage.
See?
Good names, right?
The results look terrific.
But like any experience of real world data, there can be a couple little edge cases here,
such as dealing with long player names for certain teams and et cetera, figuring out
how to deal with that to print cleanly.
But when you look to the end of this post, it looks terrific.
Really polished visualization.
And in the end, it's a fantastic walkthrough and illustration to show how open source can
stand shoulder to shoulder with those proprietary behemoths in the visualization stack.
So really cool use case of grabbing your data, massaging it a little bit, and then bringing
that interoperability principle in mind with R to bring that novel visualization to fruition.
So Mike, you're going to make any sports posters out of all this?
I think I'm going to have to.
I mean, this is bringing me back to like my college dorm room days with some sports posters
on the wall.
I think I could, instead of going out and buying one for 25 bucks, I think with a little
R, a little ImageMagick, some bash work on the command line, I think I could make one
up myself here.
These posters look absolutely incredible.
I feel like it's an opportunity to say that in an end-to-end project, it's never just blank.
So in this case, it was never just R. We usually say it's never just Shiny, but in this case,
it was never just R. With that ImageMagick command line tool, it's a phenomenal way to
stitch these images together on a very fluid background that looks like, I don't know,
all of these individuals were really standing together against the same background.
It's pretty incredible.
He shows not only the bash commands for that, but also how to execute those bash commands
using the system() functionality in base R. Love
the Quarto blog.
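As a rough sketch of that ImageMagick step driven from R: montage tiles a set of headshots into one poster image, and base R's system() can run it. The file names here are hypothetical, and the flags are standard montage options worth checking against the ImageMagick documentation.

```r
# Build a montage command and run it from R (hypothetical file names).
cmd <- paste(
  "montage headshots/*.png",  # input images to stitch together
  "-tile 5x3",                # grid of 5 columns by 3 rows
  "-geometry +2+2",           # small gap between tiles
  "-background none",         # transparent background
  "poster.png"                # output file
)
# system(cmd)  # requires ImageMagick to be installed and on the PATH
```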
The callouts are maybe my favorite thing in the entire world, and he has a couple really
nice callouts.
I don't know.
They come out so, so well in a blog post.
They really make you stop and take time to read them.
So fantastic post from top to bottom.
I'm not sure if I was familiar with the {rjson} package.
I've been familiar with the {jsonlite} package.
It looks like there's some overlapping functionality there, perhaps.
Yeah, I'm fairly sure {rjson} came before {jsonlite}.
And certainly many, many people have used that in the past too.
So they both do a good job.
Yes, it absolutely looks like it.
And again, he's employing purrr to not only create one poster, but to create multiple
posters over all 30 teams in the NBA to have this one final massive graphic that fits right
on this HTML page in your quarto doc.
So incredible read from top to bottom.
I think it's another great example of sort of accomplishing an entire project within
a blog post from start to finish.
And I think if you're the type who learns really well from use case project based type
learning, this would be a fantastic blog post for you to check out.
Well said.
Yeah.
And another, like I said, great, great example of how you can stitch a lot of these pipelines
together and be able to create something really aesthetically visually pleasing and not have
to shell out a whole boatload of money to that other software.
Not that I don't have opinions about that, but I digress, I digress, but what we can
all agree on is that this is a fantastic issue.
And as I mentioned, we were off last week.
So for my additional little find here, I'm going to give a callback to last week's issue.
And in particular, an amazing blog post from another regular contributor to R Weekly,
Shannon Pileggi, who posted a narrative of her in-depth conversation with
Posit's own Jenny Bryan, affectionately titled Yak Shaving.
You're going to have to read the post to figure out why it's called Yak Shaving.
But at a high level, this is talking about Jenny's guide, or Jenny's principles, for how
she approaches learning a new technical paradigm or a new technical idea, and just the practical
side of how she goes about it.
It's a really entertaining read and also quite insightful, especially as I think about
what's on my docket next year to learn every year a new skill or a new technology or a
new way to orchestrate things together.
I'm definitely going to take some lessons learned from Shannon's post here.
So Mike, what did you want to call out today?
Sure.
No, that's a great pull from last week's issue.
I found one in this week's highlight, which is Julia Silge's post.
I thought it was pretty topical today: how to delete all of your tweets programmatically
in R with the {rtweet} package.
She does have a link to how to download your Twitter archive first, which is a great first
step.
Make sure you read that first paragraph; otherwise, you will not have any of your tweets anymore.
But if you feel like you want to go through the process of downloading your Twitter archive
and then getting rid of your tweets as well as your account, I think you can follow along
with Julia's post to do that.
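A hedged sketch of the kind of workflow described above — the function names here follow rtweet 1.0's documentation rather than Julia's exact code, so treat this as illustrative only (and it is destructive, so back up first):

```r
# CAUTION: destructive sketch, shown only to illustrate the idea.
# Function names follow the rtweet 1.0 docs, not necessarily the post;
# download your Twitter archive before running anything like this.
library(rtweet)
library(purrr)

my_tweets <- get_my_timeline(n = 100)   # your most recent tweets

# post_destroy() deletes a single status by ID; walk() loops over all of them
walk(my_tweets$id_str, post_destroy)
```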
Excellent find.
And I think a lot of people are going to be taking advantage of this in the not-too-distant
future, maybe myself included.
I'm not sure if I'll delete everything, but at the minimum having a backup never, ever
hurts for sure.
Absolutely.
And it looks like Julia has been hard at work, because it looked like there were a few tidymodels
ecosystem packages that had brand new releases this week as well, or very recently,
including parsnip and a couple others that I had seen.
Yeah.
That world never stops spinning, right?
Tidymodels is an exciting place to watch, and I'm always eager to see Julia's blog posts
and as well as her screencasts, which are always entertaining as well.
And I want to thank Julia for her part in the recent endeavor I was embarking on, the R/Pharma
conference, where Julia was a panelist on lessons and ideas for women developing
careers in life sciences and the analytics space.
So she did a great job with that.
So thank you, Julia; we salute you here in audio form.
And yeah, so that's, we're about to wrap up episode 100, but I want to make an announcement
here that I'm really excited to share with all of you about the future direction of the
show.
What's not changing?
Me and Mike coming on here and bantering about our praise of the highlights and our community
in general.
You're always going to get that, but I'm happy to announce a way to make it even easier for
you, the listener, to share a little bit back with us, as they say, through a concept that I've
been reading about quite a bit and am fully invested in, called value for value.
What this really means is that if you get any use out of this podcast, however big or small,
we're going to make it super easy for you to share your value back with us.
Is it a set amount?
No, this is what you choose.
And the easiest way to get started with this is to grab yourself a new podcast app.
And I have a URL just for you to find it, newpodcastapps.com that'll be linked in the
show notes.
And there will be a handy little button in many of these apps that you can download,
whether it's for iOS, Android, or the web.
In particular, I like the ones called Fountain, Podverse, and Castamatic.
Those are just a few of the names; in those apps you can send us what's called a boost to give us a
little encouragement, perhaps, in our endeavors here.
And so I won't say too much more about it, I want to encourage all of you to do your
research and read up on this if you're really interested.
But I do want to mention that I would not be putting my voice or my support behind this
if I didn't really believe in it.
And if you've ever listened to many of my podcasts in the past, whether it's R Weekly
Highlights or the old R-Podcast, which I hope to resurrect again someday, I have never
taken sponsors, ever, because I wanted this to be driven by the community, for the community.
And this endeavor of value for value actually has some great synergies with what we just talked
about at the top, with the Mastodon situation: a decentralized way for you to give value
back to us and for us to share value with you.
So again, have a look at newpodcastapps.com, and certainly reach out to me if you'd like
more details on how all this works.
But I want to thank some good friends in the community that might be listening to this.
Adam Curry, who actually is the originator of podcasting itself, has educated me on this,
and my friends at Jupiter Broadcasting have been using this principle quite a bit
in their endeavors.
So again, check that out and just have a look and let me know what you think.
So with that, how can you reach us?
Well, as I mentioned at the outset, R Weekly itself has a brand new Mastodon account.
We are @rweekly@fosstodon.org.
We'll have a link to that in the notes.
And also you can find me.
I am still on Twitter, as they say, at @theRcast, but I am also on Mastodon at
@rpodcast@podcastindex.social; we'll have a link to that in the show notes.
But Mike, where can people find you?
I will be on Mastodon today, I promise, but I do not know my handle yet, address, or whatever
it's called.
So for now, I'm hanging on on Twitter at @mike_ketchbrook (k-e-t-c-h-b-r-o-o-k),
and we'll hope to have some similar variation to that in my Mastodon handle.
And Eric, I think I speak for the whole community where we certainly do not doubt and very much
appreciate all of the effort you have put in to make this content fully community driven.
So thank you very much for your efforts.
And I think we're excited about what the future holds.
And this episode is brought to you by MongoDB.
So you want to store your structured data in an unstructured way for no reason at all,
just to bring it back structured into a data frame later? MongoDB. It's perfect.
Well there you go.
Yeah, we've got a new direction, don't we?
That wasn't quite in the value-for-value script, but we'll let it slide, we'll let it slide
for now.
But it is an exciting adventure that we're about to embark on in the next batch of episodes;
who knows how long this train will last.
We're excited to have you all along for the journey, to join us for this.
So with that, we're going to close up shop on episode 100.
Again, thank you to everybody around the world who's been listening to all the previous episodes,
when I was fumbling my way through the very beginning and then righted the ship when
Mike came along to join me for the ride.
So we will see you again in a week.
Until then, that's the end of episode 100 of R Weekly Highlights.
And again, we'll be back here next week.
Yes. Well, it's been a true pleasure. And yeah, it turns out, regarding last week,
when you're involved in a very time-consuming conference, albeit in a good way,
it was hard to squeeze in another episode, but we're here now. So hopefully we can be more
regular from this point forward. But again, yeah, really happy that we made this milestone
and we're going to have fun along the way like we always do because we got a really
jam-packed issue to talk about. And in fact, there are four, count them, four highlights
for episode 100. So that tells you how supersized it is. And then towards the end of the show,
I'm going to have a pretty big announcement about how all of you can even support the
show in a really fun way. But let's get to the meat of it, shall we? And our curator
this week is Ryo Nakagawara, another longtime contributor to the R Weekly project.
And of course, he had tremendous help as always from our fellow R Weekly team members and
contributors like you all around the world. So I couldn't have scripted, frankly, a more
fitting first highlight for this episode, as we're going to dive into a really powerful
new theme that I think brings community right to the forefront of a lot of the avenues of
technology that we're working with today. So many of these popular online services in
say social media, the DevOps sector and other parts of tech are brought to us by a single
entity or a single company. And certainly that can work really well and be very convenient
for us. But it's not always a bright outlook, especially when a huge change in ownership
or future direction takes place for a certain company. Now, I won't beat around the bush
any longer. If you've been keeping up on the latest tech news recently, you've heard that
Elon Musk has acquired Twitter after a rather long saga, which had its own twists and turns
that we definitely don't have time to get into today. But the effects of this event
have been pretty widespread. And for a myriad of reasons, many in the R community and
other communities have been on the lookout for kind of a new social media messaging platform.
Well, to put our way back machine in motion here, in early 2016, a recent college graduate
named Eugen Rochko took advantage of a period between graduation and starting an actual
job to bring to fruition his own take on what he felt Twitter should be. A decentralized,
completely open source micro blogging platform called Mastodon, which is not governed by
a single company. It is a federation, if you will, across multiple members of a worldwide
community. We'll get to what all that means in a little bit. But there are a few key differences
to be aware of when you're going from the Twitter mindset to how Mastodon works. And
honestly, a terrific summary of this has been written by our weekly contributor, Danielle
Navarro. And we'll put a link to her great blog post in the episode show notes today.
And you know what else Mastodon has? An API, of course. And in almost the blink of an eye,
if you will, David Schoch, the team lead for Transparent Social Analytics at GESIS in Germany,
has authored an R package, appropriately named rtoot, for all of us to interact with Mastodon
directly in R. And I've got to think the inspiration for that name probably came from another highly
acclaimed package that dealt with the Twitter API, called rtweet. And so what is rtoot all
about? Well, kind of the major things that you might expect out of a client that deals
with this kind of platform. rtoot lets you easily grab the various toots, which is analogous
to tweets on the other platform that you have made, perhaps toots associated with a hashtag
such as RStats, grabbing metadata around users, and getting trends that are seen in the various
servers out there. And again, I'm really impressed by this, because apparently from the idea
of this package to its release on CRAN was about a week. That's crazy to me to get something
like this done in one week. That just shows you that with motivation and the tooling we
have for package development, you can get an idea to the masses, if you will, very quickly.
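The capabilities just described might look roughly like this — a hedged sketch, with function and argument names taken from the rtoot documentation at the time of writing, so double-check against the package help pages:

```r
# A minimal sketch of the rtoot workflow described above.
# Function names follow the rtoot docs; verify before relying on them.
library(rtoot)

# toots associated with a hashtag such as #RStats
rstats <- get_timeline_hashtag(hashtag = "rstats", instance = "fosstodon.org")

# metadata around users
accts <- search_accounts("rweekly")

# trends seen on a particular server
trends <- get_instance_trends(instance = "fosstodon.org")
```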
It still sounds like there are some improvements to be had. But the reason this is
so important now is that, much like how we're seeing this convergence of the R community
to the various Mastodon servers out there, I think the community contributing to a package
like this is going to go a long way to how we can leverage Mastodon and its services in
a very powerful data-driven way to pave the way for future research or future developments.
And I'm really excited to see what the future entails. The package is really cleanly written,
really concise code, so head to the GitHub repo if you want to contribute on the issues
that have already been identified. And honestly, well, us on the R Weekly side, we'll be taking
a hard look at this package, because alongside this post, I'm happy to announce that R Weekly
is now officially on Mastodon. We now have an account: @rweekly@fosstodon.org.
We'll have a link to that in the show notes, but we will be posting on that account each
new issue release and other fun tidbits or news related to the R Weekly project. So it's
a really exciting time for us, and rtoot is going to be a very important component of our revised
backend to take advantage of these services to the fullest. So yeah, Mike, when are you
going to get tootin' out with rtoot?
When you have an eight month old, you just can't take the word toots seriously, but I'm
going to try to do my best.
It'll be tough, I understand.
But, you know, I really appreciated this blog post, and the speed with which this
package hit CRAN and allowed us to get up and running with the Mastodon API is incredible.
One of the ones that piqued my interest from the blog post is a function called get_timeline_home(),
which allows you to download the most recent toots from your own timeline, which
I feel like could be easily spun into a shiny app or something like that, that you could
have, I don't know, running on a Raspberry Pi and a little monitor on your desk that
just sort of shows you the latest statuses of those that you're connected with on Mastodon.
I haven't used a ton of the rtoot functionality, but it seems like rtoot has some really interesting
functions, you know, which means that Mastodon has a lot of interesting APIs for doing things
that go well beyond just posting a status, like you talked about getting metadata, checking
out who your connections are, who others connections are.
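Mike's home-timeline idea could be sketched like this — hedged, since rtoot requires a one-time authentication step and the argument and column names may shift between versions:

```r
# Sketch of pulling your own home timeline with rtoot.
# Assumes you have authenticated once via auth_setup();
# argument and column names follow the rtoot docs and may change.
library(rtoot)

home <- get_timeline_home(limit = 20)

# a tiny "latest statuses" display, e.g. for a monitor on your desk
print(home[, c("created_at", "content")])
```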
There's some really great getting started with Mastodon blogs that I've seen come out
in the last few days really for learning how to sign up, log in, post statuses, connect
with others on the platform and more, but I think maybe from a bigger picture color
commentary perspective, you know, for us having a place where the data science community can
come together is really, really important.
I know it's been important to me and you career-wise, and in everything that I've seen, you know,
in terms of blog posts in the community, people are really saying the same thing: interestingly,
Twitter has really shaped some of our careers, you know, and that's really not an overstatement
in the data science community.
Twitter used to be this place, but I think it's going to be interesting how, you know,
the whole server landscape shakes out for the DS community in particular.
I know that there's a few folks out there who are looking at spinning up, I guess, their
own Mastodon servers, so it'll be interesting to, I guess, see how all the chips fall there
and I am yet to sign up, but I think it's going to happen today, Eric.
I think it's, I think I'm too late.
Oh, I don't think there's such thing as too late.
You'd be surprised.
Fun fact: when I formally introduced, or sent the first toot from, the R Weekly account,
the reception was very positive.
Like I couldn't believe it.
I feel like there's a real big inflection point right now.
And again, you hate to see it coming from an event like what's happening, as we're hearing
about Twitter having a pretty massive set of layoffs, and obviously our thoughts
are with anybody that's been affected by that adversely.
I just always want to look for the positives out of this as well.
And the fact that now the data science community, the R community, the intersection
of these, can come to a platform that's not going to be uprooted by a single
person or a single process; decentralization is a good thing here.
And that's where Mastodon is one of the more quote unquote famous examples of something
called the Fediverse.
It's not just Mastodon, folks; there are many tools taking advantage of this technology
that's underpinning ways of decentralizing a lot of what we used to think were somewhat
closed walled gardens, if you will, things like PeerTube and other communication platforms.
It's a really exciting time and even podcasting is getting into this too.
So, more to say about that later.
So again, really exciting time and I'm just getting started with it as well.
So I haven't figured everything out yet, but I'm really excited to see where the journey leads.
And I think the positive effects that you just mentioned, Mike, that we've had, you know,
with the Twitter communities around R and data science, I think we're going to see those
benefits, and probably even more so, in this new era of our social media communication
with Mastodon.
So I'm excited.
I do have friends that have, you know, said it's sometimes a bit of a difficult transition,
trying to figure out how to keep up with the latest.
So there is going to be an adjustment.
It's not a one-to-one replacement, but I think with due time, we're seeing, like you said,
some great resources being written by members of our communities as well as pointers to
resources that are excellent for getting started.
So again, we'll have Danielle's blog post in the show notes along with others that we think
would be really helpful in this journey.
And so like I said, a very exciting time indeed.
Absolutely. And I do have more faith in the decentralization model for this particular
brand of social media than I have for maybe currency at the moment.
Well said.
Well said.
Yes.
We could have a whole nother rant about that.
We're going to transition here to our next highlight, where we always have our callbacks
on R Weekly, right, and especially to the episodes we've had previously.
And so it's quite appropriate that in episode 100, we have a callback to a very recent episode
where we saw our friends at the TidyX crew give their take on how to tidy up a somewhat
messy data format and illustrate their approaches to handle that.
Well, if you didn't think that was messy enough, our second highlight brings another tool available
for your importing and cleaning arsenal, especially with messy spreadsheet data.
So this was inspired by a recent R-Ladies Chile meetup on a somewhat, you might
say, spooky Excel data import, presented by postdoctoral researcher Luis D. Verde Arregoitia.
He shared a novel use case of his very own R package called unheadr.
That's a cool name, isn't it?
To tame the wild issue of extremely bizarre column headers in your data.
So what are we talking about here?
So you could tell this is definitely inspired by some real world situations where the examples
have headers that have a mix of like units of measurement, the variable name, and then
just blank cells somewhere to just delineate spaces between headings and columns.
And this would be extremely difficult to manage without the help of a few packages, or in this
case his own package unheadr, to tame this in a really concise way with basically one
or two function calls to translate that header into something that you can do something with,
so to speak, where instead of having like three or four levels, if you will, now you
have a header that's clean with like the variable name and then the measurement separated by
underscore.
But then you get back to your comfortable tidy syntax that you can deal with.
But again, for any of you in the trenches that are dealing with this data from, like,
raw instruments or collaborators who think Excel and having the fanciest layouts in the
world is such a good thing.
It's not such a good thing for us data scientists, is it?
So unheadr, I think, is a really great package to put in your toolbox to deal with these
situations.
And I would definitely have a look at this if I'm in the rather unfortunate position
of dealing with this raw data anytime soon.
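As a rough illustration of the idea — a hedged toy example whose argument names follow the unheadr documentation at the time of writing, so check the mash_colnames() help page before relying on it:

```r
# Toy example of mashing a multi-row header with unheadr.
# Argument names follow the package docs; this is illustrative only.
library(unheadr)
library(tibble)

# variable names live in the column names, units in the first data row
messy <- tibble(
  length = c("cm", "12", "14"),
  width  = c("cm", "17", "13")
)

# fold that first data row into the header, joined by underscores,
# which should yield clean names like length_cm and width_cm
clean <- mash_colnames(messy, n_name_rows = 1, keep_names = TRUE)
```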
But Mike, what did you think about unheadr and some of the great utilities that it offers?
I love this package, and this blog post was my first introduction to the unheadr package.
So I'm very grateful for the package and that it hit R Weekly this week.
If you're a fan of those like satisfying class of videos, like someone power washing away
dirt to get to the shine underneath, then you are going to absolutely love this blog
post in this package.
And Luis has these beautiful visuals in there.
They remind me of something that, like, Allison Horst would put together, showing sort
of the messy data at the start with some annotation on the side and some graphics with a little
dog that's pointing out the issues in the data, and a really, really well-described problem
statement and then what he wants to get to through these visuals.
And not only is the unheadr package just a great name, but also some of these functions
within the package, like mash_colnames(), I just love those function names as well.
And it's pretty incredible how little syntax it takes in this package to do quite a bit
if you're someone who has ever had to wrangle these multi-row column headers. Maybe the cells
in Excel were merged and centered, and you have column header data all over the place before
you actually hit the observations in your dataset; the functions he has written in this package
concatenate these column headers from different rows together and identify where the white
space and the null values are to make these clean headers at the end of the day.
It's just really beautiful syntax and code and I think it's incredibly useful.
It probably most of the time is going to be a great complement to the readxl
or openxlsx packages if you are ingesting Excel data that has crazy headers this way,
because we're always trying to automate these workflows when we can and part of that automation
is restructuring our column headers and not always just tidying up the actual observations
in the data, unfortunately.
So I think this is another tidy-ish tool to have in your toolbox, and I'm very grateful to
Luis for having not only put the package together but the blog post as well.
Yeah, this would have been really handy many years ago when I thought I was going to get
some really clean data from a lab vendor giving us some custom biomarker data but no, their
idea of tidy was way different than mine with some of the most cryptic headers known to
humankind and I had to figure out how do I make sense of it for one and then figure out
how to get it all tidied up.
So yeah, if you're in this situation, unheadr from Luis is going to be a great asset to
your toolbox, as you said.
So yeah, tidying doesn't always have to be a chore.
These packages make it a heck of a lot easier and definitely highly recommended to check
that out.
It's a great transition.
You bet, because sometimes it is pretty easy to take for granted that we were able to put
packages like unheadr or others on our particular setups for running R. Maybe you can, like
I said, install new packages like that, maybe you can swap out a compiler to get the most
speed out of your computations and much more.
But you may also find yourself in a situation where you are dealing with a bit of constraints
in your environment, maybe it's from an IT group or whatnot.
Never.
Never.
Oh, never.
Oh, yeah.
Yeah.
You should have heard in the pre-show some of the constraints poor Mike's been dealing with.
Oh, that's bonus content waiting to happen. But you might find yourself in a situation
where the IT admins give you an R install.
So that's good.
But that's about it.
You don't have R studio.
You don't have anything like that.
You've got the flashing prompt at your disposal and you just better make the best of it.
Well, if that resonates with you, or maybe more appropriately, if you find yourself
in that situation unwillingly, then our next highlight kind of brings us to the root of how
important concepts in R that you can use anytime, like vectorization and functional programming,
can be achieved in this quote-unquote vanilla install.
And that's where cognitive neuroscientist Athanasia Monica Mowinckel walks us through
a common development process of creating code that loops through certain observations and
performs derivations on particular columns of a data set.
And starting with what many of us probably did when we first learned R, create your loops,
create some variables in the global environment.
But that approach has various pitfalls, especially when you start to troubleshoot
things.
And so that's where the post transitions to, hey, we got to take a functional approach
to this.
So that's where she starts building custom functions that will seem pretty familiar to
some of the concepts we've been talking about through, frankly, the life cycle of this very
podcast.
Have yourself fit for purpose functions.
You can reuse them in many different ways and it's going to make your environment cleaner
and make debugging a lot easier.
And then where can we fit those in?
That's where Athanasia gives us a tour of how the various apply functions that are built
into your basic installation of R can work, going from the typical apply() to sapply() and mapply(),
which is her favorite.
And honestly, you also get to see some of the little idiosyncrasies, or somewhat different
calls that you have to make with parameters, as you transition between these.
And while this wasn't necessarily the intent of her post, I also think that the blog post
serves as another use case of why the highly acclaimed purrr package exists.
Because with purrr, you do get a consistent API, if you will, for running these map-reduce-like
functions with clear inputs and clear expectations of what the output should be.
Now, again, in her use case, she may be on a very limited system where purrr simply
isn't available.
So you have to make the best of what you have.
So it's important to have that perspective, especially in these situations.
But it's also showing you that, yes, in base R, you can accomplish all these things.
You've just got to invest a little more to learn how the functions work and get the
hang of it, hopefully building some examples for your repertoire to have as reference.
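To make that concrete, here is a tiny comparison of the base apply family with their purrr counterparts; the purrr lines are left as comments since the post's premise is a vanilla install:

```r
# Base R's apply family on a vanilla install; no packages needed.
vals <- list(a = 1:3, b = 4:6)

# sapply() applies a function to each element and simplifies the result
sapply(vals, mean)                      # named vector: a = 2, b = 5

# mapply() iterates over multiple arguments in parallel
mapply(function(x, y) x + y, 1:3, 4:6)  # 5 7 9

# when purrr *is* available, the equivalents make output types explicit:
# purrr::map_dbl(vals, mean)
# purrr::map2_dbl(1:3, 4:6, `+`)
```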
So again, maybe you don't find yourself in this situation routinely, but I have found
myself in it here and there, when I got a custom VM setup to do some HPC work.
And for whatever reason, the IT group couldn't get me that fancy R installation with thousands
of packages like in our default install.
So I had to make the best of it while I waited for certain things to complete.
So again, great illustration of the concepts.
Like I said, vectorization, functional programming can be hugely valuable no matter which environment
you're in.
So that was quite a reality-check-like post. Mike, what did you think about Athanasia's
post here?
One thing I really loved about Athanasia's blog post is the fact that she actually showcases
some of the errors that she runs into in the console along the way as she works through
the problem from start to finish.
And I think that this is so important.
It's not something that we see in blog posts very often, but I think when you're trying
to teach a new concept to showcase those errors that you ran into along the way, instead of
just saying, hey, here's the right way to do it, I think can really instruct people
a lot better sometimes and can really be a better learning experience for folks who are
trying to follow along with that blog post sometimes because it really provides a vulnerable
look into your thought process that most often will be similar to somebody else's thought
process along the way.
I always believe that it's a great use of time to take a dive into base R once in a while.
I know the tidyverse is amazing.
We know the tidyverse is amazing, but you might be surprised what utility there is in
some base R functions.
If you do spend some time, just check out what there is in the base R installation.
There was a tweet recently that we'll link to in the show notes, and it was like, what
is your favorite base R function?
I can't remember who posted it out there.
Two of mine are the any() and all() functions from base R that allow you to provide a vector
of different logicals to each of those functions, and if one of them is TRUE, the any()
function will return TRUE, and if one of them is FALSE, the all() function will return
FALSE.
It's very useful for checks in my Shiny apps particularly, or whenever you're writing any
sort of data engineering code where you're having to use some control flow, some if statements;
the any() and all() functions I find useful all the time.
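A quick illustration of those two:

```r
# any() is TRUE if at least one element is TRUE;
# all() is FALSE if at least one element is FALSE.
checks <- c(TRUE, FALSE, TRUE)
any(checks)   # TRUE
all(checks)   # FALSE

# handy for control flow, e.g. (hypothetical names) in a Shiny app:
# if (all(inputs_valid)) run_model()
```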
I don't know, Eric, do you have any favorite base R functions?
Oh, yeah, those are always in any app I make.
It's hard not to benefit from those great, simple ways of invoking conditional
logic, and one that I always have in my scripts is finding unique values of variables with
unique().
The unique function is literally in every program I make, because I'm always troubleshooting,
oh, wait, did I get all those treatment group levels, or did I get all those lab values,
or oh, jeez, this data set has 5,000 observations.
I don't want to just skim through that.
Just give me the uniques, baby, and then that's what I get. So unique() is very valuable in
my base R tool set.
unique() inside sort()?
Oh, yes.
Always in those widgets, yeah, in the choices for the widgets.
Absolutely.
Yeah.
Oh, that's a great call out.
Yeah.
There are tons of gems like that.
setdiff().
setdiff().
Oh, yeah.
Another huge one there, too.
I was just doing that recently to troubleshoot why I was not getting certain observations
in lab data for this FDA submission stuff I'm working on, so that being very quick to
figure out, oh, yeah, because I took it from that data set and I didn't have that record.
Oh, okay.
Now I got it.
So yeah, lots of these things are quite valuable in your tool set.
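Those last few gems in one place:

```r
# unique(), sort(), and setdiff() for quick data checks
grp <- c("placebo", "active", "placebo", "active", "screen failure")

sort(unique(grp))                     # tidy, de-duplicated choices for a widget
setdiff(c("active", "placebo"), grp)  # character(0): nothing missing
setdiff(grp, c("active", "placebo"))  # "screen failure": the unexpected level
```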
Well, Mike, I couldn't have said it better myself.
A lot of that base functionality is hugely important to have knowledge of.
And the other part that I mentioned towards the outset was this idea of how R can really
cleanly interop with other tools in the open source world.
And that's where our last highlight in this supersized edition of R Weekly Highlights comes
into play, because no R Weekly episode would be complete without a little visit to our
data viz corner, with a little API magic to boot.
And so we've marveled in the past at how, with R itself or, like I said, in combination with other
open source visualization tools, you can make something that looks just as professional
as anything coming from that fancy high-cost product like Adobe Illustrator, just as an example.
And with his first post produced in that little engine called Quarto, the latest blog post
from Abdoul Issa Bida, a full stack developer and data visualization enthusiast, brings us a comprehensive
guide to creating posters of current NBA player headshots for each team in the league with
both R and some great graphics software from the open source world.
Now first, ain't nobody got time to hard code all those names.
So the first step is to utilize the public-facing ESPN APIs to dynamically grab JSON representations
of both the team and individual player metadata.
That's a pretty clean win there.
And fortunately, he did some research to find where those APIs were, but you get some JSON
out of that.
So of course, that means some wrangling, right?
Data wrangling of JSON, who hasn't done that if you've dealt with APIs in the past?
But Abdoul has some really clean and concise code, with explanations, on how he wrangled
that with the purrr package, another callback, to extract only the bits needed for his
particular project here.
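A hedged sketch of that pattern — the endpoint URL and the pluck() path below are illustrative guesses, not taken from the post:

```r
# Sketch: pull JSON from a public API and keep only the needed bits with purrr.
# The URL and the field names are illustrative, not the post's exact code.
library(jsonlite)
library(purrr)

teams <- fromJSON(
  "https://site.api.espn.com/apis/site/v2/sports/basketball/nba/teams",
  simplifyVector = FALSE
)

team_names <- teams |>
  pluck("sports", 1, "leagues", 1, "teams") |>
  map_chr(~ pluck(.x, "team", "displayName"))
```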
And one of the major goals of this post was not just to do some wrangling there.
It was also to look at how can we generate the actual poster.
And surely you could do some of this with ggplot2 too, I'm sure.
And he even mentions that.
But the goal of this was to figure out how can we hook into another tool that's available
in the open source community to do it.
And that's where he hooks into what I call the Swiss Army knife of graphics processing
called ImageMagick.
And in particular, ImageMagick contains a handy utility called montage that allows
for a collection of image files to be concatenated together into one, well, montage.
See?
Good names, right?
The results look terrific.
But like any experience with real-world data, there can be a couple little edge cases here,
such as dealing with long player names for certain teams, et cetera, and figuring out
how to get those to print cleanly.
But when you look to the end of this post, it looks terrific.
Really polished visualization.
And in the end, it's a fantastic walkthrough and illustration to show how open source can
stand shoulder to shoulder with those proprietary behemoths in the visualization stack.
So, a really cool use case of grabbing your data, massaging it a little bit, and then keeping
that interoperability principle in mind with R to bring that novel visualization to fruition.
So Mike, you're going to make any sports posters out of all this?
I think I'm going to have to.
I mean, this is bringing me back to like my college dorm room days with some sports posters
on the wall.
I think, instead of going out and buying one for 25 bucks, with a little R, a little ImageMagick, and some bash work on the command line, I could make one up myself here.
These posters look absolutely incredible.
I feel like it's an opportunity to say that in an end-to-end project, it's never just *blank*. We usually say it's never just Shiny, but in this case, it was never just R. With that ImageMagick command line tool, it's a phenomenal way to
stitch these images together on a very fluid background that looks like, I don't know,
all of these individuals were really standing together against the same background.
It's pretty incredible.
He shows not only the bash commands for that, but also how to execute those bash commands using the system() functionality in base R. Love the Quarto blog.
The call outs maybe are like my favorite thing in the entire world, and he has a couple really
nice call outs.
I don't know.
They come out so, so well in a blog post.
They really make you stop and take time to read them.
So fantastic post from top to bottom.
I'm not sure if I was familiar with the {rjson} package. I've been familiar with the {jsonlite} package. It looks like there's some overlapping functionality there, perhaps.
Yeah, I'm fairly sure {rjson} came before {jsonlite}.
And certainly many, many people have used that in the past too.
So they both do a good job.
Yes, it absolutely looks like it.
And again, he's employing {purrr} to not only create one poster, but to create multiple posters over all 30 teams in the NBA to have this one final massive graphic that fits right on this HTML page in your Quarto doc.
So incredible read from top to bottom.
I think it's another great example of sort of accomplishing an entire project within
a blog post from start to finish.
And I think if you're the type who learns really well from use case project based type
learning, this would be a fantastic blog post for you to check out.
Well said.
Yeah.
And another, like I said, great, great example of how you can stitch a lot of these pipelines
together and be able to create something really aesthetically visually pleasing and not have
to shell out a whole boatload of money to that other software.
Not that I don't have opinions about that, but I digress, I digress, but what we can
all agree on is that this is a fantastic issue.
And as I mentioned, we were off last week.
So for my additional little find here, I'm going to give a callback to last week's issue.
And in particular, an amazing blog post from another regular contributor to R Weekly, Shannon Pileggi, who posted a narrative of her in-depth conversation with Posit's own Jenny Bryan, affectionately titled Yak Shaving.
You're going to have to read the post to figure out why it's called Yak Shaving.
But at a high level, this is talking about Jenny's guide or Jenny's principles, how
she approaches learning a new technical paradigm, a new technical idea, and just the practical
side of how she goes about it.
That's a really entertaining read and also quite insightful, especially as I think about what's on my docket for next year; every year I try to learn a new skill, a new technology, or a new way to orchestrate things together.
I'm definitely going to take some lessons learned from Shannon's post here.
So Mike, what did you want to call out today?
Sure.
No, that's a great pull up from last week's highlight.
I found one in this week's highlight, which is Julia Silge's post.
I thought it was pretty topical today: how to delete all of your tweets programmatically in R with the {rtweet} package.
She does have a link to how to download your Twitter archive first, which is a great first step. Follow that first paragraph; otherwise you will not have any of your tweets anymore.
But if you feel like you want to go through the process of downloading your Twitter archive
and then getting rid of your tweets as well as your account, I think you can follow along
with Julia's post to do that.
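For flavor, a minimal sketch of the idea (not Julia's exact code, and assuming {rtweet} authentication is already configured and your archive is safely backed up) might look like:

```r
library(rtweet)

# Fetch a page of your own recent tweets
# (column names can differ between rtweet versions)
my_tweets <- get_my_timeline(n = 100)

# Delete each tweet by its status id -- this is irreversible,
# so download your Twitter archive first!
for (id in my_tweets$id_str) {
  post_destroy(id)
}
```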
Excellent find.
And I think a lot of people are going to be taking advantage of this in the not-too-distant future, maybe myself included.
I'm not sure if I'll delete everything, but at the minimum having a backup never, ever
hurts for sure.
Absolutely.
And it looks like Julia has been hard at work, because it looked like there were a few tidymodels ecosystem packages that had brand new releases this week, or very recently, including parsnip and a couple others that I had seen.
Yeah.
That world never stops spinning, right?
tidymodels is an exciting place to watch, and I'm always eager to see Julia's blog posts as well as her screencasts, which are always entertaining as well.
And I want to thank Julia for her part in the recent endeavor I was embarking on, the R/Pharma Conference, where Julia was a panelist on lessons and ideas for women developing careers in life sciences and the analytics space.
So she did a great job with that.
So thank you, Julia.
We salute you here in audio form.
And yeah, so that's, we're about to wrap up episode 100, but I want to make an announcement
here that I'm really excited to share with all of you about the future direction of the
show.
What's not changing?
Me and Mike coming on here and bantering about our praise of the highlights and our community
in general.
You're always going to get that, but I'm happy to announce a way to make it even easier for you, the listener, to share a little bit back with us, as they say, through a concept that I've been reading about quite a bit and am fully invested in called value for value.
What this really means is that if you get any use out of this podcast, however big or small,
we're going to make it super easy for you to share your value back with us.
Is it a set amount?
No, this is what you choose.
And the easiest way to get started with this is to grab yourself a new podcast app.
And I have a URL just for you to find it, newpodcastapps.com that'll be linked in the
show notes.
And there will be a handy little button in many of these apps that you could download, whether it's on iOS, Android, or the web.
In particular, I like the ones called Fountain, Podverse, and Castamatic. Those are just a few of the apps where you can send us what's called a boost to give us a little encouragement perhaps in our endeavors here.
And so I won't say too much more about it, I want to encourage all of you to do your
research and read up on this if you're really interested.
But I do want to mention that I would not be putting my voice or my support for this
if I didn't really believe in it.
And if you've ever listened to many of my podcasts in the past, whether it's R Weekly Highlights or the old R-Podcast, which I hope to resurrect again someday, I have never taken sponsors, ever, because I wanted it to be driven by the community, for the community.
And this endeavor of value for value actually has some great synergies with what we just talked about at the top with the Mastodon situation: a decentralized way for you to give value back to us and for us to share value with you.
So again, have a look at newpodcastapps.com and certainly reach out to me if you'd like
more details on how all this works.
But I want to thank some good friends in the community that might be listening to this.
Adam Curry, who actually is the originator of podcasts itself, has educated me on this
as well as my friends at Jupiter Broadcasting have been using this principle quite a bit
in their endeavors.
So again, check that out and just have a look and let me know what you think.
So with that, how can you reach us?
Well, as I mentioned at the outset, R Weekly itself has a brand new Mastodon account. We are at @[email protected]. We'll have a link to that in the notes.
And also you can find me. I am still on Twitter, as they say, at theRcast, but I am also on Mastodon at @[email protected]. We'll have a link to that in the show notes.
But Mike, where can people find you?
I will be on Mastodon today, I promise, but I do not know my handle yet, address or whatever it's called. So for now, I'm hanging on on Twitter at mike underscore ketchbrook, k-e-t-c-h-b-r-o-o-k, and we'll hope to have some similar variation to that in my Mastodon handle.
And Eric, I think I speak for the whole community where we certainly do not doubt and very much
appreciate all of the effort you have put in to make this content fully community driven.
So thank you very much for your efforts.
And I think we're excited about what the future holds.
And this episode is brought to you by MongoDB.
So you want to store your structured data in an unstructured way for no reason at all,
just to bring it back structured into a data frame later, MongoDB, it's perfect.
Well there you go.
Yeah, we've got a new direction, don't we? That wasn't quite in the value for value script, but we'll set it aside for now.
But it is an exciting adventure that we're about to embark in the next batch of episodes,
who knows how long this train will last.
We're excited to have you all along the journey to join us for this.
So with that, we're going to close up shop on episode 100. Again, thank you to everybody around the world who's been listening to all the previous episodes, when I was fumbling my way through the very beginning and then righting the ship when Mike came along to join me for the ride.
So we will see you again in a week.
Until then, that's the end of episode 100 of R Weekly Highlights.
And again, we'll be back here next week.