Over four days in July, data scientists from around the world convened at the French Institute in New York City for the ninth annual New York R Conference and workshops. NYR featured a lineup of experts from a wide array of industries and a community closely connected by the open source ecosystem. It was more than a gathering: it was a celebration of a collective passion for data exploration in R and Python. This shared enthusiasm is why we host the event: to spark interdisciplinary dialogue, build connections, and cultivate shared discoveries. Let’s dive into the highlights from this year’s event…
Watch every talk from the 2023 New York R Conference → Conference Videos
Check out the best photos from the conference → 2023 NYR Photo Gallery
Thank you, speakers!
Pictured above from left to right (Top): Max Kuhn, Caitlin Hudon, Wes McKinney, Ayanthi Gunawardana, Bob Rudis, Molly Huie, Andrew Wallender, Jared P. Lander. (Middle): Emily Riederer, Mitchell O'Hara-Wild, Hamdan Azhar, Jessica Duncan, Emil Hvitfeldt, Saar Golde, Caterina Constantinescu. (Bottom): Matt Dupree, Mike Band, Rick Saporta, Ryan Klein, Chrys Wu, George Perrett, Daniel Chen.
If you multiply the number of speakers (22) by the time allotted to each (20 minutes), you arrive at 440 minutes of insights from expert data scientists across various industries. Each speaker offered a distinct perspective on the latest data science trends, unified by a central theme: the R and Python programming languages were fundamental to their work. We start with day one…
Conference Day 1
The conference commenced with Emil Hvitfeldt, a software engineer at Posit, who showed the audience new ways to build presentations with code. He covered the challenges of constructing slides with CSS and JavaScript and showed how Quarto and reveal.js can improve presentations by optimizing the "effect to effort" ratio (something we can all relate to). A few talks later, Daniel Chen, a postdoc at the University of British Columbia and a contributor at Lander Analytics, echoed the merits of Quarto, making a persuasive case for moving from R Markdown and Jupyter Notebooks to Quarto.
Throughout the day, there was a strong emphasis on LLMs (large language models). Matt Dupree, founder of EXORVA, took attendees beyond ChatGPT, introducing them to OpenAI’s embedding APIs. For anyone interested in using GenAI to sift through large volumes of text for valuable insights, Matt's presentation is a must-watch.
That wasn’t the only GenAI talk of the day. I showed how LLMs can write an entire R package through prompt engineering. This technology's influence on the open source community is hard to overstate: LLMs have substantially lowered the barrier to writing R or Python code, and even to building a complete R package.
A highlight from the first day was a presentation by Mike Band of the NFL’s Next Gen Stats, who is also a contributor to Lander Analytics. Making his fourth appearance at the NY R Conference in five years, Mike discussed the evolution of machine learning on player tracking data in professional football. My collaboration with Mike dates back to our work with the Minnesota Vikings, when we developed a predictive model for the 2015 NFL Draft while Mike was an intern with the team.
Pictured above: Emil Hvitfeldt, Posit
Conference Day 2
Day two kicked off with George Perrett from the PRIISM Center at NYU Steinhardt. He introduced Bayesian Additive Regression Trees (BART), a machine learning algorithm that combines the strengths of boosted trees with Bayesian inference. Unlike most models, which provide only a point estimate, BART models provide an uncertainty interval (a Bayesian credible interval) with each prediction. I deeply appreciate George's contributions to the open source community and his significant work in causal inference.
LLMs returned to the spotlight on day two. Caterina Constantinescu from GlobalLogic presented an insightful overview of the latest trends in GenAI. She examined the challenges and constraints companies face when deciding how best to implement this new technology. Addressing concerns about licensing, privacy, data ownership, and even AI-generated inaccuracies (termed "hallucinations"), Caterina assured the audience that AI isn't poised to "take our jobs"…yet.
Wes McKinney and Max Kuhn, both familiar faces at the New York R Conference, were among the final presenters. Wes, the creator of the pandas Python library and co-creator of Apache Arrow, shared his personal journey through the last 15 years of work in data science. His experiences over those years mirror major breakthroughs in the open source community. Notably, Wes is one of three people, along with Daniel Chen and me, to have spoken at all nine NYR conferences.
Following Wes, Max Kuhn of Posit, author of R's caret package, discussed strategies for improving the outputs of predictive models through model calibration. Max demonstrated mathematical methods for adjusting predictions that are directionally accurate but off in scale, such as predicted probabilities that rank cases well but are systematically too high or too low.
Pictured above: Caterina Constantinescu, GlobalLogic
We closed out the conference with a live taping of the SuperDataScience Podcast
The conference concluded with a live recording of the SuperDataScience Podcast, the third consecutive year the podcast has taped a live episode at the New York R Conference. Host Jon Krohn was joined by Chris Wiggins, Chief Data Scientist at the New York Times and a faculty member at Columbia University. Together, they traced the history of data and statistics from its roots centuries ago to its significance today. They discussed the problems that arise because most data scientists have limited exposure to the humanities, and delved into the contentious history of Bayesian statistics.
Chris’s wealth of knowledge was evident throughout the hour-long episode. For those eager to dig deeper, I recommend his book, co-written with Matthew L. Jones, How Data Happened: A History from the Age of Reason to the Age of Algorithms.
Our in-person and virtual workshops were held July 11 and 12 at Columbia University, thanks to its Department of Statistics. The four two-day interactive workshops were taught by Mitchell O’Hara-Wild, Max Kuhn, Jonah Gabry, Malcolm Barrett and Lucy D'Agostino McGowan, and covered the following topics:
Tidy Time Series and Forecasting in R with Mitchell O’Hara-Wild
Machine Learning in R with Max Kuhn
Bayesian Data Analysis and Stan with Jonah Gabry
Causal Inference in R with Malcolm Barrett and Lucy D'Agostino McGowan
Pictured above: the Tidy Time Series and Forecasting in R workshop with Mitchell O’Hara-Wild
Thank you to everyone who attended, in person and virtually!
Seeing the R community reunite in person was a heartwarming experience. We were also pleased to welcome those unable to make it to NYC. Our offerings ranged from physical books to e-books, and remote attendees joined through a dynamic virtual platform. Meanwhile, delightful food and beverages were available throughout the day for our in-person attendees.
Special thanks to our sponsors!
Our esteemed sponsors greatly enhanced every facet of this conference. Your support is genuinely appreciated! Thank you, Posit, R Consortium, Columbia University, Pearson, Springer, Chapman & Hall/CRC and Manning.
A special shoutout to Nicole and the rest of the Lander Analytics team!
As always, my team excelled in the planning and execution of this year's event. To the outstanding Lander Analytics team: your dedication and effort are always evident. Orchestrating a successful hybrid conference demands meticulous coordination. I want to especially recognize Nicole DelGiudice, the pivotal force behind the event.
Spending the week with everyone who could join us in New York was a joy, and we're glad to have offered a virtual option for those unable to attend in person this year. Be sure to book your tickets for our next conference, the 2023 Government & Public Sector R Conference in October, and use the promo code LANDER20 for a 20% discount.
We hope to see you at our next event!
Jared P. Lander
Chief Data Scientist, Lander Analytics
All photos by Joshua Cork