RCONIS at China Pharma R User Group (RUG) Meeting 2025

The 3rd China Pharmaceutical Industry R User Group (RUG) Meeting was held on March 28, 2025, both on-site and online. On-site meeting locations were Shanghai (Johnson & Johnson campus) and Beijing (Sanofi campus). Daniel was fortunate to be able to attend in person and gave a presentation in Shanghai, and shares his experience and notes here.

RCONIS presentation

I presented “Designing Clinical Trials in R with rpact and crmPack”; the slides are available here and the recording here. Despite being the last presentation of the day (so participants were maybe a bit tired and already looking forward to dinner) and the only presentation in English (all others were in Chinese), the audience was well engaged and I got a few questions. I could even use my broken Chinese a little bit at the end, but that is not captured in the recording 😆. My main objective was to raise awareness of the open source R packages rpact and crmPack. Plus it was a great opportunity to distribute hex stickers of the two package logos, as well as RCONIS stickers!

Presentation notes

Here are a few notes that I took during the meeting. You can also find all the abstracts here.

AI coding assistants for clinical data analysis in R

Steven Brooks and Xiecheng Gu from Boehringer Ingelheim (BI) presented this interesting talk (slides, recording). I was very impressed by Steven giving his presentation in Chinese, for the very first time! (Good that my presentation was not directly after his, otherwise it would have disappointed the audience a lot 😆) They presented 3 different solution paths they tried out: Pandas AI Agent, gptstudio and btw. They started with the Pandas AI Agent, a Python-based AI coding assistant for data analysis that normally produces Python code from a natural language prompt. They added an R translator to it and made it aware of BI’s internal data standards. They continued with gptstudio, an RStudio addin that enables LLM-assisted coding, writing and analysis, and forked a BI version to access their own LLM. They mentioned that because gptstudio does not execute the code it produces, the results contain more bugs than those from the Pandas AI Agent. Finally they presented btw, a very new package from Posit (only 4 months old) that is not yet on CRAN. It shows a lot of promise: it can access the R environment, including data frames, documentation etc., and provide this information to the LLM. Unfortunately, it does not keep the chat history.

How AI is changing the way we work

Jiaqi Song from J&J presented this talk (slides, recording). He started from the idea that the production code would be programmed in SAS, whereas the quality control (QC) code would be written in R and produced with the help of AI. The problem is that this code needs to use specialized packages, such as rtables and tern, and such code is difficult for the AI to generate because there is not a lot of training data available. His idea is to provide example code first, as an attachment to the LLM prompt. One example project was building a SAS to R translator Shiny app. They found that the LLM responded quite well to requests for adapting the UI of the app, after review by the developer. They also tried out reasoning models and found that these can come up with very lengthy explanations for relatively simple code. Another use case was R learning support. The idea is to move from “reading books” (which is great but can take a long time) to “smart reading assistants” (which can be much faster and hopefully condense the key ideas of the book for us). Similarly, instead of “watching whole videos” we can use “video annotations” to shorten the learning process. I found it a cool idea to prompt the LLM with “Create a learning plan for me”, which can then be tailored closely to your learning objectives and existing skill set.
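The idea of providing example code ahead of the actual request can be sketched in a few lines. This is only a minimal Python illustration of how such a prompt might be assembled, not the setup used at J&J: the function `build_prompt`, the instruction wording and the embedded R snippet are all assumptions made for this example, and no LLM is actually called.

```python
# Illustrative R snippet that serves as the reference example in the prompt.
# It is an assumption for this sketch, not code shown in the talk.
EXAMPLE_R_CODE = '''library(rtables)
lyt <- basic_table() |>
  split_cols_by("ARM") |>
  analyze("AGE", afun = mean)
build_table(lyt, DM)'''


def build_prompt(task, examples):
    """Assemble a prompt that shows reference code before stating the task."""
    parts = ["You write R code for clinical trial tables."]
    for number, (title, code) in enumerate(examples, start=1):
        parts.append(f"Example {number}: {title}\n{code}")
    parts.append(f"Task: {task}\nFollow the style of the examples above.")
    return "\n\n".join(parts)


# Hypothetical usage: one reference example, then the real request.
EXAMPLES = [("Mean age by treatment arm", EXAMPLE_R_CODE)]
PROMPT = build_prompt("Create a table of mean BMI by treatment arm.", EXAMPLES)
print(PROMPT)
```

Putting the examples before the task means the model sees the package idioms first, which is the point of the approach when little public training data exists for packages like rtables and tern.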

He shared a few tips at the end:

Build from the bottom up, and piece by piece, otherwise you can run in circles as the LLM is shifting all pieces at the same time
Start a new prompt when you have reached a dead end, and try to rephrase the question
Try something with AI that you think you cannot do without AI
Try to move questions from company internal forums to open source forums (e.g. Stack Overflow) to get more feedback and help from the community, which allows sharing learnings across companies and at the same time creates more training material for LLMs

AI generated ADaM data sets using R code

Shuang Gao from BeiGene presented this talk (slides, recording 1, recording 2). The motivation is to be faster, make fewer errors, have an alternative to SAS, and reallocate resources to more value-adding tasks. The idea here was to provide the SAP and CRFs and then let the AI generate the ADaM datasets from there by producing R code built on the admiral package. They use a multi-agent process with separate steps for specifications, code generation, debugging, validation and review. I found the final slide with the values guiding their work quite interesting: on the one hand, leverage AI to enhance each colleague’s skill set; on the other hand, leverage AI to convert individual experts’ knowledge into a common knowledge base.

easeP21: R tool to ease the Pinnacle 21 review process

Longfei Li and Xing Wang from Sanofi presented this in Beijing (slides, recording). The motivation is that the P21 validation report lists all issues, but without stating in which dataset they occur. Note that there is a free community version as well as an expert version of the P21 tool. The idea is to ease the P21 process by mapping the issues into SDTM- and ADaM-specific summary reports. So easeP21 does not replace P21 but just makes it easier to use. Rules can be specified up front so that issues are categorized automatically later. Error messages that affect multiple datasets can also be split by dataset. They noted that P21 helps with CDISC compliance, but it is just a tool for pharma companies to achieve this. Before running P21, it is good to run another SDTM check tool, e.g. the sdtmchecks package.
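To make the mapping idea more concrete, here is a minimal Python sketch of splitting issues by affected dataset, categorizing them with user-specified rules, and grouping them into SDTM and ADaM summaries. It is not how easeP21 is implemented: the issue records, rule IDs, field names and categories are all hypothetical, and classifying a dataset as ADaM when its name starts with “AD” is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical issues; the field names and rule IDs are illustrative and
# do not reflect the actual Pinnacle 21 report layout.
ISSUES = [
    {"rule_id": "R001", "message": "Missing value", "datasets": "AE, CM"},
    {"rule_id": "R002", "message": "Unexpected variable", "datasets": "ADSL"},
    {"rule_id": "R003", "message": "Inconsistent label", "datasets": "DM, ADAE"},
]

# Hypothetical user-specified rules that map a rule ID to a category.
CATEGORIES = {"R001": "explain in reviewer's guide", "R002": "fix in code"}


def split_by_dataset(issues):
    """Expand each issue into one row per affected dataset."""
    rows = []
    for issue in issues:
        for name in issue["datasets"].split(","):
            rows.append({**issue, "dataset": name.strip()})
    return rows


def summarize(issues, categories, default="to review"):
    """Group issues by standard and dataset, attaching a category to each."""
    report = {"SDTM": defaultdict(list), "ADaM": defaultdict(list)}
    for row in split_by_dataset(issues):
        # Simplifying assumption: dataset names starting with "AD" are ADaM.
        standard = "ADaM" if row["dataset"].upper().startswith("AD") else "SDTM"
        category = categories.get(row["rule_id"], default)
        report[standard][row["dataset"]].append((row["rule_id"], category))
    return {standard: dict(by_dataset) for standard, by_dataset in report.items()}


REPORT = summarize(ISSUES, CATEGORIES)
for standard, by_dataset in REPORT.items():
    for dataset, entries in by_dataset.items():
        print(standard, dataset, entries)
```

Issues without a matching rule fall back to a default category, so nothing is silently dropped from the summary.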

R + VBA to generate Excel sheets with macros

Jundong Ma from Dizal Pharmaceuticals presented this talk (slides, recording). The motivation here is to automate the generation of medical data review listings, which use data from the electronic data capture (EDC) system (so not yet SDTM or ADaM). Dizal Pharma uses Excel sheets that include VBA macros for these listings. They found that openxlsx2 is the best R package for creating Excel sheets, and it supports different code syntax styles as well. You can also call R from VBA. VBA can work with the Windows API and allows customization. Still, I wonder if it would not be better to do this in a templated Shiny app instead. Of course, one advantage is that you don’t need a Shiny server. Which is a good segue to the next talk’s notes!

Vue and webR Integration for Serverless Local Statistical Analysis in a Single HTML File

Kaiping Yang from BeiGene presented this very interesting talk (slides, talk). He combined Vue, a JavaScript front-end framework for responsive user interfaces, with webR, which allows running R in the browser. Both can be included in the HTML page via JavaScript sections. AI can be very helpful for the coding. He mentioned to me later that he did not know any JavaScript beforehand but still managed to quickly create this serverless interactive analysis page. In comparison to Shinylive, you can skip the Shiny part entirely, and the app is therefore more lightweight and more responsive. I also found that there is already a small interface between Vue and R: vueR by Kent Russell, with some interesting discussion about the connections to Shiny here.

Leverage open-source knowledge into statistical validity with validation

Frank Yang from CIMS Global presented this talk (slides, recording), which was great, and not just because it mentioned openstatsware prominently! In particular, he also mentioned the openstatsguide, which we released last year (link). CIMS has built up a workflow for the validation of R packages based on valtools, which also has a cool cheatsheet. Internally they developed several packages, including cimstfl (which is similar to Roche’s TLG catalog), interactive.stats (which seems to be a teal fork) as well as stats2csr to help with CSR creation.

Links

The conference website is here and it contains the program with all abstracts and slides, as well as video recordings of the presentations, and a collection of photos.

Big thanks to the organizers

Joe Zhu, Baoqing Li and Fan Zhang put together a great conference, which is always a lot of work. Running two locations (Shanghai and Beijing) in a hybrid format (on-site and online) is an especially big challenge. Plus they gave us great conference swag bags 🎉 I am sure that the participants appreciated the effort and enjoyed the conference. Thank you very much for the invitation to present! 🙏