Jo Lynne Rokita - CAVATICA

Dr. Jo Lynne Rokita leads the Bioinformatics Translational Pediatric Oncology Team at the Center for Data-Driven Discovery in Biomedicine at the Children’s Hospital of Philadelphia. Her research team broadly studies 1) mechanisms of RNA splicing that promote tumorigenesis and/or create novel targetable splice-derived neoepitopes in pediatric brain tumors and 2) pathogenic germline risk variants in pediatric brain tumor patients and how these variants influence tumor development. She seeks to advance pediatric oncology research and precision medicine through collaboration and development of scalable open-source analytical tools, frameworks, and data resources utilizing Docker, GitHub, AWS, and/or CAVATICA.

You can view some of her work at the following links:

https://pubmed.ncbi.nlm.nih.gov/37492101/
https://d3b-center.github.io/OpenPedCan-manuscript/
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03922-7
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008263
https://academic.oup.com/bioinformatics/article/40/3/btae114/7616989

Please note that the statements below are paraphrased and are not direct quotes from the conversation.

Q: What is the scope of your current research?

A: Our research spans both germline and somatic aspects of pediatric brain tumor studies. On the germline side, we're collaborating with Dr. Sharon Diskin to conduct the first large-scale landscape analysis of germline pathogenic variants in pediatric brain tumors from the Children's Brain Tumor Network (CBTN). We’re also examining how these variants influence somatic mutations. To support this, we've developed several applications in CAVATICA.

Q: Could you tell us more about the app development related to germline pathogenicity?

A: Definitely. While I didn’t cover this in my webinar, we've developed a couple of key apps necessary for post-processing after running the Kids First workflows. These include a pathogenicity pre-processing app, which is now openly available, and a loss of heterozygosity (LOH) pre-processing app. Together, these apps add crucial annotations from databases and tools like ClinVar, InterVar, and AutoPVS1.

We also created our own downstream classification system for pathogenicity using an R package, which we call AutoGVP. This system was published earlier this year. So, essentially, after the Kids First workflow, we built additional tools for a more complete analysis pipeline, allowing users to either work through CAVATICA or directly input their own data.

Q: What about your work on RNA splicing in pediatric brain tumors?

A: We have two main focuses for RNA splicing. The first is the discovery of splice variants that may drive tumorigenesis in pediatric brain tumors. We started with high-grade gliomas, where we identified a key splice event involving an exon in the CLK1 gene that’s frequently spliced in these tumors. This appears to influence the splicing landscape and has oncogenic implications.

The second area is investigating whether these splice variants produce neoepitopes, which could be potential immunotherapy targets. We're analyzing this using the rMATS algorithm in conjunction with data from the Genotype-Tissue Expression (GTEx) project to compare tumor-specific splice variants to normal tissues.

Q: How are you integrating other data sources into this research?

A: In addition to GTEx, we're working with the EVODEVO dataset, which is an evolutionary developmental atlas that includes data from two brain regions and several other body regions. By combining EVODEVO and GTEx, we’re trying to identify tumor-specific neoepitopes that could lead to the development of RNA-based CAR therapies.

These two areas—splice variant discovery and neoepitope identification—are key parts of our ongoing research efforts.

Q: What has been your experience working with the CAVATICA platform?

A: I have to confess, I don’t use CAVATICA much myself—my team handles most of that. But I do work with the engineers, who are fantastic, and I can navigate the platform to find data when I need it. What I’m really interested in is learning more about using tools like RStudio and the Data Cruncher—if that’s still what they’re called! Right now, when we need data, we typically use an API interface to pull and merge it. So, my experience with CAVATICA is limited, but I recognize its great utility.

Children’s National doesn’t really use CAVATICA, which is something I’d like to change. It would be amazing to continue working with a harmonized dataset like the one we use on CAVATICA. Children’s National has a Brain Tumor Institute, and they’re interested in starting a bioinformatics core. Right now, there’s little support for people doing lab work or bioinformatics, so we’re essentially going to build that support structure from scratch. It’s an exciting opportunity, and I’ve already been looking into AWS grants to help get us started.

Q: How do you see your research evolving in the next few years?

A: Now that we’ve set up the Open Pediatric Brain Tumor Atlas (OpenPBTA) and OpenPedCan, we have this rich, multimodal dataset. I hope we can continue our basic research, but the next step is to start translating some of that work. For example, we’re working on identifying neoepitopes, and I’d love to see us develop RNA-based CAR therapies and start lab testing within the next one to two years.

Q: Do you have any advice for others in the field, especially those new to cloud computing or working with multiple datasets?

A: My advice is pretty simple: make your work reproducible. I’m a big proponent of Docker because it makes things easier for everyone. Whether you're using CAVATICA or another platform, Docker will help you get the basics down and allow others to rerun your analyses seamlessly.