Introduction

HotpotBio is the open research group of Hotpot.ai, established to give back to science via biomedical contributions. Due to restrictions on publishing GenAI research, we launched this group to support the community in other ways.

HotpotBio is inspired by open source, where ephemeral groups can drive innovation by attracting talent across organizational boundaries.

Ideally, the world magically cures cancer and creates AI doctors tomorrow, allowing this group to dissolve.

HotpotBio only exists because there are too many open questions.

Problem

Cancer is the second leading cause of death worldwide, claiming the lives of roughly 10 million people per year and devastating the lives of millions more [1].

There are about 8 billion people and 12.7 million doctors, or roughly one doctor for every 630 people. Human doctors alone cannot bridge this gap and provide the personalized healthcare everyone deserves.

North Star

The mission for HotpotBio is to advance cancer research and AI doctors with open research [2].

While cancer cures and AI doctors are fantasies today, many groups are working feverishly to close the gap between dream and reality.

I believe that one day everyone will enjoy personalized healthcare supervised by Stanford-caliber AI doctors.

We hope to play a tiny part.

Even if the dream never materializes, it is better to aim for the stars and land on the mountains than to not aim at all.

Challenge

Quality datasets and benchmarks can unlock rapid progress in machine learning (ML), but most technologists lack medical expertise while most doctors lack technical expertise. Without better datasets and evaluations, it is hard to train models and improve AI -- not unlike teaching students with incomplete textbooks and practice exams.

Purpose

Our primary objective is to package medical knowledge into a format that lets engineers and researchers advance AI biomedicine, regardless of medical background. Concretely, we will focus on publishing datasets and benchmarks in collaboration with medical professionals from Stanford, UCSF, and other leading institutions. We will fine-tune and develop models as resources permit.

AI Doctor Research

The research is broadly organized into the categories below. Descriptions are tailored for non-technical audiences.

AI Vision

  • Surgery: how to provide real-time anatomy detection for surgeons and self-paced video education for residents?
  • Medical imaging: how to improve detection of disease, musculoskeletal injury, and anatomical abnormalities, both in the clinic and in telemedicine?

AI Hearing

  • Biomedical transcription: how to transcribe audio, particularly conversations with heavy accents?

AI Reading

  • Biomedical RAG factuality: how to accurately answer questions given a specific context?
  • Biomedical text understanding: how to extract information and entities from both structured and unstructured text?
  • Biomedical reliability: how to achieve consistency across identical conditions?

AI Privacy

  • Biomedical privacy: how to preserve patient confidentiality while expanding datasets and facilitating multi-institute collaboration?

AI Reasoning

  • Biomedical reasoning: how to ensure diagnoses and recommendations match expert clinical judgment?

Cancer Research

Viruses cause cervical cancer, Burkitt lymphoma, nasopharyngeal cancer (NPC), and several other cancer types, but the data is inconclusive for more common cancer types like breast cancer and lung cancer [3-8].

Our research investigates the association between Epstein-Barr virus (EBV) and cancer, concentrating on the topics below.

  • Joint Omics Adaptive Nosological (JOAN) detection framework: systematic computational-experimental framework for detecting viruses in cancer samples, starting with adenocarcinomas.
  • EBV sequence conservation.
  • EBV association with breast cancer, starting with triple-negative breast cancer (TNBC).
  • EBV association with lung cancer, starting with non-small cell lung cancer (NSCLC).
  • EBV association with NPC, Burkitt lymphoma, and gastric cancer.
  • EBV association with MYC.
  • 1K TNBC dataset: see TNBC Dataset Initiative below.

How To Contribute

We welcome contributors of all backgrounds: healthcare professionals, academic researchers, software developers, ML engineers, and anyone passionate about healthcare and AI.

For aspiring founders, we hope HotpotBio offers a hub to connect medical professionals and technical individuals since startups are great vehicles for delivering change.

  1. Review Collaboration Areas.
  2. Reach out with the details below. See Author and Contact for contact information.
     • Authorship: are you interested in authorship?
     • Interest areas: which areas interest you?
     • Matching: are you interested in pairing with technical/healthcare professionals?

Collaboration Areas

We welcome contributors in the areas below.

If areas of interest are missing, please let us know.

Healthcare

  • Oncology
  • Virology
  • Surgery
  • Pediatrics
  • Neurology
  • Cardiology
  • Pulmonology
  • Radiology
  • Geriatrics
  • Gastroenterology
  • Pathology
  • Endocrinology
  • Hematology
  • Bioinformatics
  • Genomics
  • Clinical investigations

Contributions can fit any schedule and take one of many forms:

  • Creating 100-200 multiple choice questions per specialty
  • Reviewing questions
  • Defining key clinical tasks and requirements
  • Conducting literature reviews
  • Reviewing paper drafts

Machine Learning

  • VLM
  • Computer vision
  • NLP
  • LLM

Software Development

  • Full-stack web development for simplifying how healthcare professionals create and review training data

Research Culture

HotpotBio focuses on science, deferring policy and ethics to other forums.

Although this position may not appeal to all, the benefit of clear values is an environment where everyone can concentrate on science. Organizational theory suggests that teams united by shared priorities and explicit expectations tend to collaborate more productively.

I understand the anxiety around AI, but our culture is rooted in a deep study of technology history and societal progress. Throughout time, a consistent pattern has characterized the emergence of disruptive technology: fear dominates the discourse while concerned critics seek to curb capabilities and protect the masses. This cycle was observed with books, computers, and the web, and it is repeating with AI.

With hindsight, we know those noble intentions were misguided and failed to account for the transformative benefits spawned by innovation. General-purpose technology, by definition, is wieldable for good or bad, but the good vastly outweighs the bad. This propels the world to greater heights of prosperity and accessibility.

On ethics, most people aspire to be moral and responsible, but the challenge is: whose values dictate tradeoffs and resolve disputes? Officials from California, Texas, China, India, France, Japan, the UK, Saudi Arabia, or elsewhere? Whose risk profile lights the way forward? For instance, GPT-2, GPT-3, and GPT-4 were all considered too dangerous for the average person, but those worries proved exaggerated at best and unfounded at worst. Moreover, it's presumptuous to assume one jurisdiction can bottle up software ingenuity or constrain global innovation. If America surrenders AI leadership, other nations will readily fill the void.

While healthy people can afford the luxury of endless deliberation, the sick cannot. With roughly 800K people dying from cancer each month, discovering breakthroughs even one month sooner can save lives and spare immeasurable suffering.

Intelligent people may disagree. I respect different opinions and hope others can as well.

TNBC Dataset Initiative

One TNBC dataset could power tens to hundreds of cancer studies and hopefully set a new precedent for tackling tumor subtypes. See here for details.

OpenAI, Google DeepMind, X, AWS, and Anthropic

We welcome partnerships with AI leaders to advance benchmarks and datasets for biomedicine. Long context, image, and video evaluations of frontier models are expensive. Credits and other support would help make these evaluations a reality sooner.

Sponsors

Hotpot.ai

Author and Contact

Clarence Hu

References

  1. WHO Cancer Fact Sheet.
  2. Given controversies over the definition of "open source," the term "open research" reflects a desire to advance biomedicine without getting ensnared by semantic debates.
  3. Extrachromosomal Amplification of Human Papillomavirus Episomes as a Mechanism of Cervical Carcinogenesis.
  4. Gaps and Opportunities to Improve Prevention of Human Papillomavirus-Related Cancers.
  5. Epstein-Barr virus provides a survival factor to Burkitt's lymphomas.
  6. Targeting Epstein-Barr Virus in Nasopharyngeal Carcinoma.
  7. EBV Infection and Its Regulated Metabolic Reprogramming in Nasopharyngeal Tumorigenesis.
  8. EBV infection-induced GPX4 promotes chemoresistance and tumor progression in nasopharyngeal carcinoma.