MCP (Model Context Protocol) Server for HGNC (HUGO Gene Nomenclature Committee) Gene Nomenclature Resources.
Overview
This R package provides tools for accessing HGNC gene nomenclature data, including functions for searching, resolving, and validating gene symbols. It also includes an MCP (Model Context Protocol) server that exposes these tools to LLM copilots and other MCP-compatible clients.
Installation
Install from GitHub
# Install the latest version from GitHub
remotes::install_github("armish/hgnc.mcp")
Install from source
If you’ve cloned the repository locally:
# First, generate documentation (required before installation)
source("generate-docs.R")
# Then install the package
devtools::install(".")
Alternatively, you can generate documentation manually:
roxygen2::roxygenise()
devtools::install(".")
Data Management
The package uses smart caching to manage the HGNC complete dataset:
- First use: Downloads data from the official HGNC source and caches it locally
- Subsequent uses: Loads from cache for fast access
- Updates: Automatically checks if cache is stale (default: 30 days) and refreshes if needed
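The staleness rule above amounts to comparing the age of the cached file with a threshold. The following is a minimal base R sketch. It assumes the cache is a single file whose modification time marks the download. The helper name `cache_is_stale()` is hypothetical: the package's own check is `is_hgnc_cache_fresh()`, and it may rely on its stored metadata rather than on file times.

```r
# Hypothetical helper (not part of hgnc.mcp): TRUE when the cache file is
# missing or older than max_age_days, mirroring the documented 30-day default.
cache_is_stale <- function(path, max_age_days = 30) {
  if (!file.exists(path)) {
    return(TRUE)  # nothing cached yet, so a download is needed
  }
  modified <- file.info(path)$mtime
  age_days <- as.numeric(difftime(Sys.time(), modified, units = "days"))
  age_days > max_age_days
}

# A temporary file stands in for the cached hgnc_complete_set.txt
cache_file <- tempfile(fileext = ".txt")
writeLines("hgnc_id\tsymbol", cache_file)
cache_is_stale(cache_file)                     # just written, so FALSE
Sys.setFileTime(cache_file, Sys.time() - 40 * 24 * 3600)
cache_is_stale(cache_file)                     # 40 days old, so TRUE
cache_is_stale(cache_file, max_age_days = 60)  # within 60 days, so FALSE
```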
Data Functions
# Load HGNC data (downloads and caches on first use)
hgnc <- load_hgnc_data()
# Check cache status
get_hgnc_cache_info()
# Force refresh the cache
hgnc <- load_hgnc_data(force = TRUE)
# Or explicitly download
download_hgnc_data(force = TRUE)
# Clear cache
clear_hgnc_cache()
# Check if cache is fresh (default: 30 days)
is_hgnc_cache_fresh()
is_hgnc_cache_fresh(max_age_days = 7)
Data Source
HGNC data is sourced from: https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt
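The complete set is a tab-separated file with one row per gene. Multi-valued columns such as `alias_symbol` and `prev_symbol` hold several values separated by `|`. The sketch below parses a tiny inline excerpt in that shape using only base R. The rows are synthetic and cover only a handful of columns; the real file has many more (for example `name`, `location`, `entrez_id`).

```r
# Synthetic excerpt shaped like hgnc_complete_set.txt (toy values only)
lines <- c(
  paste("hgnc_id", "symbol", "status", "alias_symbol", "prev_symbol", sep = "\t"),
  paste("HGNC:0001", "GENEA", "Approved", "GA1|GENE-A", "OLDA", sep = "\t"),
  paste("HGNC:0002", "GENEB", "Approved", "", "OLDB|GENEB1", sep = "\t"),
  paste("HGNC:0003", "GENEC", "Entry Withdrawn", "", "", sep = "\t")
)

# Read every column as character so IDs and empty fields stay untouched
hgnc <- read.delim(text = paste(lines, collapse = "\n"),
                   colClasses = "character")

# Split a pipe-separated column into one character vector per gene,
# dropping empty strings left by blank fields
split_multi <- function(x) {
  lapply(strsplit(x, "|", fixed = TRUE), function(v) v[nzchar(v)])
}

aliases <- split_multi(hgnc$alias_symbol)
previous <- split_multi(hgnc$prev_symbol)
names(aliases) <- names(previous) <- hgnc$symbol

str(aliases)
```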
Features
- Smart local caching with automatic updates
- Cross-platform cache directory management
- Configurable cache freshness (default: 30 days)
- Cache metadata tracking (download time, file size, source URL)
- HGNC REST API client with rate limiting and caching
- Gene symbol search, resolution, and validation tools
- Batch operations for gene lists
- Gene group and family queries
- Change tracking for updated symbols
- MCP server for integration with LLM copilots and AI tools
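The rate limiting mentioned above can be pictured as a throttle that spaces out consecutive requests. The following is a minimal base R sketch, not the package's client. It assumes a budget of 10 requests per second, the figure commonly cited for rest.genenames.org; check the HGNC REST documentation for the current limit. `make_throttle()` is a hypothetical name.

```r
# Hypothetical throttle: returns a function that sleeps when needed so that
# consecutive calls are at least 1 / max_per_second seconds apart.
make_throttle <- function(max_per_second = 10) {
  min_gap <- 1 / max_per_second
  last_call <- NULL
  function() {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < min_gap) Sys.sleep(min_gap - elapsed)
    }
    last_call <<- Sys.time()  # remember when this call was allowed through
    invisible(last_call)
  }
}

throttle <- make_throttle(max_per_second = 10)
start <- Sys.time()
for (i in 1:5) {
  throttle()
  # a real client would send its HTTP request to the HGNC REST API here
}
elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
elapsed  # roughly 0.4 s: four enforced gaps of 0.1 s
```

A fixed minimum gap is the simplest policy. A token bucket would allow short bursts while keeping the same average rate.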
MCP Server
The package includes a Model Context Protocol (MCP) server that exposes HGNC nomenclature services to AI assistants, copilots, and other MCP-compatible clients. The server provides three types of MCP primitives:
- Tools: API endpoints for actions like search, normalize, and validate
- Resources: Read-only data for context injection (gene cards, group information, dataset metadata)
- Prompts: Workflow templates that guide AI assistants through multi-step nomenclature tasks
This allows LLMs to directly access HGNC services for gene name resolution, validation, compliance checking, and more.
Starting the MCP Server
Using R:
# Load the package
library(hgnc.mcp)
# Check dependencies
check_mcp_dependencies()
# Start with stdio transport (recommended for desktop clients like Claude Desktop)
start_hgnc_mcp_server(transport = "stdio")
# Or start HTTP server on default port 8080 (for web-based or remote clients)
start_hgnc_mcp_server(transport = "http")
# Customize HTTP server configuration
start_hgnc_mcp_server(
transport = "http",
port = 9090,
host = "0.0.0.0",
swagger = TRUE
)
Using the standalone script:
# Start with stdio transport (for desktop clients)
Rscript inst/scripts/run_server.R --stdio
# Start HTTP server (default behavior)
Rscript inst/scripts/run_server.R
# Custom port for HTTP mode
Rscript inst/scripts/run_server.R --port 9090
# Update cache before starting
Rscript inst/scripts/run_server.R --update-cache
# Disable Swagger UI (HTTP mode only)
Rscript inst/scripts/run_server.R --no-swagger
# Get help
Rscript inst/scripts/run_server.R --help
MCP Client Configuration
Claude Desktop (Recommended)
Claude Desktop supports stdio transport for efficient local communication. Configuration file location:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Using local R installation:
{
"mcpServers": {
"hgnc": {
"command": "Rscript",
"args": ["-e", "hgnc.mcp::start_hgnc_mcp_server(transport='stdio')"],
"env": {
"HGNC_CACHE_DIR": "${HOME}/.cache/hgnc"
}
}
}
}
Using Docker:
{
"mcpServers": {
"hgnc": {
"command": "docker",
"args": ["run", "--rm", "-i", "-v", "hgnc-cache:/home/hgnc/.cache/hgnc", "ghcr.io/armish/hgnc.mcp:latest", "--stdio"]
}
}
}
Other MCP Clients: For HTTP-based clients, start the server with transport='http' and connect to http://localhost:8080/mcp.
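MCP messages are JSON-RPC 2.0 objects. A client invokes a tool with the `tools/call` method, passing the tool `name` and an `arguments` object. The base R sketch below only builds such a request body for the `find` tool.

Two points are assumptions:
- The argument name `query` is assumed; the server's `tools/list` response gives the real input schema.
- Whether an HTTP client must first complete an `initialize` exchange depends on how the server implements the MCP HTTP transport.

```r
# Quote a plain string for JSON; this sketch rejects quotes and backslashes
# instead of escaping them, to stay dependency-free.
json_string <- function(x) {
  stopifnot(is.character(x), !grepl('["\\\\]', x))
  paste0('"', x, '"')
}

# Build a JSON-RPC 2.0 "tools/call" request with string-valued arguments.
mcp_tool_call <- function(id, tool, arguments) {
  args <- paste0(json_string(names(arguments)), ":",
                 vapply(arguments, json_string, character(1)),
                 collapse = ",")
  sprintf('{"jsonrpc":"2.0","id":%d,"method":"tools/call","params":{"name":%s,"arguments":{%s}}}',
          as.integer(id), json_string(tool), args)
}

# "query" is an assumed argument name for the find tool
body <- mcp_tool_call(1, "find", list(query = "BRCA1"))
cat(body, "\n")
```

A real client would normally use a JSON library such as jsonlite instead of hand-built strings. The body would then be POSTed to the `/mcp` endpoint with a JSON content type.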
Available MCP Tools
The MCP server exposes the following tools:
- info - Get HGNC REST API metadata and capabilities
- find - Search for genes by query across symbols, aliases, and names
- fetch - Fetch complete gene records by field value (HGNC ID, symbol, etc.)
- resolve_symbol - Resolve a gene symbol to the current approved HGNC symbol
- normalize_list - Batch normalize a list of gene symbols (fast, uses local cache)
- xrefs - Extract cross-references (Entrez, Ensembl, UniProt, OMIM, etc.)
- group_members - Get all genes in a specific gene group or family
- search_groups - Search for gene groups by keyword
- changes - Track nomenclature changes since a specific date
- validate_panel - Validate gene panels against HGNC policy with replacement suggestions
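`resolve_symbol` and `normalize_list` map user-supplied symbols, including aliases and previous symbols, onto approved symbols. The following is a minimal base R sketch of one plausible strategy; it is not the package's implementation and does not reproduce its `mode` options.

The sketch makes these assumptions:
- Matching is case-insensitive.
- Approved symbols win over previous symbols, which win over aliases.
- Several hits within one tier are reported as ambiguous.

The lookup table is synthetic, and `resolve_one()` and `normalize_symbols()` are hypothetical names.

```r
# Synthetic lookup table; aliases and previous symbols are pipe-separated
toy <- data.frame(
  symbol = c("GENEA", "GENEB", "GENEC"),
  alias_symbol = c("GA1|GENE-A", "DUP", "DUP"),
  prev_symbol = c("OLDA", "OLDB", ""),
  stringsAsFactors = FALSE
)

# Resolve one symbol: approved first, then previous, then alias
resolve_one <- function(query, table) {
  make_row <- function(approved, how) {
    data.frame(input = query, approved = approved, matched_by = how,
               stringsAsFactors = FALSE)
  }
  q <- toupper(trimws(query))
  if (!nzchar(q)) return(make_row(NA_character_, "not_found"))
  tiers <- list(
    approved = as.list(toupper(table$symbol)),
    previous = strsplit(toupper(table$prev_symbol), "|", fixed = TRUE),
    alias = strsplit(toupper(table$alias_symbol), "|", fixed = TRUE)
  )
  for (tier in names(tiers)) {
    hits <- which(vapply(tiers[[tier]], function(v) q %in% v, logical(1)))
    if (length(hits) == 1) return(make_row(table$symbol[hits], tier))
    if (length(hits) > 1) return(make_row(NA_character_, "ambiguous"))
  }
  make_row(NA_character_, "not_found")
}

# Batch version: one row per input, flagging repeated approved symbols
normalize_symbols <- function(symbols, table) {
  out <- do.call(rbind, lapply(symbols, resolve_one, table = table))
  out$duplicate <- duplicated(out$approved) & !is.na(out$approved)
  out
}

res <- normalize_symbols(c("GeneA", "ga1", "OLDB", "dup", "missing"), toy)
print(res)
```

Checking previous symbols before aliases is a design choice: a previous symbol records an actual renaming, while aliases are more often shared between genes.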
Available MCP Resources
Resources provide read-only data for context injection:
- get_gene_card - Formatted gene information cards (JSON/markdown/text)
- get_group_card - Gene group information with members
- get_changes_summary - Nomenclature changes since a date
- snapshot - Dataset metadata (static resource)
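A gene card resource is essentially a gene record rendered for an LLM's context window. The following is a minimal base R sketch of the markdown variant. The field names are assumptions modelled on HGNC column names, and `format_gene_card()` is a hypothetical helper, not the package's resource handler.

```r
# Hypothetical formatter: render a gene record (named list) as a short
# markdown card, skipping fields the record does not contain.
format_gene_card <- function(record) {
  fields <- c(symbol = "Symbol", name = "Name", hgnc_id = "HGNC ID",
              location = "Location", status = "Status")
  present <- names(fields)[names(fields) %in% names(record)]
  lines <- vapply(present, function(f) {
    sprintf("- **%s:** %s", fields[[f]], paste(record[[f]], collapse = ", "))
  }, character(1))
  paste(c(sprintf("## %s", record$symbol), unname(lines)), collapse = "\n")
}

# Synthetic record; "location" is absent, so no Location line is emitted
card <- format_gene_card(list(symbol = "GENEA", name = "toy gene A",
                              hgnc_id = "HGNC:0001", status = "Approved"))
cat(card, "\n")
```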
Available MCP Prompts
Note: MCP Prompts are currently being integrated. Prompt functionality will be enabled automatically once the plumber2mcp package NAMESPACE is updated to export pr_mcp_prompt(). The prompt functions are implemented and ready to use.
Prompts are workflow templates that guide AI assistants through multi-step HGNC tasks:
normalize-gene-list - Guides through normalizing gene symbols to approved HGNC nomenclature. Helps with batch symbol resolution, handling aliases/previous symbols, and optionally fetching cross-references.
check-nomenclature-compliance - Validates gene panels against HGNC nomenclature policy. Identifies non-approved symbols, withdrawn genes, and duplicates, then provides replacement suggestions with rationale.
what-changed-since - Generates human-readable summaries of HGNC nomenclature changes since a specific date. Useful for governance, compliance tracking, and watchlist monitoring.
build-gene-set-from-group - Discovers HGNC gene groups by keyword search and builds reusable gene set definitions from members. Provides output in multiple formats (list, table, JSON) with metadata for reproducibility.
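The compliance check described above flags non-approved symbols, withdrawn genes and duplicates, and proposes replacements. The following is a minimal base R sketch of those checks against a synthetic reference table; it is not `hgnc_validate_panel()`. It assumes exact, case-sensitive matching and at most one previous symbol per gene. `check_panel()` is a hypothetical name.

```r
# Synthetic reference: approved symbols, status, and one previous symbol each
reference <- data.frame(
  symbol = c("GENEA", "GENEB", "GENEC"),
  status = c("Approved", "Approved", "Entry Withdrawn"),
  prev_symbol = c("OLDA", "", ""),
  stringsAsFactors = FALSE
)

# Classify each panel entry in order; a previous symbol gets the current
# approved symbol as its suggested replacement.
check_panel <- function(panel, reference) {
  classify <- function(i) {
    s <- panel[i]
    hit <- match(s, reference$symbol)
    prev <- match(s, reference$prev_symbol)
    if (s %in% panel[seq_len(i - 1)]) return(c("duplicate", NA))
    if (!is.na(hit) && reference$status[hit] == "Entry Withdrawn") {
      return(c("withdrawn", NA))
    }
    if (!is.na(hit)) return(c("ok", NA))
    if (!is.na(prev)) return(c("previous_symbol", reference$symbol[prev]))
    c("not_approved", NA)
  }
  res <- vapply(seq_along(panel), classify, character(2))
  data.frame(symbol = panel, issue = res[1, ], suggestion = res[2, ],
             stringsAsFactors = FALSE)
}

report <- check_panel(c("GENEA", "OLDA", "GENEC", "NOPE", "GENEA"), reference)
print(report)
all(report$issue == "ok")  # overall pass/fail flag for the panel
```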
API Documentation
When the server is running with Swagger enabled (default), you can access the interactive API documentation at:
http://localhost:8080/__docs__/
This provides detailed information about each endpoint, request/response formats, and allows you to test the API directly from your browser.
Deployment
The HGNC MCP server can be deployed in several ways depending on your needs.
Docker Deployment
The easiest way to deploy the server is using Docker. Pre-built images are publicly available on GitHub Container Registry and support multiple architectures.
Available Images
- Registry: ghcr.io/armish/hgnc.mcp
- Platforms: linux/amd64, linux/arm64
- Tags:
  - latest - Latest stable release from main branch
  - main - Latest commit on main branch
  - v* - Specific version tags (e.g., v1.0.0)
  - pr-* - Pull request builds for testing
Images are automatically built and published on every push to the main branch via GitHub Actions.
Quick Start with Docker
# Pull the pre-built image (supports both amd64 and arm64)
docker pull ghcr.io/armish/hgnc.mcp:latest
# For Claude Desktop (stdio mode)
docker run --rm -i \
-v hgnc-cache:/home/hgnc/.cache/hgnc \
ghcr.io/armish/hgnc.mcp:latest --stdio
# For HTTP server mode
docker run -d \
--name hgnc-mcp-server \
-p 8080:8080 \
-v hgnc-cache:/home/hgnc/.cache/hgnc \
ghcr.io/armish/hgnc.mcp:latest
# Open the API docs in a browser (HTTP mode only; on Linux use xdg-open instead of open)
open http://localhost:8080/__docs__/
Build from Source
# Clone the repository
git clone https://github.com/armish/hgnc.mcp.git
cd hgnc.mcp
# Build the Docker image
docker build -t hgnc-mcp:latest .
# Run in stdio mode (for Claude Desktop)
docker run --rm -i \
-v hgnc-cache:/home/hgnc/.cache/hgnc \
hgnc-mcp:latest --stdio
# Or run in HTTP mode
docker run -d \
--name hgnc-mcp-server \
-p 8080:8080 \
-v hgnc-cache:/home/hgnc/.cache/hgnc \
hgnc-mcp:latest
Docker Compose
For a more complete setup with persistent storage:
# Start the server and supporting services
docker compose up -d
# View logs
docker compose logs -f
# Test the server
docker compose --profile test up hgnc-test-client
# Stop the server
docker compose down
Note: This uses the modern docker compose command (Docker Compose V2). If you have the legacy standalone version, use docker-compose (with a hyphen) instead.
See examples/docker/README.md for advanced Docker deployment options, including:
- Production deployment with Nginx reverse proxy
- Development setup with hot reload
- Resource limits and health checks
- TLS/HTTPS configuration
Production Deployment
For production environments, we recommend:
- Use Docker - The provided Dockerfile uses multi-stage builds and runs as a non-root user
- Set up a reverse proxy - Use Nginx or similar for TLS, rate limiting, and load balancing
- Persistent cache - Mount a volume for the HGNC data cache
- Health monitoring - The container includes health checks; integrate with your monitoring system
- Resource limits - Set appropriate CPU and memory limits (recommended: 2 CPU, 4GB RAM)
Example production docker-compose configuration:
services:
  hgnc-mcp-server:
    image: ghcr.io/armish/hgnc.mcp:latest
    ports:
      - "127.0.0.1:8080:8080" # Only expose to localhost
    volumes:
      - hgnc-cache:/home/hgnc/.cache/hgnc
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G
    healthcheck:
      # stop_for_status() turns HTTP error responses into errors, so they fail the check too
      test: ["CMD", "Rscript", "-e", "tryCatch(httr::stop_for_status(httr::GET('http://localhost:8080/__docs__/')), error = function(e) quit(status = 1))"]
      interval: 30s
      timeout: 10s
      retries: 3

# Named volumes must be declared at the top level
volumes:
  hgnc-cache:
Cloud Deployment
The Docker image can be deployed to any cloud platform that supports containers:
Kubernetes
For Kubernetes deployments, see the example manifests in examples/kubernetes/ (coming soon).
CI/CD Integration
The repository includes GitHub Actions workflows for:
- R CMD check - Package validation across multiple platforms
- Test coverage - Automated testing with coverage reports
- Docker build - Multi-platform Docker image builds (amd64, arm64)
See .github/workflows/ for workflow configurations.
To use these workflows in your fork:
1. Enable GitHub Actions in your repository settings
2. Add any required secrets (e.g., CODECOV_TOKEN)
3. Push to trigger the workflows
Configuration Options
The server can be configured via command-line arguments or environment variables:
| Option | Environment Variable | Default | Description |
|---|---|---|---|
| --port | MCP_SERVER_PORT | 8080 | Server port |
| --host | MCP_SERVER_HOST | 0.0.0.0 | Server host |
| --no-swagger | - | false | Disable Swagger UI |
| --check-cache | - | false | Check cache before starting |
| --update-cache | - | false | Force cache update |
| - | HGNC_CACHE_DIR | Platform default | Cache directory |
Example with environment variables:
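Several options in the table can be set either by a flag or by an environment variable. The following is a minimal base R sketch of one way such settings could be resolved. The precedence (command-line flag, then environment variable, then documented default) is an assumption, not taken from `run_server.R`, and `resolve_option()` is a hypothetical helper.

```r
# Hypothetical resolver: "--flag value" beats the environment variable,
# which beats the documented default.
resolve_option <- function(args, flag, env_var, default) {
  pos <- match(flag, args)
  if (!is.na(pos) && pos < length(args)) {
    return(args[pos + 1])
  }
  value <- Sys.getenv(env_var, unset = NA_character_)
  if (!is.na(value) && nzchar(value)) value else default
}

args <- commandArgs(trailingOnly = TRUE)  # e.g. c("--port", "9090")
port <- as.integer(resolve_option(args, "--port", "MCP_SERVER_PORT", "8080"))
host <- resolve_option(args, "--host", "MCP_SERVER_HOST", "0.0.0.0")
```

Under this rule, `MCP_SERVER_PORT=9090 Rscript script.R` would use port 9090 unless `--port` is also passed.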
Security Considerations
When deploying the HGNC MCP server:
- Network Access: By default, the server binds to 0.0.0.0 (all interfaces). For production, bind to 127.0.0.1 and use a reverse proxy
- Rate Limiting: The server includes internal rate limiting for HGNC API calls, but consider adding external rate limiting via a reverse proxy
- Authentication: The MCP server does not include authentication. Use a reverse proxy with authentication if needed
- TLS: Always use TLS/HTTPS in production. Configure this at the reverse proxy level
- Resource Limits: Set appropriate CPU and memory limits to prevent resource exhaustion
- Updates: Regularly update the Docker image to get security patches and HGNC data updates
Monitoring
Monitor the following for production deployments:
- Health endpoint: GET /__docs__/ returns 200 if the server is healthy
- Container health: Docker/Kubernetes health checks are configured
- Resource usage: Monitor CPU and memory usage
- Cache freshness: Check get_hgnc_cache_info() for cache age
- Error rates: Monitor server logs for errors
Example health check script:
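The Monitoring list treats a successful response from `GET /__docs__/` as the health signal. The following is a minimal base R sketch of such a probe, assuming the default local URL. It treats any page that opens and reads as healthy, and any error or warning (refused connection, HTTP error status) as unhealthy. How long it waits is governed by `options(timeout)`. `hgnc_server_healthy()` is a hypothetical name.

```r
# Hypothetical probe: TRUE if the Swagger docs page can be opened and read,
# FALSE on any error or warning while connecting.
hgnc_server_healthy <- function(endpoint = "http://localhost:8080/__docs__/") {
  con <- tryCatch(
    url(endpoint, open = "r"),
    error = function(e) NULL,
    warning = function(w) NULL
  )
  if (is.null(con)) return(FALSE)
  on.exit(close(con))
  tryCatch({
    readLines(con, n = 1L, warn = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

# A port with nothing listening should report FALSE
hgnc_server_healthy("http://127.0.0.1:1/__docs__/")
```

A standalone script would typically end with `quit(status = if (hgnc_server_healthy()) 0L else 1L)`, so that Docker, cron or another monitor sees a non-zero exit status on failure.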
Troubleshooting
Common deployment issues:
Port already in use:
# Find what's using the port
lsof -i :8080
# Use a different port
docker run -p 9090:8080 hgnc-mcp:latest
Out of memory:
Cache not persisting:
# Verify volume exists
docker volume inspect hgnc-cache
# Check mount point
docker inspect hgnc-mcp-server | grep Mounts -A 10
For more deployment examples and troubleshooting, see:
- Docker deployment guide
- MCP client configuration
- GitHub Issues
Usage Examples
Basic Gene Lookups
library(hgnc.mcp)
# Search for genes
results <- hgnc_find("BRCA")
# Fetch a specific gene
gene <- hgnc_fetch("symbol", "BRCA1")
# Resolve a symbol (handles aliases and previous symbols)
resolution <- hgnc_resolve_symbol("BRCA1", mode = "lenient")
Batch Operations
# Normalize a list of gene symbols
symbols <- c("BRCA1", "tp53", "EGFR", "OLD_SYMBOL", "invalid")
result <- hgnc_normalize_list(symbols)
# View results
print(result$results)
print(result$summary)
print(result$warnings)
Validation
# Validate a gene panel
panel <- c("BRCA1", "BRCA2", "TP53", "ATM", "CHEK2")
validation <- hgnc_validate_panel(panel)
# Check for issues
print(validation$summary)
print(validation$report)
Change Tracking
# Find genes modified in the last 30 days
recent_changes <- hgnc_changes(since = Sys.Date() - 30)
print(recent_changes$summary)
# Track symbol changes since a specific date
symbol_changes <- hgnc_changes(
since = "2024-01-01",
change_type = "symbol"
)
Gene Groups
# Search for gene groups
kinases <- hgnc_search_groups("kinase")
# Get members of a specific group
members <- hgnc_group_members("Protein kinases")
Real-World Use Cases
Clinical Genomics: Panel Validation
Ensure clinical gene panels use current HGNC-approved nomenclature:
# Load a clinical panel from a CSV file
clinical_panel <- read.csv("hereditary_cancer_panel.csv")$gene_symbol
# Validate against HGNC standards
validation <- hgnc_validate_panel(clinical_panel, policy = "HGNC")
# Review any issues
if (validation$summary$status != "PASS") {
cat("Issues found:\n")
print(validation$report)
# Get replacement suggestions
if (!is.null(validation$suggestions)) {
cat("\nSuggested updates:\n")
print(validation$suggestions[, c("input_symbol", "suggested_symbol", "reason")])
}
}
# Generate normalized panel for clinical use
normalized <- hgnc_normalize_list(
clinical_panel,
return_fields = c("symbol", "name", "hgnc_id", "omim_id", "location")
)
# Export for lab reporting system
write.csv(normalized$results, "validated_panel.csv", row.names = FALSE)
Research: Cross-Study Data Integration
Harmonize gene symbols across multiple datasets:
# Combine gene lists from different studies
study1_genes <- read.csv("rnaseq_study1.csv")$gene
study2_genes <- read.csv("microarray_study2.csv")$gene
study3_genes <- read.csv("proteomics_study3.csv")$gene
all_genes <- unique(c(study1_genes, study2_genes, study3_genes))
# Normalize to current HGNC symbols
normalized <- hgnc_normalize_list(
all_genes,
return_fields = c("symbol", "name", "hgnc_id", "entrez_id", "ensembl_gene_id"),
dedupe = TRUE
)
# Map back to original datasets with unified nomenclature
# This eliminates false negatives from symbol inconsistencies
Drug Development: Target Validation
Build and maintain target gene lists for drug development:
# Build a kinase inhibitor target panel
kinase_groups <- hgnc_search_groups("kinase")
all_kinases <- hgnc_group_members("Protein kinases")
# Filter for specific kinase families of interest
tyrosine_kinases <- hgnc_search_groups("tyrosine kinase")
target_kinases <- hgnc_group_members("Receptor tyrosine kinases")
# Get comprehensive cross-references for target validation
targets_with_xrefs <- hgnc_normalize_list(
target_kinases$symbol,
return_fields = c("symbol", "name", "hgnc_id", "entrez_id",
"ensembl_gene_id", "uniprot_id", "omim_id")
)
# Track any nomenclature changes quarterly
quarterly_changes <- hgnc_changes(since = Sys.Date() - 90, change_type = "all")
target_updates <- quarterly_changes$changes[
quarterly_changes$changes$symbol %in% target_kinases$symbol,
]
Regulatory Compliance: Audit Trail
Maintain nomenclature compliance for regulatory submissions:
# Document panel version with HGNC provenance
create_compliance_report <- function(panel_genes, panel_name) {
# Normalize genes
normalized <- hgnc_normalize_list(
panel_genes,
return_fields = c("symbol", "name", "hgnc_id", "status", "location")
)
# Validate
validation <- hgnc_validate_panel(panel_genes)
# Get cache info for provenance
cache_info <- get_hgnc_cache_info()
# Create report
report <- list(
panel_name = panel_name,
report_date = Sys.Date(),
hgnc_version = cache_info$download_date,
hgnc_source = cache_info$source_url,
total_genes = length(panel_genes),
valid_genes = sum(normalized$results$status == "Approved"),
validation_status = validation$summary$status,
normalized_genes = normalized$results,
validation_report = validation$report,
warnings = normalized$warnings
)
# Save for audit trail
saveRDS(report, paste0(panel_name, "_", Sys.Date(), "_compliance.rds"))
return(report)
}
# Use for regulatory submission
panel <- c("BRCA1", "BRCA2", "TP53", "PTEN", "ATM")
compliance <- create_compliance_report(panel, "BRCA_Panel_v2")
Literature Mining: Standardizing Gene References
Extract and normalize gene symbols from publications:
# Parse gene symbols from abstract/full text (hypothetical)
extracted_genes <- c("p53", "BRCA-1", "EGF receptor", "HER2", "ERBB2")
# Resolve to standard HGNC symbols
resolved <- lapply(extracted_genes, function(g) {
result <- hgnc_resolve_symbol(g, mode = "lenient")
if (!is.null(result$approved_symbol)) {
data.frame(
original = g,
approved = result$approved_symbol,
confidence = result$confidence
)
}
})
# Combine results
gene_mapping <- do.call(rbind, resolved)
print(gene_mapping)
# Illustrative output (actual matches depend on the resolver and the HGNC release):
#   original approved confidence
# 1      p53     TP53      alias
# 2   BRCA-1    BRCA1   approved
# 3    ERBB2    ERBB2   approved
AI-Assisted Analysis: Using MCP with Claude
With the MCP server running, Claude can help with gene nomenclature tasks:
Then in Claude Desktop (with MCP configured):
You: “I have a list of genes from an old microarray study: BRCA1, p53, EGFR, HER-2, NBS1. Can you normalize these to current HGNC symbols and check if any have been updated?”
Claude: Uses the normalize_list and validate_panel MCP tools to analyze the genes and provide a detailed report with current symbols, any changes, and recommendations.
Documentation
Comprehensive documentation is available in the package vignettes:
- Getting Started with hgnc.mcp - Installation, basic usage, and core functions
- Normalizing Gene Lists for Clinical Panels - Best practices for clinical genomics workflows
- Running the MCP Server - MCP server setup, configuration, and deployment
- Working with HGNC Gene Groups - Building gene panels from families and functional groups
View vignettes in R:
Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
Author
Bulent Arman Aksoy (arman@aksoy.org)