A Docker image containing a fully operational Galaxy instance with pre-installed demonstration material for CRAVAT-P.
Created as a demonstration for the following technical note for the Journal of Proteome Research:
Bridging the Chromosome-Centric and Biology and Disease Human Proteome Projects: Accessible and automated tools for interpreting biological and pathological impact of protein sequence variants detected via proteogenomics
Ray Sajulga, Subina Mehta, Praveen Kumar, James E. Johnson, Candace R. Guerrero, Michael C. Ryan, Rachel Karchin, Pratik D. Jagtap, and Timothy J. Griffin
2.) Open your terminal. Run the following command:
docker run -d -p 8080:80 galaxyp/cravatp
The image will now download from the public repository galaxyp/cravatp on Docker Hub. The download should take around 15 minutes.
In the meantime, feel free to take some time to understand the different components of this Docker command. You can also read up on CRAVAT-P background information in the next section.
|docker||Base command||The base command for the Docker CLI (Command Line Interface)|
|run||Command||Run a command in a new container|
|-d, --detach||OPTION||Run container in background and print container ID|
|-p, --publish||OPTION||Publish a container’s port(s) to the host|
|galaxyp/cravatp||IMAGE||galaxyp’s cravatp image|
More documentation can be found at Docker’s documentation website.
3.) Once the command is finished, wait a few moments for the Docker image to initialize as a container. Open http://localhost:8080 and follow the CRAVAT-P tutorial to access the CRAVAT-P suite. If you do not see the Galaxy screen, wait a few seconds and then reload the page.
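The "wait a few seconds and then reload" advice in step 3 can also be automated. The following is a minimal sketch, assuming the container was started with the port mapping above so that Galaxy answers on http://localhost:8080. The function name wait_for_galaxy and its timeout and interval values are illustrative choices, not part of the image or of Galaxy.

```python
import time
import urllib.request


def wait_for_galaxy(url="http://localhost:8080", timeout=300, interval=5,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or about `timeout` seconds pass.

    `opener` and `sleep` are injectable so the loop can be exercised
    without a running container.
    """
    waited = 0
    while True:
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (urllib's URLError
            # and HTTPError are OSError subclasses) all mean "not ready yet".
            pass
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Calling wait_for_galaxy() with the defaults returns True as soon as the page responds, and False if nothing answers within roughly five minutes.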
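Once you are finished using this container, you can clean up your workspace by stopping the container (docker stop followed by the container ID that the run command printed) or by simply exiting out of Docker.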
CRAVAT-P (Cancer Related Analysis of VAriants Toolkit - Proteomics)
CRAVAT-P is a proteomic extension of CRAVAT (http://cravat.us) developed for the Galaxy-P (http://galaxyp.org) bioinformatics platform. CRAVAT-P exists as a downstream analysis suite for peptide variants. Current support is tailored towards workflows that generate peptide sequences mapped to genomic locations.
The figure above shows the Galaxy tool developed for submitting jobs to the CRAVAT server. It extends an earlier version of In Silico Solutions’ Galaxy tool (cravat_score_and_annotate). In our CRAVAT-P tool, we added support for additional parameters: CHASM classifiers (e.g., breast, brain-glioblastoma-multiforme) and the older GRCh37/hg19 human genome build. We also added proteomic support, as highlighted by the outlined red box. Here, a proBED file can be provided for intersection with the genomic input file, which is in VCF (Variant Call Format). You can specify whether you want to output the intersected VCF file or submit only the intersected variants.
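To make the intersection step concrete, here is a minimal sketch of how variants from a VCF file could be filtered down to those that fall inside peptide regions from a proBED file. It is not the CRAVAT-P tool's actual implementation. It assumes tab-separated input, identical chromosome naming in both files, and treats each proBED line as a single interval from chromStart to chromEnd, ignoring the exon block columns. All function names and the toy records are made up for illustration.

```python
from collections import namedtuple

Variant = namedtuple("Variant", "chrom pos ref alt line")
Region = namedtuple("Region", "chrom start end")


def parse_vcf(lines):
    """Yield variants from VCF text lines, skipping blank and '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        text = line.rstrip("\n")
        fields = text.split("\t")
        # VCF columns: CHROM, POS (1-based), ID, REF, ALT, ...
        yield Variant(fields[0], int(fields[1]), fields[3], fields[4], text)


def parse_probed(lines):
    """Yield peptide regions from BED-style lines (0-based start, exclusive end)."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        yield Region(fields[0], int(fields[1]), int(fields[2]))


def intersect(variants, regions):
    """Return the variants whose position lies inside at least one region."""
    by_chrom = {}
    for region in regions:
        by_chrom.setdefault(region.chrom, []).append(region)
    kept = []
    for variant in variants:
        zero_based = variant.pos - 1  # VCF is 1-based, BED is 0-based
        candidates = by_chrom.get(variant.chrom, [])
        if any(r.start <= zero_based < r.end for r in candidates):
            kept.append(variant)
    return kept


# Toy input, invented for this example only.
vcf_text = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\tvar1\tA\tG\t.\t.\t.",
    "chr1\t500\tvar2\tC\tT\t.\t.\t.",
    "chr2\t150\tvar3\tG\tA\t.\t.\t.",
]
probed_text = [
    "chr1\t100\t130\tpeptide_1",
    "chr2\t200\t230\tpeptide_2",
]
hits = intersect(parse_vcf(vcf_text), parse_probed(probed_text))
print([v.pos for v in hits])  # [101]
```

Only var1 survives, because its position is the only one covered by a peptide region. The kept records retain their original VCF line, which corresponds to the option of outputting the intersected VCF file.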
VCF (Variant Call Format)
|ID||Chr.||Position||Strand||Ref. base||Alt. base|
ProBED (Proteomic Browser Extensible Data)
Galaxy workflows are tailored pipelines that promote reproducibility, ease of use, and preservation of complex analyses. Two workflows of differing complexity are shown above. The simple workflow (top left panel) was used for the paper and Docker image to redirect focus to the downstream analysis, i.e., CRAVAT-P’s outputs and viewer. A full-fledged workflow (bottom panel) is shown as an example of a highly complex workflow. The top right panel shows how workflows can automate parameter selection and offer additional options such as e-mail notification and output cleanup.
Panel A shows the actual viewer, with panels B-E as enlarged views for further detail.
(A-i) Sidebar for showing additional information, mainly column visibility toggling. CRAVAT’s annotation produces many columns to sift through.
(A-ii) An embedded webpage from the CRAVAT server, which they term the “Single Variants Page” feature.
(B) Leveraging the DataTable.js library, this table can be sorted and filtered. By default, it is sorted by p-values (based on the machine learning analysis, i.e., VEST or CHASM) from most impactful to least. The selected box exhibits a peptide column that highlights the variant amino acid within a peptide hit. Since some cells may have large amounts of text, the full datum is shown in the display box at the top.
(C) CRAVAT uses Protein Diagrams to show lollipop mutations from your given protein variant. You can also choose TCGA (The Cancer Genome Atlas) tissue mutations. You can mouse over different parts to show domains, binding sites, and other regions of interest.
(D) CRAVAT uses the Cytoscape.js library to display gene enrichment networks hosted by the NDEx (Network Data Exchange) infrastructure. You can move elements around and examine different pathways.
(E) CRAVAT uses another project developed by the same lab (Professor Rachel Karchin’s lab at Johns Hopkins University) called MuPIT (Mutation Position Imaging Toolbox), designed to show the location of single nucleotide variants (SNVs) on interactive three-dimensional protein structures. You can click on individual residues and adjust the display options.
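The default ordering and the peptide highlighting described for panel (B) can be sketched in a few lines. This is an illustration only, not the viewer's code (the viewer runs in the browser). It assumes that "most impactful" means the smallest p-value, that each row is a dictionary with a hypothetical p_value key, and that the variant position is given as a 0-based index into the peptide.

```python
def sort_by_p_value(rows, key="p_value"):
    """Order rows from smallest to largest p-value; rows without one go last."""
    return sorted(rows, key=lambda row: (row.get(key) is None, row.get(key) or 0.0))


def highlight_variant(peptide, index, left="[", right="]"):
    """Mark the amino acid at 0-based `index` within `peptide`."""
    if not 0 <= index < len(peptide):
        raise ValueError("variant index lies outside the peptide")
    return peptide[:index] + left + peptide[index] + right + peptide[index + 1:]


# Made-up rows for illustration.
rows = [
    {"peptide": "PEPTIDEK", "variant_index": 3, "p_value": 0.04},
    {"peptide": "SAMPLER", "variant_index": 0, "p_value": 0.001},
    {"peptide": "ANOTHERK", "variant_index": 7, "p_value": None},
]
for row in sort_by_p_value(rows):
    print(highlight_variant(row["peptide"], row["variant_index"]), row["p_value"])
```

Rows with a missing p-value are placed at the end rather than dropped, so no peptide hit disappears from the table.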
Import the input files → Run the workflow → Access the viewer