PROCEDURE

Notice

The web version of PhosMap is intended only as a quick start for visualization, because the R Shiny demo server runs on modest hardware. The server is single-threaded, so we recommend using it only for small data sets. Larger data sets need more capable hardware, scaled to their computational cost, and for these we recommend the local Docker version of PhosMap.

Introduction of example data

Here, we reanalyze the (phospho)proteomic profiles of WiDr colorectal cancer cells harbouring the BRAF(V600E) mutation after vemurafenib treatment over a time course of 0, 2, 6, 24 and 48 hours [2]. The raw files were deposited in the ProteomeXchange Consortium (PXD007740). The raw data were processed in Firmiana, a one-stop proteomic cloud platform [1], to obtain quantitative peptide and protein files. You can download the example data from https://github.com/liuzan-info/PhosMap/tree/master/examplefile/mascot and https://github.com/liuzan-info/PhosMap/tree/master/examplefile/maxquant.

Preprocessing for MaxQuant data

Import MaxQuant data

How to import your MaxQuant data

  1. Go to the ‘Import data’ tab.
  2. Choose 'Maxquant' to start with data from MaxQuant.
  3. Click ‘Browse’ to upload the phosphoproteomics experimental design file in .txt format and the Phospho (STY)Sites.txt file. The proteomics experimental design file is optional.
  4. Uploaded data will be shown in the 'Data Overview' secondary tab.
  5. You can also choose ‘Load example data’ to use the example files.

Quality control and merging

Function

Generate a merged phosphoproteomics data frame from the peptide files.

How to get analysis results

  1. Go to the ‘Preprocessing’ tab.
  2. Modify the parameters in Step1 according to your needs.
  3. Click the running button in Step1 and the file will appear on the right.

Parameter Selection Explanation

Interpretation of analysis results

We performed quality control of the identified phosphopeptides using PhosMap. Phosphopeptides were kept if they met a 1% FDR at the peptide level, had an ion score greater than 20 and carried the highest-confidence p-site probabilities computed by Mascot. We then merged the phosphopeptide lists with quantitative values from all experiments to generate a matrix for analysis.

Data normalization

Function

PhosMap provides two kinds of normalization: total sum scaling normalization, and normalization of phosphoproteomics data based on proteomics data.
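The two normalizations can be illustrated with a minimal Python sketch. It is not PhosMap's implementation: the function names, the scaling target and the ratio-based protein normalization are assumptions made for illustration, with rows taken as p-sites and columns as samples.

```python
def total_sum_scale(matrix, target=1_000_000.0):
    """Scale each sample (column) so that its intensities sum to `target`.

    `matrix` is a list of rows (p-sites); each row holds one value per sample.
    The target of 1e6 is an illustrative choice, not a PhosMap default.
    """
    n_samples = len(matrix[0])
    sums = [sum(row[j] for row in matrix) for j in range(n_samples)]
    return [
        [row[j] / sums[j] * target if sums[j] else 0.0 for j in range(n_samples)]
        for row in matrix
    ]


def normalize_by_protein(site_values, protein_values):
    """Divide p-site intensities by the matched protein intensities, sample by sample.

    Samples without a usable protein value yield None instead of a ratio.
    """
    return [s / p if p else None for s, p in zip(site_values, protein_values)]


scaled = total_sum_scale([[1.0, 2.0], [3.0, 6.0]], target=100.0)
ratios = normalize_by_protein([10.0, 20.0], [2.0, 0.0])
print(scaled)
print(ratios)
```

After total sum scaling every sample has the same total intensity, which removes differences in loading. Dividing by the matched protein value separates changes in phosphorylation from changes in protein abundance.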

How to get analysis results

  1. Go to the ‘Preprocessing’ tab.
  2. Modify the parameters according to your needs.
  3. Click the running button in Step2 and the p-site data normalized by total sum scaling will appear on the right.
  4. Click the running button in Step3 and the p-site data normalized by proteomics data will appear on the right.

Parameter Selection Explanation

Preprocessing for Firmiana data

Import Firmiana data

How to import your Firmiana data

  1. Go to the ‘Import data’ tab.
  2. Choose 'Firmiana' to start with data from Firmiana.
  3. Click ‘Browse’ to upload the phosphoproteomics experimental design file in .txt format.
  4. Zip your Mascot xml files and phosphoproteomics peptide files, and then upload them. The folder tree is shown below. The file names of the Mascot xml files and phosphoproteomics peptide files must be consistent with the ‘Experiment_Code’ column of the phosphoproteomics experimental design file.
  5. Proteomics data is optional. Click ‘Browse’ to upload the proteomics experimental design file in .txt format. Zip your Profiling_gene_txt files and upload them. The folder tree is shown below. The file names in Profiling_gene_txt must be consistent with the ‘Experiment_Code’ column of the proteomics experimental design file.
  6. Uploaded data will be shown in the 'Data Overview' secondary tab.
  7. You can also choose ‘Load example data’ to use the example files.
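Before zipping, it can help to check that the file names agree with the design file. The Python sketch below is one way to do this, under two assumptions that the manual does not confirm: the design file is tab-delimited, and "consistent" means that each file name without its extension equals an ‘Experiment_Code’ value. The codes Exp001 to Exp003 are hypothetical.

```python
import csv
import io
import os


def read_experiment_codes(design_text):
    """Return the 'Experiment_Code' column of a tab-delimited design file."""
    reader = csv.DictReader(io.StringIO(design_text), delimiter="\t")
    return [row["Experiment_Code"] for row in reader]


def unmatched_files(experiment_codes, file_names):
    """Compare design codes with file name stems.

    Returns (codes without a file, files without a code), both sorted.
    """
    codes = set(experiment_codes)
    stems = {os.path.splitext(os.path.basename(name))[0] for name in file_names}
    return sorted(codes - stems), sorted(stems - codes)


# Hypothetical design file content and upload folder listing.
design = "Experiment_Code\tGroup\nExp001\tcontrol\nExp002\ttreated\n"
codes = read_experiment_codes(design)
missing, extra = unmatched_files(
    codes, ["mascot_xml/Exp001.xml", "mascot_xml/Exp003.xml"]
)
print("codes without a file:", missing)
print("files without a code:", extra)
```

Both lists should be empty before you upload the archive.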

Parser

Function

If you start with .xml files from Mascot results, click this button to parse them into site score files, from which .csv files of phosphorylation sites with confidence scores will be generated.

How to get analysis results

  1. Go to the 'Preprocessing' tab.
  2. Click the running button in Step1 and the file will appear on the right.

Quality control and merging

Function

Generate a merged phosphoproteomics data frame from the peptide files.

How to get analysis results

  1. Go to the ‘Preprocessing’ tab.
  2. Modify the parameters according to your needs.
  3. Click the running button in Step2 and the file will appear on the right.

Parameter Selection Explanation

Interpretation of analysis results

We performed quality control of the identified phosphopeptides using PhosMap. Phosphopeptides were kept if they met a 1% FDR at the peptide level, had an ion score greater than 20 and carried the highest-confidence p-site probabilities computed by Mascot. We then merged the phosphopeptide lists with quantitative values from all experiments to generate a matrix for analysis.
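The filtering and merging described above can be sketched in Python as follows. The thresholds (1% FDR, ion score greater than 20) come from the text. The record layout, the field names `fdr`, `ion_score` and `site_prob`, and the helper names are hypothetical and do not reflect PhosMap's internal file format.

```python
def passes_qc(peptide, max_fdr=0.01, min_ion_score=20.0):
    """Keep peptides at or below 1% FDR with an ion score strictly above 20."""
    return peptide["fdr"] <= max_fdr and peptide["ion_score"] > min_ion_score


def best_site_assignment(candidates):
    """Among alternative p-site assignments, keep the most probable one."""
    return max(candidates, key=lambda c: c["site_prob"])


def merge_experiments(experiments):
    """Merge {experiment_code: {peptide_id: intensity}} into one matrix.

    Returns the sorted experiment codes and {peptide_id: [value per experiment]},
    with None where a peptide was not quantified in an experiment.
    """
    codes = sorted(experiments)
    peptide_ids = sorted({p for quant in experiments.values() for p in quant})
    matrix = {p: [experiments[c].get(p) for c in codes] for p in peptide_ids}
    return codes, matrix


# Hypothetical records for illustration only.
peptides = [
    {"id": "pep1", "fdr": 0.005, "ion_score": 35.0},
    {"id": "pep2", "fdr": 0.020, "ion_score": 40.0},  # fails the FDR cut-off
    {"id": "pep3", "fdr": 0.001, "ion_score": 20.0},  # ion score not above 20
]
kept = [p["id"] for p in peptides if passes_qc(p)]
best = best_site_assignment(
    [{"site": "S3", "site_prob": 0.62}, {"site": "T5", "site_prob": 0.91}]
)
codes, matrix = merge_experiments(
    {"Exp002": {"pep1": 8.0}, "Exp001": {"pep1": 5.0, "pep4": 1.0}}
)
print(kept, best["site"], codes, matrix)
```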

Mapping p-sites to protein

Function

Map protein GI numbers to gene symbols and output an expression profile matrix with gene symbols.
Construct a data frame with a unique phosphorylation site for each protein sequence.

How to get analysis results

  1. Go to the ‘Preprocessing’ tab.
  2. Modify the parameters according to your needs.
  3. Click the running button in Step3 and the file will appear on the right.

Parameter Selection Explanation

Interpretation of analysis results

By combining the phosphopeptide sequence, modification position and associated protein ID with the built-in human protein reference database of PhosMap, all p-sites were mapped to the corresponding protein sequence and represented by a unique p-site identifier (upsID) consisting of the protein GI number/accession, gene symbol and location of the p-site in the protein sequence. In addition, proteome data matched to the phosphoproteome were collected at each time point in the study by Ressa et al. Finally, 3,649 unique p-sites were obtained and their quantitative values were normalized by matched protein profiling data using PhosMap.
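The mapping step can be illustrated with a short Python sketch. It locates a peptide within a protein sequence and converts the modification position in the peptide into a position in the protein. The underscore-separated identifier layout, the function names and the example sequence and IDs are assumptions for illustration; the manual only states which three parts an upsID contains.

```python
def locate_site(protein_seq, peptide_seq, site_offset):
    """Return the 1-based protein position of a modified residue.

    `site_offset` is the 1-based position of the residue within the peptide.
    Returns None if the peptide does not occur in the protein sequence.
    """
    start = protein_seq.find(peptide_seq)  # 0-based start of the peptide
    if start == -1:
        return None
    return start + site_offset


def make_upsid(protein_id, gene_symbol, residue, position):
    """Join accession, gene symbol and site location (hypothetical layout)."""
    return f"{protein_id}_{gene_symbol}_{residue}{position}"


# Hypothetical protein, peptide and identifiers.
protein = "MKTAYSPEK"
peptide = "AYSPEK"
position = locate_site(protein, peptide, site_offset=3)
residue = protein[position - 1]
print(make_upsid("ID0001", "GENE1", residue, position))
```

Two peptides that cover the same residue of the same protein map to the same identifier, which is what makes the p-site unique in the merged matrix.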

Data normalization

Function

PhosMap provides two kinds of normalization: total sum scaling normalization, and normalization of phosphoproteomics data based on proteomics data.

How to get analysis results

  1. Go to the ‘Preprocessing’ tab.
  2. Modify the parameters according to your needs.
  3. Click the running button in Step4 and the p-site data normalized by total sum scaling will appear on the right.
  4. Click the running button in Step5 and the p-site data normalized by proteomics data will appear on the right.

Parameter Selection Explanation

Preprocessing for Spectronaut data

Import Spectronaut data

How to import your Spectronaut data

  1. Go to the ‘Import data’ tab.
  2. Choose 'Spectronaut' under 'DIA' to start with data from Spectronaut.
  3. Click ‘Browse’ to upload the phosphoproteomics experimental design file in .txt format.
  4. Click ‘Browse’ to upload the Report file generated by Spectronaut in .xls format.

Parser & p-site Quality Control

Function

If you start with .xls files from Spectronaut results, click this button to parse them into site score files, from which .csv files of phosphorylation sites with confidence scores will be generated.

How to get analysis results

  1. Go to the 'Preprocessing' tab.
  2. Click the running button in Step1 and the file will appear on the right.

Parameter Selection Explanation

Normalization & Imputation & Filtering

Function

PhosMap performs a total sum scaling normalization and imputation for missing values with various methods.

How to get analysis results

  1. Go to the 'Preprocessing' tab.
  2. Click the running button in Step2 and the file will appear on the right.

Parameter Selection Explanation

Normalization Method: The approach used to address unequal loading of phosphopeptides. 'Global' or 'median' refer to scaling the intensities of phosphopeptides globally or based on the median value.

Imputation Method: Strategies for replacing missing values. 'Globally' or 'by group' refer to imputing missing values across the entire dataset or within experimental groups.
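The difference between these options can be seen in a small Python sketch. It is illustrative only: the function names are hypothetical, and replacing missing values with the minimum observed value is just one common strategy chosen here as an assumption, since the manual does not list the imputation methods.

```python
from statistics import median


def scale_sample(values, method="median"):
    """Scale one sample by its median ('median') or by its total ('global')."""
    observed = [v for v in values if v is not None]
    factor = median(observed) if method == "median" else sum(observed)
    return [None if v is None else v / factor for v in values]


def impute_min(values, groups=None):
    """Replace None with the minimum observed value.

    With `groups=None` the minimum is taken over the whole row (global).
    Otherwise it is taken within each experimental group. A group with no
    observed value keeps its None entries.
    """
    if groups is None:
        groups = ["all"] * len(values)
    minimum = {}
    for value, group in zip(values, groups):
        if value is not None:
            minimum[group] = min(value, minimum.get(group, value))
    return [minimum.get(g) if v is None else v for v, g in zip(values, groups)]


row = [5.0, None, 2.0, None]
print(scale_sample([2.0, 4.0, None, 6.0], method="median"))
print(impute_min(row))
print(impute_min(row, groups=["a", "a", "b", "b"]))
```

Imputing by group keeps a missing value close to its replicates, whereas global imputation pulls all missing values towards the same low value.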

Preprocessing for Dia-NN data

Import Dia-NN data

How to import your Dia-NN data

  1. Go to the ‘Import data’ tab.
  2. Choose 'Dia-NN' under 'DIA' to start with data from Dia-NN.
  3. Click ‘Browse’ to upload the phosphoproteomics experimental design file in .txt format.
  4. Click ‘Browse’ to upload the Report file generated by Dia-NN in .tsv format.

Parser & p-site Quality Control

Function

If you start with .tsv files from Dia-NN results, click this button to parse them into site score files, from which .csv files of phosphorylation sites with confidence scores will be generated.

How to get analysis results

  1. Go to the 'Preprocessing' tab.
  2. Click the running button in Step1 and the file will appear on the right.

Parameter Selection Explanation

Normalization & Imputation & Filtering

Function

PhosMap performs a total sum scaling normalization and imputation for missing values with various methods.

How to get analysis results

  1. Go to the 'Preprocessing' tab.
  2. Click the running button in Step2 and the file will appear on the right.

Parameter Selection Explanation

Normalization Method: The approach used to address unequal loading of phosphopeptides. 'Global' or 'median' refer to scaling the intensities of phosphopeptides globally or based on the median value.

Imputation Method: Strategies for replacing missing values. 'Globally' or 'by group' refer to imputing missing values across the entire dataset or within experimental groups.

Analysis and visualization

PhosMap incorporates six analysis modules: dimension reduction analysis, differential expression analysis, time course analysis, kinase activity prediction, phosphorylation motif enrichment analysis and survival analysis.

Upload data

Function

In this step, you can upload your preprocessed data to PhosMap, such as the phosphorylation dataframe. If you have not preprocessed your data, you must preprocess it with PhosMap (go to the ‘Preprocessing’ tab) or do it yourself.

How to upload your data

  1. Go to 'Upload data' under the 'Analysis' tab.
  2. Choose 'Load example data' or follow the prompts to upload your own corresponding four files. If you have not preprocessed your data, you can click 'Go to preprocessing' to preprocess it.

Dimension reduction analysis

Function

PhosMap supports three dimension reduction methods: PCA, t-SNE and UMAP.

The meaning of the parameters

  1. ‘Title’ refers to the main title of the plot.
  2. ‘Legend title’ refers to the title of the legend in the plot.
  3. ‘Random seed’ is a parameter for t-SNE that sets the seed for the random number generator. This can be used to ensure reproducibility of results.
  4. ‘Perplexity’ is a numerical value for t-SNE, with a default value of 2. It balances the focus between preserving the local and global structure of the data.
  5. ‘Neighbors’ is a parameter for UMAP that refers to the size of the local neighborhood (in terms of the number of neighboring sample points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved.

How to get analysis results

  1. Go to 'Dimension reduction analysis' under the ‘Analysis’ tab.
  2. Modify the parameters according to your needs.
  3. Click the ‘Analysis’ button.
  4. The PCA, t-SNE and UMAP plots will appear on the right after the run finishes.
  5. Click the download button to download the plot file.

Parameter Selection Explanation

Interpretation of analysis results

To obtain an overview of the effect of the different treatment durations, we performed PCA in the downstream analysis module of PhosMap. The phosphorylation profiles of colorectal cancer cells after longer vemurafenib treatment (24 h and 48 h) were clearly different from those after short treatment (2 h and 6 h). In addition, principal component 1 (PC1) explains 31.77% of the variance, compared with 20% in the original publication, which suggests that phosphorylation profiles normalized by matched proteomics data better represent the variation over time in the BRAFi-treated samples.
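The percentage reported for PC1 is the share of total variance captured by the first principal component. The Python sketch below shows how such a share can be computed with power iteration on the covariance matrix. It is a toy illustration on four two-dimensional points, not the routine PhosMap uses, and the function name is hypothetical.

```python
import math


def pc1_explained_variance(samples, n_iter=200):
    """Fraction of total variance explained by the first principal component.

    `samples` is a list of observations (rows) with one value per feature.
    Power iteration finds the largest eigenvalue of the covariance matrix.
    It assumes non-constant data and a start vector that is not orthogonal
    to the leading eigenvector.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in samples]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    total_variance = sum(cov[j][j] for j in range(d))
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(d))
    return eigenvalue / total_variance


# Toy data: variance 8/3 along the first axis and 2/3 along the second.
share = pc1_explained_variance([(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)])
print(f"PC1 explains {share:.2%} of the variance")
```

For real data with thousands of p-sites a numerical library is the practical choice, because this sketch builds the full covariance matrix in plain Python.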

Differential expression analysis

Function

PhosMap supports three differential expression analysis methods: limma, SAM and ANOVA.

The meaning of the parameters

  1. ‘Control’ refers to the control group in the experiment.

  2. ‘Experiment’ refers to the experimental group in the experiment.

  3. ‘P-value threshold’ is the threshold for determining statistical significance based on the p-value.

  4. ‘P-value adjust method’ is the method used to adjust p-values for multiple comparisons.

  5. ‘FC threshold’ is the fold change threshold for determining significant changes in phosphorylation levels.

  6. ‘nperms’ is a parameter for the SAM method that specifies the number of permutations to perform.

  7. ‘Minimum FDR’ is the minimum false discovery rate threshold for determining statistical significance.

  8. ‘Clustering distance rows’ is a parameter for heatmap generation that specifies the distance metric used for clustering rows.

  9. ‘Clustering method’ is a parameter for heatmap generation that specifies the clustering method used to cluster rows and columns.
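To make the roles of ‘P-value threshold’, ‘P-value adjust method’ and ‘FC threshold’ concrete, here is a minimal Python sketch. It uses the Benjamini-Hochberg procedure as one example of an adjustment method and treats the fold change threshold symmetrically (up or down). Both choices, and the function names, are assumptions for illustration rather than a description of PhosMap's internals.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(p_value, fold_change, p_threshold=0.05, fc_threshold=2.0):
    """True if p is below the threshold and the change exceeds FC in either direction."""
    changed = fold_change > fc_threshold or fold_change < 1.0 / fc_threshold
    return p_value < p_threshold and changed


adjusted = bh_adjust([0.001, 0.5, 0.02])
print(adjusted)
print(is_significant(0.01, 3.0), is_significant(0.01, 1.5))
```

A stricter P-value threshold or a larger FC threshold both shrink the list of differentially expressed p-sites, while adjusting the p-values controls the false positives that arise from testing thousands of sites at once.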

How to get analysis results

  1. Go to 'Differential Expression Analysis' under the ‘Analysis’ tab.
  2. Go to the 'limma', 'SAM' or 'ANOVA' secondary tab.
  3. Choose the Control and Experiment groups to use for differential expression analysis.
  4. Choose 'Interactive mode' and click the ‘Analysis’ button. The interactive plot will appear on the right after the run finishes.
  5. Choose 'Static mode' and click the ‘Analysis’ button. The static plot will appear on the right after the run finishes.
  6. Click the ‘Plot Heatmap’ button. The heatmap will appear in the pop-up window.
  7. Click the download button to download the plot file.

Parameter Selection Explanation

Interpretation of analysis results

To illustrate differential expression analysis between two experimental conditions, we used the limma method in the differential expression analysis module of PhosMap to identify 128 significantly differentially expressed p-sites (DEPs) between samples treated with BRAFi for two hours and control samples (P value < 0.05 and fold change > 2). 139 p-sites were up-regulated in the BRAFi-treated samples. The largest difference was observed for DAP_S51, a phosphoserine site related to the MTOR pathway. 99 p-sites were down-regulated in the BRAFi-treated samples.

For the multiple experimental conditions, we leveraged the embedded ANOVA analysis of PhosMap and identified 548 DEPs among the five time points (P value < 0.1 and fold change > 2).

Time Course Analysis

Function

Fuzzy clustering is applied in time course analysis to discover patterns associated with time points. The corresponding line chart, combined with the membership for each cluster, is also drawn.

The meaning of the parameters

  1. ‘Minimum membership value’ is a threshold for determining the minimum membership value for a data point to be included in a cluster.
  2. ‘Iteration’ is the number of iterations to perform in the clustering algorithm.
  3. ‘Number of clusters’ is the number of clusters to generate in the clustering algorithm.
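The role of ‘Minimum membership value’ is easiest to see from the standard fuzzy c-means membership formula, sketched below in Python. The fuzzifier `m`, the function names and the toy cluster centers are assumptions for illustration; the manual does not state which fuzzy clustering implementation or fuzzifier PhosMap uses.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one profile in each cluster.

    Implements u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    distance from the point to center i. Memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: share membership among them.
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def core_members(points, centers, min_membership):
    """For each cluster, list the indices of points at or above the threshold."""
    cores = [[] for _ in centers]
    for index, point in enumerate(points):
        for cluster, u in enumerate(memberships(point, centers)):
            if u >= min_membership:
                cores[cluster].append(index)
    return cores


u = memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
cores = core_members([(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (3.0, 0.0)], 0.6)
print(u, cores)
```

Raising the minimum membership value keeps only profiles that clearly belong to one cluster, so profiles lying between clusters are left out of the plots.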

How to get analysis results

  1. Go to ‘Time course Analysis (fuzzy clustering)’ under the ‘Analysis’ tab.
  2. Modify the parameters according to your needs.
  3. Click the ‘Analysis’ button. The plot after running will appear on the right.
  4. Click the download button to download the plot file.

Interpretation of analysis results

These 548 DEPs were used as input to the time course analysis module of PhosMap, which generated 9 strong expression patterns. Two major clusters show significant downregulation at the phosphoproteomic signalling level upon BRAFi treatment, in line with the original literature. Cluster 1 responds within 2 hours, an early treatment response. Cluster 2 responds within 24 hours, a late treatment response.

Parameter Selection Explanation

Kinase activity prediction (KSEA)

Function

PhosMap uses kinase-substrate enrichment analysis (KSEA) to predict kinase activity.
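As a rough illustration of the idea, the Python sketch below computes one commonly used KSEA score: the mean log2 fold change of a kinase's substrates is compared with the mean of all p-sites and scaled by the number of substrates and the overall standard deviation. Whether PhosMap uses exactly this formulation is an assumption, and the function name and input values are hypothetical.

```python
import math
from statistics import mean, stdev


def ksea_z_score(substrate_log2fc, all_log2fc):
    """z = (mean of substrates - mean of all sites) * sqrt(m) / sd of all sites.

    `substrate_log2fc` holds the log2 fold changes of the m known substrates
    of one kinase. `all_log2fc` holds the log2 fold changes of every p-site.
    """
    m = len(substrate_log2fc)
    shift = mean(substrate_log2fc) - mean(all_log2fc)
    return shift * math.sqrt(m) / stdev(all_log2fc)


# Hypothetical log2 fold changes (treated versus control).
all_sites = [-2.0, -1.0, 0.0, 1.0, 2.0]
kinase_substrates = [-2.0, -1.0]
z = ksea_z_score(kinase_substrates, all_sites)
print(f"z = {z:.3f}")
```

A negative score indicates that the substrates of the kinase are less phosphorylated than the background, which is read as reduced kinase activity.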

The meaning of the parameters

  1. ‘Control’ refers to the control group in the experiment.
  2. ‘Experiment’ refers to the experimental group in the experiment.
  3. ‘Species’ refers to the species of the organism being studied.
  4. ‘Scale’ is a parameter for scaling the data before generating the heatmap.
  5. ‘Clustering distance rows’ is a parameter for heatmap generation that specifies the distance metric used for clustering rows.
  6. ‘Clustering method’ is a parameter for heatmap generation that specifies the clustering method used to cluster rows and columns.

How to get analysis results

  1. Go to ‘Kinase-Substrate Enrichment Analysis’ under the ‘Analysis’ tab.
  2. Select ‘Multiple groups’ or ‘Two groups’ according to the number of groups of your data.
  3. Click the first 'Analysis' button. If ‘Multiple groups’ is selected, the plot will appear on the right after the run finishes. Click ‘view result’ to view and download the kinase prediction time course result. If ‘Two groups’ is selected, only the phosphorylation dataframe will appear on the right.
  4. Select a cluster if ‘Multiple groups’ is selected. Click the second ‘Analysis’ button. After running, the heatmap will appear on the right.
  5. Click the download button to download the plot file.

Parameter Selection Explanation

Interpretation of analysis results

Afterwards, the substrates from the two clusters are imported into the KSEA module of PhosMap to infer kinase activities. The results indicate that CDK1/2, MAPK1/3 and AKT1 are suppressed during BRAFi treatment.

Motif enrichment analysis

Function

PhosMap supports motif enrichment analysis (MEA) on user-defined phosphopeptide lists, which provides clues for finding candidate kinases that are not present in the database.

The meaning of parameters

  1. ‘Fasta type’ refers to the type of fasta file used as input for the analysis.
  2. ‘Selected row number for plotting motif logo’ is the number of rows to be selected for generating the motif logo plot.
  3. ‘Matched seqs threshold’ is the threshold for determining the minimum number of matched sequences required for a motif to be considered significant.
  4. ‘Scale’ is a parameter for scaling the data before generating the heatmap.
  5. ‘Distance metric’ is a parameter for heatmap generation that specifies the distance metric used for clustering rows.
  6. ‘Clustering method’ is a parameter for heatmap generation that specifies the clustering method used to cluster rows and columns.
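The idea behind ‘Matched seqs threshold’ can be sketched in Python by counting how many foreground sequence windows match a motif and comparing that frequency with a background set. The 7-residue windows, the proline-directed pattern, the frequency ratio used as a score and the function names are all assumptions for illustration; the manual does not describe the statistic PhosMap applies.

```python
import re


def count_matches(motif_regex, windows):
    """Number of sequence windows that match the motif over their full length."""
    pattern = re.compile(motif_regex)
    return sum(1 for window in windows if pattern.fullmatch(window))


def motif_enrichment(motif_regex, foreground, background, min_matched=1):
    """Return (matched count, frequency ratio), or None below the match threshold."""
    fg_matches = count_matches(motif_regex, foreground)
    if fg_matches < min_matched:
        return None
    bg_matches = count_matches(motif_regex, background)
    fg_freq = fg_matches / len(foreground)
    bg_freq = bg_matches / len(background)
    ratio = fg_freq / bg_freq if bg_freq else float("inf")
    return fg_matches, ratio


# Hypothetical 7-residue windows centered on the phosphorylated S/T.
motif = "...[ST]P.."  # S or T at the center followed by proline
foreground = ["AKRSPEK", "GGLTPAA", "QQQSAQQ"]
background = ["AAASPAA", "AAASAAA", "AAATAAA", "AAASGAA"]
print(motif_enrichment(motif, foreground, background, min_matched=2))
print(motif_enrichment(motif, foreground, background, min_matched=3))
```

Motifs supported by only a handful of sequences are unstable, so a higher threshold trades sensitivity for more reliable motifs.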

How to get analysis results

  1. Go to ‘Motif Enrichment Analysis’ under the ‘Analysis’ tab.
  2. Modify the parameters according to your needs.
  3. Click the ‘Analysis’ button.
  4. The foreground dataframe mapped to motifs is shown on the right after running.
  5. Select row number for plotting logo.
  6. Click the first ‘Plot’ button, and the logo will appear on the right.
  7. Modify the parameters below.
  8. Click the second ‘Plot’ button.
  9. The heatmap will appear on the right.
  10. Click the download button to download the plot file.

Parameter Selection Explanation

Interpretation of analysis results

The 3,649 identified phosphopeptides were used as foreground sequences for MEA in PhosMap, and the results further strengthen the evidence of CDK and MAPK pathway deactivation in BRAF mutant CRC cells in response to BRAFi treatment.

Survival analysis

Function

This module identifies phosphorylation sites or kinases associated with patients’ clinical outcomes. Using kinase or phosphorylation site files and patients’ survival information as input matrices, the coxph function from the survival R package is used to calculate the hazard ratio (HR) and P-value.
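To show what the hazard ratio and P-value represent, the following Python sketch fits a Cox proportional hazards model with a single covariate by Newton-Raphson on the partial likelihood and derives a Wald P-value. It is a simplified stand-in for coxph, not a reimplementation: it assumes no tied event times and a covariate that varies within the risk sets, and the data are made up.

```python
import math


def score_and_information(beta, times, events, x):
    """Gradient and observed information of the Cox partial log-likelihood."""
    score, information = 0.0, 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored patients contribute only through risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk_set))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set))
        score += x[i] - s1 / s0
        information += s2 / s0 - (s1 / s0) ** 2
    return score, information


def cox_univariate(times, events, x, n_iter=25):
    """Return (hazard ratio, Wald P-value) for one covariate."""
    beta = 0.0
    for _ in range(n_iter):
        score, information = score_and_information(beta, times, events, x)
        beta += score / information  # Newton-Raphson step
    _, information = score_and_information(beta, times, events, x)
    z = beta * math.sqrt(information)  # beta divided by its standard error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(beta), p_value


# Made-up data: x = 1 marks high phosphorylation of a site, 0 marks low.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 1, 1, 1, 1]
x = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
hr, p = cox_univariate(times, events, x)
print(f"HR = {hr:.2f}, P = {p:.2f}")
```

An HR above 1 means that higher values of the covariate are associated with a higher risk of the event. With only six patients the P-value is far from significant, which is why real analyses need adequately sized cohorts.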

How to get analysis results

  1. Go to ‘Survival Analysis’ under the ‘Analysis’ tab.
  2. Modify the parameters according to your needs.
  3. Click the ‘Analysis’ button.
  4. The summary dataframe list will appear on the right.
  5. Click the ‘Plot’ button.
  6. The plot after running will appear on the right.
  7. Click the download button to download the plot file.

Parameter Selection Explanation

References

  1. Feng, J., Ding, C., Qiu, N., Ni, X., Zhan, D., Liu, W., Xia, X., Li, P., Lu, B. and Zhao, Q. (2017) Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis. Nature Biotechnology, 35, 409-412.
  2. Ressa, A., Bosdriesz, E., De Ligt, J., Mainardi, S., Maddalo, G., Prahallad, A., Jager, M., De La Fonteijne, L., Fitzpatrick, M. and Groten, S. (2018) A system-wide approach to monitor responses to synergistic BRAF and EGFR inhibition in colorectal cancer cells. Molecular & Cellular Proteomics, 17, 1892-1908.
