GUIDELINES FOR DATA INPUT REQUIREMENTS FOR PATHWAY AND NETWORK ANALYSIS

  1. Your data should already have been statistically analyzed.

  2. Your data should have been normalized.

  3. PLEASE PROVIDE ANY QUALITY CONTROL PLOTS YOU MAY HAVE MADE, ESPECIALLY PCA AND CLUSTERING PLOTS:

    • Box plot of intensities (before and after normalization)

      • Looking at the distribution of probe intensities across all arrays at once can reveal, for example, that one array is not like the others. Normalization corrects this heterogeneity, so the box plots after normalization should look more homogeneous.
    • Principal Component Analysis (2D-PCA)

      • PCA is recommended as an exploratory tool to uncover unknown trends in the data. When applied to samples, PCA helps you explore the correlations between samples.
    • Unsupervised hierarchical clustering of samples and genes (performed on whole data)

      • Clustering is a useful exploratory technique for gene expression data. It groups together genes and samples that have similar expression patterns.
    • If possible, please provide a PowerPoint presentation with one figure for each analysis.
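Why the box plots should look more alike after normalization can be illustrated with a minimal sketch of quantile normalization (one of the normalization methods named in item 6) in plain Python, together with the quartiles that a box plot draws. The function names and toy values are hypothetical, and ties are broken by probe position rather than averaged; for real arrays, rely on an established implementation such as the RMA or quantile normalization routines in Bioconductor.

```python
from statistics import mean, quantiles


def quantile_normalize(samples):
    """Quantile-normalize samples given as equal-length lists of log2 intensities.

    Ties are broken by probe position, a simplification compared with
    dedicated normalization packages.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")
    # For each sample, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n_probes), key=s.__getitem__) for s in samples]
    # Reference distribution: mean across samples of the r-th smallest value.
    reference = [mean(s[order[r]] for s, order in zip(samples, orders))
                 for r in range(n_probes)]
    # Give each probe the reference value that matches its rank in its own sample.
    normalized = []
    for order in orders:
        values = [0.0] * n_probes
        for rank, probe in enumerate(order):
            values[probe] = reference[rank]
        normalized.append(values)
    return normalized


def box_summary(values):
    """First quartile, median and third quartile: the box of a box plot."""
    return tuple(quantiles(values, n=4))


# Toy log2 intensities (not real data): three samples, four probes each.
raw = [
    [5.0, 2.0, 3.0, 4.0],  # sample1
    [4.0, 1.0, 4.0, 2.0],  # sample2
    [3.0, 4.0, 6.0, 8.0],  # sample3
]
normalized = quantile_normalize(raw)
for i, (before, after) in enumerate(zip(raw, normalized), start=1):
    print(f"sample{i}: before {box_summary(before)}  after {box_summary(after)}")
```

After normalization every sample holds the same set of values in a different order, so the boxes become identical, which is the homogeneity the box plots are meant to show.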
  4. An appropriate statistical test of your hypothesis (your biological question) should have been performed, for example a moderated t-test, a paired t-test or an ANOVA.
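For a two-group, unpaired comparison on log2-scale normalized data, the per-gene log2 fold change is the difference of the group means, and an ordinary Welch t statistic can be computed as in the sketch below. This is an illustration under those assumptions, not the moderated t-test (which shrinks each gene's variance towards a value shared across genes, as the limma package does). It stops at the t value because the Python standard library has no t distribution; `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.ttest_rel` return p-values for the unpaired and paired cases. The function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(treated, control):
    """Difference of group means on log2-scale values.

    On the log2 scale this equals the log2 ratio of the groups' geometric means.
    """
    return mean(treated) - mean(control)


def welch_t(treated, control):
    """Ordinary (not moderated) Welch t statistic for two unpaired groups."""
    standard_error = sqrt(variance(treated) / len(treated)
                          + variance(control) / len(control))
    return log2_fold_change(treated, control) / standard_error


# Hypothetical log2 normalized values of one gene on three chips per group.
treated = [9.1, 9.7, 8.8]
control = [7.0, 7.4, 6.9]
fc = log2_fold_change(treated, control)
t = welch_t(treated, control)
print(f"log2 fold change = {fc:.2f}, t = {t:.2f}")
```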

  5. PLEASE PROVIDE ANY REPORT ACCOMPANYING THE STATISTICAL ANALYSIS THAT DESCRIBES HOW IT WAS DONE.

    • If you need support for your statistical analyses, please contact our BIOSTATISTICS SERVICE.

      • Dr. ChangJiang Xu (changjiang.xu@utoronto.ca) offers free consultations for statistical analyses. Your data will be analyzed and exported in the correct format for the subsequent pathway and network analyses. You are encouraged to contact ChangJiang or Veronique as soon as you start planning your experiment: genomics technologies can be very sensitive to noise, and a well-designed experiment (randomization of the samples, a balanced design, and standardized protocols to reduce potential sources of noise) is essential for the best results.
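One simple way to combine randomization with a balanced design is to spread each group's samples evenly over processing batches and shuffle the processing order within each batch. This is a minimal sketch with hypothetical sample IDs, batch count and function name; the actual design of your experiment is best discussed with the biostatistics service.

```python
import random


def balanced_batches(samples_by_group, n_batches, seed=None):
    """Spread each group's samples evenly over batches, in random order.

    samples_by_group maps a condition name to a list of sample IDs; the
    result is a list of batches, each a list of sample IDs in processing order.
    """
    rng = random.Random(seed)
    batches = [[] for _ in range(n_batches)]
    for samples in samples_by_group.values():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Deal the shuffled samples round-robin over the batches.
        for i, sample in enumerate(shuffled):
            batches[i % n_batches].append(sample)
    for batch in batches:
        rng.shuffle(batch)  # randomize the processing order inside each batch
    return batches


# Hypothetical design: four treated and four control samples, two batches.
design = {"treated": ["T1", "T2", "T3", "T4"],
          "control": ["C1", "C2", "C3", "C4"]}
batches = balanced_batches(design, n_batches=2, seed=42)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

Each batch is perfectly balanced only when every group's size is a multiple of the number of batches; otherwise the first batches receive the extra samples.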

  6. PLEASE PROVIDE A TAB-DELIMITED TEXT FILE (.txt) CONTAINING THE DATA FOR THE PATHWAY AND NETWORK ANALYSIS (OR ALTERNATIVELY A .csv FILE):

    • Name your file as follows: yourname_date_PIname_comparison.txt, with the comparison written as treated_vs_control (example: veronique_March21_BADER_treated_vs_control.txt)
    • If you resubmit your file, please rename it with the new date
    • Please follow this column format:
      • the first column corresponds to ENTREZ GENE ID.

        • An Entrez Gene ID is a numerical value that uniquely identifies a gene.
        • For example, the Entrez Gene ID for Myc (myelocytomatosis oncogene [Mus musculus]) is 17869: http://www.ncbi.nlm.nih.gov/gene/17869.
        • You can convert many types of gene identifiers and symbols to Entrez Gene IDs using Synergizer or other similar tools.

      • the second column corresponds to a UNIQUE ARRAY IDENTIFIER (PROBESET ID for Affymetrix and PROBE ID for Illumina).

      • the third column corresponds to GENE NAME (official gene symbol).

      • the fourth column corresponds to the GENE DESCRIPTION (full gene name).

      • the fifth column corresponds to the log2 FOLD CHANGE.

      • the sixth and seventh columns contain the STATISTICAL VALUES:

        • the statistical values are those that tell you whether a gene is significantly differentially expressed or not; if you applied a t-test, for example, they could be the t value and the p-value.
        • the whole table should be ranked by one statistical value, preferably the t value.
      • the remaining columns contain the transformed (for example, log2) and normalized (for example, RMA or quantile normalization) values for each sample (i.e. each chip, for gene expression data): RAW NORMALIZED DATA.

    • Please also provide a file describing the sample labels.
    • ! Include all of your data, even genes with non-significant p-values.
    • Please state the source of the annotation (for example, the annotation file and version used to map probes to genes).
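As a minimal sketch, assuming the statistics and normalized values are already available in Python, the file described above can be written with the standard `csv` module. The dictionary keys, function names and sample names are hypothetical; the column headers follow the example in item 7. That example is ordered by decreasing absolute t value (44.0, -41.8, 39.9, ...), so the sketch sorts the same way; sort on the signed t value instead if that suits your analysis.

```python
import csv
import os
import tempfile

# Column headers taken from the example table in item 7.
FIXED_COLUMNS = ["Entrez ID", "Probeset ID", "Gene Name", "Gene Description",
                 "log2foldchange", "t value", "q value (FDR)"]


def submission_filename(your_name, date, pi_name, comparison):
    """Build yourname_date_PIname_comparison.txt."""
    return f"{your_name}_{date}_{pi_name}_{comparison}.txt"


def write_submission(path, genes, sample_names):
    """Write every probe (significant or not), ranked by decreasing |t|.

    genes: list of dicts with the hypothetical keys entrez, probe, symbol,
    description, log2fc, t, q and values (one normalized value per sample).
    """
    ranked = sorted(genes, key=lambda g: abs(g["t"]), reverse=True)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(FIXED_COLUMNS + list(sample_names))
        for g in ranked:
            if len(g["values"]) != len(sample_names):
                raise ValueError(f"probe {g['probe']}: wrong number of sample values")
            writer.writerow([g["entrez"], g["probe"], g["symbol"], g["description"],
                             g["log2fc"], g["t"], g["q"], *g["values"]])


# Two rows from the example table in item 7, deliberately given out of order.
genes = [
    {"entrez": 13215, "probe": "10582809", "symbol": "Tk1",
     "description": "thymidine kinase 1", "log2fc": 8.7, "t": 39.9456,
     "q": 0.001, "values": [8.94519, 9.56513, 8.38612]},
    {"entrez": 27279, "probe": "10448307", "symbol": "Tnfrsf12a",
     "description": "tumor necrosis factor receptor superfamily, member 12a",
     "log2fc": -9.8, "t": -41.815, "q": 0.001,
     "values": [8.58977, 9.29698, 8.80844]},
]
path = os.path.join(tempfile.mkdtemp(),
                    submission_filename("veronique", "March21", "BADER",
                                        "treated_vs_control"))
write_submission(path, genes, ["sample1", "sample2", "sample3"])
with open(path) as handle:
    print(handle.read())
```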
  7. DATA INPUT EXAMPLE:

    Entrez ID  Probeset ID  Gene Name      Gene Description                                             log2foldchange  t value   q value (FDR)  sample1  sample2  sample3  ...
    17218      10572906     Mcm5           minichromosome maintenance deficient 5, cell division cycle  10.2            44.0079   0.001          9.13084  9.7166   8.76638  ...
    27279      10448307     Tnfrsf12a      tumor necrosis factor receptor superfamily, member 12a       -9.8            -41.815   0.001          8.58977  9.29698  8.80844  ...
    13215      10582809     Tk1            thymidine kinase 1                                           8.7             39.9456   0.001          8.94519  9.56513  8.38612  ...
    12937      10384145     H2afv          H2A histone family, member V                                 -7.4            -33.6475  0.001          10.574   10.7741  10.5401  ...
    207277     10526848     A430033K04Rik  A430033K04Rik                                                7               33.3352   0.001          8.25088  8.4121   8.2783   ...
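Before submitting, a file produced by another tool (R, Excel, a vendor package) can be checked against the format above. This sketch assumes a strict reading of the guidelines: tab-delimited, seven fixed columns followed by at least one sample column, a numeric Entrez Gene ID in column 1, numeric values from column 5 onwards, and the t value in column 6 used for the ranking check. The function names are hypothetical; relax the Entrez ID check if your file also contains probes without an Entrez Gene ID.

```python
import csv
import os
import tempfile

N_FIXED = 7  # Entrez ID, probe ID, symbol, description, log2 FC, t value, p/q value


def check_submission(path):
    """Return a list of format problems found in a tab-delimited submission."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    if len(header) <= N_FIXED:
        return [f"expected at least {N_FIXED + 1} columns, found {len(header)}"]
    problems = []
    previous_abs_t = float("inf")
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: {len(row)} fields, header has {len(header)}")
            continue
        if not row[0].isdigit():
            problems.append(f"line {line_no}: Entrez ID {row[0]!r} is not numeric")
        try:
            # Column 5 onwards (log2 FC, t, q and every sample) must be numeric.
            numbers = [float(x) for x in row[4:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric statistic or sample value")
            continue
        abs_t = abs(numbers[1])  # assumes column 6 holds the t value
        if abs_t > previous_abs_t:
            problems.append(f"line {line_no}: not ranked by decreasing |t|")
        previous_abs_t = abs_t
    return problems


def write_rows(path, rows):
    """Demo helper: write rows as a tab-delimited file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


# The example table above, typed in as rows (the "..." column is omitted).
EXAMPLE = [
    ["Entrez ID", "Probeset ID", "Gene Name", "Gene Description", "log2foldchange",
     "t value", "q value (FDR)", "sample1", "sample2", "sample3"],
    ["17218", "10572906", "Mcm5", "minichromosome maintenance deficient 5, cell division cycle",
     "10.2", "44.0079", "0.001", "9.13084", "9.7166", "8.76638"],
    ["27279", "10448307", "Tnfrsf12a", "tumor necrosis factor receptor superfamily, member 12a",
     "-9.8", "-41.815", "0.001", "8.58977", "9.29698", "8.80844"],
    ["13215", "10582809", "Tk1", "thymidine kinase 1",
     "8.7", "39.9456", "0.001", "8.94519", "9.56513", "8.38612"],
    ["12937", "10384145", "H2afv", "H2A histone family, member V",
     "-7.4", "-33.6475", "0.001", "10.574", "10.7741", "10.5401"],
    ["207277", "10526848", "A430033K04Rik", "A430033K04Rik",
     "7", "33.3352", "0.001", "8.25088", "8.4121", "8.2783"],
]
good = os.path.join(tempfile.mkdtemp(), "example.txt")
write_rows(good, EXAMPLE)
print(check_submission(good) or "no problems found")
```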



CSCPathwayAnalysisService/Data (last edited 2014-06-27 13:19:14 by VeroniqueVoisin)
