microCLIP v1.1

"microCLIP: Super learning uncovers functional transcriptome-wide miRNA interactions"

microCLIP functions have been updated to run on R version 3.6.2.

****************************************************************************

Requirements:

Operating Systems:
    Unix-like operating system, Ubuntu 12.04 and RHEL/CentOS 6 or later

Languages:
    R version 3.6.2, Java 7 or later

RNAduplex from the ViennaRNA package.
    The package can be downloaded and installed from https://www.tbi.univie.ac.at/RNA/.
    RNAduplex binary files already exist in the ./bin/ViennaRNA folder for Ubuntu 64-bit, version 12.04 or later.

Input files:

1. FASTA file of the miRNA sequences.
2. Alignment file in BAM/SAM format.
3. PhastCons files are required to import MRE (miRNA Recognition Element) conservation characteristics.
   These files can be downloaded from UCSC via FTP (http://hgdownload.cse.ucsc.edu/goldenPath/) or from the downloads page (http://hgdownload.soe.ucsc.edu/downloads.html).
4. Gene annotation file in one-based BED-like format*. The file is used to filter AGO-enriched clusters based on the annotation. This step is optional.

Caution
    The genome assembly of the phastCons and gene annotation files must match the assembly that the reads in the alignment file were aligned to.

* The required BED fields are:
    1-3. Entry coordinates
    4. name: An entry name (e.g. gene name)
    5. score: This field may be either a score value or additional entry-related information.
    6. strand

    Example
    1    69091     70008     ENSG00000186092@OR4F5            protein_coding    +
    1    89295     133566    ENSG00000238009@RP11-34P13.7     lincRNA           -
    1    131025    134836    ENSG00000233750@CICP27           pseudogene        +
    1    134901    139379    ENSG00000237683@AL627309.1       protein_coding    -

****************************************************************************

Usage:

Running microCLIP

microCLIP can be run either from an R workspace platform (RStudio) or from the terminal.

1. From an R workspace platform, run the main.R source code. Restarting the R session before running microCLIP is recommended.
2. From the terminal, run the command
    > Rscript main.R

Important
    1. Set up the parameters in the init.R source code.
    2. Set the microCLIP source directory path in the main.R source code.

Setting up microCLIP parameters in the init.R source code:

    src_dir: Path to the microCLIP source code directory.
    resultdir: Path to the result directory.
    mirna.fa: Path to the FASTA file of the miRNA sequences.
    alignment_file: Path to the BAM/SAM file.
    wigFixDir: Path to the phastCons files.
    annotation: Path to the gene annotation file. If no filtering is wanted, set this parameter to NULL (annotation <- NULL).
    path_RNAduplex: Path to the RNAduplex binary file.
    ucscChrFormat: "TRUE" or "FALSE" depending on the format followed by the alignment/annotation files (UCSC or Ensembl) **.
    ucscToEnsembl: "0" or "1" depending on the format followed by the alignment/annotation files (UCSC or Ensembl) **.
    outputPrefix: A prefix for the result file.
    threads: Number of threads.
    minCov: The minimum cluster coverage. This parameter should be set to a value >= 10.

** UCSC format: chromosomes have a "chr" prefix and chromosome M is indicated as "chrM".
   Ensembl format: chromosomes have no prefix and chromosome M is indicated as "MT".

****************************************************************************

Output:

The output of the algorithm is a one-based BED-like file with the following fields:

    1-3. MRE coordinates
    4. miRNA
    5. score: The predicted MRE score.
    6. strand
    7. binding_type: MRE binding type. microCLIP supports an extended set of (non-)canonical matches including 6mer to 9mer, offset 6mer, 3' supplementary sites as well as (im)perfect centered bindings.
    8. binding_class: Characterization of the MRE binding type (canonical/non-canonical).
    9. cluster_type: The cluster type (TC/non-TC).
    10. overlapping.reads: The number of overlapping reads in the cluster.

****************************************************************************

Important
    Required R packages are installed automatically. As a result, the first run of the algorithm takes approximately one and a half hours longer than later runs.
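
Example init.R settings (illustrative only)

    The init.R parameters described above could be filled in roughly as in the sketch below. The parameter names are the ones listed in this README; every path and value is a hypothetical placeholder, the RNAduplex file name inside ./bin/ViennaRNA is an assumption, and the meaning of each "0"/"1" value of ucscToEnsembl is not stated here, so check the comments in the init.R shipped with microCLIP before relying on it.

```r
# Hypothetical init.R sketch: all paths and values are placeholders.
src_dir        <- "/path/to/microCLIP"        # microCLIP source code directory
resultdir      <- "/path/to/results"          # where result files are written
mirna.fa       <- "/path/to/mirnas.fa"        # FASTA file of miRNA sequences
alignment_file <- "/path/to/alignment.bam"    # BAM/SAM alignment file
wigFixDir      <- "/path/to/phastCons"        # directory with phastCons files
annotation     <- "/path/to/annotation.bed"   # or NULL to skip annotation filtering

# Assumption: the bundled binary is named "RNAduplex" inside bin/ViennaRNA.
path_RNAduplex <- file.path(src_dir, "bin", "ViennaRNA", "RNAduplex")

# Chromosome naming of the alignment/annotation files (UCSC vs Ensembl).
# Values are quoted as in the README; which value of ucscToEnsembl applies
# to which format is not documented here.
ucscChrFormat  <- "TRUE"
ucscToEnsembl  <- "0"

outputPrefix   <- "sample1"                   # prefix for the result file
threads        <- 4                           # number of threads
minCov         <- 10                          # minimum cluster coverage

# Sanity check: the README asks for minCov >= 10.
stopifnot(minCov >= 10)
```

    With the assembly of the phastCons and annotation files matching the alignment, the run is then started with main.R as described under Usage.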