miRmap — Comprehensive prediction of microRNA target repression strength

The miRmap library is a Python library predicting the repression strength of microRNA (miRNA) targets. The model combines:

  • thermodynamic features: ΔG duplex, ΔG binding, ΔG seed duplex, ΔG seed binding, ΔG open and ΔG total,
  • evolutionary features: BLS and PhyloP,
  • probabilistic features: P.over binomial and P.over exact, and
  • sequence-based features: AU content, UTR position and 3’ pairing.

Download

The stable releases of miRmap are available at http://mirmap.ezlab.org.

Note

Development versions are available at http://dev.vejnar.org/mirmap.

Installation

Requirements

The miRmap library has the following requirements:

  1. miRmap requires Python 2.7. It can also be used with Python 2.6 if an OrderedDict implementation is installed (a version compatible with Python 2.4-2.6 is available as the ordereddict module).
  2. For the evolutionary features, the Python library DendroPy is needed for tree manipulation. You can install DendroPy directly from the Python Package Index.
  3. External dependencies. As of miRmap 1.1, external computation can be done with libraries or executables. Compiling the executables is easier, but they are computationally less efficient.

3.1 C libraries. A compiled version of the 3 libraries (*.so) is included in the miRmap distribution. If you want or have to compile them, please follow these instructions:

  • For the thermodynamic features, the Vienna RNA library is required.

Download the latest Vienna RNA tarball (Versions 2.0.x were successfully tested), then do:

cd ViennaRNA-<version>
./configure --without-kinfold --without-forester --without-svm --without-perl
make
gcc -shared -Wl,-O2 -o lib/libRNAvienna.so `find lib/ -name "*.o"` -lm -lgomp
  • For the evolutionary features, the PHAST library is required (CLAPACK has to be compiled first; please follow the instructions in the PHAST package).
svn co http://compgen.bscb.cornell.edu/svnrepo/phast/trunk phast
cd phast/src

In the file make-include.mk, add the -DUSE_PHAST_MEMORY_HANDLER parameter to the line starting with CFLAGS += -I${INC} -DPHAST_VERSION=${PHAST_VERSION}. Then replace the path to the CLAPACK and compile with:

make CLAPACKPATH=../CLAPACK-3.2.1 sharedlib
  • For the P.over exact feature, the Spatt library is required (You will need a working copy of CMake on your system).

Download the latest Spatt tarball (Version 2.0 was successfully tested), then do:

cd spatt-<version>
mkdir build
cd build
cmake -DWITH_SHARED_LIB=ON ..
make

From the directory where you compiled the C libraries:

mv spatt-<version>/libspatt2/libspatt2.so mirmap/libs/default
mv ViennaRNA-<version>/lib/libRNAvienna.so mirmap/libs/default
mv phast/lib/sharedlib/libphast.so mirmap/libs/default
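
Before pointing miRmap at the library directory, it can help to confirm that the three shared objects actually ended up there. The following is a minimal sketch, assuming the file names and the target directory from the mv commands above; the helper name missing_libraries is hypothetical and not part of miRmap.

```python
import os

# File names taken from the mv commands above.
EXPECTED_LIBS = ('libspatt2.so', 'libRNAvienna.so', 'libphast.so')

def missing_libraries(lib_dir, expected=EXPECTED_LIBS):
    """Return the expected shared objects that are absent from lib_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(lib_dir, name))]

if __name__ == '__main__':
    # Adjust the path to wherever the *.so files were moved.
    missing = missing_libraries('mirmap/libs/default')
    print('missing: %s' % (', '.join(missing) or 'none'))
```

An empty list means all three files are present; it does not check that they were compiled correctly.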

3.2 Executables. There are no specific requirements: please follow the instructions included in the Vienna RNA, PHAST, and Spatt packages.
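
Regarding requirement 1, code that has to run on both Python 2.6 and 2.7 commonly falls back to the ordereddict backport when the standard library does not provide OrderedDict. The sketch below shows that import pattern; whether miRmap itself imports it this way is an assumption, and the feature names and values are only illustrative (taken from the report shown under Usage).

```python
try:
    # Python 2.7 and later ship OrderedDict in the standard library.
    from collections import OrderedDict
except ImportError:
    # Python 2.4-2.6: use the separately installed ordereddict backport.
    from ordereddict import OrderedDict

# Keys are returned in insertion order.
features = OrderedDict()
features['tgs_au'] = 0.64942
features['tgs_position'] = 166.0
features['tgs_pairing3p'] = 1.0
print(list(features.keys()))
```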

Usage

Example with the pure Python features.

>>> import mirmap
>>> seq_target = 'GCUACAGUUUUUAUUUAGCAUGGGGAUUGCAGAGUGACCAGCACACUGGACUCCGAGGUGGUUCAGACAAGACAGAGGGGAGCAGUGGCCAUCAUCC\
... UCCCGCCAGGAGCUUCUUCGUUCCUGCGCAUAUAGACUGUACAUUAUGAAGAAUACCCAGGAAGACUUUGUGACUGUCACUUGCUGCUUUUUCUGCGCUUCAGUAACAAGU\
... GUUGGCAAACGAGACUUUCUCCUGGCCCCUGCCUGCUGGAGAUCAGCAUGCCUGUCCUUUCAGUCUGAUCCAUCCAUCUCUCUCUUGCCUGAGGGGAAAGAGAGAUGGGCC\
... AGGCAGAGAACAGAACUGGAGGCAGUCCAUCUA'
>>> seq_mirna = 'UAGCAGCACGUAAAUAUUGGCG'
>>> mim = mirmap.mm(seq_target, seq_mirna)
>>> mim.find_potential_targets_with_seed(allowed_lengths=[6,7], allowed_gu_wobbles={6:0,7:0},\
... allowed_mismatches={6:0,7:0}, take_best=True)
>>> mim.end_sites                                    # Coordinate(s) (3' end) of the target site on the target sequence
[186]
>>> mim.eval_tgs_au(with_correction=False)           # TargetScan features manually evaluated with
>>> mim.eval_tgs_pairing3p(with_correction=False)    # a non-default parameter.
>>> mim.eval_tgs_position(with_correction=False)
>>> mim.prob_binomial                                # mim's attribute: the feature is automatically computed
0.03311825751646191
>>> print mim.report()
155                            186
|                              |
CAGGAAGACUUUGUGACUGUCACUUGCUGCUUUUUCUGCGCU
                        |||||||.
          GCGGUUAUAAAUGCACGACGAU
  AU content                     0.64942
  UTR position                   166.00000
  3' pairing                     1.00000
  Probability (Binomial)         0.03312

With the C libraries/executables installed:

>>> import mirmap.library_link
>>> # For libraries
>>> mim.libs = mirmap.library_link.LibraryLink('libs/compiled') # Change to the path where you unzipped the *.so files
>>> # For executables (if they are not in your PATH)
>>> mim.exe_path = 'libs/compiled' # Change to the path where you unzipped the exe files
>>> mim.dg_duplex
-13.5
>>> mim.dg_open
12.180591583251953
>>> mim.prob_exact
0.06798900807193115
>>> print mim.report()
155                            186
|                              |
CAGGAAGACUUUGUGACUGUCACUUGCUGCUUUUUCUGCGCU
                        |||||||.
          GCGGUUAUAAAUGCACGACGAU
  ΔG duplex (kcal/mol)          -13.50000
  ΔG binding (kcal/mol)         -11.91708
  ΔG open (kcal/mol)             12.18059
  AU content                     0.64942
  UTR position                   166.00000
  3' pairing                     1.00000
  Probability (Exact)            0.06799
  Probability (Binomial)         0.03312

Classes

mm and mmPP are the base classes of miRmap; they inherit their methods from all the modules. Each module defines the methods for one category of features.

class mirmap.mm(target_seq, mirna_seq, min_target_length=None)[source]

Bases: mirmap.evolution.mmEvolution, mirmap.model.mmModel, mirmap.prob_binomial.mmProbBinomial, mirmap.prob_exact.mmProbExact, mirmap.report.mmReport, mirmap.thermo.mmThermo, mirmap.targetscan.mmTargetScan

Class containing a miRNA and its target mRNA.

Parameters:
  • target_seq (str) – Target sequence (mRNA).
  • mirna_seq (str) – miRNA sequence.
  • min_target_length (int) – Target site length, base-pairing independent.
eval_cons_bls(aln_fname=None, aln=None, aln_format=None, aln_alphabet=None, subst_model=None, tree=None, fitting_tree=None, use_em=None, libphast=None, pathphast=None, motif_def=None, motif_upstream_extension=None, motif_downstream_extension=None)

Computes the Branch Length Score (BLS).

Parameters:
  • aln_fname (str) – Alignment filename.
  • aln (str) – The alignment itself.
  • aln_format (str) – Alignment format. Currently supported is FASTA.
  • aln_alphabet (list) – List of nucleotides to consider in the aligned sequences (others get filtered).
  • subst_model (str) – PhyloFit substitution model (REV...).
  • tree (str) – Tree in the Newick format.
  • fitting_tree (bool) – Whether to fit the tree to the alignment.
  • use_em (bool) – Whether to fit the tree with the Expectation-Maximization algorithm.
  • libphast (LibraryLink) – Link to the Phast library.
  • pathphast (str) – Path to the PHAST executable.
  • motif_def (str) – ‘seed’ or ‘seed_extended’ or ‘site’.
  • motif_upstream_extension (int) – Upstream extension length.
  • motif_downstream_extension (int) – Downstream extension length.
eval_dg_duplex(librna=None, pathrna=None, mirna_start_pairing=None, temperature=None)

Computes the ΔG duplex, ΔG binding, ΔG seed duplex and ΔG seed binding scores.

Parameters:
  • librna (LibraryLink) – Link to the Vienna RNA library.
  • pathrna (str) – Path to the Vienna RNA executable.
  • mirna_start_pairing (int) – Starting position of the seed in the miRNA (from the 5’).
  • temperature (float) – Folding temperature.
eval_dg_open(librna=None, pathrna=None, upstream_rest=None, downstream_rest=None, dg_binding_area=None, temperature=None)

Computes the ΔG open score.

Parameters:
  • librna (LibraryLink) – Link to the Vienna RNA library.
  • pathrna (str) – Path to the Vienna RNA executable.
  • upstream_rest (int) – Upstream unfolding length.
  • downstream_rest (int) – Downstream unfolding length.
  • dg_binding_area (int) – Supplementary sequence length to fold (applied twice: upstream and downstream).
  • temperature (float) – Folding temperature.
eval_dg_total()

Computes the ΔG total score combining ΔG duplex and ΔG open scores.

eval_prob_binomial(markov_order=None, alphabet=None, transitions=None, motif_def=None, motif_upstream_extension=None, motif_downstream_extension=None)

Computes the P.over binomial score.

Parameters:
  • markov_order (int) – Markov Chain order.
  • alphabet (list) – List of nucleotides to consider in the sequences (others get filtered).
  • transitions (list) – Transition matrix of the Markov Chain model.
  • motif_def (str) – ‘seed’ or ‘seed_extended’ or ‘site’.
  • motif_upstream_extension (int) – Upstream extension length.
  • motif_downstream_extension (int) – Downstream extension length.
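
One common way to express motif over-representation is the upper tail of a binomial distribution: the probability of seeing at least k occurrences of the motif in n positions when each position matches with probability p. The sketch below illustrates that idea only. It assumes an order-0 background model with hypothetical uniform nucleotide frequencies and a hypothetical 400-nt target; the function names are not part of miRmap, and the library's own computation may differ.

```python
from math import factorial

def binomial_coefficient(n, k):
    """Number of ways to choose k items out of n."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(binomial_coefficient(n, i) * p ** i * (1.0 - p) ** (n - i)
                for i in range(k))
    return 1.0 - below

def motif_probability(motif, freqs):
    """Probability of the motif at one position under an order-0 model."""
    prob = 1.0
    for nt in motif:
        prob *= freqs[nt]
    return prob

# Hypothetical background: uniform nucleotide frequencies.
uniform = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'U': 0.25}
p = motif_probability('UGCUGCU', uniform)
positions = 400 - 7 + 1  # hypothetical 400-nt target, 7-nt motif
print(prob_at_least(1, positions, p))
```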
eval_prob_exact(libspatt=None, pathspatt=None, markov_order=None, alphabet=None, transitions=None, motif_def=None, motif_upstream_extension=None, motif_downstream_extension=None)

Computes the P.over exact score.

Parameters:
  • libspatt (LibraryLink) – Link to the Spatt library.
  • pathspatt (str) – Path to the Spatt executable.
  • markov_order (int) – Markov Chain order.
  • alphabet (list) – List of nucleotides to consider in the sequences (others get filtered).
  • transitions (list) – Transition matrix of the Markov Chain model.
  • motif_def (str) – ‘seed’ or ‘seed_extended’ or ‘site’.
  • motif_upstream_extension (int) – Upstream extension length.
  • motif_downstream_extension (int) – Downstream extension length.
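
The markov_order and transitions parameters describe a Markov chain background model of the sequence. As an illustration, the sketch below estimates an order-1 transition matrix from a sequence by counting adjacent nucleotide pairs. The nested-list layout and the function name estimate_transitions are assumptions made for this example; the format that miRmap expects for transitions is not specified here.

```python
def estimate_transitions(seq, alphabet=('A', 'C', 'G', 'U')):
    """Estimate order-1 transition probabilities P(next | current) from seq."""
    index = dict((nt, i) for i, nt in enumerate(alphabet))
    # Drop any character outside the alphabet before counting.
    filtered = [nt for nt in seq if nt in index]
    counts = [[0] * len(alphabet) for _ in alphabet]
    for cur, nxt in zip(filtered, filtered[1:]):
        counts[index[cur]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows without observations fall back to a uniform distribution.
        matrix.append([c / float(total) if total else 1.0 / len(alphabet)
                       for c in row])
    return matrix

# Row i holds the probabilities of each next nucleotide after alphabet[i].
for row in estimate_transitions('GCUACAGUUUUUAUUUAGCAUGGGGAUUGCAG'):
    print(row)
```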
eval_score(model_name=None, model=None)

Computes the miRmap score(s).

Parameters:
  • model_name (str) – Model name.
  • model (dict) – Model with coefficients and intercept as keys.
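
A model made of coefficients and an intercept suggests a linear combination of feature values. The sketch below shows that arithmetic under stated assumptions: the key names 'intercept' and 'coefficients', the feature names and all numeric values are hypothetical, and linear_score is not a miRmap function.

```python
def linear_score(features, model):
    """Intercept plus the coefficient-weighted sum of the feature values."""
    total = model['intercept']
    for name, coefficient in model['coefficients'].items():
        total += coefficient * features[name]
    return total

# Hypothetical model and feature values, for illustration only.
model = {'intercept': 0.5, 'coefficients': {'feature_a': 1.5, 'feature_b': 2.0}}
features = {'feature_a': 2.0, 'feature_b': -1.0}
print(linear_score(features, model))  # 0.5 + 1.5*2.0 + 2.0*(-1.0) = 1.5
```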
eval_selec_phylop(aln_fname=None, aln=None, aln_format=None, aln_alphabet=None, aln_quality=None, mod_fname=None, libphast=None, pathphast=None, method=None, mode=None, motif_def=None, motif_upstream_extension=None, motif_downstream_extension=None)

Computes the PhyloP score.

Parameters:
  • aln_fname (str) – Alignment filename.
  • aln (str) – The alignment itself.
  • aln_format (str) – Alignment format. Currently supported is FASTA.
  • aln_alphabet (list) – List of nucleotides to consider in the aligned sequences (others get filtered).
  • aln_quality (function) – Check alignment quality (must return True if alignment is fine).
  • mod_fname (str) – Model filename.
  • libphast (LibraryLink) – Link to the Phast library.
  • pathphast (str) – Path to the PHAST executable.
  • method (str) – Test name performed by PhyloP (SPH...).
  • mode (str) – Testing for conservation (CON), acceleration (ACC) or both (CONACC).
  • motif_def (str) – ‘seed’ or ‘seed_extended’ or ‘site’.
  • motif_upstream_extension (int) – Upstream extension length.
  • motif_downstream_extension (int) – Downstream extension length.
eval_tgs_au(ts_types=None, ca_window_length=None, with_correction=None)

Computes the AU content score.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • ca_window_length (int) – Sequence length to compute the score with.
  • with_correction (bool) – Apply the linear regression correction or not.
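
To give an intuition for an AU content feature, the sketch below computes the plain fraction of A and U nucleotides in a window flanking a site. This is a simplification under stated assumptions: the unweighted fraction, the 30-nt default window and the function name au_fraction are choices made for this example, and the score computed by eval_tgs_au may weight positions differently and apply the regression correction.

```python
def au_fraction(seq, start, end, window=30):
    """Fraction of A/U in up to `window` nt on each side of seq[start:end]."""
    upstream = seq[max(0, start - window):start]
    downstream = seq[end:end + window]
    flank = upstream + downstream
    if not flank:
        return 0.0  # the site covers the whole sequence
    return sum(1 for nt in flank if nt in 'AU') / float(len(flank))

# Flanks are 'AG' (upstream) and 'CU' (downstream): 2 of 4 are A or U.
print(au_fraction('AAGGCCUU', 3, 5, window=2))  # 0.5
```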
eval_tgs_pairing3p(ts_types=None, with_correction=None)

Computes the 3’ pairing score.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • with_correction (bool) – Apply the linear regression correction or not.
eval_tgs_position(ts_types=None, with_correction=None)

Computes the UTR position score.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • with_correction (bool) – Apply the linear regression correction or not.
eval_tgs_score(ts_types=None, with_correction=None)

Computes the TargetScan score combining AU content, UTR position and 3’ pairing scores.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • with_correction (bool) – Apply the linear regression correction or not.
find_potential_targets_with_seed(mirna_start_pairing=None, allowed_lengths=None, allowed_gu_wobbles=None, allowed_mismatches=None, take_best=None)

Searches for seed(s) in the target sequence.

Parameters:
  • mirna_start_pairing (int) – Starting position of the seed in the miRNA (from the 5’).
  • allowed_lengths (list) – List of seed length(s).
  • allowed_gu_wobbles (dict) – For each seed length (key), how many GU wobbles are allowed (value).
  • allowed_mismatches (dict) – For each seed length (key), how many mismatches are allowed (value).
  • take_best (bool) – If seed matches overlap, whether to keep only the longest one.
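
The core of a seed search can be sketched as follows: take the seed from the miRNA, reverse-complement it, and look for that string in the target. This simplified version handles perfect Watson-Crick matches only (no GU wobbles or mismatches) and uses hypothetical names (reverse_complement, seed_match_starts); it is not the miRmap implementation. The sequences are the miRNA and the target fragment from the report shown under Usage.

```python
COMPLEMENT = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}

def reverse_complement(rna):
    """Reverse complement of an RNA sequence."""
    return ''.join(COMPLEMENT[nt] for nt in reversed(rna))

def seed_match_starts(target, mirna, start_pairing=2, seed_length=7):
    """0-based start indices of perfect seed matches in target."""
    # start_pairing is 1-based, counted from the 5' end of the miRNA.
    seed = mirna[start_pairing - 1:start_pairing - 1 + seed_length]
    site = reverse_complement(seed)
    starts = []
    pos = target.find(site)
    while pos != -1:
        starts.append(pos)
        pos = target.find(site, pos + 1)  # allow overlapping matches
    return starts

fragment = 'CAGGAAGACUUUGUGACUGUCACUUGCUGCUUUUUCUGCGCU'
mirna = 'UAGCAGCACGUAAAUAUUGGCG'
print(seed_match_starts(fragment, mirna))  # [24]
```

The index 24 is relative to the fragment and corresponds to the first paired column in the report's alignment.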
get_cons_bls(method=None)

Branch Length Score (BLS) with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘max’).
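
When a miRNA has several target sites on one sequence, the method argument names how the per-site scores are reduced to a single value. A minimal sketch of that idea, assuming only the two method names given as examples in this documentation (‘min’ and ‘max’); the helper combine_site_scores and the second score value are hypothetical.

```python
COMBINERS = {'min': min, 'max': max}

def combine_site_scores(scores, method='min'):
    """Reduce per-site scores to one value for the miRNA-target pair."""
    if method not in COMBINERS:
        raise ValueError('unknown method: %r' % (method,))
    if not scores:
        return None  # no target site was found
    return COMBINERS[method](scores)

# For an energy such as ΔG duplex, 'min' keeps the most stable site.
print(combine_site_scores([-13.5, -9.2], method='min'))  # -13.5
```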
get_dg_binding(method=None)

ΔG binding score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_dg_binding_seed(method=None)

ΔG seed binding score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_dg_duplex(method=None)

ΔG duplex score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_dg_duplex_seed(method=None)

ΔG seed duplex score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_dg_open(method=None)

ΔG open score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_dg_total(method=None)

ΔG total score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_prob_binomial(method=None)

P.over binomial score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_prob_exact(method=None)

P.over exact score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
get_selec_phylop(method=None)

PhyloP score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
report()

Returns a formatted report of already computed features for all target site(s).

cons_bls

Branch Length Score (BLS) with default parameters.

dg_binding

ΔG binding score with default parameters.

dg_binding_seed

ΔG seed binding score with default parameters.

dg_duplex

ΔG duplex score with default parameters.

dg_duplex_seed

ΔG seed duplex score with default parameters.

dg_open

ΔG open score with default parameters.

dg_total

ΔG total score with default parameters.

prob_binomial

P.over binomial score with default parameters.

prob_exact

P.over exact score with default parameters.

score

miRmap score with default parameters.

selec_phylop

PhyloP score with default parameters.

tgs_au

AU content score with default parameters.

tgs_pairing3p

3’ pairing score with default parameters.

tgs_position

UTR position score with default parameters.

tgs_score

TargetScan score with default parameters.

class mirmap.mmPP(target_seq, mirna_seq, min_target_length=None)[source]

Bases: mirmap.model.mmModel, mirmap.prob_binomial.mmProbBinomial, mirmap.report.mmReport, mirmap.targetscan.mmTargetScan

Class containing a miRNA and its target mRNA, with pure Python methods only.

Parameters:
  • target_seq (str) – Target sequence (mRNA).
  • mirna_seq (str) – miRNA sequence.
  • min_target_length (int) – Target site length, base-pairing independent.
eval_prob_binomial(markov_order=None, alphabet=None, transitions=None, motif_def=None, motif_upstream_extension=None, motif_downstream_extension=None)

Computes the P.over binomial score.

Parameters:
  • markov_order (int) – Markov Chain order.
  • alphabet (list) – List of nucleotides to consider in the sequences (others get filtered).
  • transitions (list) – Transition matrix of the Markov Chain model.
  • motif_def (str) – ‘seed’ or ‘seed_extended’ or ‘site’.
  • motif_upstream_extension (int) – Upstream extension length.
  • motif_downstream_extension (int) – Downstream extension length.
eval_score(model_name=None, model=None)

Computes the miRmap score(s).

Parameters:
  • model_name (str) – Model name.
  • model (dict) – Model with coefficients and intercept as keys.
eval_tgs_au(ts_types=None, ca_window_length=None, with_correction=None)

Computes the AU content score.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • ca_window_length (int) – Sequence length to compute the score with.
  • with_correction (bool) – Apply the linear regression correction or not.
eval_tgs_pairing3p(ts_types=None, with_correction=None)

Computes the 3’ pairing score.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • with_correction (bool) – Apply the linear regression correction or not.
eval_tgs_position(ts_types=None, with_correction=None)

Computes the UTR position score.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • with_correction (bool) – Apply the linear regression correction or not.
eval_tgs_score(ts_types=None, with_correction=None)

Computes the TargetScan score combining AU content, UTR position and 3’ pairing scores.

Parameters:
  • ts_types (object) – Parameters by seed-type.
  • with_correction (bool) – Apply the linear regression correction or not.
find_potential_targets_with_seed(mirna_start_pairing=None, allowed_lengths=None, allowed_gu_wobbles=None, allowed_mismatches=None, take_best=None)

Searches for seed(s) in the target sequence.

Parameters:
  • mirna_start_pairing (int) – Starting position of the seed in the miRNA (from the 5’).
  • allowed_lengths (list) – List of seed length(s).
  • allowed_gu_wobbles (dict) – For each seed length (key), how many GU wobbles are allowed (value).
  • allowed_mismatches (dict) – For each seed length (key), how many mismatches are allowed (value).
  • take_best (bool) – If seed matches overlap, whether to keep only the longest one.
get_prob_binomial(method=None)

P.over binomial score with default parameters.

Parameters: method (str) – Method name used to combine target scores (Example: ‘min’).
report()

Returns a formatted report of already computed features for all target site(s).

prob_binomial

P.over binomial score with default parameters.

score

miRmap score with default parameters.

tgs_au

AU content score with default parameters.

tgs_pairing3p

3’ pairing score with default parameters.

tgs_position

UTR position score with default parameters.

tgs_score

TargetScan score with default parameters.

Parameters: library_path (str) – Path to the C dynamic libraries.