Python APIs created for this project
Annotation module
Functions for annotating the RNA type of genomic regions.
Annotation.overlap(bed1, bed2)
This function checks whether two Bed objects from the same chromosome overlap.
Parameters:
- bed1 – A Bed object from xplib.Annotation.Bed (BAM2X)
- bed2 – A Bed object from xplib.Annotation.Bed (BAM2X)
Returns: bool – True if the two regions overlap, otherwise False
Example:
>>> from xplib.Annotation import Bed
>>> from Annotation import overlap
>>> bed1 = Bed(["chr1", 10000, 12000])
>>> bed2 = Bed(["chr1", 9000, 13000])
>>> print(overlap(bed1, bed2))
True
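The overlap test itself comes down to comparing interval endpoints. Below is a minimal sketch of that logic on plain (chrom, start, end) tuples rather than xplib Bed objects. The helper name regions_overlap is hypothetical, and it assumes 0-based, half-open BED coordinates, under which two regions that only touch do not overlap; whether overlap() treats touching regions the same way is not documented here.

```python
def regions_overlap(a, b):
    """Return True if two (chrom, start, end) regions share at least one base.

    Hypothetical helper: assumes 0-based, half-open BED coordinates.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    # Regions on different chromosomes can never overlap.
    if chrom_a != chrom_b:
        return False
    # Half-open intervals [s, e) overlap iff each one starts before the other ends.
    return start_a < end_b and start_b < end_a


print(regions_overlap(("chr1", 10000, 12000), ("chr1", 9000, 13000)))  # True
print(regions_overlap(("chr1", 0, 10), ("chr1", 10, 20)))  # False: adjacent only
```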
Annotation.Subtype(bed1, genebed)
This function determines whether a region lies in an intron, an exon, or a UTR of a transcript taken from a BED12 file.
Parameters:
- bed1 – A Bed object defined by xplib.Annotation.Bed (BAM2X)
- genebed – A BED12 object representing a transcript, defined by xplib.Annotation.Bed, carrying the exon/intron/UTR structure from a BED12 file
Returns: str – RNA subtype: “intron”, “exon”, “utr3”, “utr5”, or “.”
Example:
>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from Annotation import Subtype
>>> bed1 = Bed(["chr13", 40975747, 40975770])
>>> a = DBI.init("../../Data/Ensembl_mm9.genebed.gz", "bed")
>>> genebed = next(a.query(bed1))
>>> print(Subtype(bed1, genebed))
intron
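To show how an exon/intron/UTR decision can be derived from BED12 fields, here is a minimal sketch; it is not the project's implementation. subtype_at is a hypothetical helper that classifies a single 0-based position (the real Subtype() takes a whole region), and the transcript is a plain dict with hypothetical key names. It assumes that "exon" means an exonic base inside the thick (coding) part, that exonic bases outside it are UTRs assigned by strand, and that a transcript whose thickStart equals thickEnd is non-coding and has no UTRs.

```python
def subtype_at(pos, tx):
    """Classify a 0-based genomic position against one BED12 transcript.

    Hypothetical helper: `tx` is a dict of BED12 fields; block starts are
    relative to the transcript start, as in the BED12 format.
    """
    if not tx["start"] <= pos < tx["end"]:
        return "."  # outside the transcript
    in_exon = any(
        tx["start"] + rel <= pos < tx["start"] + rel + size
        for rel, size in zip(tx["block_starts"], tx["block_sizes"])
    )
    if not in_exon:
        return "intron"
    # Assumption: a non-coding transcript (thickStart == thickEnd) has no UTRs.
    if tx["thick_start"] == tx["thick_end"]:
        return "exon"
    if tx["thick_start"] <= pos < tx["thick_end"]:
        return "exon"
    # Exonic but outside the coding part: 5' vs 3' depends on the strand.
    left_of_cds = pos < tx["thick_start"]
    five_prime = left_of_cds if tx["strand"] != "-" else not left_of_cds
    return "utr5" if five_prime else "utr3"


# Toy transcript: exons at [100, 200) and [800, 1000), coding part [150, 900).
tx = {"start": 100, "end": 1000, "strand": "+",
      "thick_start": 150, "thick_end": 900,
      "block_sizes": [100, 200], "block_starts": [0, 700]}
print([subtype_at(p, tx) for p in (50, 120, 160, 500, 950)])
# ['.', 'utr5', 'exon', 'intron', 'utr3']
```

The strand flip is the one subtle step: on the minus strand, exonic bases to the genomic left of the coding region belong to the 3' UTR.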
Annotation.annotation(bed, ref_allRNA, ref_detail, ref_repeat)
This function builds on the overlap() and Subtype() functions to annotate the RNA type, name, and subtype of any genomic region.
Parameters:
- bed – A Bed object defined by xplib.Annotation.Bed (in BAM2X)
- ref_allRNA – the DBI.init object (from BAM2X) for a BED6 file of all kinds of RNA
- ref_detail – the DBI.init object for a BED12 file of lincRNA and mRNA with intron, exon, and UTR structure
- ref_repeat – the DBI.init object for a BED6 file of mouse repeats
Returns: list of str – [type, name, subtype, strand column]
Example:
>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from Annotation import annotation
>>> bed = Bed(["chr13", 40975747, 40975770])
>>> ref_allRNA = DBI.init("../../Data/all_RNAs-rRNA_repeat.txt.gz", "bed")
>>> ref_detail = DBI.init("../../Data/Ensembl_mm9.genebed.gz", "bed")
>>> ref_repeat = DBI.init("../../Data/mouse.repeat.txt.gz", "bed")
>>> print(annotation(bed, ref_allRNA, ref_detail, ref_repeat))
['protein_coding', 'gcnt2', 'intron', 'ProperStrand']
“annotated_bed” data class
“RNAstructure” class
class RNAstructure.RNAstructure(exe_path=None)
Interface class for the RNAstructure executable programs.
DuplexFold(seq1=None, seq2=None, dna=False)
Uses the “DuplexFold” program to calculate the minimum free energy of binding between two input sequences.
Parameters:
- seq1, seq2 – two DNA/RNA sequences, each given as a string or as the name of an existing FASTA file
- dna – boolean; if True, DNA thermodynamic parameters are used instead of RNA parameters
Returns: minimum binding energy (kcal/mol)
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq1 = "TAGACTGATCAGTAAGTCGGTA"
>>> seq2 = "GACTAGCTTAGGTAGGATAGTCAGTA"
>>> energy = RNA_prog.DuplexFold(seq1, seq2)
>>> print(energy)
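DuplexFold() and Fold() accept either a sequence string or the name of an existing FASTA file, while a command-line program needs a file. One minimal way to normalize both kinds of input is sketched below. as_fasta_path is a hypothetical helper, not part of the RNAstructure class; it assumes that a string naming an existing file is a FASTA file and that anything else is a raw sequence.

```python
import os
import tempfile


def as_fasta_path(seq_or_path, name="seq"):
    """Return (path, is_temporary) for a sequence string or FASTA file name.

    Hypothetical helper: an existing file is used as-is; any other string is
    treated as a raw sequence and written to a temporary FASTA file, which
    the caller should delete when done.
    """
    if os.path.isfile(seq_or_path):
        return seq_or_path, False
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as handle:
        handle.write(">%s\n%s\n" % (name, seq_or_path))
    return handle.name, True


path, is_temp = as_fasta_path("TAGACTGATCAGTAAGTCGGTA", name="seq1")
with open(path) as fh:
    print(fh.read())
if is_temp:
    os.remove(path)  # clean up the temporary FASTA file
```

A side effect of this rule is that a sequence string which happens to match a file name in the working directory would be read as a file; with real nucleotide sequences that is unlikely.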
Fold(seq=None, ct_name=None, sso_file=None, Num=1)
Uses the “Fold” program to predict the secondary structure of a sequence and returns it in dot-bracket format.
Parameters:
- seq – one DNA/RNA sequence given as a string, or the name of an existing FASTA file
- ct_name – name of the CT file to write the prediction to; if None (default), the CT file is stored in a temporary location
- sso_file – a single-strand offset file; for its format see http://rna.urmc.rochester.edu/Text/File_Formats.html#Offset
- Num – which predicted structure to return (the Num-th one); default: 1
Returns: the RNA sequence and the dot-bracket notation of its secondary structure, as (sequence, dot)
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq = "AUAUAAUUAAAAAAUGCAACUACAAGUUCCGUGUUUCUGACUGUUAGUUAUUGAGUUAUU"
>>> sequence, dot = RNA_prog.Fold(seq)
>>> assert seq == sequence
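Fold() reports its prediction in dot-bracket form, while RNAstructure programs write CT files. A minimal sketch of that conversion follows. ct_to_dot is a hypothetical helper; it assumes the usual CT layout (a header line starting with the sequence length N, then N lines of index, base, previous, next, pairing partner, and natural numbering, with partner 0 for an unpaired base) and that multiple structures are stored one after another, which is how the Num-th structure is picked. Pseudoknotted pairs cannot be expressed with a single bracket type, so they would come out ambiguous here. The toy hairpin CT is illustrative input, not program output.

```python
def ct_to_dot(ct_text, num=1):
    """Return (sequence, dot) for the num-th structure in CT-format text.

    Hypothetical helper. Each structure is a header line whose first field is
    the length N, followed by N records:
    index  base  previous  next  partner  natural_numbering
    """
    lines = [line for line in ct_text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])
    first = (num - 1) * (n + 1) + 1  # skip earlier structures and this header
    if num < 1 or first + n > len(lines):
        raise ValueError("structure %d not found" % num)
    bases, dot = [], []
    for line in lines[first:first + n]:
        fields = line.split()
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        bases.append(base)
        if partner == 0:
            dot.append(".")  # unpaired
        elif partner > index:
            dot.append("(")  # opens a pair with a downstream base
        else:
            dot.append(")")  # closes a pair with an upstream base
    return "".join(bases), "".join(dot)


CT_EXAMPLE = """\
    9  toy_hairpin
    1 G       0    2    9    1
    2 G       1    3    8    2
    3 G       2    4    7    3
    4 A       3    5    0    4
    5 A       4    6    0    5
    6 A       5    7    0    6
    7 C       6    8    3    7
    8 C       7    9    2    8
    9 C       8    0    1    9
"""
print(ct_to_dot(CT_EXAMPLE))  # ('GGGAAACCC', '(((...)))')
```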
__init__(exe_path=None)
Initializes the object.
Parameters: exe_path – path of the folder containing the RNAstructure executables
Example:
>>> from RNAstructure import RNAstructure >>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
scorer(ct_name1, ct_name2)
Uses the “scorer” program to compare a predicted secondary structure with an accepted structure. It calculates two quality metrics: sensitivity and positive predictive value (PPV).
Parameters:
- ct_name1 – name of a CT file containing the predicted structure data
- ct_name2 – name of a CT file containing the accepted structure data; it must contain only one structure
Returns: sensitivity, PPV, and the number of the best predicted structure
Example:
>>> ct_name1 = "temp_prediction.ct"
>>> ct_name2 = "temp_accept.ct"
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> sensitivity, PPV, Number = RNA_prog.scorer(ct_name1, ct_name2)
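Sensitivity and PPV are both ratios of correctly predicted base pairs: sensitivity divides by the number of pairs in the accepted structure, and PPV divides by the number of pairs in the predicted structure. The sketch below computes them with exact pair matching. sensitivity_ppv is a hypothetical helper; the scorer program itself may count a pair as correct under a looser rule than exact matching, so its numbers can differ slightly.

```python
def sensitivity_ppv(predicted_pairs, accepted_pairs):
    """Return (sensitivity, PPV) for two collections of base pairs (i, j).

    Hypothetical helper using exact pair matching:
      sensitivity = correct pairs / pairs in the accepted structure
      PPV         = correct pairs / pairs in the predicted structure
    """
    # Normalize each pair so (9, 1) and (1, 9) count as the same pair.
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    accepted = {tuple(sorted(pair)) for pair in accepted_pairs}
    correct = len(predicted & accepted)
    sensitivity = correct / float(len(accepted)) if accepted else 0.0
    ppv = correct / float(len(predicted)) if predicted else 0.0
    return sensitivity, ppv


predicted = [(1, 9), (2, 8), (3, 7), (4, 6)]
accepted = [(1, 9), (2, 8), (3, 6)]
# 2 of the 3 accepted pairs are found (sensitivity 2/3);
# 2 of the 4 predicted pairs are correct (PPV 1/2).
print(sensitivity_ppv(predicted, accepted))
```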
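RNAstructure.dot2block(dot_string, name='Default')
Converts the dot-bracket format of an RNA secondary structure into a list of linked blocks (stems).
Parameters:
- dot_string – the RNA secondary structure in dot-bracket format
- name – name of the RNA
Returns: a list of all stems; each stem is a dictionary with ‘source’ and ‘target’ keys, and each of these is a dictionary with ‘chr’, ‘start’, and ‘end’ keys (‘start’ and ‘end’ are 0-based, inclusive positions)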
Example:
>>> from RNAstructure import dot2block
>>> stems = dot2block("(((((...)))...(((...)))..))", "test")
>>> print(stems)
[{'source': {'start': 2, 'chr': 'test', 'end': 4}, 'target': {'start': 8, 'chr': 'test', 'end': 10}}, {'source': {'start': 14, 'chr': 'test', 'end': 16}, 'target': {'start': 20, 'chr': 'test', 'end': 22}}, {'source': {'start': 0, 'chr': 'test', 'end': 1}, 'target': {'start': 25, 'chr': 'test', 'end': 26}}]
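To make the stem-grouping step concrete, here is a minimal re-implementation sketch; dot2block_sketch is a hypothetical name and the project's own dot2block may differ in details. It pairs brackets with a stack, then merges directly stacked pairs (i, j), (i-1, j+1), ... into one stem, so any bulge or interior loop starts a new stem. It assumes only round brackets (no pseudoknot bracket types) and that the RNA name fills the ‘chr’ field. On the example string it yields the same three stems, in the same order, as the example above.

```python
def dot2block_sketch(dot_string, name="Default"):
    """Group the base pairs of a dot-bracket string into stems (sketch)."""
    stack = []
    pairs = []  # (i, j) pairs, in the order their ')' is reached
    for j, ch in enumerate(dot_string):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])

    stems = []
    current = None  # [i_low, i_high, j_low, j_high] of the stem being extended
    for i, j in pairs:
        if current is not None and i == current[0] - 1 and j == current[3] + 1:
            # Pair stacks directly on the previous one: extend the stem.
            current[0], current[3] = i, j
        else:
            if current is not None:
                stems.append(current)
            current = [i, i, j, j]
    if current is not None:
        stems.append(current)

    return [
        {"source": {"chr": name, "start": i_low, "end": i_high},
         "target": {"chr": name, "start": j_low, "end": j_high}}
        for i_low, i_high, j_low, j_high in stems
    ]


print(dot2block_sketch("(((((...)))...(((...)))..))", "test"))
```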