2019_blast_practice_example

差分

このページの2つのバージョン間の差分を表示します。

この比較画面へのリンク

次のリビジョン
前のリビジョン
2019_blast_practice_example [2019/08/05 04:39] – 作成 suikou2019_blast_practice_example [2019/08/05 05:24] (現在) suikou
行 1: 行 1:
 # 2019 blast Practice example # 2019 blast Practice example
  
-## The location where gene prediction of pearl Exoc 7 seems to be wrong estimated from closely related species+## The location where gene prediction of pearl Exoc7 seems to be wrong
  
 First of all file preparation First of all file preparation
行 68: 行 68:
 ``` ```
  
 +Next, blastx will be performed with the amino acid sequence of ```exoc7_c.gigas``` as DB, and the DNA sequence of ```exoc7_pearl``` as a query.
  
 +```
 +blastx -db exoc7_c.gigas.faa -query exoc7_pearl.fna -num_threads 4
 +```
 +
 +As a result, the following output is obtained.
 +
 +```
 +Database: exoc7_c.gigas.faa
 +           1 sequences; 612 total letters
 +
 +
 +
 +Query= exoc7_pearl
 +
 +Length=1566
 +                                                                      Score     E
 +Sequences producing significant alignments:                          (Bits)  Value
 +
 +  EKC30356.1 Exocyst complex component 7 [Crassostrea gigas]          426     6e-148
 +
 +
 +> EKC30356.1 Exocyst complex component 7 [Crassostrea gigas]
 +Length=612
 +
 + Score = 426 bits (1096),  Expect = 6e-148, Method: Compositional matrix adjust.
 + Identities = 196/259 (76%), Positives = 235/259 (91%), Gaps = 0/259 (0%)
 + Frame = +1
 +
 +Query  787   IQNDPERSSMPRDGTVHELTNHTIIVLEPLLDYAETAGAMLLTHGEHAVPSDAVDVKKSK  966
 +             I+ DPE++SMP+DGTVHELTN TII LEPL DYA+TAGAMLL HGE A PS+AVD KKSK
 +Sbjct  354   IKTDPEKASMPKDGTVHELTNRTIIFLEPLQDYADTAGAMLLLHGEQAAPSEAVDPKKSK  413
 +
 +Query  967   LKLADYITKVLSALGLNLSNKAETYSDPILRHVFMLNNYHYILKSLKRSGVLELIHTWNK  1146
 +             ++LADYITK LSALGLNL+ KAETYSDP LR VFMLNNYHYILKSLKRSG+L+LIHTWNK
 +Sbjct  414   MRLADYITKTLSALGLNLTIKAETYSDPTLRPVFMLNNYHYILKSLKRSGLLDLIHTWNK  473
 +
 +Query  1147  DVGQFYEDQIHEQKRLYSQSWSKVLHFVLEMNEPISQQRIQQMETSKIKDKEKQNIKDKF  1326
 +             DVGQFYED+I+EQK+LYS+SWS+V+H++ E++EPISQQRIQ ME SK+KDKEKQNIKDKF
 +Sbjct  474   DVGQFYEDRINEQKKLYSESWSRVMHYITEVHEPISQQRIQAMENSKLKDKEKQNIKDKF  533
 +
 +Query  1327  SGFNKELEEISRVQKAYAIPDPELRDNIKKDNKEYIVPRYKLFLEKFQRLNFTKNSEKYM  1506
 +             SGFNKELE+I ++QK YAIPDPELR+ +KKDNK++I+P +++FL+KF+RLNFTKN EKY+
 +Sbjct  534   SGFNKELEDILKIQKGYAIPDPELREQMKKDNKDFIIPAFRMFLDKFKRLNFTKNPEKYI  593
 +
 +Query  1507  KYTVKDVEETLDKFFDTSA  1563
 +             KY+V+DV E +DK FD SA
 +Sbjct  594   KYSVQDVAEVVDKLFDMSA  612
 +
 +
 + Score = 389 bits (998),  Expect = 2e-133, Method: Compositional matrix adjust.
 + Identities = 203/341 (60%), Positives = 250/341 (73%), Gaps = 33/341 (10%)
 + Frame = +1
 +
 +Query  49    MLTILQSFENRLKKLEGTVEPVYNETEMLRRRQENIEKTMTTLDNVLGYYHIAKDVQDLI  228
 +             MLTILQSFENRL+KLE TVEPVYNETEMLRRRQENIEKTM TLDNVLGYYH+ K+V++ I
 +Sbjct  1     MLTILQSFENRLRKLENTVEPVYNETEMLRRRQENIEKTMVTLDNVLGYYHVGKEVEEFI  60
 +
 +Query  229   KEGPVVCGLEKYLSTMDRLLQALNYFNKHNPTSLEVTDVIKVYDDGKDTLNAEFRSLLGR  408
 +             KEGP  CGLEKYLS MDRL+QA NYFNKHNPTSLE+TDVI+VYDDGK+ L  EFR+LLGR
 +Sbjct  61    KEGPHNCGLEKYLSIMDRLVQAHNYFNKHNPTSLELTDVIRVYDDGKEALVIEFRTLLGR  120
 +
 +Query  409   HCRPVPAVTILDLLGPDEELQTMENDAPIEHLPEKIVNDLTLIAKWLYTNGKATEYMKDY  588
 +             HCRPVP V +LD++  DEELQ   +D  +EHLPEKI+ +L+LI+ WL+ N K TEYMKDY
 +Sbjct  121   HCRPVPPVMVLDMISTDEELQG-SDDIQLEHLPEKILTELSLISTWLFNNTKNTEYMKDY  179
 +
 +Query  589   TKVRSQMLLYSLQGNSIKRKATTALMQSPFDPGHRRQGSYNELTKEESFDVEIDIYITEL  768
 +             T+ RS ML+ SLQG+S KR+A   LMQSPFDPG++RQGS+ EL KEE+ DVE+DIYITEL
 +Sbjct  180   TRSRSSMLIKSLQGHSFKRRAVITLMQSPFDPGNKRQGSHAELPKEENLDVEVDIYITEL  239
 +
 +Query  769   TALLKLIQNDPERSSMPRDGTVHELTNHTII--VLEPLLDYAETAGAMLLTHGEHAVPSD  942
 +             +ALLKLIQ++ +  S    G + +  + ++   +++  LD     G +L
 +Sbjct  240   SALLKLIQSEAQLMS----GIIADKHHRSVFDNIIQEGLDSVIKNGELL-----------  284
 +
 +Query  943   AVDVKKSKLKLADYITKVLSALGLNLSNKAETYSDPILRHV  1065
 +             AV+ KKS  K  D+I  VLS               P+L+H+
 +Sbjct  285   AVNAKKSIAK-HDFIN-VLSVF-------------PVLKHL  310
 +```
 +
 +There are two aligned sequences, and looking at the second alignment, “Sbjct” is the sequence of ```exoc7_c.gigas``` used as DB and is homologous to ```exoc7_pearl```, but the homology after 248 amino acids is low. Since ```exoc7_pearl``` of the query is a DNA sequence, it increases by 3 bp per character, and the region of 787 bp or later corresponds to a region of low homology. Then, when looking at the first alignment, the alignment of ```exoc7_pearl``` starts from 787 bp, while the ```exoc7_c.gigas``` starts from 354 amino acids. Therefore, it is understood that the 248 to 353 amino acids of ```exoc7_c.gigas``` are deleted in predicted ```exoc7_pearl```.
 +
 +## Investigate which scaffold of the pearl oyster genome has the deleted sequences in gene prediction.
 +
 +First, the deleted sequence is stored in FASTA format.
 +
 +```
 +echo ">exoc7_deleted
 +SEAQLMSGIIADKHHRSVFDNIIQEGLDSVIKNGELLAVNAKKSIAKHDFINVLSVFPVLKHLRSIKPEFDLTLEGCATPTRAKLTSLLSTLGSTAAKALEEFALS" > exoc7_deleted.faa
 +```
 +
 +Next, create a blast index of the pearl oyster genome.
 +
 +```
 +makeblastdb -in pfu_genome1.0.fasta -dbtype nucl
 +```
 +
 +Search the pearl oyster genome with tblastn, using the deleted sequence as a query.
 +
 +```
 +tblastn -db pfu_genome1.0.fasta -query exoc7_deleted.faa -num_threads 4
 +```
 +
 +As a result,
 +
 +```
 +Query= exoc7_deleted
 +
 +Length=106
 +                                                                      Score     E
 +Sequences producing significant alignments:                          (Bits)  Value
 +
 +  scaffold294819.1|size544                                            118     2e-33
 +
 +
 +> scaffold294819.1|size544
 +Length=544
 +
 + Score = 118 bits (295),  Expect = 2e-33, Method: Compositional matrix adjust.
 + Identities = 55/75 (73%), Positives = 67/75 (89%), Gaps = 0/75 (0%)
 + Frame = -3
 +
 +Query  1    SEAQLMSGIIADKHHRSVFDNIIQEGLDSVIKNGELLAVNAKKSIAKHDFINVLSVFPVL  60
 +            SEAQLMSGII +KHHRSVF++II+  LD V+K GE LA NAKKSI+KHDF++VLSVFPV+
 +Sbjct  227  SEAQLMSGIIPEKHHRSVFESIIEGSLDMVVKGGETLASNAKKSISKHDFLSVLSVFPVV  48
 +
 +Query  61   KHLRSIKPEFDLTLE  75
 +            +HLR++KPEFDL LE
 +Sbjct  47   RHLRTVKPEFDLALE  3
 +```
 +
 +is obtained, and it can be seen that there is one Exoc7 exon in the scaffold 29491. 1 (Exon of Exoc7 other than this exon is in scaffold 1200).
  
  • 2019_blast_practice_example.1564979945.txt.gz
  • 最終更新: 2019/08/05 04:39
  • by suikou