star_rsem [2019/12/11 07:46]
## Anaconda installation
<code>
[kijima.yusuke@m48 work]$ pwd
/
[kijima.yusuke@m48 work]$ cd work/
[kijima.yusuke@m48 work]$ wget https://
[kijima.yusuke@m48 work]$ chmod +x Anaconda3-2019.10-Linux-x86_64.sh
[kijima.yusuke@m48 work]$ ./
[kijima.yusuke@m48 download]$ ./

Welcome to Anaconda3 2019.10

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>
===================================
Anaconda End User License Agreement
===================================

Copyright 2015, Anaconda, Inc.

All rights reserved under the 3-clause BSD License:
...

Do you accept the license terms? [yes|no]
[no] >>>

Anaconda3 will now be installed into this location:
/

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/
PREFIX=/
Unpacking payload ...
...
Thank you for installing Anaconda3!
</code>
Log out of the server once, or reload .bashrc, and the Anaconda environment is ready. (base) now appears to the left of the user name in the prompt.
<code>
(base) [kijima.yusuke@m48 ~]$
</code>

## Creating a Python 3 environment with Anaconda
Isolate the environment, just to be safe.
<code>
(base) [kijima.yusuke@m48 work]$ conda create -n anac_py37 python=3.7 anaconda
</code>
When that finishes, activate the environment.
<code>
(base) [kijima.yusuke@m48 work]$ conda activate anac_py37
(anac_py37) [kijima.yusuke@m48 work]$ #
</code>
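Every later step assumes anac_py37 is active, so a guard at the top of a script can save a confusing failure. A minimal sketch, relying on conda activate setting CONDA_DEFAULT_ENV to the environment name; require_env is a hypothetical helper, not part of conda.

<code bash>
# Hypothetical helper: fail unless the named conda environment is active.
# conda activate exports CONDA_DEFAULT_ENV with the active environment's name.
require_env() {
  local want=$1
  if [ "${CONDA_DEFAULT_ENV:-}" != "$want" ]; then
    echo "error: activate '$want' first (current: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
  fi
}

# Usage at the top of a pipeline script:
#   require_env anac_py37 || exit 1
</code>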

## Installing the tools
Same as the textbook.
<code bash>
#
(anac_py37) [kijima.yusuke@m48 work]$ conda install -c bioconda trimmomatic

#fastqc
(anac_py37) [kijima.yusuke@m48 work]$ conda install -c bioconda fastqc

#STAR
(anac_py37) [kijima.yusuke@m48 work]$ conda install -c bioconda star

#RSEM
(anac_py37) [kijima.yusuke@m48 work]$ conda install -c bioconda rsem

(anac_py37) [kijima.yusuke@m48 work]$ rsem-calculate-expression --version
Current version: RSEM v1.3.1
</code>
Unlike in the textbook, RSEM v1.3 (the latest version) now seems to be installable via conda. Hooray!

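To check the installed RSEM version from a script rather than by eye, a sketch like this should do. It only assumes the output format shown above (Current version: RSEM v1.3.1); rsem_version is a hypothetical name.

<code bash>
# Hypothetical helper: extract the version number from the output of
# "rsem-calculate-expression --version" (format: "Current version: RSEM v1.3.1").
rsem_version() {
  sed -n 's/^Current version: RSEM v\([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   rsem-calculate-expression --version | rsem_version
</code>
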
## Installing IGV
Just in case, check whether the latest IGV can be installed with conda install.
<code>
(anac_py37) [kijima.yusuke@m48 work]$ conda search -c bioconda igv
Loading channels: done
# Name
igv
igv 2.4.6
igv 2.4.9
igv 2.4.9
igv
igv
igv 2.5.2
</code>
The latest version, 2.6.1, does not seem to be on bioconda, so download the binary as in the textbook.
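Instead of scanning the conda search output by eye, a sketch like this picks out the highest listed version. It assumes the usual Name / Version / Build / Channel column layout and uses sort -V (GNU coreutils) for version ordering; latest_conda_version is a hypothetical helper.

<code bash>
# Hypothetical helper: print the newest version of a package listed by
# "conda search". Only rows whose first column is the package name are used,
# so the "Loading channels" and "# Name" header lines are skipped.
latest_conda_version() {
  local pkg=$1
  awk -v p="$pkg" '$1 == p && NF >= 2 { print $2 }' | sort -V | tail -n 1
}

# Usage:
#   conda search -c bioconda igv | latest_conda_version igv
</code>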
<code>
(anac_py37) [kijima.yusuke@m48 work]$ mkdir rnaseq-training
(anac_py37) [kijima.yusuke@m48 work]$ cd rnaseq-training
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ mkdir tool
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ cd tool
(anac_py37) [kijima.yusuke@m48 tool]$ wget https://
(anac_py37) [kijima.yusuke@m48 tool]$ unzip IGV_Linux_2.7.2.zip
</code>
The latest version as of now (2019/

## Downloading the reference
<code>
(anac_py37) [kijima.yusuke@m48 tool]$ cd ~/
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ mkdir human
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ mkdir human/ref

#Genome
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ wget -P ~/

#Annotation
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ wget -P ~/
</code>
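A dropped connection can leave a truncated .gz behind, so it may be worth testing the downloads before moving on. This sketch uses gzip -t, which checks archive integrity without extracting anything; check_gz is a hypothetical name, and the file names in the usage line are the ones used later on this page.

<code bash>
# Hypothetical helper: report whether each .gz file is intact.
# gzip -t only tests integrity; nothing is written to disk.
check_gz() {
  local f status=0
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      status=1
    fi
  done
  return "$status"
}

# Usage (inside the download directory):
#   check_gz GRCh38.primary_assembly.genome.fa.gz gencode.v32.annotation.gtf.gz
</code>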

## Downloading the sequence data
Create a list of URLs. This step is not in the textbook, so do it however you like; downloading the files one at a time also works.
<code bash>
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ for i in $(seq 1 6); do for j in $(seq 1 2); do echo "
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ cat ~/
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
ftp://
</code>
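The loop above is cut off, so the real URLs cannot be recovered here. A sketch of the same idea, assuming file names of the form SRR718955<i>_<j>.fastq.gz (samples 1 to 6, mates 1 and 2) under one directory. The base URL and url_list.txt are hypothetical placeholders; if the server keeps each run in its own subdirectory, adjust the echo line.

<code bash>
# Hypothetical helper: write 12 URLs (6 samples x 2 mates), one per line.
make_url_list() {
  local base=$1 out=$2 i j
  : > "$out"                               # create or empty the output file
  for i in $(seq 1 6); do
    for j in 1 2; do
      echo "${base}/SRR718955${i}_${j}.fastq.gz" >> "$out"
    done
  done
}

# Usage (placeholder base URL):
#   make_url_list "ftp://example.org/hypothetical/path" url_list.txt
</code>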
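Download with wget. If the connection drops, the download is interrupted, so run it inside screen or from a remote desktop session. If you have absolute faith in your connection, you can skip this. Also, hitting an external server over and over from a for loop is bad manners, so put a sleep between requests.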
<code bash>
(base) [kijima.yusuke@m48 rnaseq-training]$ for i in $(cat ~/
</code>
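A sketch of the download loop as a function, assuming one URL per line in the list file. It reads the list on a separate file descriptor so the download command cannot swallow it, uses wget -c so an interrupted file resumes instead of restarting, and sleeps between requests. download_list, the FETCH override and url_list.txt are hypothetical.

<code bash>
# Hypothetical helper: download each URL in a list file, one at a time,
# sleeping between requests. FETCH defaults to "wget -c" (-c resumes a
# partial file); set FETCH=echo for a dry run.
download_list() {
  local list=$1 delay=${2:-10} url
  # Read the list on fd 3 so the download command cannot consume it via stdin.
  while IFS= read -r url <&3; do
    [ -z "$url" ] && continue              # skip blank lines
    ${FETCH:-wget -c} "$url" || return 1
    sleep "$delay"
  done 3< "$list"
}

# Usage:
#   download_list url_list.txt 10              # run inside screen
#   FETCH=echo download_list url_list.txt 0    # dry run: just print the URLs
</code>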

Create a list of accession IDs for later use.
<code bash>
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ mkdir human/data
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ for i in SRR718955{1..6};
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ cat human/
SRR7189551
SRR7189552
SRR7189553
SRR7189554
SRR7189555
SRR7189556
</code>
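The same list can be written with a single printf, since printf reuses its format for every argument. A minimal sketch; make_acc_list is a hypothetical name, and the output path is the SRR_Acc_list.txt that appears under human/data/ further down.

<code bash>
# Hypothetical helper: write SRR7189551 ... SRR7189556, one per line,
# creating the parent directory if it does not exist yet.
make_acc_list() {
  local out=$1
  mkdir -p "$(dirname "$out")"
  printf 'SRR718955%s\n' $(seq 1 6) > "$out"
}

# Usage:
#   make_acc_list human/data/SRR_Acc_list.txt
#   while read -r id; do echo "$id"; done < human/data/SRR_Acc_list.txt
</code>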

##
As in the textbook, anaconda3/
<code bash>
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ find /
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
</code>
/
<code>
(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ ln -s /

(anac_py37) [kijima.yusuke@m48 rnaseq-training]$ ls human/data/
SRR_Acc_list.txt
</code>
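A sketch of the adapter search as a reusable function. It assumes the adapter FASTA files sit in a directory whose path contains "adapters", which is typical of the bioconda Trimmomatic package (it ships files such as TruSeq3-PE-2.fa). Which file the textbook actually links is not visible above, so the name in the usage line is only an example; find_adapters is hypothetical.

<code bash>
# Hypothetical helper: list adapter FASTA files under a directory tree
# (e.g. the anaconda3 prefix) whose name matches a glob pattern.
find_adapters() {
  local root=$1 pattern=${2:-'*.fa'}
  find "$root" -path '*adapters*' -name "$pattern" 2>/dev/null | sort
}

# Usage (example prefix and file name):
#   find_adapters ~/anaconda3 'TruSeq3-PE*.fa'
#   ln -s "$(find_adapters ~/anaconda3 'TruSeq3-PE-2.fa' | head -n 1)" human/data/
</code>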
Run Trimmomatic.
<code bash>
(anac_py37) [kijima.yusuke@m48 data]$ for id in `cat SRR_Acc_list.txt`;
</code>
When it finishes, take a look at the summary.
<code bash>
(base) [kijima.yusuke@m48 data]$ cat summary_SRR7189551.txt
Input Read Pairs: 30499473
Both Surviving Reads: 30152544
Both Surviving Read Percent: 98.86
Forward Only Surviving Reads: 318460
Forward Only Surviving Read Percent: 1.04
Reverse Only Surviving Reads: 25447
Reverse Only Surviving Read Percent: 0.08
Dropped Reads: 3022
Dropped Read Percent: 0.01
</code>
The numbers match the textbook exactly. Impressive.
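To compare all six samples at once, the relevant line can be pulled out of each summary. A sketch assuming the summary_<ID>.txt naming and the "Key: value" format shown above; surviving_pct is a hypothetical name.

<code bash>
# Hypothetical helper: print "<ID><TAB><Both Surviving Read Percent>" for
# each Trimmomatic summary file named summary_<ID>.txt.
surviving_pct() {
  local f id
  for f in "$@"; do
    id=$(basename "$f" .txt)
    id=${id#summary_}
    awk -F': ' -v id="$id" '$1 == "Both Surviving Read Percent" { print id "\t" $2 }' "$f"
  done
}

# Usage (in the data directory):
#   surviving_pct summary_SRR718955{1..6}.txt
</code>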

##
Same as the textbook.
<code bash>
(anac_py37) [kijima.yusuke@m48 data]$ for id in `cat SRR_Acc_list.txt`;
</code>
Make sure to check the results.
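One way to check the FastQC results without opening every HTML report is to read summary.txt, a tab-separated file (status, module, file name) that FastQC puts inside each <name>_fastqc.zip, and in the <name>_fastqc/ directory once the zip is extracted (e.g. with --extract). The sketch lists every module not marked PASS; fastqc_flags is hypothetical and the .zip name in the usage line is only an example.

<code bash>
# Hypothetical helper: print "file<TAB>status<TAB>module" for every
# non-PASS line in FastQC summary.txt files (or stdin if no files given).
fastqc_flags() {
  awk -F'\t' '$1 != "PASS" { print $3 "\t" $1 "\t" $2 }' "$@"
}

# Usage:
#   fastqc_flags *_fastqc/summary.txt
#   unzip -p SRR7189551_1_fastqc.zip '*/summary.txt' | fastqc_flags
</code>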

##
Finally, on to the mapping. First, build the reference for STAR.
###
Decompress the gzipped files. As in the textbook, edit the GTF slightly (lines containing _PAR_Y are dropped).
<code bash>
(anac_py37) [kijima.yusuke@m48 data]$ cd ../ref/
(anac_py37) [kijima.yusuke@m48 ref]$ ls
GRCh38.primary_assembly.genome.fa.gz
(anac_py37) [kijima.yusuke@m48 ref]$ gunzip GRCh38.primary_assembly.genome.fa.gz
(anac_py37) [kijima.yusuke@m48 ref]$ zcat gencode.v32.annotation.gtf.gz | grep -v _PAR_Y > gencode.v32.annotation.gtf
</code>
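A quick sanity check on the GTF edit is to count the lines containing _PAR_Y before and after filtering. In GENCODE, IDs ending in _PAR_Y mark the chrY copies of pseudoautosomal-region genes, which is presumably why the textbook drops them. count_par_y is a hypothetical helper.

<code bash>
# Hypothetical helper: count lines mentioning _PAR_Y in a GTF (file or stdin).
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
count_par_y() {
  grep -c '_PAR_Y' "$@" || true
}

# Usage:
#   zcat gencode.v32.annotation.gtf.gz | count_par_y   # lines to be removed
#   count_par_y gencode.v32.annotation.gtf             # should print 0
</code>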
Build the reference. See the textbook for what each option means.
<code bash>
(anac_py37) [kijima.yusuke@m48 ref]$ mkdir star_rsem
(anac_py37) [kijima.yusuke@m48 ref]$ nohup STAR --runMode genomeGenerate --genomeDir star_rsem --runThreadN 12 --sjdbOverhang 99 --genomeFastaFiles GRCh38.primary_assembly.genome.fa --sjdbGTFfile gencode.v32.annotation.gtf &
</code>
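The STAR manual recommends setting --sjdbOverhang to (mate length - 1). The STAR log further down reports an average input length of 199 per read pair, i.e. roughly 100 bp per mate, which is consistent with 99. A minimal sketch for deriving the value from the data; both function names are hypothetical and the FASTQ name in the usage line is only an example.

<code bash>
# Hypothetical helper: STAR's recommended --sjdbOverhang, (mate length - 1).
sjdb_overhang() {
  echo $(( $1 - 1 ))
}

# Hypothetical helper: longest sequence in a FASTQ stream
# (the sequence is every 4th line, starting at line 2).
max_read_length() {
  awk 'NR % 4 == 2 && length($0) > max { max = length($0) } END { print max + 0 }'
}

# Usage (example file name; looks at the first 1000 records only):
#   len=$(zcat SRR7189551_1.fastq.gz | head -n 4000 | max_read_length)
#   sjdb_overhang "$len"
</code>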

###
Just run it.
<code bash>
(anac_py37) [kijima.yusuke@m48 data]$ for id in `cat SRR_Acc_list.txt`;
</code>
Check the results.
<code bash>
(base) [kijima.yusuke@m48 data]$ cat SRR7189551/
Number of input reads | 30152544
Average input read length | 199
UNIQUE READS:

Uniquely mapped reads % | 94.74%
Average mapped length | 199.28
...

Number of reads mapped to multiple loci | 1355491
% of reads mapped to multiple loci | 4.50%
Number of reads mapped to too many loci | 14845
% of reads mapped to too many loci | 0.05%
</code>
The mapping rate was slightly higher than with the version used in the textbook.
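To compare the mapping rate across all six samples, here is a sketch that extracts one metric from each STAR log. It assumes the logs are named Log.final.out inside the per-sample directories (the file name is cut off in the cat command above) and that each line has the form "name | value"; star_metric is hypothetical.

<code bash>
# Hypothetical helper: print "<file><TAB><value>" for one metric from
# STAR Log.final.out files, trimming the padding around name and value.
star_metric() {
  local metric=$1 f; shift
  for f in "$@"; do
    awk -F'|' -v m="$metric" -v f="$f" '
      { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
      name == m { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print f "\t" v }' "$f"
  done
}

# Usage (assumed file name):
#   star_metric "Uniquely mapped reads %" SRR718955{1..6}/Log.final.out
</code>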

##
###
Build the reference, as in the textbook.
<code bash>
(base) [kijima.yusuke@m48 data]$ cd ../ref/
(base) [kijima.yusuke@m48 ref]$ conda activate anac_py37
(anac_py37) [kijima.yusuke@m48 ref]$ rsem-prepare-reference --gtf gencode.v32.annotation.gtf -p 12 GRCh38.primary_assembly.genome.fa star_rsem/
rsem-extract-reference-transcripts star_rsem/
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
Parsed 800000 lines
Parsed 1000000 lines
Parsed 1200000 lines
Parsed 1400000 lines
Parsed 1600000 lines
Parsed 1800000 lines
Parsed 2000000 lines
Parsed 2200000 lines
Parsed 2400000 lines
Parsed 2600000 lines
Parsed 2800000 lines
Parsing gtf File is done!
GRCh38.primary_assembly.genome.fa is processed!
227301 transcripts are extracted.
Extracting sequences is done!
Group File is generated!
Transcript Information File is generated!
Chromosome List File is generated!
Extracted Sequences File is generated!

rsem-preref star_rsem/
Refs.makeRefs finished!
Refs.saveRefs finished!
star_rsem/
star_rsem/
</code>
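The log above reports four generated files (group, transcript information, chromosome list, extracted sequences), which RSEM writes as <prefix>.grp, .ti, .chrlist and .seq. A sketch that checks they exist and are non-empty before quantification; check_rsem_ref is hypothetical, and since the reference name after star_rsem/ is cut off above, substitute your own.

<code bash>
# Hypothetical helper: verify the rsem-prepare-reference outputs for a prefix.
check_rsem_ref() {
  local prefix=$1 ext missing=0
  for ext in grp ti chrlist seq; do
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty: $prefix.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_rsem_ref star_rsem/<reference name>
</code>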

###
<code bash>
(anac_py37) [kijima.yusuke@m48 data]$ cd rsem/
(anac_py37) [kijima.yusuke@m48 rsem]$ rsem-generate-data-matrix SRR718955{1..6}.genes.results | sed -e '
(anac_py37) [kijima.yusuke@m48 rsem]$ rsem-generate-data-matrix SRR718955{1..6}.isoforms.results | sed -e '

(anac_py37) [kijima.yusuke@m48 rsem]$ sed '
(anac_py37) [kijima.yusuke@m48 rsem]$ sed '
</code>
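The sed expressions above are cut off, so their exact edits are unknown. As one possible cleanup, assuming rsem-generate-data-matrix writes double-quoted cells with the input file names as column headers, this sketch strips the quotes and the .genes.results / .isoforms.results suffixes. tidy_rsem_matrix and the output file name are hypothetical.

<code bash>
# Hypothetical helper: remove double quotes and the RSEM result-file
# suffixes so the header reads SRR7189551, SRR7189552, ...
tidy_rsem_matrix() {
  sed -e 's/"//g' -e 's/\.genes\.results//g' -e 's/\.isoforms\.results//g'
}

# Usage (hypothetical output name):
#   rsem-generate-data-matrix SRR718955{1..6}.genes.results | tidy_rsem_matrix > genes_matrix.tsv
</code>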

That is the end of the main part of the textbook. From here, look at the mappings in IGV or move on to expression analysis.