Bioinformatics Tool Installation Guide 2019 (en)

Windows

1.Install Windows Subsystem for Linux (WSL)

Confirm that the version of Windows 10 is 1803 (spring 2018), 1809 (fall 2018) or 1903 (spring 2019). Then open PowerShell with administrator privileges to enable WSL. (Right-click the Windows logo at the bottom left of the screen → Windows PowerShell (Admin))

Paste and execute the following command to enable the WSL function.

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Then restart Windows.

2.Install Ubuntu

Left-click the Windows logo on the lower left of the screen, and start “Microsoft Store” from the menu. Click Search in the store and enter Ubuntu. Install and launch the displayed Ubuntu 18.04 LTS. (Other Ubuntu versions seem to work, but 18.04 is recommended.)

After that, Ubuntu can be started from the start menu by left-clicking the Windows logo on the lower left of the screen. At the first startup, you will be asked to set a user name and password.

3.Install wsl-terminal

The default Windows terminal is very difficult to use because of font misalignment. Download the following wsl-terminal archive, unzip it to an appropriate folder, and run “open-wsl.exe” to start WSL. https://github.com/goreliu/wsl-terminal/releases/download/v0.8.13/wsl-terminal-0.8.13.zip

Alternatively, if the version of your Windows is 1903, Microsoft's new terminal can be used, so you can search for and install “Windows Terminal” in the Microsoft Store.

4.Install Docker

Docker is container-type virtualization software. A container works like an ultra-lightweight virtual PC that starts up in about a second.

Advantages

-It is lightweight: with a base guest OS, it uses only a few tens of megabytes of disk space.

-DockerHub is available as a repository with unlimited capacity.

Disadvantages

-It relies on Linux-specific features, so the guest OS must be Linux and the host OS is limited to recent Linux.

-It is faster than a virtual machine because it is not fully virtualized, but it requires careful attention to security and administrator privileges to use. On supercomputers, we use container-type virtualization software called singularity instead of docker; singularity also requires root privileges at installation time. (singularity is already installed on the University of Tokyo Shirokane supercomputer, so it can be used there without root privileges.)

(Figure: comparison of Docker and virtual machines, from https://blog.cloudboost.io/docker-vs-vm-548032d3ef58)

Paste the following long command into the launched WSL screen. (To paste, middle-click, or right-click and select Paste from the menu.)

cat << 'EOF2' | bash
if [ `which docker|wc -l` = 0 ];then
 sudo sed -i 's/%sudo\tALL=(ALL:ALL) ALL/%sudo\tALL=NOPASSWD: ALL/' /etc/sudoers
 sudo sed -i.bak -e "s%http://[^ ]\+%http://ftp.jaist.ac.jp/pub/Linux/ubuntu/%g" /etc/apt/sources.list
 sudo apt-get update
 sudo apt install -y libltdl7 cgroupfs-mount
 cd
 wget https://download.docker.com/linux/ubuntu/dists/xenial/pool/stable/amd64/docker-ce_17.03.3~ce-0~ubuntu-xenial_amd64.deb
 sudo dpkg -i docker-ce_17.03.3~ce-0~ubuntu-xenial_amd64.deb
fi

if [ `id -a $USER|grep "(docker)"|wc -l` = 0 ]; then
 sudo usermod -aG docker $USER
fi

if [ `service docker status|grep " is running"|wc -l` = 0 ]; then
 powershell.exe start-process bash -verb runas -ArgumentList "'"'-c "sudo cgroupfs-mount; sudo service docker start"'"'"
fi

if [ `grep DOCKER ~/.bashrc|wc -l` = 0 ]; then
cat << 'EOF' >> ~/.bashrc
#for docker
alias DOCKER='docker run -it --rm -v $PWD:$PWD -w $PWD'
shopt -s expand_aliases
if [ `service docker status|grep " is running"|wc -l` = 0 ]; then
 powershell.exe start-process bash -verb runas -ArgumentList "'"'-c "sudo cgroupfs-mount; sudo service docker start"'"'"
fi
EOF
fi

if [ "`gcc --version 2> /dev/null`" = "" ]; then
 sudo apt install -y build-essential
fi

EOF2
exit
 

You will be asked for the password only once; enter it. A dialog asking whether to run bash with administrator privileges will also be displayed, so click “Yes”.

When the script finishes successfully, the terminal closes, so reopen wsl-terminal. (You may be asked again to run bash with administrator privileges; click “Yes”.)

Please type the following command.

docker run hello-world

If you can see “Hello from Docker!”, you have installed docker successfully.

Mac

1.Install Docker

Make sure that the OS version is macOS Sierra 10.12 or later.

Download Docker Desktop for Mac from the following URL, double-click the dmg file, and follow the instructions to complete the installation.

https://download.docker.com/mac/stable/Docker.dmg

2.Change the setting of Docker

The default memory limit of Docker's virtual machine is low, so click the Docker icon (picture of a whale) at the top of the screen, click Preferences …, and open the Advanced tab. Set CPUs to the number of CPU cores, and set Memory to your computer's total memory minus about 1 to 2 GB reserved for the OS.

3.Run Docker

Open Finder and start Applications → Utilities → Terminal.

Type the following command, and check if you can see “Hello from Docker!”

docker run hello-world

4.Install Homebrew

The command-line tools bundled with Mac are about 10 years old, so you need to install newer ones. Open the terminal and paste the following commands.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

echo 'export PATH=/usr/local/opt/coreutils/libexec/gnubin:/usr/local/bin:/usr/local/sbin:${PATH}' >> ~/.bash_profile
source ~/.bash_profile

# You may not need the next two lines. Run them only if brew install fails with an error.
mkdir -p /usr/local/sbin /usr/local/opt
sudo chown $USER /usr/local/sbin /usr/local/opt

brew install grep gawk gzip ed htop iftop
brew install gnu-tar gnu-sed gnu-time gnu-getopt
brew install binutils findutils diffutils coreutils moreutils

You will be asked for your Mac password, so enter it.
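
To confirm that the newly installed GNU tools are the ones actually being run, a quick check such as the following may help. This is a minimal sketch: the function names is_gnu and report_gnu_tools are made up for illustration, and the check simply looks for the string "GNU" in the first line of each command's --version output. Tools that Homebrew installs with a g prefix (for example gsed or gtar) may need to be called by those names unless their own gnubin folders are also added to PATH.

```shell
# is_gnu and report_gnu_tools are hypothetical helper names,
# not part of Homebrew or macOS.

# Succeeds when the first line of "<command> --version" mentions GNU.
is_gnu() {
  "$1" --version 2>/dev/null </dev/null | head -n 1 | grep -q "GNU"
}

# Report, for each tool name, whether it resolves to a GNU implementation.
report_gnu_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: not found"
    elif is_gnu "$tool"; then
      echo "$tool: GNU"
    else
      echo "$tool: not GNU"
    fi
  done
}

# Example:
#   report_gnu_tools ls sed tar grep gsed gtar
```

If a tool is reported as "not GNU", check the order of the folders in PATH or call the g-prefixed name instead.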

Install Bioinformatics tools

BioContainers: a project that aims to make bioinformatics tools easy to install, for the sake of reproducibility. https://biocontainers.pro/#/registry More than 7,000 tools appear to be registered at present, but the following kinds of tools are not registered.

-License required (e.g. CLC, USEARCH, SSPACE)

-Huge database (e.g. Polyphen2, PROVEAN)

-Minor tools

Many tools are registered, so first we will learn how to use Biocontainers.

Example: If you want to use a mapping tool called BWA

  1. Search by putting bwa in the search box
  2. Since the display order is messy, set "Sort by" to "Pull No." The BWA entry displayed at the end of the page (2.6 M downloads) is the desired container, so open it.
  3. Once again, the display order is messy, so click the arrow next to Modified or Version to arrange the versions from newest to oldest.
  4. Choose the version that appears to be the newest. For example, if you paste a command such as “docker pull quay.io/biocontainers/bwa:0.7.17--h84994c4_5” into the terminal as it is, the desired container is downloaded. (This step can be skipped, because the container is downloaded automatically the first time it is started.)
  5. To use the downloaded container, execute as follows.
docker run -it --rm -v $PWD:$PWD -w $PWD quay.io/biocontainers/bwa:0.7.17--h84994c4_5 bash
# Windows users registered the DOCKER alias during the setup above, so the command can be shortened as follows.
DOCKER quay.io/biocontainers/bwa:0.7.17--h84994c4_5 bash

To use another tool, replace the container name (quay.io/biocontainers/bwa:0.7.17--h84994c4_5) with that of the tool.

The commands provided by each container are mostly installed in /usr/local/bin.

When you have finished running the tool, you can quit Docker by typing exit.
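
The same container can also run a single command without opening an interactive shell, which is convenient in scripts. The following is a minimal sketch and not part of the setup above: the function name run_in_container, the variables IMG and DRY_RUN, and the file names ref.fa and reads.fq are all hypothetical. The -it option is left out so that the tool's standard output can be redirected to a file.

```shell
# Hypothetical helper: run one command inside a container image with the
# current folder mounted at the same path (as the DOCKER alias does).
IMG="quay.io/biocontainers/bwa:0.7.17--h84994c4_5"

run_in_container() {
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command that would be executed.
    echo docker run --rm -v "$PWD:$PWD" -w "$PWD" "$IMG" "$@"
  else
    docker run --rm -v "$PWD:$PWD" -w "$PWD" "$IMG" "$@"
  fi
}

# Example (ref.fa and reads.fq are placeholder file names):
#   run_in_container bwa index ref.fa
#   run_in_container bwa mem ref.fa reads.fq > out.sam
```

Setting DRY_RUN=1 before a call prints the docker command instead of running it, which is a simple way to check the mount and image name.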

Because the BioContainers website is often down, you can also look directly at QUAY.IO.

https://quay.io/organization/biocontainers

However, if you search here, you should enter all letters as lowercase.

I will introduce a procedure to run a CentOS 7 container with Docker and create a new Docker container with an appropriate name. First, start the CentOS 7 container.

docker run -it centos:7 bash

Try installing the necessary tools. For example, to install wget:

yum install -y wget

When installation is complete, exit with exit. In order to save the container in which the tool is installed, you first need to know the container ID, so execute the following command.

docker ps -a

The container displayed at the top is the newest one, so make a note of its ID (for example, a94dd36ef032). When you enter the following command, the container is registered on your PC as a new image.

docker commit a94dd36ef032 my_dockerhub_id/wget:latest
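
If copying the ID by hand is inconvenient, the ID of the most recently created container can be captured in a variable instead. This is a minimal sketch, assuming the CentOS container from the step above is the newest one; the function name commit_latest_container is made up for illustration. docker ps -l shows only the latest created container, and -q prints only its ID.

```shell
# Hypothetical helper: commit the most recently created container
# under the given image name.
commit_latest_container() {
  local image_name="$1"
  local cid
  cid=$(docker ps -lq)   # -l: latest created container, -q: ID only
  if [ -z "$cid" ]; then
    echo "no container found" >&2
    return 1
  fi
  docker commit "$cid" "$image_name"
}

# Example:
#   commit_latest_container my_dockerhub_id/wget:latest
```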

You can see the list of registered images.

docker images

If you are registered at dockerhub, you can publish the container to other people with the following command:

docker push my_dockerhub_id/wget:latest

Unnecessary containers can be deleted by the following command.

docker rm a94dd36ef032

As described above, a new image can be created by carrying out the steps by hand and committing the result. Alternatively, an image can be built from a file named Dockerfile.

For example, the homology search tool diamond is not registered in BioContainers, but its author provides a Dockerfile, so you can easily build the image yourself. The procedure is as follows.

git clone https://github.com/bbuchfink/diamond.git
cd diamond
docker build -t diamond .
#The name of the image follows -t, so give a descriptive name (in this case, "diamond")

After installation, you can use the docker image as follows.

docker run -it --rm -v $PWD:$PWD -w $PWD diamond sh

Recently, many other tools also distribute a Dockerfile.
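
A Dockerfile can also be written by hand. As a minimal sketch, the wget example above could be expressed as follows. The function name write_wget_dockerfile and the folder name wget_image are made up for illustration; the Dockerfile itself only repeats the two steps that were performed manually (start from centos:7, then install wget).

```shell
# Hypothetical helper: write a Dockerfile that reproduces the manual steps
# (start from centos:7, then install wget) into the given folder.
write_wget_dockerfile() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/Dockerfile" << 'EOF'
FROM centos:7
RUN yum install -y wget
EOF
}

# Example (wget_image is a placeholder folder name):
#   write_wget_dockerfile wget_image
#   docker build -t my_dockerhub_id/wget:latest wget_image
```

Compared with docker commit, the Dockerfile records how the image was made, so the same image can be rebuilt later.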

Because docker needs administrator privileges (root privileges), it cannot be used on a shared server. Instead, software called singularity may be available. Singularity can be used on both the supercomputer at the University of Tokyo Shirokane campus and the supercomputer at the National Institute of Genetics.

Public docker images registered in DockerHub, QUAY, etc. can be used with singularity. Log in to the server and run commands like the ones below. (On our laboratory servers, be sure to move to the “work” folder before performing any analysis. Unfortunately, although WSL can create singularity images, it cannot start containers.)

First, convert the docker public image to a singularity image. The example below uses a bwa image from BioContainers.

singularity pull --name bwa.sif docker://quay.io/biocontainers/bwa:0.7.17--h84994c4_5

The image bwa.sif is created. When converting other images, change the image name created with --name and the URL of the docker image after docker:// as appropriate. After that, the bwa container starts with the next command, so you can use it the same way as docker.

singularity shell bwa.sif

To run only bwa without opening a shell:

singularity exec bwa.sif bwa

In addition, to make the shortcut of bwa using singularity, you can add an alias to ~/.bashrc as follows:

shopt -s expand_aliases #to enable bwa aliases in batch scripts.
alias bwa='singularity exec /suikou/files/m48/user2/work/img/bwa.sif bwa' #Specify the image of singularity with an absolute path.
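
In batch scripts, it can be convenient to create the image only when it does not exist yet and then run the tool. This is a minimal sketch: the function name ensure_sif and the file names ref.fa and reads.fq are hypothetical, and only the singularity pull and singularity exec commands shown above are used.

```shell
# Hypothetical helper: create the .sif image only if it is not there yet.
ensure_sif() {
  local sif="$1" docker_image="$2"
  if [ ! -f "$sif" ]; then
    singularity pull --name "$sif" "docker://$docker_image"
  fi
}

# Example (ref.fa and reads.fq are placeholder file names):
#   ensure_sif bwa.sif quay.io/biocontainers/bwa:0.7.17--h84994c4_5
#   singularity exec bwa.sif bwa index ref.fa
#   singularity exec bwa.sif bwa mem ref.fa reads.fq > out.sam
```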

From next time

From the next time, please decide the category of tools to survey (overlap with other people is fine), and report on one or more tools in two weeks. After the second survey, compare the results with those of the tools you examined previously (computational time, memory usage, disk usage, accuracy, sensitivity, etc.).

Write up the results of the survey on the WordPress blog below. Any language is acceptable. However, the screen shown when presenting at the briefing session should be in English. (Please write it in English in advance, or confirm that the Google Translate output reads as natural English.)

http://www.suikou.fs.a.u-tokyo.ac.jp/blog/

Log in from the following URL. The ID and password are the same as those for logging in to the laboratory server. However, change your password and e-mail address after logging in.

http://webpark1634.sakura.ne.jp/blog/wp-login.php

Please contact me if you want to use a new WordPress plugin.

Category and Tool Example

Read QC
 illumina: FASTQC
 nanopore: nanoQC
Read trimming
 illumina: FASTX-toolkit, trimmomatic, sickle
 nanopore: NanoFilt
k-mer analysis
 genome size prediction: KmerGenie, KAT, GenomeScope (docker pull greatfireball/ime_genomescope)
homology search
 blast, MAGICBLAST, last, diamond, ghostx, blat
mapping
 whole genome illumina→genome: bwa aln, bwa mem, bowtie2, subread, soapaligner
 whole genome nanopore→genome: minimap2, minialign, last
 RNA-seq→genome: tophat, hisat2, star
 cDNA→genome: exonerate, GeneWise (https://www.ebi.ac.uk/~birney/wise2/), gmap, spaln, minimap2
 RNA-seq→cDNA: kallisto, RSEM, salmon
 rRNA→rRNA: blast, usearch
assembly
 genome illumina: CLC, SOAPdenovo, Platanus (docker pull c2997108/platanus:1.2.4), ABySS, SPAdes, MaSuRCA, Meraculous
 genome nanopore: canu, flye, Ra (docker pull c2997108/ra:2018-12-11), Redbean (wtdbg), Unicycler, Manta, FALCON
 metagenome illumina: megahit, metaSPAdes (SPAdes)
 RNA-seq illumina: Trinity, TransABySS, SOAPdenovo-Trans, rnaSPAdes (SPAdes)
scaffolding
 illumina: BESST, SSPACE
 illumina RNA-seq: BESST_RNA (https://github.com/ksahlin/BESST_RNA), Rascaf, P_RNA_scaffolder (lost?)
 pacbio, nanopore: LINKS, quickmerge (https://github.com/mahulchak/quickmerge)
 using close species: Chromosomer, MEDUSA, AlignGraph (https://github.com/baoe/AlignGraph)
gap close
 pacbio, nanopore: LR_Gapcloser (https://github.com/CAFS-bioinformatics/LR_Gapcloser), GMcloser
merge assembly
 Metassembler (https://sourceforge.net/projects/metassembler/), GAM-NGS (https://github.com/vice87/gam-ngs), Mix (https://github.com/cbib/MIX)
polishing
 pilon, racon
assembly QC
 genome: QUAST-LG, assembly-stats, REAPR (https://www.sanger.ac.uk/science/tools/reapr), BUSCO
 cDNA: TransRate
clustering
 CD-HIT, SSEARCH, VSEARCH
reference guided assembly
 cufflinks, stringtie, strawberry (https://github.com/ruolin/Strawberry)
SNP calling
 GATK, bcftools mpileup, freebayes, VarScan
SV (Structural Variants) calling
 pindel, breakdancer, Manta
databases
 rDNA: SILVA, RDP
statistics
 DE: edgeR, DESeq2, cuffdiff, sleuth, ballgown
annotation
 transcriptome: dammit, trinotate
pipeline
 shotgun metagenome: MG-RAST, Sunbeam
 16S rRNA metagenome: QIIME
 RNA-seq: SPARTA, VIPER, iDEP
 single cell RNA-seq: zUMIs, seurat

The following is a list of slides submitted when we investigated enrichment analysis tools two years ago.

http://www.suikou.fs.a.u-tokyo.ac.jp/document/

The following sequence is the predicted nucleotide sequence of a certain gene in pearl oysters.

atgactctgaaggatgccctcaacaaaagtcacacaaatacaggaaacatgctcacaata
cttcaaagctttgaaaatcgtttaaagaagttagagggaacagttgagcctgtttacaat
gagacagaaatgctgcggcgcagacaagaaaatatagagaaaactatgacaacactggac
aatgtgctgggttactaccatattgctaaagatgtacaagatttgattaaagaaggtcca
gtagtttgtggtctggagaagtacctgtctactatggaccggctgctccaagcactgaac
tactttaataaacataacccaaccagtctggaagtgacagatgtcatcaaagtatatgat
gatggtaaagatacattgaatgcagagttccgtagtttacttggtcgtcactgtcgtccg
gtgccggctgttactatactggatttactaggaccagatgaagagttacaaacaatggaa
aatgatgcacccatagaacatctgcctgagaaaattgtgaatgatttaaccctcatcgca
aagtggctatacaccaatggtaaagctacagagtatatgaaagattacaccaaagtcagg
tcccaaatgctcctctactctctgcaggggaactcaataaagcggaaggctaccacggcc
ttgatgcagtccccttttgatccaggtcatagaagacaaggctcttataacgaattgaca
aaagaggaaagttttgatgttgaaattgatatctacataacagaactaacagcattgctg
aaacttattcagaatgaccctgagagatcttcgatgccccgagacggtacagttcatgaa
ctgacaaaccataccattatagtactggagcccctgttagattatgctgagacagctggg
gccatgttactcacccatggtgaacatgcagttccatctgatgctgtggatgtcaagaaa
agtaaactcaagttggctgactatatcactaaggttttgtcagcattaggattaaactta
agtaacaaggcagaaacttacagtgatccaatactcagacatgtgttcatgcttaataac
tatcactacatactcaagtctttaaaaaggtctggggtattagaattaattcacacatgg
aataaagatgtaggacagttttatgaggaccagatacatgaacaaaaaagactttattcc
cagagctggagtaaagttctacattttgtactggaaatgaatgagccaatatcccaacaa
agaatccagcaaatggagacatcaaagataaaggacaaagaaaagcagaatataaaagac
aagttctctggattcaacaaagagttggaagaaatctcacgtgttcagaaagcatacgcc
attcctgatccagaactgagggacaatatcaagaaagacaataaagaatatattgtgccg
cgatacaagcttttcttagaaaaatttcaacggctgaacttcacaaagaattcagaaaaa
tatatgaaatacactgtaaaggatgtggaagaaacacttgataaatttttcgatacttca
gcttaa

This sequence is homologous to Exocyst complex component 7 of the closely related species C. gigas, but some exons are missing (the gene prediction is wrong). Let's investigate this using BLAST's blastx command.

How to use BLAST: https://togotv.dbcls.jp/20170606.html

The amino acid sequence of Exocyst complex component 7 [Crassostrea gigas]

>EKC30356.1 Exocyst complex component 7 [Crassostrea gigas]
MLTILQSFENRLRKLENTVEPVYNETEMLRRRQENIEKTMVTLDNVLGYYHVGKEVEEFIKEGPHNCGLE
KYLSIMDRLVQAHNYFNKHNPTSLELTDVIRVYDDGKEALVIEFRTLLGRHCRPVPPVMVLDMISTDEEL
QGSDDIQLEHLPEKILTELSLISTWLFNNTKNTEYMKDYTRSRSSMLIKSLQGHSFKRRAVITLMQSPFD
PGNKRQGSHAELPKEENLDVEVDIYITELSALLKLIQSEAQLMSGIIADKHHRSVFDNIIQEGLDSVIKN
GELLAVNAKKSIAKHDFINVLSVFPVLKHLRSIKPEFDLTLEGCATPTRAKLTSLLSTLGSTAAKALEEF
ALSIKTDPEKASMPKDGTVHELTNRTIIFLEPLQDYADTAGAMLLLHGEQAAPSEAVDPKKSKMRLADYI
TKTLSALGLNLTIKAETYSDPTLRPVFMLNNYHYILKSLKRSGLLDLIHTWNKDVGQFYEDRINEQKKLY
SESWSRVMHYITEVHEPISQQRIQAMENSKLKDKEKQNIKDKFSGFNKELEDILKIQKGYAIPDPELREQ
MKKDNKDFIIPAFRMFLDKFKRLNFTKNPEKYIKYSVQDVAEVVDKLFDMSA

In addition, download the pearl oyster genome from the following URL and find out which scaffold contains the exons missing from the gene prediction. The command to decompress the fasta.gz file is gzip -d.

https://marinegenomics.oist.jp/pearl/download/pfu_genome1.0.fasta.gz
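
One possible way to approach the genome search is sketched below. It assumes that the BLAST+ commands makeblastdb and tblastn are available (for example from a BioContainers image) and that the C. gigas protein above has been saved to a file. Using tblastn here is an assumption for illustration, and the file names exoc7_cgigas.faa and hits.tsv as well as both function names are made up. The first function searches the genome with the protein; the second summarizes the tabular output (-outfmt 6) per scaffold, so that the protein region covered by each scaffold can be compared.

```shell
# Hypothetical sketch: search the genome with the C. gigas protein.
search_genome() {
  local protein="$1" genome="$2"
  makeblastdb -in "$genome" -dbtype nucl
  tblastn -query "$protein" -db "$genome" -evalue 1e-5 -outfmt 6
}

# Summarize BLAST tabular output (-outfmt 6) read from standard input.
# For each subject (scaffold), print the number of hits, the smallest
# query start and the largest query end (columns 2, 7 and 8 of the input).
summarize_hits() {
  awk -F '\t' '
    {
      n[$2]++
      if (!($2 in qmin) || ($7 + 0) < qmin[$2]) qmin[$2] = $7 + 0
      if (!($2 in qmax) || ($8 + 0) > qmax[$2]) qmax[$2] = $8 + 0
    }
    END {
      for (s in n) printf "%s\t%d\t%d\t%d\n", s, n[s], qmin[s], qmax[s]
    }' | sort
}

# Example (file names are placeholders):
#   gzip -d pfu_genome1.0.fasta.gz
#   search_genome exoc7_cgigas.faa pfu_genome1.0.fasta > hits.tsv
#   summarize_hits < hits.tsv
```

Scaffolds whose hits cover protein regions that are absent from the blastx alignment of the predicted gene are candidates for containing the missing exons.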

2019 blast practice example

