# Windows

## 1. Install Windows Subsystem for Linux (WSL)

Confirm that your Windows 10 version is 1803 (spring 2018), 1809 (fall 2018), or 1903 (spring 2019).

Open PowerShell with administrator privileges (right-click the Windows logo at the bottom left of the screen → Windows PowerShell (Admin)), then paste and run the following command to enable WSL.

```
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
```

Then restart Windows.

## 2. Install Ubuntu

Left-click the Windows logo at the lower left of the screen and start "Microsoft Store" from the menu. Click Search in the store and enter Ubuntu. Install and launch Ubuntu 18.04 LTS. (Other Ubuntu versions appear to work, but 18.04 is recommended.)

From then on, start Ubuntu from the Start menu (left-click the Windows logo at the lower left of the screen). At the first startup you will be asked to set a user name and password.

## 3. Install wsl-terminal

The default Windows console is very hard to use because of font misalignment. Download the following wsl-terminal archive, unzip it to a folder of your choice, and run "open-wsl.exe" to start WSL.

https://github.com/goreliu/wsl-terminal/releases/download/v0.8.13/wsl-terminal-0.8.13.zip

Alternatively, if your Windows version is 1903, you can use Microsoft's new terminal: search for "Windows Terminal" in the Microsoft Store and install it.

## 4. Install Docker

Docker performs container-type virtualization. A container behaves like an ultra-lightweight virtual PC that starts up in about a second.

Advantages
- It is lightweight: a base guest OS uses only a few tens of megabytes of disk space.
- Docker Hub is available as an image repository with no capacity limit.

Disadvantages
- It relies on Linux-specific features, so the guest OS must be Linux and the host OS is limited to recent Linux.
- It is fast because, unlike a virtual machine, it does not virtualize the whole machine, but for the same reason it needs careful attention to security, and using it requires administrator privileges.
On the supercomputer we will use Singularity, another container-type virtualization tool, instead of Docker. Singularity requires root privileges when it is installed. (It is already installed on the University of Tokyo Shirokane supercomputer, so it can be used there without root privileges.)

{{vm-docker.png}} from https://blog.cloudboost.io/docker-vs-vm-548032d3ef58

Paste the following long command into the WSL window you launched. (To paste, middle-click, or right-click and choose Paste from the menu.)

```
cat << 'EOF2' | bash
# Install Docker only if the docker command is not found yet.
if [ `which docker|wc -l` = 0 ];then
 # Allow members of the sudo group to run sudo without a password.
 sudo sed -i 's/%sudo\tALL=(ALL:ALL) ALL/%sudo\tALL=NOPASSWD: ALL/' /etc/sudoers
 # Switch apt to the JAIST mirror (a backup is kept as sources.list.bak).
 sudo sed -i.bak -e "s%http://[^ ]\+%http://ftp.jaist.ac.jp/pub/Linux/ubuntu/%g" /etc/apt/sources.list
 sudo apt-get update
 sudo apt install -y libltdl7 cgroupfs-mount
 cd
 wget https://download.docker.com/linux/ubuntu/dists/xenial/pool/stable/amd64/docker-ce_17.03.3~ce-0~ubuntu-xenial_amd64.deb
 sudo dpkg -i docker-ce_17.03.3~ce-0~ubuntu-xenial_amd64.deb
fi
# Add the current user to the docker group if needed.
if [ `id -a $USER|grep "(docker)"|wc -l` = 0 ]; then
 sudo usermod -aG docker $USER
fi
# Start the Docker service from an elevated bash if it is not running.
if [ `service docker status|grep " is running"|wc -l` = 0 ]; then
 powershell.exe start-process bash -verb runas -ArgumentList "'"'-c "sudo cgroupfs-mount; sudo service docker start"'"'"
fi
# Register the DOCKER alias and the service check in ~/.bashrc (once only).
if [ `grep DOCKER ~/.bashrc|wc -l` = 0 ]; then
cat << 'EOF' >> ~/.bashrc
#for docker
alias DOCKER='docker run -it --rm -v $PWD:$PWD -w $PWD'
shopt -s expand_aliases
if [ `service docker status|grep " is running"|wc -l` = 0 ]; then
 powershell.exe start-process bash -verb runas -ArgumentList "'"'-c "sudo cgroupfs-mount; sudo service docker start"'"'"
fi
EOF
fi
# Install a compiler toolchain if gcc is missing.
if [ "`gcc --version 2> /dev/null`" = "" ]; then
 sudo apt install -y build-essential
fi
EOF2
exit
```

You will be asked for your password only once; enter it. A dialog will also ask whether you want to run bash with administrator privileges; click "Yes".
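The setup script above can be pasted more than once without harm, because each step first checks whether its work is already done (Docker installed, user in the docker group, alias present in ~/.bashrc). A minimal sketch of that check-then-act pattern follows, assuming a POSIX shell. The helper names `have_cmd` and `append_once` are hypothetical and not part of the script above, and `command -v` is swapped in for the script's `which ... | wc -l` test.

```shell
# Hypothetical helpers illustrating the check-then-act pattern.

# Succeeds if the named command can be found on PATH.
have_cmd() {
  command -v "$1" > /dev/null 2>&1
}

# Appends line $1 to file $2 unless an identical line is already there.
append_once() {
  if ! grep -qxF -- "$1" "$2" 2> /dev/null; then
    printf '%s\n' "$1" >> "$2"
  fi
}

# Example: report whether Docker is installed, without changing anything.
if have_cmd docker; then
  echo "docker is already installed"
else
  echo "docker is not installed yet"
fi
```

Written this way, a step such as ```append_once "shopt -s expand_aliases" ~/.bashrc``` adds the line on the first run and does nothing on later runs.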
When the script finishes successfully, the terminal closes. Reopen wsl-terminal (you may again be asked whether to run bash with administrator privileges; click "Yes") and type the following command.

```
docker run hello-world
```

If you see "Hello from Docker!", Docker has been installed successfully.

# Mac

## 1. Install Docker

Make sure that your OS version is macOS Sierra 10.12 or later. Download Docker Desktop for Mac from the following URL, double-click the dmg file, and follow the instructions to complete the installation.

https://download.docker.com/mac/stable/Docker.dmg

## 2. Change the Docker settings

The memory limit of Docker's virtual machine is low by default. Click the Docker icon (a picture of a whale) at the top of the screen, click Preferences..., and open the Advanced tab. Set CPUs to the number of CPU cores, and set Memory to your computer's total memory minus about 1 to 2 GB reserved for the OS.

## 3. Run Docker

Open Finder and start Applications → Utilities → Terminal. Type the following command and check that you see "Hello from Docker!".

```
docker run hello-world
```

## 4. Install Homebrew

The command-line tools bundled with macOS are about 10 years old, so install newer ones with Homebrew. Open the terminal and paste the following commands.

```
# This Ruby installer has since been replaced by a Bash script. If it fails, use the command shown at https://brew.sh
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# Append the PATH setting to ~/.bash_profile (the redirection must be outside the quotes).
echo 'export PATH=/usr/local/opt/coreutils/libexec/gnubin:/usr/local/bin:/usr/local/sbin:${PATH}' >> ~/.bash_profile
source ~/.bash_profile
# You may not need the next two lines. Run them only if brew install fails with an error.
mkdir -p /usr/local/sbin /usr/local/opt
sudo chown $USER /usr/local/sbin /usr/local/opt
brew install grep gawk gzip ed htop iftop
brew install gnu-tar gnu-sed gnu-time gnu-getopt
brew install binutils findutils diffutils coreutils moreutils
```

You will be asked for your Mac password, so enter it.
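After the installation it can be worth confirming which commands actually resolve to the GNU versions, since the PATH setting above names only the coreutils `gnubin` directory and Homebrew may install other GNU tools under a `g` prefix (for example `gsed`). A minimal sketch follows. It assumes that a GNU tool prints a first line containing "(GNU " when called with `--version`, which holds for GNU ls, sed, grep and tar but is an assumption for other tools. The function names are hypothetical.

```shell
# Succeeds if "$1 --version" prints a first line containing "(GNU ".
is_gnu_tool() {
  "$1" --version 2> /dev/null | head -n 1 | grep -q '(GNU '
}

# Prints one line per tool saying whether the GNU version is found first on PATH.
report_gnu_tools() {
  for tool in "$@"; do
    if is_gnu_tool "$tool"; then
      echo "$tool: GNU"
    else
      echo "$tool: not GNU"
    fi
  done
}
```

For example, ```report_gnu_tools ls sed grep tar``` prints one line per command. If a tool is reported as not GNU, it probably has to be called by its prefixed name, or its own `gnubin` directory has to be added to PATH.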
# Install Bioinformatics tools

## Download Docker image from Biocontainers

BioContainers is a project that aims to make bioinformatics tools easy to install, for the sake of reproducibility.

https://biocontainers.pro/#/registry

More than 7,000 tools appear to be registered at present, but the following kinds of tools are not.

- Tools that require a license (e.g. CLC, USEARCH, SSPACE)
- Tools that need a huge database (e.g. Polyphen2, PROVEAN)
- Minor tools

Because so many tools are registered, we first learn how to use BioContainers.

Example: using the mapping tool BWA

1. Enter bwa in the search box and search.
1. The results are not displayed in a useful order, so sort them in ascending order of pulls with Sorts by: Pull No. The BWA entry shown at the end of the page (2.6 M downloads) is the container you want; open it.
1. The versions are again not in a useful order, so click the arrow next to Modified or Version to sort them from newest to oldest.
1. Choose what appears to be the newest version. If you paste its command, for example "docker pull quay.io/biocontainers/bwa:0.7.17--h84994c4_5", into the terminal as it is, the container image is downloaded. (This step is optional: if you skip it, the image is downloaded automatically the first time the container is started.)
1. To use the downloaded container, run it as follows.

```
docker run -it --rm -v $PWD:$PWD -w $PWD quay.io/biocontainers/bwa:0.7.17--h84994c4_5 bash
# Windows users registered a shortcut during the setup above, so the command can be shortened as below.
DOCKER quay.io/biocontainers/bwa:0.7.17--h84994c4_5 bash
```

Replace the container name (quay.io/biocontainers/bwa:0.7.17--h84994c4_5) with that of the tool you want to use. The commands provided by each container are mostly installed in /usr/local/bin. When you have finished running the tool, leave the container by typing ```exit```.

The BioContainers website is often down. In that case you can look directly at Quay.io.

https://quay.io/organization/biocontainers

When searching there, enter the tool name entirely in lowercase.
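The `DOCKER` shortcut used above is registered only by the Windows setup script. As a minimal sketch, Mac or Linux users could define an equivalent shell function, for example in ~/.bash_profile. The function name `DOCKER` mirrors the alias from the Windows section. The `DOCKER_BIN` variable is a hypothetical addition that lets you substitute `echo` to preview the command without running it.

```shell
# Program to invoke. Override with DOCKER_BIN=echo for a dry run (hypothetical variable).
DOCKER_BIN=${DOCKER_BIN:-docker}

# Run a container interactively with the current directory mounted at the
# same path inside the container and used as the working directory.
# The container is removed when it exits (--rm).
DOCKER() {
  "$DOCKER_BIN" run -it --rm -v "$PWD:$PWD" -w "$PWD" "$@"
}
```

With this in place, ```DOCKER quay.io/biocontainers/bwa:0.7.17--h84994c4_5 bash``` behaves like the longer command above. Quoting "$PWD" keeps the mount working when the directory name contains spaces.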
## Create your own container

This section shows how to start a CentOS 7 container with Docker, install a tool in it, and save the result as a new Docker image under a name of your choice.

First, start the CentOS 7 container.

```
docker run -it centos:7 bash
```

Install the tools you need. For example, to install wget:

```
yum install -y wget
```

When the installation is complete, leave the container with ```exit```.

To save the container in which the tool is installed, you first need its container ID, so run the following command.

```
docker ps -a
```

The container displayed at the top is the newest one; make a note of its ID (for example, a94dd36ef032).

The following command registers the container on your PC as a new image.

```
docker commit a94dd36ef032 my_dockerhub_id/wget:latest
```

You can list the registered images.

```
docker images
```

If you have a Docker Hub account, you can publish the image to other people with the following command:

```
docker push my_dockerhub_id/wget:latest
```

Containers you no longer need can be deleted with the following command.

```
docker rm a94dd36ef032
```

## Build an image from Dockerfile

Besides committing a container by hand as described above, you can build a new image from a file named Dockerfile, which records the build procedure. For example, the homology search tool diamond is not registered in BioContainers, but its author provides a Dockerfile, so you can easily build an image yourself. The procedure is as follows.

```
git clone https://github.com/bbuchfink/diamond.git
cd diamond
docker build -t diamond . #The name of the image follows -t, so give a descriptive name (in this case, "diamond")
```

After the build, you can use the Docker image as follows.

```
docker run -it --rm -v $PWD:$PWD -w $PWD diamond sh
```

Many other tools now distribute a Dockerfile as well.
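As a further illustration of a Dockerfile, the wget example from "Create your own container" could be written down as one, so that the image can be rebuilt without repeating the steps by hand. A minimal sketch, assuming the same CentOS 7 base image and wget package as in that example. The function name `write_wget_dockerfile` and the directory name `wget-image` are hypothetical.

```shell
# Write a two-line Dockerfile into directory $1 (created if missing).
write_wget_dockerfile() {
  mkdir -p "$1" || return 1
  cat > "$1/Dockerfile" << 'EOF'
FROM centos:7
RUN yum install -y wget
EOF
}

# Usage (the build step requires Docker):
#   write_wget_dockerfile wget-image
#   docker build -t my_dockerhub_id/wget:latest wget-image
```

Each `RUN` line corresponds to a command you would otherwise type inside the container before ```docker commit```.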
## Use on the supercomputer or the cluster servers of our laboratory

Because Docker needs administrator privileges (root privileges), it cannot be used on a shared server. Software called Singularity may be available instead. Singularity can be used on both the supercomputer at the University of Tokyo Shirokane campus and the supercomputer at the National Institute of Genetics. Public Docker images registered in Docker Hub, Quay, etc. can be used with Singularity.

Log in to the server and run commands like the ones below. (On our laboratory servers, be sure to move to the "work" folder before performing any analysis. Unfortunately, although WSL can create Singularity images, it cannot start containers.)

First, convert the public Docker image to a Singularity image. The example below uses the bwa image from BioContainers.

```
singularity pull --name bwa.sif docker://quay.io/biocontainers/bwa:0.7.17--h84994c4_5
```

The image bwa.sif is created. When converting other images, change ```the image name given with --name``` and ```the URL of the docker image after docker://``` as appropriate.

The next command then starts the bwa container, which you can use in the same way as with Docker.

```
singularity shell bwa.sif
```

To run only bwa:

```
singularity exec bwa.sif bwa
```

In addition, to create a shortcut that runs bwa through Singularity, add an alias to ~/.bashrc as follows:

```
shopt -s expand_aliases #to enable bwa aliases in batch scripts.
alias bwa='singularity exec /suikou/files/m48/user2/work/img/bwa.sif bwa' #Specify the image of singularity with an absolute path.
```

# From next time

From next time, please decide which category of tools you will survey (it may overlap with other people's) and report on one or more tools in two weeks. From the second survey onward, compare the results with those of the tools you examined before (computational time, memory usage, disk usage, accuracy, sensitivity, etc.).
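For such comparisons, wall-clock time and disk usage can be recorded with standard commands. A minimal sketch follows. The function names `elapsed_seconds` and `disk_kb` are hypothetical. Peak memory is not covered by this sketch. GNU time (```/usr/bin/time -v``` on Linux, typically ```gtime -v``` when installed through Homebrew) reports it as "Maximum resident set size".

```shell
# Run a command and print its wall-clock time in whole seconds.
# The command's own screen output is discarded. Files it writes are unaffected.
elapsed_seconds() {
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  status=$?
  end=$(date +%s)
  echo $((end - start))
  return $status
}

# Print the disk usage of a file or directory in kilobytes.
disk_kb() {
  du -sk "$1" | cut -f1
}
```

For example, ```elapsed_seconds sleep 2``` prints the number of seconds the command took, and ```disk_kb .``` prints the size of the current directory.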
Write up the results of your survey on the WordPress blog below. Any language is acceptable. However, the screen you show when presenting at the briefing session should be in English. (Please write it in English in advance, or confirm that Google Translate turns it into natural English.)

http://www.suikou.fs.a.u-tokyo.ac.jp/blog/

Log in at the following URL. The ID and password are the same as for the laboratory server, but please change your password and e-mail address after logging in.

http://webpark1634.sakura.ne.jp/blog/wp-login.php

Please contact me if you want to use a new WordPress plugin.

Category and Tool Example

```
Read QC
  illumina: FASTQC
  nanopore: nanoQC
Read trimming
  illumina: FASTX-toolkit, trimmomatic, sickle
  nanopore: Nanofilt
k-mer analysis
  genome size prediction: KmerGenie, KAT, GenomeScope (docker pull greatfireball/ime_genomescope)
homology search
  blast, MAGICBLAST, last, diamond, ghostx, blat
mapping
  whole genome illumina→genome: bwa aln, bwa mem, bowtie2, subread, soapaligner
  whole genome nanopore→genome: minimap2, minialign, last
  RNA-seq→genome: tophat, hisat2, star
  cDNA→genome: exonerate, GeneWise (https://www.ebi.ac.uk/~birney/wise2/), gmap, spaln, minimap2
  RNA-seq→cDNA: kallisto, RSEM, salmon
  rRNA→rRNA: blast, usearch
assembly
  genome illumina: CLC, SOAPdenovo, Platanus (docker pull c2997108/platanus:1.2.4), ABySS, SPAdes, MaSuRCA, Meraculous
  genome nanopore: canu, flye, Ra (docker pull c2997108/ra:2018-12-11), Redbean (wtdbg), Unicycler, Manta, FALCON
  metagenome illumina: megahit, metaSPAdes (SPAdes)
  RNA-seq illumina: Trinity, TransABySS, SOAPdenovo-Trans, rnaSPAdes (SPAdes)
scaffolding
  illumina: BESST, SSPACE
  illumina RNA-seq: BESST_RNA (https://github.com/ksahlin/BESST_RNA), Rascaf, P_RNA_scaffolder (lost?)
  pacbio, nanopore: LINKS, quickmerge (https://github.com/mahulchak/quickmerge)
  using close species: Chromosomer, MEDUSA, AlignGraph (https://github.com/baoe/AlignGraph)
gap close
  pacbio, nanopore: LR_Gapcloser (https://github.com/CAFS-bioinformatics/LR_Gapcloser), GMcloser
merge assembly
  Metassembler (https://sourceforge.net/projects/metassembler/), GAM-NGS (https://github.com/vice87/gam-ngs), Mix (https://github.com/cbib/MIX)
polishing
  pilon, racon
assembly QC
  genome: QUAST-LG, assembly-stats, REAPR (https://www.sanger.ac.uk/science/tools/reapr), BUSCO
  cDNA: TransRate
clustering
  CD-HIT, SSEARCH, VSEARCH
reference guided assembly
  cufflinks, stringtie, strawberry (https://github.com/ruolin/Strawberry)
SNP calling
  GATK, bcftools mpileup, freebayes, VarScan
SV (Structural Variants) calling
  pindel, breakdancer, Manta
databases
  rDNA: SILVA, RDP
statistics
  DE: edgeR, DESeq2, cuffdiff, sleuth, ballgown
annotation
  transcriptome: dammit, trinotate
pipeline
  shotgun metagenome: MG-RAST, Sunbeam
  16S rRNA metagenome: QIIME
  RNA-seq: SPARTA, VIPER, iDEP
  single cell RNA-seq: zUMIs, seurat
```

The following page lists the slides submitted two years ago, when we surveyed enrichment analysis tools.

http://www.suikou.fs.a.u-tokyo.ac.jp/document/

## Practice

The following sequence is the predicted nucleotide sequence of a certain gene in pearl oysters.
```
atgactctgaaggatgccctcaacaaaagtcacacaaatacaggaaacatgctcacaata
cttcaaagctttgaaaatcgtttaaagaagttagagggaacagttgagcctgtttacaat
gagacagaaatgctgcggcgcagacaagaaaatatagagaaaactatgacaacactggac
aatgtgctgggttactaccatattgctaaagatgtacaagatttgattaaagaaggtcca
gtagtttgtggtctggagaagtacctgtctactatggaccggctgctccaagcactgaac
tactttaataaacataacccaaccagtctggaagtgacagatgtcatcaaagtatatgat
gatggtaaagatacattgaatgcagagttccgtagtttacttggtcgtcactgtcgtccg
gtgccggctgttactatactggatttactaggaccagatgaagagttacaaacaatggaa
aatgatgcacccatagaacatctgcctgagaaaattgtgaatgatttaaccctcatcgca
aagtggctatacaccaatggtaaagctacagagtatatgaaagattacaccaaagtcagg
tcccaaatgctcctctactctctgcaggggaactcaataaagcggaaggctaccacggcc
ttgatgcagtccccttttgatccaggtcatagaagacaaggctcttataacgaattgaca
aaagaggaaagttttgatgttgaaattgatatctacataacagaactaacagcattgctg
aaacttattcagaatgaccctgagagatcttcgatgccccgagacggtacagttcatgaa
ctgacaaaccataccattatagtactggagcccctgttagattatgctgagacagctggg
gccatgttactcacccatggtgaacatgcagttccatctgatgctgtggatgtcaagaaa
agtaaactcaagttggctgactatatcactaaggttttgtcagcattaggattaaactta
agtaacaaggcagaaacttacagtgatccaatactcagacatgtgttcatgcttaataac
tatcactacatactcaagtctttaaaaaggtctggggtattagaattaattcacacatgg
aataaagatgtaggacagttttatgaggaccagatacatgaacaaaaaagactttattcc
cagagctggagtaaagttctacattttgtactggaaatgaatgagccaatatcccaacaa
agaatccagcaaatggagacatcaaagataaaggacaaagaaaagcagaatataaaagac
aagttctctggattcaacaaagagttggaagaaatctcacgtgttcagaaagcatacgcc
attcctgatccagaactgagggacaatatcaagaaagacaataaagaatatattgtgccg
cgatacaagcttttcttagaaaaatttcaacggctgaacttcacaaagaattcagaaaaa
tatatgaaatacactgtaaaggatgtggaagaaacacttgataaatttttcgatacttca
gcttaa
```

This sequence is homologous to Exocyst complex component 7 of the closely related species C. gigas, but some exons are missing (the gene prediction is wrong). Let's investigate this. Please use BLAST's blastx command.
How to use BLAST: https://togotv.dbcls.jp/20170606.html

The amino acid sequence of Exocyst complex component 7 [Crassostrea gigas]

```
>EKC30356.1 Exocyst complex component 7 [Crassostrea gigas]
MLTILQSFENRLRKLENTVEPVYNETEMLRRRQENIEKTMVTLDNVLGYYHVGKEVEEFIKEGPHNCGLE
KYLSIMDRLVQAHNYFNKHNPTSLELTDVIRVYDDGKEALVIEFRTLLGRHCRPVPPVMVLDMISTDEEL
QGSDDIQLEHLPEKILTELSLISTWLFNNTKNTEYMKDYTRSRSSMLIKSLQGHSFKRRAVITLMQSPFD
PGNKRQGSHAELPKEENLDVEVDIYITELSALLKLIQSEAQLMSGIIADKHHRSVFDNIIQEGLDSVIKN
GELLAVNAKKSIAKHDFINVLSVFPVLKHLRSIKPEFDLTLEGCATPTRAKLTSLLSTLGSTAAKALEEF
ALSIKTDPEKASMPKDGTVHELTNRTIIFLEPLQDYADTAGAMLLLHGEQAAPSEAVDPKKSKMRLADYI
TKTLSALGLNLTIKAETYSDPTLRPVFMLNNYHYILKSLKRSGLLDLIHTWNKDVGQFYEDRINEQKKLY
SESWSRVMHYITEVHEPISQQRIQAMENSKLKDKEKQNIKDKFSGFNKELEDILKIQKGYAIPDPELREQ
MKKDNKDFIIPAFRMFLDKFKRLNFTKNPEKYIKYSVQDVAEVVDKLFDMSA
```

In addition, download the pearl oyster genome from the following URL and find out which scaffold contains the exons that are missing from the gene prediction. The command to decompress the fasta.gz file is ```gzip -d```.

https://marinegenomics.oist.jp/pearl/download/pfu_genome1.0.fasta.gz

[[2019 blast practice example]]
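As a starting point for the practice, the two searches could be run as sketched below. This assumes that NCBI BLAST+ is installed (for example inside a BioContainers image) and that the sequences above have been saved in FASTA format. The file names `gene.fa` and `exoc7.fa`, the function names, the E-value cutoff, and the `RUN` variable are all hypothetical. `blastx` translates the nucleotide query and aligns it to the protein, which shows which parts of the protein the predicted gene fails to cover. `tblastn` aligns the protein to the translated genome, which shows which scaffold carries matches to those parts.

```shell
# Set RUN=echo to print the commands instead of executing them (hypothetical switch).
RUN=${RUN:-}

# Compare the predicted gene (nucleotide FASTA, $1) with the C. gigas protein (FASTA, $2).
# -outfmt 6 prints one tab-separated line per alignment.
gene_vs_protein() {
  $RUN blastx -query "$1" -subject "$2" -outfmt 6
}

# Search the protein (FASTA, $1) against the genome (nucleotide FASTA, $2).
protein_vs_genome() {
  $RUN tblastn -query "$1" -subject "$2" -outfmt 6 -evalue 1e-5
}
```

For example, run ```gene_vs_protein gene.fa exoc7.fa``` and then ```protein_vs_genome exoc7.fa pfu_genome1.0.fasta```. In the default tabular output, columns 7 and 8 give the start and end on the query and columns 9 and 10 the start and end on the subject, so gaps in the covered protein positions point to the missing exons. For a genome of this size, building a database with ```makeblastdb``` and searching it with `-db` is usually faster than `-subject`.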